Compare commits

...

678 Commits

Author SHA1 Message Date
Pekka Enberg
e10588e37a streaming/stream_session: Don't stop stream manager
We cannot stop the stream manager because it's accessible via the API
server during shutdown, for example, which can cause a SIGSEGV.

Spotted by ASan.
Message-Id: <1453130811-22540-1-git-send-email-penberg@scylladb.com>
2016-01-20 10:29:34 +02:00
Pekka Enberg
bb0aeb9bb2 api/messaging_service: Fix heap-buffer-overflows in set_messaging_service()
Fix various issues in set_messaging_service() that caused
heap-buffer-overflows when JMX proxy connects to Scylla API:

  - Off-by-one error in 'num_verb' definition

  - Call to initializer list std::vector constructor variant that caused
    the vector to be two elements long.

  - Missing verb definitions from the Swagger definition that caused
    response vector to be too small.

Spotted by ASan.
Message-Id: <1453125439-16703-1-git-send-email-penberg@scylladb.com>
2016-01-20 10:29:27 +02:00
Takuya ASADA
d2c97d9620 dist: use our own CentOS7 Base image
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-4-git-send-email-syuu@scylladb.com>
2016-01-20 09:41:39 +02:00
Takuya ASADA
ddbe20f65c dist: stop ntpd before running ntpdate
The new CentOS base image runs ntpd by default, so shut it down before running ntpdate.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-3-git-send-email-syuu@scylladb.com>
2016-01-20 09:41:33 +02:00
Takuya ASADA
88bf12aa0b dist: disable SELinux only when it is enabled
The new CentOS7 base image disables SELinux by default, and running 'setenforce 0' on the image causes an error, making us unable to build the AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-2-git-send-email-syuu@scylladb.com>
2016-01-20 09:41:29 +02:00
Takuya ASADA
87fdf2ee0d dist: extend coredump size limit
16GB is not enough for some larger machines, so extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-2-git-send-email-syuu@scylladb.com>
2016-01-18 13:38:56 +02:00
Takuya ASADA
d6992189ed dist: preserve environment variable when running scylla_prepare on sudo
sysconfig parameters are passed via environment variables, but sudo resets them by default.
We need to preserve them across sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-1-git-send-email-syuu@scylladb.com>
2016-01-18 13:23:55 +02:00
Tomasz Grabiec
5bf1afa059 config: Set default logging level to info
Commit d7b403db1f changed the default in
logging::logger. It affected tests but not the scylla binary, where it is
overwritten in main.cc.
Message-Id: <1452777008-21708-1-git-send-email-tgrabiec@scylladb.com>
2016-01-14 15:12:28 +02:00
Tomasz Grabiec
b013ed6357 cql3: Disable ALTER TABLE unless experimental features are on 2016-01-14 14:32:15 +02:00
Tomasz Grabiec
d4d0dd9cda tests: cql_test_env: Enable experimental features 2016-01-14 14:32:10 +02:00
Tomasz Grabiec
5865a43400 config: Add 'experimental' switch 2016-01-14 14:32:05 +02:00
Pekka Enberg
b81292d5d2 release: prepare for 0.16 2016-01-14 13:21:50 +02:00
Gleb Natapov
647a09cd7b storage_proxy: improve mutation timeout logging
Message-Id: <20160114105359.GY6705@scylladb.com>
2016-01-14 12:00:35 +01:00
Pekka Enberg
733584c44d main: Start the API service as the last step
This reverts commit f0d68e4 ("main: start the http server in the first
step"). The service layer is not ready to serve clients before it's
fully up and running which causes early startup crashes everywhere.
Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>
2016-01-14 12:55:50 +02:00
Tomasz Grabiec
1daaf909d7 Merge branch 'tgrabiec/row_cache_invalidate_fix'
Fixes for wrap-around range handling in row_cache.
2016-01-14 11:38:26 +01:00
Takuya ASADA
7479cde28b dist: extend root disk size to 10GB
The default root disk size is too small for our purposes, so it's better to extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452762325-5620-1-git-send-email-syuu@scylladb.com>
2016-01-14 11:29:00 +02:00
Pekka Enberg
90123197e1 service/client_state: Use anonymous user when authentication is disabled
If authentication is disabled, nobody calls login() to set the current
user. There was untranslated code in the client_state constructor to do
just that.

Fixes "You have not logged in" errors when USE statement is executed
with authentication disabled.
Message-Id: <1452759946-13998-1-git-send-email-penberg@scylladb.com>
2016-01-14 09:29:33 +01:00
Avi Kivity
4143cf6385 Merge "Initial authenticator support" from Calle
"Add implementation of cassandra password authenticator, and user
password checking to CQL connections.

User/pwd are stored in system_auth table. Passwords are hashed
using glibc 'crypt_r'.

The latter is worth noting, as this is a difference compared to origin;
Origin uses Java bcrypt library for salt/hash, i.e. blowfish hashing.
Most glibc variants do _not_ have support for blowfish. To be 100%
compatible with imported origin tables we might need to add
bcrypt/blowfish sources into scylla (no packaged libs available afaict)

The code currently first attempts to use blowfish, if we happen to run
centos or Openwall, which has it compiled in. Otherwise we will fall
back to sha512, sha256 or even md5 depending on lib support.

To use:
* scylla.conf: authenticator=PasswordAuthenticator
* cqlsh -u cassandra -p cassandra

Not implemented (yet):
* "Authorizer", thus no KS/CF access checking
* CQL create/alter/delete user (create_user_statement etc). I.e. there is
  only a single user name; default "cassandra:cassandra" user/pwd combo"
2016-01-13 19:13:05 +02:00
Takuya ASADA
0511b02f90 dist: run scylla_prepare, scylla_stop on sudo
Since we changed uid on scylla-server.service to scylla, we need sudo for these scripts.

Fixes #783

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452704598-5292-1-git-send-email-syuu@scylladb.com>
2016-01-13 19:06:33 +02:00
Tomasz Grabiec
6b059fd828 row_cache: Guard against wrap-around range in make_reader() 2016-01-13 17:50:55 +01:00
Tomasz Grabiec
7fb0bc4e15 row_cache: Take the reclaim lock in invalidate()
This is needed to keep the iterators valid in case eviction is triggered
somewhere in between. It probably isn't, because destructors should not
allocate, but better safe than sorry.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
5e05f63ee7 tests: Add more tests for row_cache::invalidate()
Refs #785.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
50cc0c162e row_cache: Make invalidate() handle wrap-around ranges
Currently, for a wrap-around range, the "begin" iterator would never meet
the "end" iterator, invoking undefined behavior in erase_and_dispose(),
which results in a crash.

Fixes #785
2016-01-13 17:50:55 +01:00
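The fix above can be illustrated with a minimal sketch (the dict-based cache and function names are hypothetical stand-ins, not Scylla's actual row_cache): a wrap-around range, where start > end, is split into two non-wrapping erases so the erase loop always terminates.

```python
def invalidate(cache, start, end):
    """Remove keys covered by [start, end] from `cache` (a dict).

    A wrap-around range (start > end) covers [start, +inf) and
    (-inf, end], so it is split into two ordinary erases; walking
    from `start` until reaching `end` would never terminate.
    """
    if start <= end:
        spans = [(start, end)]
    else:
        spans = [(start, None), (None, end)]
    for lo, hi in spans:
        doomed = [k for k in cache
                  if (lo is None or k >= lo) and (hi is None or k <= hi)]
        for k in doomed:
            del cache[k]
    return cache
```

For example, invalidating the wrap-around range (8, 2) over keys {1, 5, 9} removes 9 and 1 but keeps 5.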
Calle Wilund
8192384338 auth_test: Unit tests for auth objects 2016-01-13 15:37:39 +00:00
Calle Wilund
9e3295bc69 cql_test_env: Allow specifying db::config for the env 2016-01-13 15:35:37 +00:00
Calle Wilund
9ef05993ff config: Mark "authenticator" used + update description 2016-01-13 15:35:36 +00:00
Calle Wilund
1d811f1e8f transport::server: Add authentication support
If the system authenticator object requires authentication, issue
a challenge to the client and process the response.
2016-01-13 15:35:36 +00:00
Calle Wilund
1c30d37285 client_state: Add user object + login
Note: all actual authorization methods are still unimplemented.
2016-01-13 15:35:36 +00:00
Calle Wilund
4692f46b8d storage_service: Initialize auth system on start 2016-01-13 15:35:36 +00:00
Calle Wilund
9a4d45e19d auth::auth/authenticator: user storage and authentication
User db storage + login/pwd db using system tables.

Authenticator object is a global shard-shared singleton, assumed
to be completely immutable, thus safe.
Actual login authentication is done via locally created stateful object
(sasl challenge), that queries db.

Uses "crypt_r" for password hashing, vs. origin's use of bcrypt.
The main reason is that bcrypt does not exist as any consistent package
that can be consumed, so to guarantee full compatibility we'd have
to include the source. Not hard, but at least initially more work than
it is worth.
2016-01-13 15:35:35 +00:00
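The salted store-and-verify flow described above can be sketched as follows. This is an illustration only: it uses hashlib's SHA-512 as a stand-in for glibc's crypt_r(), and the storage format shown is hypothetical, not the actual $6$-style crypt string kept in system_auth.

```python
import hashlib
import os

def hash_password(password, salt=None):
    # Salted SHA-512, standing in for glibc crypt_r()'s SHA-512 scheme.
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.sha512(salt + password.encode()).hexdigest()
    # Hypothetical "salt$digest" storage format for illustration.
    return salt.hex() + '$' + digest

def check_password(password, stored):
    # Re-hash the candidate with the stored salt and compare.
    salt_hex, _ = stored.split('$', 1)
    return hash_password(password, bytes.fromhex(salt_hex)) == stored
```

A login check then re-derives the hash from the stored salt, so only the salted digest, never the password, is persisted.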
Calle Wilund
00de63c920 cql3::query_processor: Add processing helpers for internal usage
Syntactic sugar plus a "process" helper for internal use, similar to
execute_internal, but allowing queries across the whole cluster and
optional statement caching.
2016-01-13 15:35:21 +00:00
Calle Wilund
6a5f075107 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
Message-Id: <1452615759-23213-1-git-send-email-calle@scylladb.com>
2016-01-13 12:26:49 +01:00
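The warn-don't-throw check described in the bullet list can be sketched as below. The threshold constant and function shape are assumptions for illustration; the real code reads the threshold from the server config.

```python
import logging

logger = logging.getLogger('batch')
WARN_THRESHOLD_BYTES = 5 * 1024  # stand-in for the configured threshold

def verify_batch_size(mutations):
    """Estimate the batch's mutation payload in bytes and warn --
    never throw -- when it exceeds the threshold."""
    size = sum(len(m) for m in mutations)  # payload size, not mutation count
    if size > WARN_THRESHOLD_BYTES:
        logger.warning('Batch of %d mutations is %d bytes, exceeding '
                       'the threshold of %d',
                       len(mutations), size, WARN_THRESHOLD_BYTES)
    return size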
Calle Wilund
32e480025f cql3::query_options: Add constructors for internal processing 2016-01-13 08:49:01 +00:00
Calle Wilund
2e9ab3aff1 types.hh: Add data_type_for<bool> 2016-01-13 08:49:01 +00:00
Calle Wilund
40efd231b1 auth::authenticated_user: Object representing a named or anon user 2016-01-13 08:49:01 +00:00
Calle Wilund
51af2bcafd auth::permission: permissions for authorization
Not actually used yet. But some day...
2016-01-13 08:49:01 +00:00
Calle Wilund
6f708eae1c auth::data_resource: resource identifier for auth permissions 2016-01-13 08:49:01 +00:00
Calle Wilund
9c1d088718 exceptions: add authorization exceptions 2016-01-13 08:49:01 +00:00
Calle Wilund
cd4ae7a81e Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-13 08:48:43 +00:00
Tomasz Grabiec
e88f41fb3f messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group
Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>
2016-01-12 20:17:08 +02:00
Calle Wilund
8de95cdee8 paging bugfix: Allow reset/removal of "specific ck range"
Refs #752

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass.

[tgrabiec: commit log cleanup, fixed issue ref]

Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>
2016-01-12 17:45:57 +01:00
Calle Wilund
7d7d592665 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
2016-01-12 16:30:31 +00:00
Calle Wilund
81e9dc0c2a paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set row limit for query to be min of page size/remaining in limit,
but if we have a multinode query we might end up with more rows than asked
for, so must do this again in post-processing.
2016-01-12 16:30:30 +00:00
Calle Wilund
ea92d7d4fd paging bugfix: Allow reset/removal of "specific ck range"
Refs #792

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass. 

v2: 
* Changed to schema-equality checks so we sort of maintain a 
  sane api and behaviour, even with the 1-entry map
 
v3: 
* Renamed/removed "contains" in specific_ranges, and made the calling
  code use more map-like logic, again to keep things cleaner
2016-01-12 16:30:30 +00:00
Calle Wilund
e50d8b6895 paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set row limit for query to be min of page size/remaining in limit,
but if we have a multinode query we might end up with more rows than asked
for, so must do this again in post-processing.

Message-Id: <1452606935-12899-2-git-send-email-calle@scylladb.com>
2016-01-12 17:23:04 +02:00
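The clamp-twice logic described above reduces to a short sketch (function names are hypothetical): the per-page limit is computed before the query, and applied again after merging per-node results, since a multi-node query can return more rows than any single replica was asked for.

```python
def page_limit(page_size, remaining):
    # A single page may not exceed the page size or what is left of
    # the query's overall row limit.
    return min(page_size, remaining)

def post_process(rows, page_size, remaining):
    # Merged multi-node results can exceed the requested limit, so
    # the limit must be applied again after merging.
    return rows[:page_limit(page_size, remaining)]
```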
Vlad Zolotarov
9232ad927f messaging_service::get_rpc_client(): fix the encryption logic
According to specification
(here https://wiki.apache.org/cassandra/InternodeEncryption)
when the internode encryption is set to `dc` the data passed between
DCs should be encrypted and similarly, when it's set to `rack`
the inter-rack traffic should encrypted.

Currently Scylla would encrypt the traffic inside a local DC in the
first case and inside the local RACK in the later one.

This patch fixes the encryption logic to follow the specification
above.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452501794-23232-1-git-send-email-vladz@cloudius-systems.com>
2016-01-12 16:22:26 +02:00
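The corrected decision can be summarized in a small sketch (function and value names are hypothetical, not Scylla's actual API): with `dc`, only traffic that crosses datacenters is encrypted; with `rack`, traffic that crosses racks (or datacenters) is.

```python
def must_encrypt(level, local, peer):
    """Return True when traffic to `peer` must be encrypted.

    level is one of 'none', 'all', 'dc', 'rack'; local and peer are
    (datacenter, rack) pairs.
    """
    if level == 'all':
        return True
    if level == 'dc':
        # Encrypt only traffic that crosses datacenters.
        return peer[0] != local[0]
    if level == 'rack':
        # Encrypt traffic that crosses racks (or datacenters).
        return peer != local
    return False  # 'none'
```

The pre-fix behavior was the inverse for `dc` and `rack`: encrypting inside the local DC/rack instead of across them.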
Avi Kivity
4693197e37 Merge seastar upstream
* seastar fe7a49c...43e64c2 (1):
  > resource: fix failures on low-memory machines

Fixes #734.
2016-01-12 14:45:43 +02:00
Calle Wilund
5b9f196115 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 11:46:40 +00:00
Avi Kivity
39f81b95d6 main: make --developer-mode relax dma requirements
With Docker we might be running on a filesystem that does not support DMA
(aufs; or tmpfs on boot2docker), so let --developer-mode allow running
on those file systems.
Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>
2016-01-12 13:34:46 +02:00
Avi Kivity
d68026716e Merge seastar upstream
* seastar ad3577b...fe7a49c (2):
  > reactor: workaround tmpfs O_DIRECT vs O_EXCL bug
  > rpc: fix reordering between sending client's negotiation frame and user's data
2016-01-12 13:27:16 +02:00
Takuya ASADA
a1d1d0bd06 Revert "dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update"
This reverts commit b28b8147a0.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-2-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:09 +02:00
Takuya ASADA
5459df1e9e dist: renumber development version as 666.development
The yum command thinks "development-xxxx.xxxx" is older than "0.x", so the nightly package is mistakenly updated to the release version.
To prevent this problem, we should prepend a greater number to "development".
The same applies to the Ubuntu package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-1-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:08 +02:00
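Why "666.development" sorts newer than "0.16" under rpm's rules can be seen from a simplified sketch of rpm's version comparison (this is an approximation of the rpmvercmp algorithm, not its full implementation): versions split into numeric and alphabetic segments, and a numeric segment always sorts as newer than an alphabetic one.

```python
import re

def rpm_vercmp(a, b):
    """Simplified rpm-style comparison: return 1 if a is newer,
    -1 if older, 0 if equal."""
    sa = re.findall(r'\d+|[A-Za-z]+', a)
    sb = re.findall(r'\d+|[A-Za-z]+', b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment beats an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    return (len(sa) > len(sb)) - (len(sa) < len(sb))
```

So "development-..." loses to "0.16" (alphabetic vs numeric), while "666.development" wins (666 > 0), which is exactly the renumbering this commit applies.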
Calle Wilund
fdda880920 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 10:17:22 +00:00
Avi Kivity
5809ed476f Merge "Orderly service startup for systemd"
Use systemd Type=notify to tell systemd about startup progress.

We can now use 'systemctl status scylla-server' to see where we are
in service startup, and 'systemctl start scylla-server' will wait until
either startup is complete, or we fail to start up.
2016-01-12 12:01:32 +02:00
Avi Kivity
3d5f6de683 main: notify systemd of startup progress
Send current startup stage via sd_notify STATUS variable; let it know that
startup is complete via READY=1.

Fixes #760.
2016-01-12 11:58:24 +02:00
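The sd_notify mechanism used here is simple enough to sketch in a few lines (an illustrative re-implementation, assuming Linux; Scylla itself uses the libsystemd C API): the state string is sent as a datagram to the unix socket systemd names in $NOTIFY_SOCKET.

```python
import os
import socket

def sd_notify(state):
    """Minimal sd_notify(3) sketch: send the state string, e.g.
    'STATUS=initializing storage' or 'READY=1', to the unix datagram
    socket named by $NOTIFY_SOCKET."""
    path = os.environ.get('NOTIFY_SOCKET')
    if not path:
        return False  # not started by systemd; nothing to do
    if path.startswith('@'):
        # '@' denotes a Linux abstract-namespace socket.
        path = '\0' + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode(), path)
    return True
```

With Type=notify in the unit file, systemd shows the latest STATUS line in `systemctl status` and treats the service as started only after READY=1 arrives.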
Calle Wilund
1b54b9c2d8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 09:02:05 +00:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix attempts to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk and get lost in the file (pos > next, skip too far
-> eof), and also give false errors about corruption.
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
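The missing guard amounts to a one-line check, sketched here (the header size and function names are assumptions for illustration, not the actual commit log format): if the slack left in the current chunk cannot hold even a minimal entry, skip straight to the next chunk instead of parsing padding as an entry.

```python
ENTRY_HEADER_SIZE = 12  # assumed size of a minimal serialized entry

def next_entry_pos(pos, chunk_end):
    """read_entry guard: jump to the next chunk when the remaining
    slack is too small to hold an entry, rather than misreading the
    slack and getting lost in the file (pos > next -> eof)."""
    if chunk_end - pos < ENTRY_HEADER_SIZE:
        return chunk_end  # the next chunk starts here
    return pos
```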
Tzach Livyatan
c5b332716c Fix AMI prompt from "nodetool --help" to "nodetool help"
Fixes #775

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1452586945-28738-1-git-send-email-tzach@scylladb.com>
2016-01-12 10:27:05 +02:00
Takuya ASADA
fc13b9eb66 dist: yum install epel-release before installing CentOS dependencies
Fixes #779

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452586442-19777-1-git-send-email-syuu@scylladb.com>
2016-01-12 10:24:56 +02:00
Avi Kivity
678bdd5c79 Merge "Change AMI base image to CentOS7, use systemd-coredump for Fedora/CentOS, make AMI rootfs as XFS" from Takuya 2016-01-11 18:43:57 +02:00
Tzach Livyatan
8a4f7e211b Add REST API server ip:port parameters to scylla.yaml
api_port and api_address are already valid configuration options.
Adding them to scylla.yaml lets users know they exist.

Solves issue #704

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
Message-Id: <1452527028-13724-1-git-send-email-tzach@cloudius-systems.com>
2016-01-11 18:00:48 +02:00
Avi Kivity
f917f73616 Merge "Handling of schema changes" from Tomasz
"Our domain objects have schema version dependent format, for efficiency
reasons. The data structures which map between columns and values rely on
column ids, which are consecutive integers. For example, we store cells in a
vector where index into the vector is an implicit column id identifying table
column of the cell. When columns are added or removed the column ids may
shift. So, to access mutations or query results one needs to know the version
of the schema corresponding to it.

In case of query results, the schema version to which it conforms will always
be the version which was used to construct the query request. So there's no
change in the way query result consumers operate to handle schema changes. The
interfaces for querying needed to be extended to accept schema version and do
the conversions if necessary.

Shard-local interfaces work with a full definition of schema version,
represented by the schema type (usually passed as schema_ptr). Schema versions
are identified across shards and nodes with a UUID (table_schema_version
type). We maintain schema version registry (schema_registry) to avoid fetching
definitions we already know about. When we get a request using unknown schema,
we need to fetch the definition from the source, which must know it, to obtain
a shard-local schema_ptr for it.

Because mutation representation is schema version dependent, mutations of
different versions don't necessarily commute. When a column is dropped from
schema, the dropped column is no longer representable in the new schema. It is
generally fine to not hold data for dropped columns, the intent behind
dropping a column is to lose the data in that column. However, when merging an
incoming mutation with an existing mutation both of which have different
schema versions, we'd have to choose which schema should be considered
"latest" in order not to lose data. Schema changes can be made concurrently
in the cluster and initiated on different nodes so there is not always a
single notion of latest schema. However, schema changes are commutative and by
merging changes nodes eventually agree on the version.  For example adding
column A (version X) on one node and adding column B (version Y) on another
eventually results in a schema version with both A and B (version Z). We
cannot tell which version among X and Y is newer, but we can tell that version
Z is newer than both X and Y. So the solution to the problem of merging
conflicting mutations could be to ensure that such merge is performed using
the schema which is superior to schemas of both mutations.

The approach taken in the series for ensuring this is as follows. When a node
receives a mutation of an unknown schema version it first performs a schema
merge with the source of that mutation. Schema merge makes sure that current
node's version is superior to the schema of incoming mutation. Once the
version is synced with, it is remembered as such and won't be synced with on
later mutations. Because of this bookkeeping, schema versions must be
monotonic; we don't want table altering to result in any earlier version
because that would cause nodes to avoid syncing with them. The version is a
cryptographically-secure hash of schema mutations, which should fulfill this
purpose in practice.

TODO: It's possible that the node is already performing a sync triggered by
broadcasted schema mutations. To avoid triggering a second sync needlessly, the
schema merging should mark incoming versions as being synced with.

Each table shard keeps track of its current schema version, which is
considered to be superior to all versions which are going to be applied to it.
All data sources for given column family within a shard have the same notion
of current schema version. Individual entries in cache and memtables may be at
earlier versions but this is hidden behind the interface. The entries are
upgraded to current version lazily on access. Sstables are immutable, so they
don't need to track current version. Like any other data source, they can be
queried with any schema version.

Note, the series triggered a bug in demangler:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68700"
2016-01-11 17:59:14 +02:00
Avi Kivity
3092c1ebb5 Update scylla-ami submodule
* ami/files/scylla-ami 07b7118...eb1fdd4 (2):
  > move log file to /var/lib/scylla
  > move config file to /etc/scylla
2016-01-11 17:58:47 +02:00
Avi Kivity
9182ce1f61 Merge seastar upstream
* seastar d0bf6f8...ad3577b (9):
  > httpd: close connection before deleting it
  > reactor: support for non-O_DIRECT capable filesystems
  > tests: modernize linecount
  > IO queues: destruct within reactor's destructor
  > tests: Use dnsdomainname in mkcert.gmk
  > tests: memcached: workaround a possible race between flush_all and read
  > apps: memcached: reduce the error during the expiration time translation
  > timer: add missing #include
  > core: do not call open_file_dma directly

Fixes #757.
2016-01-11 17:41:39 +02:00
Takuya ASADA
6a457da969 dist: add ignore files for AMI 2016-01-11 14:22:20 +00:00
Takuya ASADA
b28b8147a0 dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update
Since the yum command thinks the development version is older than the release version, we need this.
2016-01-11 14:22:13 +00:00
Takuya ASADA
dd9894a7b6 dist: cleanup build directory before creating rpms for AMI
To prevent AMI build fail, cleanup build directory first.
2016-01-11 14:21:02 +00:00
Takuya ASADA
8886fe7393 dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID
Use systemd-coredump for coredump if distribution is CentOS/RHEL/Fedora, and make symlink from RAID to /var/lib/systemd/coredump if RAID is mounted.
2016-01-11 14:20:50 +00:00
Takuya ASADA
927957d3b9 dist: since AMI uses XFS rootfs, we don't need to warn extra disks not attached to the AMI instance
Even if extra disks are not supplied, it's still valid since we have an XFS rootfs now.
2016-01-11 14:19:35 +00:00
Takuya ASADA
47be3fd866 dist: split scylla_install script to two parts, scylla_install_pkg is for installing .rpm/.deb packages, scylla_setup is for setup environment after package installed
This enables setting up RAID/NTP/NIC after the .rpm/.deb package is installed.
2016-01-11 14:19:29 +00:00
Takuya ASADA
76f0191382 dist: remove scylla_local.json, merge it to scylla.json
We can share one packer config file for both build settings.
2016-01-11 14:18:55 +00:00
Takuya ASADA
8721e27978 dist: fetch CentOS dependencies from our yum repository by default
Only rebuild dependencies when passing -R option to build_rpm.sh
2016-01-11 14:18:49 +00:00
Takuya ASADA
b3c85aea89 dist: switch AMI base image from Fedora to CentOS
Move AMI to CentOS, use XFS for rootfs
2016-01-11 14:18:30 +00:00
Takuya ASADA
202389b2ec dist: no need to yum install and mv scylla-ami before scylla_install
This fixes 'amazon-ebs: mv: cannot stat ‘/home/fedora/scylla-ami’: No such file or directory' in build_ami_local.sh
2016-01-11 14:18:08 +00:00
Takuya ASADA
f3c32645d3 dist: add build time dependency to scylla-libstdc++-static for CentOS
This fixes link error on CentOS

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-11 14:17:52 +00:00
Takuya ASADA
780d9a26b2 configure.py: add --python option to specify python3 command path, for CentOS
Since the python3 path is /usr/bin/python3.4 on CentOS, we need to be able to modify its path.
2016-01-11 14:17:27 +00:00
Takuya ASADA
b0980ef0c4 dist: use scylla-boost instead of boost to fix compile error on CentOS
The boost package is not usable on CentOS; use scylla-boost instead.
2016-01-11 14:17:02 +00:00
Lucas Meneghel Rodrigues
94c3c5c1e9 dist/ami: Print newline at the end of MOTD banner
The MOTD banner now printed upon .bash_profile execution,
if scylla is running, ends with a 'tput sgr0'. That command
appends an extra '[m' at the beginning of the output of any
following command. The automation scripts don't like this.

So let's add an 'echo' at the end of that path to add a newline,
avoiding the condition described above, and another one at the
'ScyllaDB is not started' path, for symmetry. I'm doing this
as it seems easier than having to develop heuristics to know
whether to remove or not that character.

CC: Shlomi Livne <slivne@scylladb.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1452216044-28374-1-git-send-email-lmr@scylladb.com>
2016-01-11 15:40:43 +02:00
Calle Wilund
244cd62edb commit log reader bugfix: Fix attempts to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk and get lost in the file (pos > next, skip too far
-> eof), and also give false errors about corruption.
2016-01-11 13:07:26 +00:00
Vlad Zolotarov
0ed210e117 storage_proxy::query(): intercept exceptions coming from trace()
Exceptions originated by an unimplemented to_string() methods
may interrupt the query() flow if not intercepted. Don't let it
happen.

Fixes issue #768

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-11 12:29:50 +01:00
Tomasz Grabiec
e62857da48 schema_tables: Wait for make_directory_for_column_family() to finish in merge_tables() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
71bbbceced schema_tables: Notify about table creation after it is fully inited
I'm not aware of any issues it could cause, but it makes more sense
that way.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
b6c6ee5360 tests: Add test for statement invalidation 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
036eec295f query_processor: Invalidate statements synchronously
We want the statements to be removed before we ack the schema change,
otherwise it will race with all future operations.

Since the subscriber will be invoked on each shard, there is no need
to broadcast to all shards; we can just handle the current shard.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
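The CASSANDRA-7910 scenario quoted above can be captured in a small sketch (class and method names are hypothetical): prepared statements are cached per table and dropped synchronously when that table's columns change, so a wildcard SELECT is re-prepared and sees the new column.

```python
class PreparedStatementCache:
    """Cache of prepared statements, invalidated per table on
    column changes."""

    def __init__(self):
        self._cache = {}

    def prepare(self, table, query, columns):
        # Cache the column list resolved at prepare time.
        self._cache[(table, query)] = list(columns)

    def lookup(self, table, query):
        return self._cache.get((table, query))

    def on_columns_changed(self, table):
        # Invoked synchronously, before the schema change is acked,
        # so later executions cannot race with a stale statement.
        for key in [k for k in self._cache if k[0] == table]:
            del self._cache[key]
```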
Tomasz Grabiec
facc549510 schema: Introduce equal_columns() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
0ea045b654 tests: Add notification test to schema_change_test 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
d80ffc580f schema_tables: Notify about table schema update 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
40858612e5 db: Make column_family::schema() return const& to avoid copy 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
8817e9613d migration_manager: Simplify notifications
Currently the notify_*() method family broadcasts to all shards, so
schema merging code invokes them only on shard 0, to avoid doubling
notifications. We can simplify this by making the notify_*() methods
per-instance and thus shard-local.
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5d38614f51 tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5689a1b08b tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Paweł Dziepak
21bbc65f3f tests/cql: add tests for ALTER TABLE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
0276919819 cql3: complete translation of alter table statement
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
f24f677dde db/schema_tables: simplify column difference computation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
ae3acd0f9c system_tables: store schema::dropped_columns in system tables
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
b5bee9c36a schema_builder: force column id recomputation in build()
If the schema_builder is constructed from an existing schema we need to
make sure that the original column ids of regular and static columns are
*not* used since they may become invalid if columns are added or
removed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
da0f999123 schema_builder: add with_altered_column_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9807ddd158 schema_builder: add with_column_rename()
Columns that are part of the primary key can be renamed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9bf13ed09b mutation_partition: drop cells from dropped_columns at upgrade
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
[tgrabiec: Merged the changes into converting_mutation_partition_applied]
2016-01-11 10:34:53 +01:00
Paweł Dziepak
3cbfa0e52f schema: add column_definition::_dropped_at
When a column is dropped its name and deletion timestamp are added
to schema::_raw._dropped_columns to prevent data resurrection in case a
column with the same name is added. To reduce the number of lookups in
_dropped_columns this patch makes each instance of column_definition
to caches this information (i.e. timestamp of the latest removal of a
column with the same name).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Paweł Dziepak
42dc4ce715 schema: keep track of dropped columns
Knowing which columns were dropped (and when) is important to prevent
the data from the dropped ones reappearing if a new column is added with
the same name.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a81fa1727b tests: Add schema_change_test 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
d8ff9ee441 schema_tables: Make merge_tables() compare by mutations
Schema version is calculated from mutations, so merge_schema should
also look at mutation changes to detect schema changes whenever
version changes.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
5707c5e7ca schema_tables: Simplify merge_tables() and merge_keyspaces()
read_schema_for_keyspaces() drops empty results so the emptiness
checks are always false and we can remove some redundancy.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
bfefe5a546 schema_tables: Calculate digest from mutations
We want the node's schema version to change whenever
table_schema_version of any table changes. The latter is calculated by
hashing mutations so we should also use mutation hash when calculating
schema digest.
2016-01-11 10:34:53 +01:00
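The digest change can be sketched as follows (hash choice and shapes are illustrative assumptions, not Scylla's actual serialization): the node-wide digest is derived from the serialized schema mutations themselves, so it changes whenever any table's mutation-derived table_schema_version does.

```python
import hashlib

def schema_digest(mutations_per_table):
    """Hash every table's serialized schema mutations, in a
    deterministic table order, into one node-wide digest."""
    h = hashlib.md5()
    for table in sorted(mutations_per_table):
        for mutation in mutations_per_table[table]:
            h.update(mutation)
    return h.hexdigest()
```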
Tomasz Grabiec
b91c92401f migration_manager: Implement migration_manager::announce_column_family_update 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
c6a52bed73 db: Fail when attempting to mutate using not synced schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a2cdbff965 storage_proxy: Log failures of definitions update handler
Fixes #769.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
e1e8858ed1 service: Fetch and sync schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
cdca20775f messaging_service: Introduce get_source() 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
f0d886893d db: Mark new schemas as synced 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
fb5658ede1 schema_registry: Track synced state of schema
We need to track which schema versions were synced on the current
node to avoid triggering the sync on every mutation. We need to sync
before mutating to be able to apply the incoming mutation using the
current node's schema, possibly applying irreversible transformations
to it to make it conform.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
311e3733e0 service: migration_task: Implement using migration_manager::merge_schema_from()
To avoid duplication.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
dee0bbf3f3 migration_manager: Introduce merge_schema_from() 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
be2bdb779a tests: Introduce canonical_mutation_test 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a63971ee4c tests: memtable_test: Add test for concurrent reading and schema changes 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for a given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they
are always upgraded to the current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
da3a453003 service: Add GET_SCHEMA_VERSION remote call
The verb belongs to a separate client to avoid potential deadlocks
should throttling at the connection level be introduced in the
future. Another reason is to reduce latency for version requests, as
such a request can potentially block many others.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a9c00cbc11 batchlog_manager: Use requested schema version 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
9eef4d1651 db: Learn schema versions when adding tables 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
175be4c2aa cql_query_test: Disable test_user_type 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f9ae1ed1c6 frozen_mutation: Add schema_version field 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8c6480fc46 Introduce global_schema_ptr 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f25487bc1e Introduce schema_registry 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
533aec84b3 schema: Enable shared_from_this() 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8a05b61d68 memtable: Read under _read_section 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
0a9436fc1a schema: Introduce frozen_schema
For passing schema across shards/nodes. Also, for keeping in
schema_registry when there's no live schema_ptr.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
060f93477b Make schema_mutations serializable
We must use canonical_mutation form to allow for changes in the schema
of schema tables. The node which deserializes schema mutations may not
have the same version of the schema tables so we cannot use
frozen_mutation, which is a schema dependent form.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
e84f3717b5 Introduce canonical_mutation
frozen_schema will transfer schema definition across nodes with schema
mutations. Because different nodes may have different versions of
schema tables, we cannot use frozen_mutations to transfer these
because frozen_mutation can only be read using the same version of the
schema it was frozen with. To solve this problem, a new form of
mutation is introduced, called canonical_mutation, which can be read
using any version of the schema.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
3e447e4ad1 tests: mutation_test: Add tests for equality and hashing 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
48f1db5ffa mutation_assertions: Add is_not_equal_to() 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
88a6a17f72 tests: Use mutation generators in frozen_mutation_test 2016-01-11 10:34:50 +01:00
Avi Kivity
6185744312 dist: redhat: drop 'sudo' in scylla_run
Systemd will change the user for us, and the extra process created by
'sudo' confuses sd_notify().
2016-01-10 18:46:43 +02:00
Avi Kivity
dd271b77b0 build: add support for optional pkg-config managed packages 2016-01-10 18:24:12 +02:00
Vlad Zolotarov
19e275be1f tests: gossip_test: initialize a broadcast address and a snitch
This patch fixes a regression introduced by
a commit ca935bf "tests: Fix gossip_test".

database service initializes a replication_strategy
object and a replication_strategy requires a snitch
service to be initialized.

A snitch service requires a broadcast address to be
set.

If any of the above is not initialized we are going
to hit the corresponding assert().

Set a snitch to a SimpleSnitch and a broadcast
address to 127.0.0.1.

Fixes issue #770

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452421748-9605-1-git-send-email-vladz@cloudius-systems.com>
2016-01-10 13:13:37 +02:00
Tomasz Grabiec
d7b403db1f log: Change default level from warn to info
Logging at the 'warn' level leaves the logs too quiet, not as helpful
as they could be in case of failure.

Message-Id: <1452283669-11675-1-git-send-email-tgrabiec@scylladb.com>
2016-01-09 09:24:22 +02:00
Tomasz Grabiec
9b2cc557c5 mutation_source_test: Add mutation generators
The goal is to provide various test cases with a way of iterating
over many combinations of mutations. Having this in one place avoids
duplication and increases coverage.
2016-01-08 21:10:27 +01:00
Tomasz Grabiec
4b92ef01fc test: Add tests for mutation upgrade 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f59ec59abc mutation: Implement upgrade()
Converts mutation to a new schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
0edfe138f8 mutation_partition_view: Make visitable also with column_mapping 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
2cfdfe261d Introduce converting_mutation_partition_applier 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
b17cbc23ab schema: Introduce column_mapping
Encapsulates information needed to convert mutation representations
between schema versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
9a3db10b85 db/serializer: Implement skip() for bytes and sstring 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13974234a4 db/serializer: Spread serializers to relax header dependencies 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d13c6d7008 types: Introduce is_atomic()
Matches column_definition::is_atomic()
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f3556ebfc2 schema: Introduce column_count_type
Right now in some places we use column_id, and in some places
size_t. Solve it by using column_count_type whose meaning is "an
integer sufficiently large for indexing columns". Note that we cannot
use column_id because it has more meaning to it than that.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version needs to change not only on structural changes but also
on temporal ones. This is needed for nodes to detect whether the
version they see was already synchronized with, even if it has the
same structure as past versions. We also need to end up with the same
version on all nodes regardless of the order in which schema changes
are applied.

For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13295563e0 schema_builder: Move compact_storage setting outside build()
Properties of the schema are set using methods of schema_builder and
different variants of build() are for different forms of the final
schema object.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies add_table_to_schema_mutation() interface.

The current code is also a bit confusing: partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works, since the types are the same, but
looks a bit scary.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
22254e94cc query::result_set: Add constructor from mutation 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a861b74b7e Introduce schema_mutations 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a6084ee007 mutation: Make hashable
The computed hash is independent of any internal representation thus
can be used as a digest across nodes and versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
c009fe5991 keys: Add missing clustering_key_prefix_view::get_compound_type() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
ade5cf1b4b mutation_partition: Make visitable with mutation_partition_visitor 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
bc9ee083dd db: Move atomic_cell_or_collection to separate header
To break future cyclic dependency:

  atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
6f955e1290 mutation_partition: Make equal() work with different schemas 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
75caba5b8a schema: Guarantee that column id order matches name order
For static and regular (row) columns it is very convenient in some
cases to utilize the fact that columns ordered by ids are also ordered
by name. This currently holds, so make schema export the guarantee and
enable consumers to rely on it.

The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and be easily
discoverable in case we would have to relax this.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
14d0482efa Introduce md5_hasher 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
ff3a2e1239 mutation_partition: Drop row tombstones in do_compact() 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb9b383531 service: migration_manager: Fix announce order to match C*
Current logic differs from C*: we first push to other nodes and then
initiate the sync locally, while C* does the opposite.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
0768deba74 query_processor: Add trace-level logging of processed statements 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
dae531554a create_index_statement: Use textual column name in all messages
As pointed out by Pawel, we can rely on operator<<()
Message-Id: <1452243656-3376-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 11:06:09 +02:00
Tomasz Grabiec
5d6d039297 create_index_statement: Use textual representation of column name
Before:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column 736368656d615f76657273696f6e"

After:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column schema_version"
Message-Id: <1452243156-2923-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 10:53:37 +02:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Avi Kivity
3377739fa3 main: wait for API http server to start
Wait for the future returned by the http server start process to resolve,
so we know it is started.  If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
2016-01-07 16:44:07 +02:00
Avi Kivity
fbe3283816 snitch: intentionally leak snitch singleton
Because our shutdown process is crippled (refs #293), we won't shut
down the snitch correctly, and the sharded<> instance can assert during
shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.

Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
2016-01-07 16:43:37 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs one extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Vlad Zolotarov
07f8549683 database: filter out manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add the ability to pass a filter functor that receives the full path
of a directory entry and returns a boolean value: TRUE if the entry
should be enumerated and FALSE if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait until the CQL connection to node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will do the bootstrap process since its data,
including the system tables, is wiped. It will think it is a
completely new node and can possibly stream from the wrong node and
violate consistency.

To fix, we reject the boot if we find the node was in SHUTDOWN or
STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
Asias He
933614bdf9 main: Change API server starting message
It comes from the Seastar HTTP server and is inaccurate.

Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>
2016-01-07 15:53:28 +02:00
Asias He
6439f4d808 storage_service: Fix load_broadcaster in get_load_map
If get_load_map is called from the api while load_broadcaster is not
set yet, we dereference nullptr.

Fixes #763.
Message-Id: <6f8d554f4976aea85d5cec5a76a3848234138b0a.1452152148.git.asias@scylladb.com>
2016-01-07 10:36:36 +02:00
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is
confusing, since shard_id is used in the context of a cpu id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait until the CQL connection to node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, sometimes in node2's shadow round it gets node2's status
as BOOT from other nodes in the cluster instead of NORMAL. The problem
is that we do not wait for gossip to settle before we start the CQL
server; as a result, when we stop node2 in step 3), other nodes in the
cluster have not yet received node2's status update to NORMAL.
2016-01-07 10:09:25 +02:00
Benoît Canet
8f725256e1 config: Mark ssl_storage_port as Used
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452082041-6117-1-git-send-email-benoit@scylladb.com>
2016-01-06 17:34:53 +02:00
Benoît Canet
e80c8b6130 config: Mark previously unused SSL client/server options as used
The previous SSL enablement patches do make use of these options,
but they were still marked as Unused. Change this and also update the
db/config.hh documentation accordingly.

Syntax is now:

client_encryption_options:
   enabled: true
   certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
   keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)

Fixes: #756.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452032073-6933-1-git-send-email-benoit@scylladb.com>
2016-01-06 10:32:53 +02:00
Tomasz Grabiec
9d71e4a7eb Merge branch 'fix_to_issue_676_v4' from git@github.com:raphaelsc/scylla.git
Compaction fixes from Raphael:

There were two problems causing issue 676:
1) max_purgeable was being miscalculated (fixed by b7d36af).
2) empty row not being removed by mutation_partition::do_compact
Testcase is added to make sure that a tombstone will be purged under
certain conditions.
2016-01-05 15:19:22 +01:00
Raphael S. Carvalho
a81b660c0d tests: check that tombstone is purged under certain conditions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
Raphael S. Carvalho
03eee06784 remove empty rows in mutation_partition::do_compact
do_compact() wasn't removing an empty row that is covered by a
tombstone. As a result, an empty partition could be written to an
sstable. To solve this problem, let's make trim_rows remove a
row that is considered to be empty. A row is empty if it has no
tombstone, no marker and no cells.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
Pekka Enberg
800ed6376a Merge "Repair overhaul" from Nadav
"This is another version of the repair overhaul, to avoid streaming *all* the
 data between nodes by sending checksums of token ranges and only streaming
 ranges which contain differing data."
2016-01-05 16:05:44 +02:00
Pekka Enberg
f4bdec4d09 Merge "Support for deleting all snapshots" from Vlad
"Add support for deleting all snapshots of all keyspaces."

Fixes #639.
2016-01-05 15:42:44 +02:00
Nadav Har'El
f90e1c1548 repair: support "hosts" and "dataCenters" parameters
Support the "hosts" and "dataCenters" parameters of repair. The first
specifies the known good hosts to repair this host from (plus this host),
and the second asks to restrict the repair to the local data center (you
must issue the repair to a node in the data center you want to repair -
issuing the command to a data center other than the named one returns
an error).

For example these options are used by nodetool commands like:
nodetool repair -hosts 127.0.0.1,127.0.0.2 keyspace
nodetool repair -dc datacenter1

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
ac4e86d861 repair: use repair_checksum_range
The existing repair code always streamed the entire content of the
database. In this overhaul, we send "repair_checksum_range" messages to
the other nodes to verify whether they have exactly the same data as
this node, and if they do, we avoid streaming the identical data.

We make an attempt to split the token ranges up to contain an estimated
100 keys each, and send these ranges' checksums. Future versions of
this code will need to improve this estimation (and make this "100" a
parameter).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
9e65ecf983 repair: convenience function for syncing a range
This patch adds a function sync_range() for synchronizing all partitions
in a given token range between a set of replicas (this node and a list of
neighbors).
Repair will call this function once it has decided that the data the
replicas hold in this range is not identical.

The implementation streams all the data in the given range, from each of
the neighbors to this node - so now this node contains the most up-to-date
data. It then streams the resulting data back to all the neighbors.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
f5b2135a80 repair: repair_checksum_range message
This patch adds a new type of message, "REPAIR_CHECKSUM_RANGE" to scylla's
"messaging_service" RPC mechanism, for the use of repair:

With this message the repair's master host tells a slave host to calculate
the checksum of a column-family's partitions in a given token range, and
return that checksum.

The implementation of this message uses the checksum_range() function
defined in the previous patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
e9d266a189 repair: checksum of partitions in range
This patch adds functions for calculating the checksum of all the
partitions in a given token range in the given column-family - either
in the current shard, or across all shards in this node.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
0591fa7089 repair: partition-set checksum
This patch adds a mechanism for calculating a checksum for a set of
partitions. The repair process will use these checksums to compare the
data held by different replicas.

We use a strong checksum (SHA-256) for each individual partition in the set,
and then a simple XOR of those checksums to produce a checksum for the
entire set. XOR is good enough for merging strong checksums, and allows us
to independently calculate the checksums of different subsets of the
original sets - e.g., each shard can calculate its own checksum and we
can XOR the resulting checksums to get the final checksum.

Apache Cassandra uses a very similar checksum scheme, also using SHA-256
and XOR. One small difference in the implementation is that we include the
partition key in its checksum, while Cassandra doesn't, which I believe to
have no real justification (although it is very unlikely to cause problems
in practice). See further discussion on this in
https://issues.apache.org/jira/browse/CASSANDRA-10728.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
faa87b31a8 fix to_partition_range() inclusiveness
A cut-and-paste accident in query::to_partition_range caused the wrong
end's inclusiveness to be tested.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Shlomi Livne
846bf9644e dist/redhat: Increase scylla-server service start timeout to 15 min
Fixes #749

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-05 15:30:22 +02:00
Pekka Enberg
67ccd05bbe api/storage_service: Wire up 'compaction_throughput_mb_per_sec'
The API is needed by nodetool compactionstats command.
2016-01-05 13:01:05 +02:00
Pekka Enberg
5db82aa815 Merge "Fix frozen collections" from Paweł
"This series prevents frozen collections from appearing in the schema
comparator.

Fixes #579."
2016-01-05 12:43:06 +02:00
Paweł Dziepak
284162c41b test/cql3: add test for frozen collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 11:13:53 +01:00
Paweł Dziepak
a5a744655e schema: do not add frozen collections to compound name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 10:49:32 +01:00
Paweł Dziepak
ed7d9d4996 schema: change has_collections() to has_multi_column_collections()
All users of schema::has_collections() aren't really interested in
frozen ones.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 10:46:42 +01:00
Tomasz Grabiec
fecb1c92e7 Merge branch 'pdziepak/prepare-for-alter-table/v1' from seastar-dev.git 2016-01-05 10:30:46 +01:00
Paweł Dziepak
70f5ed6c64 cql3: enable ALTER TABLE in cql3 grammar definition
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
3ca4e27dba cql3: convert alter_table_statement to c++
Everything except alter_table_statement::announce_migration() is
translated. announce_migration() has to wait for multi schema support to
be merged.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
35edda76c4 cql3: import AlterTableStatement.java
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
b615bfa47e cql3: add cf_prop_defs::get_default_time_to_live()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
18747b21f2 map_difference: accept std::unordered_map
map_difference implementation doesn't need elements in the container to
be sorted.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Tomasz Grabiec
71d6e73ae3 map_difference: Define default value for key_comp 2016-01-05 09:49:04 +01:00
Paweł Dziepak
3693e77eec schema: add is_cql3_table()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
4bd031d885 schema: add column_definition::is_part_of_cell_name()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Paweł Dziepak
f39f21ce02 schema: add column_definition::is_indexed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-05 09:49:04 +01:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Avi Kivity
e9400dfa96 Revert "sstable: Initialize super class of malformed_sstable_exception"
This reverts commit d69dc32c92d63057edf9f84aa57ca53b2a6e37e4; it does nothing
and does not address issue #669.
2016-01-05 10:21:00 +02:00
Benoît Canet
d69dc32c92 sstable: Initialize super class of malformed_sstable_exception
This exception was not caught properly as a std::exception
by report_failed_future call to report_exception because the
superclass std::exception was not initialized.

Fixes #669.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
2016-01-05 09:54:36 +02:00
Lucas Meneghel Rodrigues
8ef9a60c09 dist/common/scripts/scylla_prepare: Change message to error
Recently, Scylla was changed to mandate the use of XFS
for its data directories, unless the flag --developer-mode true
is provided. So during the AMI setup stage, if the user
did not provide extra disks for the setup scripts to prepare,
the scylla service will refuse to start. Therefore, the message in
scylla_prepare has to be changed to an actual error message, and the
file name changed to something that reflects the event that happened.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2016-01-05 09:23:57 +02:00
Tomasz Grabiec
cc6c35d45c Move seastar submodule head
Avi Kivity (1):
      Merge "rpc negotiation fixes" from Gleb

Gleb Natapov (3):
      rpc: fix peer address printing during logging
      rpc: make server send negotiation frame back before closing connection on error
      rpc: fix documentation for negotiation procedure.

Nadav Har'El (1):
      fix operator<<() for std::vector<T>

Tomasz Grabiec (1):
      core/byteorder: Add missing include
2016-01-04 15:07:21 +01:00
Shlomi Livne
f26a75f48b Fixing missing items in move from scylla-ami.sh to scylla_install
scylla-ami.sh moved some AMI-specific files. These parts were dropped
when converging scylla-ami into scylla_install. Fixing that.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:14 +02:00
Shlomi Livne
cec6e6bc20 Invoke scylla_bootparam_setup with/without ami flag
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:08 +02:00
Shlomi Livne
aebeb95342 Fix error: no integer expression expected in AMI creation
The script imports /etc/sysconfig/scylla-server for configuration
settings (NR_PAGES). The /etc/sysconfig/scylla-server includes an AMI
param which is of string value and is called as a last step in
scylla_install (after scylla_bootparam_setup has been initiated).

The AMI variable is setup in scylla_install and is used in multiple
scripts. To resolve the conflict moving the import of
/etc/sysconfig/scylla-server after the AMI variable has been compared.

Fixes: #744

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 15:23:01 +02:00
Pekka Enberg
24809de44f dist/docker: Switch to CentOS 7 as base image
Switch to CentOS 7 as the Docker base image. It's more stable and
updated less frequently than Fedora. As a bonus, its Thrift package
doesn't pull in the world as a dependency, which reduces the image size
from 700 MB to 380 MB.

Suggested by Avi.
Message-Id: <1451911969-26647-1-git-send-email-penberg@scylladb.com>
2016-01-04 14:53:53 +02:00
Tomasz Grabiec
5a9d45935a Merge tag 'asias/fix_cql_query_test/v1' from seastar-dev.git
Fixes for cql_query_test and gossip_test from Asias.
2016-01-04 12:28:49 +01:00
Avi Kivity
c559008915 transport: protect against excessive memory consumption
If requests are delayed downstream from the cql server, and the client is
able to generate unrelated requests without limit, then the transient memory
consumed by the requests will overflow the shard's capacity.

Fix by adding a semaphore to cap the amount of transient memory occupied by
requests.

Fixes #674.
2016-01-04 12:11:00 +01:00
Avi Kivity
78429ad818 types: implement collection compatibility checks
compatible: can be cast, keeps sort order
value-compatible: can be cast, may change sort order

frozen: values participate in sort order
unfrozen: only sort keys participate in sort order

Fixes #740.
2016-01-04 11:02:21 +01:00
Avi Kivity
33fd044609 Merge "Event notification exception safety" from Pekka
"Fix both migration manager and storage service to catch and log
exceptions for listeners to ensure all listeners are notified.

Spotted by Avi."
2016-01-04 11:03:01 +02:00
Pekka Enberg
f646241f1c service/storage_service: Make event notification exception safe
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.

Spotted by Avi.
2016-01-04 10:40:02 +02:00
Pekka Enberg
1e29b07e40 service/migration_manager: Make event notification exception safe
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.

Spotted by Avi.
2016-01-04 10:39:49 +02:00
Asias He
7c9f9f068f cql_server: Do not ignore future in stop
Now that connection.shutdown() returns a future, we cannot ignore it.
Also, add info about the shutdown process for debugging.

Message-Id: <b2d100bf9c817d7a230c6cd720944ba4fae416e2.1451894645.git.asias@scylladb.com>
2016-01-04 10:17:44 +02:00
Amnon Heiman
6942b41693 API: rename the map of string, double to map_string_double
This replaces the confusing name with a more meaningful one.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1451466952-405-1-git-send-email-amnon@scylladb.com>
2016-01-03 19:10:49 +02:00
Avi Kivity
0d93aa8797 Merge "revert redundant use of gate in messaging_service" from Gleb
"Messaging service inherits from seastar::async_sharded_service, which
guarantees that sharded<>::stop() will not complete until all references
to the service have gone away. It was done specifically to avoid using
the more verbose gate interface in sharded<> services, since it turned out
that almost all of them need one eventually. Unfortunately, patches that
add a redundant gate to messaging_service sneaked past my review. The series
reverts them."
2016-01-03 16:48:40 +02:00
Gleb Natapov
fae98f5d67 Revert "messaging_service: wait for outstanding requests"
This reverts commit 9661d8936b.
Message-Id: <1450690729-22551-3-git-send-email-gleb@scylladb.com>
2016-01-03 16:06:39 +02:00
Gleb Natapov
de0771f1d1 Revert "messaging_service: restore indentation"
This reverts commit dcbba2303e.
Message-Id: <1450690729-22551-2-git-send-email-gleb@scylladb.com>
2016-01-03 16:06:38 +02:00
Vlad Zolotarov
7bb2b2408b database::clear_snapshot(): added support for deleting all snapshots
When 'nodetool clearsnapshot' is given no parameters it should
remove all existing snapshots.

Fixes issue #639

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:25 +02:00
Vlad Zolotarov
d5920705b8 service::storage_service: move clear_snapshot() code to 'database' class
service::storage_service::clear_snapshot() was built around _db.local()
calls, so it makes more sense to move its code into the 'database' class
instead of calling _db.local().bla_bla() all the time.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:17 +02:00
Vlad Zolotarov
d0cbcce4a2 service::storage_service: remove unused variable ('deleted_keyspaces')
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 13:30:46 +02:00
Takuya ASADA
6e8dd00535 dist: apply limits settings correctly on Ubuntu
Fixes #738

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-02 12:20:18 +02:00
Asias He
d6e352706a storage_service: Drop duplicated print
We have done that in the logger.
2016-01-01 10:15:17 +08:00
Asias He
4952042fbf tests: Fix cql_test_env.cc
Service initialization in cql_test_env is currently a total mess. Start
the services in the same order as in main.cc.

Fixes #715, #716

'./test.py --mode release' passes.
2016-01-01 10:15:17 +08:00
Asias He
ca935bf602 tests: Fix gossip_test
Gossip depends on get_ring_delay from storage_service, and storage_service
depends on database. Start them.
2016-01-01 10:15:17 +08:00
Benoît Canet
de508ba6cc README: Add missing build dependencies
It's just a waste of time to find them manually
when compiling ScyllaDB on a fresh install: add them.

Also fix the ninja-build name.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
2015-12-31 13:34:48 +02:00
Shlomi Livne
6316fedc0a Make sure the directory we are writing coredumps to exists
After upgrading an AMI and trying to stop and start a machine,
/var/lib/scylla/coredump is not created. Create the directory if it does
not exist prior to generating a core.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-12-31 13:21:39 +02:00
Pekka Enberg
32d22a1544 cql3: Fix relation grammar rule
There are two untranslated grammar subrules in the 'relation' rule. We
have all the necessary AST classes translated, so translate the
remaining subrules.

Refs #534.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-31 11:44:15 +01:00
Tomasz Grabiec
e23adb71cf Merge tag 'asias/streaming/cleanup_verb/v2' from seastar-dev.git
messaging_service cleanups and improvements from Asias
2015-12-31 11:25:09 +01:00
Asias He
1b3d2dee8f streaming: Drop src_cpu_id parameter
Now that we can get the src_cpu_id from rpc::client_info, there is
no need to pass it as a verb parameter.
2015-12-31 11:25:09 +01:00
Asias He
3ae21e06b5 messaging_service: Add src_cpu_id to CLIENT_ID verb
It is useful for figuring out which shard on the sender to send
messages back to.
2015-12-31 11:25:09 +01:00
Asias He
22d0525bc0 streaming: Get rid of the _from_ parameter
Get this from cinfo.retrieve_auxiliary inside the rpc handler.
2015-12-31 11:25:08 +01:00
Asias He
89b79d44de streaming: Get rid of the _connecting_ parameter
messaging_service automatically uses the private IP address to connect to a
peer node if possible. There is no need for an upper layer like
streaming to worry about it. Dropping it simplifies things a bit.
2015-12-31 11:25:08 +01:00
Amnon Heiman
8d31c27f7b api: stream_state should include the stream_info
The stream info was created but was left out of the stream state. This
patch adds the created stream_info to the stream state vector.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-31 12:01:03 +02:00
Gleb Natapov
2bcfe02ee6 messaging: remove unused verbs 2015-12-30 15:06:35 +01:00
Gleb Natapov
f0e8b8805c messaging: constify some handlers 2015-12-30 15:06:35 +01:00
Avi Kivity
a318b02335 Merge seastar upstream
* seastar 8b2171e...de112e2 (4):
  > build: disable -fsanitize=vptr harder
  > Merge "Make RPC more robust against protocol changes" from Gleb
  > new when_all implementation
  > apps: memcached: Don't fiddle with the expiration time value in a usable range
2015-12-30 14:56:09 +02:00
Pekka Enberg
500f5b3f27 dist: Increase NOFILE rlimit to 200k
Commit 2ba4910 ("main: verify that the NOFILE rlimit is sufficient")
added a recommendation to set NOFILE rlimit to 200k. Update our release
binaries to do the same.
2015-12-30 11:18:23 +01:00
Avi Kivity
2ba4910385 main: verify that the NOFILE rlimit is sufficient
Require 10k files, recommend 200k.

Allow bypassing via --developer-mode.

Fixes #692.
2015-12-30 11:02:08 +02:00
Avi Kivity
c26689f325 init: bail out if running not on an XFS filesystem
Allow an override via '--developer-mode true', and use it in
the docker setup, since that cannot be expected to use XFS.

Fixes #658.
2015-12-30 10:56:21 +02:00
Pekka Enberg
0aa105c9cf Merge "load report a negative value" from Amnon
"This series solves an issue with the load broadcaster reporting negative
 values due to an integer wraparound. While fixing this issue, an additional
 change was made so that the load_map returns doubles and not formatted
 strings. This is a better API: safer and better documented."
2015-12-30 10:21:55 +02:00
Nadav Har'El
f0b27671a2 murmur3 partitioner: remove outdated comment, and code
Since commit 16596385ee, long_token() is already checking
t.is_minimum(), so the comment which explains why it does not (for
performance) is no longer relevant. And we no longer need to check
t._kind before calling long_token (the check we do here is the same
as is_minimum).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 10:01:29 +02:00
Nadav Har'El
de5a3e5c5a repair: check columnFamilies list
Check the list of column families passed as an option to repair, to
provide the user with a more meaningful exception when a non-existent
column family is passed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:59:54 +02:00
Nadav Har'El
3ae29216c8 repair: add missing ampersand
This was a plain bug - ranges_opt is supposed to parse the option into
the vector "var", but took the vector by value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:46:13 +02:00
Nadav Har'El
a0a649c1be repair: support "columnFamilies" parameter
Support the "columnFamilies" parameter of repair, which allows repairing
only some of the column families of a keyspace instead of all of them,
for example with a command like "nodetool repair keyspace cf1 cf2".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:45:28 +02:00
Lucas Meneghel Rodrigues
43d39d8b03 scylla_coredump_setup: Don't call yum on scylla server spec file
The script scylla_coredump_setup was introduced in
9b4d0592, and added to the scylla rpm spec file, as a
post script. However, calling yum while another
yum instance is installing scylla-server will cause a deadlock,
since yum waits for the yum lock to be released, and the
original yum process waits for the script to end.

So let's remove this from the script. Debian shouldn't be
affected, since it was never added to the debian build
rules (to the best of my knowledge, after analyzing 9b4d0592),
hence I did not change it there. It would cause the same problem
with apt-get if it were used.

CC: Takuya ASADA <syuu@scylladb.com>
[ penberg: Rebase and drop statement about 'abrt' package not in Fedora. ]
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-30 09:38:36 +02:00
Nadav Har'El
ebebaa525d repair: fix missing default values
A default value was not set for the "incremental" and "parallelism"
repair parameters, so Scylla can wrongly decide that they have an
unsupported value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 15:39:47 +02:00
Amnon Heiman
ec379649ea API: repair to use documented params
The repair API used to have an undocumented parameter list similar to
origin's.

This patch changes the way repair gets its parameters.
Instead of one undocumented string, it now lists all the different
optional parameters in the swagger file and accepts them explicitly.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 15:38:44 +02:00
Amnon Heiman
f0d68e4161 main: start the http server in the first step
This change sets the http server to start as the first step in the boot
order.

It is helpful if some other step takes a long time or gets stuck.

Fixes #725

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 14:20:57 +02:00
Avi Kivity
c8b09a69a9 lsa: disable constant_time_size in binomial_heap implementation
Corrupts heap on boost < 1.60, and not needed.

Fixes #698.
2015-12-29 12:59:00 +01:00
Vlad Zolotarov
756de38a9d database: actually check that a snapshot directory exists
Actually check that a snapshot directory with a given tag
exists instead of just checking that a 'snapshot' directory
exists.

Fixes issue #689

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-29 12:59:00 +01:00
Amnon Heiman
71905081b1 API: report the load map as an unformatted double
In origin, storage_service reports the load map as a formatted string.
For an API, a better option is to report the load map as a double and let
the JMX proxy do the formatting.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
Amnon Heiman
06e1facc34 load_broadcaster reports a negative size
map_reduce0 converts the result value to the type of the init value. In
load_broadcaster, the init value 0 is of type int.
This results in an int wraparound and negative values.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
Avi Kivity
41bd266ddd db: provide more information on "Unrecognized error" while loading sstables
This information can be used to understand the root cause of the failure.

Refs #692.
2015-12-29 10:23:32 +02:00
Nadav Har'El
7247f055df repair: partial support for some options
Add partial support for the "incremental" option (only support the
"false" setting, i.e., not incremental repair) and the "parallelism"
option (the choice of sequential or parallel repair is ignored - we
always use our own technique).

This is needed because scylla-jmx passes these options by default
(e.g., "incremental=false" is passed to say this is *not* incremental
repair, and we just need to allow this and ignore it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:38:09 +02:00
Nadav Har'El
3cfa39e1f0 repair: log repair options
When throwing an "unsupported repair options" exception to the caller
(such as "nodetool repair"), also list which options were not recognized.
Additionally, list the options when logging the repair operation.

This patch includes an operator<< implementation for pretty-printing an
std::unordered_map. We may want to move it later to a more central
location - even Seastar (like we have a pretty-printer for std::vector
in core/sstring.hh).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:37:30 +02:00
Raphael S. Carvalho
b7d36af26f compaction: fix max_purgeable calculation
max_purgeable was being incorrectly calculated because the code
that creates the vector of uncompacted sstables was wrong.
This value is used to determine whether or not a tombstone can
be purged.
Operator < should be used instead in the callback passed
as the third parameter to boost::set_difference.
This fix is a step towards closing issue #676.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-29 09:30:08 +02:00
Takuya ASADA
46767fcacf dist: fix .rpm build error (File not found: scylla_extlinux_setup)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-29 09:26:58 +02:00
Pekka Enberg
ca1f9f1c9a main: Fix implicitly disabled client encryption options
The start_native_transport() function in storage_service expects the
'enabled' option to be defined. If the option is not defined, it means
that encryption is implicitly disabled.

Fixes #718.
2015-12-28 16:24:49 +02:00
Pekka Enberg
a76b3a009b Merge "use steady_clock where monotonic clock is required" from Vlad
"The first patch in this series fixes the issue #638 in scylla.
 The second one fixes the tests to use the appropriate clock."
2015-12-28 13:35:50 +02:00
Avi Kivity
561bb79d22 Merge "CQL server SSL" from Calle
"* Update scylla.conf section
 * Add SSL capability to cql server
 * Use conf and initiate optional SSL cql server in
   main/storage_service"
2015-12-28 12:55:25 +02:00
Avi Kivity
72cb8d4461 Merge "Messaging service TLS" from Calle
"Adds support for TLS/SSL encrypted (and cert verified)
connections for message service

* Modify config option to match "native" style certificate management
* Add SSL options to messaging service and generate SSL server/client
  endpoints when required
* Add config option handling to init/main"
2015-12-28 12:54:28 +02:00
Calle Wilund
fae3bb7a24 storage_service: Set up CQL server as SSL if specified
* Massage user options in main
* Use them in storage_service, and if needed, load certificates etc
  and pass to transport/cql server.

Conflicts:
	service/storage_service.cc
2015-12-28 10:13:48 +00:00
Calle Wilund
51d3990261 cql_server: Allow using SSL socket
Optional credentials argument determine if SSL or normal
server socket is created.

Note: This does not follow the pattern of "socket as argument", simply
because this is a distributed object, so only trivial or immutable
objects should be passed to it.
2015-12-28 10:13:48 +00:00
Calle Wilund
d8b2581a07 scylla.conf: Update client_encryption_options with scylla syntax
Using certificate+key directly
2015-12-28 10:13:48 +00:00
Calle Wilund
5f003f9284 scylla.conf: Modify server_encryption_options section
Describe scylla version of option.

Note, for test usage, the below should be workable:

server_encryption_options:
    internode_encryption: all
    certificate: seastar/tests/test.crt
    truststore: seastar/tests/catest.pem
    keyfile: seastar/tests/test.key

Since the seastar test suite contains a snakeoil cert + trust
combo
2015-12-28 10:10:35 +00:00
Calle Wilund
70f293d82e main/init: Use server_encryption_options
* Reads server_encryption_options
* Interpret the above, and load and initialize credentials
  and use with messaging service init if required
2015-12-28 10:10:35 +00:00
Calle Wilund
d1badfa108 messaging_service: Optionally create SSL endpoints
* Accept port + credentials + option for what to encrypt
* If set, enable a SSL listener at ssl_port
* Check outgoing connections by IP to determine if
  they should go to SSL/normal endpoint

Requires seastar RPC patch

Note: currently, the connections created by messaging service
does _not_ do certificate name verification. While DNS lookup
is probably not that expensive here, I am not 100% sure it is
the desired behaviour.
Normal trust is however verified.
2015-12-28 10:10:35 +00:00
Calle Wilund
1a9fb4ed7f config: Modify/use server_encryption_options
* Mark option used
* Make sub-options adapted to seastar-tls useable values (i.e. x509)

Syntax is now:

server_encryption_options:
	internode_encryption: <none, all, dc, rack>
	certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
	keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)
	truststore: <path-to-PEM-trust-store-file> (default empty,
                                                    use system trust)
2015-12-28 10:10:35 +00:00
Calle Wilund
b7baa4d1f5 config: clean up some style + move method to cc file 2015-12-28 10:10:35 +00:00
Takuya ASADA
fc29a341d2 dist: show usage and scylla-server status when login to AMI instance
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 11:40:34 +02:00
Avi Kivity
827a4d0010 Merge "streaming: Invalidate cache upon receiving of stream" from Asias
"When a node gains or regains responsibility for certain token ranges, streaming
is performed; upon receipt of the stream data, the row cache
is invalidated for that range.

Refs #484."
2015-12-28 10:24:46 +02:00
Amnon Heiman
2c79fe1488 storage_service: describe_ring return full data
The describe_ring method in storage_service did not report the start and
end tokens.

Also, for rpc addresses that are not the local address, it returned the
value representation (including the version) and not just the address.

Fixes #695

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-28 09:56:12 +02:00
Takuya ASADA
0abcf5b3f3 dist: use readable time format on coredump file, instead of unix time
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:55:05 +02:00
Takuya ASADA
940c34b896 dist: don't abort scylla_coredump_setup when 'yum remove abrt' failed
It always fails when abrt is not installed.
This also fixes build_ami.sh failing because of this error.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:40:57 +02:00
Vlad Zolotarov
0f8090d6c7 tests: use steady_clock where monotonic clock is required
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore may not be assumed to be monotonic,
since the wall clock may move backwards due to time/date adjustments.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:08:15 +02:00
Vlad Zolotarov
33552829b2 core: use steady_clock where monotonic clock is required
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore may not be assumed to be monotonic,
since the wall clock may move backwards due to time/date adjustments.

Fixes issue #638

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:07:53 +02:00
Takuya ASADA
7f4a1567c6 dist: support non-ami boot parameter setup, add parameters for preallocate hugepages on boot-time
Fixes #172

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:56:49 +02:00
Takuya ASADA
6bf602e435 dist: setup ntpd on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:54:32 +02:00
Avi Kivity
2b22772e3c Merge "Introduce keep alive timer for stream_session" from Asias
"Fixes stream_session hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is gone,
   the peer node will wait forever"
2015-12-27 16:56:32 +02:00
Avi Kivity
f3980f1fad Merge seastar upstream
* seastar 51154f7...8b2171e (9):
  > memcached: avoid a collision of an expiration with time_point(-1).
  > tutorial: minor spelling corrections etc.
  > tutorial: expand semaphores section
  > Merge "Use steady_clock where monotonic clock is required" from Vlad
  > Merge "TLS fixes + RPC adaption" from Calle
  > do_with() optimization
  > tutorial: explain limiting parallelism using semaphores
  > submit_io: change pending flushes criteria
  > apps: remove defunct apps/seastar

Adjust code to use steady_clock instead of high_resolution_clock.
2015-12-27 14:40:20 +02:00
Avi Kivity
0687d7401d Merge "storage_service updates" from Asias
"
- Fix erase of new_replica_endpoints in get_changed_ranges_for_leaving
- Introduce ring_delay_ms option
"
2015-12-27 12:46:37 +02:00
Nadav Har'El
06f8dd4eb2 repair: job id must start at 1
This patch fixes a bug where the *first* run of "nodetool repair" always
returned immediately, instead of waiting for the repair to complete.

Repair operations are asynchronous: Starting a repair returns a numeric
id, which can then be used to query for the repair's completion, and this
is what "nodetool repair" does (through our JMX layer). We started with
the repair ID "0", the next one is "1", and so on.

The problem is that "nodetool repair", when it sees 0 being returned,
treats it not as a regular repair ID, but rather as an answer that
there is nothing to repair - printing a message to that effect and *not*
waiting for the repair (which was correctly started) to complete.

The trivial fix is to start our repair IDs at 1, instead of 0.
We currently do not return 0 in any case (we don't know there is nothing
to repair before we actually start the work, and parameter errors
cause an exception, not a return of 0).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-27 12:42:26 +02:00
Avi Kivity
93aeedf403 Merge "Fixes for CentOS/RHEL support" from Takuya
"Recent changes on scripts causes error on CentOS/RHEL, this patchset fixes it."
2015-12-27 12:21:29 +02:00
Glauber Costa
e299127e81 main: check if options file can be read.
If we can't open the file, we will fail with a misterious error. It is a costumary
scenario, though, since people who are unaware or have just forgotten about seastar's
restriction of direct io access may put those files in tmpfs and other mount points.

We have a direct_io check that is designed exactly for this purpose, so as to give
the user a better error message. This patch makes use of it.

Fixes #644

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-27 12:20:40 +02:00
Asias He
f57ba6902b storage_service: Introduce ring_delay_ms option
It is hard-coded as 30 seconds at the moment.

Usage:
$ scylla --ring-delay-ms 5000

Time a node waits to hear from other nodes before joining the ring in
milliseconds.

Same as -Dcassandra.ring_delay_ms in cassandra.
2015-12-25 15:08:22 +08:00
Asias He
9c07ed8db6 storage_service: Fix erase new_replica_endpoints in get_changed_ranges_for_leaving
We need to recalculate begin() and end() in each loop iteration since
elements in new_replica_endpoints might be removed.

Refs #700
2015-12-25 15:08:22 +08:00
Asias He
88846bc816 storage_service: Add more debug info in decommission
It is useful to debug decommission issue.
2015-12-25 15:08:22 +08:00
Asias He
19f1875682 gossip: Print endpoint_state_map debug info in trace level
This generates too many logs with debug level. Make it trace level.
2015-12-25 15:08:22 +08:00
Nadav Har'El
06ab43a7ee murmur3 partitioner: fix midpoint() algorithm
The midpoint() algorithm to find a token between two tokens doesn't
work correctly in case of wraparound. The code tried to handle this
case, but did it wrong. So this patch fixes the midpoint() algorithm,
and adds clearer comments about why the fixed algorithm is correct.

This patch also modifies two midpoint() tests in partitioner_test,
which were incorrect - they verified that midpoint() returns some expected
values, but the expected values were wrong!

We also add to the test a more fundamental test of midpoint() correctness,
which doesn't check the midpoint against a known value (which is easy to
get wrong, as indeed happened); rather, we simply check that the midpoint
is really inside the range (according to the token ordering operator).
This simple test failed with the old implementation of midpoint() and
passes with the new one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-24 17:19:49 +02:00
Avi Kivity
3392f02b54 Merge "Make date parser more liberal" from Paweł
"This series makes date and time parsing more liberal so that Scylla
accepts the same date formats the origin does.

Fixes #521."
2015-12-24 17:18:04 +02:00
Asias He
20c258f202 streaming: Fix session hang with maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE
The problem is that we set the session state to WAIT_COMPLETE in
send_complete_message's continuation; the peer node might send
COMPLETE_MESSAGE before we run the continuation, so we set the wrong
status in COMPLETE_MESSAGE's handler and never close the session.

Before:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: PREPARING -> WAIT_COMPLETE
   GOT COMPLETE_MESSAGE Reply
   maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE

After:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   maybe_completed: PREPARING -> WAIT_COMPLETE
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: WAIT_COMPLETE -> COMPLETE
   Session with 127.0.0.2 is complete
2015-12-24 20:34:44 +08:00
Asias He
c971fad618 streaming: Introduce keep alive timer for each stream_session
If the session is idle for 10 minutes, close the session. This can
detect the following hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is
gone, the peer node will wait forever

Fixes simple_kill_streaming_node_while_bootstrapping_test.
2015-12-24 20:34:44 +08:00
Asias He
f527e07be6 streaming: Get stream_session in STREAM_MUTATION handler
Get the from address from cinfo. It is needed to figure out which stream
session this mutation belongs to, since we need to update the keep
alive timer for that stream session.
2015-12-24 20:34:44 +08:00
Asias He
d7a8c655a6 streaming: Print All sessions completed after state change message
close_session will print the "All sessions completed" message; print the
state change message before that.
2015-12-24 20:34:44 +08:00
Asias He
bd276fd087 streaming: Increase retry timeout
Currently, if the node is actually down, although the streaming_timeout
is 10 seconds, sending the verb will return an rpc_closed error
immediately, so we give up in 20 * 5 = 100 seconds. After this change,
we give up in 10 * 30 = 300 seconds at least, and 10 * (30 + 30) = 600
seconds at most.
2015-12-24 20:34:44 +08:00
Asias He
eaea09ee71 streaming: Retransmit COMPLETE_MESSAGE message
It is a one-way message at the moment. If a COMPLETE_MESSAGE is lost, no
one will close the session. The first step in fixing the issue is to
retransmit the message.
2015-12-24 20:34:44 +08:00
Asias He
d1d6395978 streaming: Print old state before setting the new state 2015-12-24 20:34:44 +08:00
Takuya ASADA
bf9547b1c4 dist: support RHEL on scylla_install
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
bb0880f024 dist: use /etc/os-release instead of /etc/redhat-release
Since other scripts use /etc/os-release, it is better to use the same one here.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
b6df28f3d5 dist: use $ID instead of $NAME to detect type of distribution
$NAME is the full name of the distribution, which is too long for a script.
$ID is the shortened one, which is more useful.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
0a4b68d35e dist: support CentOS yum repository
Fixes #671

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
8f4e90b87a dist: use tsc clocksource on AMI
Stop using xen clocksource, use tsc clocksource instead.
Fixes #462

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-22 22:29:32 +02:00
Amnon Heiman
b0856f7acf API: Init value for cf_map reduce should be of type int64_t
The helper functions for summing statistics over the column families are
template functions that infer the return type according to the type of the
Init param.

In the API the return value should be int64_t; passing an integer would
cause a number wraparound.

A partial output from the nodetool cfstats after the fix

nodetool cfstats keyspace1
Keyspace: keyspace1
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 4050000
	Write Latency: 0.009178098765432099 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 12
		Space used (live): 1118617445
		Space used (total): 23336562465

Fixes #682

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-22 17:33:13 +02:00
Tomasz Grabiec
88f5da5d1d Merge branch 'calle/paging_fixes' from seastar-dev.git
From Calle:

Fixes #589
Query should not return dangling static row in partition without any
regular/ck columns if a CK restriction is applied.

Refs #650
Fixes a bug in the CK range code for paging, and removes CK use for tables with no
clustering -> way simpler code. Also removes lots of workaround code no longer
required.

Note that this patch set does not fully fix #650/paging since bug #663 causes
duplicate rows. Still almost there though.
2015-12-22 11:22:42 +01:00
Avi Kivity
926d340661 logger: be robust when exceptions are thrown while stringifying args
Instead of propagating the exception, swallow it and print it out in
the log message.

Fixes #672.
2015-12-21 19:58:08 +01:00
Paweł Dziepak
cf949e98cb tests/types: add more tests for date and time parsing
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:34:17 +01:00
Paweł Dziepak
633a13f7b3 types: timestamp_from_string: accept more date formats
Boost::date_time doesn't accept some of the date and time formats that
origin does (e.g. 2013-9-22 or 2013-009-22).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:30:35 +01:00
Calle Wilund
f118222b2d query_pagers: Remove unneeded clustering + remove static workaround
Refs #640

* Remove use of cluster key range for tables without CK.
  Checking CK existence once and using that info allows us to remove some
  stupid complexity in checking for the "last key" match
* With the fix for #589 we can also remove some superfluous code that
  compensated for that issue, and make "partition end" simpler
* Remove extra row in CK case. Not needed anymore

End result is that pager now more or less only relies on adapted query
ranges.
2015-12-21 14:19:45 +00:00
Calle Wilund
72a079d196 paging_state: Make clustering key optional 2015-12-21 14:19:45 +00:00
Calle Wilund
c868d22d0c db/serializer: Add support for optional<T> to be serialized
via template specialization.

This simply wraps the underlying type's serialization and writes a "bool"
presence marker first in the stream.
2015-12-21 14:19:45 +00:00
Calle Wilund
803b58620f data_output: specialize serialized_size for bool to ensure sync with write 2015-12-21 14:19:45 +00:00
Paweł Dziepak
d41807cb66 types: timestamp_from_string(): restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:17:50 +01:00
Paweł Dziepak
873ed78358 types: catch parsing errors in timestamp_from_string()
timestamp_from_string() is used by both the timestamp and date types, so it
is better to move the try { } catch { } into the function itself instead
of expecting its callers to catch exceptions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:14:36 +01:00
Takuya ASADA
0d1ef007d3 dist: skip mounting RAID if it's already mounted
On AMI, scylla-server fails to restart via systemctl because scylla_prepare tries to mount /var/lib/scylla even if it's already mounted.
This patch fixes the issue.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-21 15:50:09 +02:00
Avi Kivity
c3d0ae822d Merge seastar upstream
* seastar b44d729...51154f7 (6):
  > semaphore: add with_semaphore()
  > scripts: posix_net_conf.sh: don't transform wide CPU mask
  > resource: fix build for systems without HWLOC
  > build: link libasan before all other libraries
  > Use sys_membarrier() when available
  > build: add missing library (boost_filesystem)
2015-12-21 14:45:57 +02:00
Calle Wilund
8c17e9e26c mutation_partition: Do not return static row if CK range does not match
Fixes #589

If we got no rows, but have live static columns, we should only
give them back IFF we did not have any CK restrictions.
If CKs exist, and we have a restriction on them, we either have matching
rows, or return nothing, since CQL does not allow "is null".
2015-12-21 10:38:48 +00:00
Pekka Enberg
98454b13b9 cql3: Remove some ifdef'd code 2015-12-21 10:38:48 +00:00
Pekka Enberg
c6541b4cc2 cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-21 10:38:48 +00:00
Pekka Enberg
81d72afd85 cql3: Move delete_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
cd58ea3b96 cql3: Move modification_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
bcd602d3f8 cql3: Move parsed_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
44ba4857eb cql3: Move property_definitions implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
2759473c7a cql3: Move select_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
7a5d6818a3 cql3: Move update_statement implementation to source file 2015-12-21 10:38:48 +00:00
Paweł Dziepak
9aa24860d7 test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable with
the range beginning and end being the keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
Paweł Dziepak
2fd7caafa0 sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary, which was a condition not
expected by the code responsible for looking for the relevant keys
inside the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
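The bound handling the commit above fixes can be illustrated with stand-in integer keys (hypothetical helper, not the actual key_reader code): selecting the keys covered by a range must honour whether each bound is inclusive, otherwise a bound that is exactly a key in the summary pulls in one key too many.

```cpp
#include <algorithm>
#include <vector>

// Select the keys of a sorted summary that fall inside a range,
// respecting the inclusiveness of each bound.
inline std::vector<int> keys_in_range(const std::vector<int>& sorted_keys,
                                      int start, bool start_inclusive,
                                      int end, bool end_inclusive) {
    auto first = start_inclusive
        ? std::lower_bound(sorted_keys.begin(), sorted_keys.end(), start)
        : std::upper_bound(sorted_keys.begin(), sorted_keys.end(), start);
    auto last = end_inclusive
        ? std::upper_bound(sorted_keys.begin(), sorted_keys.end(), end)
        : std::lower_bound(sorted_keys.begin(), sorted_keys.end(), end);
    return std::vector<int>(first, last);
}
```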
Raphael S. Carvalho
d8e810686a sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
99710ae0e6 db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
e1edc2111c sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
22ac260059 db: add missing sstable::mark_for_deletion call
If an sstable doesn't belong to the current shard, mark_for_deletion
should be called so the deletion manager still works.
It doesn't mean that the sstable will be deleted immediately, but that
the sstable is not relevant to the current shard, thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Asias He
2d32195c32 streaming: Invalidate cache upon receiving of stream
When a node gains or regains responsibility for certain token ranges,
streaming is performed; upon receipt of the stream data, the
row cache is invalidated for that range.

Refs #484.
2015-12-21 14:44:13 +08:00
Asias He
517fd9edd4 streaming: Add helper to get distributed<database> db 2015-12-21 14:42:47 +08:00
Asias He
d51227ad9c streaming: Remove transfer_files
It is never used.
2015-12-21 14:42:47 +08:00
Asias He
c25393a3f6 database: Add non-const version of get_row_cache
We need this to invalidate row cache of a column family.
2015-12-21 14:42:47 +08:00
Tomasz Grabiec
324ad43be1 Merge branch 'penberg/cql-cleanups/v1' from seastar-dev.git
Another round of cleanups to the CQL code from Pekka.
2015-12-18 17:36:45 +01:00
Tomasz Grabiec
0862d2f531 Merge branch 'pdziepak/fix-sstables-key_reader-663/v2'
From Paweł:

"This series fixes sstables::key_reader not respecting range inclusiveness
if the bounds were the keys that were present in the index summary.

Fixes #663."
2015-12-18 17:35:09 +01:00
Paweł Dziepak
b39d1fb1fc test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable with
the range beginning and end being the keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:29 +01:00
Paweł Dziepak
18b8d7cccc sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary, which was a condition not
expected by the code responsible for looking for the relevant keys
inside the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:26 +01:00
Pekka Enberg
eeadf601e6 Merge "cleanups and improvements" from Raphael 2015-12-18 13:45:11 +02:00
Pekka Enberg
9521ef6402 cql3: Remove some ifdef'd code 2015-12-18 13:29:58 +02:00
Pekka Enberg
f5597968ac cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-18 13:29:58 +02:00
Pekka Enberg
b754de8f4a cql3: Move delete_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
227e517852 cql3: Move modification_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
ca963d470e cql3: Move parsed_statement implementation to source file 2015-12-18 13:07:55 +02:00
Pekka Enberg
ff994cfd39 cql3: Move property_definitions implementation to source file 2015-12-18 13:04:32 +02:00
Pekka Enberg
d7db5e91b6 cql3: Move select_statement implementation to source file 2015-12-18 12:59:22 +02:00
Pekka Enberg
8b780e3958 cql3: Move update_statement implementation to source file 2015-12-18 12:54:19 +02:00
Takuya ASADA
0f46d10011 dist: add execute permission to build_ami_local.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:56:44 +02:00
Pekka Enberg
e56bf8933f Improve not implemented errors
Print out the function name where we're throwing the exception from to
make it easier to debug such exceptions.
2015-12-18 10:51:37 +01:00
Paweł Dziepak
73f9850e1c tests/key_reader: make sure that the reader lives long enough
Fixes test failure in debug mode.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 10:32:37 +01:00
Pekka Enberg
39af3ec190 Merge "Implement nodetool drain" from Paweł
"This series adds support for nodetool command 'drain'. The general idea
 of this command is to close all connections (both with clients and other
 nodes) and flush all memtables to disk.

 Fixes #662."
2015-12-18 11:16:32 +02:00
Takuya ASADA
ae10d86ba4 dist: add missing building time dependencies for Ubuntu
This is necessary to make --cpuset parameter work correctly.

Fixes #554

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:16:02 +02:00
Takuya ASADA
01bd4959ac dist: downgrade g++ to 4.9 on Ubuntu
Since the Ubuntu package fails to build with g++-5, we need to downgrade it.

Fixes #665

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:15:22 +02:00
Takuya ASADA
aad9c9741a dist: add hwloc as a dependency
It is required for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:14:07 +02:00
Paweł Dziepak
ae3e1374b4 test.py: add missing tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 19:08:21 +01:00
Pekka Enberg
89dcc5dfb3 Merge "dist: provide generic Scylla setup script" from Takuya
"Merge AMI scripts to dist/common/scripts, make it usable on non-AMI
 environments. Provides a script to do all settings automatically, which
 is able to run as a one-liner like this:

   curl http://url_to_scylla_install | sudo bash -s -- -d /dev/xvdb,/dev/xvdc -n eth0 -l ./

 Also enables coredump, save it to /var/lib/scylla/coredump"
2015-12-17 16:01:49 +02:00
Paweł Dziepak
39a65e6294 api: enable storage_service::drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9c0b7f9bbe storage_service: implement drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
dcbba2303e messaging_service: restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9661d8936b messaging_service: wait for outstanding requests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
442bc90505 compaction_manager: check whether the manager is already stopped
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
25d255390e database: add non-const getter for compaction_manager
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
31672906d3 transport: wait for outstanding requests to end during shutdown
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
8ee1a44720 storage_service: implement get_drain_progress()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
Paweł Dziepak
28e6edf927 transport: ignore future when stopping the server
When the server is shutting down, a flag _stopping is set and listeners
are aborted using abort_accept(), which causes accept() calls to return
failed futures. However, the accept handler just checks that the flag
_stopping is set and returns, which causes a failed future to be
destroyed and a warning to be printed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
Takuya ASADA
f7796ef7b3 dist: host gcc-5.1.1-4.fc22.src.rpm on our S3 account, since Fedora mirror deleted it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 12:53:32 +02:00
Takuya ASADA
9b4d0592fa dist: enable coredump, save it to /var/lib/scylla/coredump
Enables coredump and saves it to /var/lib/scylla/coredump

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:27 +09:00
Takuya ASADA
d0e5f8083f dist: provide generic scylla setup script
Merge AMI scripts into dist/common/scripts, making them usable in non-AMI environments.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:03 +09:00
Takuya ASADA
768ad7c4b8 dist: add SET_NIC entry on sysconfig
Add SET_NIC parameter which is already used in scylla_prepare

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:46 +09:00
Takuya ASADA
de1277de29 dist: specify NIC ifname on sysconfig, pass it to posix_net_conf.sh
Support specifying IFNAME for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:23 +09:00
Takuya ASADA
04d9a2a210 dist: add mdadm, xfsprogs on package dependencies
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 16:59:07 +09:00
Pekka Enberg
9604d55a44 Merge "Add unit test for get_restricted_ranges()" from Tomek 2015-12-17 09:14:30 +02:00
Avi Kivity
b34a1f6a84 Merge "Preliminary changes for handling of schema changes" from Tomasz
"I extracted some less controversial changes on which the schema changes series will depend
 to somewhat reduce the noise in the main series."
2015-12-16 19:08:22 +02:00
Tomasz Grabiec
e2037ebc62 schema: Fix operator==() to include missing fields 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
5a4d47aa1b schema: Remove dead code 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7a3bae0322 schema: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f9d6c7b026 compress: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
adb93ef31f types: Make name() return const& 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f28e5f0517 tests: mutation_assertions: Make is_equal_to() check symmetricity 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
3324cf0b8c tests: mutation_reader_assertions: Introduce next_mutation() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
ad99f89228 tests: mutation_assertion: Introduce has_schema() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7451ab4356 tests: mutation_assertion: Allow chaining of assertions 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
efe08a0512 tests: mutation_assertions: Own the mutation which is checked
Easier for users because they don't have to ensure liveness.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
0cdee6d1c3 tests: row_cache: Fix test_update()
The underlying data source for cache should not be the same memtable
which is later used to update the cache from. This fixes the following
assertion failure:

row_cache_test_g: utils/logalloc.hh:289: decltype(auto) logalloc::allocating_section::operator()(logalloc::region&, Func&&) [with Func = memtable::make_reader(schema_ptr, const partition_range&)::<lambda()>]: Assertion `r.reclaiming_enabled()' failed.

The problem is that when memtable is merged into cache their regions
are also merged, so locking cache's region locks the memtable region
as well.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
09188bccde mutation_query: Make reconcilable_result printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
dd51ff0410 query: Make query::result movable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
872bfadb3d messaging_service: Remove unused parameters from send_migration_request() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
157af1036b data_output: Introduce write_view() which matches data_input::read_view() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
054187acf2 db/serializer: Introduce to_bytes/from_bytes helpers 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
e8d49a106c query_processor: Add trace-level logging of queries 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
de09c86681 data_value: Make printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
2ee60d8496 tests: sstable_test: Avoid throwing during expected conditions
Makes debugging easier by making 'catch throw' not stop on expected
conditions.
2015-12-16 18:06:54 +01:00
Tomasz Grabiec
ef49c95015 tests: cql_query_env: Avoid exceptions during normal execution 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
50984ad8d4 scylla-gdb.py: Allow the script to be sourced multiple times
Currently sourcing for the second time causes an exception from
pretty printer registration:

Traceback (most recent call last):
  File "./scylla-gdb.py", line 41, in <module>
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
  File "/usr/share/gdb/python/gdb/printing.py", line 152, in register_pretty_printer
    printer.name)
RuntimeError: pretty-printer already registered: scylla
2015-12-16 18:06:51 +01:00
Avi Kivity
e27a5d97f6 Merge "background mutation throttling" from Gleb
Fixes the case where background activity needed to complete CL=ONE writes
is queued up in the storage proxy, and the client adds new work faster
than it can be cleared.
2015-12-16 18:08:12 +02:00
Raphael S. Carvalho
41be378ff1 db: fix build of sstable list in column_family::compact_sstables
The last two loops were incorrectly inside the first one. That's a
bug because a new sstable may be emplaced more than once in the
sstable list, which can cause several problems. mark_for_deletion
may also be called more than once for compacted sstables, however,
it is idempotent.
Found this issue while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 17:46:17 +02:00
Avi Kivity
4c84d23f3b Merge seastar upstream
"* seastar 294ea30...b44d729 (5):
  > Merge "Properly distribute IO queues" from Glauber
  > reactor: allow more poll time in virtualized environments
  > reactor: fix idle-poll limit
  > reactor: use a vector of unique_ptr for the IO queues
  > io queues: make the queues really part of the reactor"
2015-12-16 17:42:30 +02:00
Tomasz Grabiec
0d5166dcd8 tests: Add test for get_restricted_ranges() 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
e445e4785c storage_proxy: Extract get_restricted_ranges() as a free function
To make it directly testable.
2015-12-16 13:09:01 +01:00
Tomasz Grabiec
756624ef18 Remove dead code 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
eb27fb1f6b range: Introduce equal() 2015-12-16 13:09:01 +01:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Gleb Natapov
de63b3a824 storage_proxy: provide timeout for send_mutation verb
Providing a timeout for the send_mutation verb allows rpc to drop packets
that sit in the outgoing queue for too long.
2015-12-16 10:13:46 +02:00
Gleb Natapov
fe4bc741f4 storage_proxy: throttle mutations based on ongoing background activity
With a consistency level less than ALL, mutation processing can move to
the background (meaning the client was answered, but there is still work
to do on behalf of the request). If the background request completion
rate is lower than the incoming request rate, background requests will
accumulate and eventually exhaust all memory resources. This patch's aim
is to prevent this situation by monitoring how much memory all current
background requests take and, when some threshold is passed, to stop
moving requests to the background (by not replying to a client until
either memory consumption moves below the threshold or the request is
fully completed).

There are two main points where each background mutation consumes memory:
holding the frozen mutation until the operation is complete (in order to
hint it if it does not), and on the rpc queue to each replica, where it
sits until it's sent out on the wire. The patch accounts for both of
those separately and limits the former to 10% of total memory and the
latter to 6M. Why 6M? The best answer I can give is why not :) But on a
more serious note, the number should be small enough that all the data
can be sent out in a reasonable amount of time, and one shard is not
capable of achieving even close to full bandwidth, so empirical evidence
shows 6M to be a good number.
2015-12-16 10:13:46 +02:00
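The accounting half of the policy above can be sketched with a minimal class (made-up names and a stand-in threshold, not the actual storage_proxy code): track the memory held by in-background mutations, and refuse to move new work to the background once the threshold is passed.

```cpp
#include <cstddef>

// Sketch of memory-based throttling for background mutations: callers
// that cannot move to the background must hold the client reply until
// the request fully completes.
class background_accounting {
    size_t _in_flight = 0;   // bytes held by in-background mutations
    size_t _threshold;       // e.g. 10% of total memory in the real code
public:
    explicit background_accounting(size_t threshold) : _threshold(threshold) {}

    // Returns true if the request may complete in the background.
    bool try_move_to_background(size_t mutation_size) {
        if (_in_flight + mutation_size > _threshold) {
            return false;
        }
        _in_flight += mutation_size;
        return true;
    }

    void background_done(size_t mutation_size) {
        _in_flight -= mutation_size;
    }

    size_t in_flight() const { return _in_flight; }
};
```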
Pekka Enberg
40e8a9c99c sstables/compaction: Fix compilation error with GCC 4.9.2
I am sure it's a compiler issue but I am not ready to give up and
upgrade just yet:

  sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’
                 stats->start_size, stats->end_size, {});
2015-12-16 10:03:14 +02:00
Raphael S. Carvalho
36d31a5dab fix cql_query_test
Test was failing because _qp (distributed<cql3::query_processor>) was stopped
before _db (distributed<database>).
The compaction manager is a member of database, and when the database is
stopped, the compaction manager is also stopped. After a2fb0ec9a, compaction updates the
system table compaction history, and that requires a working query context.
We cannot simply move _qp->stop() to after _db->stop() because the former
relies on migration_manager and storage_proxy. So the most obvious fix is to
clean the global variable that stores query context after _qp was stopped.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 09:58:46 +02:00
Nadav Har'El
63c0906b16 messaging_service: drop unnecessary explicit templates
The previous patch added message_service read()/write() support for all
types which know how to serialize themselves through our "old" serialization
API (serialize()/deserialize()/serialized_size()).

So we no longer need the almost 200 lines of repetitive code in
messaging_service.{cc,hh} which defined these read/write templates
separately for a dozen different types using their *serialize() methods.
We also no longer need the helper functions read_gms()/write_gms(), which
are basically the same code as that in the template functions added in the
previous patch.

Compilation is not significantly slowed down by this patch, because it
merely replaces a dozen templates by one template that covers them all -
it does not add new template complexity, and these templates are anyway
instantiated only in messaging_service.cc (other code only calls specific
functions defined in messaging_service.cc, and does not use these templates).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
Nadav Har'El
438f6b79f7 messaging_service: allow any self-serializing type
Currently, messaging_service only supports sending types for which a read/
write function has been explicitly implemented in messaging_service.hh/cc.

Some types already have serialization/deserialization methods inside them,
and those could have been used for the serialization without having to write
new functions for each of these types. Many of these types were already
supported explicitly in messaging_service.{cc,hh}, but some were forgotten -
for example, dht::token.

So this patch adds a default implementation of messaging_service write()/read()
which will work for any type which has these serialization methods.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
Tomasz Grabiec
a78f4656e8 Introduce ring_position_less_comparator 2015-12-15 18:00:55 +01:00
Avi Kivity
8fc7583224 Merge seastar upstream
* seastar 5b9e3da...294ea30 (9):
  > Merge "IO queues" from Glauber
  > reactor: increment check_direct_io_support to also deal with files
  > Merge "SSL/TLS initial certificate validation" from Calle
  > tutorial.md: remove inaccurate statements about x86
  > build: verify that the installed compiler is up to date
  > build: complain if fossil version of gnutls is installed
  > build: fix debian naming of gnutls-devel package
  > build: add configure-time check for gnutls-devel
  > tutorial.md: introduction to asynchrnous programming
2015-12-15 16:50:16 +02:00
Gleb Natapov
e43ae7521f storage_proxy: unfuturize send_to_live_endpoints()
send_to_live_endpoints() is never waited upon; it does its job in the
background. This patch formalizes that by changing the return value to
void and also refactoring the code so that the frozen_mutation shared
pointer is not held longer than it should be: currently it is held until
send_mutation() completes, but since send_mutation() does not use the
frozen_mutation asynchronously this is not necessary.
2015-12-15 15:40:36 +02:00
Tomasz Grabiec
305c2b0880 frozen_mutation: Introduce decorated_key() helper
Requested by Asias for use in streaming code.
2015-12-15 15:16:04 +02:00
Tomasz Grabiec
179b587d62 Abstract timestamp creation behind new_timestamp()
Replace db_clock::now_in_usec() and db_clock::now() * 1000 accesses
where the intent is to create a new auto-generate cell timestamp with
a call to new_timestamp(). Now the knowledge of how to create timestamps
is in a single place.
2015-12-15 15:16:04 +02:00
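A plausible shape for the helper named above (the actual Scylla definition may differ; this is a sketch under the assumption that cell timestamps are microseconds since the epoch):

```cpp
#include <chrono>
#include <cstdint>

// Single place that knows how to create auto-generated cell timestamps:
// microseconds since the Unix epoch.
using api_timestamp_type = int64_t;

inline api_timestamp_type new_timestamp() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        system_clock::now().time_since_epoch()).count();
}
```

Callers replace ad-hoc `db_clock::now() * 1000`-style arithmetic with a `new_timestamp()` call, so the policy can change in one spot.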
Avi Kivity
8abd013601 Merge 2015-12-15 15:00:49 +02:00
Avi Kivity
fb8a4f6c1b Merge " implement get_compactions API" from Raphael
"get_compactions returns progress information for each compaction
running in the system. It can be accessed using swagger UI.
'nodetool compactionstats' is not working yet because of some
pending work on the nodetool side."
2015-12-15 14:59:49 +02:00
Paweł Dziepak
71f92c4d14 mutation_partition: do not move rows_entry::_link
Apparently, the link hook copy constructor is a no-op and a move
constructor doesn't exist, so the code is correct, but that explicit
move makes the code needlessly confusing.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:22:23 +01:00
Paweł Dziepak
59245e7913 row_cache: add functions for invalidating entries in cache
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:21:11 +01:00
Raphael S. Carvalho
833a78e9f7 api: implement get_compactions
get_compactions returns progress information about each
ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:36 -02:00
Raphael S. Carvalho
193ede68f3 compaction: register and deregister compaction_stats
That's important for the compaction stats API, which will need stats
data for each ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:32 -02:00
Raphael S. Carvalho
e74dcc86bd compaction_manager: introduce list of compaction_stats
This list will store compaction_stats for each ongoing compaction.
That's why register and deregister methods are provided.
This change is important for compaction stats API that needs data
of each ongoing compaction, such as progress, ks, cf, etc.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:28 -02:00
Raphael S. Carvalho
a26fb15d1a db: add method to get compaction manager from cf
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:20 -02:00
Raphael S. Carvalho
1fba394dd0 sstables: store keyspace and cf in compaction_stats
The reason behind this change is that we will need ks and cf
for the compaction stats API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:02 -02:00
Raphael S. Carvalho
ac1a67c8bc sstables: move compaction_stats to header file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:49:45 -02:00
Avi Kivity
a2eac711cf Merge "compaction history support" from Raphael
"This patchset will make Scylla update the system table
COMPACTION_HISTORY whenever a compaction job finishes.
Functions were added to both update and retrieve the
content of this system table. Compaction history API
is also enabled in this series."
2015-12-15 13:22:14 +02:00
Raphael S. Carvalho
87fbe29cf9 api: add support to compaction history
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:00:21 -02:00
Takuya ASADA
9c5afb8e58 dist: add scylla-gdb.py on scylla-server-debuginfo rpm package
It will be placed at /usr/src/debug/scylla-server-development/scylla-gdb.py
Fixes #604

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-15 12:13:17 +02:00
Raphael S. Carvalho
a2fb0ec9a3 sstables: update compaction history at the end of compaction
When a compaction job finishes, call a function to update the system
table COMPACTION_HISTORY. That's also needed for the compaction
history API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:20:03 -02:00
Raphael S. Carvalho
433ed60ca3 db: add method to get compaction history
This method is intended to return the content of the system table
COMPACTION_HISTORY as a vector of compaction_history_entry.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:19:04 -02:00
Raphael S. Carvalho
f3beacac28 db: add method to update the system table COMPACTION_HISTORY
It's supposed to be called at the end of compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 13:47:10 -02:00
Raphael S. Carvalho
0fa194c844 sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:53 -02:00
Raphael S. Carvalho
6142efaedb db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:34 -02:00
Raphael S. Carvalho
81f5b1716e sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:11 -02:00
Raphael S. Carvalho
7bbc1b49b6 db: add missing sstable::mark_for_deletion call
If an sstable doesn't belong to the current shard, mark_for_deletion
should be called so the deletion manager still works.
It doesn't mean that the sstable will be deleted immediately, but that
the sstable is not relevant to the current shard, thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:42:26 -02:00
Tomasz Grabiec
0865ecde17 storage_proxy: Fix range splitting
There is a check whose intent was to detect wrap-around during a walk of
the ring tokens by comparing the split point with the minimum token,
which is supposed to be inserted by the ring iterator. It assumed that
when we encounter it, the range is a wrap-around. This doesn't hold when
the minimum token is part of the token metadata or the set of tokens is
empty.

In such a case, a full range would be split into 3 overlapping full
ranges. The fix is to drop the assumption and instead ensure that
ranges do not wrap around by unwrapping them if necessary.

Fixes #655.
2015-12-14 16:05:54 +02:00
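The "unwrap" idea from the fix above can be illustrated with plain integers as stand-in tokens (hypothetical types, not the actual dht range code): a range that wraps around the ring is split into two non-wrapping ranges.

```cpp
#include <utility>
#include <vector>

using token = int;
// (start, end] range over the ring, illustrative representation.
using token_range = std::pair<token, token>;

// Split a possibly wrap-around range (start >= end) into non-wrapping
// pieces, so later splitting logic never sees a wrapped range.
inline std::vector<token_range> unwrap(token_range r,
                                       token min_token, token max_token) {
    if (r.first < r.second) {
        return {r};  // already non-wrapping
    }
    return {{r.first, max_token}, {min_token, r.second}};
}
```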
Takuya ASADA
3b7693feda dist: add package dependency to gnutls library
Now that Seastar depends on gnutls, we need to add it to the .rpm/.deb package dependencies.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-14 13:28:28 +02:00
Pekka Enberg
ba09c545fc dist/docker: Enable SMP support
Now that Scylla has a sleep mode, we can enable SMP support again.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-14 13:23:30 +02:00
Avi Kivity
fd14cb3743 mutation_partition: fix leak in move assignment operator
The default move assignment operator calls boost::intrusive::set's move
assignment operator, which leaks, because it does not believe it owns
the data.

Fix by providing a custom implementation.
2015-12-14 10:33:19 +01:00
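The pattern behind the fix above, shown on a generic owning container rather than the actual mutation_partition/boost::intrusive code: when the underlying move assignment does not free the destination's data, a custom operator= must release the old contents before stealing the new ones.

```cpp
#include <cstddef>
#include <utility>

struct node {
    int value;
    node* next = nullptr;
};

// A container that owns heap-allocated nodes.
class owning_list {
    node* _head = nullptr;
    void clear() {
        while (_head) {
            node* n = _head;
            _head = n->next;
            delete n;
        }
    }
public:
    owning_list() = default;
    owning_list(owning_list&& o) noexcept
        : _head(std::exchange(o._head, nullptr)) {}
    owning_list& operator=(owning_list&& o) noexcept {
        if (this != &o) {
            clear();  // free our own data first: no leak
            _head = std::exchange(o._head, nullptr);
        }
        return *this;
    }
    ~owning_list() { clear(); }

    void push(int v) { _head = new node{v, _head}; }
    size_t size() const {
        size_t n = 0;
        for (node* p = _head; p; p = p->next) ++n;
        return n;
    }
};
```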
Asias He
9781e0d34d storage_service: Make bootstrapping/leaving/moving log more consistent
It is useful for test code to grep the log.
2015-12-11 13:57:40 +02:00
Tomasz Grabiec
1991fd5ca2 Merge branch 'pdziepak/fix-clustering-key-comparison/v2' from seastar-dev.git
From Paweł:

This series fixes comparison of byte order comparable clustering keys.

Fixes #645.
2015-12-11 12:51:02 +01:00
Paweł Dziepak
3a73496817 tests/cql: add test for ordering clustering keys
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:05:25 +01:00
Paweł Dziepak
8cab343895 compound: fix compare() of prefixable types
All components of a prefixable compound type are preceded by their
length, which makes them not byte-order comparable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:04:31 +01:00
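An illustrative sketch of why compare() must walk the components (stand-in representation, not the actual compound type code): a value serialized as length-prefixed components cannot be compared as raw bytes, because the lengths would take part in the comparison, so components are compared content by content.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Component-wise lexicographic comparison of two compound values,
// represented here as vectors of already-decoded components.
inline int compare_compound(const std::vector<std::string>& a,
                            const std::vector<std::string>& b) {
    size_t n = std::min(a.size(), b.size());
    for (size_t i = 0; i < n; ++i) {
        int c = a[i].compare(b[i]);
        if (c != 0) {
            return c < 0 ? -1 : 1;
        }
    }
    // A shorter value that matches so far is a prefix and sorts first.
    return a.size() < b.size() ? -1 : a.size() > b.size() ? 1 : 0;
}
```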
Paweł Dziepak
8fd4b9f911 schema: remove _clustering_key_prefix_type
All clustering keys are now prefixable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:47:24 +01:00
Paweł Dziepak
bb9a71f70c thrift: let class_from_compound_type() accept prefixable types
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:45:56 +01:00
Pekka Enberg
0d8a02453e types: Fix frozen collection type names
Frozen collection type names must be wrapped in FrozenType so that we
are able to store the types correctly in system tables.

This fixes #646 and fixes #580.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 10:41:11 +01:00
Pekka Enberg
63bdeb65f2 cql3: Implement maps::literal::test_assignment() function
The test_assignment() function is invoked via the Cassandra unit tests,
so we might as well implement it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 09:35:13 +01:00
Asias He
57ee9676c2 storage_service: Fix default ring_delay time
It is 30 seconds instead of 5 seconds by default, to align with C*.

Please note that after this a node will take at least 30 seconds to
complete a bootstrap.
2015-12-11 09:05:19 +02:00
Avi Kivity
b3cd672d97 Merge seastar upstream
* seastar ad07a2e...5b9e3da (2):
  > Merge "rpc cleanups and improvements" from Gleb
  > shared_future: Add missing include
2015-12-10 18:11:59 +02:00
Paweł Dziepak
9d482532f4 tests/lsa: reduce the size of large allocation
Originally, the large allocation test case attempted to allocate an
object as big as half of the space used by the lsa. That failed when the
test was executed with a lower amount of memory available, mainly due to
memory fragmentation caused by previous test cases.

This patch reduces the size of the large allocation to 3/8 of the
total space used by the lsa, which is still a lot but seems to make the
test pass even with as little memory as 64MB per shard.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 13:16:43 +01:00
Avi Kivity
d425aacaeb release: copy version string into heap
If we get a core dump from a user, it is important to be able to
identify its version.  Copy the release string into the heap (which is
copied into the core dump), so we can search for it using the "strings"
or "ident" commands.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-10 13:12:40 +02:00
Lucas Meneghel Rodrigues
2167173251 utils/logalloc.cc - Declare member minimum_size from segment_zone struct
This fixes compile error:

In function `logalloc::segment_zone::segment_zone()':
/home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-10 12:54:34 +02:00
Asias He
b7d10b710e streaming: Propagate fail to send PREPARE_DONE_MESSAGE exception
Otherwise the stream_plan will not be marked as failed state.
2015-12-10 12:38:00 +02:00
Paweł Dziepak
ec453c5037 managed_bytes: fix potentially unaligned accesses
blob_storage is defined with attribute packed, which makes its alignment
requirement equal to 1. This means that its members may be unaligned.
GCC is obviously aware of that and will generate appropriate code
(and not generate ubsan checks). However, there are a few places where
members of blob_storage are accessed via pointers; these have to be
wrapped in unaligned_cast<> to let the compiler know that the location
pointed to may not be aligned properly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 11:59:54 +02:00
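The hazard described above can be sketched as follows (blob_like and read_unaligned_u64 are made-up illustrations, not Scylla's blob_storage or its unaligned_cast<>): a packed struct has alignment 1, so a pointer to its uint64_t member may be unaligned, and reading it through memcpy tells the compiler the location may be unaligned so it emits a safe load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Packed struct: no padding, so 'value' sits at offset 1 and a pointer
// to it is not 8-byte aligned.
struct __attribute__((packed)) blob_like {
    char tag;        // pushes the next member off its natural alignment
    uint64_t value;
};

// Safe unaligned read: never dereference the possibly-unaligned pointer
// directly; memcpy makes the unalignment visible to the compiler.
inline uint64_t read_unaligned_u64(const void* p) {
    uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

Dereferencing a `uint64_t*` at that offset directly would be undefined behavior (and exactly what ubsan flags); the memcpy route is the portable equivalent of an unaligned cast.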
Tomasz Grabiec
43498b3158 Merge branch 'pdziepak/fix-partial-clustering-keys/v1' from seastar-dev.git
From Paweł:

This series fixes support for clustering keys whose trailing components
are null. The solution is to use clustering_key_prefix instead of
clustering_key everywhere.

Fixes #515.
2015-12-10 10:43:12 +01:00
Paweł Dziepak
66ff1421f0 tests/cql: add test for clustering keys with empty components
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:47:07 +01:00
Paweł Dziepak
64f50a4f40 db: make clustering_key a prefix
Schemas using compact storage can have clustering keys with the trailing
components not set, effectively making them clustering key prefixes
instead of full clustering keys.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:47 +01:00
Paweł Dziepak
77c7ed6cc5 keys: add prefix_equality_less_compare for prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
220a3b23c0 keys: allow creating partial views of prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3c16ab080a sstables: do not assume clustering_key has the proper format
In the case of non-compound dense tables the column name is just the value
of the clustering key (which has only one component). The current code just
casts clustering_key to bytes_view, which works because there is no
additional metadata in single-element clustering keys.
However, that may change when the internal representation of the clustering
key is changed, so explicitly extract the proper component.

This change will become necessary when clustering_key is replaced by
clustering_key_prefix.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
5f1e9fd88f mutation_partition: remove unused find_entry()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3287022000 cql3: do not assume that clustering key is full
In case of schemas that use compact storage it is possible that trailing
components of clustering keys are not set.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Avi Kivity
167addbfe1 main: remove issue #417 (poll mode) warning
Fixed.
2015-12-09 19:00:32 +02:00
Avi Kivity
a352d63bf9 Merge seastar upstream
* seastar c5e595b...ad07a2e (1):
  > reactor: add command line option to disable sleep mode

Fixes #417
2015-12-09 19:00:20 +02:00
Glauber Costa
3c988e8240 perf_sstable: use current scylla default directory
When this tool was written, we were still using /var/lib/cassandra as a default
location. We should update it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-09 17:46:31 +02:00
Avi Kivity
01c3670def Merge seastar upstream
* seastar 5dc22fa...c5e595b (3):
  > memory: be less strict about NUMA bindings
  > reactor: let the resource code specify the default memory reserve
  > resource: reserve even more memory when hwloc is compiled in

Fixes #642
2015-12-09 16:47:47 +02:00
Asias He
66938ac129 streaming: Add retransmit logic for streaming verbs
Retransmit streaming-related verbs and give up after 5 minutes.

Tested with:

  lein test :only cassandra.batch-test/batch-halves-decommission

Fixes #568.
2015-12-09 15:12:36 +02:00
Avi Kivity
14794af260 Merge seastar upstream
* seastar 9f9182e...5dc22fa (1):
  > future: add repeat_until_value(): repeat an action until it returns a value
2015-12-09 15:11:59 +02:00
Avi Kivity
213700e42f Merge seastar upstream
* seastar d40453b...9f9182e (5):
  > Merge "Sleep mode support"
  > future: add futurize<T>::from_tuple(tuple<T>)
  > tls: Add missing destructor for dh_params::impl, fixes ASAN error
  > tls/socket fix: Add missing noexcept to constructor/move
  > Merge "Initial SSL/TLS socket support" from Calle
2015-12-09 11:01:13 +02:00
Avi Kivity
204610ac61 Merge "Make LSA more large-allocation-friendly" from Paweł
"This series attempts to make LSA more friendly for large (i.e. bigger
than LSA segment) allocations. It is achieved by introducing segment
zones – large, contiguous areas of segments – and using them to allocate
segments instead of calling malloc() directly.
Zones can be shrunk when needed to reclaim memory, and segments can be
migrated either to reduce the number of zones or to defragment one in
order to be able to shrink it. LSA tries to keep all segments at the
lower addresses and reclaims memory starting from the zones in the
highest parts of the address space."
2015-12-09 10:49:23 +02:00
Avi Kivity
883074e936 Merge "Fix replace_node support" from Asias
Also:

[PATCH scylla v1 0/7] gossip mark node down fix + cleanup
[PATCH scylla v1 0/2] Refuse decommissioned node to rejoin
[PATCH scylla] storage_service: Fix added node not showing up in nodetool in status joining
2015-12-09 10:42:52 +02:00
Paweł Dziepak
8ba66bb75d managed_bytes: fix copy size in move constructor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-09 10:38:28 +02:00
Asias He
b63d49c773 storage_service: Log removing replaced endpoint from system.peers
This info is important when replacing a node. Useful for debugging.
2015-12-09 12:30:52 +08:00
Asias He
d26c7e671d storage_service: Enable commented out code in handle_state_normal
Add current_owner to endpoints_to_remove if endpoint and current_owner
have the same token and endpoint is newer than current_owner.
2015-12-09 12:30:52 +08:00
Asias He
3793bb7be1 token_metadata: Add get_endpoint_to_token_map_for_reading 2015-12-09 12:30:52 +08:00
Asias He
1cc7887ffb token_metadata: Do nothing if tokens is empty.
When replacing a node, we might ignore the tokens, so the token set is
empty. In this case, we will have

   std::unordered_map<inet_address, std::unordered_set<token>> = {ip, {}}

passed to token_metadata::update_normal_tokens(std::unordered_map<inet_address,
std::unordered_set<token>>& endpoint_tokens)

and hit the assert

   assert(!tokens.empty());
2015-12-09 12:30:52 +08:00
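The guard described above can be sketched with simplified types (these are illustrative stand-ins, not Scylla's token_metadata signatures): an endpoint that arrives with an empty token set is skipped instead of reaching the `assert(!tokens.empty())` deeper in the call.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

using token = long;
using inet_address = std::string;

// Simplified sketch of the "do nothing if tokens is empty" fix: the ring
// maps tokens to owning endpoints, and {ip, {}} entries are ignored.
void update_normal_tokens(
        const std::unordered_map<inet_address, std::unordered_set<token>>& endpoint_tokens,
        std::unordered_map<token, inet_address>& ring) {
    for (const auto& [ep, tokens] : endpoint_tokens) {
        if (tokens.empty()) {
            continue;   // replacing a node may legitimately pass {ip, {}}
        }
        for (auto t : tokens) {
            ring[t] = ep;
        }
    }
}
```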
Asias He
e79c85964f system_keyspace: Flush system.peers in remove_endpoint
1) Start node 1, node 2, node 3
2) Stop  node 3
3) Start node 4 to replace node 3
4) Kill  node 4 (removal of node 3 in system.peers is not flushed to disk)
5) Start node 4 (will load node 3's token and host_id info in bootup)

This makes

   "Token .* changing ownership from 127.0.0.3 to 127.0.0.4"

messages printed again in step 5) which are not expected, which fails the dtest

   FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "scylla-dtest/replace_address_test.py",
   line 220, in replace_first_boot_test
       self.assertEqual(len(movedTokensList), numNodes)
   AssertionError: 512 != 256
2015-12-09 12:30:52 +08:00
Asias He
110a18987e token_metadata: Print Token changing ownership from
Needed by test.
2015-12-09 12:30:52 +08:00
Asias He
906f670a86 gossip: Print node status in handle_major_state_change
It is useful to know the STATUS value when debugging.
2015-12-09 12:29:15 +08:00
Asias He
a0325a5528 gossip: Simplify is_shutdown and friends.
Use the newly added helper get_gossip_status.
2015-12-09 12:29:15 +08:00
Asias He
9d4382c626 gossip: Introduce get_gossip_status
Get value of application_state::STATUS.
2015-12-09 12:29:15 +08:00
Asias He
5a65d8bcdd gossip: Fix endless marking a node down
Commit 56df32ba56 (gossip: Mark node as dead even if already left)
missed a node liveness check.

Fix it up.

Before: (mark a node down multiple times)

[Tue Dec  8 12:16:33 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:33 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:34 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:34 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:35 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:35 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead

After: (mark a node down only one time)

[Tue Dec  8 12:28:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:28:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
2015-12-09 12:29:15 +08:00
Asias He
fa3c84db10 gossip: Kill default constructor for versioned_value
The only reason we needed it is to make
   _application_state[key] = value
work.

With the current default constructor, we increase the version number
needlessly. To fix this, and to be safe, remove the default constructor
completely.
2015-12-09 12:29:15 +08:00
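The pitfall behind removing the default constructor can be sketched as follows (versioned_value here is a made-up stand-in, not the gms type): map subscript assignment default-constructs the value before assigning, so a default constructor with side effects runs needlessly; without a default constructor, `m[key] = value` no longer compiles, and insert_or_assign does the job without any default construction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in type: its only constructor has an observable side effect
// (counting constructions, analogous to bumping a version number).
// Defining it suppresses the default constructor entirely.
struct versioned_value {
    static inline int constructions = 0;
    std::string value;
    explicit versioned_value(std::string v) : value(std::move(v)) { ++constructions; }
};

void set_state(std::map<std::string, versioned_value>& m,
               const std::string& key, versioned_value v) {
    // m[key] = std::move(v) would need (and invoke) a default constructor;
    // insert_or_assign move-constructs or move-assigns directly.
    m.insert_or_assign(key, std::move(v));
}
```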
Asias He
52a5e954f9 gossip: Pass const ref for versioned_value in on_change and before_change 2015-12-09 12:29:15 +08:00
Asias He
3308430343 storage_service: Make before_change and on_change log print more informative
- Make before_change and on_change print the versioned_value
- Print endpoint address first in handle_state_* and
  on_change and friends.
2015-12-09 12:29:15 +08:00
Asias He
ccbd801f40 storage_service: Fix decommissioned nodes are willing to rejoin the cluster if restarted
Backport: CASSANDRA-8801

a53a6ce Decommissioned nodes will not rejoin the cluster.

Tested with:
topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test
2015-12-09 10:43:51 +08:00
Asias He
b3dd2d976a storage_service: Simplify prepare_to_join with seastar thread 2015-12-09 10:43:51 +08:00
Asias He
e9a4d93d1b storage_service: Fix added node not showing up in nodetool in status joining
The get_token_endpoint API should return a map of tokens to endpoints,
including the bootstrapping ones.

Use get_local_storage_service().get_token_to_endpoint_map() for it.

$ nodetool -p 7100 status

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID Rack
UN  127.0.0.1  12645      256     ?  eac5b6cf-5fda-4447-8104-a7bf3b773aba  rack1
UN  127.0.0.2  12635      256     ?  2ad1b7df-c8ad-4cbc-b1f1-059121d2f0c7  rack1
UN  127.0.0.3  12624      256     ?  61f82ea7-637d-4083-acc9-567e0c01b490  rack1
UJ  127.0.0.4  ?          256     ?  ced2725e-a5a4-4ac3-86de-e1c66cecfb8d  rack1

Fixes #617
2015-12-09 10:43:51 +08:00
Paweł Dziepak
63bdf52803 tests/lsa: add large allocations test
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 23:56:46 +01:00
Tomasz Grabiec
d68a8b5349 Merge branch 'dev/amnon/index_summary_size_v2' from seastar-dev.git
API for getting sstable index summary memory footprint from Amnon
2015-12-08 20:03:39 +01:00
Paweł Dziepak
73a1213160 scylla-gdb.py: print lsa zones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
0d66300d43 lsa: add more counters
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently, which could result
in high memory fragmentation. As a result, many compaction and eviction
passes may be needed to release a sufficiently big contiguous memory
block.

These problems are solved by the introduction of segment zones,
contiguous groups of segments. All segments are allocated from zones and
the algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside a zone.

Segment zones can be shrunk but cannot grow. The segment pool keeps a tree
containing all zones ordered by their base addresses. This tree is used
only by the memory reclaimer. There is also a list of zones that have
at least one free segment, which is used during allocation.

Segment allocation doesn't have any preference as to which segment (and
zone) to choose. Each zone contains a free list of unused segments. If
there are no zones with free segments, a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
6c4a54fb0b tests: add tests for utils::dynamic_bitset
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2fb14a10b6 utils: add dynamic_bitset
A dynamic bitset implementation that provides functions to search for
both set and cleared bits in both directions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
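A minimal sketch of the data structure described above (a stand-in for utils::dynamic_bitset, not its real interface): a word array with set/clear/test plus a "find first set bit at or after position" query, which is what a segment free-map needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal dynamic bitset: one uint64_t word per 64 bits, with a forward
// search for set bits that skips empty words a whole word at a time.
class dynamic_bitset {
    std::vector<uint64_t> _words;
    size_t _size;
public:
    static constexpr size_t npos = size_t(-1);
    explicit dynamic_bitset(size_t n) : _words((n + 63) / 64), _size(n) {}
    void set(size_t i)   { _words[i / 64] |=  (uint64_t(1) << (i % 64)); }
    void clear(size_t i) { _words[i / 64] &= ~(uint64_t(1) << (i % 64)); }
    bool test(size_t i) const { return (_words[i / 64] >> (i % 64)) & 1; }
    // First set bit at position >= from, or npos if none.
    size_t find_next_set(size_t from) const {
        for (size_t i = from; i < _size; ) {
            uint64_t w = _words[i / 64] >> (i % 64);
            if (w) {
                return i + __builtin_ctzll(w);   // GCC/Clang builtin
            }
            i = (i / 64 + 1) * 64;   // jump to the next word boundary
        }
        return npos;
    }
};
```

Searching for cleared bits (or searching backwards) follows the same pattern with the word complemented (or scanned with a leading-zero count).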
Paweł Dziepak
40dda261f2 lsa: maintain segment to region mapping
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
c4e71bac7f tests/row_cache_alloc_stress: make sure that allocation fails
Currently the test case "Testing reading when memory can't be reclaimed."
assumes that the allocation section used by the row cache upon entering
will require more free memory than there is available (inc. evictable).
However, the reserves used by the allocation section are adjusted
dynamically and depend solely on previous events. In other words, there
is no guarantee that the reserve would be increased so much that the
allocation will fail.

The problem is solved by adding another allocation that is guaranteed
to be bigger than all evictable and free memory.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2e94086a2c lsa: use bi::list to implement segment_stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Tomasz Grabiec
6ead7a0ec5 Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git
Scattering of blobs from Avi:

This patchset converts the stack to scatter managed_bytes in lsa memory,
allowing large blobs (and collections) to be stored in memtable and cache.
Outside memtable/cache, they are still stored sequentially, but it is assumed
that the number of transient objects is bounded.

The approach taken here is to scatter managed_bytes data in multiple
blob_storage objects, but to linearize them back when accessing (for
example, to merge cells).  This allows simple access through the normal
bytes_view.  It causes an extra two copies, but copying a megabyte twice
is cheap compared to accessing a megabyte's worth of small cells, so
per-byte throughput is increased.

Testing shows that lsa large object space is kept at zero, but throughput
is bad because Scylla easily overwhelms the disk with large blobs; we'll
need Glauber's throttling patches or a really fast disk to see good
throughput with this.
2015-12-08 19:15:13 +01:00
Avi Kivity
5c5331d910 tests: test large blobs in memtables 2015-12-08 15:17:09 +02:00
Avi Kivity
0c2fba7e0b lsa: advertize our preferred maximum allocation size
Let managed_bytes know that allocating below a tenth of the segment size is
the right thing to do.
2015-12-08 15:17:09 +02:00
Avi Kivity
f9e2a9a086 mutation_partition: work on linearized atomic_cell_or_mutation objects
Ensure that when we examine atomic_cell_or_mutation objects for merging,
that they are contiguous in memory.  When we are done we scatter them again.
2015-12-08 15:17:09 +02:00
Avi Kivity
ad975ad629 atomic_cell_or_collection: linearize(), unlinearize()
Add linearize() and unlinearize() methods that allow making an
atomic_cell_or_collection object temporarily contiguous, so we can examine
it as a bytes_view.
2015-12-08 15:17:09 +02:00
Avi Kivity
13324607e6 managed_bytes: conform to allocation_strategy's max_preferred_allocation_size
Instead of allocating a single blob_storage, chain multiple blob_storage
objects in a list, each limited not to exceed the allocation_strategy's
max_preferred_allocation_size.  This allows lsa to allocate each blob_storage
object as an lsa managed object that can be migrated in memory.

Also provide linearize()/scatter() methods that can be used to temporarily
consolidate the storage into a single blob_storage.  This makes the data
contiguous, so we can use a regular bytes_view to examine it.
2015-12-08 15:17:08 +02:00
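The chunking-plus-linearize idea in the patches above can be sketched with toy types (fragmented_bytes is a hypothetical illustration, not Scylla's managed_bytes/blob_storage): a large value is stored as a chain of bounded fragments, and linearize() copies them back into one contiguous buffer when a flat view is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the preferred allocation size limit (the real code uses
// a tenth of an LSA segment).
constexpr size_t max_fragment = 1024;

struct fragmented_bytes {
    std::vector<std::string> fragments;

    // Scatter a flat value into fragments no bigger than max_fragment,
    // so each piece stays individually movable by the allocator.
    static fragmented_bytes from_flat(const std::string& data) {
        fragmented_bytes fb;
        for (size_t off = 0; off < data.size(); off += max_fragment) {
            fb.fragments.push_back(data.substr(off, max_fragment));
        }
        return fb;
    }

    // Copy fragments back into one contiguous buffer. This costs extra
    // copies, but lets callers examine the data through an ordinary flat
    // view (e.g. for cell merging).
    std::string linearize() const {
        std::string flat;
        for (const auto& f : fragments) {
            flat += f;
        }
        return flat;
    }
};
```

As the merge message notes, copying a megabyte twice is cheap relative to the per-cell overhead it avoids, which is why linearize-on-access is an acceptable trade.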
Takuya ASADA
8c98e239d0 dist: use /etc/scylla as SCYLLA_CONF directory on AMI
We don't need to copy /var/lib/scylla/conf onto the RAID anymore; it moved to /etc/scylla.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-08 11:09:12 +02:00
Avi Kivity
098136f4ab Merge "Convert serialization of query::result to use db::serializer<>" from Tomasz
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-07 16:53:34 +02:00
Amnon Heiman
3ce7fa181c API: Add the implementation for index_summary_off_heap_memory
This adds the implementation for the index_summary_off_heap_memory for a
single column family and for all of them.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 15:15:39 +02:00
Amnon Heiman
e786f1d02f sstable: Add get_summary function
The get_summary method returns a const reference to the summary object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Amnon Heiman
bae286a5b4 Add memory_footprint method to summary_ka
Similar to origin's off-heap memory accounting, memory_footprint is the
size of the queues multiplied by the structure size.

memory_footprint is used by the API to report the memory that is taken
by the summary.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Amnon Heiman
2086c651ba column_family: get_snapshot_details should return empty map for no snapshots
If there is no snapshot directory for the specific column family,
get_snapshot_details should return an empty map.

This patch checks that the directory exists before trying to iterate
over it.

Fixes #619

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 12:51:04 +01:00
Tomasz Grabiec
b43b5af894 Merge tag 'tgrabiec/make-future-values-nothrow-move-constructible-v3' from seastar-dev.git
Seastar's future<> now requires types to be nothrow move
constructible. This series makes Scylla code comply.
2015-12-07 10:43:18 +01:00
Tomasz Grabiec
95f515a6bd Move seastar submodule head
Scylla changes:
  sstable.cc: Remove file_exists() function which conflicts with seastar's

Amnon Heiman (2):
      reactor: Add file_exists method
      Add a wrapper for file_exists

Avi Kivity (2):
      Merge "Introduce shared_future" from Tomasz
      Merge ""scripts: a few fixes in posix_net_conf.sh" from Vlad

Gleb Natapov (3):
      rpc: not stop client in error state
      avoid allocation in parallel_for_each is there is nothing to do
      memory: fix size_to_idx calculation

Nadav Har'El (1):
      test: fix use-after-free in timertest

Paweł Dziepak (1):
      memory: use size instead of old_size to shrink memory block

Tomasz Grabiec (7):
      file: Mark move constructor as noexcept
      core: future: Add static asserts about type's noexcept guarantees
      core: future: Drop now redundant move_noexcept flag
      core: future_state: Make state getters non-destructive for non-rvalue-refs
      core: future: Make get_available_state() noexcept
      core: Introduce shared_future
      Make json_return_type movable

Vlad Zolotarov (8):
      scripts: posix_net_conf.sh: ban NIC IRQs from being moved by irqbalance
      scripts: posix_net_conf.sh: exclude CPU0 siblings from RPS
      scripts: posix_net_conf.sh: Configure XPS
      scripts: posix_net_conf.sh: Add a new mode for MQ NICs
      scripts: posix_net_conf.sh: increase some backlog sizes
      core: to_sstring(): cleanup
      core: to_sstring_strintf(): always use %g(or %lg) format for floating point values
      core: prevent explicit calls for to_sstring_sprintf()
2015-12-07 10:41:39 +01:00
Glauber Costa
79e70568d7 scylla-setup: do not add discard to the command line
In a recent discussion with the XFS developers, Dave Chinner recommended
that we *not* use discard, but rather issue fstrims explicitly. In machines
like Amazon's c3-class, the situation is made worse by the fact that discard
is not supported by the disk. Contrary to my intuition, adding the discard
mount option in such a situation is *not* a no-op and will just create load
for no reason.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-07 11:22:27 +02:00
Tomasz Grabiec
934d3f06d1 api: Make histogram reduction work on domain value instead of json objects
Objects extending json_base are not movable, so we won't be able to
pass them via future<>, which will assert that types are nothrow move
constructible.

This problem only affects httpd::utils_json::histogram, which is used
in map-reduce. This patch changes the aggregation to work on domain
value (utils::ihistogram) instead of json objects.
2015-12-07 09:50:28 +01:00
Tomasz Grabiec
c0ac7b3a73 commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable
future<> will require nothrow move constructible types.
2015-12-07 09:50:28 +01:00
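The wrapping trick above can be demonstrated in isolation (subscription_like is a made-up stand-in for the commitlog subscription type): a type whose move constructor is not noexcept fails future<>'s nothrow-move requirement, but a unique_ptr to it is always nothrow-movable because moving the handle only transfers a pointer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// A type whose move constructor might throw (e.g. it allocates).
struct subscription_like {
    subscription_like() = default;
    subscription_like(subscription_like&&) { /* may allocate, may throw */ }
};

// The object itself does not satisfy the nothrow-move requirement...
static_assert(!std::is_nothrow_move_constructible<subscription_like>::value,
              "moving the object itself may throw");
// ...but the unique_ptr handle does: moving it is just a pointer swap.
static_assert(std::is_nothrow_move_constructible<std::unique_ptr<subscription_like>>::value,
              "moving the handle cannot throw");
```

This is why boxing a throwing-move type in unique_ptr (at the cost of one heap allocation) is enough to carry it through future<>.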
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
fdc28a73f8 thrift: Make with_cob() handle not nothrow move constructible types 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
538de7222a Introduce noexcept_traits 2015-12-07 09:50:27 +01:00
Tomasz Grabiec
bc23ebcbc3 schema_tables: Replace schema_result::value_type with equivalent movable type
future<> requires and will assert nothrow move constructible types.
2015-12-07 09:50:27 +01:00
Avi Kivity
91c2af2803 Merge "nodetool removenode fix + cleanup" from Asias 2015-12-07 10:41:51 +02:00
Takuya ASADA
2891291ad1 dist: add swagger-ui and api-doc on ubuntu package
Fixes .deb part of #520

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-07 10:39:59 +02:00
Takuya ASADA
3f0ca277e5 dist: add swagger-ui and api-doc on rpm package
Fixes .rpm part of #520

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-07 10:39:59 +02:00
Avi Kivity
2437fc956c allocation_strategy: expose preferred allocation size limit
Our premier allocation_strategy, lsa, prefers to limit allocations below
a tenth of the segment size so they can be moved around; larger allocations
are pinned and can cause memory fragmentation.

Provide an API so that objects can query for this preferred size limit.

For now, lsa is not updated to expose its own limit; this will be done
after the full stack is updated to make use of the limit, or intermediate
steps will not work correctly.
2015-12-06 16:23:42 +02:00
Vlad Zolotarov
564cb2bcd1 gms::versioned_value: don't use to_sstring_sprintf() directly
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-06 12:24:54 +02:00
Raphael S. Carvalho
d435ca7da6 enable more logging for leveled compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-06 11:36:50 +02:00
Pekka Enberg
a95a7294ef types: Fix 'varint' type value compatibility check
Fixes #575.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-04 13:25:34 +01:00
Glauber Costa
5e8249f062 commitlog: fix bug preventing flushing with default max_size value
The config file expresses this number in MB, while total_memory() gives us
a quantity in bytes. This causes the commitlog not to flush until we reach
really sky-high numbers.

While we need this fix for the short term before we cook another release,
I will note that for the mid/long term, it would be really helpful to stop
representing memory amounts as integers, and use an explicit C++ type for
those. That would have prevented this bug.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-12-04 09:29:19 +02:00
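The "explicit C++ type for memory amounts" idea the commit message suggests can be sketched like this (bytes_amount and should_flush are hypothetical, not the actual fix, which simply converted units): a strong type makes mixing a megabyte-denominated config value with a byte count a compile error instead of a silent factor-of-2^20 bug.

```cpp
#include <cassert>
#include <cstdint>

// Strong type for memory amounts: raw integers can't be passed where a
// bytes_amount is expected, so MB-vs-bytes confusion cannot compile.
struct bytes_amount {
    uint64_t value;   // always bytes
    static bytes_amount from_mb(uint64_t mb) { return {mb << 20}; }
};

inline bool operator<(bytes_amount a, bytes_amount b) { return a.value < b.value; }

// Flush when used memory exceeds the configured maximum.
inline bool should_flush(bytes_amount used, bytes_amount max_size) {
    return max_size < used;
}
```

With the buggy code, a config value of 64 (meaning 64MB) was compared against a byte count, so the threshold was effectively never reached; the strong type forces the `from_mb` conversion at the boundary.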
Vlad Zolotarov
cd215fc552 types: map::to_string() - non-empty implementation
Print a map in the form of [(]{ key0 : value0 }[, { keyN : valueN }]*[)]
The map is printed inside () brackets if it's frozen.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-03 18:46:12 +01:00
Amnon Heiman
54b4f26cb0 API: Change the compaction summary to use an object
In origin, there are two APIs to get the information about the current
running compactions. Both APIs do the string formatting.

This patch changes the API to have a single API get_compaction that
would return a list of summary object.

The jmx would do the string formatting for the two APIs.

This change gives a better API experience, as it's better documented and
would make it easier to support future format changes in origin.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-03 11:57:37 +02:00
Asias He
dcb9b441ab storage_service: Fix debug build
Starting a non-seed node with a debug build, I saw:

==9844==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack top: 0x7ffdabd73000; bottom 0x7fe309218000; size: 0x001aa2b5b000 (114398965760)
False positive error reports may follow
For details see http://code.google.com/p/address-sanitizer/issues/detail?id=189
DEBUG [shard 0] storage_service - Starting shadow gossip round to check for endpoint collision
=================================================================
==9844==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe309219ad0 at pc 0x00000495a88e bp 0x7fe309219960 sp 0x7fe309219950
WRITE of size 8 at 0x7fe309219ad0 thread T0
    #0 0x495a88d in _Head_base<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*>
       /usr/include/c++/5.1.1/tuple:115
    #1 0x495a993 in _Tuple_impl<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*,
       std::default_delete<seastar::async(Func&&, Args&& ...) [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>;
       Args = {}; futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work>, void>
       /usr/include/c++/5.1.1/tuple:213
    #2 0x495aa73 in tuple<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work*,
       std::default_delete<seastar::async(Func&&, Args&& ...)
       [with Func = service::storage_service::check_for_endpoint_collision()::<lambda()>; Args = {};
       futurize_t<typename std::result_of<typename std::decay<_Tp>::type(std::decay_t<Args>...)>::type> = future<>]::work>, void>
       /usr/include/c++/5.1.1/tuple:613
    #3 0x495ab82 in unique_ptr /usr/include/c++/5.1.1/bits/unique_ptr.h:206
    ...
    #16 0x4d44c8e in _M_invoke /usr/include/c++/5.1.1/functional:1871
    #17 0x5d2fb7 in std::function<void ()>::operator()() const /usr/include/c++/5.1.1/functional:2271
    #18 0x8a1e70 in seastar::thread_context::main() core/thread.cc:139
    #19 0x8a1d89 in seastar::thread_context::s_main(unsigned int, unsigned int) core/thread.cc:130
    #20 0x7fe311b6cf0f  (/lib64/libc.so.6+0x48f0f)

I'm not sure why this patch helps. Perhaps the exception makes ASAN unhappy.

Anyway, this patch makes the debug build work again.

Fixes #613.
2015-12-03 10:42:11 +02:00
Tomasz Grabiec
d64db98943 query: Convert serialization of query::result to use db::serializer<>
That's what we're trying to standardize on.

This patch also fixes an issue with current query::result::serialize()
not being const-qualified, because it modifies the
buffer. messaging_service did a const cast to work this around, which
is not safe.
2015-12-03 09:19:11 +01:00
Tomasz Grabiec
d4d3a5b620 bytes_ostream: Make size_type and value_type public 2015-12-03 09:19:11 +01:00
Tomasz Grabiec
96d215168e Merge tag 'asias/gossip_start_stop/fix/v1' from seastar-dev.git
Fixes for issues in tests from Asias.
2015-12-03 09:10:55 +01:00
Tomasz Grabiec
f0cfa61968 Relax header dependencies 2015-12-03 09:10:02 +01:00
Tomasz Grabiec
9e0c498425 Merge branch 'dev/amnon/latency_clock_v2'
From Amnon:

After this series an example run of cfhistograms report a maximal 0.5s latency
as it should
2015-12-02 19:58:43 +01:00
Amnon Heiman
1812fe9e70 API: Add the get_version to messaging_service swagger definition file
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-02 14:45:44 +02:00
Amnon Heiman
ae53604ed7 API: Add the get_version implementation to messaging service
This patch adds the implementation to the get_version.
After this patch the following url will be available:
messaging_service/version?addr=127.0.0.1

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-02 13:29:40 +02:00
Avi Kivity
53e3e79349 Merge "API: Stubing the compaction manager" from Amnon
"This series allows the compaction manager to be used by the nodetool as a stub implementation.

It has two changes:
* Add to the compaction manager API a method that returns a compaction info
object

* Stub all the compaction methods so that they will produce an unimplemented
warning but will not fail; the API implementation will be reverted when the
work on compaction is completed."
2015-12-02 13:28:34 +02:00
Takuya ASADA
871bfb1c94 dist: generate correct distribution codename on debian/changelog
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 12:38:52 +02:00
Takuya ASADA
b61ea247d2 dist: check supported Ubuntu release
Warn if unsupported release.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 12:38:52 +02:00
Takuya ASADA
0c66c25250 dist: fix typo on scylla_prepare
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-02 11:30:15 +02:00
Asias He
3004866f59 gossip: Rename start to start_gossiping
So that we have more consistent names, start_gossiping() and
stop_gossiping(), and it will not be confused with get_gossiper.start().
2015-12-02 16:50:34 +08:00
Asias He
5c3951b28a gossip: Get rid of the handler helper 2015-12-02 16:50:34 +08:00
Asias He
7a6ad7aec2 gossip: Fix Assertion `local_is_initialized()' failed
This patch fixes the following cql_query_test failure.

   cql_query_test: scylla/seastar/core/sharded.hh:439:
   Service& seastar::sharded<Service>::local() [with Service =
   gms::gossiper]: Assertion `local_is_initialized()' failed.

The problem is that in gossiper::stop() we call gossip::add_local_application_state(),
which will in turn call gms::get_local_gossiper(). In seastar::sharded::stop

 _instances[engine().cpu_id()].service = nullptr;
 return inst->stop().then([this, inst] {
     return _instances[engine().cpu_id()].freed.get_future();
 });

We set the _instances to nullptr before we call the stop method, so
local_is_initialized asserts when we try to access get_local_gossiper
again.

To fix, we make the stopping of gossiper explicit. In the shutdown
procedure, we call stop_gossiping() explicitly.

This has two more advantages:

1) The API to stop gossip now calls stop_gossiping() instead of
reusing the seastar::sharded stop method.

2) We can now get rid of the _handler seastar::sharded helper.
2015-12-02 16:50:34 +08:00
Asias He
e22972009b gossip: Make log message in mark_dead stick to cassandra 2015-12-02 14:21:26 +08:00
Asias He
ad30cf0faf failure_detector: Use a standalone logger name
Do not share logger with gossip. Sometimes, it is useful to only see one
of them.
2015-12-02 14:21:26 +08:00
Asias He
eb05dc680d storage_service: Warn about data loss when removing a node
If RF = 1 and one node is down, it is possible that data is lost. Warn
about this in the log.
2015-12-02 14:21:26 +08:00
Asias He
0fe14e2b4b storage_service: Do not ignore future in remove_node 2015-12-02 14:21:26 +08:00
Asias He
5a7f15ba49 storage_service: Run confirm_replication on cpu zero
storage_service::_replicating_nodes is valid on cpu zero only.
2015-12-02 14:21:26 +08:00
Amnon Heiman
7e79d35f85 Estimated histogram: Clean the add interface
The add interface of the estimated histogram is confusing because it is
not clear what units are used.

This patch removes the general add method and replaces it with an
add_nano method that takes nanoseconds, and an add method that takes a
duration.

To be compatible with origin, nanosecond values are translated to
microseconds.
2015-12-01 15:28:06 +02:00
Amnon Heiman
61abc85eb3 histogram: Add started counter
This patch adds a started counter, which is used to mark the number of
operations that were started.

This counter serves two purposes: it is a better indication of when to
sample the data, and it is used to indicate how many operations are
pending.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
Amnon Heiman
88dcf2e935 latency: Switch to steady_clock
The system clock is less suitable than steady_clock for measuring
time differences.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
Avi Kivity
f667e05e08 Merge "backport gossip and storage_service fix" from Asias
"This contains most bug fixes from imported version commit
38847a6bd967e4f41bc7b1fc83629161a2c214dc to c* 2.1.11 for gossip and
storage_service."
2015-12-01 14:42:41 +02:00
Asias He
dc6f2157e7 Update ORIGIN for gossip and storage_service 2015-12-01 19:45:04 +08:00
Amnon Heiman
3674ee2fc1 API: get snapshot size
This patch adds the column family API that returns the snapshot size.
The changes in the swagger definition file follow origin, so the same API
will be used for both the metric and the column_family.

The implementation is based on the get_snapshot_details in the
column_family.

This fixes #425.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 11:41:52 +02:00
Asias He
56df32ba56 gossip: Mark node as dead even if already left
Backport: CASSANDRA-10205

484e645 Mark node as dead even if already left
2015-12-01 17:29:25 +08:00
Asias He
59694a8e43 failure_detector: Print versions for gossip states in gossipinfo
Backport: CASSANDRA-10330

ae4cd69 Print versions for gossip states in gossipinfo

For instance, the version for each state, which can be useful for
diagnosing the reason for any missing states. Also instead of just
omitting the TOKENS state, let's indicate whether the state was actually
present or not.
2015-12-01 17:29:25 +08:00
Asias He
af91a8f31b storage_service: Fix transition from write survey to normal mode
Backport: CASSANDRA-9740

52dbc3f Can't transition from write survey to normal mode
2015-12-01 17:29:25 +08:00
Asias He
2f071d9648 storage_service: Refuse to decommission if not in state NORMAL
Backport: CASSANDRA-8741

5bc56c3 refuse to decomission if not in state NORMAL
2015-12-01 17:29:25 +08:00
Asias He
224db2ba37 failure_detector: Don't mark nodes down before the max local pause interval once paused
Backport: CASSANDRA-9446

7fba3d2 Don't mark nodes down before the max local pause interval once paused
2015-12-01 17:29:25 +08:00
Asias He
51fcc48700 failure_detector: Failure detector detects and ignores local pauses
Backport: CASSANDRA-9183

4012134 Failure detector detects and ignores local pauses
2015-12-01 17:29:25 +08:00
Asias He
1b9e350614 gossip: Do not print node is now part of the cluster during gossip shadow round
With

   Node 1 (Seed node, Port 7000 is opened, 10.184.9.144)
   Node 2 (Port 7000 is opened, 10.184.9.145)
   Node 3 (Port 7000 is blocked by firewall)

On Node 3, we saw the following error, which was very confusing: Node 3
saw Node 1 and Node 2 but complained it could not contact any seeds.

The message "Node 10.184.9.144 is now part of the cluster" and friends
are actually messages printed during the gossip shadow round where Node
3 connects to Node 1's port 7000 and Node 1 returns all info it knows to
Node 3, so that Node 3 knows Node 1 and Node 2 and we see the "Node
10.184.9.144/145 is now part of the cluster" message.

However, during the normal gossip round, Node 3 will not mark Node 1 and
Node 2 UP until the Seed node initiates a gossip round to Node 3, (note
port 7000 on node 3 is blocked in this case). So Node 3 will not mark
Node 1 and Node 2 UP and we see the "Unable to contact any seeds" error.

[shard 0] storage_service - Loading persisted ring state
[shard 0] gossip - Node 10.184.9.144 is now part of the cluster
[shard 0] gossip - inet_address 10.184.9.144 is now UP
[shard 0] gossip - Node 10.184.9.145 is now part of the cluster
[shard 0] gossip - inet_address 10.184.9.145 is now UP
[shard 0] storage_service - Starting up server gossip
scylla_run[12479]: Start gossiper service ...
[shard 0] storage_service - JOINING: waiting for ring information
[shard 0] storage_service - JOINING: schema complete, ready to bootstrap
[shard 0] storage_service - JOINING: waiting for pending range calculation
[shard 0] storage_service - JOINING: calculation complete, ready to bootstrap
[shard 0] storage_service - JOINING: getting bootstrap token
[shard 0] storage_service - JOINING: sleeping 5000 ms for pending range setup
scylla_run[12479]: Exiting on unhandled exception of type 'std::runtime_error': Unable to contact any seeds!
2015-12-01 17:29:25 +08:00
Asias He
f62a6f234b gossip: Add shutdown gossip state
Backported: CASSANDRA-8336 and CASSANDRA-9871

84b2846 remove redundant state
b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client

Fixes:

When X is shut down, X sends a SHUTDOWN message to both Y and Z, but for
some reason, only Y receives the message and Z does not receive the
message. If Z has a higher gossip version for X than Y has for
X, Z will initiate a gossip with Y and Y will mark X alive again.

X ------> Y
 \      /
  \    /
    Z
2015-12-01 17:29:25 +08:00
Gleb Natapov
8c02ad0e9e messaging: log connection dropping event 2015-11-30 19:42:04 +02:00
Avi Kivity
b85f3ad130 Merge "Commit log replay - handle corrupted data silently, as non-fatal"
Fixes: #593

"Changes the parser/replayer to treat data corruption as non-fatal,
skipping as little as possible to get the most data out of a segment,
but keeping track of, and reporting, the amount corrupted.

Replayer handles this and reports any non-fatal errors on replay finish.
Also added tests for corruption cases.

This patch series contains a cleanup-patch for commitlog_tests that was
previously submitted, but got lost."
2015-11-30 19:13:31 +02:00
Gleb Natapov
5b9f3bff7d storage_proxy: simplify error handling by using this/handle_exception
It is cleaner to use this/handle_exception instead of then_wrapped if the
normal and error flows do not share any state.
2015-11-30 17:41:32 +02:00
Gleb Natapov
5484f25091 storage_proxy: remove unneeded continuation
make_ready_future() around when_all() is no longer needed. It was
added to catch mutate_locally() exceptions, but that is now handled at a
lower level.
2015-11-30 17:41:28 +02:00
Gleb Natapov
cf95c3f681 storage_proxy: introduce unique_response_handler object to prevent write request leaks
If something bad happens between write request handler creation and
request execution, the request handler has to be destroyed. Currently the
code tries to do that explicitly in all places where a request may be
abandoned, but it misses some (at least one). This patch replaces this
by introducing a unique_response_handler object that removes the handler
automatically if the request is not executed for some reason.
2015-11-30 17:41:27 +02:00
Gleb Natapov
d8afc6014e storage_proxy: catch exception thrown by mutate_locally in mutate verb handler
Also simplify error logging.
2015-11-30 17:41:25 +02:00
Avi Kivity
3c9ded27cc Update scylla-ami submodule
* ami/files/scylla-ami 3f37184...07b7118 (1):
  > Use /etc/scylla as SCYLLA_CONF directory
2015-11-30 16:39:49 +02:00
Takuya ASADA
616903de12 dist: use distribution version of antlr3, on Ubuntu 15.10
Rename antlr3-tool to antlr3 (same as distribution package), and use distribution version if it's available

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-30 16:37:36 +02:00
Pekka Enberg
0e8f80b5ee Merge "Relax bootstrapping/leaving/moving nodes check" from Asias 2015-11-30 11:53:07 +02:00
Asias He
2022117234 failure_detector: Enable phi_convict_threshold option
Adjusts the sensitivity of the failure detector on an exponential scale.

Use as:

$ scylla --phi-convict-threshold 9

Default to 8.
2015-11-30 11:09:36 +02:00
Asias He
db70643fe3 failure_detector: Print application_state properly 2015-11-30 11:08:40 +02:00
Asias He
aaca88a1e7 token_metadata: Add print_pending_ranges for debug print
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-30 11:07:42 +02:00
Avi Kivity
2c59e2f81f Merge "Fix race between population and update in row cache" from Tomasz
"Before this change, populations could race with an update from a flushed
memtable, which might result in the cache being populated with older
data. Populations started before the flush do not consider the
memtable or its sstable.

The fix employed here is to make update wait for populations which
were started before the flushed memtable's sstable was added to the
underlying data source. All populations started after that are
guaranteed to see the new data. The update() call will wait only for
currently populating reads to complete; it will not wait for readers to
be advanced by the consumer, for instance."
2015-11-30 11:06:23 +02:00
Tomasz Grabiec
8d88ece896 schema_tables: Fix "comment" property not being loaded from storage 2015-11-30 10:57:36 +02:00
Pekka Enberg
a64fa3db03 Merge "range_streamer fix and cleanup" from Asias
Do not use hard-coded values for is_replacing and rangemovement.
2015-11-30 10:47:06 +02:00
Asias He
879a4ad4d3 storage_service: Update pending ranges immediately after update of normal tokens
To avoid a race where natural endpoint was updated to contain node A,
but A was not yet removed from pending endpoints.

This fixes the root cause of commit d9d8f87c1 (storage_proxy: filter out
natural endpoints from pending endpoint). This patch alone fixes #539,
but we still want commit d9d8f87c1 to be safe.
2015-11-30 10:20:59 +02:00
Asias He
0af7fb5509 range_streamer: Kill FIXME in use_strict_consistency for consistent_rangemovement 2015-11-30 09:15:42 +08:00
Asias He
f80e3d7859 range_streamer: Simplify multiple_map to map conversion in add_ranges 2015-11-30 09:15:42 +08:00
Asias He
21882f5122 range_streamer: Kill one leftover comment 2015-11-30 09:15:42 +08:00
Asias He
6b258f1247 range_streamer: Kill FIXME for is_replacing 2015-11-30 09:15:42 +08:00
Asias He
aa2b11f21b database: Move is_replacing and get_replace_address to database class
So they can be used outside storage_service.
2015-11-30 09:15:42 +08:00
Asias He
80d1d4d161 storage_service: Relax bootstrapping/leaving/moving nodes check in check_for_endpoint_collision
When other bootstrapping/leaving/moving nodes are found during
bootstrap, instead of throwing immediately, sleep and try again for one
minute, hoping other nodes will finish the operation soon.

Since we are retrying the shadow gossip round more than once, we need
to put the gossip state back to shadow round after each round, to
make the shadow round work correctly.

This is useful when starting an empty cluster for testing. E.g,

   $ scylla --listen-address 127.0.0.1
   $ sleep 3
   $ scylla --listen-address 127.0.0.2
   $ sleep 3
   $ scylla --listen-address 127.0.0.3

Without this patch, node 3 will hit the check.

   TIME  STATUS
   -----------------------
   Node  1:
   32:00 Starts
   32:00 In NORMAL status

   Node  2:
   32:03 Starts
   32:04 In BOOT status
   32:10 In NORMAL status

   Node  3:
   32:06 Starts
   32:06 Found node 2 in BOOT status, hit the check, sleep and try again
   32:11 Found node 2 in NORMAL status, can keep going now
   32:12 In BOOT status
   32:18 In NORMAL status
2015-11-30 09:07:57 +08:00
Asias He
8b19373536 storage_service: Relax bootstrapping/leaving/moving nodes check in join_token_ring
When other bootstrapping/leaving/moving nodes are found during
bootstrap, instead of throwing immediately, sleep and try again for one
minute, hoping other nodes will finish the operation soon.

This is useful when starting an empty cluster for testing. E.g,

   $ scylla --listen-address 127.0.0.1
   $ scylla --listen-address 127.0.0.2
   $ scylla --listen-address 127.0.0.3

Without this patch, node 3 will hit the check.

   TIME  STATUS
   -----------------------
   Node  1:
   25:19 Starts
   25:20 In NORMAL status

   Node  2:
   25:19 Starts
   25:23 In BOOT status
   25:28 In NORMAL status

   Node  3:
   25:19 Starts
   25:24 Found node 2 in BOOT status, hit the check, sleep and try again
   25:29 Found node 2 in NORMAL status, can keep going now
   25:29 In BOOT status
   25:34 In NORMAL status
2015-11-30 09:07:57 +08:00
Tomasz Grabiec
df46542832 tests: Add test for populate and update race 2015-11-29 16:25:22 +01:00
Tomasz Grabiec
6f69d4b700 tests: Avoid potential use after free on partition range 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
de75f3fa69 row_cache: Add default value for partition range in make_reader() 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
ab328ead3d mutation: Introduce ring_position() 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
32ac2ccc4a memtable: Introduce apply(memtable&) 2015-11-29 16:25:21 +01:00
Tomasz Grabiec
7c3e6c306b row_cache: Wait for in-flight populations on update
Before this change, populations could race with an update from a flushed
memtable, which might result in the cache being populated with older
data. Populations started before the flush do not consider the
memtable or its sstable.

The fix employed here is to make update wait for populations which
were started before the flushed memtable's sstable was added to the
underlying data source. All populations started after that are
guaranteed to see the new data.
2015-11-29 16:25:21 +01:00
Tomasz Grabiec
a3e3add28a utils: Introduce phased_barrier
Utility for waiting on a group of async actions started before a certain
point in time.
2015-11-29 16:25:21 +01:00
Pekka Enberg
a26ffefd53 transport/server: Remove CQL text type from encoding
The text data type is no longer present in CQL binary protocol v3 and
later. We don't need it for encoding earlier versions either because
it's an alias for varchar which is present in all CQL binary protocol
versions.

Fixes #526.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-27 09:13:56 +01:00
Pekka Enberg
2599b78583 Merge "CQL notification + storage_service fix" from Asias
"pushed_notifications_test.py dtest is now passing"
2015-11-27 10:09:56 +02:00
Asias He
da0e80a286 storage_service: Fix failed bootstrap/replace attempts being persisted in system.peers
Backported from:

ac46747 Fix failed bootstrap/replace attempts being persisted in system.peers (CASSANDRA-9180)
2015-11-27 15:31:56 +08:00
Asias He
36b2de10ed failure_detector: Improve FD logging when the arrival time is ignored
Backport from:

eb9c5bb Improve FD logging when the arrival time is ignored.
2015-11-27 15:31:56 +08:00
Asias He
ed9cd23a2d transport: Fix duplicate up/down messages sent to native clients
This patch plus pekka's previous commit 3c72ea9f96

   "gms: Fix gossiper::handle_major_state_change() restart logic"

fix CASSANDRA-7816.

Backported from:

   def4835 Add missing follow on fix for 7816 only applied to
           cassandra-2.1 branch in 763130bdbde2f4cec2e8973bcd5203caf51cc89f
   763130b Followup commit for 7816
   2199a87 Fix duplicate up/down messages sent to native clients

Tested by:
   pushed_notifications_test.py:TestPushedNotifications.restart_node_test
2015-11-27 15:31:56 +08:00
Asias He
25bb889c2a transport: Fix wrong message for UP and DOWN event 2015-11-27 15:31:56 +08:00
Asias He
ca8c4f3e77 storage_service: Fix MOVED_NODE client event (CASSANDRA-8516)
Backport from:

b296c55f956c6ef07c8330dc28ef8c351e5bcfe2 (Fix MOVED_NODE client event)

Fixes:

DISABLE_VNODES=true nosetests
pushed_notifications_test.py:TestPushedNotifications.move_single_node_test
2015-11-27 15:31:56 +08:00
Gleb Natapov
ad358300a9 cql server: remove connection from notifiers earlier
Remove connection from notifiers lists just before closing it to prevent
attempts to send notification on already closed connection.
2015-11-26 18:50:08 +02:00
Pekka Enberg
569d288891 cql3: Add TRUNCATE TABLE alias for TRUNCATE
CQL 3.2.1 introduces a "TRUNCATE TABLE X" alias for "TRUNCATE X":

  4e3555c1d9

Fix our CQL grammar to also support that.

Please note that we don't bump up advertised CQL version yet because our
cqlsh clients won't be able to connect by default until we upgrade them
to C* 2.1.10 or later.

Fixes #576

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-11-26 18:45:50 +02:00
Gleb Natapov
96f40d535e cql server: add missing gate during connection access
cql connection access is protected by a gate, but event notifiers have
omitted taking it. Fix it.
2015-11-26 13:05:59 +02:00
Tomasz Grabiec
a7c11d1e30 db: Fix handling of missing column family
The FIXMEs are no longer valid, we load schema on bootstrap and don't
support hot-plugging of column families via file system (nor does
Cassandra).

Handling of missing tables matches Cassandra 2.1: log it and continue;
queries propagate the error.
2015-11-25 16:59:15 +02:00
Tomasz Grabiec
3a402db1be storage_proxy: Remove dead signature 2015-11-25 16:57:03 +02:00
Asias He
d03b452322 storage_service: Remove RPC client in on_dead
When gossip marks a node down, we should close all the RPC connections to
that node.
2015-11-25 16:30:14 +02:00
Gleb Natapov
d9d8f87c1b storage_proxy: filter out natural endpoints from pending endpoint
If a request comes after the natural endpoints were updated to contain
node A, but A was not yet removed from the pending endpoints, it will be
in both, and the write request logic cannot handle this properly. To fix
this, filter nodes which are already in the natural endpoints out of the
pending endpoints.

Fixes #539.
2015-11-25 16:28:55 +02:00
Pekka Enberg
cf7541020f Merge "Enable more config options" from Asias 2015-11-25 16:09:22 +02:00
Tomasz Grabiec
c3f03d5c96 Merge branch 'pdziepak/random-lsa-patches/v3' from seastar-dev.git
LSA fixes from Paweł.
2015-11-25 10:26:23 +01:00
Paweł Dziepak
89f7f746cb lsa: fix printing object_descriptor::_alignment
object_descriptor::_alignment is of type uint8_t which is actually an
unsigned char.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 20:13:29 +01:00
Paweł Dziepak
65875124b7 lsa: guarantee that segment_heap doesn't throw
boost::heap::binomial_heap allocates a helper object in push() and,
therefore, may throw an exception. This must not happen during
compaction.

The solution is to reserve space for this helper object in
segment_descriptor and use a custom allocator with
boost::heap::binomial_heap.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 19:51:22 +01:00
Paweł Dziepak
273b8daeeb lsa: add no-op default constructor for segment
Zero initialization of segment::data when segment is value initialized
is undesirable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:37:37 +01:00
Paweł Dziepak
e6cf3e915f lsa: add counters for memory used by large objects
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:27 +01:00
Paweł Dziepak
9396956955 scylla-gdb.py: show lsa statistics and regions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:20 +01:00
Paweł Dziepak
aaecf5424c scylla-gdb.py: show free, used and total memory
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:36:16 +01:00
Paweł Dziepak
6b113a9a7a lsa: fix eviction of large blobs
The LSA memory reclaimer logic assumes that the amount of memory used by
LSA equals segments_in_use * segment_size. However, LSA is also responsible
for the eviction of large objects, which do not affect the used segment
count; e.g. a region with no used segments may still use a lot of memory
for large objects. The solution is to switch from measuring memory in used
segments to a used-bytes count that also includes large objects.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-11-24 16:29:09 +01:00
Takuya ASADA
4a8c79ca0e dist: re-initialize RAID on ephemeral disk when stop/restart AMI instance
Since this doesn't check disk types, it may re-initialize the RAID on EBS when the first block was lost.
But in that condition, re-initializing the RAID is probably the only choice we can take, so this is fine.
Fixes #364.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-24 10:46:10 +02:00
Asias He
3a9200db03 config: Add more documentation for options
For consistent_rangemovement, join_ring, load_ring_state, etc.
2015-11-24 10:07:31 +08:00
Asias He
7ddf8963f5 config: Enable broadcast_rpc_address option
With this patch, start two nodes

node 1:
scylla --rpc-address 127.0.0.1 --broadcast-rpc-address 127.0.0.11

node 2:
scylla --rpc-address 127.0.0.2 --broadcast-rpc-address 127.0.0.12

On node 1:
cqlsh> SELECT rpc_address from system.peers;

 rpc_address
-------------
  127.0.0.12

which means clients should use this address to connect to node 2 for the
CQL and Thrift protocols.
2015-11-24 10:07:31 +08:00
Asias He
33ef58c5c9 utils: Add get_broadcast_rpc_address and set_broadcast_rpc_address helper 2015-11-24 10:07:31 +08:00
Asias He
1e55aa38c1 storage_service: Implement is_replacing 2015-11-24 10:07:29 +08:00
Asias He
644c226d58 config: Enable replace_address and replace_address_first_boot option
It is the same as

   -Dcassandra.replace_address
   -Dcassandra.replace_address_first_boot

in cassandra.
2015-11-24 10:07:24 +08:00
Asias He
bfe26ea208 config: Enable replace_token option
It is the same as

   -Dcassandra.replace_token

in cassandra.

Use it as:

   $ scylla --replace-token $token1,$token2,$token3
2015-11-24 10:07:20 +08:00
Asias He
730abbc421 config: Enable replace_node option
It is the same as

   -Dcassandra.replace_node

in cassandra.

Use it as:

   $ scylla --replace-node $node_uuid
2015-11-24 10:07:16 +08:00
Asias He
2513d6ddbe config: Enable load_ring_state option
It is the same as

   -Dcassandra.load_ring_state

in cassandra.

Use it as:

   $ scylla --load-ring-state 0

or

   $ scylla --load-ring-state 1
2015-11-24 10:07:12 +08:00
Asias He
6e72e78e0d config: Enable join_ring option
It is the same as

   -Dcassandra.join_ring

in cassandra.

Use it as:

   $ scylla --join-ring 0

or

   $ scylla --join-ring 1
2015-11-24 10:07:07 +08:00
Asias He
505b3e4936 config: Enable consistent_rangemovement option
It is the same as

  -Dcassandra.consistent.rangemovement

in cassandra.

Use it as:

  $ scylla --consistent-rangemovement 0

or

  $ scylla --consistent-rangemovement 1
2015-11-24 10:06:54 +08:00
Gleb Natapov
33e5097090 messaging: do not kill live connection needlessly
The messaging service closes the connection in the rpc call continuation
on closed_error, but that code runs for each outstanding rpc call on the
connection, so the first continuation may destroy a genuinely closed
connection; the connection is then reopened, and the next continuation,
which handles the previous error, kills a now perfectly healthy
connection. Fix this by closing the connection only in the error state.
2015-11-23 20:16:28 +02:00
Tomasz Grabiec
cb0b56f75f Merge tag 'empty/v3' from https://github.com/avikivity/scylla
From Avi:

Origin supports a notion of empty values for non-container types; these
are serialized as zero-length blobs.  They are mostly useless and only
retained for compatibility.

The implementation here introduces a wrapper maybe_empty<T>, similar to
optional<T> but oriented towards usually-nonempty usage with implicit
conversion.

There is more work needed for full empty support: fixing up deserializers to
create empty values instead of nulls, and splitting up data_value into
data_value and a data_value_nonnull for the cases that require it.

(I chose maybe_empty<> rather than using optional<data_value> for nullable
data_value both because it requires fewer changes, and because
optional<data_value> introduces a lot of control flow when moving or copying,
which would be mostly useless in most cases).
2015-11-23 16:12:06 +01:00
Calle Wilund
b1a0c4b451 commitlog_tests: Add segment corruption tests
Test entry and chunk corruption.
2015-11-23 15:43:33 +01:00
Calle Wilund
d65adef10c commitlog_tests: test cleanup
This cleanup patch got lost in git-space some time ago. It is however sorely
needed...

* Use cleaner wrapper for creating temp dir + commit log, avoiding
  having to clear and clean in every test, etc.
* Remove assertions based on file system checks, since these are not
  valid due to both the async nature of the CL, and more to the point,
  because of pre-allocation of files and file blocks. Use CL
  counters/methods instead
* Fix some race conditions to ensure tests are safe(r)
* Speed up some tests
2015-11-23 15:42:45 +01:00
Calle Wilund
262f44948d commitlog: Add get_flush_count method (for testing) 2015-11-23 15:42:45 +01:00
Calle Wilund
76b43fbf74 commitlog_replayer: Handle replay data errors as non-fatal
Distinguish between fatal and non-fatal exceptions, and handle data
corruption by adding it to stats and reporting it, but continue processing.

Note that "invalid_argument", i.e. attempting to replay origin/old
segments, is still considered fatal, as it is probably better to
signal this strongly to the user/admin.
2015-11-23 15:42:45 +01:00
Calle Wilund
2fe2320490 commitlog: Make reading segments with crc/data errors non-fatal
The parser object now attempts to skip past/terminate parsing on corrupted
entries/chunks (as detected by invalid sizes/CRCs). The amount of data
skipped is kept track of (as well as we can estimate it - pre-allocation
makes it tricky), and at the end of parsing/reporting, iff errors
occurred, an exception detailing the failures is thrown (since
subscription has little mechanism to deal with this otherwise).

Thus a caller can decide how to deal with data corruption, but will be
given as many entries as possible.
2015-11-23 15:42:45 +01:00
Avi Kivity
23895ac7f5 types: fix up confusion around empty serialized representation
An empty serialized representation means an empty value, not NULL.

Fix up the confusion by converting incorrect make_null() calls to a new
make_empty(), and removing make_null() in empty-capable types like
bytes_type.

Collections don't support empty serialized representations, so remove
the call there.
2015-11-22 12:20:24 +02:00
Tomasz Grabiec
ae9e0c3d41 storage_proxy: Avoid potential use after move on schema_ptr
Parameter evaluation order is unspecified, so it's possible that the
move of 'schema' into the lambda captures would happen before the
construction of the mutation.
2015-11-22 12:15:04 +02:00
Avi Kivity
0799251a9f Merge "optimize the sstable loading step of boot" from Raphael
"To speed up boot, parallelism was introduced to our code that loads
sstables from a column family, a function was implemented to read
the minimum from a sstable to determine whether it belongs to the
current shard, and buffer size in read simple is dynamically chosen
based on the size of the file and dma alignment.
The latter is important because filter file can be considerably
large when the respective sstable (data file) is very large.
Before this patchset, scylla took about 5 minutes to boot with a
data directory of 660GB. After this patchset, scylla took about 20
seconds to boot with the same data directory."
2015-11-22 11:27:34 +02:00
Asias He
23723991ed gossip: Fix STATUS field in nodetool gossipinfo
Before:
   === with c* cluster ===
   $ nodetool -p 7100 gossipinfo

   STATUS:NORMAL,-1139428872328849340

   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825;-9147358554612963965;5334343410266177046

After:
   === with scylla ===
   $ nodetool -p 7100 gossipinfo

   0:NORMAL,8251763528961471825

To align with c*, print one token in the STATUS field.

Refs #508.
2015-11-20 10:57:49 +02:00
Raphael S. Carvalho
a5842642fa sstables: change buf size in read_simple to 128k
Avi says:
"A small buffer size will hurt if we read a large file, but
a large buffer size won't hurt if we read a small file, since
we close it immediately."

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:35:25 -02:00
Raphael S. Carvalho
0f3ccc1143 db: optimize the sstable loading process
Currently, we only determine whether an sstable belongs to the current
shard after loading some of its components into memory. For example, the
filter may be considerably big, and its content is irrelevant to
deciding whether an sstable should be included in a given shard.
Start using the functions previously introduced to optimize the
sstable loading process. add_sstable no longer checks if an sstable
is relevant to the current shard.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:25 -02:00
Raphael S. Carvalho
0053394ec0 sstables: introduce mark_sstable_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:24 -02:00
Raphael S. Carvalho
0ce2b7bc8d db: introduce belongs_to_current_shard
Returns true if the key range belongs to the current shard,
false otherwise.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:21 -02:00
Raphael S. Carvalho
f06b72eb18 sstables: introduce function to return sstable key range
Provides a function that returns the sstable key range, reading only
the summary component.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:19 -02:00
Raphael S. Carvalho
966e8c7144 db: introduce parallelism to sstable loading
Boot may be slow because the function that loads sstables does so
serially instead of in parallel. In the callback supplied to
lister::scan_dir, let's push the future returned by probe_file
(the function that loads an sstable) into a vector of futures and wait
for all of them at the end.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-11-19 13:34:11 -02:00
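The pattern the commit describes — launch all loads up front, collect the futures, and wait for all of them at the end — can be sketched outside of seastar with std::async. `probe_file` here is a hypothetical stand-in for the real sstable loader, not the Scylla implementation:

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for probe_file(): "loads" one sstable and
// returns a value derived from it.
int probe_file(int id) {
    return id * 10;
}

// Launch every load, then wait for all of them at the end, instead of
// loading one file at a time.
int load_all(const std::vector<int>& ids) {
    std::vector<std::future<int>> futs;
    for (int id : ids) {
        futs.push_back(std::async(std::launch::async, probe_file, id));
    }
    int total = 0;
    for (auto& f : futs) {
        total += f.get();  // joins each future; the work ran in parallel
    }
    return total;
}
```

The same shape — a vector of futures drained at the end — is what the commit introduces around lister::scan_dir.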
Takuya ASADA
83c8b3e433 dist: support Ubuntu 15.10
We cannot share some dependency package names between 14.04 and 15.10, so we need to add ifdefs.
Not tested on other versions of Ubuntu.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-19 10:57:25 +02:00
Tomasz Grabiec
53e842aaf7 scylla-gdb.py: Fix scylla column_families command 2015-11-19 10:44:00 +02:00
Avi Kivity
0b91b643ba types: empty value support for non-container types
Origin supports (https://issues.apache.org/jira/browse/CASSANDRA-5648) "empty"
values even for non-container types such as int.  Use maybe_empty<> to
encapsulate abstract_type::native_type, adding an empty flag if needed.
2015-11-18 18:38:38 +02:00
Avi Kivity
7257f72fbf types: introduce maybe_empty<T> type alias
 - T for container types (that can naturally be empty)
 - emptyable<T> otherwise (adding that property artificially)
2015-11-18 15:25:24 +02:00
Avi Kivity
58d3a3e138 types: introduce emptyable<> template
Similar to optional<>, with the following differences:
 - decays back to the encapsulated type, with an emptiness check;
   this reflects the expectation that the value will rarely be empty
 - avoids conditionals during copy/move (and requires a default constructor),
   again with the same expectation.
2015-11-18 15:25:22 +02:00
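A minimal sketch of a wrapper with the semantics described above — decays back to the wrapped type, requires a default constructor, and avoids conditionals on copy/move. The names and layout are illustrative assumptions, not the actual Scylla emptyable<> implementation:

```cpp
#include <utility>

// Illustrative emptyable-like wrapper: default-constructed means
// "empty"; otherwise it holds a value and converts back to T.
template <typename T>
class emptyable {
    bool _empty = true;
    T _value{};  // requires T to be default-constructible
public:
    emptyable() = default;
    emptyable(T v) : _empty(false), _value(std::move(v)) {}
    bool empty() const { return _empty; }
    // Decays back to the encapsulated type; unlike optional<>, copy
    // and move are just plain member-wise operations.
    operator const T&() const { return _value; }
};
```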
Gleb Natapov
0870caaea1 cql transport: catch all exceptions
Not all exceptions inherit from std::exception
(std::nested_exception, for instance), so catch and log all of them.
2015-11-18 15:17:43 +02:00
Asias He
242e5ea291 streaming: Ignore remote no_such_column_family for stream_transfer_task
When we start sending mutations for cf_id to a remote node, the remote
node might not have the cf_id anymore, for instance because the cf was
dropped.

We should not fail the streaming if this happens; since the cf does not
exist anymore, there is no point streaming it.

Fixes #566
2015-11-18 15:12:23 +02:00
Asias He
3816e35d11 storage_service: Detect other bootstrapping/leaving/moving nodes during bootstrap 2015-11-18 15:11:56 +02:00
Avi Kivity
6390bc3121 README: instructions for contributing 2015-11-18 15:10:37 +02:00
Asias He
3b52033371 gossip: Favor newly added node in do_gossip_to_live_member
When a new node joins a cluster, it starts a gossip round with a seed
node. However, within this round, the seed node will not tell the new
node anything it knows about other nodes in the cluster, because the
digest in the gossip SYN message contains only the new node itself and
no other nodes. The seed node picks randomly from the live nodes,
including the newly added node, in do_gossip_to_live_member to start a
gossip round. If the new node is "lucky", the seed node will talk to it
very soon and tell it all the information it knows about the cluster,
so the new node will mark the seed node alive and consider it seen. If
there is a considerably large number of live nodes, it might take a
long time before the seed node picks the new node and talks to it.

In the bootstrap code, storage_service::bootstrap checks if we see any
nodes after a sleep of RING_DELAY milliseconds and throws "Unable to
contact any seeds!" if not, so the node will fail to bootstrap.

To help the seed node talk to the new node faster, we favor the new
node in do_gossip_to_live_member.
2015-11-18 15:00:37 +02:00
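The selection policy the commit describes can be sketched as: if any freshly joined nodes exist, gossip to one of them first; otherwise fall back to a random live node. The function and parameter names below are illustrative assumptions, not the actual gossiper code:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical sketch of "favor the new node" in the live-member
// gossip round: freshly joined nodes are preferred over the general
// live set, so a seed talks to a newcomer quickly.
std::string pick_gossip_target(const std::vector<std::string>& live,
                               const std::vector<std::string>& fresh) {
    if (!fresh.empty()) {
        // Favor a newly added node so it learns the cluster state soon.
        return fresh[std::rand() % fresh.size()];
    }
    if (live.empty()) {
        return "";
    }
    return live[std::rand() % live.size()];
}
```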
Amnon Heiman
374414ffd0 API: failure_detector modify the get_all_endpoint_states
In origin, get_all_endpoint_states performs all the information
formatting and returns a string.

This is not a good API approach; this patch replaces the implementation
so the API returns an array of values and the JMX does the
formatting.

This is a better API and would make it simpler in the future to stay in
sync with origin output.

This patch is part of #508

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-18 14:59:09 +02:00
Avi Kivity
17f6dc3671 Merge seastar upstream
* seastar 95ddb8e...84cb6df (2):
  > rpc: do not convert EOF into exception
  > reactor: remove debug output in command line option validation
2015-11-18 11:20:27 +02:00
Takuya ASADA
16cd5892f7 dist: setup rps on scylla_prepare, not on scylla_run
All preparation for running scylla should be done in scylla_prepare

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-18 11:20:17 +02:00
Asias He
269ea7f81b storage_service: Enable is_ready_for_bootstrap in join_token_ring
The goal is to make sure our schema matches with other nodes in the
cluster.
2015-11-18 10:46:40 +02:00
Asias He
bb1470f0d4 migration_manager: Introduce is_ready_for_bootstrap
This compares the local schema version with other nodes in the cluster.
Returns true if all of them match with each other.
2015-11-18 10:46:06 +02:00
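The check described above boils down to: collect the schema version reported by each node and confirm they all agree. A minimal sketch under that assumption (the function name is hypothetical, not the migration_manager API):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the readiness check: bootstrap may proceed
// only when every node reports the same schema version.
bool all_schema_versions_match(const std::vector<std::string>& versions) {
    if (versions.empty()) {
        return true;  // no peers to disagree with
    }
    for (const auto& v : versions) {
        if (v != versions.front()) {
            return false;
        }
    }
    return true;
}
```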
Avi Kivity
ba859acb3b big_decimal: add default constructor
Arithmetic types should have a default constructor, and anyway the
following patch wants it.
2015-11-18 10:36:03 +02:00
Takuya ASADA
f0a6c33b6d dist: use /var/lib/scylla instead of /data on ami
Fixes #551.
Change the mountpoint to /var/lib/scylla and copy conf/ onto it.
Note: we need to replace conf/ with a symlink to /etc/scylla when the new rpm is uploaded to the yum repository.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Signed-off-by: Pekka Enberg <penberg@iki.fi>
2015-11-18 10:10:48 +02:00
Amnon Heiman
27737d702b API: Stubing the compaction manager - workaround
Until the compaction manager API is ready, its failing command
causes problems with nodetool-related tests.
This patch stubs the compaction manager logic so it will not fail.

It will be replaced by an actual implementation when the equivalent
code in compaction is ready.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-11 16:58:13 +02:00
Amnon Heiman
66e428799f API: add compaction info object
This patch adds a compaction info object and an API that returns it.
It will be mapped to the JMX getCompactions that returns a map.

The use of an object is more RESTful and will be better documented in
the swagger definition file.
2015-11-11 16:57:44 +02:00
315 changed files with 16080 additions and 4463 deletions

3
.gitignore vendored

@@ -5,3 +5,6 @@ build
build.ninja
cscope.*
/debian/
dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh

76
ORIGIN

@@ -1 +1,77 @@
http://git-wip-us.apache.org/repos/asf/cassandra.git trunk (bf599fb5b062cbcc652da78b7d699e7a01b949ad)
import = bf599fb5b062cbcc652da78b7d699e7a01b949ad
Y = Already in scylla
$ git log --oneline import..cassandra-2.1.11 -- gms/
Y 484e645 Mark node as dead even if already left
d0c166f Add trampled commit back
ba5837e Merge branch 'cassandra-2.0' into cassandra-2.1
718e47f Forgot a damn c/r
a7282e4 Merge branch 'cassandra-2.0' into cassandra-2.1
Y ae4cd69 Print versions for gossip states in gossipinfo.
Y 7fba3d2 Don't mark nodes down before the max local pause interval once paused.
c2142e6 Merge branch 'cassandra-2.0' into cassandra-2.1
ba9a69e checkForEndpointCollision fails for legitimate collisions, finalized list of statuses and nits, CASSANDRA-9765
54470a2 checkForEndpointCollision fails for legitimate collisions, improved version after CR, CASSANDRA-9765
2c9b490 checkForEndpointCollision fails for legitimate collisions, CASSANDRA-9765
4c15970 Merge branch 'cassandra-2.0' into cassandra-2.1
ad8047a ArrivalWindow should use primitives
Y 4012134 Failure detector detects and ignores local pauses
9bcdd0f Merge branch 'cassandra-2.0' into cassandra-2.1
cefaa4e Close incoming connections when MessagingService is stopped
ea1beda Merge branch 'cassandra-2.0' into cassandra-2.1
08dbbd6 Ignore gossip SYNs after shutdown
3c17ac6 Merge branch 'cassandra-2.0' into cassandra-2.1
a64bc43 lists work better when you initialize them
543a899 change list to arraylist
730d4d4 Merge branch 'cassandra-2.0' into cassandra-2.1
e3e2de0 change list to arraylist
f7884c5 Merge branch 'cassandra-2.0' into cassandra-2.1
Y 84b2846 remove redundant state
4f2c372 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
Y def4835 Add missing follow on fix for 7816 only applied to cassandra-2.1 branch in 763130bdbde2f4cec2e8973bcd5203caf51cc89f
Y 763130b Followup commit for 7816
1376b8e Merge branch 'cassandra-2.0' into cassandra-2.1
Y 2199a87 Fix duplicate up/down messages sent to native clients
136042e Merge branch 'cassandra-2.0' into cassandra-2.1
Y eb9c5bb Improve FD logging when the arrival time is ignored.
$ git log --oneline import..cassandra-2.1.11 -- service/StorageService.java
92c5787 Keep StorageServiceMBean interface stable
6039d0e Fix DC and Rack in nodetool info
a2f0da0 Merge branch 'cassandra-2.0' into cassandra-2.1
c4de752 Follow-up to CASSANDRA-10238
e889ee4 2i key cache load fails
4b1d59e Merge branch 'cassandra-2.0' into cassandra-2.1
257cdaa Fix consolidating racks violating the RF contract
Y 27754c0 refuse to decomission if not in state NORMAL patch by Jan Karlsson and Stefania for CASSANDRA-8741
Y 5bc56c3 refuse to decomission if not in state NORMAL patch by Jan Karlsson and Stefania for CASSANDRA-8741
Y 8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client
c2142e6 Merge branch 'cassandra-2.0' into cassandra-2.1
54470a2 checkForEndpointCollision fails for legitimate collisions, improved version after CR, CASSANDRA-9765
1eccced Handle corrupt files on startup
2c9b490 checkForEndpointCollision fails for legitimate collisions, CASSANDRA-9765
c4b5260 Merge branch 'cassandra-2.0' into cassandra-2.1
Y 52dbc3f Can't transition from write survey to normal mode
9966419 Make rebuild only run one at a time
d693ca1 Merge branch 'cassandra-2.0' into cassandra-2.1
be9eff5 Add option to not validate atoms during scrub
2a4daaf followup fix for 8564
93478ab Wait for anticompaction to finish
9e9846e Fix for harmless exceptions being logged as ERROR
6d06f32 Fix anticompaction blocking ANTI_ENTROPY stage
4f2c372 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
Y cba1b68 Fix failed bootstrap/replace attempts being persisted in system.peers
f59df28 Allow takeColumnFamilySnapshot to take a list of tables patch by Sachin Jarin; reviewed by Nick Bailey for CASSANDRA-8348
Y ac46747 Fix failed bootstrap/replace attempts being persisted in system.peers
5abab57 Merge branch 'cassandra-2.0' into cassandra-2.1
0ff9c3c Allow reusing snapshot tags across different column families.
f9c57a5 Merge branch 'cassandra-2.0' into cassandra-2.1
Y b296c55 Fix MOVED_NODE client event
bbb3fc7 Merge branch 'cassandra-2.0' into cassandra-2.1
37eb2a0 Fix NPE in nodetool getendpoints with bad ks/cf
f8b43d4 Merge branch 'cassandra-2.0' into cassandra-2.1
e20810c Remove C* specific class from JMX API


@@ -15,13 +15,13 @@ git submodule update --recursive
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel
```
* Build Scylla
```
./configure.py --mode=release --with=scylla --disable-xen
ninja build/release/scylla -j2 # you can use more cpus if you have tons of RAM
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
```
@@ -82,3 +82,15 @@ Run the image with:
```
docker run -p $(hostname -i):9042:9042 -i -t <image name>
```
## Contributing to Scylla
Do not send pull requests.
Send patches to the mailing list address scylladb-dev@googlegroups.com.
Be sure to subscribe.
In order for your patches to be merged, you must sign the Contributor's
License Agreement, protecting your rights and ours. See
http://www.scylladb.com/opensource/cla/.


@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=development
VERSION=0.16
if test -f version
then


@@ -579,30 +579,6 @@
}
]
},
{
"path":"/column_family/sstables/snapshots_size/{name}",
"operations":[
{
"method":"GET",
"summary":"the size of SSTables in 'snapshots' subdirectory which aren't live anymore",
"type":"double",
"nickname":"true_snapshots_size",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/memtable_columns_count/{name}",
"operations":[
@@ -2041,7 +2017,7 @@
]
},
{
"path":"/column_family/metrics/true_snapshots_size/{name}",
"path":"/column_family/metrics/snapshots_size/{name}",
"operations":[
{
"method":"GET",


@@ -15,7 +15,7 @@
"summary":"get List of running compactions",
"type":"array",
"items":{
"type":"jsonmap"
"type":"summary"
},
"nickname":"get_compactions",
"produces":[
@@ -46,16 +46,16 @@
]
},
{
"path":"/compaction_manager/compaction_summary",
"path":"/compaction_manager/compaction_info",
"operations":[
{
"method":"GET",
"summary":"get compaction summary",
"summary":"get a list of all active compaction info",
"type":"array",
"items":{
"type":"string"
"type":"compaction_info"
},
"nickname":"get_compaction_summary",
"nickname":"get_compaction_info",
"produces":[
"application/json"
],
@@ -174,30 +174,73 @@
}
],
"models":{
"mapper":{
"id":"mapper",
"description":"A key value mapping",
"row_merged":{
"id":"row_merged",
"description":"A row merged information",
"properties":{
"key":{
"type":"string",
"description":"The key"
"type":"int",
"description":"The number of sstable"
},
"value":{
"type":"string",
"description":"The value"
"type":"long",
"description":"The number or row compacted"
}
}
},
"jsonmap":{
"id":"jsonmap",
"description":"A json representation of a map as a list of key value",
"compaction_info" :{
"id": "compaction_info",
"description":"A key value mapping",
"properties":{
"operation_type":{
"type":"string",
"description":"The operation type"
},
"completed":{
"type":"long",
"description":"The current completed"
},
"total":{
"type":"long",
"description":"The total to compact"
},
"unit":{
"type":"string",
"description":"The compacted unit"
}
}
},
"summary":{
"id":"summary",
"description":"A compaction summary object",
"properties":{
"value":{
"type":"array",
"items":{
"type":"mapper"
},
"description":"A list of key, value mapping"
"id":{
"type":"string",
"description":"The UUID"
},
"ks":{
"type":"string",
"description":"The keyspace name"
},
"cf":{
"type":"string",
"description":"The column family name"
},
"completed":{
"type":"long",
"description":"The number of units completed"
},
"total":{
"type":"long",
"description":"The total number of units"
},
"task_type":{
"type":"string",
"description":"The task compaction type"
},
"unit":{
"type":"string",
"description":"The units being used"
}
}
},
@@ -232,7 +275,7 @@
"rows_merged":{
"type":"array",
"items":{
"type":"mapper"
"type":"row_merged"
},
"description":"The merged rows"
}


@@ -48,7 +48,10 @@
{
"method":"GET",
"summary":"Get all endpoint states",
"type":"string",
"type":"array",
"items":{
"type":"endpoint_state"
},
"nickname":"get_all_endpoint_states",
"produces":[
"application/json"
@@ -148,6 +151,53 @@
"description": "The value"
}
}
},
"endpoint_state": {
"id": "states",
"description": "Holds an endpoint state",
"properties": {
"addrs": {
"type": "string",
"description": "The endpoint address"
},
"generation": {
"type": "int",
"description": "The heart beat generation"
},
"version": {
"type": "int",
"description": "The heart beat version"
},
"update_time": {
"type": "long",
"description": "The update timestamp"
},
"is_alive": {
"type": "boolean",
"description": "Is the endpoint alive"
},
"application_state" : {
"type":"array",
"items":{
"type":"version_value"
},
"description": "Is the endpoint alive"
}
}
},
"version_value": {
"id": "version_value",
"description": "Holds a version value for an application state",
"properties": {
"application_state": {
"type": "int",
"description": "The application state enum index"
},
"value": {
"type": "string",
"description": "The version value"
}
}
}
}
}


@@ -184,6 +184,30 @@
]
}
]
},
{
"path":"/messaging_service/version",
"operations":[
{
"method":"GET",
"summary":"Get the version number",
"type":"int",
"nickname":"get_version",
"produces":[
"application/json"
],
"parameters":[
{
"name":"addr",
"description":"Address",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
@@ -209,46 +233,28 @@
"verb":{
"type":"string",
"enum":[
"MUTATION",
"BINARY",
"READ_REPAIR",
"READ",
"REQUEST_RESPONSE",
"STREAM_INITIATE",
"STREAM_INITIATE_DONE",
"STREAM_REPLY",
"STREAM_REQUEST",
"RANGE_SLICE",
"BOOTSTRAP_TOKEN",
"TREE_REQUEST",
"TREE_RESPONSE",
"JOIN",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK",
"GOSSIP_DIGEST_ACK2",
"DEFINITIONS_ANNOUNCE",
"DEFINITIONS_UPDATE",
"TRUNCATE",
"SCHEMA_CHECK",
"INDEX_SCAN",
"REPLICATION_FINISHED",
"INTERNAL_RESPONSE",
"COUNTER_MUTATION",
"STREAMING_REPAIR_REQUEST",
"STREAMING_REPAIR_RESPONSE",
"SNAPSHOT",
"MIGRATION_REQUEST",
"GOSSIP_SHUTDOWN",
"_TRACE",
"ECHO",
"REPAIR_MESSAGE",
"PAXOS_PREPARE",
"PAXOS_PROPOSE",
"PAXOS_COMMIT",
"PAGED_RANGE",
"UNUSED_1",
"UNUSED_2",
"UNUSED_3"
"CLIENT_ID",
"ECHO",
"MUTATION",
"MUTATION_DONE",
"READ_DATA",
"READ_MUTATION_DATA",
"READ_DIGEST",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
"DEFINITIONS_UPDATE",
"TRUNCATE",
"REPLICATION_FINISHED",
"MIGRATION_REQUEST",
"STREAM_INIT_MESSAGE",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"REPAIR_CHECKSUM_RANGE",
"GET_SCHEMA_VERSION"
]
}
}


@@ -425,7 +425,7 @@
"summary":"load value. Keys are IP addresses",
"type":"array",
"items":{
"type":"mapper"
"type":"map_string_double"
},
"nickname":"get_load_map",
"produces":[
@@ -797,8 +797,72 @@
"paramType":"path"
},
{
"name":"options",
"description":"Options for the repair",
"name":"primaryRange",
"description":"If the value is the string 'true' with any capitalization, repair only the first range returned by the partitioner.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"parallelism",
"description":"Repair parallelism, can be 0 (sequential), 1 (parallel) or 2 (datacenter-aware).",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"incremental",
"description":"If the value is the string 'true' with any capitalization, perform incremental repair.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"jobThreads",
"description":"An integer specifying the parallelism on each node.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"ranges",
"description":"An explicit list of ranges to repair, overriding the default choice. Each range is expressed as token1:token2, and multiple ranges can be given as a comma separated list.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"columnFamilies",
"description":"Which column families to repair in the given keyspace. Multiple columns families can be named separated by commas. If this option is missing, all column families in the keyspace are repaired.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dataCenters",
"description":"Which data centers are to participate in this repair. Multiple data centers can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"hosts",
"description":"Which hosts are to participate in this repair. Multiple hosts can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"trace",
"description":"If the value is the string 'true' with any capitalization, enable tracing of the repair.",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -1964,6 +2028,20 @@
}
}
},
"map_string_double":{
"id":"map_string_double",
"description":"A key value mapping between a string and a double",
"properties":{
"key":{
"type":"string",
"description":"The key"
},
"value":{
"type":"double",
"description":"The value"
}
}
},
"maplist_mapper":{
"id":"maplist_mapper",
"description":"A key value mapping, where key and value are list",


@@ -128,47 +128,54 @@ inline double pow2(double a) {
return a * a;
}
inline httpd::utils_json::histogram add_histogram(httpd::utils_json::histogram res,
// FIXME: Move to utils::ihistogram::operator+=()
inline utils::ihistogram add_histogram(utils::ihistogram res,
const utils::ihistogram& val) {
if (!res.count._set) {
res = val;
return res;
if (res.count == 0) {
return val;
}
if (val.count == 0) {
return res;
return std::move(res);
}
if (res.min() > val.min) {
if (res.min > val.min) {
res.min = val.min;
}
if (res.max() < val.max) {
if (res.max < val.max) {
res.max = val.max;
}
double ncount = res.count() + val.count;
double ncount = res.count + val.count;
// To get an estimated sum we take the estimated mean
// and multiply it by the true count
res.sum = res.sum() + val.mean * val.count;
double a = res.count()/ncount;
res.sum = res.sum + val.mean * val.count;
double a = res.count/ncount;
double b = val.count/ncount;
double mean = a * res.mean() + b * val.mean;
double mean = a * res.mean + b * val.mean;
res.variance = (res.variance() + pow2(res.mean() - mean) )* a +
res.variance = (res.variance + pow2(res.mean - mean) )* a +
(val.variance + pow2(val.mean -mean))* b;
res.mean = mean;
res.count = res.count() + val.count;
res.count = res.count + val.count;
for (auto i : val.sample) {
res.sample.push(i);
res.sample.push_back(i);
}
return res;
}
inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
httpd::utils_json::histogram h;
h = val;
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::ihistogram F::*f) {
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, httpd::utils_json::histogram(),
add_histogram).then([](const httpd::utils_json::histogram& val) {
return make_ready_future<json::json_return_type>(val);
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, utils::ihistogram(),
add_histogram).then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
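The add_histogram diff above merges two histograms by taking the count-weighted average of the means and adding the spread between the two means into the combined variance. A standalone sketch of just that combination rule, using a simplified hypothetical `hist` struct rather than utils::ihistogram:

```cpp
// Simplified histogram carrying only the moments the merge rule needs.
struct hist {
    double count;
    double mean;
    double variance;
};

static double pow2(double a) { return a * a; }

// Merge rule mirroring add_histogram: weighted mean, and variance
// combined with the squared distance of each mean from the new mean.
hist merge(hist res, const hist& val) {
    if (res.count == 0) {
        return val;
    }
    if (val.count == 0) {
        return res;
    }
    double ncount = res.count + val.count;
    double a = res.count / ncount;
    double b = val.count / ncount;
    double mean = a * res.mean + b * val.mean;
    res.variance = (res.variance + pow2(res.mean - mean)) * a +
                   (val.variance + pow2(val.mean - mean)) * b;
    res.mean = mean;
    res.count = ncount;
    return res;
}
```

For two single-sample histograms with means 0 and 10, the merge yields mean 5 and variance 25, matching the population variance of the pooled samples.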


@@ -64,21 +64,21 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -101,7 +101,7 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -110,28 +110,30 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
utils::ihistogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {return p.find_column_family(uuid).get_stats().*f;},
httpd::utils_json::histogram(),
utils::ihistogram(),
add_histogram)
.then([](const httpd::utils_json::histogram& val) {
return make_ready_future<json::json_return_type>(val);
.then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::ihistogram column_family::stats::*f) {
std::function<httpd::utils_json::histogram(const database&)> fun = [f] (const database& db) {
httpd::utils_json::histogram res;
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
res = add_histogram(res, i.second->get_stats().*f);
}
return res;
};
return ctx.db.map(fun).then([](const std::vector<httpd::utils_json::histogram> &res) {
return make_ready_future<json::json_return_type>(res);
return ctx.db.map(fun).then([](const std::vector<utils::ihistogram> &res) {
std::vector<httpd::utils_json::histogram> r;
boost::copy(res | boost::adaptors::transformed(to_json), std::back_inserter(r));
return make_ready_future<json::json_return_type>(r);
});
}
static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ctx, const sstring& name) {
return map_reduce_cf(ctx, name, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [](const column_family& cf) {
return cf.get_unleveled_sstables();
}, std::plus<int64_t>());
}
@@ -221,25 +223,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -254,7 +256,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().total_space();
}, std::plus<int64_t>());
});
@@ -263,21 +265,21 @@ void set_column_family(http_context& ctx, routes& r) {
warn(unimplemented::cause::INDEXES);
return ctx.db.map_reduce0([](const database& db){
return db.dirty_memory_region_group().memory_used();
}, 0, std::plus<int64_t>()).then([](int res) {
}, int64_t(0), std::plus<int64_t>()).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -302,7 +304,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i.second->get_stats_metadata().estimated_row_size.count();
@@ -422,11 +424,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, max_row_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
});
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, max_row_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
});
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -537,20 +539,20 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
// We are missing the off heap memory calculation
// Return 0 is the wrong value. It's a work around
// until the memory calculation will be available
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_all_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
@@ -589,11 +591,16 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cf::get_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](
const std::unordered_map<sstring, column_family::snapshot_details>& sd) {
int64_t res = 0;
for (auto i : sd) {
res += i.second.total;
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {
@@ -616,25 +623,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
@@ -21,12 +21,13 @@
#include "compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
namespace api {
using namespace scollectd;
namespace cm = httpd::compaction_manager_json;
using namespace json;
static future<json::json_return_type> get_cm_stats(http_context& ctx,
int64_t compaction_manager::stats::*f) {
@@ -38,29 +39,38 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<cm::jsonmap> map;
return make_ready_future<json::json_return_type>(map);
});
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([](database& db) {
std::vector<cm::summary> summaries;
const compaction_manager& cm = db.get_compaction_manager();
cm::get_compaction_summary.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<sstring> res;
return make_ready_future<json::json_return_type>(res);
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = "compaction";
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
}
return summaries;
}, std::vector<cm::summary>(), concat<cm::summary>).then([](const std::vector<cm::summary>& res) {
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>("");
});
@@ -81,14 +91,42 @@ void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_bytes_compacted.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>(0);
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<cm::history> res;
// FIXME
warn(unimplemented::cause::API);
std::vector<cm::compaction_info> res;
return make_ready_future<json::json_return_type>(res);
});
@@ -22,15 +22,33 @@
#include "failure_detector.hh"
#include "api/api-doc/failure_detector.json.hh"
#include "gms/failure_detector.hh"
#include "gms/application_state.hh"
#include "gms/gossiper.hh"
namespace api {
namespace fd = httpd::failure_detector_json;
void set_failure_detector(http_context& ctx, routes& r) {
fd::get_all_endpoint_states.set(r, [](std::unique_ptr<request> req) {
return gms::get_all_endpoint_states().then([](const sstring& str) {
return make_ready_future<json::json_return_type>(str);
});
std::vector<fd::endpoint_state> res;
for (auto i : gms::get_local_gossiper().endpoint_state_map) {
fd::endpoint_state val;
val.addrs = boost::lexical_cast<std::string>(i.first);
val.is_alive = i.second.is_alive();
val.generation = i.second.get_heart_beat_state().get_generation();
val.version = i.second.get_heart_beat_state().get_heart_beat_version();
val.update_time = i.second.get_update_timestamp().time_since_epoch().count();
for (auto a : i.second.get_application_state_map()) {
fd::version_value version_val;
// We return the enum index rather than its name to stay compatible with
// Origin: the state indexes are static, but the names can change.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
val.application_state.push(version_val);
}
res.push_back(val);
}
return make_ready_future<json::json_return_type>(res);
});
fd::get_up_endpoint_count.set(r, [](std::unique_ptr<request> req) {
@@ -32,9 +32,9 @@ using namespace net;
namespace api {
using shard_info = messaging_service::shard_info;
using shard_id = messaging_service::shard_id;
using msg_addr = messaging_service::msg_addr;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::UNUSED_3) + 1;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST);
std::vector<message_counter> map_to_message_counters(
const std::unordered_map<gms::inet_address, unsigned long>& map) {
@@ -58,7 +58,7 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_client([&map, f] (const shard_id& id, const shard_info& info) {
ms.foreach_client([&map, f] (const msg_addr& id, const shard_info& info) {
map[id.addr] = f(info);
});
return map;
@@ -119,8 +119,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
return c.sent_messages;
}));
get_version.set(r, [](const_req req) {
return net::get_local_messaging_service().get_raw_version(req.get_query_param("addr"));
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb, 0);
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
@@ -133,8 +137,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
for (auto i : verb_counter::verb_wrapper::all_items()) {
verb_counter c;
messaging_verb v = i; // for type safety we use messaging_verb values
if ((*map)[static_cast<int32_t>(v)] > 0) {
c.count = (*map)[static_cast<int32_t>(v)];
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
c.verb = i;
res.push_back(c);
}
@@ -89,7 +89,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_token_endpoint.set(r, [] (const_req req) {
auto token_to_ep = service::get_local_storage_service().get_token_metadata().get_token_to_endpoint();
auto token_to_ep = service::get_local_storage_service().get_token_to_endpoint_map();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(token_to_ep, res);
});
@@ -169,8 +169,14 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
std::vector<ss::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(load_map, res));
std::vector<ss::map_string_double> res;
for (auto i : load_map) {
ss::map_string_double val;
val.key = i.first;
val.value = i.second;
res.push_back(val);
}
return make_ready_future<json::json_return_type>(res);
});
});
@@ -312,18 +318,14 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
// Currently, we get all the repair options encoded in a single
// "options" option, and split it to a map using the "," and ":"
// delimiters. TODO: consider if it doesn't make more sense to just
// take all the query parameters as this map and pass it to the repair
// function.
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace"};
std::unordered_map<sstring, sstring> options_map;
for (auto s : split(req->get_query_param("options"), ",")) {
auto kv = split(s, ":");
if (kv.size() != 2) {
throw httpd::bad_param_exception("malformed async repair options");
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
options_map.emplace(std::move(kv[0]), std::move(kv[1]));
}
// The repair process is asynchronous: repair_start only starts it and
@@ -415,15 +417,18 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_drain_progress.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>("");
return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
return ss.get_drain_progress();
}).then([] (auto&& progress) {
auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
return make_ready_future<json::json_return_type>(std::move(progress_str));
});
});
ss::drain.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
return service::get_local_storage_service().drain().then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::truncate.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
@@ -537,10 +542,9 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
ss::get_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
ss::get_compaction_throughput_mb_per_sec.set(r, [&ctx](std::unique_ptr<request> req) {
int value = ctx.db.local().get_config().compaction_throughput_mb_per_sec();
return make_ready_future<json::json_return_type>(value);
});
ss::set_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<request> req) {
@@ -72,11 +72,12 @@ static hs::stream_state get_state(
si.peer = boost::lexical_cast<std::string>(info.peer);
si.session_index = info.session_index;
si.state = info.state;
si.connecting = boost::lexical_cast<std::string>(info.connecting);
si.connecting = si.peer;
set_summaries(info.receiving_summaries, si.receiving_summaries);
set_summaries(info.sending_summaries, si.sending_summaries);
set_files(info.receiving_files, si.receiving_files);
set_files(info.sending_files, si.sending_files);
state.sessions.push(si);
}
return state;
}
@@ -272,39 +272,6 @@ template<typename T>
class serializer;
}
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);
atomic_cell_hash.hh Normal file
@@ -0,0 +1,57 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "atomic_cell.hh"
#include "hashing.hh"
template<typename Hasher>
void feed_hash(collection_mutation_view cell, Hasher& h, const data_type& type) {
auto&& ctype = static_pointer_cast<const collection_type_impl>(type);
auto m_view = ctype->deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second);
}
}
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
}
feed_hash(h, cell.value());
} else {
feed_hash(h, cell.deletion_time());
}
}
};
@@ -0,0 +1,73 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "atomic_cell.hh"
#include "schema.hh"
#include "hashing.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(as_collection_mutation(), h, def.type);
}
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
auth/auth.cc Normal file
@@ -0,0 +1,236 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.keyspace(ksName));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.columnFamily(ksName, cfName));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
auto type = cfg.authenticator();
if (is_class_type(type, authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)) {
return authenticator::setup(type).discard_result(); // just create the object
}
future<> f = make_ready_future();
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([type] {
return authenticator::setup(type).discard_result();
}).then([] {
// TODO authorizer
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
sleep(SUPERUSER_SETUP_DELAY).then([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// There used to be a thread-local, explicit cache of prepared statements
// here. In normal execution that is fine, but since tests set up and tear
// down the system over and over, we would start using obsolete prepared
// statements pretty quickly. Rely on the query processor's statement cache
// instead, and assume a string->statement map lookup won't kill us.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::cf_statement> parsed = static_pointer_cast<
cql3::statements::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
// Origin sets "Legacy Cf Id" for the new table. We have no need to be
// pre-2.1 compatible (afaik), so let's skip the whole hullabaloo.
return statement->announce_migration(qp.proxy(), false).then([statement](bool) {});
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}
auth/auth.hh Normal file
@@ -0,0 +1,116 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include "exceptions/exceptions.hh"
namespace auth {
class auth {
public:
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
#if 0
public static Set<Permission> getPermissions(AuthenticatedUser user, IResource resource)
{
return permissionsCache.getPermissions(user, resource);
}
#endif
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true if the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param isSuper User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
/**
* Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
};
}
@@ -0,0 +1,61 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticated_user.hh"
const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
auth::authenticated_user::authenticated_user()
: _anon(true)
{}
auth::authenticated_user::authenticated_user(sstring name)
: _name(name), _anon(false)
{}
const sstring& auth::authenticated_user::name() const {
return _anon ? ANONYMOUS_USERNAME : _name;
}
bool auth::authenticated_user::operator==(const authenticated_user& v) const {
return _anon ? v._anon : _name == v._name;
}


@@ -0,0 +1,79 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/sstring.hh>
namespace auth {
class authenticated_user {
public:
static const sstring ANONYMOUS_USERNAME;
authenticated_user();
authenticated_user(sstring name);
const sstring& name() const;
/**
* Checks the user's superuser status.
* Only a superuser is allowed to perform CREATE USER and DROP USER queries.
* In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
* (depends on IAuthorizer implementation).
*/
bool is_super() const;
/**
* If IAuthenticator doesn't require authentication, this method may return true.
*/
bool is_anonymous() const {
return _anon;
}
bool operator==(const authenticated_user&) const;
private:
sstring _name;
bool _anon;
};
}

auth/authenticator.cc Normal file

@@ -0,0 +1,110 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
/**
* Authenticator is assumed to be a fully stateless, immutable object (note all the const).
* We therefore store a single global instance, which should be safe.
*/
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
resource_ids protected_resources() const override {
return resource_ids();
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}

auth/authenticator.hh Normal file

@@ -0,0 +1,198 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <boost/any.hpp>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
namespace db {
class config;
}
namespace auth {
class authenticated_user;
class authenticator {
public:
static const sstring USERNAME_KEY;
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false, the user will be instantiated as AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If authenticator is static then the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual resource_ids protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface we refuse to start the binary protocol server as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
}

auth/data_resource.cc Normal file

@@ -0,0 +1,175 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _ks(ks), _cf(cf)
{
if (l != get_level()) {
throw std::invalid_argument("level/keyspace/column mismatch");
}
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
if (!_cf.empty()) {
assert(!_ks.empty());
return level::COLUMN_FAMILY;
}
if (!_ks.empty()) {
return level::KEYSPACE;
}
return level::ROOT;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
return name();
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.name();
}

auth/data_resource.hh Normal file

@@ -0,0 +1,146 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <iosfwd>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param keyspace Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param keyspace Name of the keyspace.
* @param columnFamily Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws std::invalid_argument if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in Scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
};
std::ostream& operator<<(std::ostream&, const data_resource&);
}


@@ -0,0 +1,357 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger logger("password_authenticator");
auth::password_authenticator::~password_authenticator()
{}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than sha/md5 versions usually available in
// crypt_r. On the other hand, glibc 2.7+ uses a modified sha512 algo
// which should be comparably safe, so the only
// real issue should be salted hash compatibility with
// origin if importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static sstring hashpw(const sstring& pass, const sstring& salt) {
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static sstring gensalt() {
static sstring prefix;
std::random_device rd;
std::default_random_engine e1(rd());
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
// uniform_int_distribution<char> is not permitted by the standard, and taking
// a (possibly signed) char modulo the alphabet size is biased; index directly.
std::uniform_int_distribution<size_t> dist(0, valid_salt.size() - 1);
sstring input(rand_bytes, 0);
for (char& c : input) {
c = valid_salt[dist(e1)];
}
sstring salt;
if (!prefix.empty()) {
return prefix + input;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of a one-shot timer, just schedule this to run later
sleep(auth::SUPERUSER_SETUP_DELAY).then([] {
auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool auth::password_authenticator::require_authentication() const {
return true;
}
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
auto& password = credentials.at(PASSWORD_KEY);
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution this is fine, but since tests set up and tear down the system over and over,
// we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor's statement caching instead, and assume that a
// string->statement map lookup will not cost us much.
auto& qp = cql3::get_local_query_processor();
return qp.process(
sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), { username }, true).then_wrapped(
[=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
}
});
}
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
auth::authenticator::resource_ids auth::password_authenticator::protected_resources() const {
return { data_resource(auth::AUTH_KS, CREDENTIALS_CF) };
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}


@@ -0,0 +1,73 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "authenticator.hh"
namespace auth {
class password_authenticator : public authenticator {
public:
static const sstring PASSWORD_AUTHENTICATOR_NAME;
password_authenticator();
~password_authenticator();
future<> init();
const sstring& class_name() const override;
bool require_authentication() const override;
option_set supported_options() const override;
option_set alterable_options() const override;
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
resource_ids protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
static db::consistency_level consistency_for_user(const sstring& username);
};
}

auth/permission.cc Normal file

@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "permission.hh"
const auth::permission_set auth::ALL_DATA = auth::permission_set::of<
        auth::permission::CREATE, auth::permission::ALTER,
        auth::permission::DROP, auth::permission::SELECT,
        auth::permission::MODIFY, auth::permission::AUTHORIZE>();
const auth::permission_set auth::ALL = auth::ALL_DATA;
const auth::permission_set auth::NONE;

auth/permission.hh Normal file

@@ -0,0 +1,81 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "enum_set.hh"
namespace auth {
enum class permission {
//Deprecated
READ,
//Deprecated
WRITE,
// schema management
CREATE, // required for CREATE KEYSPACE and CREATE TABLE.
ALTER, // required for ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX.
DROP, // required for DROP KEYSPACE and DROP TABLE.
// data access
SELECT, // required for SELECT.
MODIFY, // required for INSERT, UPDATE, DELETE, TRUNCATE.
// permission management
AUTHORIZE, // required for GRANT and REVOKE.
};
typedef enum_set<super_enum<permission,
permission::READ,
permission::WRITE,
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>> permission_set;
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
}
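The `permission_set` typedef above is an `enum_set` over the eight permission verbs. The idea — one bit per enumerator, with named constants like `ALL_DATA` built from a subset — can be sketched with a plain mask. This is an illustrative stand-in, not Scylla's `enum_set`/`super_enum` machinery:

```cpp
#include <cstdint>
#include <initializer_list>

enum class permission { READ, WRITE, CREATE, ALTER, DROP, SELECT, MODIFY, AUTHORIZE };

// A tiny mask-backed set: each enumerator occupies one bit.
class permission_set {
    uint32_t _mask = 0;
public:
    constexpr permission_set() = default;
    constexpr permission_set(std::initializer_list<permission> ps) {
        for (auto p : ps) {
            _mask |= 1u << static_cast<unsigned>(p);
        }
    }
    constexpr bool contains(permission p) const {
        return (_mask >> static_cast<unsigned>(p)) & 1u;
    }
};

// Mirrors ALL_DATA in permission.cc: every verb except the deprecated pair.
constexpr permission_set ALL_DATA{permission::CREATE, permission::ALTER,
        permission::DROP, permission::SELECT, permission::MODIFY,
        permission::AUTHORIZE};
```

Keeping the set as a single integer makes membership tests and unions cheap, which matters when permissions are checked on every statement.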


@@ -22,6 +22,7 @@
#pragma once
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
@@ -57,3 +58,20 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b);
}
template<>
struct appending_hash<bytes> {
template<typename Hasher>
void operator()(Hasher& h, const bytes& v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.cbegin()), v.size() * sizeof(bytes::value_type));
}
};
template<>
struct appending_hash<bytes_view> {
template<typename Hasher>
void operator()(Hasher& h, bytes_view v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
}
};
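Both `appending_hash` specializations above feed the value's size before its bytes. Without that length prefix, adjacent values would collide under concatenation — hashing "ab" then "c" would feed the same byte stream as "a" then "bc". A minimal illustration with a toy FNV-1a-style hasher (the names here are hypothetical, not the real `Hasher` concept):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy byte-stream hasher standing in for the Hasher concept.
struct fnv_hasher {
    uint64_t state = 1469598103934665603ull;
    void update(const char* p, size_t n) {
        while (n--) {
            state = (state ^ static_cast<unsigned char>(*p++)) * 1099511628211ull;
        }
    }
};

// Feed the length first, then the payload, as appending_hash<bytes> does.
void feed(fnv_hasher& h, const std::string& v) {
    uint64_t size = v.size();
    h.update(reinterpret_cast<const char*>(&size), sizeof(size));
    h.update(v.data(), v.size());
}
```

With the prefix, the sequences ("ab", "c") and ("a", "bc") feed different byte streams, so their digests differ.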


@@ -24,6 +24,7 @@
#include "types.hh"
#include "net/byteorder.hh"
#include "core/unaligned.hh"
#include "hashing.hh"
/**
* Utility for writing data into a buffer when its final size is not known up front.
@@ -33,8 +34,10 @@
*
*/
class bytes_ostream {
public:
using size_type = bytes::size_type;
using value_type = bytes::value_type;
private:
static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
struct chunk {
// FIXME: group fragment pointers to reduce pointer chasing when packetizing
@@ -117,13 +120,13 @@ private:
};
}
public:
bytes_ostream()
bytes_ostream() noexcept
: _begin()
, _current(nullptr)
, _size(0)
{ }
bytes_ostream(bytes_ostream&& o)
bytes_ostream(bytes_ostream&& o) noexcept
: _begin(std::move(o._begin))
, _current(o._current)
, _size(o._size)
@@ -148,7 +151,7 @@ public:
return *this;
}
bytes_ostream& operator=(bytes_ostream&& o) {
bytes_ostream& operator=(bytes_ostream&& o) noexcept {
_size = o._size;
_begin = std::move(o._begin);
_current = o._current;
@@ -330,3 +333,13 @@ public:
_current->offset = pos._offset;
}
};
template<>
struct appending_hash<bytes_ostream> {
template<typename Hasher>
void operator()(Hasher& h, const bytes_ostream& b) const {
for (auto&& frag : b.fragments()) {
feed_hash(h, frag);
}
}
};


@@ -82,6 +82,12 @@ public:
}
return caching_options(k, r);
}
bool operator==(const caching_options& other) const {
return _key_cache == other._key_cache && _row_cache == other._row_cache;
}
bool operator!=(const caching_options& other) const {
return !(*this == other);
}
};

canonical_mutation.cc Normal file

@@ -0,0 +1,98 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
template class db::serializer<canonical_mutation>;
//
// Representation layout:
//
// <canonical_mutation> ::= <column_family_id> <table_schema_version> <partition_key> <column-mapping> <partition>
//
// For <partition> see mutation_partition_serializer.cc
// For <column-mapping> see db::serializer<column_mapping>
//
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
{ }
canonical_mutation::canonical_mutation(const mutation& m)
: _data([&m] {
bytes_ostream out;
db::serializer<utils::UUID>(m.column_family_id()).write(out);
db::serializer<table_schema_version>(m.schema()->version()).write(out);
db::serializer<partition_key_view>(m.key()).write(out);
db::serializer<column_mapping>(m.schema()->get_column_mapping()).write(out);
mutation_partition_serializer ser(*m.schema(), m.partition());
ser.write(out);
return to_bytes(out.linearize());
}())
{ }
mutation canonical_mutation::to_mutation(schema_ptr s) const {
data_input in(_data);
auto cf_id = db::serializer<utils::UUID>::read(in);
if (s->id() != cf_id) {
throw std::runtime_error(sprint("Attempted to deserialize canonical_mutation of table %s with schema of table %s (%s.%s)",
cf_id, s->id(), s->ks_name(), s->cf_name()));
}
auto version = db::serializer<table_schema_version>::read(in);
auto pk = partition_key(db::serializer<partition_key_view>::read(in));
mutation m(std::move(pk), std::move(s));
if (version == m.schema()->version()) {
db::serializer<column_mapping>::skip(in);
auto partition_view = mutation_partition_serializer::read_as_view(in);
m.partition().apply(*m.schema(), partition_view, *m.schema());
} else {
column_mapping cm = db::serializer<column_mapping>::read(in);
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
auto partition_view = mutation_partition_serializer::read_as_view(in);
partition_view.accept(cm, v);
}
return m;
}
template<>
db::serializer<canonical_mutation>::serializer(const canonical_mutation& v)
: _item(v)
, _size(db::serializer<bytes>(v._data).size())
{ }
template<>
void
db::serializer<canonical_mutation>::write(output& out, const canonical_mutation& v) {
db::serializer<bytes>(v._data).write(out);
}
template<>
canonical_mutation db::serializer<canonical_mutation>::read(input& in) {
return canonical_mutation(db::serializer<bytes>::read(in));
}
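The representation layout above (table id, then schema version, then payload) is the usual "self-describing blob" pattern: enough metadata is written up front that a reader holding a different schema can still interpret, convert, or reject the bytes. A stripped-down sketch with hypothetical length-prefixed framing (not Scylla's `db::serializer`):

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Write a length-prefixed string into a growing buffer.
static void put(std::string& out, const std::string& v) {
    uint32_t n = v.size();
    out.append(reinterpret_cast<const char*>(&n), sizeof(n));
    out.append(v);
}

// Read a length-prefixed string, advancing the offset.
static std::string get(const std::string& in, size_t& off) {
    uint32_t n;
    if (off + sizeof(n) > in.size()) throw std::runtime_error("truncated blob");
    std::memcpy(&n, in.data() + off, sizeof(n));
    off += sizeof(n);
    if (off + n > in.size()) throw std::runtime_error("truncated blob");
    std::string v = in.substr(off, n);
    off += n;
    return v;
}

// <blob> ::= <table_id> <schema_version> <payload>, echoing the layout above.
std::string make_blob(const std::string& table_id, const std::string& version,
                      const std::string& payload) {
    std::string out;
    put(out, table_id);
    put(out, version);
    put(out, payload);
    return out;
}

// Refuse a blob from a different table, as to_mutation() does; a real reader
// would also branch on the schema version to pick a conversion path.
std::string read_blob(const std::string& blob, const std::string& expected_table) {
    size_t off = 0;
    if (get(blob, off) != expected_table) {
        throw std::runtime_error("blob belongs to a different table");
    }
    get(blob, off); // schema version, ignored in this sketch
    return get(blob, off);
}
```

In `canonical_mutation` the version check is what selects between the fast path (identical schema, skip the column mapping) and the converting path.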

canonical_mutation.hh Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
#include "schema.hh"
#include "database_fwd.hh"
#include "db/serializer.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
// Safe to pass serialized across nodes.
class canonical_mutation {
bytes _data;
canonical_mutation(bytes);
public:
explicit canonical_mutation(const mutation&);
canonical_mutation(canonical_mutation&&) = default;
canonical_mutation(const canonical_mutation&) = default;
canonical_mutation& operator=(const canonical_mutation&) = default;
canonical_mutation& operator=(canonical_mutation&&) = default;
// Create a mutation object interpreting this canonical mutation using
// given schema.
//
// Data which is not representable in the target schema is dropped. If this
// is not intended, user should sync the schema first.
mutation to_mutation(schema_ptr) const;
friend class db::serializer<canonical_mutation>;
};
//
//template<>
//struct hash<canonical_mutation> {
// template<typename Hasher>
// void operator()(Hasher& h, const canonical_mutation& m) const {
// m.feed_hash(h);
// }
//};
namespace db {
template<> serializer<canonical_mutation>::serializer(const canonical_mutation&);
template<> void serializer<canonical_mutation>::write(output&, const canonical_mutation&);
template<> canonical_mutation serializer<canonical_mutation>::read(input&);
extern template class serializer<canonical_mutation>;
}


@@ -68,7 +68,7 @@ public:
, _byte_order_equal(std::all_of(_types.begin(), _types.end(), [] (auto t) {
return t->is_byte_order_equal();
}))
, _byte_order_comparable(_types.size() == 1 && _types[0]->is_byte_order_comparable())
, _byte_order_comparable(!is_prefixable && _types.size() == 1 && _types[0]->is_byte_order_comparable())
, _is_reversed(_types.size() == 1 && _types[0]->is_reversed())
{ }
@@ -278,10 +278,10 @@ public:
});
}
bytes from_string(sstring_view s) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
sstring to_string(const bytes& b) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
// Returns true iff given prefix has no missing components
bool is_full(bytes_view v) const {


@@ -114,6 +114,14 @@ public:
}
return opts;
}
bool operator==(const compression_parameters& other) const {
return _compressor == other._compressor
&& _chunk_length == other._chunk_length
&& _crc_check_chance == other._crc_check_chance;
}
bool operator!=(const compression_parameters& other) const {
return !(*this == other);
}
private:
void validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor


@@ -169,6 +169,17 @@ rpc_address: localhost
# port for Thrift to listen for clients on
rpc_port: 9160
# port for REST API server
api_port: 10000
# IP for the REST API server
api_address: 127.0.0.1
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
###################################################
## Not currently supported, reserved for future use
###################################################
@@ -205,7 +216,7 @@ rpc_port: 9160
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Authentication backend, implementing IAuthenticator; used to identify users
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
@@ -599,10 +610,6 @@ commitlog_total_space_in_mb: -1
# column_index_size_in_kb: 64
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
# batch_size_warn_threshold_in_kb: 5
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write
@@ -782,40 +789,25 @@ commitlog_total_space_in_mb: -1
# the request scheduling. Currently the only valid option is keyspace.
# request_scheduler_id: keyspace
# Enable or disable inter-node encryption
# Default settings are TLS v1, RSA 1024-bit keys (it is imperative that
# users generate their own keys) TLS_RSA_WITH_AES_128_CBC_SHA as the cipher
# suite for authentication, key exchange and encryption of the actual data transfers.
# Use the DHE/ECDHE ciphers if running in FIPS 140 compliant mode.
# NOTE: No custom encryption options are enabled at the moment
# Enable or disable inter-node encryption.
# You must also generate keys and provide the appropriate key and trust store locations and passwords.
# No custom encryption options are currently enabled. The available options are:
#
# The available internode options are : all, none, dc, rack
#
# If set to dc cassandra will encrypt the traffic between the DCs
# If set to rack cassandra will encrypt the traffic between the racks
#
# The passwords used in these options must match the passwords used when generating
# the keystore and truststore. For instructions on generating these files, see:
# http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
# If set to dc scylla will encrypt the traffic between the DCs
# If set to rack scylla will encrypt the traffic between the racks
#
# server_encryption_options:
# internode_encryption: none
# keystore: conf/.keystore
# keystore_password: cassandra
# truststore: conf/.truststore
# truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# require_client_auth: false
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# truststore: <none, use system trust>
# enable or disable client/server encryption.
# client_encryption_options:
# enabled: false
# keystore: conf/.keystore
# keystore_password: cassandra
# certificate: conf/scylla.crt
# keyfile: conf/scylla.key
# require_client_auth: false
# Set truststore and truststore_password if require_client_auth is true
@@ -839,3 +831,17 @@ commitlog_total_space_in_mb: -1
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
# inter_dc_tcp_nodelay: false
# Relaxation of environment checks.
#
# Scylla places certain requirements on its environment. If these requirements are
# not met, performance and reliability can be degraded.
#
# These requirements include:
# - A filesystem with good support for asynchronous I/O (AIO). Currently,
# this means XFS.
#
# false: strict environment checks are in place; do not start if they are not met.
# true: relaxed environment checks; performance and reliability may degrade.
#
# developer_mode: false


@@ -50,6 +50,9 @@ def apply_tristate(var, test, note, missing):
return False
return False
def have_pkg(package):
return subprocess.call(['pkg-config', package]) == 0
def pkg_config(option, package):
output = subprocess.check_output(['pkg-config', option, package])
return output.decode('utf-8').strip()
@@ -134,6 +137,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
@@ -151,6 +155,7 @@ scylla_tests = [
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/key_reader_test',
'tests/mutation_query_test',
@@ -183,6 +188,8 @@ scylla_tests = [
'tests/managed_vector_test',
'tests/crc_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
]
apps = [
@@ -221,6 +228,8 @@ arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_t
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0) compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
args = arg_parser.parse_args()
@@ -234,11 +243,15 @@ cassandra_interface = Thrift(source = 'interface/cassandra.thrift', service = 'C
scylla_core = (['database.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'release.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
@@ -280,6 +293,8 @@ scylla_core = (['database.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
'cql3/statements/delete_statement.cc',
'cql3/statements/batch_statement.cc',
@@ -289,6 +304,7 @@ scylla_core = (['database.cc',
'cql3/statements/index_target.cc',
'cql3/statements/create_index_statement.cc',
'cql3/statements/truncate_statement.cc',
'cql3/statements/alter_table_statement.cc',
'cql3/update_parameters.cc',
'cql3/ut_name.cc',
'thrift/handler.cc',
@@ -339,6 +355,7 @@ scylla_core = (['database.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -370,6 +387,7 @@ scylla_core = (['database.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/pending_range_calculator_service.cc',
@@ -403,6 +421,12 @@ scylla_core = (['database.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -464,6 +488,7 @@ tests_not_using_seastar_test_framework = set([
'tests/partitioner_test',
'tests/map_difference_test',
'tests/frozen_mutation_test',
'tests/canonical_mutation_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -482,6 +507,7 @@ tests_not_using_seastar_test_framework = set([
'tests/crc_test',
'tests/perf/perf_sstable',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
])
for t in tests_not_using_seastar_test_framework:
@@ -498,7 +524,7 @@ deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc', 'utils/dynamic_bitset.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only
@@ -524,6 +550,17 @@ else:
args.pie = ''
args.fpie = ''
optional_packages = ['libsystemd']
pkgs = []
for pkg in optional_packages:
if have_pkg(pkg):
pkgs.append(pkg)
upkg = pkg.upper().replace('-', '_')
defines.append('HAVE_{}=1'.format(upkg))
else:
print('Missing optional package {pkg}'.format(**locals()))
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -556,7 +593,7 @@ elif args.dpdk_target:
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call(['./configure.py'] + seastar_flags, cwd = 'seastar')
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
if status != 0:
print('Seastar configuration failed')
@@ -585,7 +622,10 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
if args.staticcxx:
@@ -752,7 +792,7 @@ with open(buildfile, 'w') as f:
f.write('build {}: phony\n'.format(seastar_deps))
f.write(textwrap.dedent('''\
rule configure
command = python3 configure.py $configure_args
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
rule cscope


@@ -0,0 +1,119 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "mutation_partition_view.hh"
#include "schema.hh"
// Mutation partition visitor which applies visited data into
// existing mutation_partition. The visited data may be of a different schema.
// Data which is not representable in the new schema is dropped.
// Weak exception guarantees.
class converting_mutation_partition_applier : public mutation_partition_visitor {
const schema& _p_schema;
mutation_partition& _p;
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell);
collection_type_impl::mutation_view new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(std::move(c));
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
}
public:
converting_mutation_partition_applier(
const column_mapping& visited_column_mapping,
const schema& target_schema,
mutation_partition& target)
: _p_schema(target_schema)
, _p(target)
, _visited_column_mapping(visited_column_mapping)
{ }
virtual void accept_partition_tombstone(tombstone t) override {
_p.apply(t);
}
virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
}
}
virtual void accept_static_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
}
}
virtual void accept_row_tombstone(clustering_key_prefix_view prefix, tombstone t) override {
_p.apply_row_tombstone(_p_schema, prefix, t);
}
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
}
virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
}
}
virtual void accept_row_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
};
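The header comment above describes the conversion rule in prose: cells from the visited (old) schema survive only if the target schema can represent them. A minimal standalone sketch of that per-cell filtering, using hypothetical `cell`/`column_def` stand-ins rather than Scylla's real types (the real `accept_cell` also checks kind and type compatibility):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins, not Scylla's real types.
struct cell { long timestamp; std::string value; };
struct column_def { long dropped_at; };

// A cell is kept only if the target schema still defines the column and the
// cell's timestamp is newer than the point at which the column was dropped.
std::map<std::string, cell> apply_row(
        const std::map<std::string, cell>& visited_row,
        const std::map<std::string, column_def>& target_schema) {
    std::map<std::string, cell> out;
    for (const auto& [name, c] : visited_row) {
        auto it = target_schema.find(name);
        if (it != target_schema.end() && c.timestamp > it->second.dropped_at) {
            out.emplace(name, c);  // representable in the target schema
        }
        // otherwise the cell is silently dropped, per the class comment
    }
    return out;
}
```

This mirrors the `cell.timestamp() > new_def.dropped_at()` guard in `accept_cell`, which prevents data written before a column was dropped from reappearing under a re-added column of the same name.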

View File

@@ -31,6 +31,7 @@ options {
@parser::includes {
#include "cql3/selection/writetime_or_ttl.hh"
#include "cql3/statements/alter_table_statement.hh"
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/drop_keyspace_statement.hh"
#include "cql3/statements/create_index_statement.hh"
@@ -269,7 +270,9 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st12=dropTableStatement { $stmt = st12; }
#if 0
| st13=dropIndexStatement { $stmt = st13; }
#endif
| st14=alterTableStatement { $stmt = st14; }
#if 0
| st15=alterKeyspaceStatement { $stmt = st15; }
| st16=grantStatement { $stmt = st16; }
| st17=revokeStatement { $stmt = st17; }
@@ -768,7 +771,7 @@ alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
: K_ALTER K_KEYSPACE ks=keyspaceName
K_WITH properties[attrs] { $expr = new AlterKeyspaceStatement(ks, attrs); }
;
#endif
/**
* ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
@@ -777,27 +780,29 @@ alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
* ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
* ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
*/
alterTableStatement returns [AlterTableStatement expr]
alterTableStatement returns [shared_ptr<alter_table_statement> expr]
@init {
AlterTableStatement.Type type = null;
CFPropDefs props = new CFPropDefs();
Map<ColumnIdentifier.Raw, ColumnIdentifier.Raw> renames = new HashMap<ColumnIdentifier.Raw, ColumnIdentifier.Raw>();
boolean isStatic = false;
alter_table_statement::type type;
auto props = make_shared<cql3::statements::cf_prop_defs>();
std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
bool is_static = false;
}
: K_ALTER K_COLUMNFAMILY cf=columnFamilyName
( K_ALTER id=cident K_TYPE v=comparatorType { type = AlterTableStatement.Type.ALTER; }
| K_ADD id=cident v=comparatorType ({ isStatic=true; } K_STATIC)? { type = AlterTableStatement.Type.ADD; }
| K_DROP id=cident { type = AlterTableStatement.Type.DROP; }
| K_WITH properties[props] { type = AlterTableStatement.Type.OPTS; }
| K_RENAME { type = AlterTableStatement.Type.RENAME; }
id1=cident K_TO toId1=cident { renames.put(id1, toId1); }
( K_AND idn=cident K_TO toIdn=cident { renames.put(idn, toIdn); } )*
( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; }
| K_ADD id=cident v=comparatorType ({ is_static=true; } K_STATIC)? { type = alter_table_statement::type::add; }
| K_DROP id=cident { type = alter_table_statement::type::drop; }
| K_WITH properties[props] { type = alter_table_statement::type::opts; }
| K_RENAME { type = alter_table_statement::type::rename; }
id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); }
( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )*
)
{
$expr = new AlterTableStatement(cf, type, id, v, props, renames, isStatic);
$expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(id),
std::move(v), std::move(props), std::move(renames), is_static);
}
;
#if 0
/**
* ALTER TYPE <name> ALTER <field> TYPE <newtype>;
* ALTER TYPE <name> ADD <field> <newtype>;
@@ -856,7 +861,7 @@ dropIndexStatement returns [DropIndexStatement expr]
* TRUNCATE <CF>;
*/
truncateStatement returns [::shared_ptr<truncate_statement> stmt]
: K_TRUNCATE cf=columnFamilyName { $stmt = ::make_shared<truncate_statement>(cf); }
: K_TRUNCATE (K_COLUMNFAMILY)? cf=columnFamilyName { $stmt = ::make_shared<truncate_statement>(cf); }
;
#if 0
@@ -1243,6 +1248,7 @@ relationType returns [const cql3::operator_type* op = nullptr]
;
relation[std::vector<cql3::relation_ptr>& clauses]
@init{ const cql3::operator_type* rt = nullptr; }
: name=cident type=relationType t=term { $clauses.emplace_back(::make_shared<cql3::single_column_relation>(std::move(name), *type, std::move(t))); }
| K_TOKEN l=tupleOfIdentifiers type=relationType t=term
@@ -1252,11 +1258,9 @@ relation[std::vector<cql3::relation_ptr>& clauses]
{ $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), cql3::operator_type::IN, std::move(marker))); }
| name=cident K_IN in_values=singleColumnInValues
{ $clauses.emplace_back(cql3::single_column_relation::create_in_relation(std::move(name), std::move(in_values))); }
#if 0
| name=cident K_CONTAINS { Operator rt = Operator.CONTAINS; } (K_KEY { rt = Operator.CONTAINS_KEY; })?
t=term { $clauses.add(new SingleColumnRelation(name, rt, t)); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.add(new SingleColumnRelation(name, key, type, t)); }
#endif
| name=cident K_CONTAINS { rt = &cql3::operator_type::CONTAINS; } (K_KEY { rt = &cql3::operator_type::CONTAINS_KEY; })?
t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), *rt, std::move(t))); }
| name=cident '[' key=term ']' type=relationType t=term { $clauses.emplace_back(make_shared<cql3::single_column_relation>(std::move(name), std::move(key), *type, std::move(t))); }
| ids=tupleOfIdentifiers
( K_IN
( '(' ')'

View File

@@ -55,14 +55,11 @@ namespace cql3 {
* Represents an identifier for a CQL column definition.
* TODO: should support light-weight mode without text representation for when not interned
*/
class column_identifier final : public selection::selectable /* implements IMeasurableMemory*/ {
class column_identifier final : public selection::selectable {
public:
bytes bytes_;
private:
sstring _text;
#if 0
private static final long EMPTY_SIZE = ObjectSizes.measure(new ColumnIdentifier("", true));
#endif
public:
column_identifier(sstring raw_text, bool keep_case);
@@ -83,20 +80,6 @@ public:
}
#if 0
public long unsharedHeapSize()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapOf(bytes)
+ ObjectSizes.sizeOf(text);
}
public long unsharedHeapSizeExcludingData()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapExcludingData(bytes)
+ ObjectSizes.sizeOf(text);
}
public ColumnIdentifier clone(AbstractAllocator allocator)
{
return new ColumnIdentifier(allocator.clone(bytes), text);

View File

@@ -114,30 +114,26 @@ maps::literal::validate_assignable_to(database& db, const sstring& keyspace, col
assignment_testable::test_result
maps::literal::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {
throw std::runtime_error("not implemented");
#if 0
if (!(receiver.type instanceof MapType))
return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
if (!dynamic_pointer_cast<const map_type_impl>(receiver->type)) {
return assignment_testable::test_result::NOT_ASSIGNABLE;
}
// If there are no elements, we can't say it's an exact match (an empty map is fundamentally polymorphic).
if (entries.isEmpty())
return AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
ColumnSpecification keySpec = Maps.keySpecOf(receiver);
ColumnSpecification valueSpec = Maps.valueSpecOf(receiver);
if (entries.empty()) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
}
auto key_spec = maps::key_spec_of(*receiver);
auto value_spec = maps::value_spec_of(*receiver);
// It's an exact match if all are exact match, but is not assignable as soon as any is non assignable.
AssignmentTestable.TestResult res = AssignmentTestable.TestResult.EXACT_MATCH;
for (Pair<Term.Raw, Term.Raw> entry : entries)
{
AssignmentTestable.TestResult t1 = entry.left.testAssignment(keyspace, keySpec);
AssignmentTestable.TestResult t2 = entry.right.testAssignment(keyspace, valueSpec);
if (t1 == AssignmentTestable.TestResult.NOT_ASSIGNABLE || t2 == AssignmentTestable.TestResult.NOT_ASSIGNABLE)
return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
if (t1 != AssignmentTestable.TestResult.EXACT_MATCH || t2 != AssignmentTestable.TestResult.EXACT_MATCH)
res = AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
auto res = assignment_testable::test_result::EXACT_MATCH;
for (auto entry : entries) {
auto t1 = entry.first->test_assignment(db, keyspace, key_spec);
auto t2 = entry.second->test_assignment(db, keyspace, value_spec);
if (t1 == assignment_testable::test_result::NOT_ASSIGNABLE || t2 == assignment_testable::test_result::NOT_ASSIGNABLE)
return assignment_testable::test_result::NOT_ASSIGNABLE;
if (t1 != assignment_testable::test_result::EXACT_MATCH || t2 != assignment_testable::test_result::EXACT_MATCH)
res = assignment_testable::test_result::WEAKLY_ASSIGNABLE;
}
return res;
#endif
}
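The aggregation rule spelled out in the comment above (exact match only if every entry matches exactly, not assignable as soon as any entry is not) can be sketched in isolation. `test_result` below is a hypothetical stand-in for `assignment_testable::test_result`, and the empty-literal case, which the real code answers with WEAKLY_ASSIGNABLE before entering the loop, is not modeled:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for assignment_testable::test_result.
enum class test_result { EXACT_MATCH, WEAKLY_ASSIGNABLE, NOT_ASSIGNABLE };

// Fold per-entry results: NOT_ASSIGNABLE is sticky and short-circuits,
// any non-exact entry downgrades the overall result to WEAKLY_ASSIGNABLE.
test_result aggregate(const std::vector<test_result>& entry_results) {
    test_result res = test_result::EXACT_MATCH;
    for (auto t : entry_results) {
        if (t == test_result::NOT_ASSIGNABLE) {
            return test_result::NOT_ASSIGNABLE;
        }
        if (t != test_result::EXACT_MATCH) {
            res = test_result::WEAKLY_ASSIGNABLE;
        }
    }
    return res;
}
```

In the map-literal code above, each entry contributes two such results, one for the key against `key_spec` and one for the value against `value_spec`.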
sstring

View File

@@ -199,13 +199,7 @@ public:
}
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver);
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s[%s] = %s", column.name, selector, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -218,13 +212,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s + %s", column.name, column.name, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -237,13 +224,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s - %s", column.name, column.name, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -256,12 +236,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s - %s", column.name, value, column.name);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};

View File

@@ -99,9 +99,9 @@ query_options::query_options(query_options&& o, std::vector<std::vector<bytes_vi
_batch_options = std::move(tmp);
}
query_options::query_options(std::vector<bytes_opt> values)
query_options::query_options(db::consistency_level cl, std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE,
cl,
{},
std::move(values),
{},
@@ -120,6 +120,11 @@ query_options::query_options(std::vector<bytes_opt> values)
}
}
query_options::query_options(std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE, std::move(values))
{}
db::consistency_level query_options::get_consistency() const
{
return _consistency;

View File

@@ -112,6 +112,7 @@ public:
// forInternalUse
explicit query_options(std::vector<bytes_opt> values);
explicit query_options(db::consistency_level, std::vector<bytes_opt> values);
db::consistency_level get_consistency() const;
bytes_view_opt get_value_at(size_t idx) const;

View File

@@ -109,6 +109,7 @@ future<> query_processor::stop()
future<::shared_ptr<result_message>>
query_processor::process(const sstring_view& query_string, service::query_state& query_state, query_options& options)
{
log.trace("process: \"{}\"", query_string);
auto p = get_statement(query_string, query_state.get_client_state());
options.prepare(p->bound_names);
auto cql_statement = p->statement;
@@ -178,7 +179,7 @@ query_processor::prepare(const std::experimental::string_view& query_string, con
query_processor::get_stored_prepared_statement(const std::experimental::string_view& query_string, const sstring& keyspace, bool for_thrift)
{
if (for_thrift) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
Integer thriftStatementId = computeThriftId(queryString, keyspace);
ParsedStatement.Prepared existing = thriftPreparedStatements.get(thriftStatementId);
@@ -209,7 +210,7 @@ query_processor::store_prepared_statement(const std::experimental::string_view&
MAX_CACHE_PREPARED_MEMORY));
#endif
if (for_thrift) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
Integer statementId = computeThriftId(queryString, keyspace);
thriftPreparedStatements.put(statementId, prepared);
@@ -299,8 +300,9 @@ query_processor::parse_statement(const sstring_view& query)
}
query_options query_processor::make_internal_options(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values,
db::consistency_level cl) {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
}
@@ -316,13 +318,12 @@ query_options query_processor::make_internal_options(
bound_values.push_back({n->type->decompose(v)});
}
}
return query_options(bound_values);
return query_options(cl, bound_values);
}
::shared_ptr<statements::parsed_statement::prepared> query_processor::prepare_internal(
const std::experimental::string_view& query_string) {
auto& p = _internal_statements[sstring(query_string.begin(), query_string.end())];
const sstring& query_string) {
auto& p = _internal_statements[query_string];
if (p == nullptr) {
auto np = parse_statement(query_string)->prepare(_db.local());
np->statement->validate(_proxy, *_internal_state);
@@ -332,19 +333,54 @@ query_options query_processor::make_internal_options(
}
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& values) {
if (log.is_enabled(logging::log_level::trace)) {
log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
}
auto p = prepare_internal(query_string);
return execute_internal(p, values);
}
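The `prepare_internal()` body shown above relies on `operator[]` default-constructing an empty map slot, so that parsing and preparation run only on a cache miss and later calls reuse the stored entry. A minimal sketch of that caching pattern, with a hypothetical `prepared` type standing in for Scylla's prepared-statement type:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for statements::parsed_statement::prepared.
struct prepared { std::string text; };

class statement_cache {
    std::unordered_map<std::string, std::shared_ptr<prepared>> _cache;
    int _prepares = 0;  // counts how often the slow (parse + prepare) path runs
public:
    std::shared_ptr<prepared> prepare(const std::string& query) {
        // operator[] inserts a null shared_ptr on first lookup of this query.
        auto& p = _cache[query];
        if (p == nullptr) {
            ++_prepares;  // real code would parse, prepare, and validate here
            p = std::make_shared<prepared>(prepared{query});
        }
        return p;
    }
    int prepares() const { return _prepares; }
};
```

As in the diff, the cache key is the query string itself, which is why the signature change from `string_view` to `sstring` simplifies the map lookup.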
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
auto opts = make_internal_options(p, values);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute_internal(_proxy, *_internal_state, opts).then(
[](::shared_ptr<transport::messages::result_message> msg) {
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
const sstring& query_string,
db::consistency_level cl, const std::initializer_list<data_value>& values, bool cache)
{
auto p = cache ? prepare_internal(query_string) : parse_statement(query_string)->prepare(_db.local());
if (!cache) {
p->statement->validate(_proxy, *_internal_state);
}
return process(p, cl, values);
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
::shared_ptr<statements::parsed_statement::prepared> p,
db::consistency_level cl, const std::initializer_list<data_value>& values)
{
auto opts = make_internal_options(p, values, cl);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute(_proxy, *_internal_state, opts).then(
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<transport::messages::result_message>>
query_processor::process_batch(::shared_ptr<statements::batch_statement> batch, service::query_state& query_state, query_options& options) {
auto& client_state = query_state.get_client_state();
@@ -385,8 +421,12 @@ void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks
{
}
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name)
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed)
{
if (columns_changed) {
log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
remove_invalid_prepared_statements(ks_name, cf_name);
}
}
void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name)
@@ -436,9 +476,7 @@ void query_processor::migration_subscriber::remove_invalid_prepared_statements(s
}
}
for (auto& id : invalid) {
get_query_processor().invoke_on_all([id] (auto& qp) {
qp.invalidate_prepared_statement(id);
});
_qp->invalidate_prepared_statement(id);
}
}

View File

@@ -322,14 +322,25 @@ public:
}
#endif
private:
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const std::experimental::string_view& query);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&, db::consistency_level = db::consistency_level::ONE);
public:
future<::shared_ptr<untyped_result_set>> execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& = { });
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const sstring& query);
future<::shared_ptr<untyped_result_set>> execute_internal(
::shared_ptr<statements::parsed_statement::prepared>,
const std::initializer_list<data_value>& = { });
future<::shared_ptr<untyped_result_set>> process(
const sstring& query_string,
db::consistency_level, const std::initializer_list<data_value>& = { }, bool cache = false);
future<::shared_ptr<untyped_result_set>> process(
::shared_ptr<statements::parsed_statement::prepared>,
db::consistency_level, const std::initializer_list<data_value>& = { });
/*
* This function provides a timestamp that is guaranteed to be higher than any timestamp
* previously used in internal queries.
@@ -486,7 +497,7 @@ public:
virtual void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;
virtual void on_update_keyspace(const sstring& ks_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override;
virtual void on_update_user_type(const sstring& ks_name, const sstring& type_name) override;
virtual void on_update_function(const sstring& ks_name, const sstring& function_name) override;
virtual void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;

View File

@@ -374,7 +374,7 @@ public:
}
virtual std::vector<bytes_opt> bounds(statements::bound b, const query_options& options) const override {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
return Composites.toByteBuffers(boundsAsComposites(b, options));
#endif

View File

@@ -41,13 +41,13 @@ public:
::shared_ptr<primary_key_restrictions<T>> do_merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) const {
if (restriction->is_multi_column()) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
return ::make_shared<single_column_primary_key_restrictions<T>>(schema)->merge_to(schema, restriction);
}
::shared_ptr<primary_key_restrictions<T>> merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) override {
if (restriction->is_multi_column()) {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
if (restriction->is_on_token()) {
return static_pointer_cast<token_restriction>(restriction);

View File

@@ -384,7 +384,11 @@ void result_set_builder::visitor::accept_new_row(
_builder.add(_partition_key[def->component_index()]);
break;
case column_kind::clustering_key:
_builder.add(_clustering_key[def->component_index()]);
if (_clustering_key.size() > def->component_index()) {
_builder.add(_clustering_key[def->component_index()]);
} else {
_builder.add({});
}
break;
case column_kind::regular_column:
add_value(*def, row_iterator);

View File

@@ -134,7 +134,7 @@ public:
* @return <code>true</code> if this selection contains a collection, <code>false</code> otherwise.
*/
bool contains_a_collection() const {
if (!_schema->has_collections()) {
if (!_schema->has_multi_cell_collections()) {
return false;
}

View File

@@ -159,7 +159,7 @@ protected:
virtual shared_ptr<restrictions::restriction> new_contains_restriction(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names,
bool is_key) override {
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
ColumnDefinition columnDef = toColumnDefinition(schema, entity);
Term term = toTerm(toReceivers(schema, columnDef), value, schema.ksName, bound_names);

View File

@@ -0,0 +1,278 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2015 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/statements/alter_table_statement.hh"
#include "service/migration_manager.hh"
#include "validation.hh"
#include "db/config.hh"
namespace cql3 {
namespace statements {
alter_table_statement::alter_table_statement(shared_ptr<cf_name> name,
type t,
shared_ptr<column_identifier::raw> column_name,
shared_ptr<cql3_type::raw> validator,
shared_ptr<cf_prop_defs> properties,
renames_type renames,
bool is_static)
: schema_altering_statement(std::move(name))
, _type(t)
, _raw_column_name(std::move(column_name))
, _validator(std::move(validator))
, _properties(std::move(properties))
, _renames(std::move(renames))
, _is_static(is_static)
{
}
void alter_table_statement::check_access(const service::client_state& state)
{
warn(unimplemented::cause::PERMISSIONS);
#if 0
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.ALTER);
#endif
}
void alter_table_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state)
{
// validated in announce_migration()
}
static const sstring ALTER_TABLE_FEATURE = "ALTER TABLE";
future<bool> alter_table_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
{
auto& db = proxy.local().get_db().local();
db.get_config().check_experimental(ALTER_TABLE_FEATURE);
auto schema = validation::validate_column_family(db, keyspace(), column_family());
auto cfm = schema_builder(schema);
shared_ptr<cql3_type> validator;
if (_validator) {
validator = _validator->prepare(db, keyspace());
}
shared_ptr<column_identifier> column_name;
const column_definition* def = nullptr;
if (_raw_column_name) {
column_name = _raw_column_name->prepare_column_identifier(schema);
def = get_column_definition(schema, *column_name);
}
switch (_type) {
case alter_table_statement::type::add:
{
assert(column_name);
if (schema->is_dense()) {
throw exceptions::invalid_request_exception("Cannot add new column to a COMPACT STORAGE table");
}
if (_is_static) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Static columns are not allowed in COMPACT STORAGE tables");
}
if (!schema->clustering_key_size()) {
throw exceptions::invalid_request_exception("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
}
}
if (def) {
if (def->is_partition_key()) {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with a PRIMARY KEY part", column_name));
} else {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with an existing column", column_name));
}
}
// Cannot re-add a dropped counter column. See #7831.
if (schema->is_counter() && schema->dropped_columns().count(column_name->text())) {
throw exceptions::invalid_request_exception(sprint("Cannot re-add previously dropped counter column %s", column_name));
}
auto type = validator->get_type();
if (type->is_collection() && type->is_multi_cell()) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
}
if (schema->is_super()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with super column families");
}
}
cfm.with_column(column_name->name(), type, _is_static ? column_kind::static_column : column_kind::regular_column);
break;
}
case alter_table_statement::type::alter:
{
assert(column_name);
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
auto type = validator->get_type();
switch (def->kind) {
case column_kind::partition_key:
if (type->is_counter()) {
throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", column_name));
}
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::clustering_key:
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception(sprint("Cannot alter clustering column %s in a non-CQL3 table", column_name));
}
// Note that CFMetaData.validateCompatibility already validates the change we're about to do. However, the error message it
// sends is a bit cryptic for a CQL3 user, so we validate here as well for the sake of returning a better error message.
// Do note that we need isCompatibleWith here, not just isValueCompatibleWith.
if (!type->is_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are not order-compatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::compact_column:
case column_kind::regular_column:
case column_kind::static_column:
// Thrift allows changing a column validator, so CFMetaData.validateCompatibility will let it slide
// if we change to an incompatible type (contrary to the comparator case). But we don't want to
// allow it for CQL3 (see #5882), so we validate it explicitly here. We only care about value
// compatibility, though, since we won't compare values (except when there is an index, but that
// is validated by ColumnDefinition already).
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
}
// In any case, we update the column definition
cfm.with_altered_column_type(column_name->name(), type);
break;
}
case alter_table_statement::type::drop:
assert(column_name);
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception("Cannot drop columns from a non-CQL3 table");
}
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
if (def->is_primary_key()) {
throw exceptions::invalid_request_exception(sprint("Cannot drop PRIMARY KEY part %s", column_name));
} else {
for (auto&& column_def : boost::range::join(schema->static_columns(), schema->regular_columns())) { // find the column to drop
if (column_def.name() == column_name->name()) {
cfm.without_column(column_name->name());
break;
}
}
}
break;
case alter_table_statement::type::opts:
if (!_properties) {
throw exceptions::invalid_request_exception("ALTER COLUMNFAMILY WITH invoked, but no parameters found");
}
_properties->validate();
if (schema->is_counter() && _properties->get_default_time_to_live() > 0) {
throw exceptions::invalid_request_exception("Cannot set default_time_to_live on a table with counters");
}
_properties->apply_to_builder(cfm);
break;
case alter_table_statement::type::rename:
for (auto&& entry : _renames) {
auto from = entry.first->prepare_column_identifier(schema);
auto to = entry.second->prepare_column_identifier(schema);
auto def = schema->get_column_definition(from->name());
if (!def) {
throw exceptions::invalid_request_exception(sprint("Cannot rename unknown column %s in table %s", from, column_family()));
}
if (schema->get_column_definition(to->name())) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s to %s in table %s; another column of that name already exists", from, to, column_family()));
}
if (def->is_part_of_cell_name()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename non PRIMARY KEY part %s", from));
}
if (def->is_indexed()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s because it is secondary indexed", from));
}
cfm.with_column_rename(from->name(), to->name());
}
break;
}
return service::get_local_migration_manager().announce_column_family_update(cfm.build(), false, is_local_only).then([] {
return true;
});
}
shared_ptr<transport::event::schema_change> alter_table_statement::change_event()
{
return make_shared<transport::event::schema_change>(transport::event::schema_change::change_type::UPDATED,
transport::event::schema_change::target_type::TABLE, keyspace(), column_family());
}
}
}
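The ALTER TABLE path above validates a column's existence and type compatibility before touching the schema, and only then applies the change. A minimal standalone sketch of that validate-then-apply pattern is below; the types and the compatibility rule (`column_type`, `is_value_compatible_with`, "blob accepts anything") are illustrative stand-ins, not Scylla's real API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the value-compatibility rule used above:
// a type is trivially compatible with itself, and "blob" accepts any
// byte sequence. Real Scylla delegates this to the type system.
struct column_type {
    std::string name;
    bool is_value_compatible_with(const column_type& other) const {
        return name == other.name || name == "blob";
    }
};

void alter_column(std::map<std::string, column_type>& table,
                  const std::string& column, const column_type& new_type) {
    auto it = table.find(column);
    if (it == table.end()) {
        throw std::invalid_argument("Column " + column + " was not found");
    }
    // Validate first: the new type must be able to hold the old values.
    if (!new_type.is_value_compatible_with(it->second)) {
        throw std::invalid_argument("Cannot change " + column +
            " from type " + it->second.name + " to type " + new_type.name +
            ": types are incompatible.");
    }
    // Only update the column definition once validation has passed.
    it->second = new_type;
}
```

Note the ordering mirrors the statement code: every exception is thrown before any mutation of the schema, so a failed ALTER leaves the table untouched.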

View File

@@ -0,0 +1,87 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2015 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "cql3/statements/schema_altering_statement.hh"
#include "cql3/statements/cf_prop_defs.hh"
#include "cql3/cql3_type.hh"
namespace cql3 {
namespace statements {
class alter_table_statement : public schema_altering_statement {
public:
enum class type {
add,
alter,
drop,
opts,
rename,
};
using renames_type = std::vector<std::pair<shared_ptr<column_identifier::raw>,
shared_ptr<column_identifier::raw>>>;
private:
const type _type;
const shared_ptr<column_identifier::raw> _raw_column_name;
const shared_ptr<cql3_type::raw> _validator;
const shared_ptr<cf_prop_defs> _properties;
const renames_type _renames;
const bool _is_static;
public:
alter_table_statement(shared_ptr<cf_name> name,
type t,
shared_ptr<column_identifier::raw> column_name,
shared_ptr<cql3_type::raw> validator,
shared_ptr<cf_prop_defs> properties,
renames_type renames,
bool is_static);
virtual void check_access(const service::client_state& state) override;
virtual void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
virtual future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual shared_ptr<transport::event::schema_change> change_event() override;
};
}
}

View File

@@ -38,6 +38,7 @@
*/
#include "batch_statement.hh"
#include "db/config.hh"
namespace cql3 {
@@ -55,6 +56,50 @@ bool batch_statement::depends_on_column_family(const sstring& cf_name) const
return false;
}
void batch_statement::verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = service::get_local_storage_proxy().get_db().local().get_config().batch_size_warn_threshold_in_kb();
class my_partition_visitor : public mutation_partition_visitor {
public:
void accept_partition_tombstone(tombstone) override {}
void accept_static_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_static_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
void accept_row_tombstone(clustering_key_prefix_view, tombstone) override {}
void accept_row(clustering_key_view, tombstone, const row_marker&) override {}
void accept_row_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_row_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
size_t size = 0;
};
my_partition_visitor v;
for (auto& m : mutations) {
m.partition().accept(*m.schema(), v);
}
auto size = v.size / 1024;
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
_logger.warn(
"Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}",
join(", ", ks_cf_pairs), size, warn_threshold,
size - warn_threshold, "");
}
}
}
}
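The new `verify_batch_size()` above sums cell sizes with a `mutation_partition_visitor` and compares the kilobyte total against `batch_size_warn_threshold_in_kb`. A self-contained sketch of that accounting follows; `cell`, `mutation`, and `size_visitor` are simplified stand-ins for the Scylla types, and the threshold is taken in KB as in the config option.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-ins: a mutation is just a list of cells,
// and the visitor accumulates the byte size of every cell value.
struct cell { std::string value; };
struct mutation { std::vector<cell> cells; };

struct size_visitor {
    std::size_t size = 0;
    void accept_cell(const cell& c) { size += c.value.size(); }
};

bool batch_exceeds_warn_threshold(const std::vector<mutation>& mutations,
                                  std::size_t warn_threshold_kb) {
    size_visitor v;
    for (const auto& m : mutations) {
        for (const auto& c : m.cells) {
            v.accept_cell(c);
        }
    }
    // Compare in the same unit: convert the accumulated byte count to KB.
    return v.size / 1024 > warn_threshold_kb;
}
```

The unit conversion is the subtle part: the threshold is configured in kilobytes while the visitor counts bytes, so one side must be converted before the comparison.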

View File

@@ -196,27 +196,8 @@ public:
* Checks batch size to ensure threshold is met. If not, a warning is logged.
* @param cfs ColumnFamilies that will store the batch's mutations.
*/
static void verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = 1000; // FIXME: database_descriptor::get_batch_size_warn_threshold();
size_t fail_threshold = 2000; // FIXME: database_descriptor::get_batch_size_fail_threshold();
static void verify_batch_size(const std::vector<mutation>& mutations);
size_t size = mutations.size();
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
const char* format = "Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}";
if (size > fail_threshold) {
// FIXME: Tracing.trace(format, new Object[] {ksCfPairs, size, failThreshold, size - failThreshold, " (see batch_size_fail_threshold_in_kb)"});
_logger.error(format, join(", ", ks_cf_pairs), size, fail_threshold, size - fail_threshold, " (see batch_size_fail_threshold_in_kb)");
throw exceptions::invalid_request_exception("Batch too large");
} else {
_logger.warn(format, join(", ", ks_cf_pairs), size, warn_threshold, size - warn_threshold, "");
}
}
}
virtual future<shared_ptr<transport::messages::result_message>> execute(
distributed<service::storage_proxy>& storage, service::query_state& state, const query_options& options) override {
return execute(storage, state, options, false, options.get_timestamp(state));
@@ -322,7 +303,7 @@ public:
virtual future<shared_ptr<transport::messages::result_message>> execute_internal(
distributed<service::storage_proxy>& proxy,
service::query_state& query_state, const query_options& options) override {
throw "not implemented";
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
assert !hasConditions;
for (IMutation mutation : getMutations(BatchQueryOptions.withoutPerStatementVariables(options), true, queryState.getTimestamp()))

View File

@@ -139,6 +139,11 @@ std::map<sstring, sstring> cf_prop_defs::get_compression_options() const {
return std::map<sstring, sstring>{};
}
int32_t cf_prop_defs::get_default_time_to_live() const
{
return get_int(KW_DEFAULT_TIME_TO_LIVE, 0);
}
void cf_prop_defs::apply_to_builder(schema_builder& builder) {
if (has_property(KW_COMMENT)) {
builder.set_comment(get_string(KW_COMMENT, ""));

View File

@@ -100,6 +100,8 @@ public:
return options;
}
#endif
int32_t get_default_time_to_live() const;
void apply_to_builder(schema_builder& builder);
void validate_minimum_int(const sstring& field, int32_t minimum_value, int32_t default_value) const;
};

View File

@@ -81,7 +81,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
auto cd = schema->get_column_definition(target->column->name());
if (cd == nullptr) {
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", target->column->name()));
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", *target->column));
}
bool is_map = dynamic_cast<const collection_type_impl *>(cd->type.get()) != nullptr
@@ -93,7 +93,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint("Cannot create index on %s of frozen<map> column %s",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
} else {
// validateNotFullIndex
@@ -107,7 +107,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s; only non-frozen collections support %s indexes",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(target->type)));
}
// validateTargetColumnIsMapIfIndexInvolvesKeys
@@ -118,7 +118,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s with non-map type",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
}
@@ -132,9 +132,9 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
"Cannot create index on %s(%s): an index on %s(%s) already exists and indexing "
"a map on more than one dimension at the same time is not currently supported",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(prev_type),
target->column->name()));
*target->column));
}
if (_if_not_exists) {
return;
@@ -164,12 +164,13 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint(
"Cannot create secondary index on partition key column %s",
target->column->name()));
*target->column));
}
}
future<bool>
cql3::statements::create_index_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) {
throw std::runtime_error("Indexes are not supported yet");
auto schema = proxy.local().get_db().local().find_schema(keyspace(), column_family());
auto target = _raw_target->prepare(schema);

View File

@@ -45,6 +45,14 @@ namespace cql3 {
namespace statements {
delete_statement::delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
bool delete_statement::require_full_clustering_key() const {
return false;
}
void delete_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (_column_operations.empty()) {
m.partition().apply_delete(*s, prefix, params.make_tombstone());
@@ -96,5 +104,17 @@ delete_statement::parsed::prepare_internal(database& db, schema_ptr schema, ::sh
return stmt;
}
delete_statement::parsed::parsed(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<operation::raw_deletion>> deletions,
std::vector<::shared_ptr<relation>> where_clause,
conditions_vector conditions,
bool if_exists)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, if_exists)
, _deletions(std::move(deletions))
, _where_clause(std::move(where_clause))
{ }
}
}

View File

@@ -55,13 +55,9 @@ namespace statements {
*/
class delete_statement : public modification_statement {
public:
delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs);
virtual bool require_full_clustering_key() const override {
return false;
}
virtual bool require_full_clustering_key() const override;
virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override;
@@ -94,11 +90,7 @@ public:
std::vector<::shared_ptr<operation::raw_deletion>> deletions,
std::vector<::shared_ptr<relation>> where_clause,
conditions_vector conditions,
bool if_exists)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, if_exists)
, _deletions(std::move(deletions))
, _where_clause(std::move(where_clause))
{ }
bool if_exists);
protected:
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs);

View File

@@ -71,6 +71,81 @@ operator<<(std::ostream& out, modification_statement::statement_type t) {
return out;
}
modification_statement::modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_)
: type{type_}
, _bound_terms{bound_terms}
, s{schema_}
, attrs{std::move(attrs_)}
, _column_operations{}
{ }
bool modification_statement::uses_function(const sstring& ks_name, const sstring& function_name) const {
if (attrs->uses_function(ks_name, function_name)) {
return true;
}
for (auto&& e : _processed_keys) {
auto r = e.second;
if (r && r->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& operation : _column_operations) {
if (operation && operation->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _column_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _static_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
return false;
}
uint32_t modification_statement::get_bound_terms() {
return _bound_terms;
}
sstring modification_statement::keyspace() const {
return s->ks_name();
}
sstring modification_statement::column_family() const {
return s->cf_name();
}
bool modification_statement::is_counter() const {
return s->is_counter();
}
int64_t modification_statement::get_timestamp(int64_t now, const query_options& options) const {
return attrs->get_timestamp(now, options);
}
bool modification_statement::is_timestamp_set() const {
return attrs->is_timestamp_set();
}
gc_clock::duration modification_statement::get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
}
void modification_statement::check_access(const service::client_state& state) {
warn(unimplemented::cause::PERMISSIONS);
#if 0
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.MODIFY);
// CAS updates can be used to simulate a SELECT query, so should require Permission.SELECT as well.
if (hasConditions())
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
#endif
}
future<std::vector<mutation>>
modification_statement::get_mutations(distributed<service::storage_proxy>& proxy, const query_options& options, bool local, int64_t now) {
auto keys = make_lw_shared(build_partition_keys(options));
@@ -195,7 +270,7 @@ modification_statement::read_required_rows(
for (auto&& pk : *keys) {
pr.emplace_back(dht::global_partitioner().decorate_key(*s, pk));
}
query::read_command cmd(s->id(), ps, std::numeric_limits<uint32_t>::max());
query::read_command cmd(s->id(), s->version(), ps, std::numeric_limits<uint32_t>::max());
// FIXME: ignoring "local"
return proxy.local().query(s, make_lw_shared(std::move(cmd)), std::move(pr), cl).then([this, ps] (auto result) {
// FIXME: copying
@@ -549,6 +624,63 @@ bool modification_statement::depends_on_column_family(const sstring& cf_name) co
return column_family() == cf_name;
}
void modification_statement::add_operation(::shared_ptr<operation> op) {
if (op->column.is_static()) {
_sets_static_columns = true;
} else {
_sets_regular_columns = true;
}
_column_operations.push_back(std::move(op));
}
void modification_statement::add_condition(::shared_ptr<column_condition> cond) {
if (cond->column.is_static()) {
_sets_static_columns = true;
_static_conditions.emplace_back(std::move(cond));
} else {
_sets_regular_columns = true;
_column_conditions.emplace_back(std::move(cond));
}
}
void modification_statement::set_if_not_exist_condition() {
_if_not_exists = true;
}
bool modification_statement::has_if_not_exist_condition() const {
return _if_not_exists;
}
void modification_statement::set_if_exist_condition() {
_if_exists = true;
}
bool modification_statement::has_if_exist_condition() const {
return _if_exists;
}
bool modification_statement::requires_read() {
return std::any_of(_column_operations.begin(), _column_operations.end(), [] (auto&& op) {
return op->requires_read();
});
}
bool modification_statement::has_conditions() {
return _if_not_exists || _if_exists || !_column_conditions.empty() || !_static_conditions.empty();
}
void modification_statement::validate_where_clause_for_conditions() {
// no-op by default
}
modification_statement::parsed::parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists)
: cf_statement{std::move(name)}
, _attrs{std::move(attrs)}
, _conditions{std::move(conditions)}
, _if_not_exists{if_not_exists}
, _if_exists{if_exists}
{ }
}
}

View File

@@ -107,84 +107,29 @@ private:
};
public:
modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_)
: type{type_}
, _bound_terms{bound_terms}
, s{schema_}
, attrs{std::move(attrs_)}
, _column_operations{}
{ }
modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_);
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override {
if (attrs->uses_function(ks_name, function_name)) {
return true;
}
for (auto&& e : _processed_keys) {
auto r = e.second;
if (r && r->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& operation : _column_operations) {
if (operation && operation->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _column_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _static_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
return false;
}
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override;
virtual bool require_full_clustering_key() const = 0;
virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) = 0;
virtual uint32_t get_bound_terms() override {
return _bound_terms;
}
virtual uint32_t get_bound_terms() override;
virtual sstring keyspace() const {
return s->ks_name();
}
virtual sstring keyspace() const;
virtual sstring column_family() const {
return s->cf_name();
}
virtual sstring column_family() const;
virtual bool is_counter() const {
return s->is_counter();
}
virtual bool is_counter() const;
int64_t get_timestamp(int64_t now, const query_options& options) const {
return attrs->get_timestamp(now, options);
}
int64_t get_timestamp(int64_t now, const query_options& options) const;
bool is_timestamp_set() const {
return attrs->is_timestamp_set();
}
bool is_timestamp_set() const;
gc_clock::duration get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
}
gc_clock::duration get_time_to_live(const query_options& options) const;
virtual void check_access(const service::client_state& state) override {
warn(unimplemented::cause::PERMISSIONS);
#if 0
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.MODIFY);
// CAS updates can be used to simulate a SELECT query, so should require Permission.SELECT as well.
if (hasConditions())
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
#endif
}
virtual void check_access(const service::client_state& state) override;
void validate(distributed<service::storage_proxy>&, const service::client_state& state) override;
@@ -192,14 +137,7 @@ public:
virtual bool depends_on_column_family(const sstring& cf_name) const override;
void add_operation(::shared_ptr<operation> op) {
if (op->column.is_static()) {
_sets_static_columns = true;
} else {
_sets_regular_columns = true;
}
_column_operations.push_back(std::move(op));
}
void add_operation(::shared_ptr<operation> op);
#if 0
public Iterable<ColumnDefinition> getColumnsWithConditions()
@@ -212,31 +150,15 @@ public:
}
#endif
public:
void add_condition(::shared_ptr<column_condition> cond) {
if (cond->column.is_static()) {
_sets_static_columns = true;
_static_conditions.emplace_back(std::move(cond));
} else {
_sets_regular_columns = true;
_column_conditions.emplace_back(std::move(cond));
}
}
void add_condition(::shared_ptr<column_condition> cond);
void set_if_not_exist_condition() {
_if_not_exists = true;
}
void set_if_not_exist_condition();
bool has_if_not_exist_condition() const {
return _if_not_exists;
}
bool has_if_not_exist_condition() const;
void set_if_exist_condition() {
_if_exists = true;
}
void set_if_exist_condition();
bool has_if_exist_condition() const {
return _if_exists;
}
bool has_if_exist_condition() const;
private:
void add_key_values(const column_definition& def, ::shared_ptr<restrictions::restriction> values);
@@ -254,11 +176,7 @@ protected:
const column_definition* get_first_empty_key();
public:
bool requires_read() {
return std::any_of(_column_operations.begin(), _column_operations.end(), [] (auto&& op) {
return op->requires_read();
});
}
bool requires_read();
protected:
future<update_parameters::prefetched_rows_type> read_required_rows(
@@ -269,9 +187,7 @@ protected:
db::consistency_level cl);
public:
bool has_conditions() {
return _if_not_exists || _if_exists || !_column_conditions.empty() || !_static_conditions.empty();
}
bool has_conditions();
virtual future<::shared_ptr<transport::messages::result_message>>
execute(distributed<service::storage_proxy>& proxy, service::query_state& qs, const query_options& options) override;
@@ -428,9 +344,7 @@ protected:
* processed to check that they are compatible.
* @throws InvalidRequestException
*/
virtual void validate_where_clause_for_conditions() {
// no-op by default
}
virtual void validate_where_clause_for_conditions();
public:
class parsed : public cf_statement {
@@ -443,13 +357,7 @@ public:
const bool _if_not_exists;
const bool _if_exists;
protected:
parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists)
: cf_statement{std::move(name)}
, _attrs{std::move(attrs)}
, _conditions{std::move(conditions)}
, _if_not_exists{if_not_exists}
, _if_exists{if_exists}
{ }
parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists);
public:
virtual ::shared_ptr<parsed_statement::prepared> prepare(database& db) override;

View File

@@ -0,0 +1,83 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2014 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/statements/parsed_statement.hh"
namespace cql3 {
namespace statements {
parsed_statement::~parsed_statement()
{ }
shared_ptr<variable_specifications> parsed_statement::get_bound_variables() {
return _variables;
}
// Used by the parser and preparable statement
void parsed_statement::set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names) {
_variables = ::make_shared<variable_specifications>(bound_names);
}
bool parsed_statement::uses_function(const sstring& ks_name, const sstring& function_name) const {
return false;
}
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_)
: statement(std::move(statement_))
, bound_names(std::move(bound_names_))
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names)
: prepared(statement_, names.get_specifications())
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names)
: prepared(statement_, std::move(names).get_specifications())
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement>&& statement_)
: prepared(statement_, std::vector<::shared_ptr<column_specification>>())
{ }
}
}

View File

@@ -60,47 +60,29 @@ private:
::shared_ptr<variable_specifications> _variables;
public:
virtual ~parsed_statement()
{ }
virtual ~parsed_statement();
shared_ptr<variable_specifications> get_bound_variables() {
return _variables;
}
shared_ptr<variable_specifications> get_bound_variables();
// Used by the parser and preparable statement
void set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names)
{
_variables = ::make_shared<variable_specifications>(bound_names);
}
void set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names);
class prepared {
public:
const ::shared_ptr<cql_statement> statement;
const std::vector<::shared_ptr<column_specification>> bound_names;
prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_)
: statement(std::move(statement_))
, bound_names(std::move(bound_names_))
{ }
prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_);
prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names)
: prepared(statement_, names.get_specifications())
{ }
prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names);
prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names)
: prepared(statement_, std::move(names).get_specifications())
{ }
prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names);
prepared(::shared_ptr<cql_statement>&& statement_)
: prepared(statement_, std::vector<::shared_ptr<column_specification>>())
{ }
prepared(::shared_ptr<cql_statement>&& statement_);
};
virtual ::shared_ptr<prepared> prepare(database& db) = 0;
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const {
return false;
}
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const;
};
}

View File

@@ -0,0 +1,186 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2015 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/statements/property_definitions.hh"
namespace cql3 {
namespace statements {
property_definitions::property_definitions()
: _properties{}
{ }
void property_definitions::add_property(const sstring& name, sstring value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definitions for property '%s'", name));
}
_properties.emplace(name, value);
}
void property_definitions::add_property(const sstring& name, const std::map<sstring, sstring>& value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definitions for property '%s'", name));
}
_properties.emplace(name, value);
}
void property_definitions::validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete) {
for (auto&& kv : _properties) {
auto&& name = kv.first;
if (keywords.count(name)) {
continue;
}
if (obsolete.count(name)) {
#if 0
logger.warn("Ignoring obsolete property {}", name);
#endif
} else {
throw exceptions::syntax_exception(sprint("Unknown property '%s'", name));
}
}
}
std::experimental::optional<sstring> property_definitions::get_simple(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<sstring>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a string.", name));
}
}
std::experimental::optional<std::map<sstring, sstring>> property_definitions::get_map(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<std::map<sstring, sstring>>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a map.", name));
}
}
bool property_definitions::has_property(const sstring& name) const {
return _properties.find(name) != _properties.end();
}
sstring property_definitions::get_string(sstring key, sstring default_value) const {
auto value = get_simple(key);
if (value) {
return value.value();
} else {
return default_value;
}
}
// Return a property value, typed as a Boolean
bool property_definitions::get_boolean(sstring key, bool default_value) const {
auto value = get_simple(key);
if (value) {
std::string s{value.value()};
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s == "1" || s == "true" || s == "yes";
} else {
return default_value;
}
}
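The `get_boolean` accessor above normalizes the raw property string by lower-casing it and then accepting the spellings `"1"`, `"true"`, and `"yes"`. A minimal standalone sketch of that normalization, using `std::string` in place of `sstring` (the function name here is illustrative):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case the value, then accept the same truthy spellings as
// property_definitions::get_boolean: "1", "true", "yes".
static bool parse_boolean(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s == "1" || s == "true" || s == "yes";
}
```

Passing the character through `unsigned char` before `std::tolower` avoids undefined behavior for negative `char` values, which the original `::tolower` call relies on the input being plain ASCII to sidestep.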
// Return a property value, typed as a double
double property_definitions::get_double(sstring key, double default_value) const {
auto value = get_simple(key);
return to_double(key, value, default_value);
}
double property_definitions::to_double(sstring key, std::experimental::optional<sstring> value, double default_value) {
if (value) {
auto val = value.value();
try {
return std::stod(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid double value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
// Return a property value, typed as an Integer
int32_t property_definitions::get_int(sstring key, int32_t default_value) const {
auto value = get_simple(key);
return to_int(key, value, default_value);
}
int32_t property_definitions::to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value) {
if (value) {
auto val = value.value();
try {
return std::stoi(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid integer value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
long property_definitions::to_long(sstring key, std::experimental::optional<sstring> value, long default_value) {
if (value) {
auto val = value.value();
try {
return std::stol(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid long value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
}
}
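All of the typed accessors in this file follow one pattern: look up the raw string, fall back to a caller-supplied default when absent, and otherwise convert with a `std::` parse function, rethrowing conversion failures as a syntax error naming the key. A self-contained sketch of that pattern, substituting `std::map`, `std::optional`, and `std::invalid_argument` for the `sstring`/`boost::any`/`syntax_exception` machinery (class and method names are illustrative):

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Illustrative stand-in for property_definitions: raw values are strings,
// typed getters convert on demand and fall back to a default.
class properties {
    std::map<std::string, std::string> _props;
public:
    // Reject duplicate definitions, as add_property does above.
    void add(const std::string& name, std::string value) {
        if (!_props.emplace(name, std::move(value)).second) {
            throw std::invalid_argument(
                "Multiple definition for property '" + name + "'");
        }
    }
    std::optional<std::string> get_simple(const std::string& name) const {
        auto it = _props.find(name);
        if (it == _props.end()) {
            return std::nullopt;
        }
        return it->second;
    }
    int get_int(const std::string& key, int default_value) const {
        auto value = get_simple(key);
        if (!value) {
            return default_value;
        }
        try {
            return std::stoi(*value);
        } catch (const std::exception&) {
            throw std::invalid_argument(
                "Invalid integer value " + *value + " for '" + key + "'");
        }
    }
};
```

`std::stoi` throws `std::invalid_argument` or `std::out_of_range`, both derived from `std::exception`, so a single catch clause covers both failure modes, mirroring the `catch (const std::exception&)` in `to_int` above.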


@@ -66,141 +66,38 @@ protected:
#endif
std::unordered_map<sstring, boost::any> _properties;
property_definitions()
: _properties{}
{ }
property_definitions();
public:
void add_property(const sstring& name, sstring value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void add_property(const sstring& name, sstring value);
void add_property(const sstring& name, const std::map<sstring, sstring>& value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void add_property(const sstring& name, const std::map<sstring, sstring>& value);
void validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete);
void validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete) {
for (auto&& kv : _properties) {
auto&& name = kv.first;
if (keywords.count(name)) {
continue;
}
if (obsolete.count(name)) {
#if 0
logger.warn("Ignoring obsolete property {}", name);
#endif
} else {
throw exceptions::syntax_exception(sprint("Unknown property '%s'", name));
}
}
}
protected:
std::experimental::optional<sstring> get_simple(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<sstring>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a string", name));
}
}
std::experimental::optional<sstring> get_simple(const sstring& name) const;
std::experimental::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
std::experimental::optional<std::map<sstring, sstring>> get_map(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<std::map<sstring, sstring>>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a map.", name));
}
}
public:
bool has_property(const sstring& name) const {
return _properties.find(name) != _properties.end();
}
bool has_property(const sstring& name) const;
sstring get_string(sstring key, sstring default_value) const {
auto value = get_simple(key);
if (value) {
return value.value();
} else {
return default_value;
}
}
sstring get_string(sstring key, sstring default_value) const;
// Return a property value, typed as a Boolean
bool get_boolean(sstring key, bool default_value) const {
auto value = get_simple(key);
if (value) {
std::string s{value.value()};
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s == "1" || s == "true" || s == "yes";
} else {
return default_value;
}
}
bool get_boolean(sstring key, bool default_value) const;
// Return a property value, typed as a double
double get_double(sstring key, double default_value) const {
auto value = get_simple(key);
return to_double(key, value, default_value);
}
double get_double(sstring key, double default_value) const;
static double to_double(sstring key, std::experimental::optional<sstring> value, double default_value) {
if (value) {
auto val = value.value();
try {
return std::stod(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid double value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static double to_double(sstring key, std::experimental::optional<sstring> value, double default_value);
// Return a property value, typed as an Integer
int32_t get_int(sstring key, int32_t default_value) const {
auto value = get_simple(key);
return to_int(key, value, default_value);
}
int32_t get_int(sstring key, int32_t default_value) const;
static int32_t to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value) {
if (value) {
auto val = value.value();
try {
return std::stoi(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid integer value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static int32_t to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value);
static long to_long(sstring key, std::experimental::optional<sstring> value, long default_value) {
if (value) {
auto val = value.value();
try {
return std::stol(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid long value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static long to_long(sstring key, std::experimental::optional<sstring> value, long default_value);
};
}


@@ -54,6 +54,31 @@ namespace statements {
thread_local const shared_ptr<select_statement::parameters> select_statement::_default_parameters = ::make_shared<select_statement::parameters>();
select_statement::parameters::parameters()
: _is_distinct{false}
, _allow_filtering{false}
{ }
select_statement::parameters::parameters(orderings_type orderings,
bool is_distinct,
bool allow_filtering)
: _orderings{std::move(orderings)}
, _is_distinct{is_distinct}
, _allow_filtering{allow_filtering}
{ }
bool select_statement::parameters::is_distinct() {
return _is_distinct;
}
bool select_statement::parameters::allow_filtering() {
return _allow_filtering;
}
select_statement::parameters::orderings_type const& select_statement::parameters::orderings() {
return _orderings;
}
select_statement::select_statement(schema_ptr schema,
uint32_t bound_terms,
::shared_ptr<parameters> parameters,
@@ -115,6 +140,14 @@ bool select_statement::depends_on_column_family(const sstring& cf_name) const {
return column_family() == cf_name;
}
const sstring& select_statement::keyspace() const {
return _schema->ks_name();
}
const sstring& select_statement::column_family() const {
return _schema->cf_name();
}
query::partition_slice
select_statement::make_partition_slice(const query_options& options) {
std::vector<column_id> static_columns;
@@ -185,7 +218,8 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit, to_gc_clock(now));
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit, to_gc_clock(now));
int32_t page_size = options.get_page_size();
@@ -275,7 +309,8 @@ future<::shared_ptr<transport::messages::result_message>>
select_statement::execute_internal(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit);
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit);
auto partition_ranges = _restrictions->get_partition_key_ranges(options);
if (needs_post_query_ordering() && _limit) {
@@ -318,6 +353,18 @@ shared_ptr<transport::messages::result_message> select_statement::process_result
return ::make_shared<transport::messages::result_message::rows>(std::move(rs));
}
select_statement::raw_statement::raw_statement(::shared_ptr<cf_name> cf_name,
::shared_ptr<parameters> parameters,
std::vector<::shared_ptr<selection::raw_selector>> select_clause,
std::vector<::shared_ptr<relation>> where_clause,
::shared_ptr<term::raw> limit)
: cf_statement(std::move(cf_name))
, _parameters(std::move(parameters))
, _select_clause(std::move(select_clause))
, _where_clause(std::move(where_clause))
, _limit(std::move(limit))
{ }
::shared_ptr<parsed_statement::prepared>
select_statement::raw_statement::prepare(database& db) {
schema_ptr schema = validation::validate_column_family(db, keyspace(), column_family());


@@ -72,20 +72,13 @@ public:
const bool _is_distinct;
const bool _allow_filtering;
public:
parameters()
: _is_distinct{false}
, _allow_filtering{false}
{ }
parameters();
parameters(orderings_type orderings,
bool is_distinct,
bool allow_filtering)
: _orderings{std::move(orderings)}
, _is_distinct{is_distinct}
, _allow_filtering{allow_filtering}
{ }
bool is_distinct() { return _is_distinct; }
bool allow_filtering() { return _allow_filtering; }
orderings_type const& orderings() { return _orderings; }
bool allow_filtering);
bool is_distinct();
bool allow_filtering();
orderings_type const& orderings();
};
private:
static constexpr int DEFAULT_COUNT_PAGE_SIZE = 10000;
@@ -195,13 +188,9 @@ public:
}
#endif
const sstring& keyspace() const {
return _schema->ks_name();
}
const sstring& keyspace() const;
const sstring& column_family() const {
return _schema->cf_name();
}
const sstring& column_family() const;
query::partition_slice make_partition_slice(const query_options& options);
@@ -457,13 +446,7 @@ public:
::shared_ptr<parameters> parameters,
std::vector<::shared_ptr<selection::raw_selector>> select_clause,
std::vector<::shared_ptr<relation>> where_clause,
::shared_ptr<term::raw> limit)
: cf_statement(std::move(cf_name))
, _parameters(std::move(parameters))
, _select_clause(std::move(select_clause))
, _where_clause(std::move(where_clause))
, _limit(std::move(limit))
{ }
::shared_ptr<term::raw> limit);
virtual ::shared_ptr<prepared> prepare(database& db) override;
private:


@@ -48,6 +48,14 @@ namespace cql3 {
namespace statements {
update_statement::update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
bool update_statement::require_full_clustering_key() const {
return true;
}
void update_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (s->is_dense()) {
if (!prefix || (prefix.size() == 1 && prefix.components().front().empty())) {
@@ -100,6 +108,16 @@ void update_statement::add_update_for_key(mutation& m, const exploded_clustering
#endif
}
update_statement::parsed_insert::parsed_insert(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<column_identifier::raw>> column_names,
std::vector<::shared_ptr<term::raw>> column_values,
bool if_not_exists)
: modification_statement::parsed{std::move(name), std::move(attrs), conditions_vector{}, if_not_exists, false}
, _column_names{std::move(column_names)}
, _column_values{std::move(column_values)}
{ }
::shared_ptr<modification_statement>
update_statement::parsed_insert::prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs)
@@ -148,6 +166,16 @@ update_statement::parsed_insert::prepare_internal(database& db, schema_ptr schem
return stmt;
}
update_statement::parsed_update::parsed_update(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
std::vector<relation_ptr> where_clause,
conditions_vector conditions)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, false)
, _updates(std::move(updates))
, _where_clause(std::move(where_clause))
{ }
::shared_ptr<modification_statement>
update_statement::parsed_update::prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs)


@@ -64,14 +64,9 @@ public:
private static final Constants.Value EMPTY = new Constants.Value(ByteBufferUtil.EMPTY_BYTE_BUFFER);
#endif
update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs);
private:
virtual bool require_full_clustering_key() const override {
return true;
}
virtual bool require_full_clustering_key() const override;
virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override;
public:
@@ -92,11 +87,7 @@ public:
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<column_identifier::raw>> column_names,
std::vector<::shared_ptr<term::raw>> column_values,
bool if_not_exists)
: modification_statement::parsed{std::move(name), std::move(attrs), conditions_vector{}, if_not_exists, false}
, _column_names{std::move(column_names)}
, _column_values{std::move(column_values)}
{ }
bool if_not_exists);
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs) override;
@@ -122,11 +113,7 @@ public:
::shared_ptr<attributes::raw> attrs,
std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
std::vector<relation_ptr> where_clause,
conditions_vector conditions)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, false)
, _updates(std::move(updates))
, _where_clause(std::move(where_clause))
{ }
conditions_vector conditions);
protected:
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs);


@@ -224,14 +224,6 @@ public:
// We don't "need" that override but it saves us the allocation of a Value object if used
return options.make_temporary(_type->build_value(bind_internal(options)));
}
#if 0
@Override
public String toString()
{
return tupleToString(elements);
}
#endif
};
/**


@@ -88,14 +88,6 @@ public:
}
_specs[bind_index] = spec;
}
#if 0
@Override
public String toString()
{
return Arrays.toString(specs);
}
#endif
};
}


@@ -57,6 +57,7 @@
#include <seastar/core/enum.hh>
#include "utils/latency.hh"
#include "utils/flush_queue.hh"
#include "schema_registry.hh"
using namespace std::chrono_literals;
@@ -126,8 +127,8 @@ column_family::make_partition_presence_checker(lw_shared_ptr<sstable_list> old_s
mutation_source
column_family::sstables_as_mutation_source() {
return [this] (const query::partition_range& r) {
return make_sstable_reader(r);
return [this] (schema_ptr s, const query::partition_range& r) {
return make_sstable_reader(std::move(s), r);
};
}
@@ -206,16 +207,16 @@ public:
};
mutation_reader
column_family::make_sstable_reader(const query::partition_range& pr) const {
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr) const {
if (pr.is_singular() && pr.start()->value().has_key()) {
const dht::ring_position& pos = pr.start()->value();
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return make_mutation_reader<single_key_sstable_reader>(_schema, _sstables, *pos.key());
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key());
} else {
// range_sstable_reader is not movable so we need to wrap it
return make_mutation_reader<range_sstable_reader>(_schema, _sstables, pr);
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr);
}
}
@@ -239,9 +240,9 @@ key_source column_family::sstables_as_key_source() const {
// Exposed for testing, not performance critical.
future<column_family::const_mutation_partition_ptr>
column_family::find_partition(const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [this] (auto& range) {
return do_with(this->make_reader(range), [] (mutation_reader& reader) {
column_family::find_partition(schema_ptr s, const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [s = std::move(s), this] (auto& range) {
return do_with(this->make_reader(s, range), [] (mutation_reader& reader) {
return reader().then([] (mutation_opt&& mo) -> std::unique_ptr<const mutation_partition> {
if (!mo) {
return {};
@@ -253,13 +254,13 @@ column_family::find_partition(const dht::decorated_key& key) const {
}
future<column_family::const_mutation_partition_ptr>
column_family::find_partition_slow(const partition_key& key) const {
return find_partition(dht::global_partitioner().decorate_key(*_schema, key));
column_family::find_partition_slow(schema_ptr s, const partition_key& key) const {
return find_partition(s, dht::global_partitioner().decorate_key(*s, key));
}
future<column_family::const_row_ptr>
column_family::find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(std::move(s), partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
if (!p) {
return make_ready_future<const_row_ptr>();
}
@@ -274,8 +275,8 @@ column_family::find_row(const dht::decorated_key& partition_key, clustering_key
}
mutation_reader
column_family::make_reader(const query::partition_range& range) const {
if (query::is_wrap_around(range, *_schema)) {
column_family::make_reader(schema_ptr s, const query::partition_range& range) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
}
@@ -304,13 +305,13 @@ column_family::make_reader(const query::partition_range& range) const {
// https://github.com/scylladb/scylla/issues/185
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(range));
readers.emplace_back(mt->make_reader(s, range));
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(range));
readers.emplace_back(_cache.make_reader(s, range));
} else {
readers.emplace_back(make_sstable_reader(range));
readers.emplace_back(make_sstable_reader(s, range));
}
return make_combined_reader(std::move(readers));
@@ -318,7 +319,7 @@ column_family::make_reader(const query::partition_range& range) const {
template <typename Func>
future<bool>
column_family::for_all_partitions(Func&& func) const {
column_family::for_all_partitions(schema_ptr s, Func&& func) const {
static_assert(std::is_same<bool, std::result_of_t<Func(const dht::decorated_key&, const mutation_partition&)>>::value,
"bad Func signature");
@@ -329,13 +330,13 @@ column_family::for_all_partitions(Func&& func) const {
bool empty = false;
public:
bool done() const { return !ok || empty; }
iteration_state(const column_family& cf, Func&& func)
: reader(cf.make_reader())
iteration_state(schema_ptr s, const column_family& cf, Func&& func)
: reader(cf.make_reader(std::move(s)))
, func(std::move(func))
{ }
};
return do_with(iteration_state(*this, std::move(func)), [] (iteration_state& is) {
return do_with(iteration_state(std::move(s), *this, std::move(func)), [] (iteration_state& is) {
return do_until([&is] { return is.done(); }, [&is] {
return is.reader().then([&is](mutation_opt&& mo) {
if (!mo) {
@@ -351,30 +352,39 @@ column_family::for_all_partitions(Func&& func) const {
}
future<bool>
column_family::for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(func));
column_family::for_all_partitions_slow(schema_ptr s, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(s), std::move(func));
}
class lister {
public:
using dir_entry_types = std::unordered_set<directory_entry_type, enum_hash<directory_entry_type>>;
using walker_type = std::function<future<> (directory_entry)>;
using filter_type = std::function<bool (const sstring&)>;
private:
file _f;
std::function<future<> (directory_entry de)> _walker;
walker_type _walker;
filter_type _filter;
dir_entry_types _expected_type;
subscription<directory_entry> _listing;
sstring _dirname;
public:
lister(file f, dir_entry_types type, std::function<future<> (directory_entry)> walker, sstring dirname)
lister(file f, dir_entry_types type, walker_type walker, sstring dirname)
: _f(std::move(f))
, _walker(std::move(walker))
, _filter([] (const sstring& fname) { return true; })
, _expected_type(type)
, _listing(_f.list_directory([this] (directory_entry de) { return _visit(de); }))
, _dirname(dirname) {
}
static future<> scan_dir(sstring name, dir_entry_types type, std::function<future<> (directory_entry)> walker);
lister(file f, dir_entry_types type, walker_type walker, filter_type filter, sstring dirname)
: lister(std::move(f), type, std::move(walker), dirname) {
_filter = std::move(filter);
}
static future<> scan_dir(sstring name, dir_entry_types type, walker_type walker, filter_type filter = [] (const sstring& fname) { return true; });
protected:
future<> _visit(directory_entry de) {
@@ -383,6 +393,12 @@ protected:
if ((!_expected_type.count(*(de.type))) || (de.name[0] == '.')) {
return make_ready_future<>();
}
// apply a filter
if (!_filter(_dirname + "/" + de.name)) {
return make_ready_future<>();
}
return _walker(de);
});
@@ -403,9 +419,9 @@ private:
};
future<> lister::scan_dir(sstring name, lister::dir_entry_types type, std::function<future<> (directory_entry)> walker) {
return engine().open_directory(name).then([type, walker = std::move(walker), name] (file f) {
auto l = make_lw_shared<lister>(std::move(f), type, walker, name);
future<> lister::scan_dir(sstring name, lister::dir_entry_types type, walker_type walker, filter_type filter) {
return engine().open_directory(name).then([type, walker = std::move(walker), filter = std::move(filter), name] (file f) {
auto l = make_lw_shared<lister>(std::move(f), type, walker, filter, name);
return l->done().then([l] { });
});
}
@@ -416,6 +432,23 @@ static std::vector<sstring> parse_fname(sstring filename) {
return comps;
}
static bool belongs_to_current_shard(const schema& s, const partition_key& first, const partition_key& last) {
auto key_shard = [&s] (const partition_key& pk) {
auto token = dht::global_partitioner().get_token(s, pk);
return dht::shard_of(token);
};
auto s1 = key_shard(first);
auto s2 = key_shard(last);
auto me = engine().cpu_id();
return (s1 <= me) && (me <= s2);
}
static bool belongs_to_current_shard(const schema& s, range<partition_key> r) {
assert(r.start());
assert(r.end());
return belongs_to_current_shard(s, r.start()->value(), r.end()->value());
}
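`belongs_to_current_shard` above decides whether an sstable is relevant to the local shard: it maps the sstable's first and last partition keys to their owning shards and checks that the current CPU id falls inside that interval (valid because keys in an sstable are token-ordered and shard assignment is monotonic in token). A simplified standalone sketch of the interval test; the toy `shard_of` below is a deterministic stand-in for the real token-based `dht::shard_of`:

```cpp
#include <string>

// Toy stand-in for dht::shard_of(token): map the key's leading byte to one
// of `smp_count` shards. Real Scylla derives this from the partitioner token.
static unsigned shard_of(const std::string& key, unsigned smp_count) {
    return key.empty() ? 0u
                       : static_cast<unsigned char>(key[0]) % smp_count;
}

// An sstable whose partition keys span [first, last] is relevant to shard
// `me` iff `me` lies between the shards owning the boundary keys.
static bool belongs_to_shard(const std::string& first, const std::string& last,
                             unsigned me, unsigned smp_count) {
    unsigned s1 = shard_of(first, smp_count);
    unsigned s2 = shard_of(last, smp_count);
    return s1 <= me && me <= s2;
}
```

When the test fails, the patch marks the sstable for deletion instead of loading it, which is how each shard of a restarted node discards the other shards' sstables without coordination.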
future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sstring fname) {
using namespace sstables;
@@ -432,19 +465,32 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
update_sstables_known_generation(comps.generation);
assert(_sstables->count(comps.generation) == 0);
auto sst = std::make_unique<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
auto fut = sst->load();
return std::move(fut).then([this, sst = std::move(sst)] () mutable {
add_sstable(std::move(*sst));
return make_ready_future<>();
}).then_wrapped([fname, comps = std::move(comps)] (future<> f) {
auto fut = sstable::get_sstable_key_range(*_schema, _schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return std::move(fut).then([this, sstdir = std::move(sstdir), comps] (range<partition_key> r) {
// Checks whether or not sstable belongs to current shard.
if (!belongs_to_current_shard(*_schema, std::move(r))) {
dblog.debug("sstable {} not relevant for this shard, ignoring",
sstables::sstable::filename(sstdir, _schema->ks_name(), _schema->cf_name(), comps.version, comps.generation, comps.format,
sstables::sstable::component_type::Data));
sstable::mark_sstable_for_deletion(_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return make_ready_future<>();
}
auto sst = std::make_unique<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
auto fut = sst->load();
return std::move(fut).then([this, sst = std::move(sst)] () mutable {
add_sstable(std::move(*sst));
return make_ready_future<>();
});
}).then_wrapped([fname, comps] (future<> f) {
try {
f.get();
} catch (malformed_sstable_exception& e) {
dblog.error("malformed sstable {}: {}. Refusing to boot", fname, e.what());
throw;
} catch(...) {
dblog.error("Unrecognized error while processing {}: Refusing to boot", fname);
dblog.error("Unrecognized error while processing {}: {}. Refusing to boot",
fname, std::current_exception());
throw;
}
return make_ready_future<entry_descriptor>(std::move(comps));
@@ -462,19 +508,6 @@ void column_family::add_sstable(sstables::sstable&& sstable) {
}
void column_family::add_sstable(lw_shared_ptr<sstables::sstable> sstable) {
auto key_shard = [this] (const partition_key& pk) {
auto token = dht::global_partitioner().get_token(*_schema, pk);
return dht::shard_of(token);
};
auto s1 = key_shard(sstable->get_first_partition_key(*_schema));
auto s2 = key_shard(sstable->get_last_partition_key(*_schema));
auto me = engine().cpu_id();
auto included = (s1 <= me) && (me <= s2);
if (!included) {
dblog.info("sstable {} not relevant for this shard, ignoring", sstable->get_filename());
sstable->mark_for_deletion();
return;
}
auto generation = sstable->generation();
// allow in-progress reads to continue using old list
_sstables = make_lw_shared<sstable_list>(*_sstables);
@@ -658,7 +691,7 @@ column_family::reshuffle_sstables(int64_t start) {
// Those SSTables are not known by anyone in the system. So we don't have any kind of
// object describing them. There isn't too much of a choice.
return work.sstables[comps.generation]->read_toc();
}).then([&work] {
}, &manifest_json_filter).then([&work] {
// Note: cannot be parallel because we will be shuffling things around at this stage. Can't race.
return do_for_each(work.sstables, [&work] (auto& pair) {
auto&& comps = std::move(work.descriptors.at(pair.first));
@@ -718,21 +751,22 @@ column_family::compact_sstables(sstables::compaction_descriptor descriptor) {
std::unordered_set<sstables::shared_sstable> s(
sstables_to_compact->begin(), sstables_to_compact->end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
}
});
});
@@ -745,7 +779,13 @@ column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tab
return sst->load().then([this, sst] {
return sst->mutate_sstable_level(0);
}).then([this, sst] {
this->add_sstable(sst);
auto first = sst->get_first_partition_key(*_schema);
auto last = sst->get_last_partition_key(*_schema);
if (belongs_to_current_shard(*_schema, first, last)) {
this->add_sstable(sst);
} else {
sst->mark_for_deletion();
}
return make_ready_future<>();
});
});
@@ -817,6 +857,17 @@ lw_shared_ptr<sstable_list> column_family::get_sstables() {
return _sstables;
}
inline bool column_family::manifest_json_filter(const sstring& fname) {
using namespace boost::filesystem;
path entry_path(fname);
if (!is_directory(status(entry_path)) && entry_path.filename() == path("manifest.json")) {
return false;
}
return true;
}
future<> column_family::populate(sstring sstdir) {
// We can catch most errors when we try to load an sstable. But if the TOC
// file is the one missing, we won't try to load the sstable at all. This
@@ -837,58 +888,77 @@ future<> column_family::populate(sstring sstdir) {
auto verifier = make_lw_shared<std::unordered_map<unsigned long, status>>();
auto descriptor = make_lw_shared<sstable_descriptor>();
-return lister::scan_dir(sstdir, { directory_entry_type::regular }, [this, sstdir, verifier, descriptor] (directory_entry de) {
-    // FIXME: The secondary indexes are in this level, but with a directory type, (starting with ".")
-    return probe_file(sstdir, de.name).then([verifier, descriptor] (auto entry) {
+return do_with(std::vector<future<>>(), [this, sstdir, verifier, descriptor] (std::vector<future<>>& futures) {
+    return lister::scan_dir(sstdir, { directory_entry_type::regular }, [this, sstdir, verifier, descriptor, &futures] (directory_entry de) {
+        // FIXME: The secondary indexes are in this level, but with a directory type, (starting with ".")
+        auto f = probe_file(sstdir, de.name).then([verifier, descriptor] (auto entry) {
            if (verifier->count(entry.generation)) {
                if (verifier->at(entry.generation) == status::has_toc_file) {
                    if (entry.component == sstables::sstable::component_type::TOC) {
                        throw sstables::malformed_sstable_exception("Invalid State encountered. TOC file already processed");
                    } else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
                        throw sstables::malformed_sstable_exception("Invalid State encountered. Temporary TOC file found after TOC file was processed");
                    }
                } else if (entry.component == sstables::sstable::component_type::TOC) {
                    verifier->at(entry.generation) = status::has_toc_file;
                } else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
                    verifier->at(entry.generation) = status::has_temporary_toc_file;
                }
            } else {
                if (entry.component == sstables::sstable::component_type::TOC) {
                    verifier->emplace(entry.generation, status::has_toc_file);
                } else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
                    verifier->emplace(entry.generation, status::has_temporary_toc_file);
                } else {
                    verifier->emplace(entry.generation, status::has_some_file);
                }
            }
            // Retrieve both version and format used for this column family.
            if (!descriptor->version) {
                descriptor->version = entry.version;
            }
            if (!descriptor->format) {
                descriptor->format = entry.format;
            }
        });
+        // push future returned by probe_file into an array of futures,
+        // so that the supplied callback will not block scan_dir() from
+        // reading the next entry in the directory.
+        futures.push_back(std::move(f));
+        return make_ready_future<>();
+    }, &manifest_json_filter).then([&futures] {
+        return when_all(futures.begin(), futures.end()).then([] (std::vector<future<>> ret) {
+            try {
+                for (auto& f : ret) {
+                    f.get();
+                }
+            } catch(...) {
+                throw;
+            }
+        });
    }).then([verifier, sstdir, descriptor, this] {
        return parallel_for_each(*verifier, [sstdir = std::move(sstdir), descriptor, this] (auto v) {
            if (v.second == status::has_temporary_toc_file) {
                unsigned long gen = v.first;
                assert(descriptor->version);
                sstables::sstable::version_types version = descriptor->version.value();
                assert(descriptor->format);
                sstables::sstable::format_types format = descriptor->format.value();
                if (engine().cpu_id() != 0) {
-                    dblog.info("At directory: {}, partial SSTable with generation {} not relevant for this shard, ignoring", sstdir, v.first);
+                    dblog.debug("At directory: {}, partial SSTable with generation {} not relevant for this shard, ignoring", sstdir, v.first);
                    return make_ready_future<>();
                }
                // shard 0 is the responsible for removing a partial sstable.
                return sstables::sstable::remove_sstable_with_temp_toc(_schema->ks_name(), _schema->cf_name(), sstdir, gen, version, format);
            } else if (v.second != status::has_toc_file) {
                throw sstables::malformed_sstable_exception(sprint("At directory: %s: no TOC found for SSTable with generation %d!. Refusing to boot", sstdir, v.first));
            }
            return make_ready_future<>();
        });
    });
+});
}
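The change to populate() above collects the futures returned by probe_file() into a vector instead of returning them from the scan callback, so that scanning the directory is never blocked by an individual probe; only afterwards are all of them awaited together. A minimal sketch of this pattern using plain std::future (not Seastar — probe() is a hypothetical stand-in for probe_file(), and std::launch::deferred keeps it portable where Seastar would run the work concurrently on the event loop):

```cpp
#include <cassert>
#include <future>
#include <vector>

// Stand-in for probe_file(): some per-entry asynchronous work.
int probe(int entry) {
    return entry * 2;
}

int scan_and_probe(const std::vector<int>& entries) {
    std::vector<std::future<int>> futures;
    for (int e : entries) {
        // The scan loop keeps going; each probe's future is just collected.
        futures.push_back(std::async(std::launch::deferred, probe, e));
    }
    int sum = 0;
    for (auto& f : futures) { // the when_all() step
        sum += f.get();       // get() rethrows any probe exception
    }
    return sum;
}
```

The key property is the same as in the diff: failures surface when the collected futures are waited on, not inside the scan loop.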
@@ -906,8 +976,6 @@ database::database(const db::config& cfg)
if (!_memtable_total_space) {
_memtable_total_space = memory::stats().total_memory() / 2;
}
-bool durable = cfg.data_file_directories().size() > 0;
-db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// Start compaction manager with two tasks for handling compaction jobs.
_compaction_manager.start(2);
setup_collectd();
@@ -996,7 +1064,7 @@ template <typename Func>
static future<>
do_parse_system_tables(distributed<service::storage_proxy>& proxy, const sstring& _cf_name, Func&& func) {
using namespace db::schema_tables;
-static_assert(std::is_same<future<>, std::result_of_t<Func(schema_result::value_type&)>>::value,
+static_assert(std::is_same<future<>, std::result_of_t<Func(schema_result_value_type&)>>::value,
"bad Func signature");
@@ -1031,11 +1099,11 @@ do_parse_system_tables(distributed<service::storage_proxy>& proxy, const sstring
future<> database::parse_system_tables(distributed<service::storage_proxy>& proxy) {
using namespace db::schema_tables;
-return do_parse_system_tables(proxy, db::schema_tables::KEYSPACES, [this] (schema_result::value_type &v) {
+return do_parse_system_tables(proxy, db::schema_tables::KEYSPACES, [this] (schema_result_value_type &v) {
auto ksm = create_keyspace_from_schema_partition(v);
return create_keyspace(ksm);
}).then([&proxy, this] {
-return do_parse_system_tables(proxy, db::schema_tables::COLUMNFAMILIES, [this, &proxy] (schema_result::value_type &v) {
+return do_parse_system_tables(proxy, db::schema_tables::COLUMNFAMILIES, [this, &proxy] (schema_result_value_type &v) {
return create_tables_from_tables_partition(proxy, v.second).then([this] (std::map<sstring, schema_ptr> tables) {
for (auto& t: tables) {
auto s = t.second;
@@ -1050,6 +1118,9 @@ future<> database::parse_system_tables(distributed<service::storage_proxy>& prox
future<>
database::init_system_keyspace() {
+bool durable = _cfg->data_file_directories().size() > 0;
+db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// FIXME support multiple directories
return touch_directory(_cfg->data_file_directories()[0] + "/" + db::system_keyspace::NAME).then([this] {
return populate_keyspace(_cfg->data_file_directories()[0], db::system_keyspace::NAME).then([this]() {
@@ -1107,7 +1178,7 @@ void database::add_keyspace(sstring name, keyspace k) {
}
void database::update_keyspace(const sstring& name) {
-throw std::runtime_error("not implemented");
+throw std::runtime_error("update keyspace not implemented");
}
void database::drop_keyspace(const sstring& name) {
@@ -1115,6 +1186,8 @@ void database::drop_keyspace(const sstring& name) {
}
void database::add_column_family(schema_ptr schema, column_family::config cfg) {
+schema = local_schema_registry().learn(schema);
+schema->registry_entry()->mark_synced();
auto uuid = schema->id();
lw_shared_ptr<column_family> cf;
if (cfg.enable_commitlog && _commitlog) {
@@ -1140,17 +1213,6 @@ void database::add_column_family(schema_ptr schema, column_family::config cfg) {
_ks_cf_to_uuid.emplace(std::move(kscf), uuid);
}
future<> database::update_column_family(const sstring& ks_name, const sstring& cf_name) {
auto& proxy = service::get_storage_proxy();
auto old_cfm = find_schema(ks_name, cf_name);
return db::schema_tables::create_table_from_name(proxy, ks_name, cf_name).then([old_cfm] (auto&& new_cfm) {
if (old_cfm->id() != new_cfm->id()) {
return make_exception_future<>(exceptions::configuration_exception(sprint("Column family ID mismatch (found %s; expected %s)", new_cfm->id(), old_cfm->id())));
}
return make_exception_future<>(std::runtime_error("update column family not implemented"));
});
}
future<> database::drop_column_family(db_clock::time_point dropped_at, const sstring& ks_name, const sstring& cf_name) {
auto uuid = find_uuid(ks_name, cf_name);
auto& ks = find_keyspace(ks_name);
@@ -1414,13 +1476,17 @@ compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
}
struct query_state {
-explicit query_state(const query::read_command& cmd, const std::vector<query::partition_range>& ranges)
-: cmd(cmd)
+explicit query_state(schema_ptr s,
+const query::read_command& cmd,
+const std::vector<query::partition_range>& ranges)
+: schema(std::move(s))
+, cmd(cmd)
, builder(cmd.slice)
, limit(cmd.row_limit)
, current_partition_range(ranges.begin())
, range_end(ranges.end()){
}
schema_ptr schema;
const query::read_command& cmd;
query::result::builder builder;
uint32_t limit;
@@ -1434,21 +1500,21 @@ struct query_state {
};
future<lw_shared_ptr<query::result>>
-column_family::query(const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
+column_family::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
utils::latency_counter lc;
_stats.reads.set_latency(lc);
-return do_with(query_state(cmd, partition_ranges), [this] (query_state& qs) {
+return do_with(query_state(std::move(s), cmd, partition_ranges), [this] (query_state& qs) {
return do_until(std::bind(&query_state::done, &qs), [this, &qs] {
auto&& range = *qs.current_partition_range++;
-qs.reader = make_reader(range);
+qs.reader = make_reader(qs.schema, range);
qs.range_empty = false;
-return do_until([&qs] { return !qs.limit || qs.range_empty; }, [this, &qs] {
-return qs.reader().then([this, &qs](mutation_opt mo) {
+return do_until([&qs] { return !qs.limit || qs.range_empty; }, [&qs] {
+return qs.reader().then([&qs](mutation_opt mo) {
if (mo) {
auto p_builder = qs.builder.add_partition(*mo->schema(), mo->key());
auto is_distinct = qs.cmd.slice.options.contains(query::partition_slice::option::distinct);
auto limit = !is_distinct ? qs.limit : 1;
-mo->partition().query(p_builder, *_schema, qs.cmd.timestamp, limit);
+mo->partition().query(p_builder, *qs.schema, qs.cmd.timestamp, limit);
qs.limit -= p_builder.row_count();
} else {
qs.range_empty = true;
@@ -1462,42 +1528,28 @@ column_family::query(const query::read_command& cmd, const std::vector<query::pa
}).finally([lc, this]() mutable {
_stats.reads.mark(lc);
if (lc.is_start()) {
-_stats.estimated_read.add(lc.latency_in_nano(), _stats.reads.count);
+_stats.estimated_read.add(lc.latency(), _stats.reads.count);
}
});
}
mutation_source
column_family::as_mutation_source() const {
-return [this] (const query::partition_range& range) {
-return this->make_reader(range);
+return [this] (schema_ptr s, const query::partition_range& range) {
+return this->make_reader(std::move(s), range);
};
}
future<lw_shared_ptr<query::result>>
-database::query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
-static auto make_empty = [] {
-return make_ready_future<lw_shared_ptr<query::result>>(make_lw_shared(query::result()));
-};
-try {
-column_family& cf = find_column_family(cmd.cf_id);
-return cf.query(cmd, ranges);
-} catch (const no_such_column_family&) {
-// FIXME: load from sstables
-return make_empty();
-}
+database::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
+column_family& cf = find_column_family(cmd.cf_id);
+return cf.query(std::move(s), cmd, ranges);
}
future<reconcilable_result>
-database::query_mutations(const query::read_command& cmd, const query::partition_range& range) {
-try {
-column_family& cf = find_column_family(cmd.cf_id);
-return mutation_query(cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
-} catch (const no_such_column_family&) {
-// FIXME: load from sstables
-return make_ready_future<reconcilable_result>(reconcilable_result());
-}
+database::query_mutations(schema_ptr s, const query::read_command& cmd, const query::partition_range& range) {
+column_family& cf = find_column_family(cmd.cf_id);
+return mutation_query(std::move(s), cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
}
std::unordered_set<sstring> database::get_initial_tokens() {
@@ -1512,6 +1564,31 @@ std::unordered_set<sstring> database::get_initial_tokens() {
return tokens;
}
std::experimental::optional<gms::inet_address> database::get_replace_address() {
auto& cfg = get_config();
sstring replace_address = cfg.replace_address();
sstring replace_address_first_boot = cfg.replace_address_first_boot();
try {
if (!replace_address.empty()) {
return gms::inet_address(replace_address);
} else if (!replace_address_first_boot.empty()) {
return gms::inet_address(replace_address_first_boot);
}
return std::experimental::nullopt;
} catch (...) {
return std::experimental::nullopt;
}
}
bool database::is_replacing() {
sstring replace_address_first_boot = get_config().replace_address_first_boot();
if (!replace_address_first_boot.empty() && db::system_keyspace::bootstrap_complete()) {
dblog.info("Replace address on first boot requested; this node is already bootstrapped");
return false;
}
return bool(get_replace_address());
}
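The precedence implemented by get_replace_address() above — replace_address wins over replace_address_first_boot, and an empty pair yields no address — can be sketched as a small self-contained function. This is an illustration only: address parsing (gms::inet_address) is stubbed out, pick_replace_address is a hypothetical name, and std::optional stands in for the std::experimental::optional used in the diff:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the address the node should replace, if any:
// 'replace_address' takes precedence over 'replace_address_first_boot'.
std::optional<std::string> pick_replace_address(const std::string& replace_address,
                                                const std::string& replace_address_first_boot) {
    if (!replace_address.empty()) {
        return replace_address;
    }
    if (!replace_address_first_boot.empty()) {
        return replace_address_first_boot;
    }
    return std::nullopt; // neither option set: normal (non-replacing) boot
}
```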
std::ostream& operator<<(std::ostream& out, const atomic_cell_or_collection& c) {
return out << to_hex(c._data);
}
@@ -1536,29 +1613,32 @@ std::ostream& operator<<(std::ostream& out, const database& db) {
return out;
}
-future<> database::apply_in_memory(const frozen_mutation& m, const db::replay_position& rp) {
+future<> database::apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
try {
auto& cf = find_column_family(m.column_family_id());
-cf.apply(m, rp);
+cf.apply(m, m_schema, rp);
} catch (no_such_column_family&) {
-// TODO: log a warning
-// FIXME: load keyspace meta-data from storage
+dblog.error("Attempting to mutate non-existent table {}", m.column_family_id());
}
return make_ready_future<>();
}
-future<> database::do_apply(const frozen_mutation& m) {
+future<> database::do_apply(schema_ptr s, const frozen_mutation& m) {
// I'm doing a nullcheck here since the init code path for db etc
// is a little in flux and commitlog is created only when db is
// initialized from datadir.
-auto& cf = find_column_family(m.column_family_id());
+auto uuid = m.column_family_id();
+auto& cf = find_column_family(uuid);
+if (!s->is_synced()) {
+throw std::runtime_error(sprint("attempted to mutate using not synced schema of %s.%s, version=%s",
+s->ks_name(), s->cf_name(), s->version()));
+}
if (cf.commitlog() != nullptr) {
-auto uuid = m.column_family_id();
bytes_view repr = m.representation();
auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
-return cf.commitlog()->add_mutation(uuid, repr.size(), write_repr).then([&m, this](auto rp) {
+return cf.commitlog()->add_mutation(uuid, repr.size(), write_repr).then([&m, this, s](auto rp) {
try {
-return this->apply_in_memory(m, rp);
+return this->apply_in_memory(m, s, rp);
} catch (replay_position_reordered_exception&) {
// expensive, but we're assuming this is super rare.
// if we failed to apply the mutation due to future re-ordering
@@ -1566,11 +1646,11 @@ future<> database::do_apply(const frozen_mutation& m) {
// let's just try again, add the mutation to the CL once more,
// and assume success is inevitable eventually.
dblog.debug("replay_position reordering detected");
-return this->apply(m);
+return this->apply(s, m);
}
});
}
-return apply_in_memory(m, db::replay_position());
+return apply_in_memory(m, s, db::replay_position());
}
future<> database::throttle() {
@@ -1604,9 +1684,9 @@ void database::unthrottle() {
}
}
-future<> database::apply(const frozen_mutation& m) {
-return throttle().then([this, &m] {
-return do_apply(m);
+future<> database::apply(schema_ptr s, const frozen_mutation& m) {
+return throttle().then([this, &m, s = std::move(s)] {
+return do_apply(std::move(s), m);
});
}
@@ -1748,6 +1828,36 @@ const sstring& database::get_snitch_name() const {
return _cfg->endpoint_snitch();
}
// For the filesystem operations, this code will assume that all keyspaces are visible in all shards
// (as we have been doing for a lot of the other operations, like the snapshot itself).
future<> database::clear_snapshot(sstring tag, std::vector<sstring> keyspace_names) {
std::vector<std::reference_wrapper<keyspace>> keyspaces;
if (keyspace_names.empty()) {
// if keyspace names are not given - apply to all existing local keyspaces
for (auto& ks: _keyspaces) {
keyspaces.push_back(std::reference_wrapper<keyspace>(ks.second));
}
} else {
for (auto& ksname: keyspace_names) {
try {
keyspaces.push_back(std::reference_wrapper<keyspace>(find_keyspace(ksname)));
} catch (no_such_keyspace& e) {
return make_exception_future(std::current_exception());
}
}
}
return parallel_for_each(keyspaces, [this, tag] (auto& ks) {
return parallel_for_each(ks.get().metadata()->cf_meta_data(), [this, tag] (auto& pair) {
auto& cf = this->find_column_family(pair.second);
return cf.clear_snapshot(tag);
}).then_wrapped([] (future<> f) {
dblog.debug("Cleared out snapshot directories");
});
});
}
future<> update_schema_version_and_announce(distributed<service::storage_proxy>& proxy)
{
return db::schema_tables::calculate_schema_digest(proxy).then([&proxy] (utils::UUID uuid) {
@@ -1812,7 +1922,7 @@ seal_snapshot(sstring jsondir) {
dblog.debug("Storing manifest {}", jsonfile);
return recursive_touch_directory(jsondir).then([jsonfile, json = std::move(json)] {
-return engine().open_file_dma(jsonfile, open_flags::wo | open_flags::create | open_flags::truncate).then([json](file f) {
+return open_file_dma(jsonfile, open_flags::wo | open_flags::create | open_flags::truncate).then([json](file f) {
return do_with(make_file_output_stream(std::move(f)), [json] (output_stream<char>& out) {
return out.write(json.c_str(), json.size()).then([&out] {
return out.flush();
@@ -1899,7 +2009,7 @@ future<> column_family::snapshot(sstring name) {
}
future<bool> column_family::snapshot_exists(sstring tag) {
-sstring jsondir = _config.datadir + "/snapshots/";
+sstring jsondir = _config.datadir + "/snapshots/" + tag;
return engine().open_directory(std::move(jsondir)).then_wrapped([] (future<file> f) {
try {
f.get0();
@@ -1975,7 +2085,11 @@ future<> column_family::clear_snapshot(sstring tag) {
future<std::unordered_map<sstring, column_family::snapshot_details>> column_family::get_snapshot_details() {
std::unordered_map<sstring, snapshot_details> all_snapshots;
return do_with(std::move(all_snapshots), [this] (auto& all_snapshots) {
-return lister::scan_dir(_config.datadir + "/snapshots", { directory_entry_type::directory }, [this, &all_snapshots] (directory_entry de) {
+return engine().file_exists(_config.datadir + "/snapshots").then([this, &all_snapshots](bool file_exists) {
+if (!file_exists) {
+return make_ready_future<>();
+}
+return lister::scan_dir(_config.datadir + "/snapshots", { directory_entry_type::directory }, [this, &all_snapshots] (directory_entry de) {
auto snapshot_name = de.name;
auto snapshot = _config.datadir + "/snapshots/" + snapshot_name;
all_snapshots.emplace(snapshot_name, snapshot_details());
@@ -2010,6 +2124,7 @@ future<std::unordered_map<sstring, column_family::snapshot_details>> column_fami
});
});
});
+});
}).then([&all_snapshots] {
return std::move(all_snapshots);
});
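The guard added to get_snapshot_details() above avoids scanning a snapshots directory that does not exist, so a table that was never snapshotted yields an empty result instead of an ENOENT failure. A sketch of that pattern, with std::filesystem standing in for Seastar's engine().file_exists() and lister::scan_dir() (list_snapshots is a hypothetical name for illustration):

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// List snapshot subdirectories under <datadir>/snapshots, tolerating the
// case where no snapshot was ever taken (directory missing).
std::vector<std::string> list_snapshots(const std::string& datadir) {
    std::vector<std::string> names;
    fs::path dir = fs::path(datadir) / "snapshots";
    if (!fs::exists(dir)) {
        return names; // no snapshots ever taken; not an error
    }
    for (const auto& de : fs::directory_iterator(dir)) {
        if (de.is_directory()) {
            names.push_back(de.path().filename().string());
        }
    }
    return names;
}
```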
@@ -2112,3 +2227,15 @@ std::ostream& operator<<(std::ostream& os, const keyspace_metadata& m) {
os << "}";
return os;
}
void column_family::set_schema(schema_ptr s) {
dblog.debug("Changing schema version of {}.{} ({}) from {} to {}",
_schema->ks_name(), _schema->cf_name(), _schema->id(), _schema->version(), s->version());
for (auto& m : *_memtables) {
m->set_schema(s);
}
_cache.set_schema(s);
_schema = std::move(s);
}


@@ -189,19 +189,20 @@ private:
// Creates a mutation reader which covers sstables.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// The 'range' parameter must be live as long as the reader is used.
-mutation_reader make_sstable_reader(const query::partition_range& range) const;
+// Mutations returned by the reader will all have given schema.
+mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
partition_presence_checker make_partition_presence_checker(lw_shared_ptr<sstable_list> old_sstables);
// We will use highres because hopefully it won't take more than a few usecs
-std::chrono::high_resolution_clock::time_point _sstable_writes_disabled_at;
+std::chrono::steady_clock::time_point _sstable_writes_disabled_at;
public:
// Creates a mutation reader which covers all data sources for this column family.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// Note: for data queries use query() instead.
// The 'range' parameter must be live as long as the reader is used.
-mutation_reader make_reader(const query::partition_range& range = query::full_partition_range) const;
+// Mutations returned by the reader will all have given schema.
+mutation_reader make_reader(schema_ptr schema, const query::partition_range& range = query::full_partition_range) const;
mutation_source as_mutation_source() const;
@@ -216,22 +217,31 @@ public:
return _cache;
}
row_cache& get_row_cache() {
return _cache;
}
logalloc::occupancy_stats occupancy() const;
public:
column_family(schema_ptr schema, config cfg, db::commitlog& cl, compaction_manager&);
column_family(schema_ptr schema, config cfg, no_commitlog, compaction_manager&);
column_family(column_family&&) = delete; // 'this' is being captured during construction
~column_family();
-schema_ptr schema() const { return _schema; }
+const schema_ptr& schema() const { return _schema; }
void set_schema(schema_ptr);
db::commitlog* commitlog() { return _commitlog; }
-future<const_mutation_partition_ptr> find_partition(const dht::decorated_key& key) const;
-future<const_mutation_partition_ptr> find_partition_slow(const partition_key& key) const;
-future<const_row_ptr> find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const;
-void apply(const frozen_mutation& m, const db::replay_position& = db::replay_position());
+future<const_mutation_partition_ptr> find_partition(schema_ptr, const dht::decorated_key& key) const;
+future<const_mutation_partition_ptr> find_partition_slow(schema_ptr, const partition_key& key) const;
+future<const_row_ptr> find_row(schema_ptr, const dht::decorated_key& partition_key, clustering_key clustering_key) const;
+// Applies given mutation to this column family
+// The mutation is always upgraded to current schema.
+void apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& = db::replay_position());
void apply(const mutation& m, const db::replay_position& = db::replay_position());
// Returns at most "cmd.limit" rows
-future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
+future<lw_shared_ptr<query::result>> query(schema_ptr,
+const query::read_command& cmd,
+const std::vector<query::partition_range>& ranges);
future<> populate(sstring datadir);
@@ -247,7 +257,7 @@ public:
// to call this separately in all shards first, to guarantee that none of them are writing
// new data before you can safely assume that the whole node is disabled.
future<int64_t> disable_sstable_write() {
-_sstable_writes_disabled_at = std::chrono::high_resolution_clock::now();
+_sstable_writes_disabled_at = std::chrono::steady_clock::now();
return _sstables_lock.write_lock().then([this] {
return make_ready_future<int64_t>((*_sstables->end()).first);
});
@@ -255,10 +265,10 @@ public:
// SSTable writes are now allowed again, and generation is updated to new_generation
// returns the amount of microseconds elapsed since we disabled writes.
-std::chrono::high_resolution_clock::duration enable_sstable_write(int64_t new_generation) {
+std::chrono::steady_clock::duration enable_sstable_write(int64_t new_generation) {
update_sstables_known_generation(new_generation);
_sstables_lock.write_unlock();
-return std::chrono::high_resolution_clock::now() - _sstable_writes_disabled_at;
+return std::chrono::steady_clock::now() - _sstable_writes_disabled_at;
}
// Make sure the generation numbers are sequential, starting from "start".
@@ -321,6 +331,10 @@ public:
return _stats;
}
compaction_manager& get_compaction_manager() const {
return _compaction_manager;
}
template<typename Func, typename Result = futurize_t<std::result_of_t<Func()>>>
Result run_with_compaction_disabled(Func && func) {
++_compaction_disabled;
@@ -344,20 +358,23 @@ private:
// one are also complete
future<> seal_active_memtable();
// filter manifest.json files out
static bool manifest_json_filter(const sstring& fname);
seastar::gate _in_flight_seals;
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
// Func signature: bool (const decorated_key& dk, const mutation_partition& mp)
template <typename Func>
-future<bool> for_all_partitions(Func&& func) const;
+future<bool> for_all_partitions(schema_ptr, Func&& func) const;
future<sstables::entry_descriptor> probe_file(sstring sstdir, sstring fname);
void seal_on_overflow();
void check_valid_rp(const db::replay_position&) const;
public:
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
-future<bool> for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
+future<bool> for_all_partitions_slow(schema_ptr, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
friend std::ostream& operator<<(std::ostream& out, const column_family& cf);
// Testing purposes.
@@ -531,7 +548,7 @@ class database {
circular_buffer<promise<>> _throttled_requests;
future<> init_commitlog();
-future<> apply_in_memory(const frozen_mutation&, const db::replay_position&);
+future<> apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position&);
future<> populate(sstring datadir);
future<> populate_keyspace(sstring datadir, sstring ks_name);
@@ -543,7 +560,7 @@ private:
friend void db::system_keyspace::make(database& db, bool durable, bool volatile_testing_only);
void setup_collectd();
future<> throttle();
-future<> do_apply(const frozen_mutation&);
+future<> do_apply(schema_ptr, const frozen_mutation&);
void unthrottle();
public:
static utils::UUID empty_version;
@@ -562,6 +579,9 @@ public:
return _commitlog.get();
}
compaction_manager& get_compaction_manager() {
return _compaction_manager;
}
const compaction_manager& get_compaction_manager() const {
return _compaction_manager;
}
@@ -571,7 +591,6 @@ public:
void add_column_family(schema_ptr schema, column_family::config cfg);
future<> update_column_family(const sstring& ks_name, const sstring& cf_name);
future<> drop_column_family(db_clock::time_point changed_at, const sstring& ks_name, const sstring& cf_name);
/* throws std::out_of_range if missing */
@@ -606,11 +625,12 @@ public:
unsigned shard_of(const dht::token& t);
unsigned shard_of(const mutation& m);
unsigned shard_of(const frozen_mutation& m);
-future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
-future<reconcilable_result> query_mutations(const query::read_command& cmd, const query::partition_range& range);
-future<> apply(const frozen_mutation&);
+future<lw_shared_ptr<query::result>> query(schema_ptr, const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
+future<reconcilable_result> query_mutations(schema_ptr, const query::read_command& cmd, const query::partition_range& range);
+future<> apply(schema_ptr, const frozen_mutation&);
keyspace::config make_keyspace_config(const keyspace_metadata& ksm);
const sstring& get_snitch_name() const;
future<> clear_snapshot(sstring tag, std::vector<sstring> keyspace_names);
friend std::ostream& operator<<(std::ostream& out, const database& db);
const std::unordered_map<sstring, keyspace>& get_keyspaces() const {
@@ -648,6 +668,8 @@ public:
}
std::unordered_set<sstring> get_initial_tokens();
std::experimental::optional<gms::inet_address> get_replace_address();
bool is_replacing();
};
// FIXME: stub
@@ -662,7 +684,7 @@ column_family::apply(const mutation& m, const db::replay_position& rp) {
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
-_stats.estimated_write.add(lc.latency_in_nano(), _stats.writes.count);
+_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
@@ -688,15 +710,15 @@ column_family::check_valid_rp(const db::replay_position& rp) const {
inline
void
-column_family::apply(const frozen_mutation& m, const db::replay_position& rp) {
+column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
-active_memtable().apply(m, rp);
+active_memtable().apply(m, m_schema, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
-_stats.estimated_write.add(lc.latency_in_nano(), _stats.writes.count);
+_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}


@@ -31,12 +31,19 @@ class mutation_partition;
// schema.hh
class schema;
class column_definition;
class column_mapping;
// schema_mutations.hh
class schema_mutations;
// keys.hh
class exploded_clustering_prefix;
class partition_key;
class clustering_key;
class partition_key_view;
class clustering_key_prefix;
class clustering_key_prefix_view;
using clustering_key = clustering_key_prefix;
using clustering_key_view = clustering_key_prefix_view;
// memtable.hh
class memtable;


@@ -56,6 +56,8 @@
#include "unimplemented.hh"
#include "db/config.hh"
#include "gms/failure_detector.hh"
#include "service/storage_service.hh"
#include "schema_registry.hh"
static logging::logger logger("batchlog_manager");
@@ -87,10 +89,8 @@ future<> db::batchlog_manager::start() {
);
});
});
-_timer.arm(
-lowres_clock::now()
-+ std::chrono::milliseconds(
-service::storage_service::RING_DELAY));
+auto ring_delay = service::get_local_storage_service().get_ring_delay();
+_timer.arm(lowres_clock::now() + ring_delay);
}
return make_ready_future<>();
}
@@ -115,7 +115,7 @@ mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<muta
mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<mutation>& mutations, const utils::UUID& id, int32_t version, db_clock::time_point now) {
auto schema = _qp.db().local().find_schema(system_keyspace::NAME, system_keyspace::BATCHLOG);
auto key = partition_key::from_singular(*schema, id);
-auto timestamp = db_clock::now_in_usecs();
+auto timestamp = api::new_timestamp();
auto data = [this, &mutations] {
std::vector<frozen_mutation> fm(mutations.begin(), mutations.end());
const auto size = std::accumulate(fm.begin(), fm.end(), size_t(0), [](size_t s, auto& m) {
@@ -181,9 +181,9 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto& fm = fms->front();
auto mid = fm.column_family_id();
return system_keyspace::get_truncated_at(mid).then([this, &fm, written_at, mutations](db_clock::time_point t) {
-auto schema = _qp.db().local().find_schema(fm.column_family_id());
warn(unimplemented::cause::SCHEMA_CHANGE);
+auto schema = local_schema_registry().get(fm.schema_version());
if (written_at > t) {
-auto schema = _qp.db().local().find_schema(fm.column_family_id());
mutations->emplace_back(fm.unfreeze(schema));
}
}).then([fms] {


@@ -90,7 +90,7 @@ public:
db::commitlog::config::config(const db::config& cfg)
: commit_log_location(cfg.commitlog_directory())
-, commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb() >= 0 ? cfg.commitlog_total_space_in_mb() : memory::stats().total_memory())
+, commitlog_total_space_in_mb(cfg.commitlog_total_space_in_mb() >= 0 ? cfg.commitlog_total_space_in_mb() : memory::stats().total_memory() >> 20)
, commitlog_segment_size_in_mb(cfg.commitlog_segment_size_in_mb())
, commitlog_sync_period_in_ms(cfg.commitlog_sync_batch_window_in_ms())
, mode(cfg.commitlog_sync() == "batch" ? sync_mode::BATCH : sync_mode::PERIODIC)
@@ -281,6 +281,43 @@ private:
* A single commit log file on disk. Manages creation of the file and writing mutations to disk,
* as well as tracking the last mutation position of any "dirty" CFs covered by the segment file. Segment
* files are initially allocated to a fixed size and can grow to accommodate a larger value if necessary.
*
* The IO flow is somewhat convoluted and goes something like this:
*
* Mutation path:
* - Adding data to the segment usually writes into the internal buffer
* - On EOB or overflow we issue a write to disk ("cycle").
* - A cycle call will acquire the segment read lock and send the
* buffer to the corresponding position in the file
* - If we are periodic and have crossed a timing threshold, or are running in
* "batch" mode, we might be forced to issue a flush ("sync") after adding data
* - A sync call acquires the write lock, thus locking out writes
* and waiting for pending writes to finish. It then checks the
* high data mark, and issues the actual file flush.
* Note that the write lock is released prior to issuing the
* actual file flush, so we are allowed to write data past the
* flush point concurrently with a pending flush.
*
* Sync timer:
* - In periodic mode, we try to primarily issue sync calls in
* a timer task issued every N seconds. The timer does the same
* operation as the above described sync, and resets the timeout
* so that the mutation path will not trigger syncs and be delayed.
*
* Note that we do not care in which order segment chunks finish writing
* to disk, other than that all writes below a flush point must finish before flushing.
*
* We currently do not wait for flushes to finish before issuing the next
* cycle call ("after" flush point in the file). This might not be optimal.
*
* To close and finish a segment, we first close the gate object that guards
* writing data to it, then flush it fully (including waiting for futures created
* by the timer to run their course), and finally wait for it to
* become "clean", i.e. get notified that all mutations it holds have been
* persisted to sstables elsewhere. Once this is done, we can delete the
* segment. If a segment (object) is deleted without being fully clean, we
* do not remove the file on disk.
*
*/
class db::commitlog::segment: public enable_lw_shared_from_this<segment> {
@@ -370,6 +407,7 @@ public:
void reset_sync_time() {
_sync_time = clock_type::now();
}
// See class comment for info
future<sseg_ptr> sync() {
// Note: this is not a marker for when sync was finished.
// It is when it was initiated
@@ -386,6 +424,7 @@ public:
future<> shutdown() {
return _gate.close();
}
// See class comment for info
future<sseg_ptr> flush(uint64_t pos = 0) {
auto me = shared_from_this();
assert(!me.owned());
@@ -431,6 +470,7 @@ public:
/**
* Send any buffer contents to disk and get a new tmp buffer
*/
// See class comment for info
future<sseg_ptr> cycle(size_t s = 0) {
auto size = clear_buffer_slack();
auto buf = std::move(_buffer);
@@ -847,7 +887,7 @@ void db::commitlog::segment_manager::flush_segments(bool force) {
future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager::allocate_segment(bool active) {
descriptor d(next_id());
return engine().open_file_dma(cfg.commit_log_location + "/" + d.filename(), open_flags::wo | open_flags::create).then([this, d, active](file f) {
return open_file_dma(cfg.commit_log_location + "/" + d.filename(), open_flags::wo | open_flags::create).then([this, d, active](file f) {
// xfs doesn't like files extended beyond eof, so enlarge the file
return f.truncate(max_size).then([this, d, active, f] () mutable {
auto s = make_lw_shared<segment>(this, d, std::move(f), active);
@@ -1097,7 +1137,7 @@ db::commitlog::commitlog(config cfg)
: _segment_manager(new segment_manager(std::move(cfg))) {
}
db::commitlog::commitlog(commitlog&& v)
db::commitlog::commitlog(commitlog&& v) noexcept
: _segment_manager(std::move(v._segment_manager)) {
}
@@ -1173,10 +1213,11 @@ const db::commitlog::config& db::commitlog::active_config() const {
return _segment_manager->cfg;
}
future<subscription<temporary_buffer<char>, db::replay_position>>
future<std::unique_ptr<subscription<temporary_buffer<char>, db::replay_position>>>
db::commitlog::read_log_file(const sstring& filename, commit_load_reader_func next, position_type off) {
return engine().open_file_dma(filename, open_flags::ro).then([next = std::move(next), off](file f) {
return read_log_file(std::move(f), std::move(next), off);
return open_file_dma(filename, open_flags::ro).then([next = std::move(next), off](file f) {
return std::make_unique<subscription<temporary_buffer<char>, replay_position>>(
read_log_file(std::move(f), std::move(next), off));
});
}
@@ -1192,6 +1233,8 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
size_t next = 0;
size_t start_off = 0;
size_t skip_to = 0;
size_t file_size = 0;
size_t corrupt_size = 0;
bool eof = false;
bool header = true;
@@ -1289,7 +1332,11 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
auto cs = crc.checksum();
if (cs != checksum) {
throw std::runtime_error("Checksum error in chunk header");
// if a chunk header checksum is broken, we shall just assume that all
// remaining data is as well. We cannot trust the "next" pointer, so...
logger.debug("Checksum error in segment chunk at {}.", pos);
corrupt_size += (file_size - pos);
return stop();
}
this->next = next;
@@ -1303,6 +1350,17 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
}
future<> read_entry() {
static constexpr size_t entry_header_size = segment::entry_overhead_size - sizeof(uint32_t);
/**
* #598 - Must check that data left in chunk is enough to even read an entry.
* If not, this is small slack space in the chunk end, and we should just go
* to the next.
*/
assert(pos <= next);
if ((pos + entry_header_size) >= next) {
return skip(next - pos);
}
return fin.read_exactly(entry_header_size).then([this](temporary_buffer<char> buf) {
replay_position rp(id, position_type(pos));
@@ -1315,21 +1373,24 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
auto size = in.read<uint32_t>();
auto checksum = in.read<uint32_t>();
if (size == 0) {
// special scylla case: zero padding due to dma blocks
auto slack = next - pos;
return skip(slack);
}
crc32_nbo crc;
crc.process(size);
if (size < 3 * sizeof(uint32_t)) {
throw std::runtime_error("Invalid entry size");
if (size < 3 * sizeof(uint32_t) || checksum != crc.checksum()) {
auto slack = next - pos;
if (size != 0) {
logger.debug("Segment entry at {} has broken header. Skipping to next chunk ({} bytes)", rp, slack);
corrupt_size += slack;
}
// size == 0 -> special scylla case: zero padding due to dma blocks
return skip(slack);
}
if (start_off > pos) {
return skip(size - entry_header_size);
}
return fin.read_exactly(size - entry_header_size).then([this, size, checksum, rp](temporary_buffer<char> buf) {
return fin.read_exactly(size - entry_header_size).then([this, size, crc = std::move(crc), rp](temporary_buffer<char> buf) mutable {
advance(buf);
data_input in(buf);
@@ -1338,12 +1399,15 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
in.skip(data_size);
auto checksum = in.read<uint32_t>();
crc32_nbo crc;
crc.process(size);
crc.process_bytes(buf.get(), data_size);
if (crc.checksum() != checksum) {
throw std::runtime_error("Checksum error in data entry");
// If we're getting a checksum error here, most likely the rest of
// the file will be corrupt as well. But it does not hurt to retry.
// Just go to the next entry (since "size" in header seemed ok).
logger.debug("Segment entry at {} checksum error. Skipping {} bytes", rp, size);
corrupt_size += size;
return make_ready_future<>();
}
return s.produce(buf.share(0, data_size), rp);
@@ -1351,10 +1415,18 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
});
}
future<> read_file() {
return read_header().then(
[this] {
return do_until(std::bind(&work::end_of_file, this), std::bind(&work::read_chunk, this));
});
return f.size().then([this](uint64_t size) {
file_size = size;
}).then([this] {
return read_header().then(
[this] {
return do_until(std::bind(&work::end_of_file, this), std::bind(&work::read_chunk, this));
}).then([this] {
if (corrupt_size > 0) {
throw segment_data_corruption_error("Data corruption", corrupt_size);
}
});
});
}
};
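The error handling above distinguishes two checksum failures: a broken header checksum means the size field cannot be trusted, so the reader skips the rest of the chunk, while a broken trailing data checksum invalidates only that one entry. A minimal self-contained sketch of that decision follows; the toy additive checksum stands in for Scylla's crc32_nbo, and the 4-byte-size / 4-byte-header-crc / data / 4-byte-trailer layout is a simplification for the demo, not the exact on-disk format.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for crc32_nbo; seedable so it can be chained over
// size-then-data, like the real entry checksum.
static uint32_t toy_crc(const uint8_t* p, size_t n, uint32_t seed = 0) {
    uint32_t h = seed;
    while (n--) {
        h = h * 31 + *p++;
    }
    return h;
}

enum class verdict { ok, skip_entry, skip_chunk };

// e: [size u32][header crc over size][data...][trailer crc over size+data]
verdict check_entry(const std::vector<uint8_t>& e) {
    uint32_t size, head_crc, tail_crc;
    std::memcpy(&size, e.data(), 4);
    std::memcpy(&head_crc, e.data() + 4, 4);
    if (head_crc != toy_crc(e.data(), 4)) {
        // size is untrustworthy: give up on the whole chunk
        return verdict::skip_chunk;
    }
    std::memcpy(&tail_crc, e.data() + e.size() - 4, 4);
    uint32_t c = toy_crc(e.data(), 4);            // checksum the size field...
    c = toy_crc(e.data() + 8, e.size() - 12, c);  // ...then the data
    if (c != tail_crc) {
        // size was sane, so we know where the next entry starts:
        // just drop this one entry
        return verdict::skip_entry;
    }
    return verdict::ok;
}
```

In the real reader both failure paths also add the skipped byte count to `corrupt_size`, so replay can report how much data was lost.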
@@ -1382,6 +1454,10 @@ uint64_t db::commitlog::get_completed_tasks() const {
return _segment_manager->totals.allocation_count;
}
uint64_t db::commitlog::get_flush_count() const {
return _segment_manager->totals.flush_count;
}
uint64_t db::commitlog::get_pending_tasks() const {
return _segment_manager->totals.pending_operations;
}
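The cycle/sync interplay in the segment class comment above maps onto a plain reader/writer lock: cycle (the mutation path) takes the shared side, while sync takes the exclusive side just long enough to quiesce in-flight writes and read the high-water mark, releasing it before the (slow) flush itself. The sketch below is an illustrative model using std::shared_mutex, not Seastar's actual primitives or the real segment code.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>

struct toy_segment {
    std::shared_mutex rw;   // read lock for cycle, write lock for sync
    size_t high_mark = 0;   // bytes handed to the disk so far
    size_t flushed_to = 0;  // bytes known durable

    // Mutation path ("cycle"): many of these may run concurrently,
    // since each real cycle writes to a distinct file position.
    void cycle(size_t bytes) {
        std::shared_lock<std::shared_mutex> g(rw);
        high_mark += bytes; // stand-in for sending the buffer to the file
    }

    // Flush path ("sync"): lock out writers only to read the mark;
    // the flush itself happens after the lock is released, so new
    // cycles past the flush point can proceed concurrently.
    size_t sync() {
        size_t mark;
        {
            std::unique_lock<std::shared_mutex> g(rw);
            mark = high_mark;
        } // lock released here
        flushed_to = mark;  // stand-in for the actual file flush
        return mark;
    }
};
```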


@@ -139,7 +139,7 @@ public:
const uint32_t ver;
};
commitlog(commitlog&&);
commitlog(commitlog&&) noexcept;
~commitlog();
/**
@@ -231,6 +231,7 @@ public:
uint64_t get_total_size() const;
uint64_t get_completed_tasks() const;
uint64_t get_flush_count() const;
uint64_t get_pending_tasks() const;
uint64_t get_num_segments_created() const;
uint64_t get_num_segments_destroyed() const;
@@ -265,8 +266,21 @@ public:
typedef std::function<future<>(temporary_buffer<char>, replay_position)> commit_load_reader_func;
class segment_data_corruption_error: public std::runtime_error {
public:
segment_data_corruption_error(std::string msg, uint64_t s)
: std::runtime_error(msg), _bytes(s) {
}
uint64_t bytes() const {
return _bytes;
}
private:
uint64_t _bytes;
};
static subscription<temporary_buffer<char>, replay_position> read_log_file(file, commit_load_reader_func, position_type = 0);
static future<subscription<temporary_buffer<char>, replay_position>> read_log_file(const sstring&, commit_load_reader_func, position_type = 0);
static future<std::unique_ptr<subscription<temporary_buffer<char>, replay_position>>> read_log_file(
const sstring&, commit_load_reader_func, position_type = 0);
private:
commitlog(config);
};


@@ -69,6 +69,7 @@ public:
uint64_t invalid_mutations = 0;
uint64_t skipped_mutations = 0;
uint64_t applied_mutations = 0;
uint64_t corrupt_bytes = 0;
};
future<> process(stats*, temporary_buffer<char> buf, replay_position rp);
@@ -166,9 +167,16 @@ db::commitlog_replayer::impl::recover(sstring file) {
return db::commitlog::read_log_file(file,
std::bind(&impl::process, this, s.get(), std::placeholders::_1,
std::placeholders::_2), p).then([](auto s) {
auto f = s.done();
auto f = s->done();
return f.finally([s = std::move(s)] {});
}).then([s] {
}).then_wrapped([s](future<> f) {
try {
f.get();
} catch (commitlog::segment_data_corruption_error& e) {
s->corrupt_bytes += e.bytes();
} catch (...) {
throw;
}
return make_ready_future<stats>(*s);
});
}
@@ -202,7 +210,7 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
auto& cf = db.find_column_family(fm.column_family_id());
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("replaying at {} {}:{} at {}", fm.column_family_id(),
logger.debug("replaying at {} v={} {}:{} at {}", fm.column_family_id(), fm.schema_version(),
cf.schema()->ks_name(), cf.schema()->cf_name(), rp);
}
// Removed forwarding "new" RP. Instead give none/empty.
@@ -210,7 +218,12 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
// The end result should be that once sstables are flushed out
// their "replay_position" attribute will be empty, which is
// lower than anything the new session will produce.
cf.apply(fm);
if (cf.schema()->version() != fm.schema_version()) {
// TODO: Convert fm to current schema
fail(unimplemented::cause::SCHEMA_CHANGE);
} else {
cf.apply(fm, cf.schema());
}
s->applied_mutations++;
return make_ready_future<>();
}).handle_exception([s](auto ep) {
@@ -233,7 +246,7 @@ db::commitlog_replayer::commitlog_replayer(seastar::sharded<cql3::query_processo
: _impl(std::make_unique<impl>(qp))
{}
db::commitlog_replayer::commitlog_replayer(commitlog_replayer&& r)
db::commitlog_replayer::commitlog_replayer(commitlog_replayer&& r) noexcept
: _impl(std::move(r._impl))
{}
@@ -250,31 +263,32 @@ future<db::commitlog_replayer> db::commitlog_replayer::create_replayer(seastar::
}
future<> db::commitlog_replayer::recover(std::vector<sstring> files) {
logger.info("Replaying {}", files);
return parallel_for_each(files, [this](auto f) {
return this->recover(f).handle_exception([f](auto ep) {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.", f);
throw;
} catch (...) {
throw;
}
});
return this->recover(f);
});
}
future<> db::commitlog_replayer::recover(sstring file) {
return _impl->recover(file).then([file](impl::stats stats) {
future<> db::commitlog_replayer::recover(sstring f) {
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.info("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, file
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
});
}).handle_exception([f](auto ep) {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.", f);
throw;
} catch (...) {
throw;
}
});
}
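The then_wrapped block above encodes a recovery policy: segment_data_corruption_error is absorbed into the stats, so replay continues and the caller merely warns about the skipped bytes, while every other exception still propagates. The same policy in a synchronous, framework-free sketch; recover_one and replay_stats are invented names for the demo, and the exception mirrors the class added in the commitlog header above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

struct segment_data_corruption_error : std::runtime_error {
    segment_data_corruption_error(std::string msg, uint64_t s)
        : std::runtime_error(std::move(msg)), _bytes(s) {}
    uint64_t bytes() const { return _bytes; }
private:
    uint64_t _bytes;
};

struct replay_stats {
    uint64_t applied_mutations = 0;
    uint64_t corrupt_bytes = 0;
};

// Run one replay step. Data corruption is recorded and swallowed so
// recovery can continue; any other exception propagates to the caller.
template <typename Fn>
replay_stats recover_one(Fn&& replay) {
    replay_stats s;
    try {
        s.applied_mutations = replay();
    } catch (const segment_data_corruption_error& e) {
        s.corrupt_bytes += e.bytes(); // note the loss, keep going
    }
    return s;
}
```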


@@ -57,7 +57,7 @@ class commitlog;
class commitlog_replayer {
public:
commitlog_replayer(commitlog_replayer&&);
commitlog_replayer(commitlog_replayer&&) noexcept;
~commitlog_replayer();
static future<commitlog_replayer> create_replayer(seastar::sharded<cql3::query_processor>&);


@@ -30,6 +30,7 @@
#include "core/shared_ptr.hh"
#include "core/fstream.hh"
#include "core/do_with.hh"
#include "core/print.hh"
#include "log.hh"
#include <boost/any.hpp>
@@ -117,8 +118,9 @@ template<typename K, typename V>
struct convert<std::unordered_map<K, V>> {
static Node encode(const std::unordered_map<K, V>& rhs) {
Node node(NodeType::Map);
for(typename std::map<K, V>::const_iterator it=rhs.begin();it!=rhs.end();++it)
node.force_insert(it->first, it->second);
for (auto& p : rhs) {
node.force_insert(p.first, p.second);
}
return node;
}
static bool decode(const Node& node, std::unordered_map<K, V>& rhs) {
@@ -409,7 +411,31 @@ future<> db::config::read_from_file(file f) {
}
future<> db::config::read_from_file(const sstring& filename) {
return engine().open_file_dma(filename, open_flags::ro).then([this](file f) {
return open_file_dma(filename, open_flags::ro).then([this](file f) {
return read_from_file(std::move(f));
});
}
boost::filesystem::path db::config::get_conf_dir() {
using namespace boost::filesystem;
path confdir;
auto* cd = std::getenv("SCYLLA_CONF");
if (cd != nullptr) {
confdir = path(cd);
} else {
auto* p = std::getenv("SCYLLA_HOME");
if (p != nullptr) {
confdir = path(p);
}
confdir /= "conf";
}
return confdir;
}
void db::config::check_experimental(const sstring& what) const {
if (!experimental()) {
throw std::runtime_error(sprint("%s is currently disabled. Start Scylla with --experimental=on to enable.", what));
}
}


@@ -102,6 +102,9 @@ public:
config();
// Throws exception if experimental feature is disabled.
void check_experimental(const sstring& what) const;
boost::program_options::options_description
get_options_description();
@@ -121,23 +124,7 @@ public:
* @return path of the directory where configuration files are located
* according the environment variables definitions.
*/
static boost::filesystem::path get_conf_dir() {
using namespace boost::filesystem;
path confdir;
auto* cd = std::getenv("SCYLLA_CONF");
if (cd != nullptr) {
confdir = path(cd);
} else {
auto* p = std::getenv("SCYLLA_HOME");
if (p != nullptr) {
confdir = path(p);
}
confdir /= "conf";
}
return confdir;
}
static boost::filesystem::path get_conf_dir();
typedef std::unordered_map<sstring, sstring> string_map;
typedef std::vector<sstring> string_list;
@@ -290,7 +277,7 @@ public:
"Related information: Configuring compaction" \
) \
/* Common fault detection setting */ \
val(phi_convict_threshold, uint32_t, 8, Unused, \
val(phi_convict_threshold, uint32_t, 8, Used, \
"Adjusts the sensitivity of the failure detector on an exponential scale. Generally this setting never needs adjusting.\n" \
"Related information: Failure detection and recovery" \
) \
@@ -399,7 +386,7 @@ public:
"This setting has been removed from default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.\n" \
"Related information: Initializing a multiple node cluster (single data center) and Initializing a multiple node cluster (multiple data centers)." \
) \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Unused, \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Used, \
"Log WARN on any batch size exceeding this value in kilobytes. Caution should be taken on increasing the size of this threshold as it can lead to node instability." \
) \
val(broadcast_address, sstring, /* listen_address */, Used, \
@@ -560,7 +547,7 @@ public:
) \
/* RPC (remote procedure call) settings */ \
/* Settings for configuring and tuning client connections. */ \
val(broadcast_rpc_address, sstring, /* unset */, Unused, \
val(broadcast_rpc_address, sstring, /* unset */, Used, \
"RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If blank, it is set to the value of the rpc_address or rpc_interface. If rpc_address or rpc_interface is set to 0.0.0.0, this property must be set.\n" \
) \
val(rpc_port, uint16_t, 9160, Used, \
@@ -654,8 +641,8 @@ public:
) \
/* Security properties */ \
/* Server and client security settings. */ \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Unused, \
"The authentication backend. It implements IAuthenticator, which is used to identify users. The available authenticators are:\n" \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Used, \
"The authentication backend, used to identify users. The available authenticators are:\n" \
"\n" \
"\torg.apache.cassandra.auth.AllowAllAuthenticator : Disables authentication; no checks are performed.\n" \
"\torg.apache.cassandra.auth.PasswordAuthenticator : Authenticates users with user names and hashed passwords stored in the system_auth.credentials table. If you use the default, 1, and the node with the lone replica goes down, you will not be able to log into the cluster because the system_auth keyspace was not replicated.\n" \
@@ -682,7 +669,7 @@ public:
val(permissions_update_interval_in_ms, uint32_t, 2000, Unused, \
"Refresh interval for permissions cache (if enabled). After this interval, cache entries become eligible for refresh. On next access, an async reload is scheduled and the old value is returned until it completes. If permissions_validity_in_ms is non-zero, then this property must be non-zero." \
) \
val(server_encryption_options, string_map, /*none*/, Unused, \
val(server_encryption_options, string_map, /*none*/, Used, \
"Enable or disable inter-node encryption. You must also generate keys and provide the appropriate key and trust store locations and passwords. No custom encryption options are currently enabled. The available options are:\n" \
"\n" \
"internode_encryption : (Default: none ) Enable or disable encryption of inter-node communication using the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite for authentication, key exchange, and encryption of data transfers. The available inter-node options are:\n" \
@@ -690,44 +677,23 @@ public:
"\tnone : No encryption.\n" \
"\tdc : Encrypt the traffic between the data centers (server only).\n" \
"\track : Encrypt the traffic between the racks(server only).\n" \
"\tkeystore : (Default: conf/.keystore ) The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.\n" \
"\tkeystore_password : (Default: cassandra ) Password for the keystore.\n" \
"\ttruststore : (Default: conf/.truststore ) Location of the truststore containing the trusted certificate for authenticating remote servers.\n" \
"\ttruststore_password : (Default: cassandra ) Password for the truststore.\n" \
"\n" \
"The passwords used in these options must match the passwords used when generating the keystore and truststore. For instructions on generating these files, see Creating a Keystore to Use with JSSE.\n" \
"\n" \
"The advanced settings are:\n" \
"\n" \
"\tprotocol : (Default: TLS )\n" \
"\talgorithm : (Default: SunX509 )\n" \
"\tstore_type : (Default: JKS )\n" \
"\tcipher_suites : (Default: TLS_RSA_WITH_AES_128_CBC_SHA , TLS_RSA_WITH_AES_256_CBC_SHA )\n" \
"\trequire_client_auth : (Default: false ) Enables or disables certificate authentication.\n" \
"certificate : (Default: conf/scylla.crt) The location of a PEM-encoded x509 certificate used to identify and encrypt the internode communication.\n" \
"keyfile : (Default: conf/scylla.key) PEM Key file associated with certificate.\n" \
"truststore : (Default: <system truststore> ) Location of the truststore containing the trusted certificate for authenticating remote servers.\n" \
"Related information: Node-to-node encryption" \
) \
val(client_encryption_options, string_map, /*none*/, Unused, \
"Enable or disable client-to-node encryption. You must also generate keys and provide the appropriate key and trust store locations and passwords. No custom encryption options are currently enabled. The available options are:\n" \
val(client_encryption_options, string_map, /*none*/, Used, \
"Enable or disable client-to-node encryption. You must also generate keys and provide the appropriate key and certificate. No custom encryption options are currently enabled. The available options are:\n" \
"\n" \
"\tenabled : (Default: false ) To enable, set to true.\n" \
"\tkeystore : (Default: conf/.keystore ) The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.\n" \
"\tkeystore_password : (Default: cassandra ) Password for the keystore. This must match the password used when generating the keystore and truststore.\n" \
"\trequire_client_auth : (Default: false ) Enables or disables certificate authentication. (Available starting with Cassandra 1.2.3.)\n" \
"\ttruststore : (Default: conf/.truststore ) Set if require_client_auth is true.\n" \
"\ttruststore_password : <truststore_password> Set if require_client_auth is true.\n" \
"\n" \
"The advanced settings are:\n" \
"\n" \
"\tprotocol : (Default: TLS )\n" \
"\talgorithm : (Default: SunX509 )\n" \
"\tstore_type : (Default: JKS )\n" \
"\tcipher_suites : (Default: TLS_RSA_WITH_AES_128_CBC_SHA , TLS_RSA_WITH_AES_256_CBC_SHA )\n" \
"\tcertificate: (Default: conf/scylla.crt) The location of a PEM-encoded x509 certificate used to identify and encrypt the client/server communication.\n" \
"\tkeyfile: (Default: conf/scylla.key) PEM Key file associated with certificate.\n" \
"Related information: Client-to-node encryption" \
) \
val(ssl_storage_port, uint32_t, 7001, Unused, \
val(ssl_storage_port, uint32_t, 7001, Used, \
"The SSL port for encrypted communication. Unused unless enabled in encryption_options." \
) \
val(default_log_level, sstring, "warn", Used, \
val(default_log_level, sstring, "info", Used, \
"Default log level for log messages. Valid values are trace, debug, info, warn, error.") \
val(logger_log_level, string_map, /* none */, Used,\
"map of logger name to log level. Valid values are trace, debug, info, warn, error. " \
@@ -743,6 +709,18 @@ public:
val(api_ui_dir, sstring, "swagger-ui/dist/", Used, "The directory location of the API GUI") \
val(api_doc_dir, sstring, "api/api-doc/", Used, "The API definition file directory") \
val(load_balance, sstring, "none", Used, "CQL request load balancing: 'none' or 'round-robin'") \
val(consistent_rangemovement, bool, true, Used, "When set to true, range movements will be consistent. It means: 1) it will refuse to bootstrap a new node if other bootstrapping/leaving/moving nodes are detected. 2) data will be streamed to a new node only from the node which is no longer responsible for the token range. Same as -Dcassandra.consistent.rangemovement in cassandra") \
val(join_ring, bool, true, Used, "When set to true, a node will join the token ring. When set to false, a node will not join the token ring. The user can use nodetool join to initiate ring joining later. Same as -Dcassandra.join_ring in cassandra.") \
val(load_ring_state, bool, true, Used, "When set to true, load tokens and host_ids previously saved. Same as -Dcassandra.load_ring_state in cassandra.") \
val(replace_node, sstring, "", Used, "The UUID of the node to replace. Same as -Dcassandra.replace_node in cassandra.") \
val(replace_token, sstring, "", Used, "The tokens of the node to replace. Same as -Dcassandra.replace_token in cassandra.") \
val(replace_address, sstring, "", Used, "The listen_address or broadcast_address of the dead node to replace. Same as -Dcassandra.replace_address.") \
val(replace_address_first_boot, sstring, "", Used, "Like the replace_address option, but it is ignored if the node has been bootstrapped successfully. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set to true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time, in milliseconds, a node waits to hear from other nodes before joining the ring. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \
@@ -759,5 +737,4 @@ private:
int _dummy;
};
}


@@ -46,6 +46,7 @@
#include "system_keyspace.hh"
#include "query_context.hh"
#include "query-result-set.hh"
#include "query-result-writer.hh"
#include "schema_builder.hh"
#include "map_difference.hh"
#include "utils/UUID_gen.hh"
@@ -53,9 +54,12 @@
#include "core/thread.hh"
#include "json.hh"
#include "log.hh"
#include "frozen_schema.hh"
#include "schema_registry.hh"
#include "db/marshal/type_parser.hh"
#include "db/config.hh"
#include "md5_hasher.hh"
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -70,6 +74,36 @@ namespace schema_tables {
logging::logger logger("schema_tables");
struct qualified_name {
sstring keyspace_name;
sstring table_name;
qualified_name(sstring keyspace_name, sstring table_name)
: keyspace_name(std::move(keyspace_name))
, table_name(std::move(table_name))
{ }
qualified_name(const schema_ptr& s)
: keyspace_name(s->ks_name())
, table_name(s->cf_name())
{ }
bool operator<(const qualified_name& o) const {
return keyspace_name < o.keyspace_name
|| (keyspace_name == o.keyspace_name && table_name < o.table_name);
}
bool operator==(const qualified_name& o) const {
return keyspace_name == o.keyspace_name && table_name == o.table_name;
}
};
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table);
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after);
std::vector<const char*> ALL { KEYSPACES, COLUMNFAMILIES, COLUMNS, TRIGGERS, USERTYPES, /* not present in 2.1.8: FUNCTIONS, AGGREGATES */ };
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
@@ -95,7 +129,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"keyspace definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::yes);
builder.with(schema_builder::compact_storage::yes);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return keyspaces;
}
@@ -147,7 +183,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"table definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columnfamilies;
}
@@ -176,7 +214,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"column definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columns;
}
@@ -200,7 +240,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"trigger definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return triggers;
}
@@ -225,7 +267,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined type definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return usertypes;
}
@@ -254,7 +298,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined type definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return functions;
}
@@ -283,7 +329,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined aggregate definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return aggregates;
}
@@ -295,10 +343,11 @@ future<> save_system_keyspace_schema() {
// delete old, possibly obsolete entries in schema tables
return parallel_for_each(ALL, [ksm] (sstring cf) {
return db::execute_cql("DELETE FROM system.%s WHERE keyspace_name = ?", cf, ksm->name()).discard_result();
auto deletion_timestamp = schema_creation_timestamp() - 1;
return db::execute_cql(sprint("DELETE FROM system.%%s USING TIMESTAMP %s WHERE keyspace_name = ?",
deletion_timestamp), cf, ksm->name()).discard_result();
}).then([ksm] {
// (+1 to timestamp to make sure we don't get shadowed by the tombstones we just added)
auto mvec = make_create_keyspace_mutations(ksm, qctx->next_timestamp(), true);
auto mvec = make_create_keyspace_mutations(ksm, schema_creation_timestamp(), true);
return qctx->proxy().mutate_locally(std::move(mvec));
});
}
@@ -326,36 +375,30 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
auto map = [&proxy] (sstring table) {
return db::system_keyspace::query_mutations(proxy, table).then([&proxy, table] (auto rs) {
auto s = proxy.local().get_db().local().find_schema(system_keyspace::NAME, table);
std::vector<query::result> results;
std::vector<mutation> mutations;
for (auto&& p : rs->partitions()) {
auto mut = p.mut().unfreeze(s);
auto partition_key = value_cast<sstring>(utf8_type->deserialize(mut.key().get_component(*s, 0)));
if (partition_key == system_keyspace::NAME) {
continue;
}
auto slice = partition_slice_builder(*s).build();
results.emplace_back(mut.query(slice));
mutations.emplace_back(std::move(mut));
}
return results;
return mutations;
});
};
auto reduce = [] (auto& hash, auto&& results) {
for (auto&& rs : results) {
for (auto&& f : rs.buf().fragments()) {
hash.Update(reinterpret_cast<const unsigned char*>(f.begin()), f.size());
}
auto reduce = [] (auto& hash, auto&& mutations) {
for (const mutation& m : mutations) {
feed_hash_for_schema_digest(hash, m);
}
return make_ready_future<>();
};
return do_with(CryptoPP::Weak::MD5{}, [map, reduce] (auto& hash) {
return do_with(md5_hasher(), [map, reduce] (auto& hash) {
return do_for_each(ALL.begin(), ALL.end(), [&hash, map, reduce] (auto& table) {
return map(table).then([&hash, reduce] (auto&& results) {
return reduce(hash, results);
return map(table).then([&hash, reduce] (auto&& mutations) {
reduce(hash, mutations);
});
}).then([&hash] {
bytes digest{bytes::initialized_later(), CryptoPP::Weak::MD5::DIGESTSIZE};
hash.Final(reinterpret_cast<unsigned char*>(digest.begin()));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(digest));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(hash.finalize()));
});
});
}
@@ -398,27 +441,51 @@ read_schema_for_keyspaces(distributed<service::storage_proxy>& proxy, const sstr
return map_reduce(keyspace_names.begin(), keyspace_names.end(), map, schema_result{}, insert);
}
future<schema_result::value_type>
static
future<mutation> query_partition_mutation(service::storage_proxy& proxy,
schema_ptr s,
lw_shared_ptr<query::read_command> cmd,
partition_key pkey)
{
auto dk = dht::global_partitioner().decorate_key(*s, pkey);
return do_with(query::partition_range::make_singular(dk), [&proxy, dk, s = std::move(s), cmd = std::move(cmd)] (auto& range) {
return proxy.query_mutations_locally(s, std::move(cmd), range)
.then([dk = std::move(dk), s](foreign_ptr<lw_shared_ptr<reconcilable_result>> res) {
auto&& partitions = res->partitions();
if (partitions.size() == 0) {
return mutation(std::move(dk), s);
} else if (partitions.size() == 1) {
return partitions[0].mut().unfreeze(s);
} else {
assert(false && "Results must have at most one partition");
}
});
});
}
future<schema_result_value_type>
read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name)
{
auto schema = proxy.local().get_db().local().find_schema(system_keyspace::NAME, schema_table_name);
auto keyspace_key = dht::global_partitioner().decorate_key(*schema,
partition_key::from_singular(*schema, keyspace_name));
return db::system_keyspace::query(proxy, schema_table_name, keyspace_key).then([keyspace_name] (auto&& rs) {
return schema_result::value_type{keyspace_name, std::move(rs)};
return schema_result_value_type{keyspace_name, std::move(rs)};
});
}
future<schema_result::value_type>
future<mutation>
read_schema_partition_for_table(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name, const sstring& table_name)
{
auto schema = proxy.local().get_db().local().find_schema(system_keyspace::NAME, schema_table_name);
auto keyspace_key = dht::global_partitioner().decorate_key(*schema,
partition_key::from_singular(*schema, keyspace_name));
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
return db::system_keyspace::query(proxy, schema_table_name, keyspace_key, clustering_range).then([keyspace_name] (auto&& rs) {
return schema_result::value_type{keyspace_name, std::move(rs)};
});
auto keyspace_key = partition_key::from_singular(*schema, keyspace_name);
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(
*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
auto slice = partition_slice_builder(*schema)
.with_range(std::move(clustering_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return query_partition_mutation(proxy.local(), std::move(schema), std::move(cmd), std::move(keyspace_key));
}
static semaphore the_merge_lock;
@@ -452,7 +519,7 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
}
future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
{
return merge_lock().then([&proxy, mutations = std::move(mutations), do_flush] () mutable {
return do_merge_schema(proxy, std::move(mutations), do_flush);
}).finally([] {
@@ -460,6 +527,35 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
});
}
// Returns names of live table definitions of given keyspace
future<std::vector<sstring>>
static read_table_names_of_keyspace(distributed<service::storage_proxy>& proxy, const sstring& keyspace_name) {
auto s = columnfamilies();
auto pkey = dht::global_partitioner().decorate_key(*s, partition_key::from_singular(*s, keyspace_name));
return db::system_keyspace::query(proxy, COLUMNFAMILIES, pkey).then([] (auto&& rs) {
std::vector<sstring> result;
for (const query::result_set_row& row : rs->rows()) {
result.emplace_back(row.get_nonnull<sstring>("columnfamily_name"));
}
return result;
});
}
// Call inside a seastar thread
static
std::map<qualified_name, schema_mutations>
read_tables_for_keyspaces(distributed<service::storage_proxy>& proxy, const std::set<sstring>& keyspace_names)
{
std::map<qualified_name, schema_mutations> result;
for (auto&& keyspace_name : keyspace_names) {
for (auto&& table_name : read_table_names_of_keyspace(proxy, keyspace_name).get0()) {
auto qn = qualified_name(keyspace_name, table_name);
result.emplace(qn, read_table_mutations(proxy, qn).get0());
}
}
return result;
}
future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
return seastar::async([&proxy, mutations = std::move(mutations), do_flush] () mutable {
@@ -474,7 +570,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// current state of the schema
auto&& old_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& old_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& old_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& old_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& old_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -494,7 +590,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// with new data applied
auto&& new_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& new_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& new_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& new_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& new_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -502,7 +598,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
#endif
std::set<sstring> keyspaces_to_drop = merge_keyspaces(proxy, std::move(old_keyspaces), std::move(new_keyspaces)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families));
#if 0
mergeTypes(oldTypes, newTypes);
mergeFunctions(oldFunctions, newFunctions);
@@ -512,15 +608,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// it is safe to drop a keyspace only when all nested ColumnFamilies were deleted
for (auto&& keyspace_to_drop : keyspaces_to_drop) {
db.drop_keyspace(keyspace_to_drop);
}
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(keyspaces_to_drop, [] (auto& ks_name) {
return service::migration_manager::notify_drop_keyspace(ks_name);
});
} else {
return make_ready_future<>();
service::get_local_migration_manager().notify_drop_keyspace(keyspace_to_drop);
}
}).get0();
});
@@ -528,7 +616,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
future<std::set<sstring>> merge_keyspaces(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
{
std::vector<schema_result::value_type> created;
std::vector<schema_result_value_type> created;
std::vector<sstring> altered;
std::set<sstring> dropped;
@@ -551,138 +639,84 @@ future<std::set<sstring>> merge_keyspaces(distributed<service::storage_proxy>& p
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
created.emplace_back(schema_result::value_type{key, std::move(value)});
}
created.emplace_back(schema_result_value_type{key, std::move(value)});
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
altered.emplace_back(keyspace_name);
} else if (!pre->empty()) {
dropped.emplace(keyspace_name);
} else if (!post->empty()) { // a (re)created keyspace
created.emplace_back(schema_result::value_type{key, std::move(post)});
}
altered.emplace_back(key);
}
return do_with(std::move(created), [&proxy, altered = std::move(altered)] (auto& created) {
return proxy.local().get_db().invoke_on_all([&created, altered = std::move(altered)] (database& db) {
return do_for_each(created, [&db] (auto&& val) {
return do_for_each(created, [&db](auto&& val) {
auto ksm = create_keyspace_from_schema_partition(val);
return db.create_keyspace(std::move(ksm));
return db.create_keyspace(ksm).then([ksm] {
service::get_local_migration_manager().notify_create_keyspace(ksm);
});
}).then([&altered, &db] () mutable {
for (auto&& name : altered) {
db.update_keyspace(name);
}
return make_ready_future<>();
});
}).then([&created] {
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(created, [] (auto&& partition) {
auto ksm = create_keyspace_from_schema_partition(partition);
return service::migration_manager::notify_create_keyspace(ksm);
});
} else {
return make_ready_future<>();
}
});
}).then([dropped = std::move(dropped)] () {
return make_ready_future<std::set<sstring>>(dropped);
});
}
static void update_column_family(database& db, schema_ptr new_schema) {
column_family& cfm = db.find_column_family(new_schema->id());
bool columns_changed = !cfm.schema()->equal_columns(*new_schema);
auto s = local_schema_registry().learn(new_schema);
s->registry_entry()->mark_synced();
cfm.set_schema(std::move(s));
service::get_local_migration_manager().notify_update_column_family(cfm.schema(), columns_changed);
}
// see the comments for merge_keyspaces()
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after)
{
return do_with(std::make_pair(std::move(after), std::move(before)), [&proxy] (auto& pair) {
auto& after = pair.first;
auto& before = pair.second;
auto changed_at = db_clock::now();
return proxy.local().get_db().invoke_on_all([changed_at, &proxy, &before, &after] (database& db) {
return seastar::async([changed_at, &proxy, &db, &before, &after] {
std::vector<schema_ptr> created;
std::vector<schema_ptr> altered;
std::vector<schema_ptr> dropped;
auto diff = difference(before, after, [](const auto& x, const auto& y) -> bool {
return *x == *y;
auto changed_at = db_clock::now();
std::vector<global_schema_ptr> created;
std::vector<global_schema_ptr> altered;
std::vector<global_schema_ptr> dropped;
auto diff = difference(before, after);
for (auto&& key : diff.entries_only_on_left) {
auto&& s = proxy.local().get_db().local().find_schema(key.keyspace_name, key.table_name);
dropped.emplace_back(s);
}
for (auto&& key : diff.entries_only_on_right) {
created.emplace_back(create_table_from_mutations(after.at(key)));
}
for (auto&& key : diff.entries_differing) {
altered.emplace_back(create_table_from_mutations(after.at(key)));
}
proxy.local().get_db().invoke_on_all([&created, &dropped, &altered, changed_at] (database& db) {
return seastar::async([&] {
for (auto&& gs : created) {
schema_ptr s = gs.get();
auto& ks = db.find_keyspace(s->ks_name());
auto cfg = ks.make_column_family_config(*s);
db.add_column_family(s, cfg);
ks.make_directory_for_column_family(s->cf_name(), s->id()).get();
service::get_local_migration_manager().notify_create_column_family(s);
}
for (auto&& gs : altered) {
update_column_family(db, gs.get());
}
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db](auto&& gs) {
schema_ptr s = gs.get();
return db.drop_column_family(changed_at, s->ks_name(), s->cf_name()).then([s] {
service::get_local_migration_manager().notify_drop_column_family(s);
});
for (auto&& key : diff.entries_only_on_left) {
auto&& rs = before[key];
for (const query::result_set_row& row : rs->rows()) {
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
dropped.emplace_back(db.find_schema(ks_name, cf_name));
}
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
auto&& tables = create_tables_from_tables_partition(proxy, value).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
auto after = create_tables_from_tables_partition(proxy, post).get0();
auto delta = difference(std::map<sstring, schema_ptr>{before.begin(), before.end()}, after, [](const schema_ptr& x, const schema_ptr& y) -> bool {
return *x == *y;
});
for (auto&& key : delta.entries_only_on_left) {
dropped.emplace_back(before[key]);
}
for (auto&& key : delta.entries_only_on_right) {
created.emplace_back(after[key]);
}
for (auto&& key : delta.entries_differing) {
altered.emplace_back(after[key]);
}
} else if (!pre->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
boost::copy(before | boost::adaptors::map_values, std::back_inserter(dropped));
} else if (!post->empty()) {
auto tables = create_tables_from_tables_partition(proxy, post).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& cfm : created) {
auto& ks = db.find_keyspace(cfm->ks_name());
auto cfg = ks.make_column_family_config(*cfm);
db.add_column_family(cfm, cfg);
}
parallel_for_each(altered.begin(), altered.end(), [&db] (auto&& cfm) {
return db.update_column_family(cfm->ks_name(), cfm->cf_name());
}).get();
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db] (auto&& cfm) {
return db.drop_column_family(changed_at, cfm->ks_name(), cfm->cf_name());
}).get();
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
for (auto&& cfm : created) {
service::migration_manager::notify_create_column_family(cfm).get0();
auto& ks = db.find_keyspace(cfm->ks_name());
ks.make_directory_for_column_family(cfm->cf_name(), cfm->id());
}
for (auto&& cfm : dropped) {
service::migration_manager::notify_drop_column_family(cfm).get0();
}
}
});
}).get();
});
});
}).get();
}
#if 0
@@ -871,7 +905,7 @@ std::vector<mutation> make_create_keyspace_mutations(lw_shared_ptr<keyspace_meta
addTypeToSchemaMutation(type, timestamp, mutation);
#endif
for (auto&& kv : keyspace->cf_meta_data()) {
add_table_to_schema_mutation(kv.second, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(kv.second, timestamp, true, mutations);
}
}
return mutations;
@@ -899,7 +933,7 @@ std::vector<mutation> make_drop_keyspace_mutations(lw_shared_ptr<keyspace_metada
*
* @param partition Keyspace attributes in serialized form
*/
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result::value_type& result)
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& result)
{
auto&& rs = result.second;
if (rs->empty()) {
@@ -997,17 +1031,19 @@ std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadat
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
schema_ptr s = keyspaces();
auto pkey = partition_key::from_singular(*s, keyspace->name());
add_table_to_schema_mutation(table, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(table, timestamp, true, mutations);
return mutations;
}
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations)
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers)
{
// When adding new schema properties, don't set cells for default values so that
// both old and new nodes will see the same version during rolling upgrades.
// For a property that can be null (and can be changed), we insert tombstones to make sure
// we don't keep a property the user has removed
schema_ptr s = columnfamilies();
auto pkey = partition_key::from_singular(*s, table->ks_name());
mutation m{pkey, s};
auto ckey = clustering_key::from_singular(*s, table->cf_name());
m.set_clustered_cell(ckey, "cf_id", table->id(), timestamp);
@@ -1066,16 +1102,24 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
if (table->compact_columns_count() == 1) {
m.set_clustered_cell(ckey, "value_alias", table->compact_column().name_as_text(), timestamp);
} // null if none
#if 0
for (Map.Entry<ColumnIdentifier, Long> entry : table.getDroppedColumns().entrySet())
adder.addMapEntry("dropped_columns", entry.getKey().toString(), entry.getValue());
#endif
map_type_impl::mutation dropped_columns;
auto dropped_columns_column = s->get_column_definition("dropped_columns");
assert(dropped_columns_column);
auto dropped_columns_type = static_pointer_cast<const map_type_impl>(dropped_columns_column->type);
for (auto&& entry : table->dropped_columns()) {
dropped_columns.cells.emplace_back(dropped_columns_type->get_keys_type()->decompose(data_value(entry.first)),
atomic_cell::make_live(timestamp, dropped_columns_type->get_values_type()->decompose(entry.second)));
}
m.set_clustered_cell(ckey, *dropped_columns_column,
atomic_cell_or_collection::from_collection_mutation(dropped_columns_type->serialize_mutation_form(std::move(dropped_columns))));
m.set_clustered_cell(ckey, "is_dense", table->is_dense(), timestamp);
mutation columns_mutation(pkey, columns());
if (with_columns_and_triggers) {
for (auto&& column : table->all_columns_in_select_order()) {
add_column_to_schema_mutation(table, column, timestamp, pkey, mutations);
add_column_to_schema_mutation(table, column, timestamp, columns_mutation);
}
#if 0
@@ -1083,42 +1127,51 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
addTriggerToSchemaMutation(table, trigger, timestamp, mutation);
#endif
}
mutations.emplace_back(std::move(m));
return schema_mutations{std::move(m), std::move(columns_mutation)};
}
#if 0
public static Mutation makeUpdateTableMutation(KSMetaData keyspace,
CFMetaData oldTable,
CFMetaData newTable,
long timestamp,
boolean fromThrift)
{
Mutation mutation = makeCreateKeyspaceMutation(keyspace, timestamp, false);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations)
{
make_table_mutations(table, timestamp, with_columns_and_triggers).copy_to(mutations);
}
addTableToSchemaMutation(newTable, timestamp, false, mutation);
std::vector<mutation> make_update_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift)
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
MapDifference<ByteBuffer, ColumnDefinition> columnDiff = Maps.difference(oldTable.getColumnMetadata(),
newTable.getColumnMetadata());
add_table_to_schema_mutation(new_table, timestamp, false, mutations);
// columns that are no longer needed
for (ColumnDefinition column : columnDiff.entriesOnlyOnLeft().values())
{
// Thrift only knows about the REGULAR ColumnDefinition type, so don't consider other type
// are being deleted just because they are not here.
if (fromThrift && column.kind != ColumnDefinition.Kind.REGULAR)
continue;
mutation columns_mutation(partition_key::from_singular(*columns(), old_table->ks_name()), columns());
dropColumnFromSchemaMutation(oldTable, column, timestamp, mutation);
auto diff = difference(old_table->all_columns(), new_table->all_columns());
// columns that are no longer needed
for (auto&& name : diff.entries_only_on_left) {
// Thrift only knows about the REGULAR ColumnDefinition type, so don't consider other types
// as being deleted just because they are not here.
const column_definition& column = *old_table->all_columns().at(name);
if (from_thrift && !column.is_regular()) {
continue;
}
// newly added columns
for (ColumnDefinition column : columnDiff.entriesOnlyOnRight().values())
addColumnToSchemaMutation(newTable, column, timestamp, mutation);
drop_column_from_schema_mutation(old_table, column, timestamp, mutations);
}
// old columns with updated attributes
for (ByteBuffer name : columnDiff.entriesDiffering().keySet())
addColumnToSchemaMutation(newTable, newTable.getColumnDefinition(name), timestamp, mutation);
// newly added columns and old columns with updated attributes
for (auto&& name : boost::range::join(diff.entries_differing, diff.entries_only_on_right)) {
const column_definition& column = *new_table->all_columns().at(name);
add_column_to_schema_mutation(new_table, column, timestamp, columns_mutation);
}
mutations.emplace_back(std::move(columns_mutation));
warn(unimplemented::cause::TRIGGERS);
#if 0
MapDifference<String, TriggerDefinition> triggerDiff = Maps.difference(oldTable.getTriggers(), newTable.getTriggers());
// dropped triggers
@@ -1129,9 +1182,9 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
for (TriggerDefinition trigger : triggerDiff.entriesOnlyOnRight().values())
addTriggerToSchemaMutation(newTable, trigger, timestamp, mutation);
return mutation;
}
#endif
return mutations;
}
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp)
{
@@ -1159,13 +1212,39 @@ std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata>
return mutations;
}
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, table.keyspace_name, table.table_name)
.then([&proxy, table] (mutation cf_m) {
return read_schema_partition_for_table(proxy, COLUMNS, table.keyspace_name, table.table_name)
.then([cf_m = std::move(cf_m)] (mutation col_m) {
return schema_mutations{std::move(cf_m), std::move(col_m)};
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
});
}
future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& proxy, const sstring& keyspace, const sstring& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, keyspace, table).then([&proxy, keyspace, table] (auto partition) {
if (partition.second->empty()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", keyspace, table));
}
return create_table_from_table_partition(proxy, std::move(partition.second));
return do_with(qualified_name(keyspace, table), [&proxy] (auto&& qn) {
return read_table_mutations(proxy, qn).then([qn] (schema_mutations sm) {
if (!sm.live()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", qn.keyspace_name, qn.table_name));
}
return create_table_from_mutations(std::move(sm));
});
});
}
@@ -1194,18 +1273,6 @@ future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distri
}
#endif
void create_table_from_table_row_and_columns_partition(schema_builder& builder, const query::result_set_row& table_row, const schema_result::value_type& serialized_columns)
{
create_table_from_table_row_and_column_rows(builder, table_row, serialized_columns.second);
}
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition)
{
return do_with(std::move(partition), [&proxy] (auto& partition) {
return create_table_from_table_row(proxy, partition->row(0));
});
}
/**
* Deserialize table metadata from low-level representation
*
@@ -1215,31 +1282,18 @@ future<schema_ptr> create_table_from_table_row(distributed<service::storage_prox
{
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
auto id = row.get_nonnull<utils::UUID>("cf_id");
return read_schema_partition_for_table(proxy, COLUMNS, ks_name, cf_name).then([&row, ks_name, cf_name, id] (auto serialized_columns) {
schema_builder builder{ks_name, cf_name, id};
create_table_from_table_row_and_columns_partition(builder, row, serialized_columns);
return builder.build();
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
return create_table_from_name(proxy, ks_name, cf_name);
}
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_column_definitions)
schema_ptr create_table_from_mutations(schema_mutations sm, std::experimental::optional<table_schema_version> version)
{
auto table_rs = query::result_set(sm.columnfamilies_mutation());
query::result_set_row table_row = table_rs.row(0);
auto ks_name = table_row.get_nonnull<sstring>("keyspace_name");
auto cf_name = table_row.get_nonnull<sstring>("columnfamily_name");
auto id = table_row.get_nonnull<utils::UUID>("cf_id");
schema_builder builder{ks_name, cf_name, id};
#if 0
AbstractType<?> rawComparator = TypeParser.parse(result.getString("comparator"));
@@ -1257,11 +1311,12 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
AbstractType<?> fullRawComparator = CFMetaData.makeRawAbstractType(rawComparator, subComparator);
#endif
std::vector<column_definition> column_defs = create_columns_from_column_rows(serialized_column_definitions,
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
std::vector<column_definition> column_defs = create_columns_from_column_rows(
query::result_set(sm.columns_mutation()),
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
bool is_dense;
if (table_row.has("is_dense")) {
@@ -1269,7 +1324,7 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
} else {
// FIXME:
// is_dense = CFMetaData.calculateIsDense(fullRawComparator, columnDefs);
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
bool is_compound = cell_comparator::check_compound(table_row.get_nonnull<sstring>("comparator"));
@@ -1310,10 +1365,10 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
builder.set_max_compaction_threshold(table_row.get_nonnull<int>("max_compaction_threshold"));
}
#if 0
if (result.has("comment"))
cfm.comment(result.getString("comment"));
#endif
if (table_row.has("comment")) {
builder.set_comment(table_row.get_nonnull<sstring>("comment"));
}
if (table_row.has("memtable_flush_period_in_ms")) {
builder.set_memtable_flush_period(table_row.get_nonnull<int32_t>("memtable_flush_period_in_ms"));
}
@@ -1365,13 +1420,22 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
builder.set_bloom_filter_fp_chance(builder.get_bloom_filter_fp_chance());
}
#if 0
if (result.has("dropped_columns"))
cfm.droppedColumns(convertDroppedColumns(result.getMap("dropped_columns", UTF8Type.instance, LongType.instance)));
#endif
if (table_row.has("dropped_columns")) {
auto map = table_row.get_nonnull<map_type_impl::native_type>("dropped_columns");
for (auto&& entry : map) {
builder.without_column(value_cast<sstring>(entry.first), value_cast<api::timestamp_type>(entry.second));
};
}
for (auto&& cdef : column_defs) {
builder.with_column(cdef);
}
if (version) {
builder.with_version(*version);
} else {
builder.with_version(sm.digest());
}
return builder.build();
}
#if 0
@@ -1391,12 +1455,9 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
void add_column_to_schema_mutation(schema_ptr table,
const column_definition& column,
api::timestamp_type timestamp,
const partition_key& pkey,
std::vector<mutation>& mutations)
mutation& m)
{
schema_ptr s = columns();
mutation m{pkey, s};
auto ckey = clustering_key::from_exploded(*s, {utf8_type->decompose(table->cf_name()), column.name()});
auto ckey = clustering_key::from_exploded(*m.schema(), {utf8_type->decompose(table->cf_name()), column.name()});
m.set_clustered_cell(ckey, "validator", column.type->name(), timestamp);
m.set_clustered_cell(ckey, "type", serialize_kind(column.kind), timestamp);
if (!column.is_on_all_components()) {
@@ -1407,7 +1468,6 @@ void add_column_to_schema_mutation(schema_ptr table,
adder.add("index_type", column.getIndexType() == null ? null : column.getIndexType().toString());
adder.add("index_options", json(column.getIndexOptions()));
#endif
mutations.emplace_back(std::move(m));
}
sstring serialize_kind(column_kind kind)
@@ -1448,14 +1508,14 @@ void drop_column_from_schema_mutation(schema_ptr table, const column_definition&
mutations.emplace_back(m);
}
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table, /*,
AbstractType<?> rawComparator, */
bool is_super)
{
std::vector<column_definition> columns;
for (auto&& row : rows->rows()) {
for (auto&& row : rows.rows()) {
columns.emplace_back(std::move(create_column_from_column_row(row, keyspace, table, /*, rawComparator, */ is_super)));
}
return columns;



@@ -43,6 +43,8 @@
#include "service/storage_proxy.hh"
#include "mutation.hh"
#include "schema.hh"
#include "hashing.hh"
#include "schema_mutations.hh"
#include <vector>
#include <map>
@@ -55,6 +57,7 @@ namespace db {
namespace schema_tables {
using schema_result = std::map<sstring, lw_shared_ptr<query::result_set>>;
using schema_result_value_type = std::pair<sstring, lw_shared_ptr<query::result_set>>;
static constexpr auto KEYSPACES = "schema_keyspaces";
static constexpr auto COLUMNFAMILIES = "schema_columnfamilies";
@@ -74,7 +77,7 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
future<std::vector<frozen_mutation>> convert_schema_to_mutations(distributed<service::storage_proxy>& proxy);
future<schema_result::value_type>
future<schema_result_value_type>
read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name);
future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations);
@@ -89,19 +92,26 @@ std::vector<mutation> make_create_keyspace_mutations(lw_shared_ptr<keyspace_meta
std::vector<mutation> make_drop_keyspace_mutations(lw_shared_ptr<keyspace_metadata> keyspace, api::timestamp_type timestamp);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result::value_type& partition);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result::value_type& partition);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
mutation make_create_keyspace_mutation(lw_shared_ptr<keyspace_metadata> keyspace, api::timestamp_type timestamp, bool with_tables_and_types_and_functions = true);
std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
std::vector<mutation> make_update_table_mutations(
lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift);
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers = true);
future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distributed<service::storage_proxy>& proxy, const schema_result::mapped_type& result);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations);
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
@@ -109,13 +119,11 @@ future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& p
future<schema_ptr> create_table_from_table_row(distributed<service::storage_proxy>& proxy, const query::result_set_row& row);
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_columns);
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition);
schema_ptr create_table_from_mutations(schema_mutations, std::experimental::optional<table_schema_version> version = {});
void drop_column_from_schema_mutation(schema_ptr table, const column_definition& column, long timestamp, std::vector<mutation>& mutations);
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table,/*,
AbstractType<?> rawComparator, */
@@ -128,11 +136,25 @@ column_definition create_column_from_column_row(const query::result_set_row& row
bool is_super);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, const partition_key& pkey, std::vector<mutation>& mutations);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, mutation& mutation);
sstring serialize_kind(column_kind kind);
column_kind deserialize_kind(sstring kind);
data_type parse_type(sstring str);
schema_ptr columns();
schema_ptr columnfamilies();
template<typename Hasher>
void feed_hash_for_schema_digest(Hasher& h, const mutation& m) {
// Cassandra skips tombstones in the digest calculation
// to avoid disagreements due to tombstone GC.
// See https://issues.apache.org/jira/browse/CASSANDRA-6862.
// We achieve a similar effect with compact_for_compaction().
mutation m_compacted(m);
m_compacted.partition().compact_for_compaction(*m.schema(), api::max_timestamp, gc_clock::time_point::max());
feed_hash(h, m_compacted);
}
} // namespace schema_tables
} // namespace db
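The digest trick in feed_hash_for_schema_digest above is to compact away tombstones before hashing, so replicas that GC'd tombstones at different times still agree. A toy sketch of the idea (hypothetical `partition` and `digest` names, `std::hash` in place of a real hasher; the real compact_for_compaction() also honors timestamps and gc_clock deadlines):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// A toy "partition": column name -> value, nullopt meaning a tombstone.
using partition = std::map<std::string, std::optional<std::string>>;

// Digest only the live cells, so a replica that has already GC'd a
// tombstone and one that still carries it produce the same digest.
size_t digest(const partition& p) {
    size_t h = 0;
    for (const auto& [name, value] : p) {
        if (!value) {
            continue; // skip tombstones, as compact_for_compaction() would
        }
        h ^= std::hash<std::string>{}(name) * 31
           + std::hash<std::string>{}(*value);
    }
    return h;
}
```

With this, a partition still carrying a tombstone for `c2` and one that has dropped it digest identically.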


@@ -69,6 +69,11 @@ void db::serializer<bytes>::read(bytes& b, input& in) {
b = in.read<bytes>();
}
template<>
void db::serializer<bytes>::skip(input& in) {
in.read<bytes>(); // FIXME: Avoid reading
}
template<>
db::serializer<bytes_view>::serializer(const bytes_view& v)
: _item(v), _size(output::serialized_size(v)) {
@@ -104,6 +109,11 @@ void db::serializer<sstring>::read(sstring& s, input& in) {
s = in.read<sstring>();
}
template<>
void db::serializer<sstring>::skip(input& in) {
in.read<sstring>(); // FIXME: avoid reading
}
template<>
db::serializer<tombstone>::serializer(const tombstone& t)
: _item(t), _size(sizeof(t.timestamp) + sizeof(decltype(t.deletion_time.time_since_epoch().count()))) {
@@ -157,105 +167,6 @@ void db::serializer<collection_mutation_view>::read(collection_mutation_view& c,
c = collection_mutation_view::from_bytes(bytes_view_serializer::read(in));
}
template<>
db::serializer<partition_key_view>::serializer(const partition_key_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<partition_key_view>::write(output& out, const partition_key_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<partition_key_view>::read(partition_key_view& b, input& in) {
auto len = in.read<uint16_t>();
b = partition_key_view::from_bytes(in.read_view(len));
}
template<>
partition_key_view db::serializer<partition_key_view>::read(input& in) {
auto len = in.read<uint16_t>();
return partition_key_view::from_bytes(in.read_view(len));
}
template<>
void db::serializer<partition_key_view>::skip(input& in) {
auto len = in.read<uint16_t>();
in.skip(len);
}
template<>
db::serializer<clustering_key_view>::serializer(const clustering_key_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<clustering_key_view>::write(output& out, const clustering_key_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<clustering_key_view>::read(clustering_key_view& b, input& in) {
auto len = in.read<uint16_t>();
b = clustering_key_view::from_bytes(in.read_view(len));
}
template<>
clustering_key_view db::serializer<clustering_key_view>::read(input& in) {
auto len = in.read<uint16_t>();
return clustering_key_view::from_bytes(in.read_view(len));
}
template<>
db::serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<clustering_key_prefix_view>::write(output& out, const clustering_key_prefix_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view& b, input& in) {
auto len = in.read<uint16_t>();
b = clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
clustering_key_prefix_view db::serializer<clustering_key_prefix_view>::read(input& in) {
auto len = in.read<uint16_t>();
return clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
db::serializer<frozen_mutation>::serializer(const frozen_mutation& mutation)
: _item(mutation), _size(sizeof(uint32_t) /* size */ + mutation.representation().size()) {
}
template<>
void db::serializer<frozen_mutation>::write(output& out, const frozen_mutation& mutation) {
bytes_view v = mutation.representation();
out.write(v);
}
template<>
void db::serializer<frozen_mutation>::read(frozen_mutation& m, input& in) {
m = read(in);
}
template<>
frozen_mutation db::serializer<frozen_mutation>::read(input& in) {
return frozen_mutation(bytes_serializer::read(in));
}
template<>
db::serializer<db::replay_position>::serializer(const db::replay_position& rp)
: _item(rp), _size(sizeof(uint64_t) * 2) {
@@ -280,8 +191,4 @@ template class db::serializer<sstring> ;
template class db::serializer<atomic_cell_view> ;
template class db::serializer<collection_mutation_view> ;
template class db::serializer<utils::UUID> ;
template class db::serializer<partition_key_view> ;
template class db::serializer<clustering_key_view> ;
template class db::serializer<clustering_key_prefix_view> ;
template class db::serializer<frozen_mutation> ;
template class db::serializer<db::replay_position> ;


@@ -22,14 +22,13 @@
#ifndef DB_SERIALIZER_HH_
#define DB_SERIALIZER_HH_
#include <experimental/optional>
#include "utils/data_input.hh"
#include "utils/data_output.hh"
#include "bytes_ostream.hh"
#include "bytes.hh"
#include "mutation.hh"
#include "keys.hh"
#include "database_fwd.hh"
#include "frozen_mutation.hh"
#include "db/commitlog/replay_position.hh"
namespace db {
@@ -58,9 +57,9 @@ public:
return *this;
}
static void write(output&, const T&);
static void read(T&, input&);
static T read(input&);
static void write(output&, const type&);
static void read(type&, input&);
static type read(input&);
static void skip(input& in);
size_t size() const {
@@ -76,11 +75,100 @@ public:
void write(data_output& out) const {
write(out, _item);
}
bytes to_bytes() const {
bytes b(bytes::initialized_later(), _size);
data_output out(b);
write(out);
return b;
}
static type from_bytes(bytes_view v) {
data_input in(v);
return read(in);
}
private:
const T& _item;
const type& _item;
size_t _size;
};
template<typename T>
class serializer<std::experimental::optional<T>> {
public:
typedef std::experimental::optional<T> type;
typedef data_output output;
typedef data_input input;
typedef serializer<T> _MyType;
serializer(const type& t)
: _item(t)
, _size(output::serialized_size<bool>() + (t ? serializer<T>(*t).size() : 0))
{}
// apply to memory, must be at least size() large.
const _MyType& operator()(output& out) const {
write(out, _item);
return *this;
}
static void write(output& out, const type& v) {
bool en = v;
out.write<bool>(en);
if (en) {
serializer<T>::write(out, *v);
}
}
static void read(type& dst, input& in) {
auto en = in.read<bool>();
if (en) {
dst = serializer<T>::read(in);
} else {
dst = {};
}
}
static type read(input& in) {
type t;
read(t, in);
return t;
}
static void skip(input& in) {
auto en = in.read<bool>();
if (en) {
serializer<T>::skip(in);
}
}
size_t size() const {
return _size;
}
void write(bytes_ostream& out) const {
auto buf = out.write_place_holder(_size);
data_output data_out((char*)buf, _size);
write(data_out, _item);
}
void write(data_output& out) const {
write(out, _item);
}
bytes to_bytes() const {
bytes b(bytes::initialized_later(), _size);
data_output out(b);
write(out);
return b;
}
static type from_bytes(bytes_view v) {
data_input in(v);
return read(in);
}
private:
const std::experimental::optional<T> _item;
size_t _size;
};
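The optional<T> specialization above frames the payload with a leading bool presence flag: write the flag, then the wrapped value only when engaged. A standalone sketch of the same framing for an optional int32 (hypothetical helper names, `std::optional` and raw byte vectors instead of the serializer machinery; host byte order assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Encode: one presence byte, then the 4-byte payload if present.
std::vector<uint8_t> write_opt_i32(const std::optional<int32_t>& v) {
    std::vector<uint8_t> out;
    out.push_back(v.has_value() ? 1 : 0);
    if (v) {
        uint8_t buf[4];
        std::memcpy(buf, &*v, 4);
        out.insert(out.end(), buf, buf + 4);
    }
    return out;
}

// Decode: read the flag; if set, read the payload, else return nullopt.
std::optional<int32_t> read_opt_i32(const std::vector<uint8_t>& in) {
    if (in.at(0) == 0) {
        return std::nullopt;
    }
    int32_t v;
    std::memcpy(&v, in.data() + 1, 4);
    return v;
}
```

An absent value thus costs exactly one byte on the wire, matching the `serialized_size<bool>() + (t ? ... : 0)` size computation above.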
template<> serializer<utils::UUID>::serializer(const utils::UUID &);
template<> void serializer<utils::UUID>::write(output&, const type&);
template<> void serializer<utils::UUID>::read(utils::UUID&, input&);
@@ -90,6 +178,7 @@ template<> utils::UUID serializer<utils::UUID>::read(input&);
template<> serializer<bytes>::serializer(const bytes &);
template<> void serializer<bytes>::write(output&, const type&);
template<> void serializer<bytes>::read(bytes&, input&);
template<> void serializer<bytes>::skip(input&);
template<> serializer<bytes_view>::serializer(const bytes_view&);
template<> void serializer<bytes_view>::write(output&, const type&);
@@ -99,6 +188,7 @@ template<> bytes_view serializer<bytes_view>::read(input&);
template<> serializer<sstring>::serializer(const sstring&);
template<> void serializer<sstring>::write(output&, const type&);
template<> void serializer<sstring>::read(sstring&, input&);
template<> void serializer<sstring>::skip(input&);
template<> serializer<tombstone>::serializer(const tombstone &);
template<> void serializer<tombstone>::write(output&, const type&);
@@ -113,27 +203,6 @@ template<> serializer<collection_mutation_view>::serializer(const collection_mut
template<> void serializer<collection_mutation_view>::write(output&, const type&);
template<> void serializer<collection_mutation_view>::read(collection_mutation_view&, input&);
template<> serializer<frozen_mutation>::serializer(const frozen_mutation &);
template<> void serializer<frozen_mutation>::write(output&, const type&);
template<> void serializer<frozen_mutation>::read(frozen_mutation&, input&);
template<> frozen_mutation serializer<frozen_mutation>::read(input&);
template<> serializer<partition_key_view>::serializer(const partition_key_view &);
template<> void serializer<partition_key_view>::write(output&, const partition_key_view&);
template<> void serializer<partition_key_view>::read(partition_key_view&, input&);
template<> partition_key_view serializer<partition_key_view>::read(input&);
template<> void serializer<partition_key_view>::skip(input&);
template<> serializer<clustering_key_view>::serializer(const clustering_key_view &);
template<> void serializer<clustering_key_view>::write(output&, const clustering_key_view&);
template<> void serializer<clustering_key_view>::read(clustering_key_view&, input&);
template<> clustering_key_view serializer<clustering_key_view>::read(input&);
template<> serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view &);
template<> void serializer<clustering_key_prefix_view>::write(output&, const clustering_key_prefix_view&);
template<> void serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view&, input&);
template<> clustering_key_prefix_view serializer<clustering_key_prefix_view>::read(input&);
template<> serializer<db::replay_position>::serializer(const db::replay_position&);
template<> void serializer<db::replay_position>::write(output&, const db::replay_position&);
template<> void serializer<db::replay_position>::read(db::replay_position&, input&);
@@ -150,9 +219,6 @@ extern template class serializer<bytes>;
extern template class serializer<bytes_view>;
extern template class serializer<sstring>;
extern template class serializer<utils::UUID>;
extern template class serializer<partition_key_view>;
extern template class serializer<clustering_key_view>;
extern template class serializer<clustering_key_prefix_view>;
extern template class serializer<db::replay_position>;
typedef serializer<tombstone> tombstone_serializer;
@@ -162,10 +228,6 @@ typedef serializer<sstring> sstring_serializer;
typedef serializer<atomic_cell_view> atomic_cell_view_serializer;
typedef serializer<collection_mutation_view> collection_mutation_view_serializer;
typedef serializer<utils::UUID> uuid_serializer;
typedef serializer<partition_key_view> partition_key_view_serializer;
typedef serializer<clustering_key_view> clustering_key_view_serializer;
typedef serializer<clustering_key_prefix_view> clustering_key_prefix_view_serializer;
typedef serializer<frozen_mutation> frozen_mutation_serializer;
typedef serializer<db::replay_position> replay_position_serializer;
}


@@ -63,6 +63,8 @@
#include "partition_slice_builder.hh"
#include "db/config.hh"
#include "schema_builder.hh"
#include "md5_hasher.hh"
#include "release.hh"
#include <core/enum.hh>
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
@@ -73,6 +75,23 @@ std::unique_ptr<query_context> qctx = {};
namespace system_keyspace {
static const api::timestamp_type creation_timestamp = api::new_timestamp();
api::timestamp_type schema_creation_timestamp() {
return creation_timestamp;
}
// Increase whenever changing schema of any system table.
// FIXME: Make automatic by calculating from schema structure.
static const uint16_t version_sequence_number = 1;
table_schema_version generate_schema_version(utils::UUID table_id) {
md5_hasher h;
feed_hash(h, table_id);
feed_hash(h, version_sequence_number);
return utils::UUID_gen::get_name_UUID(h.finalize());
}
// Currently, the type variables (uuid_type, etc.) are thread-local reference-
// counted shared pointers. This forces us to make the built-in schemas
// below thread-local as well.
@@ -101,6 +120,7 @@ schema_ptr hints() {
)));
builder.set_gc_grace_seconds(0);
builder.set_compaction_strategy_options({{ "enabled", "false" }});
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return hints;
@@ -126,6 +146,7 @@ schema_ptr batchlog() {
// .compactionStrategyOptions(Collections.singletonMap("min_threshold", "2"))
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return batchlog;
@@ -150,6 +171,7 @@ schema_ptr batchlog() {
// operations on resulting CFMetaData:
// .compactionStrategyClass(LeveledCompactionStrategy.class);
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return paxos;
@@ -171,6 +193,7 @@ schema_ptr built_indexes() {
// comment
"built column indexes"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return built_indexes;
@@ -212,6 +235,7 @@ schema_ptr built_indexes() {
// comment
"information about the local node"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return local;
@@ -242,6 +266,7 @@ schema_ptr built_indexes() {
// comment
"information about known peers in the cluster"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peers;
@@ -265,6 +290,7 @@ schema_ptr built_indexes() {
// comment
"events related to peers"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peer_events;
@@ -286,6 +312,7 @@ schema_ptr built_indexes() {
// comment
"ranges requested for transfer"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return range_xfers;
@@ -311,6 +338,7 @@ schema_ptr built_indexes() {
// comment
"unfinished compactions"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compactions_in_progress;
@@ -340,6 +368,7 @@ schema_ptr built_indexes() {
"week-long compaction history"
)));
builder.set_default_time_to_live(std::chrono::duration_cast<std::chrono::seconds>(days(7)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compaction_history;
@@ -368,6 +397,7 @@ schema_ptr built_indexes() {
// comment
"historic sstable read rates"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return sstable_activity;
@@ -393,6 +423,7 @@ schema_ptr size_estimates() {
"per-table primary range size estimates"
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return size_estimates;
@@ -464,7 +495,8 @@ static future<> build_bootstrap_info() {
static auto state_map = std::unordered_map<sstring, bootstrap_state>({
{ "NEEDS_BOOTSTRAP", bootstrap_state::NEEDS_BOOTSTRAP },
{ "COMPLETED", bootstrap_state::COMPLETED },
{ "IN_PROGRESS", bootstrap_state::IN_PROGRESS }
{ "IN_PROGRESS", bootstrap_state::IN_PROGRESS },
{ "DECOMMISSIONED", bootstrap_state::DECOMMISSIONED }
});
bootstrap_state state = bootstrap_state::NEEDS_BOOTSTRAP;
@@ -485,6 +517,10 @@ future<> init_local_cache() {
});
}
future<> deinit_local_cache() {
return _local_cache.stop();
}
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp) {
qctx = std::make_unique<query_context>(db, qp);
}
@@ -508,7 +544,6 @@ future<> setup(distributed<database>& db, distributed<cql3::query_processor>& qp
return ms.init_local_preferred_ip_cache();
});
});
return make_ready_future<>();
}
typedef std::pair<replay_positions, db_clock::time_point> truncation_entry;
@@ -796,6 +831,8 @@ future<> remove_endpoint(gms::inet_address ep) {
}).then([ep] {
sstring req = "DELETE FROM system.%s WHERE peer = ?";
return execute_cql(req, PEERS, ep.addr()).discard_result();
}).then([] {
return force_blocking_flush(PEERS);
});
}
@@ -874,6 +911,10 @@ bool bootstrap_in_progress() {
return get_bootstrap_state() == bootstrap_state::IN_PROGRESS;
}
bool was_decommissioned() {
return get_bootstrap_state() == bootstrap_state::DECOMMISSIONED;
}
bootstrap_state get_bootstrap_state() {
return _local_cache.local()._state;
}
@@ -882,7 +923,8 @@ future<> set_bootstrap_state(bootstrap_state state) {
static std::unordered_map<bootstrap_state, sstring, enum_hash<bootstrap_state>> state_to_name({
{ bootstrap_state::NEEDS_BOOTSTRAP, "NEEDS_BOOTSTRAP" },
{ bootstrap_state::COMPLETED, "COMPLETED" },
{ bootstrap_state::IN_PROGRESS, "IN_PROGRESS" }
{ bootstrap_state::IN_PROGRESS, "IN_PROGRESS" },
{ bootstrap_state::DECOMMISSIONED, "DECOMMISSIONED" }
});
sstring state_name = state_to_name.at(state);
@@ -973,8 +1015,9 @@ query_mutations(distributed<service::storage_proxy>& proxy, const sstring& cf_na
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(cmd, query::full_partition_range);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(std::move(schema), std::move(cmd), query::full_partition_range);
}
future<lw_shared_ptr<query::result_set>>
@@ -982,7 +1025,8 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name) {
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query(schema, cmd, {query::full_partition_range}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});
@@ -996,11 +1040,61 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name, const
auto slice = partition_slice_builder(*schema)
.with_range(std::move(row_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), query::max_rows);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return proxy.local().query(schema, cmd, {query::partition_range::make_singular(key)}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});
}
static map_type_impl::native_type prepare_rows_merged(std::unordered_map<int32_t, int64_t>& rows_merged) {
map_type_impl::native_type tmp;
for (auto& r: rows_merged) {
int32_t first = r.first;
int64_t second = r.second;
auto map_element = std::make_pair<data_value, data_value>(data_value(first), data_value(second));
tmp.push_back(std::move(map_element));
}
return tmp;
}
future<> update_compaction_history(sstring ksname, sstring cfname, int64_t compacted_at, int64_t bytes_in, int64_t bytes_out,
std::unordered_map<int32_t, int64_t> rows_merged)
{
// don't write anything when the history table itself is compacted, since that would in turn cause new compactions
if (ksname == "system" && cfname == COMPACTION_HISTORY) {
return make_ready_future<>();
}
auto map_type = map_type_impl::get_instance(int32_type, long_type, true);
sstring req = "INSERT INTO system.%s (id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out, rows_merged) VALUES (?, ?, ?, ?, ?, ?, ?)";
return execute_cql(req, COMPACTION_HISTORY, utils::UUID_gen::get_time_UUID(), ksname, cfname, compacted_at, bytes_in, bytes_out,
make_map_value(map_type, prepare_rows_merged(rows_merged))).discard_result();
}
future<std::vector<compaction_history_entry>> get_compaction_history()
{
sstring req = "SELECT * from system.%s";
return execute_cql(req, COMPACTION_HISTORY).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
std::vector<compaction_history_entry> history;
for (auto& row : *msg) {
compaction_history_entry entry;
entry.id = row.get_as<utils::UUID>("id");
entry.ks = row.get_as<sstring>("keyspace_name");
entry.cf = row.get_as<sstring>("columnfamily_name");
entry.compacted_at = row.get_as<int64_t>("compacted_at");
entry.bytes_in = row.get_as<int64_t>("bytes_in");
entry.bytes_out = row.get_as<int64_t>("bytes_out");
if (row.has("rows_merged")) {
entry.rows_merged = row.get_map<int32_t, int64_t>("rows_merged");
}
history.push_back(std::move(entry));
}
return std::move(history);
});
}
} // namespace system_keyspace
} // namespace db


@@ -84,10 +84,13 @@ extern schema_ptr hints();
extern schema_ptr batchlog();
extern schema_ptr built_indexes(); // TODO (from Cassandra): make private
table_schema_version generate_schema_version(utils::UUID table_id);
// Only for testing.
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp);
future<> init_local_cache();
future<> deinit_local_cache();
future<> setup(distributed<database>& db, distributed<cql3::query_processor>& qp);
future<> update_schema_version(utils::UUID version);
future<> update_tokens(std::unordered_set<dht::token> tokens);
@@ -153,7 +156,8 @@ load_dc_rack_info();
enum class bootstrap_state {
NEEDS_BOOTSTRAP,
COMPLETED,
IN_PROGRESS
IN_PROGRESS,
DECOMMISSIONED
};
#if 0
@@ -258,26 +262,28 @@ enum class bootstrap_state {
compactionLog.truncateBlocking();
}
public static void updateCompactionHistory(String ksname,
String cfname,
long compactedAt,
long bytesIn,
long bytesOut,
Map<Integer, Long> rowsMerged)
{
// don't write anything when the history table itself is compacted, since that would in turn cause new compactions
if (ksname.equals("system") && cfname.equals(COMPACTION_HISTORY))
return;
String req = "INSERT INTO system.%s (id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out, rows_merged) VALUES (?, ?, ?, ?, ?, ?, ?)";
executeInternal(String.format(req, COMPACTION_HISTORY), UUIDGen.getTimeUUID(), ksname, cfname, ByteBufferUtil.bytes(compactedAt), bytesIn, bytesOut, rowsMerged);
}
public static TabularData getCompactionHistory() throws OpenDataException
{
UntypedResultSet queryResultSet = executeInternal(String.format("SELECT * from system.%s", COMPACTION_HISTORY));
return CompactionHistoryTabularData.from(queryResultSet);
}
#endif
struct compaction_history_entry {
utils::UUID id;
sstring ks;
sstring cf;
int64_t compacted_at = 0;
int64_t bytes_in = 0;
int64_t bytes_out = 0;
// Key: number of rows merged
// Value: counter
std::unordered_map<int32_t, int64_t> rows_merged;
};
future<> update_compaction_history(sstring ksname, sstring cfname, int64_t compacted_at, int64_t bytes_in, int64_t bytes_out,
std::unordered_map<int32_t, int64_t> rows_merged);
future<std::vector<compaction_history_entry>> get_compaction_history();
typedef std::vector<db::replay_position> replay_positions;
future<> save_truncation_record(const column_family&, db_clock::time_point truncated_at, db::replay_position);
@@ -519,6 +525,7 @@ enum class bootstrap_state {
bool bootstrap_complete();
bool bootstrap_in_progress();
bootstrap_state get_bootstrap_state();
bool was_decommissioned();
future<> set_bootstrap_state(bootstrap_state state);
#if 0
@@ -668,5 +675,7 @@ future<> set_bootstrap_state(bootstrap_state state);
executeInternal(String.format(cql, SSTABLE_ACTIVITY), keyspace, table, generation);
}
#endif
api::timestamp_type schema_creation_timestamp();
} // namespace system_keyspace
} // namespace db


@@ -34,12 +34,12 @@ token byte_ordered_partitioner::get_random_token()
std::map<token, float> byte_ordered_partitioner::describe_ownership(const std::vector<token>& sorted_tokens)
{
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
token byte_ordered_partitioner::midpoint(const token& t1, const token& t2) const
{
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
unsigned


@@ -386,12 +386,22 @@ public:
friend std::ostream& operator<<(std::ostream&, const ring_position&);
};
// Trichotomic comparator for ring_position
struct ring_position_comparator {
const schema& s;
ring_position_comparator(const schema& s_) : s(s_) {}
int operator()(const ring_position& lh, const ring_position& rh) const;
};
// "less" comparator for ring_position
struct ring_position_less_comparator {
const schema& s;
ring_position_less_comparator(const schema& s_) : s(s_) {}
bool operator()(const ring_position& lh, const ring_position& rh) const {
return lh.less_compare(s, rh);
}
};
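A trichotomic comparator returns negative/zero/positive, and a "less" comparator can always be derived from it mechanically, which is essentially what ring_position_less_comparator does via less_compare(). A generic sketch under assumed names (`tri_cmp` and `less_from_tri` are hypothetical):

```cpp
#include <cassert>
#include <string>

// A trichotomic comparator: <0, 0, or >0, like ring_position_comparator.
struct tri_cmp {
    int operator()(const std::string& a, const std::string& b) const {
        return a.compare(b);
    }
};

// Adapt any trichotomic comparator into a strict-weak-ordering "less"
// predicate usable with std::sort, std::map, etc.
template <typename Tri>
struct less_from_tri {
    Tri tri;
    template <typename T>
    bool operator()(const T& a, const T& b) const {
        return tri(a, b) < 0;
    }
};
```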
struct token_comparator {
// Return values are those of a trichotomic comparison.
int operator()(const token& t1, const token& t2) const;


@@ -88,18 +88,8 @@ inline int64_t long_token(const token& t) {
return net::ntoh(*lp);
}
// XXX: Technically, this should be inside long_token(). However, long_token is
// used quite a lot in hot paths, so it is better to keep the branches off if
// we can. Most of our comparators will check for _kind separately,
// so this should be fine.
sstring murmur3_partitioner::to_sstring(const token& t) const {
int64_t lt;
if (t._kind == dht::token::kind::before_all_keys) {
lt = std::numeric_limits<long>::min();
} else {
lt = long_token(t);
}
return ::to_sstring(lt);
return ::to_sstring(long_token(t));
}
dht::token murmur3_partitioner::from_sstring(const sstring& t) const {
@@ -122,17 +112,35 @@ int murmur3_partitioner::tri_compare(const token& t1, const token& t2) {
}
}
// Assuming that x>=y, return the positive difference x-y.
// The return type is an unsigned type, as the difference may overflow
// a signed type (e.g., consider very positive x and very negative y).
template <typename T>
static std::make_unsigned_t<T> positive_subtract(T x, T y) {
return std::make_unsigned_t<T>(x) - std::make_unsigned_t<T>(y);
}
token murmur3_partitioner::midpoint(const token& t1, const token& t2) const {
auto l1 = long_token(t1);
auto l2 = long_token(t2);
// long_token is defined as signed, but the arithmetic works out the same
// without invoking undefined behavior with a signed type.
auto delta = (uint64_t(l2) - uint64_t(l1)) / 2;
if (l1 > l2) {
// wraparound
delta += 0x8000'0000'0000'0000;
int64_t mid;
if (l1 <= l2) {
// To find the midpoint, we cannot use the trivial formula (l1+l2)/2
// because the addition can overflow the integer. To avoid this
// overflow, we first notice that the above formula is equivalent to
// l1 + (l2-l1)/2. Now, "l2-l1" can still overflow a signed integer
// (e.g., think of a very positive l2 and very negative l1), but
// because l1 <= l2 in this branch, we note that l2-l1 is positive
// and fits an *unsigned* int's range. So,
mid = l1 + positive_subtract(l2, l1)/2;
} else {
// When l2 < l1, we need to switch l1 and l2 in the above
// formula, because now l1 - l2 is positive.
// Additionally, we consider this case a "wrap around", so we need
// to behave as if l2 + 2^64 were meant instead of l2, i.e., add 2^63
// to the average.
mid = l2 + positive_subtract(l1, l2)/2 + 0x8000'0000'0000'0000;
}
auto mid = uint64_t(l1) + delta;
return get_token(mid);
}


@@ -45,6 +45,7 @@
#include "log.hh"
#include "streaming/stream_plan.hh"
#include "streaming/stream_state.hh"
#include "service/storage_service.hh"
namespace dht {
@@ -109,7 +110,6 @@ range_streamer::get_all_ranges_with_sources_for(const sstring& keyspace_name, st
auto& ks = _db.local().find_keyspace(keyspace_name);
auto& strat = ks.get_replication_strategy();
// std::unordered_multimap<range<token>, inet_address>
auto tm = _metadata.clone_only_token_map();
auto range_addresses = unordered_multimap_to_unordered_map(strat.get_range_addresses(tm));
@@ -205,9 +205,7 @@ range_streamer::get_all_ranges_with_strict_sources_for(const sstring& keyspace_n
bool range_streamer::use_strict_sources_for_ranges(const sstring& keyspace_name) {
auto& ks = _db.local().find_keyspace(keyspace_name);
auto& strat = ks.get_replication_strategy();
// FIXME: DatabaseDescriptor.isReplacing()
auto is_replacing = false;
return !is_replacing
return !_db.local().is_replacing()
&& use_strict_consistency()
&& !_tokens.empty()
&& _metadata.get_all_endpoints().size() != strat.get_replication_factor();
@@ -224,25 +222,17 @@ void range_streamer::add_ranges(const sstring& keyspace_name, std::vector<range<
}
}
// TODO: share code with unordered_multimap_to_unordered_map
std::unordered_map<inet_address, std::vector<range<token>>> tmp;
std::unordered_map<inet_address, std::vector<range<token>>> range_fetch_map;
for (auto& x : get_range_fetch_map(ranges_for_keyspace, _source_filters, keyspace_name)) {
auto& addr = x.first;
auto& range_ = x.second;
auto it = tmp.find(addr);
if (it != tmp.end()) {
it->second.push_back(range_);
} else {
tmp.emplace(addr, std::vector<range<token>>{range_});
}
range_fetch_map[x.first].emplace_back(x.second);
}
if (logger.is_enabled(logging::log_level::debug)) {
for (auto& x : tmp) {
for (auto& x : range_fetch_map) {
logger.debug("{} : range {} from source {} for keyspace {}", _description, x.second, x.first, keyspace_name);
}
}
_to_fetch.emplace(keyspace_name, std::move(tmp));
_to_fetch.emplace(keyspace_name, std::move(range_fetch_map));
}
future<streaming::stream_state> range_streamer::fetch_async() {
@@ -251,12 +241,11 @@ future<streaming::stream_state> range_streamer::fetch_async() {
for (auto& x : fetch.second) {
auto& source = x.first;
auto& ranges = x.second;
auto preferred = net::get_local_messaging_service().get_preferred_ip(source);
/* Send messages to respective folks to stream data over to me */
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("{}ing from {} ranges {}", _description, source, ranges);
}
_stream_plan.request_ranges(source, preferred, keyspace, ranges);
_stream_plan.request_ranges(source, keyspace, ranges);
}
}
@@ -272,4 +261,8 @@ range_streamer::get_work_map(const std::unordered_multimap<range<token>, inet_ad
return get_range_fetch_map(ranges_with_source_target, source_filters, keyspace);
}
bool range_streamer::use_strict_consistency() {
return service::get_local_storage_service().db().local().get_config().consistent_rangemovement();
}
} // dht


@@ -62,10 +62,7 @@ public:
using stream_plan = streaming::stream_plan;
using stream_state = streaming::stream_state;
using i_failure_detector = gms::i_failure_detector;
static bool use_strict_consistency() {
//FIXME: Boolean.parseBoolean(System.getProperty("cassandra.consistent.rangemovement","true"));
return true;
}
static bool use_strict_consistency();
public:
/**
* A filter applied to sources to stream from when constructing a fetch map.

dist/ami/build_ami.sh vendored

@@ -5,6 +5,23 @@ if [ ! -e dist/ami/build_ami.sh ]; then
exit 1
fi
print_usage() {
echo "build_ami.sh -l"
echo " -l deploy locally built rpms"
exit 1
}
LOCALRPM=0
while getopts lh OPT; do
case "$OPT" in
"l")
LOCALRPM=1
;;
"h")
print_usage
;;
esac
done
cd dist/ami
if [ ! -f variables.json ]; then
@@ -20,4 +37,12 @@ if [ ! -d packer ]; then
cd -
fi
if [ $LOCALRPM = 0 ]; then
echo "sudo sh -x -e /home/centos/scylla_install_pkg; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
else
echo "sudo sh -x -e /home/centos/scylla_install_pkg -l /home/centos; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
fi
chmod a+rx scylla_deploy.sh
packer/packer build -var-file=variables.json scylla.json

dist/ami/build_ami_local.sh vendored Executable file

@@ -0,0 +1,31 @@
#!/bin/sh -e
if [ ! -e dist/ami/build_ami_local.sh ]; then
echo "run build_ami_local.sh in top of scylla dir"
exit 1
fi
rm -rf build/*
sudo yum -y install git
if [ ! -f dist/ami/files/scylla-server.x86_64.rpm ]; then
dist/redhat/build_rpm.sh
cp build/rpmbuild/RPMS/x86_64/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
fi
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
sh -x -e dist/redhat/build_rpm.sh $*
cd ../..
cp build/scylla-jmx/build/rpmbuild/RPMS/noarch/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/files/scylla-tools.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
sh -x -e dist/redhat/build_rpm.sh
cd ../..
cp build/scylla-tools-java/build/rpmbuild/RPMS/noarch/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools.noarch.rpm
fi
exec dist/ami/build_ami.sh -l

dist/ami/files/.bash_profile vendored Normal file

@@ -0,0 +1,47 @@
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
echo
echo ' _____ _ _ _____ ____ '
echo ' / ____| | | | | __ \| _ \ '
echo ' | (___ ___ _ _| | | __ _| | | | |_) |'
echo ' \___ \ / __| | | | | |/ _` | | | | _ < '
echo ' ____) | (__| |_| | | | (_| | |__| | |_) |'
echo ' |_____/ \___|\__, |_|_|\__,_|_____/|____/ '
echo ' __/ | '
echo ' |___/ '
echo ''
echo ''
echo 'Nodetool:'
echo ' nodetool help'
echo 'CQL Shell:'
echo ' cqlsh'
echo 'More documentation available at: '
echo ' http://www.scylladb.com/doc/'
echo
if [ "`systemctl is-active scylla-server`" = "active" ]; then
tput setaf 4
tput bold
echo " ScyllaDB is active."
tput sgr0
echo
else
tput setaf 1
tput bold
echo " ScyllaDB is not started!"
tput sgr0
echo "Please wait for startup. To see status of ScyllaDB, run "
echo " 'systemctl status scylla-server'"
echo
fi


@@ -1,5 +0,0 @@
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=16G
ExternalSizeMax=16G

Some files were not shown because too many files have changed in this diff.