storage_proxy: do not touch all_replicas.front() if it's empty.

The list of all endpoints for a query can be empty if we have replication_factor 0 or there are no live endpoints for this token. Do not access all_replicas.front() in this case.

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>
(cherry picked from commit 9827efe554)
cql transport: do not log broken pipe error when a client closes its side of a connection abruptly
2020-06-22 18:29:15 +03:00 · 2020-06-21 13:09:22 +03:00 · 2020-06-21 13:07:21 +03:00 · 2020-06-21 13:03:05 +03:00 · 2020-06-21 12:57:48 +03:00 · 2020-06-21 12:47:05 +03:00
2786 changed files with 47730 additions and 11849 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -19,3 +19,8 @@ CMakeLists.txt.user
 __pycache__CMakeLists.txt.user
 .gdbinit
 resources
+.pytest_cache
+/expressions.tokens
+tags
+testlog/*
+test/*/*.reject
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "seastar"]
 	path = seastar
-	url = ../seastar
+	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -97,7 +97,7 @@ scan_scylla_source_directories(
          service
          sstables
          streaming
-          tests
+          test
          thrift
          tracing
          transport
--- a/HACKING.md
+++ b/HACKING.md
@@ -56,7 +56,7 @@ $ ./configure.py --help

 The most important option is:

- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
+- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.

 Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5,8 +5,6 @@ F: Filename, directory, or pattern for the subsystem
 ---

 AUTH
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 R: Vlad Zolotarov <vladz@scylladb.com>
 R: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
@@ -14,22 +12,17 @@ F: auth/*

 CACHE
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
 R: Piotr Jastrzebski <piotr@scylladb.com>
 F: row_cache*
 F: *mutation*
 F: tests/mvcc*

 COMMITLOG / BATCHLOG
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Calle Wilund <calle@scylladb.com>
 F: db/commitlog/*
 F: db/batch*

 COORDINATOR
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Gleb Natapov <gleb@scylladb.com>
 F: service/storage_proxy*

@@ -49,12 +42,10 @@ M: Pekka Enberg <penberg@scylladb.com>
 F: cql3/*

 COUNTERS
-M: Paweł Dziepak <pdziepak@scylladb.com>
 F: counters*
 F: tests/counter_test*

 GOSSIP
-M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: gms/*
@@ -65,14 +56,11 @@ F: dist/docker/*

 LSA
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
 F: utils/logalloc*

 MATERIALIZED VIEWS
-M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
-R: Nadav Har'El <nyh@scylladb.com>
-R: Duarte Nunes <duarte@scylladb.com>
+M: Nadav Har'El <nyh@scylladb.com>
 F: db/view/*
 F: cql3/statements/*view*

@@ -82,14 +70,12 @@ F: dist/*

 REPAIR
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
 F: repair/*

 SCHEMA MANAGEMENT
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 M: Pekka Enberg <penberg@scylladb.com>
 F: db/schema_tables*
 F: db/legacy_schema_migrator*
@@ -98,15 +84,13 @@ F: schema*

 SECONDARY INDEXES
 M: Pekka Enberg <penberg@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
-R: Nadav Har'El <nyh@scylladb.com>
+M: Nadav Har'El <nyh@scylladb.com>
 R: Pekka Enberg <penberg@scylladb.com>
 F: db/index/*
 F: cql3/statements/*index*

 SSTABLES
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Raphael S. Carvalho <raphaelsc@scylladb.com>
 R: Glauber Costa <glauber@scylladb.com>
 R: Nadav Har'El <nyh@scylladb.com>
@@ -114,18 +98,17 @@ F: sstables/*

 STREAMING
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 R: Asias He <asias@scylladb.com>
 F: streaming/*
 F: service/storage_service.*

-THRIFT TRANSPORT LAYER
-M: Duarte Nunes <duarte@scylladb.com>
-F: thrift/*
+ALTERNATOR
+M: Nadav Har'El <nyh@scylladb.com>
+F: alternator/*
+F: alternator-test/*

 THE REST
 M: Avi Kivity <avi@scylladb.com>
-M: Paweł Dziepak <pdziepak@scylladb.com>
-M: Duarte Nunes <duarte@scylladb.com>
 M: Tomasz Grabiec <tgrabiec@scylladb.com>
+M: Nadav Har'El <nyh@scylladb.com>
 F: *
--- a/README-DPDK.md
+++ b/README-DPDK.md
@@ -1,29 +0,0 @@
-Seastar and DPDK
-================
-
-Seastar uses the Data Plane Development Kit to drive NIC hardware directly.  This
-provides an enormous performance boost.
-
-To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
-run-time parameter.  This will use the DPDK package provided as a git submodule with the
-seastar sources.
-
-To use your own self-compiled DPDK package, follow this procedure:
-
-1. Setup host to compile DPDK:
-   - Ubuntu 
-     `sudo apt-get install -y build-essential linux-image-extra-$(uname -r)` 
-2. Prepare a DPDK SDK:
-   - Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
-   - Untar it.
-   - Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
-   - For DPDK 1.7.x: edit config/common_linuxapp: 
-     - Set CONFIG_RTE_LIBRTE_PMD_BOND  to 'n'.
-     - Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
-     - Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
-   - Start the tools/setup.sh script as root.
-   - Compile a linuxapp target (option 9).
-   - Install IGB_UIO module (option 11).
-   - Bind some physical port to IGB_UIO (option 17).
-   - Configure hugepage mappings (option 14/15).
-3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.
--- a/README.md
+++ b/README.md
@@ -27,10 +27,10 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev

 ```

-* run Scylla with one CPU and ./tmp as data directory
+* run Scylla with one CPU and ./tmp as work directory

 ```
-./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
+./build/release/scylla --workdir tmp --smp 1
 ```

 * For more run options:
@@ -38,6 +38,24 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
 ./build/release/scylla --help
 ```

+## Scylla APIs and compatibility
+By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
+Thrift. There is also experimental support for the API of Amazon DynamoDB,
+but being experimental it needs to be explicitly enabled to be used. For more
+information on how to enable the experimental DynamoDB compatibility in Scylla,
+and the current limitations of this feature, see
+[Alternator](docs/alternator/alternator.md) and
+[Getting started with Alternator](docs/alternator/getting-started.md).
+
+## Documentation
+
+Documentation can be found in [./docs](./docs) and on the
+[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
+definition of what goes where, so when looking for something be sure to check
+both.
+Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
+User documentation can be found [here](https://docs.scylladb.com/).
+
 ## Building Fedora RPM

 As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:
--- a/SCYLLA-VERSION-GEN
+++ b/SCYLLA-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh

 PRODUCT=scylla
-VERSION=666.development
+VERSION=3.3.4

 if test -f version
 then
--- a/alternator-test/README.md
+++ b/alternator-test/README.md
@@ -0,0 +1,78 @@
+Tests for Alternator that should also pass, identically, against DynamoDB.
+
+Tests use the boto3 library for the AWS API, and the pytest framework
+(both are available from Linux distributions, or with "pip install").
+
+To run all tests against the local installation of Alternator on
+http://localhost:8000, just run `pytest`.
+
+Some additional pytest options:
+* To run all tests in a single file, do `pytest test_table.py`.
+* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.
+* Additional useful pytest options, especially useful for debugging tests:
+  * -v: show the names of each individual test running instead of just dots.
+  * -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)
+
+Add the `--aws` option to test against AWS instead of the local installation.
+For example - `pytest --aws test_item.py` or `pytest --aws`.
+
+If you plan to run tests against AWS and not just a local Scylla installation,
+the file ~/.aws/credentials should be configured with your AWS key:
+
+```
+[default]
+aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
+aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+```
+
+and ~/.aws/config with the default region to use in the test:
+```
+[default]
+region = us-east-1
+```
+
+## HTTPS support
+
+To run tests with HTTPS, run pytest with the `--https` parameter. Note that the Scylla cluster needs to be provided
+with the alternator\_https\_port configuration option in order to initialize an HTTPS server.
+Moreover, running an instance of an HTTPS server requires a certificate. Here's how to easily generate
+a key and a self-signed certificate, which is sufficient to run `--https` tests:
+
+```
+openssl genrsa 2048 > scylla.key
+openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt
+```
+
+If this pair is put into `conf/` directory, it will be enough
+to allow the alternator HTTPS server to think it's been authorized and properly certified.
+Still, boto3 library issues warnings that the certificate used for communication is self-signed,
+and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
+
+
+## Authorization
+
+By default, boto3 prepares a properly signed Authorization header with every request.
+In order to confirm the authorization, the server recomputes the signature by using
+user credentials (user-provided username + a secret key known by the server),
+and then checks if it matches the signature from the header.
+Early alternator code did not verify signatures at all, which is also allowed by the protocol.
+A partial implementation of the authorization verification can be enabled by providing a Scylla
+configuration parameter:
+```yaml
+  alternator_enforce_authorization: true
+```
+The implementation is currently coupled with Scylla's system\_auth.roles table,
+which means that an additional step needs to be performed when setting up Scylla
+as the test environment. Tests will use the following credentials:
+Username: `alternator`
+Secret key: `secret_pass`
+
+With cqlsh, the credentials can be set up by executing this snippet:
+
+```bash
+cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"
+```
+
+Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`
+turned off. However, test cases from `test_authorization.py` may require this option to be turned on,
+so it's advised.
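
The Authorization section of the README above says the server recomputes the request signature from the user's secret key and compares it with the one in the header. A minimal sketch of that comparison follows, using the standard AWS Signature Version 4 key derivation. It is not taken from Alternator's code: the function names are hypothetical, and `string_to_sign` is a placeholder for the canonical string that a real implementation builds from the request.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    # Standard SigV4 derivation: each step keys the next HMAC.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(secret_key: str, date: str, region: str, service: str, string_to_sign: str) -> str:
    key = derive_signing_key(secret_key, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def signature_matches(secret_key: str, date: str, region: str, service: str,
                      string_to_sign: str, client_signature: str) -> bool:
    # Hypothetical server-side check: recompute and compare in constant time.
    expected = sign(secret_key, date, region, service, string_to_sign)
    return hmac.compare_digest(expected, client_signature)


# The client signs with the secret it holds; the server looks up its own copy.
client_sig = sign("secret_pass", "20200622", "us-east-1", "dynamodb", "placeholder")
print(signature_matches("secret_pass", "20200622", "us-east-1", "dynamodb",
                        "placeholder", client_sig))
print(signature_matches("wrong_key", "20200622", "us-east-1", "dynamodb",
                        "placeholder", client_sig))
```

The first call should report a match and the second a mismatch, which is the situation `test_wrong_password` exercises.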
--- a/alternator-test/conftest.py
+++ b/alternator-test/conftest.py
@@ -0,0 +1,179 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file contains "test fixtures", a pytest concept described in
+# https://docs.pytest.org/en/latest/fixture.html.
+# A "fixture" is some sort of setup which an individual test requires to run.
+# The fixture has setup code and teardown code, and if multiple tests
+# require the same fixture, it can be set up only once - while still allowing
+# the user to run individual tests and automatically set up the fixtures they need.
+
+import pytest
+import boto3
+from util import create_test_table
+
+# Test that the Boto libraries are new enough. These tests want to test a
+# large variety of DynamoDB API features, and to do this we need a new-enough
+# version of the Boto libraries (boto3 and botocore) so that they can
+# access all these API features.
+# In particular, the BillingMode feature was added in botocore 1.12.54.
+import botocore
+import sys
+from distutils.version import LooseVersion
+if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):
+    pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n    sudo pip{} install --upgrade boto3".format(sys.version_info[0]))
+
+# By default, tests run against a local Scylla installation on localhost:8000/.
+# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1
+# region.
+def pytest_addoption(parser):
+    parser.addoption("--aws", action="store_true",
+        help="run against AWS instead of a local Scylla installation")
+    parser.addoption("--https", action="store_true",
+        help="communicate via HTTPS protocol on port 8043 instead of HTTP when"
+            " running against a local Scylla installation")
+
+# "dynamodb" fixture: set up client object for communicating with the DynamoDB
+# API. Currently this chooses either Amazon's DynamoDB in the default region
+# or a local Alternator installation on http://localhost:8000 - depending on the
+# existence of the "--aws" option. In the future we should provide options
+# for choosing other Amazon regions or local installations.
+# We use scope="session" so that all tests will reuse the same client object.
+@pytest.fixture(scope="session")
+def dynamodb(request):
+    if request.config.getoption('aws'):
+        return boto3.resource('dynamodb')
+    else:
+        # Even though we connect to the local installation, Boto3 still
+        # requires us to specify dummy region and credential parameters,
+        # otherwise the user is forced to properly configure ~/.aws even
+        # for local runs.
+        local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'
+        # Disable verifying in order to be able to use self-signed TLS certificates
+        verify = not request.config.getoption('https')
+        # Silencing the 'Unverified HTTPS request warning'
+        if request.config.getoption('https'):
+            import urllib3
+            urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
+        return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,
+            region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
+
+# "test_table" fixture: Create and return a temporary table to be used in tests
+# that need a table to work on. The table is automatically deleted at the end.
+# We use scope="session" so that all tests will reuse the same table.
+# This "test_table" creates a table which has a specific key schema: both a
+# partition key and a sort key, and both are strings. Other fixtures (below)
+# can be used to create different types of tables.
+#
+# TODO: Although we are careful about deleting temporary tables when the
+# fixture is torn down, in some cases (e.g., interrupted tests) we can be left
+# with some tables not deleted, and they will never be deleted. Because all
+# our temporary tables have the same test_table_prefix, we can actually find
+# and remove these old tables with this prefix. We can have a fixture, which
+# test_table will require, which on teardown will delete all remaining tables
+# (possibly from an older run). Because the table's name includes the current
+# time, we can also remove just tables older than a particular age. Such
+# mechanism will allow running tests in parallel, without the risk of deleting
+# a parallel run's temporary tables.
+@pytest.fixture(scope="session")
+def test_table(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ])
+    yield table
+    # We get back here when this fixture is torn down. We ask Dynamo to delete
+    # this table, but not wait for the deletion to complete. The next time
+    # we create a test_table fixture, we'll choose a different table name
+    # anyway.
+    table.delete()
+
+# The following fixtures test_table_* are similar to test_table but create
+# tables with different key schemas.
+@pytest.fixture(scope="session")
+def test_table_s(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
+        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
+    yield table
+    table.delete()
+@pytest.fixture(scope="session")
+def test_table_b(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
+        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])
+    yield table
+    table.delete()
+@pytest.fixture(scope="session")
+def test_table_sb(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])
+    yield table
+    table.delete()
+@pytest.fixture(scope="session")
+def test_table_sn(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])
+    yield table
+    table.delete()
+
+# "filled_test_table" fixture:  Create a temporary table to be used in tests
+# that involve reading data - GetItem, Scan, etc. The table is filled with
+# 328 items - each consisting of a partition key, clustering key and two
+# string attributes. 164 of the items are in a single partition (with the
+# partition key 'long') and the 164 other items are each in a separate
+# partition. Finally, a 329th item is added with different attributes.
+# This table is supposed to be read from, not updated nor overwritten.
+# This fixture returns both a table object and the description of all items
+# inserted into it.
+@pytest.fixture(scope="session")
+def filled_test_table(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ])
+    count = 164
+    items = [{
+        'p': str(i),
+        'c': str(i),
+        'attribute': "x" * 7,
+        'another': "y" * 16
+    } for i in range(count)]
+    items = items + [{
+        'p': 'long',
+        'c': str(i),
+        'attribute': "x" * (1 + i % 7),
+        'another': "y" * (1 + i % 16)
+    } for i in range(count)]
+    items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})
+
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+
+    yield table, items
+    table.delete()
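
The item counts quoted in the `filled_test_table` comment can be checked without a database. The sketch below repeats the fixture's list construction in a standalone helper (the name `build_items` is hypothetical, not part of conftest.py) and counts the result.

```python
def build_items(count=164):
    # One item per partition: partition key and sort key are both str(i).
    items = [{
        'p': str(i),
        'c': str(i),
        'attribute': "x" * 7,
        'another': "y" * 16
    } for i in range(count)]
    # `count` more items, all in the single partition 'long'.
    items = items + [{
        'p': 'long',
        'c': str(i),
        'attribute': "x" * (1 + i % 7),
        'another': "y" * (1 + i % 16)
    } for i in range(count)]
    # One final item with different attributes.
    items.append({'p': 'hello', 'c': 'world',
                  'str': 'and now for something completely different'})
    return items


items = build_items()
long_partition = [item for item in items if item['p'] == 'long']
partitions = {item['p'] for item in items}
print(len(items), len(long_partition), len(partitions))  # prints: 329 164 166
```

The 166 distinct partitions are the 164 single-item ones plus 'long' and 'hello'.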
--- a/alternator-test/test_authorization.py
+++ b/alternator-test/test_authorization.py
@@ -0,0 +1,74 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for authorization
+
+import pytest
+import botocore
+from botocore.exceptions import ClientError
+import boto3
+import requests
+
+# Test that trying to perform an operation signed with a wrong key
+# will not succeed
+def test_wrong_key_access(request, dynamodb):
+    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
+    url = dynamodb.meta.client._endpoint.host
+    with pytest.raises(ClientError, match='UnrecognizedClientException'):
+        if url.endswith('.amazonaws.com'):
+            boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()
+        else:
+            verify = not url.startswith('https')
+            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()
+
+# A similar test, but this time the user is expected to exist in the database (for local tests)
+def test_wrong_password(request, dynamodb):
+    print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
+    url = dynamodb.meta.client._endpoint.host
+    with pytest.raises(ClientError, match='UnrecognizedClientException'):
+        if url.endswith('.amazonaws.com'):
+            boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()
+        else:
+            verify = not url.startswith('https')
+            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()
+
+# A test ensuring that expired signatures are not accepted
+def test_expired_signature(dynamodb, test_table):
+    url = dynamodb.meta.client._endpoint.host
+    print(url)
+    headers = {'Content-Type': 'application/x-amz-json-1.0',
+               'X-Amz-Date': '20170101T010101Z',
+               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
+               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
+    }
+    response = requests.post(url, headers=headers, verify=False)
+    assert not response.ok
+    assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
+
+# A test ensuring that signatures that exceed current time too much are not accepted.
+# Watch out - this test is valid only for around the next 1000 years; it needs to be updated later.
+def test_signature_too_futuristic(dynamodb, test_table):
+    url = dynamodb.meta.client._endpoint.host
+    print(url)
+    headers = {'Content-Type': 'application/x-amz-json-1.0',
+               'X-Amz-Date': '30200101T010101Z',
+               'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
+               'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
+    }
+    response = requests.post(url, headers=headers, verify=False)
+    assert not response.ok
+    assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text
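
The last two tests above rely on the server rejecting an `X-Amz-Date` that is too far from the current time. A minimal sketch of such a check is shown below. It is not Alternator's implementation: the 15-minute window and the function name `check_amz_date` are assumptions made for illustration, and only the two error strings are taken from the tests.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance; the real server-side window may differ.
ALLOWED_SKEW = timedelta(minutes=15)


def check_amz_date(amz_date: str, now: datetime) -> str:
    # X-Amz-Date uses the ISO 8601 basic format, e.g. 20170101T010101Z.
    ts = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    if ts < now - ALLOWED_SKEW:
        return "Signature expired"
    if ts > now + ALLOWED_SKEW:
        return "Signature not yet current"
    return "ok"


now = datetime(2020, 6, 22, tzinfo=timezone.utc)
print(check_amz_date("20170101T010101Z", now))
print(check_amz_date("30200101T010101Z", now))
print(check_amz_date("20200622T000500Z", now))
```

Passing `now` explicitly keeps the check deterministic, so it can be tested without depending on the wall clock.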
--- a/alternator-test/test_batch.py
+++ b/alternator-test/test_batch.py
@@ -0,0 +1,253 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for batch operations - BatchWriteItem, BatchGetItem.
+# Note that various other tests in other files also use these operations,
+# so they are actually tested by other tests as well.
+
+import pytest
+from botocore.exceptions import ClientError
+from util import random_string, full_scan, full_query, multiset
+
+# Test ensuring that items inserted by a batched statement can be properly extracted
+# via GetItem. Schema has both hash and sort keys.
+def test_basic_batch_write_item(test_table):
+    count = 7
+
+    with test_table.batch_writer() as batch:
+        for i in range(count):
+            batch.put_item(Item={
+                'p': "batch{}".format(i),
+                'c': "batch_ck{}".format(i),
+                'attribute': str(i),
+                'another': 'xyz'
+            })
+
+    for i in range(count):
+        item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']
+        assert item['p'] == "batch{}".format(i)
+        assert item['c'] == "batch_ck{}".format(i)
+        assert item['attribute'] == str(i)
+        assert item['another'] == 'xyz' 
+
+# Test batch write to a table with only a hash key
+def test_batch_write_hash_only(test_table_s):
+    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
+    with test_table_s.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    for item in items:
+        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
+
+# Test batch delete operation (DeleteRequest): We create a bunch of items, and
+# then delete them all.
+def test_batch_write_delete(test_table_s):
+    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
+    with test_table_s.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    for item in items:
+        assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
+    with test_table_s.batch_writer() as batch:
+        for item in items:
+            batch.delete_item(Key={'p': item['p']})
+    # Verify that all items are now missing:
+    for item in items:
+        assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)
+
+# Test the same batch including both writes and delete. Should be fine.
+def test_batch_write_and_delete(test_table_s):
+    p1 = random_string()
+    p2 = random_string()
+    test_table_s.put_item(Item={'p': p1})
+    assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
+    assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
+    with test_table_s.batch_writer() as batch:
+        batch.put_item({'p': p2})
+        batch.delete_item(Key={'p': p1})
+    assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
+    assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
+
+# It is forbidden to update the same key twice in the same batch.
+# DynamoDB says "Provided list of item keys contains duplicates".
+def test_batch_write_duplicate_write(test_table_s, test_table):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table_s.batch_writer() as batch:
+            batch.put_item({'p': p})
+            batch.put_item({'p': p})
+    c = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table.batch_writer() as batch:
+            batch.put_item({'p': p, 'c': c})
+            batch.put_item({'p': p, 'c': c})
+    # But it is fine to touch items with one component the same, but the other not.
+    other = random_string()
+    with test_table.batch_writer() as batch:
+        batch.put_item({'p': p, 'c': c})
+        batch.put_item({'p': p, 'c': other})
+        batch.put_item({'p': other, 'c': c})
+
+def test_batch_write_duplicate_delete(test_table_s, test_table):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table_s.batch_writer() as batch:
+            batch.delete_item(Key={'p': p})
+            batch.delete_item(Key={'p': p})
+    c = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table.batch_writer() as batch:
+            batch.delete_item(Key={'p': p, 'c': c})
+            batch.delete_item(Key={'p': p, 'c': c})
+    # But it is fine to touch items with one component the same, but the other not.
+    other = random_string()
+    with test_table.batch_writer() as batch:
+        batch.delete_item(Key={'p': p, 'c': c})
+        batch.delete_item(Key={'p': p, 'c': other})
+        batch.delete_item(Key={'p': other, 'c': c})
+
+def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table_s.batch_writer() as batch:
+            batch.delete_item(Key={'p': p})
+            batch.put_item({'p': p})
+    c = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*duplicates'):
+        with test_table.batch_writer() as batch:
+            batch.delete_item(Key={'p': p, 'c': c})
+            batch.put_item({'p': p, 'c': c})
+    # But it is fine to touch items with one component the same, but the other not.
+    other = random_string()
+    with test_table.batch_writer() as batch:
+        batch.delete_item(Key={'p': p, 'c': c})
+        batch.put_item({'p': p, 'c': other})
+        batch.put_item({'p': other, 'c': c})
+
+# Test that BatchWriteItem's PutRequest completely replaces an existing item.
+# It shouldn't merge it with a previously existing value. See also the same
+# test for PutItem - test_put_item_replace().
+def test_batch_put_item_replace(test_table_s, test_table):
+    p = random_string()
+    with test_table_s.batch_writer() as batch:
+        batch.put_item(Item={'p': p, 'a': 'hi'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
+    with test_table_s.batch_writer() as batch:
+        batch.put_item(Item={'p': p, 'b': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
+    c = random_string()
+    with test_table.batch_writer() as batch:
+        batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
+    with test_table.batch_writer() as batch:
+        batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
+
+# Test that if one of the batch's operations is invalid, because a key
+# column is missing or has the wrong type, the entire batch is rejected
+# before any write is done.
+def test_batch_write_invalid_operation(test_table_s):
+    # test key attribute with wrong type:
+    p1 = random_string()
+    p2 = random_string()
+    items = [{'p': p1}, {'p': 3}, {'p': p2}]
+    with pytest.raises(ClientError, match='ValidationException'):
+        with test_table_s.batch_writer() as batch:
+            for item in items:
+                batch.put_item(item)
+    for p in [p1, p2]:
+        assert 'Item' not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
+    # test missing key attribute:
+    p1 = random_string()
+    p2 = random_string()
+    items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]
+    with pytest.raises(ClientError, match='ValidationException'):
+        with test_table_s.batch_writer() as batch:
+            for item in items:
+                batch.put_item(item)
+    for p in [p1, p2]:
+        assert 'Item' not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
+
+# Basic test for BatchGetItem, reading several entire items.
+# Schema has both hash and sort keys.
+def test_batch_get_item(test_table):
+    items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    keys = [{k: x[k] for k in ('p', 'c')} for x in items]
+    # We use the low-level batch_get_item API for lack of a more convenient
+    # API. At least it spares us the need to encode the key's types...
+    reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})
+    print(reply)
+    got_items = reply['Responses'][test_table.name]
+    assert multiset(got_items) == multiset(items)
+
+# Same, but with a schema that has just a hash key.
+def test_batch_get_item_hash(test_table_s):
+    items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
+    with test_table_s.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    keys = [{k: x[k] for k in ('p',)} for x in items]
+    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})
+    got_items = reply['Responses'][test_table_s.name]
+    assert multiset(got_items) == multiset(items)
+
+# Test what we get if we try to read two *missing* values in addition to
+# an existing one. It turns out the missing items are simply not returned,
+# with no sign they are missing.
+def test_batch_get_item_missing(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p})
+    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})
+    got_items = reply['Responses'][test_table_s.name]
+    assert got_items == [{'p' : p}]
+
+# If all the keys requested from a particular table are missing, we still
+# get a response array for that table - it's just empty.
+def test_batch_get_item_completely_missing(test_table_s):
+    reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})
+    got_items = reply['Responses'][test_table_s.name]
+    assert got_items == []
+
+# Test BatchGetItem with AttributesToGet
+def test_batch_get_item_attributes_to_get(test_table):
+    items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    keys = [{k: x[k] for k in ('p', 'c')} for x in items]
+    for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
+        reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})
+        got_items = reply['Responses'][test_table.name]
+        expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
+        assert multiset(got_items) == multiset(expected_items)
+
+# Test BatchGetItem with ProjectionExpression (just a simple one, with
+# top-level attributes)
+def test_batch_get_item_projection_expression(test_table):
+    items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    keys = [{k: x[k] for k in ('p', 'c')} for x in items]
+    for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
+        reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})
+        got_items = reply['Responses'][test_table.name]
+        expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
+        assert multiset(got_items) == multiset(expected_items)
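The BatchGetItem tests above compare replies with `multiset(...)` because the order of returned items is not guaranteed, and they build the expected result of `AttributesToGet` by keeping only the wanted attributes of each item. The `multiset` helper comes from the tests' `util` module, which is not shown here, so the following is only a minimal sketch of the same idea using the standard library. The names `freeze`, `as_multiset` and `project` are hypothetical and are not the suite's API.

```python
from collections import Counter

def freeze(value):
    """Recursively turn dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value

def as_multiset(items):
    """Order-insensitive, duplicate-sensitive view of a list of items."""
    return Counter(freeze(item) for item in items)

def project(items, wanted):
    """Keep only the wanted attributes, skipping ones an item lacks."""
    return [{k: item[k] for k in wanted if k in item} for item in items]

items = [{'p': 'a', 'c': 'x', 'val1': 1}, {'p': 'b', 'c': 'y', 'val1': 2}]
reply = list(reversed(items))  # same items, different order
assert as_multiset(reply) == as_multiset(items)
assert project(items, ['p', 'val2']) == [{'p': 'a'}, {'p': 'b'}]
```

A `Counter` is used rather than a `set` so that a reply containing the same item twice does not compare equal to a reply containing it once.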
--- a/alternator-test/test_condition_expression.py
+++ b/alternator-test/test_condition_expression.py
--- a/alternator-test/test_describe_endpoints.py
+++ b/alternator-test/test_describe_endpoints.py
@@ -0,0 +1,49 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test for the DescribeEndpoints operation
+
+import boto3
+
+# Test that the DescribeEndpoints operation works as expected: that it
+# returns one endpoint (it may return more, but it never does this in
+# Amazon), and this endpoint can be used to make more requests.
+def test_describe_endpoints(request, dynamodb):
+    endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']
+    # It is not strictly necessary that only a single endpoint be returned,
+    # but this is what Amazon DynamoDB does today (and so does Alternator).
+    assert len(endpoints) == 1
+    for endpoint in endpoints:
+        assert 'CachePeriodInMinutes' in endpoint.keys()
+        address = endpoint['Address']
+        # Check that the address is a valid endpoint by checking that we can
+        # send it another describe_endpoints() request ;-) Note that the
+        # address does not include the "http://" or "https://" prefix, and
+        # we need to choose one manually.
+        prefix = "https://" if request.config.getoption('https') else "http://"
+        verify = not request.config.getoption('https')
+        url = prefix + address
+        if address.endswith('.amazonaws.com'):
+            boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()
+        else:
+            # Even though we connect to the local installation, Boto3 still
+            # requires us to specify dummy region and credential parameters,
+            # otherwise the user is forced to properly configure ~/.aws even
+            # for local runs.
+            boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()
+        # Nothing to check here - if the above call failed with an exception,
+        # the test would fail.
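The test above turns the bare address returned by DescribeEndpoints into a URL and picks different client parameters for Amazon and for a local installation. The same decisions can be sketched without any network access as a pure function that builds the keyword arguments for `boto3.client('dynamodb', ...)`. The helper names `endpoint_url` and `client_kwargs` are hypothetical; the dummy region and credentials are the ones used in the test.

```python
def endpoint_url(address, use_https):
    """Prepend the scheme, which DescribeEndpoints does not include."""
    prefix = "https://" if use_https else "http://"
    return prefix + address

def client_kwargs(address, use_https):
    """Keyword arguments for a client talking to the given address."""
    kwargs = {
        'endpoint_url': endpoint_url(address, use_https),
        # As in the test, certificate verification is off when using https.
        'verify': not use_https,
    }
    if not address.endswith('.amazonaws.com'):
        # Local installation: supply dummy region and credentials so that
        # no ~/.aws configuration is required.
        kwargs.update(region_name='us-east-1',
                      aws_access_key_id='alternator',
                      aws_secret_access_key='secret_pass')
    return kwargs

local = client_kwargs('localhost:8000', use_https=False)
assert local['endpoint_url'] == 'http://localhost:8000'
assert local['region_name'] == 'us-east-1'
```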
--- a/alternator-test/test_describe_table.py
+++ b/alternator-test/test_describe_table.py
@@ -0,0 +1,169 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the DescribeTable operation.
+# Some attributes used only by a specific major feature will be tested
+# elsewhere:
+#  1. Tests for describing tables with global or local secondary indexes
+#     (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)
+#     are in test_gsi.py and test_lsi.py.
+#  2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,
+#     StreamSpecification) will be in the tests devoted to the stream
+#     feature.
+#  3. Tests for describing a restored table (RestoreSummary, TableId)
+#     will be together with tests devoted to the backup/restore feature.
+
+import pytest
+from botocore.exceptions import ClientError
+import re
+import time
+from util import multiset
+
+# Test that DescribeTable correctly returns the table's name and state
+def test_describe_table_basic(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert got['TableName'] == test_table.name
+    assert got['TableStatus'] == 'ACTIVE'
+
+# Test that DescribeTable correctly returns the table's schema, in
+# AttributeDefinitions and KeySchema attributes
+def test_describe_table_schema(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    expected = { # Copied from test_table()'s fixture
+        'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        'AttributeDefinitions': [
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ]
+    }
+    assert got['KeySchema'] == expected['KeySchema']
+    # The list of attribute definitions may be arbitrarily reordered
+    assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])
+
+# Test that DescribeTable correctly returns the table's billing mode,
+# in the BillingModeSummary attribute.
+def test_describe_table_billing(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'
+    # The BillingModeSummary should also contain a
+    # LastUpdateToPayPerRequestDateTime attribute, which is a date.
+    # We don't know what date this is supposed to be, but something we
+    # do know is that the test table was created already with this billing
+    # mode, so the table creation date should be the same as the billing
+    # mode setting date.
+    assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']
+    assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']
+
+# Test that DescribeTable correctly returns the table's creation time.
+# We don't know what this creation time is supposed to be, so this test
+# cannot be very thorough... We currently just test against something we
+# know to be wrong - returning the *current* time, which changes on every
+# call.
+@pytest.mark.xfail(reason="DescribeTable does not return table creation time")
+def test_describe_table_creation_time(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert 'CreationDateTime' in got
+    time1 = got['CreationDateTime']
+    time.sleep(1)
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    time2 = got['CreationDateTime']
+    assert time1 == time2
+
+# Test that DescribeTable returns the table's estimated item count
+# in the ItemCount attribute. Unfortunately, there's not much we can
+# really test here... The documentation says that the count can be
+# delayed by six hours, so the number we get here may have no relation
+# to the current number of items in the test table. The attribute should exist,
+# though. This test does NOT verify that ItemCount isn't always returned as
+# zero - such stub implementation will pass this test.
+@pytest.mark.xfail(reason="DescribeTable does not return table item count")
+def test_describe_table_item_count(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert 'ItemCount' in got
+
+# Similar test for estimated size in bytes - TableSizeBytes - which again,
+# may reflect the size as long as six hours ago.
+@pytest.mark.xfail(reason="DescribeTable does not return table size")
+def test_describe_table_size(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert 'TableSizeBytes' in got
+
+# Test the ProvisionedThroughput attribute returned by DescribeTable.
+# This is a very partial test: Our test table is configured without
+# provisioned throughput, so obviously it will not have interesting settings
+# for it. DynamoDB returns zeros for some of the attributes, even though
+# the documentation suggests missing values should have been fine too.
+@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
+def test_describe_table_provisioned_throughput(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0
+    assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0
+    assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0
+
+# This is a silly test for the RestoreSummary attribute in DescribeTable -
+# it should not exist in a table not created by a restore. When testing
+# the backup/restore feature, we will have more meaningful tests for the
+# value of this attribute in that case.
+def test_describe_table_restore_summary(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert not 'RestoreSummary' in got
+
+# This is a silly test for the SSEDescription attribute in DescribeTable -
+# by default, a table is encrypted with AWS-owned keys, not using client-
+# owned keys, and the SSEDescription attribute is not returned at all.
+def test_describe_table_encryption(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert not 'SSEDescription' in got
+
+# This is a silly test for the StreamSpecification attribute in DescribeTable -
+# when there are no streams, this attribute should be missing.
+def test_describe_table_stream_specification(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert not 'StreamSpecification' in got
+
+# Test that the table has an ARN, a unique identifier for the table which
+# includes which zone it is on, which account, and of course the table's
+# name. The ARN format is described in
+# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns
+@pytest.mark.xfail(reason="DescribeTable does not return ARN")
+def test_describe_table_arn(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert 'TableArn' in got and got['TableArn'].startswith('arn:')
+
+# Test that the table has a TableId.
+# TODO: Figure out what this TableId is supposed to be: is it just a
+# unique id that is created with the table and never changes, or something
+# else?
+@pytest.mark.xfail(reason="DescribeTable does not return TableId")
+def test_describe_table_id(test_table):
+    got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
+    assert 'TableId' in got
+
+# DescribeTable error path: trying to describe a non-existent table should
+# result in a ResourceNotFoundException.
+def test_describe_table_non_existent_table(dynamodb):
+    with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:
+        dynamodb.meta.client.describe_table(TableName='non_existent_table')
+    # As one of the first error-path tests that we wrote, let's test in more
+    # detail that the error reply has the appropriate fields:
+    response = einfo.value.response
+    print(response)
+    err = response['Error']
+    assert err['Code'] == 'ResourceNotFoundException'
+    assert re.match('Requested resource not found: Table: non_existent_table not found', err['Message'])
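Python's `re.match()` takes the pattern as its first argument and the string to examine as its second, and any regular-expression metacharacters in the pattern are interpreted rather than matched literally. When checking an error message against a fixed expected text, a minimal sketch could escape the expected text first. `message_matches` is a hypothetical helper for illustration, not part of the test suite.

```python
import re

def message_matches(expected_text, actual_message):
    """True if actual_message starts with expected_text, taken literally."""
    return re.match(re.escape(expected_text), actual_message) is not None

expected = 'Requested resource not found: Table: non_existent_table not found'
assert message_matches(expected, expected)
assert not message_matches(expected, 'Some other error')

# Without escaping, '+' is a quantifier, so the literal text does not match.
assert re.match('Table: a+b not found', 'Table: a+b not found') is None
assert message_matches('Table: a+b not found', 'Table: a+b not found')
```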
--- a/alternator-test/test_expected.py
+++ b/alternator-test/test_expected.py
--- a/alternator-test/test_gsi.py
+++ b/alternator-test/test_gsi.py
@@ -0,0 +1,874 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests of GSI (Global Secondary Indexes)
+#
+# Note that many of these tests are slower than usual, because many of them
+# need to create new tables and/or new GSIs of different types, operations
+# which are extremely slow in DynamoDB, often taking minutes (!).
+
+import pytest
+import time
+from botocore.exceptions import ClientError, ParamValidationError
+from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
+
+# GSIs only support eventually consistent reads, so tests that involve
+# writing to a table and then expect to read something from it cannot be
+# guaranteed to succeed without retrying the read. The following utility
+# functions make it easy to write such tests.
+# Note that in practice, these repeated reads are almost never necessary:
+# Amazon claims that "Changes to the table data are propagated to the global
+# secondary indexes within a fraction of a second, under normal conditions"
+# and indeed, in practice, the tests here almost always succeed without a
+# retry.
+def assert_index_query(table, index_name, expected_items, **kwargs):
+    for i in range(3):
+        if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
+            return
+        print('assert_index_query retrying')
+        time.sleep(1)
+    assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
+
+def assert_index_scan(table, index_name, expected_items, **kwargs):
+    for i in range(3):
+        if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
+            return
+        print('assert_index_scan retrying')
+        time.sleep(1)
+    assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
+
+# Although quite silly, it is actually allowed to create an index which is
+# identical to the base table.
+def test_gsi_identical(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
+        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Scanning the entire table directly or via the index yields the same
+    # results (in different order).
+    assert multiset(items) == multiset(full_scan(table))
+    assert_index_scan(table, 'hello', items)
+    # We can't scan a non-existent index
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_scan(table, IndexName='wrong')
+    table.delete()
+
+# One of the simplest forms of a non-trivial GSI: The base table has a hash
+# and sort key, and the index reverses those roles. Other attributes are just
+# copied.
+@pytest.fixture(scope="session")
+def test_table_gsi_1(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'c', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'p', 'KeyType': 'RANGE' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ],
+        )
+    yield table
+    table.delete()
+
+def test_gsi_simple(test_table_gsi_1):
+    items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
+    with test_table_gsi_1.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    c = items[0]['c']
+    # The index allows a query on just a specific sort key, which isn't
+    # allowed on the base table.
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
+    expected_items = [x for x in items if x['c'] == c]
+    assert_index_query(test_table_gsi_1, 'hello', expected_items,
+        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
+    # Scanning the entire table directly or via the index yields the same
+    # results (in different order).
+    assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))
+
+def test_gsi_same_key(test_table_gsi_1):
+    c = random_string()
+    # All these items have the same sort key 'c' but different hash key 'p'
+    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
+    with test_table_gsi_1.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    assert_index_query(test_table_gsi_1, 'hello', items,
+        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
+
+# Check we get an appropriate error when trying to read a non-existing index
+# of an existing table. Although the documentation specifies that a
+# ResourceNotFoundException should be returned if "The operation tried to
+# access a nonexistent table or index", in fact in the specific case that
+# the table does exist but an index does not - we get a ValidationException.
+def test_gsi_missing_index(test_table_gsi_1):
+    with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
+        full_query(test_table_gsi_1, IndexName='wrong_name',
+            KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
+    with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
+        full_scan(test_table_gsi_1, IndexName='wrong_name')
+
+# Nevertheless, if the table itself does not exist, a query should return
+# a ResourceNotFoundException, not ValidationException:
+def test_gsi_missing_table(dynamodb):
+    with pytest.raises(ClientError, match='ResourceNotFoundException'):
+        dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
+    with pytest.raises(ClientError, match='ResourceNotFoundException'):
+        dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')
+
+# Verify that strongly-consistent reads on GSI are *not* allowed.
+@pytest.mark.xfail(reason="GSI strong consistency not checked")
+def test_gsi_strong_consistency(test_table_gsi_1):
+    with pytest.raises(ClientError, match='ValidationException.*Consistent'):
+        full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)
+    with pytest.raises(ClientError, match='ValidationException.*Consistent'):
+        full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)
+
+# Verify that a GSI is correctly listed in describe_table
+@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")
+def test_gsi_describe(test_table_gsi_1):
+    desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)
+    assert 'Table' in desc
+    assert 'GlobalSecondaryIndexes' in desc['Table']
+    gsis = desc['Table']['GlobalSecondaryIndexes']
+    assert len(gsis) == 1
+    gsi = gsis[0]
+    assert gsi['IndexName'] == 'hello'
+    assert 'IndexSizeBytes' in gsi     # actual size depends on content
+    assert 'ItemCount' in gsi
+    assert gsi['Projection'] == {'ProjectionType': 'ALL'}
+    assert gsi['IndexStatus'] == 'ACTIVE'
+    assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},
+                                {'KeyType': 'RANGE', 'AttributeName': 'p'}]
+    # TODO: check also ProvisionedThroughput, IndexArn
+
+# When a GSI's key includes an attribute not in the base table's key, we
+# need to remember to add its type to AttributeDefinitions.
+def test_gsi_missing_attribute_definition(dynamodb):
+    with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):
+        create_test_table(dynamodb,
+            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+            AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],
+            GlobalSecondaryIndexes=[
+                {   'IndexName': 'hello',
+                    'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],
+                    'Projection': { 'ProjectionType': 'ALL' }
+                }
+            ])
+
+# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another
+# case where the index doesn't involve non-key attributes. Again the base
+# table has a hash and sort key, but in this case the index has *only* a
+# hash key (which is the base's sort key). In the materialized-view-based
+# implementation, we need to remember the other part of the base key as a
+# clustering key.
+@pytest.fixture(scope="session")
+def test_table_gsi_1_hash_only(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'c', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ],
+        )
+    yield table
+    table.delete()
+
+def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):
+    # Test with items with different 'c' values:
+    items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
+    with test_table_gsi_1_hash_only.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    c = items[0]['c']
+    expected_items = [x for x in items if x['c'] == c]
+    assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,
+        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
+    # Test items with the same sort key 'c' but different hash key 'p'
+    c = random_string()
+    items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
+    with test_table_gsi_1_hash_only.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    assert_index_query(test_table_gsi_1_hash_only, 'hello', items,
+        KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
+    # Scanning the entire table directly or via the index yields the same
+    # results (in different order).
+    assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))
+
+
+# A second scenario of GSI. Base table has just hash key, Index has a
+# different hash key - one of the non-key attributes from the base table.
+@pytest.fixture(scope="session")
+def test_table_gsi_2(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'x', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    yield table
+    table.delete()
+
+def test_gsi_2(test_table_gsi_2):
+    items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]
+    x1 = items1[0]['x']
+    x2 = random_string()
+    items2 = [{'p': random_string(), 'x': x2} for i in range(10)]
+    items = items1 + items2
+    with test_table_gsi_2.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [i for i in items if i['x'] == x1]
+    assert_index_query(test_table_gsi_2, 'hello', expected_items,
+        KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
+    expected_items = [i for i in items if i['x'] == x2]
+    assert_index_query(test_table_gsi_2, 'hello', expected_items,
+        KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
+
+# Test that when a table has a GSI, if the indexed attribute is missing, the
+# item is added to the base table but not the index.
+def test_gsi_missing_attribute(test_table_gsi_2):
+    p1 = random_string()
+    x1 = random_string()
+    test_table_gsi_2.put_item(Item={'p':  p1, 'x': x1})
+    p2 = random_string()
+    test_table_gsi_2.put_item(Item={'p':  p2})
+
+    # Both items are now in the base table:
+    assert test_table_gsi_2.get_item(Key={'p':  p1})['Item'] == {'p': p1, 'x': x1}
+    assert test_table_gsi_2.get_item(Key={'p':  p2})['Item'] == {'p': p2}
+
+    # But only the first item is in the index: it can be found using a
+    # Query, whereas a scan of the index won't find the second item (but a
+    # scan of the base table will).
+    assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],
+        KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
+    assert any([i['p'] == p2 for i in full_scan(test_table_gsi_2)])
+    # Note: with eventually consistent read, we can't really be sure that
+    # an item will "never" appear in the index. We do this test last,
+    # so if we had a bug and such an item did appear, hopefully we had enough
+    # time for the bug to become visible. At least sometimes.
+    assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])
+
+# Test that when a table has a GSI, if the indexed attribute has the wrong
+# type, the update operation is rejected, and the item is added to neither
+# base table nor index. This is different from the case of a *missing*
+# attribute, where the item is added to the base table but not the index.
+# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}
+# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.
+def test_gsi_wrong_type_attribute_put(test_table_gsi_2):
+    # PutItem with wrong type for 'x' is rejected, item isn't created even
+    # in the base table.
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*mismatch'):
+        test_table_gsi_2.put_item(Item={'p':  p, 'x': 3})
+    assert 'Item' not in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
+
+def test_gsi_wrong_type_attribute_update(test_table_gsi_2):
+    # An UpdateItem with wrong type for 'x' is also rejected, but naturally
+    # if the item already existed, it remains as it was.
+    p = random_string()
+    x = random_string()
+    test_table_gsi_2.put_item(Item={'p':  p, 'x': x})
+    with pytest.raises(ClientError, match='ValidationException.*mismatch'):
+        test_table_gsi_2.update_item(Key={'p':  p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})
+    assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}
+
+def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):
+    # In a BatchWriteItem, if any update is forbidden, the entire batch is
+    # rejected, and none of the updates happen at all.
+    p1 = random_string()
+    p2 = random_string()
+    p3 = random_string()
+    items = [{'p': p1, 'x': random_string()},
+             {'p': p2, 'x': 3},
+             {'p': p3, 'x': random_string()}]
+    with pytest.raises(ClientError, match='ValidationException.*mismatch'):
+        with test_table_gsi_2.batch_writer() as batch:
+            for item in items:
+                batch.put_item(item)
+    for p in [p1, p2, p3]:
+        assert 'Item' not in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
+
+# A third scenario of GSI. Index has a hash key and a sort key, both are
+# non-key attributes from the base table. This scenario may be very
+# difficult to implement in Alternator because Scylla's materialized-views
+# implementation only allows one new key column in the view, and here
+# we need two (which, also, aren't actual columns, but map items).
+@pytest.fixture(scope="session")
+def test_table_gsi_3(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'a', 'AttributeType': 'S' },
+                    { 'AttributeName': 'b', 'AttributeType': 'S' }
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'a', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    yield table
+    table.delete()
+
+def test_gsi_3(test_table_gsi_3):
+    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
+    with test_table_gsi_3.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    assert_index_query(test_table_gsi_3, 'hello', [items[3]],
+        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
+
+def test_gsi_update_second_regular_base_column(test_table_gsi_3):
+    items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
+    with test_table_gsi_3.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    items[3]['b'] = 'updated'
+    test_table_gsi_3.update_item(Key={'p':  items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})
+    assert_index_query(test_table_gsi_3, 'hello', [items[3]],
+        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
+
+# Test that when a table has a GSI, if the indexed attribute is missing, the
+# item is added to the base table but not the index.
+# This is the same feature we already tested in test_gsi_missing_attribute()
+# above, but on a different table: In that test we used test_table_gsi_2,
+# with one indexed attribute, and in this test we use test_table_gsi_3 which
+# has two base regular attributes in the view key, and more possibilities
+# of which value might be missing. Reproduces issue #6008.
+def test_gsi_missing_attribute_3(test_table_gsi_3):
+    p = random_string()
+    a = random_string()
+    b = random_string()
+    # First, add an item with a missing "a" value. It should appear in the
+    # base table, but not in the index:
+    test_table_gsi_3.put_item(Item={'p':  p, 'b': b})
+    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p, 'b': b}
+    # Note: with eventually consistent read, we can't really be sure that
+    # an item will "never" appear in the index. We hope that if a bug exists
+    # and such an item did appear, sometimes the delay here will be enough
+    # for the unexpected item to become visible.
+    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
+    # Same thing for an item with a missing "b" value:
+    test_table_gsi_3.put_item(Item={'p':  p, 'a': a})
+    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p, 'a': a}
+    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
+    # And for an item missing both:
+    test_table_gsi_3.put_item(Item={'p':  p})
+    assert test_table_gsi_3.get_item(Key={'p':  p})['Item'] == {'p': p}
+    assert not any([i['p'] == p for i in full_scan(test_table_gsi_3, IndexName='hello')])
+
+# A fourth scenario of GSI. Two GSIs on a single base table.
+@pytest.fixture(scope="session")
+def test_table_gsi_4(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'a', 'AttributeType': 'S' },
+                    { 'AttributeName': 'b', 'AttributeType': 'S' }
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello_a',
+                'KeySchema': [
+                    { 'AttributeName': 'a', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            },
+            {   'IndexName': 'hello_b',
+                'KeySchema': [
+                    { 'AttributeName': 'b', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    yield table
+    table.delete()
+
+# Test that a base table with two GSIs updates both as expected.
+def test_gsi_4(test_table_gsi_4):
+    items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
+    with test_table_gsi_4.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],
+        KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})
+    assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],
+        KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
+
+# Verify that describe_table lists the two GSIs.
+def test_gsi_4_describe(test_table_gsi_4):
+    desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)
+    assert 'Table' in desc
+    assert 'GlobalSecondaryIndexes' in desc['Table']
+    gsis = desc['Table']['GlobalSecondaryIndexes']
+    assert len(gsis) == 2
+    assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])
+
+# A scenario for GSI in which the table has both hash and sort key
+@pytest.fixture(scope="session")
+def test_table_gsi_5(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'x', 'KeyType': 'RANGE' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    yield table
+    table.delete()
+
+def test_gsi_5(test_table_gsi_5):
+    items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
+    p1, x1 = items1[0]['p'], items1[0]['x']
+    p2, x2 = random_string(), random_string()
+    items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]
+    items = items1 + items2
+    with test_table_gsi_5.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]
+    assert_index_query(test_table_gsi_5, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                       'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
+    expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]
+    assert_index_query(test_table_gsi_5, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
+                       'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
+
+# Verify that DescribeTable correctly returns the schema of both base-table
+# and secondary indexes. KeySchema is given for each of the base table and
+# indexes, and AttributeDefinitions is merged for all of them together.
+def test_gsi_5_describe_table_schema(test_table_gsi_5):
+    got = test_table_gsi_5.meta.client.describe_table(TableName=test_table_gsi_5.name)['Table']
+    # Copied from test_table_gsi_5 fixture
+    expected_base_keyschema = [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' } ]
+    expected_gsi_keyschema = [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'x', 'KeyType': 'RANGE' } ]
+    expected_all_attribute_definitions = [
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' } ]
+    assert got['KeySchema'] == expected_base_keyschema
+    gsis = got['GlobalSecondaryIndexes']
+    assert len(gsis) == 1
+    assert gsis[0]['KeySchema'] == expected_gsi_keyschema
+    # The list of attribute definitions may be arbitrarily reordered
+    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
+
+# Similar DescribeTable schema test for test_table_gsi_2. The peculiarity
+# in that table is that the base table has only a hash key p, and the index
+# has only a hash key x. Now, while internally Scylla needs to add "p" as a
+# clustering key in the materialized view (in Scylla the view key always
+# contains the base key), when describing the table, "p" shouldn't be
+# returned as a range key, because the user didn't ask for it.
+# This test reproduces issue #5320.
+@pytest.mark.xfail(reason="GSI DescribeTable spurious range key (#5320)")
+def test_gsi_2_describe_table_schema(test_table_gsi_2):
+    got = test_table_gsi_2.meta.client.describe_table(TableName=test_table_gsi_2.name)['Table']
+    # Copied from test_table_gsi_2 fixture
+    expected_base_keyschema = [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ]
+    expected_gsi_keyschema = [ { 'AttributeName': 'x', 'KeyType': 'HASH' } ]
+    expected_all_attribute_definitions = [
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' } ]
+    assert got['KeySchema'] == expected_base_keyschema
+    gsis = got['GlobalSecondaryIndexes']
+    assert len(gsis) == 1
+    assert gsis[0]['KeySchema'] == expected_gsi_keyschema
+    # The list of attribute definitions may be arbitrarily reordered
+    assert multiset(got['AttributeDefinitions']) == multiset(expected_all_attribute_definitions)
+
+# All tests above involved "ProjectionType: ALL". This test checks how
+# "ProjectionType: KEYS_ONLY" works. We note that it projects both
+# the index's key, *and* the base table's key. So items which had different
+# base-table keys cannot suddenly become the same item in the index.
+@pytest.mark.xfail(reason="GSI not supported")
+def test_gsi_projection_keys_only(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'x', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'KEYS_ONLY' }
+            }
+        ])
+    items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    wanted = ['p', 'x']
+    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+    assert_index_scan(table, 'hello', expected_items)
+    table.delete()
+
+# Test for "ProjectionType: INCLUDE". The secondary index includes
+# its own and the base's keys (as in KEYS_ONLY) plus the extra attributes
+# given in NonKeyAttributes.
+@pytest.mark.xfail(reason="GSI not supported")
+def test_gsi_projection_include(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'x', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'INCLUDE',
+                                'NonKeyAttributes': ['a', 'b'] }
+            }
+        ])
+    # Some items have the projected attributes a,b and some don't:
+    items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]
+    items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    wanted = ['p', 'x', 'a', 'b']
+    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+    assert_index_scan(table, 'hello', expected_items)
+    print(len(expected_items))
+    table.delete()
+
+# DynamoDB's documentation says the "Projection" argument of
+# GlobalSecondaryIndexes is mandatory, and indeed Boto3 enforces that it
+# must be passed. The documentation then goes on to claim that the
+# "ProjectionType" member of "Projection" is optional - and Boto3 allows it
+# to be missing. But in fact, it is not allowed to be missing: DynamoDB
+# complains: "Unknown ProjectionType: null".
+@pytest.mark.xfail(reason="GSI not supported")
+def test_gsi_missing_projection_type(dynamodb):
+    with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):
+        create_test_table(dynamodb,
+            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
+            AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
+            GlobalSecondaryIndexes=[
+                {   'IndexName': 'hello',
+                    'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
+                    'Projection': {}
+                }
+            ])
+
+# update_table() for creating a GSI is an asynchronous operation.
+# The table's TableStatus changes from ACTIVE to UPDATING for a short while
+# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as
+# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.
+# During the CREATING phase, at some point the Backfilling attribute also
+# appears, until it eventually disappears. We need to wait until all three
+# markers indicate completion.
+# Unfortunately, while boto3 has a client.get_waiter('table_exists') to
+# wait for a table to exist, there is no such function to wait for an
+# index to come up, so we need to code it ourselves.
+def wait_for_gsi(table, gsi_name):
+    start_time = time.time()
+    # Surprisingly, even for tiny tables this can take a very long time
+    # on DynamoDB - often many minutes!
+    for i in range(300):
+        time.sleep(1)
+        desc = table.meta.client.describe_table(TableName=table.name)
+        table_status = desc['Table']['TableStatus']
+        if table_status != 'ACTIVE':
+            print('%d Table status still %s' % (i, table_status))
+            continue
+        index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
+        assert len(index_desc) == 1
+        index_status = index_desc[0]['IndexStatus']
+        if index_status != 'ACTIVE':
+            print('%d Index status still %s' % (i, index_status))
+            continue
+        # When the index is ACTIVE, this must be after backfilling completed
+        assert 'Backfilling' not in index_desc[0]
+        print('wait_for_gsi took %d seconds' % (time.time() - start_time))
+        return
+    raise AssertionError("wait_for_gsi did not complete")
+
+# Similarly to how wait_for_gsi() waits for a GSI to finish adding,
+# this function waits for a GSI to be finally deleted.
+def wait_for_gsi_gone(table, gsi_name):
+    start_time = time.time()
+    for i in range(300):
+        time.sleep(1)
+        desc = table.meta.client.describe_table(TableName=table.name)
+        table_status = desc['Table']['TableStatus']
+        if table_status != 'ACTIVE':
+            print('%d Table status still %s' % (i, table_status))
+            continue
+        if 'GlobalSecondaryIndexes' in desc['Table']:
+            index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
+            if len(index_desc) != 0:
+                index_status = index_desc[0]['IndexStatus']
+                print('%d Index status still %s' % (i, index_status))
+                continue
+        print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))
+        return
+    raise AssertionError("wait_for_gsi_gone did not complete")
+
+# All tests above involved creating a new table with a GSI up-front. This
+# test will test creating a base table *without* a GSI, putting data in
+# it, and then adding a GSI with the UpdateTable operation. This starts
+# a backfilling stage - where data is copied to the index - and when this
+# stage is done, the index is usable. Items whose indexed column contains
+# the wrong type are silently ignored and not added to the index (it would
+# not have been possible to add such items if the GSI was already configured
+# when they were added).
+@pytest.mark.xfail(reason="GSI not supported")
+def test_gsi_backfill(dynamodb):
+    # First create, and fill, a table without GSI. The items in items1
+    # will have the appropriate string type for 'x' and will later get
+    # indexed. Items in items2 have no value for 'x', and in items3 'x' is
+    # not a string, so the items in items2 and items3 will be missing
+    # in the index we'll create later.
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
+    items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
+    items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]
+    items3 = [{'p': random_string(), 'x': i} for i in range(10)]
+    items = items1 + items2 + items3
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    assert multiset(items) == multiset(full_scan(table))
+    # Now use UpdateTable to create the GSI
+    dynamodb.meta.client.update_table(TableName=table.name,
+        AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],
+        GlobalSecondaryIndexUpdates=[ {  'Create':
+            {  'IndexName': 'hello',
+                'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }}])
+    # update_table is an asynchronous operation. We need to wait until it
+    # finishes and the table is backfilled.
+    wait_for_gsi(table, 'hello')
+    # As explained above, only items in items1 got copied to the gsi,
+    # and Scan on them works as expected.
+    # Note that we don't need to retry the reads here (i.e., use the
+    # assert_index_scan() or assert_index_query() functions) because after
+    # we waited for backfilling to complete, we know all the pre-existing
+    # data is already in the index.
+    assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))
+    # We can also use Query on the new GSI, to search on the attribute x:
+    assert multiset([items1[3]]) == multiset(full_query(table,
+        IndexName='hello',
+        KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))
+    # Let's also test that we cannot add another index with the same name
+    # as one that already exists
+    with pytest.raises(ClientError, match='ValidationException.*already exists'):
+        dynamodb.meta.client.update_table(TableName=table.name,
+            AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],
+            GlobalSecondaryIndexUpdates=[ {  'Create':
+                {  'IndexName': 'hello',
+                    'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],
+                    'Projection': { 'ProjectionType': 'ALL' }
+                }}])
+    table.delete()
+
+# Test deleting an existing GSI using UpdateTable
+@pytest.mark.xfail(reason="GSI not supported")
+def test_gsi_delete(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'x', 'KeyType': 'HASH' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # So far, we have the index for "x" and can use it:
+    assert_index_query(table, 'hello', [items[3]],
+        KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
+    # Now use UpdateTable to delete the GSI for "x"
+    dynamodb.meta.client.update_table(TableName=table.name,
+        GlobalSecondaryIndexUpdates=[{  'Delete':
+            { 'IndexName': 'hello' } }])
+    # update_table is an asynchronous operation. We need to wait until it
+    # finishes and the GSI is removed.
+    wait_for_gsi_gone(table, 'hello')
+    # Now the index is gone. We cannot query using it.
+    with pytest.raises(ClientError, match='ValidationException.*hello'):
+        full_query(table, IndexName='hello',
+            KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
+    table.delete()
+
+# Utility function for creating a new table with a GSI with the given name,
+# and, if creation was successful, deleting it. Useful for testing which
+# GSI names work.
+def create_gsi(dynamodb, index_name):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
+        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': index_name,
+                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    # Verify that the GSI wasn't just ignored, as Scylla originally did ;-)
+    assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']
+    table.delete()
+
+# Like table names (tested in test_table.py), index names must also
+# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test
+# is similar to test_create_table_unsupported_names(), but for GSI names.
+# Note that Scylla is actually more limited in the length of the index
+# names, because both table name and index name, together, have to fit in
+# 221 characters. But we don't verify this specific limitation here.
+def test_gsi_unsupported_names(dynamodb):
+    # Unfortunately, the boto library tests for names shorter than the
+    # minimum length (3 characters) immediately, and failure results in
+    # ParamValidationError. But the other invalid names are passed to
+    # DynamoDB, which returns an HTTP response code, which results in a
+    # ClientError exception.
+    with pytest.raises(ParamValidationError):
+        create_gsi(dynamodb, 'n')
+    with pytest.raises(ParamValidationError):
+        create_gsi(dynamodb, 'nn')
+    with pytest.raises(ClientError, match='ValidationException.*nnnnn'):
+        create_gsi(dynamodb, 'n' * 256)
+    with pytest.raises(ClientError, match='ValidationException.*nyh'):
+        create_gsi(dynamodb, 'nyh@test')
+
+# On the other hand, names following the above rules should be accepted. Even
+# names which the Scylla rules forbid, such as a name starting with .
+def test_gsi_non_scylla_name(dynamodb):
+    create_gsi(dynamodb, '.alternator_test')
+
+# Index names with 255 characters are allowed in Dynamo. In Scylla, the
+# limit is different - the sum of both table and index length cannot
+# exceed 211 characters. So we test a much shorter limit.
+# (compare test_create_and_delete_table_very_long_name()).
+def test_gsi_very_long_name(dynamodb):
+    #create_gsi(dynamodb, 'n' * 255)   # works on DynamoDB, but not on Scylla
+    create_gsi(dynamodb, 'n' * 190)
+
+# Verify that ListTables does not list materialized views used for indexes.
+# This is hard to test, because we don't really know which table names
+# should be listed beyond those we created, and don't want to assume that
+# no other test runs in parallel with us. So the method we chose is to use a
+# unique random name for an index, and check that no table contains this
+# name. This assumes that materialized-view names are composed using the
+# index's name (which is currently what we do).
+
+@pytest.fixture(scope="session")
+def test_table_gsi_random_name(dynamodb):
+    index_name = random_string()
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': index_name,
+                'KeySchema': [
+                    { 'AttributeName': 'c', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'p', 'KeyType': 'RANGE' },
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ],
+        )
+    yield [table, index_name]
+    table.delete()
+
+def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):
+    table, index_name = test_table_gsi_random_name
+    # Check that the random "index_name" isn't a substring of any table name:
+    tables = list_tables(dynamodb)
+    for name in tables:
+        assert index_name not in name
+    # But of course, the table's name should be in the list:
+    assert table.name in tables
--- a/alternator-test/test_health.py
+++ b/alternator-test/test_health.py
@@ -0,0 +1,35 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the health check
+
+import requests
+
+# Test that a health check can be performed with a GET request
+def test_health_works(dynamodb):
+    url = dynamodb.meta.client._endpoint.host
+    response = requests.get(url, verify=False)
+    assert response.ok
+    assert response.content.decode('utf-8').strip()  == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))
+
+# Test that a health check only works for the root URL ('/')
+def test_health_only_works_for_root_path(dynamodb):
+    url = dynamodb.meta.client._endpoint.host
+    for suffix in ['/abc', '/-', '/index.htm', '/health']:
+        print(url + suffix)
+        response = requests.get(url + suffix, verify=False)
+        assert response.status_code in range(400, 405)
--- a/alternator-test/test_item.py
+++ b/alternator-test/test_item.py
@@ -0,0 +1,402 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem
+
+import pytest
+from botocore.exceptions import ClientError
+from decimal import Decimal
+from util import random_string, random_bytes
+
+# Basic test for creating a new item with a random name, and reading it back
+# with strong consistency.
+# Only the string type is used for keys and attributes. None of the various
+# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,
+# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,
+# ExpressionAttributeNames, ExpressionAttributeValues) are used, and
+# for GetItem strong consistency is requested as well as all attributes,
+# but no other optional features (AttributesToGet, ReturnConsumedCapacity,
+# ProjectionExpression, ExpressionAttributeNames)
+def test_basic_string_put_and_get(test_table):
+    p = random_string()
+    c = random_string()
+    val = random_string()
+    val2 = random_string()
+    test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item['p'] == p
+    assert item['c'] == c
+    assert item['attribute'] == val
+    assert item['another'] == val2
+
+# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of
+# PutItem. Because the item does not yet exist, it should work the same.
+def test_basic_string_update_and_get(test_table):
+    p = random_string()
+    c = random_string()
+    val = random_string()
+    val2 = random_string()
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item['p'] == p
+    assert item['c'] == c
+    assert item['attribute'] == val
+    assert item['another'] == val2
+
+# Test put_item and get_item of various types for the *attributes*,
+# including scalars as well as nested documents, lists and sets.
+# The full list of types tested here:
+#    number, boolean, bytes, null, list, map, string set, number set,
+#    binary set.
+# The keys are still strings.
+# Note that only top-level attributes are written and read in this test -
+# this test does not attempt to modify *nested* attributes.
+# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html
+# on how to pass these various types to Boto3's put_item().
+def test_put_and_get_attribute_types(test_table):
+    key = {'p': random_string(), 'c': random_string()}
+    test_items = [
+        Decimal("12.345"),
+        42,
+        True,
+        False,
+        b'xyz',
+        None,
+        ['hello', 'world', 42],
+        {'hello': 'world', 'life': 42},
+        {'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},
+        set(['hello', 'world', 'hi']),
+        set([1, 42, Decimal("3.14")]),
+        set([b'xyz', b'hi']),
+    ]
+    item = { str(i) : test_items[i] for i in range(len(test_items)) }
+    item.update(key)
+    test_table.put_item(Item=item)
+    got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']
+    assert item == got_item
+
+# The test_empty_* tests below verify support for empty items, with no
+# attributes except the key. This is a difficult case for Scylla, because
+# for an empty row to exist, Scylla needs to add a "CQL row marker".
+# There are several ways to create empty items - via PutItem, UpdateItem
+# and deleting attributes from non-empty items, and we need to check them
+# all, in several test_empty_* tests:
+def test_empty_put(test_table):
+    p = random_string()
+    c = random_string()
+    test_table.put_item(Item={'p': p, 'c': c})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item == {'p': p, 'c': c}
+def test_empty_put_delete(test_table):
+    p = random_string()
+    c = random_string()
+    test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item == {'p': p, 'c': c}
+def test_empty_update(test_table):
+    p = random_string()
+    c = random_string()
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item == {'p': p, 'c': c}
+def test_empty_update_delete(test_table):
+    p = random_string()
+    c = random_string()
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item == {'p': p, 'c': c}
+
+# Test error handling of UpdateItem passed a bad "Action" field.
+def test_update_bad_action(test_table):
+    p = random_string()
+    c = random_string()
+    val = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})
+
+# A more elaborate UpdateItem test, updating different attributes at different
+# times. Includes PUT and DELETE operations.
+def test_basic_string_more_update(test_table):
+    p = random_string()
+    c = random_string()
+    val1 = random_string()
+    val2 = random_string()
+    val3 = random_string()
+    val4 = random_string()
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})
+    test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})
+    item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
+    assert item['p'] == p
+    assert item['c'] == c
+    assert item['a1'] == val3
+    assert item['a2'] == val2
+    assert 'a3' not in item
+
+# Test that item operations on a nonexistent table name fail with the correct
+# error code.
+def test_item_operations_nonexistent_table(dynamodb):
+    with pytest.raises(ClientError, match='ResourceNotFoundException'):
+        dynamodb.meta.client.put_item(TableName='non_existent_table',
+            Item={'a':{'S':'b'}})
+
+# Fetching a nonexistent item. According to the DynamoDB doc, "If there is no
+# matching item, GetItem does not return any data and there will be no Item
+# element in the response."
+def test_get_item_missing_item(test_table):
+    p = random_string()
+    c = random_string()
+    assert "Item" not in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)
+
+# Test that if we have a table with string hash and sort keys, we can't read
+# or write items with other key types to it.
+def test_put_item_wrong_key_type(test_table):
+    b = random_bytes()
+    s = random_string()
+    n = Decimal("3.14")
+    # Should succeed (correct key types)
+    test_table.put_item(Item={'p': s, 'c': s})
+    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
+    # Should fail (incorrect hash key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'p': b, 'c': s})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'p': n, 'c': s})
+    # Should fail (incorrect sort key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'p': s, 'c': b})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'p': s, 'c': n})
+    # Should fail (missing hash key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'c': s})
+    # Should fail (missing sort key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.put_item(Item={'p': s})
+def test_update_item_wrong_key_type(test_table, test_table_s):
+    b = random_bytes()
+    s = random_string()
+    n = Decimal("3.14")
+    # Should succeed (correct key types)
+    test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
+    assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
+    # Should fail (incorrect hash key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})
+    # Should fail (incorrect sort key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})
+    # Should fail (missing hash key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'c': s}, AttributeUpdates={})
+    # Should fail (missing sort key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': s}, AttributeUpdates={})
+    # Should fail (spurious key columns)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.update_item(Key={'p': s, 'c': s, 'spurious': s}, AttributeUpdates={})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
+def test_get_item_wrong_key_type(test_table, test_table_s):
+    b = random_bytes()
+    s = random_string()
+    n = Decimal("3.14")
+    # Should succeed (correct key types) but have empty result
+    assert "Item" not in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)
+    # Should fail (incorrect hash key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': b, 'c': s})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': n, 'c': s})
+    # Should fail (incorrect sort key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': s, 'c': b})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': s, 'c': n})
+    # Should fail (missing hash key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'c': s})
+    # Should fail (missing sort key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': s})
+    # Should fail (spurious key columns)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': s, 'c': s})
+def test_delete_item_wrong_key_type(test_table, test_table_s):
+    b = random_bytes()
+    s = random_string()
+    n = Decimal("3.14")
+    # Should succeed (correct key types)
+    test_table.delete_item(Key={'p': s, 'c': s})
+    # Should fail (incorrect hash key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': b, 'c': s})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': n, 'c': s})
+    # Should fail (incorrect sort key types)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': s, 'c': b})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': s, 'c': n})
+    # Should fail (missing hash key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'c': s})
+    # Should fail (missing sort key)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': s})
+    # Should fail (spurious key columns)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': s, 'c': s})
+
+# Most of the tests here arbitrarily used a table with both hash and sort keys
+# (both strings). Let's check that a table with *only* a hash key works ok
+# too, for PutItem, GetItem, and UpdateItem.
+def test_only_hash_key(test_table_s):
+    s = random_string()
+    test_table_s.put_item(Item={'p': s, 'hello': 'world'})
+    assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}
+    test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})
+    assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}
+
+# Tests for item operations in tables with non-string hash or sort keys.
+# These tests focus only on the type of the key - everything else is as
+# simple as we can (string attributes, no special options for GetItem
+# and PutItem). These tests also focus on individual items only, and
+# not about the sort order of sort keys - this should be verified in
+# test_query.py, for example.
+def test_bytes_hash_key(test_table_b):
+    # Bytes values are passed using base64 encoding, which has weird cases
+    # depending on len%3 and len%4. So let's try various lengths.
+    for length in range(10, 18):
+        p = random_bytes(length)
+        val = random_string()
+        test_table_b.put_item(Item={'p': p, 'attribute': val})
+        assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
+def test_bytes_sort_key(test_table_sb):
+    p = random_string()
+    c = random_bytes()
+    val = random_string()
+    test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})
+    assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}
+
+# Tests for using a large binary blob as hash key, sort key, or attribute.
+# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,
+# and binary sort key to 1024 bytes, and refuses anything larger. The total
+# size of an item is limited to 400KB, which also limits the size of the
+# largest attributes. For more details on these limits, see
+# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
+# Alternator currently does *not* have these limitations, and can accept much
+# larger keys and attributes, but what we do in the following tests is to verify
+# that items up to DynamoDB's maximum sizes also work well in Alternator.
+def test_large_blob_hash_key(test_table_b):
+    b = random_bytes(2048)
+    test_table_b.put_item(Item={'p': b})
+    assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}
+def test_large_blob_sort_key(test_table_sb):
+    s = random_string()
+    b = random_bytes(1024)
+    test_table_sb.put_item(Item={'p': s, 'c': b})
+    assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}
+def test_large_blob_attribute(test_table):
+    p = random_string()
+    c = random_string()
+    b = random_bytes(409500)  # a bit less than 400KB
+    test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}
+
+# Checks that it is not allowed to use both old-style AttributeUpdates and
+# new-style UpdateExpression in a single UpdateItem request.
+def test_update_item_two_update_methods(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},
+            UpdateExpression='SET b = :val1',
+            ExpressionAttributeValues={':val1': 4})
+
+# Verify that having neither AttributeUpdates nor UpdateExpression is
+# allowed, and results in creation of an empty item.
+def test_update_item_no_update_method(test_table_s):
+    p = random_string()
+    assert "Item" not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
+    test_table_s.update_item(Key={'p': p})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}
+
+# Test GetItem with the AttributesToGet parameter. Result should include the
+# selected attributes only - if one wants the key attributes as well, one
+# needs to select them explicitly. When no key attributes are selected,
+# some items may have *none* of the selected attributes. Those items are
+# returned too, as empty items - they are not outright missing.
+def test_getitem_attributes_to_get(dynamodb, test_table):
+    p = random_string()
+    c = random_string()
+    item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
+    test_table.put_item(Item=item)
+    for wanted in [ ['a'],             # only non-key attribute
+                    ['c', 'a'],        # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # Our item doesn't have this
+                   ]:
+        got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']
+        expected_item = {k: item[k] for k in wanted if k in item}
+        assert expected_item == got_item
+
+# Basic test for DeleteItem, with hash key only
+def test_delete_item_hash(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p})
+    assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
+    test_table_s.delete_item(Key={'p': p})
+    assert 'Item' not in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
+
+# Basic test for DeleteItem, with hash and sort key
+def test_delete_item_sort(test_table):
+    p = random_string()
+    c = random_string()
+    key = {'p': p, 'c': c}
+    test_table.put_item(Item=key)
+    assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
+    test_table.delete_item(Key=key)
+    assert 'Item' not in test_table.get_item(Key=key, ConsistentRead=True)
+
+# Test that PutItem completely replaces an existing item. It shouldn't merge
+# it with a previously existing value, as UpdateItem does!
+# We test for a table with just hash key, and for a table with both hash and
+# sort keys.
+def test_put_item_replace(test_table_s, test_table):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
+    test_table_s.put_item(Item={'p': p, 'b': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
+    c = random_string()
+    test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
+    test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
--- a/alternator-test/test_lsi.py
+++ b/alternator-test/test_lsi.py
@@ -0,0 +1,365 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests of LSI (Local Secondary Indexes)
+#
+# Note that many of these tests are slower than usual, because many of them
+# need to create new tables and/or new LSIs of different types, operations
+# which are extremely slow in DynamoDB, often taking minutes (!).
+
+import pytest
+import time
+from botocore.exceptions import ClientError, ParamValidationError
+from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
+
+# Currently, Alternator's LSIs only support eventually consistent reads, so tests
+# that involve writing to a table and then expect to read something from it cannot
+# be guaranteed to succeed without retrying the read. The following utility
+# functions make it easy to write such tests.
+def assert_index_query(table, index_name, expected_items, **kwargs):
+    for i in range(3):
+        if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
+            return
+        print('assert_index_query retrying')
+        time.sleep(1)
+    assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
+
+def assert_index_scan(table, index_name, expected_items, **kwargs):
+    for i in range(3):
+        if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
+            return
+        print('assert_index_scan retrying')
+        time.sleep(1)
+    assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
+
+# Although quite silly, it is actually allowed to create an index which is
+# identical to the base table.
+def test_lsi_identical(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
+        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],
+        LocalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    items = [{'p': random_string(), 'c': random_string()} for i in range(10)]
+    with table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Scanning the entire table directly or via the index yields the same
+    # results (in different order).
+    assert multiset(items) == multiset(full_scan(table))
+    assert_index_scan(table, 'hello', items)
+    # We can't scan a nonexistent index
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_scan(table, IndexName='wrong')
+    table.delete()
+
+# Checks that providing a hash key different from the base table's is not
+# allowed, and neither is providing duplicated keys or no sort key at all.
+def test_lsi_wrong(dynamodb):
+    with pytest.raises(ClientError, match='ValidationException.*'):
+        table = create_test_table(dynamodb,
+            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+            AttributeDefinitions=[
+                        { 'AttributeName': 'p', 'AttributeType': 'S' },
+                        { 'AttributeName': 'a', 'AttributeType': 'S' },
+                        { 'AttributeName': 'b', 'AttributeType': 'S' }
+            ],
+            LocalSecondaryIndexes=[
+                {   'IndexName': 'hello',
+                    'KeySchema': [
+                        { 'AttributeName': 'b', 'KeyType': 'HASH' },
+                        { 'AttributeName': 'p', 'KeyType': 'RANGE' }
+                    ],
+                    'Projection': { 'ProjectionType': 'ALL' }
+                }
+            ])
+        table.delete()
+    with pytest.raises(ClientError, match='ValidationException.*'):
+        table = create_test_table(dynamodb,
+            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+            AttributeDefinitions=[
+                        { 'AttributeName': 'p', 'AttributeType': 'S' },
+                        { 'AttributeName': 'a', 'AttributeType': 'S' },
+                        { 'AttributeName': 'b', 'AttributeType': 'S' }
+            ],
+            LocalSecondaryIndexes=[
+                {   'IndexName': 'hello',
+                    'KeySchema': [
+                        { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                        { 'AttributeName': 'p', 'KeyType': 'RANGE' }
+                    ],
+                    'Projection': { 'ProjectionType': 'ALL' }
+                }
+            ])
+        table.delete()
+    with pytest.raises(ClientError, match='ValidationException.*'):
+        table = create_test_table(dynamodb,
+            KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
+            AttributeDefinitions=[
+                        { 'AttributeName': 'p', 'AttributeType': 'S' },
+                        { 'AttributeName': 'a', 'AttributeType': 'S' },
+                        { 'AttributeName': 'b', 'AttributeType': 'S' }
+            ],
+            LocalSecondaryIndexes=[
+                {   'IndexName': 'hello',
+                    'KeySchema': [
+                        { 'AttributeName': 'p', 'KeyType': 'HASH' }
+                    ],
+                    'Projection': { 'ProjectionType': 'ALL' }
+                }
+            ])
+        table.delete()
+
+# A simple scenario for LSI. Base table has both hash and sort keys; the index
+# replaces the sort key with one of the non-key attributes from the base table.
+@pytest.fixture(scope="session")
+def test_table_lsi_1(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'b', 'AttributeType': 'S' },
+        ],
+        LocalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            }
+        ])
+    yield table
+    table.delete()
+
+def test_lsi_1(test_table_lsi_1):
+    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
+    p1, b1 = items1[0]['p'], items1[0]['b']
+    p2, b2 = random_string(), random_string()
+    items2 = [{'p': p2, 'c': p2, 'b': b2}]
+    items = items1 + items2
+    with test_table_lsi_1.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
+    assert_index_query(test_table_lsi_1, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
+    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
+    assert_index_query(test_table_lsi_1, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
+
+# A second scenario of LSI. Base table has both hash and sort keys,
+# a local index is created on each non-key attribute.
+@pytest.fixture(scope="session")
+def test_table_lsi_4(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x1', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x2', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x3', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x4', 'AttributeType': 'S' },
+        ],
+        LocalSecondaryIndexes=[
+            {   'IndexName': 'hello_' + column,
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': column, 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'ALL' }
+            } for column in ['x1','x2','x3','x4']
+        ])
+    yield table
+    table.delete()
+
+def test_lsi_4(test_table_lsi_4):
+    items1 = [{'p': random_string(), 'c': random_string(),
+               'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]
+    i_values = items1[0]
+    i5 = random_string()
+    items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]
+    items = items1 + items2
+    with test_table_lsi_4.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    for column in ['x1', 'x2', 'x3', 'x4']:
+        expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]
+        assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
+            KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},
+                           column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})
+        expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]
+        assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
+            KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},
+                           column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})
+
+def test_lsi_describe(test_table_lsi_4):
+    desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)
+    assert 'Table' in desc
+    assert 'LocalSecondaryIndexes' in desc['Table']
+    lsis = desc['Table']['LocalSecondaryIndexes']
+    assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])
+    # TODO: check projection and key params
+    # TODO: check also ProvisionedThroughput, IndexArn
+
+# A table with selective projection - only keys are projected into the index
+@pytest.fixture(scope="session")
+def test_table_lsi_keys_only(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'b', 'AttributeType': 'S' }
+        ],
+        LocalSecondaryIndexes=[
+            {   'IndexName': 'hello',
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'b', 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'KEYS_ONLY' }
+            }
+        ])
+    yield table
+    table.delete()
+
+# Check that it's possible to extract a non-projected attribute from the index,
+# as the documentation promises
+def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):
+    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
+    p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
+    p2, b2, d2 = random_string(), random_string(), random_string()
+    items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
+    items = items1 + items2
+    with test_table_lsi_keys_only.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]
+    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
+        Select='ALL_ATTRIBUTES')
+    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
+    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
+        Select='ALL_ATTRIBUTES')
+    expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
+    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
+        Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])
+
+# Check that only projected attributes can be extracted
+@pytest.mark.xfail(reason="LSI in alternator currently only implements full projections")
+def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):
+    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
+    p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
+    p2, b2, d2 = random_string(), random_string(), random_string()
+    items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
+    items = items1 + items2
+    with test_table_lsi_keys_only.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]
+    assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
+
+# Check that strongly consistent reads are allowed for LSI
+def test_lsi_consistent_read(test_table_lsi_1):
+    items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
+    p1, b1 = items1[0]['p'], items1[0]['b']
+    p2, b2 = random_string(), random_string()
+    items2 = [{'p': p2, 'c': p2, 'b': b2}]
+    items = items1 + items2
+    with test_table_lsi_1.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
+    assert_index_query(test_table_lsi_1, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
+        ConsistentRead=True)
+    expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
+    assert_index_query(test_table_lsi_1, 'hello', expected_items,
+        KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
+                       'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
+        ConsistentRead=True)
+
+# A table with both gsi and lsi present
+@pytest.fixture(scope="session")
+def test_table_lsi_gsi(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
+        AttributeDefinitions=[
+                    { 'AttributeName': 'p', 'AttributeType': 'S' },
+                    { 'AttributeName': 'c', 'AttributeType': 'S' },
+                    { 'AttributeName': 'x1', 'AttributeType': 'S' },
+        ],
+        GlobalSecondaryIndexes=[
+            {   'IndexName': 'hello_g1',
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'x1', 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'KEYS_ONLY' }
+            }
+        ],
+        LocalSecondaryIndexes=[
+            {   'IndexName': 'hello_l1',
+                'KeySchema': [
+                    { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                    { 'AttributeName': 'x1', 'KeyType': 'RANGE' }
+                ],
+                'Projection': { 'ProjectionType': 'KEYS_ONLY' }
+            }
+        ])
+    yield table
+    table.delete()
+
+# Test that GSI and LSI can coexist, even if they're identical
+def test_lsi_and_gsi(test_table_lsi_gsi):
+    desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)
+    assert 'Table' in desc
+    assert 'LocalSecondaryIndexes' in desc['Table']
+    assert 'GlobalSecondaryIndexes' in desc['Table']
+    lsis = desc['Table']['LocalSecondaryIndexes']
+    gsis = desc['Table']['GlobalSecondaryIndexes']
+    assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])
+    assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])
+
+    items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]
+    p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']
+    with test_table_lsi_gsi.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+
+    for index in ['hello_g1', 'hello_l1']:
+        expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]
+        assert_index_query(test_table_lsi_gsi, index, expected_items,
+            KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
+                           'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
--- a/alternator-test/test_nested.py
+++ b/alternator-test/test_nested.py
@@ -0,0 +1,60 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test for operations on items with *nested* attributes.
+
+import pytest
+from botocore.exceptions import ClientError
+from util import random_string
+
+# Test that we can write a top-level attribute that is a nested document, and
+# read it back correctly.
+def test_nested_document_attribute_write(test_table_s):
+    nested_value = {
+        'a': 3,
+        'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},
+    }
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': nested_value})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}
+
+# Test that if we have a top-level attribute that is a nested document (i.e.,
+# a dictionary), updating this attribute will replace it entirely by a new
+# nested document - not merge the new content into the old content.
+def test_nested_document_attribute_overwrite(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
+    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}
+
+# Moreover, we can overwrite an entire nested document by, say, a string,
+# and that's also fine.
+def test_nested_document_attribute_overwrite_2(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
+    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}
+
+# Verify that AttributeUpdates cannot be used to update a nested attribute -
+# trying to use a dot in the name of the attribute will just create an
+# attribute with an actual dot in its name.
+def test_attribute_updates_dot(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
--- a/alternator-test/test_projection_expression.py
+++ b/alternator-test/test_projection_expression.py
@@ -0,0 +1,201 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the various operations (GetItem, Query, Scan) with a
+# ProjectionExpression parameter.
+#
+# ProjectionExpression is an extension of the legacy AttributesToGet
+# parameter. Both parameters request that only a subset of the attributes
+# be fetched for each item, instead of all of them. But while AttributesToGet
+# was limited to top-level attributes, ProjectionExpression can request also
+# nested attributes.
+
+import pytest
+from botocore.exceptions import ClientError
+from util import random_string, full_scan, full_query, multiset
+
+# Basic test for ProjectionExpression, requesting only top-level attributes.
+# Result should include the selected attributes only - if one wants the key
+# attributes as well, one needs to select them explicitly. When no key
+# attributes are selected, an item may have *none* of the selected
+# attributes, and is returned as an empty item.
+def test_projection_expression_toplevel(test_table):
+    p = random_string()
+    c = random_string()
+    item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
+    test_table.put_item(Item=item)
+    for wanted in [ ['a'],             # only non-key attribute
+                    ['c', 'a'],        # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # Our item doesn't have this
+                   ]:
+        got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']
+        expected_item = {k: item[k] for k in wanted if k in item}
+        assert expected_item == got_item
+
+# Various simple tests for ProjectionExpression's syntax, using only top-level
+# attributes.
+def test_projection_expression_toplevel_syntax(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a  ,   b  ')['Item'] == {'a': 'hello', 'b': 'hi'}
+    # Missing or unused names in ExpressionAttributeNames are errors:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item']
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item']
+    # It is not allowed to fetch the same top-level attribute twice (or in
+    # general, list two overlapping attributes). We get an error like
+    # "Invalid ProjectionExpression: Two document paths overlap with each
+    # other; must remove or rewrite one of these paths; path one: [a], path
+    # two: [a]".
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']
+    # A trailing, leading, or doubled comma is a syntax error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']
+    # An empty ProjectionExpression is not allowed. DynamoDB recognizes its
+    # syntax, but then writes: "Invalid ProjectionExpression: The expression
+    # can not be empty".
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
+
+# The following two tests are similar to test_projection_expression_toplevel()
+# which tested the GetItem operation - but these test Scan and Query.
+# Both test ProjectionExpression with only top-level attributes.
+def test_projection_expression_scan(filled_test_table):
+    table, items = filled_test_table
+    for wanted in [ ['another'],       # only non-key attributes (one item doesn't have it!)
+                    ['c', 'another'],  # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # none of the items have this attribute!
+                   ]:
+        got_items = full_scan(table,  ProjectionExpression=",".join(wanted))
+        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+        assert multiset(expected_items) == multiset(got_items)
+
+def test_projection_expression_query(test_table):
+    p = random_string()
+    items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    for wanted in [ ['a'],             # only non-key attributes
+                    ['c', 'a'],        # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # none of the items have this attribute!
+                   ]:
+        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))
+        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+        assert multiset(expected_items) == multiset(got_items)
+
+# The previous tests all fetched only top-level attributes. They could all
+# be written using AttributesToGet instead of ProjectionExpression (and,
+# in fact, we do have similar tests with AttributesToGet in other files),
+# but the previous tests checked that the alternative syntax works correctly.
+# The following test checks fetching more elaborate attribute paths from
+# nested documents.
+@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
+def test_projection_expression_path(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={
+        'p': p,
+        'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},
+        'b': 'hello'
+        })
+    # Fetching the entire nested document "a" works, of course:
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
+    # If we fetch a.b, we get only the content of b - but it's still inside
+    # the a dictionary:
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}
+    # Similarly, fetching a.b[0] gives us a one-element array in a dictionary.
+    # Note that [0] is the first element of an array.
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}
+    # Trying to read any sort of nonexistent attribute returns an empty item.
+    # This includes a nonexistent top-level attribute, an attempt to read
+    # beyond the end of an array or a nonexistent member of a dictionary, as
+    # well as paths which begin with a nonexistent prefix.
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}
+    # We can read multiple paths - the results are merged into one object
+    # structured the same way as in the original item:
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}
+    # It is not allowed to read the same path multiple times. The error from
+    # DynamoDB looks like: "Invalid ProjectionExpression: Two document paths
+    # overlap with each other; must remove or rewrite one of these paths;
+    # path one: [a, b, [0]], path two: [a, b, [0]]".
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']
+    # Two paths are considered to "overlap" if the content of one path
+    # contains the content of the second path. So requesting both "a" and
+    # "a.b[0]" is not allowed.
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
+
+@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
+def test_query_projection_expression_path(test_table):
+    p = random_string()
+    items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")
+    expected_items = [{'a': {'x': x['a']['x']}} for x in items]
+    assert multiset(expected_items) == multiset(got_items)
+
+@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
+def test_scan_projection_expression_path(test_table):
+    # This test is similar to test_query_projection_expression_path above,
+    # but uses a scan instead of a query. The scan will also return unrelated
+    # partitions created by other tests (hopefully not too many...) that we
+    # need to ignore. We also need to ask for "p", so we can filter by it.
+    p = random_string()
+    items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]
+    expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]
+    assert multiset(expected_items) == multiset(got_items)
+
+# It is not allowed to use both ProjectionExpression and its older cousin,
+# AttributesToGet, together. Trying to do this makes DynamoDB produce an error
+# like "Can not use both expression and non-expression parameters in the same
+# request: Non-expression parameters: {AttributesToGet} Expression
+# parameters: {ProjectionExpression}".
+def test_projection_expression_and_attributes_to_get(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
+    with pytest.raises(ClientError, match='ValidationException.*both'):
+        test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']
+    with pytest.raises(ClientError, match='ValidationException.*both'):
+        full_scan(test_table_s,  ProjectionExpression='a', AttributesToGet=['a'])
+    with pytest.raises(ClientError, match='ValidationException.*both'):
+        full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])
--- a/alternator-test/test_query.py
+++ b/alternator-test/test_query.py
@@ -0,0 +1,516 @@
+# -*- coding: utf-8 -*-
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the Query operation
+
+import random
+import pytest
+from botocore.exceptions import ClientError, ParamValidationError
+from decimal import Decimal
+from util import random_string, random_bytes, full_query, multiset
+from boto3.dynamodb.conditions import Key, Attr
+
+# Test that querying with the basic key conditions works fine with the stock paginator
+def test_query_basic_restrictions(dynamodb, filled_test_table):
+    test_table, items = filled_test_table
+    paginator = dynamodb.meta.client.get_paginator('query')
+
+    # EQ
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)
+
+    # LT
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
+
+    # LE
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)
+
+    # GT
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)
+
+    # GE
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)
+
+    # BETWEEN
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)
+
+    # BEGINS_WITH
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
+        }):
+        print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
+
+# Test that KeyConditionExpression parameter is supported
+@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")
+def test_query_key_condition_expression(dynamodb, filled_test_table):
+    test_table, items = filled_test_table
+    paginator = dynamodb.meta.client.get_paginator('query')
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):
+        got_items += page['Items']
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
+
+def test_begins_with(dynamodb, test_table):
+    paginator = dynamodb.meta.client.get_paginator('query')
+    items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+
+    # TODO(sarna): Once bytes type is supported, the \xFF character should be tested
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])
+
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name, KeyConditions={
+            'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
+            'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
+        }):
+        got_items += page['Items']
+    print(got_items)
+    assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])
+
+def test_begins_with_wrong_type(dynamodb, test_table_sn):
+    paginator = dynamodb.meta.client.get_paginator('query')
+    with pytest.raises(ClientError, match='ValidationException'):
+        for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={
+                'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
+                'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}
+                }):
+            pass
+
+# Items returned by Query should be sorted by the sort key. The following
+# tests verify that this is indeed the case, for the three allowed key types:
+# strings, binary, and numbers. These tests check not just the Query operation,
+# but inherently also that sort-key ordering works.
+def test_query_sort_order_string(test_table):
+    # Insert a lot of random items in one new partition:
+    # str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.
+    p = random_string()
+    items = [{'p': p, 'c': str(i)} for i in range(128)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
+    assert len(items) == len(got_items)
+    # Extract just the sort key ("c") from the items
+    sort_keys = [x['c'] for x in items]
+    got_sort_keys = [x['c'] for x in got_items]
+    # Verify that got_sort_keys are already sorted (in string order)
+    assert sorted(got_sort_keys) == got_sort_keys
+    # Verify that got_sort_keys are a sorted version of the expected sort_keys
+    assert sorted(sort_keys) == got_sort_keys
+def test_query_sort_order_bytes(test_table_sb):
+    # Insert a lot of random items in one new partition:
+    # We arbitrarily use random_bytes with a fixed length of 10 bytes.
+    p = random_string()
+    items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]
+    with test_table_sb.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
+    assert len(items) == len(got_items)
+    sort_keys = [x['c'] for x in items]
+    got_sort_keys = [x['c'] for x in got_items]
+    # Boto3's "Binary" objects are sorted as if bytes are signed integers.
+    # This isn't the order that DynamoDB itself uses (byte 0 should be first,
+    # not byte -128). Sorting the byte array ".value" works.
+    assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys
+    assert sorted(sort_keys) == got_sort_keys
+def test_query_sort_order_number(test_table_sn):
+    # This is a list of numbers, sorted in correct order, and each suitable
+    # for accurate representation by Alternator's number type.
+    numbers = [
+        Decimal("-2e10"),
+        Decimal("-7.1e2"),
+        Decimal("-4.1"),
+        Decimal("-0.1"),
+        Decimal("-1e-5"),
+        Decimal("0"),
+        Decimal("2e-5"),
+        Decimal("0.15"),
+        Decimal("1"),
+        Decimal("1.00000000000000000000000001"),
+        Decimal("3.14159"),
+        Decimal("3.1415926535897932384626433832795028841"),
+        Decimal("31.4"),
+        Decimal("1.4e10"),
+    ]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Finally, verify that we get back exactly the same numbers (with identical
+    # precision), and in their original sorted order.
+    got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == numbers
+
+def test_query_filtering_attributes_equality(filled_test_table):
+    test_table, items = filled_test_table
+
+    query_filter = {
+        "attribute" : {
+            "AttributeValueList" : [ "xxxx" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
+
+    query_filter = {
+        "attribute" : {
+            "AttributeValueList" : [ "xxxx" ],
+            "ComparisonOperator": "EQ"
+        },
+        "another" : {
+            "AttributeValueList" : [ "yy" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
+
+# Test that FilterExpression works as expected
+@pytest.mark.xfail(reason="FilterExpression not supported yet")
+def test_query_filter_expression(filled_test_table):
+    test_table, items = filled_test_table
+
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
+
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
+    print(got_items)
+    assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
+
+# QueryFilter can only contain non-key attributes (for compatibility with DynamoDB).
+def test_query_filtering_key_equality(filled_test_table):
+    test_table, items = filled_test_table
+
+    with pytest.raises(ClientError, match='ValidationException'):
+        query_filter = {
+            "c" : {
+                "AttributeValueList" : [ "5" ],
+                "ComparisonOperator": "EQ"
+            }
+        }
+        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
+        print(got_items)
+
+    with pytest.raises(ClientError, match='ValidationException'):
+        query_filter = {
+            "attribute" : {
+                "AttributeValueList" : [ "x" ],
+                "ComparisonOperator": "EQ"
+            },
+            "p" : {
+                "AttributeValueList" : [ "5" ],
+                "ComparisonOperator": "EQ"
+            }
+        }
+        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
+        print(got_items)
+
+# Test Query with the AttributesToGet parameter. Result should include the
+# selected attributes only - if one wants the key attributes as well, one
+# needs to select them explicitly. When no key attributes are selected,
+# some items may have *none* of the selected attributes. Those items are
+# returned too, as empty items - they are not outright missing.
+def test_query_attributes_to_get(dynamodb, test_table):
+    p = random_string()
+    items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
+    with test_table.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    for wanted in [ ['a'],             # only non-key attributes
+                    ['c', 'a'],        # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # none of the items have this attribute!
+                   ]:
+        got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)
+        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+        assert multiset(expected_items) == multiset(got_items)
+
+# Test which keys we can Query by in a table with both a hash key and a
+# sort key: We can Query by the hash key, or by a combination of both hash
+# and sort keys, but *cannot* query by just the sort key, and obviously not
+# by any non-key column.
+def test_query_which_key(test_table):
+    p = random_string()
+    c = random_string()
+    p2 = random_string()
+    c2 = random_string()
+    item1 = {'p': p, 'c': c}
+    item2 = {'p': p, 'c': c2}
+    item3 = {'p': p2, 'c': c}
+    for i in [item1, item2, item3]:
+        test_table.put_item(Item=i)
+    # Query by hash key only:
+    got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
+    expected_items = [item1, item2]
+    assert multiset(expected_items) == multiset(got_items)
+    # Query by hash key *and* sort key (this is basically a GetItem):
+    got_items = full_query(test_table, KeyConditions={
+        'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
+        'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
+    })
+    expected_items = [item1]
+    assert multiset(expected_items) == multiset(got_items)
+    # Query by sort key alone is not allowed. DynamoDB reports:
+    # "Query condition missed key schema element: p".
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_query(test_table, KeyConditions={
+            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
+        })
+    # Query by a non-key isn't allowed, for the same reason - that the
+    # actual hash key (p) is missing in the query:
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_query(test_table, KeyConditions={
+            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
+        })
+    # If we try both p and a non-key we get a complaint that the sort
+    # key is missing: "Query condition missed key schema element: c"
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_query(test_table, KeyConditions={
+            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
+            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
+        })
+    # If we try p, c and another (non-key) attribute, we get an error that
+    # "Conditions can be of length 1 or 2 only".
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_query(test_table, KeyConditions={
+            'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
+            'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
+            'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
+        })
+
+# Test the "Select" parameter of Query. The default Select mode,
+# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
+# allow returning just specific attributes or just counting the results
+# without returning items at all.
+@pytest.mark.xfail(reason="Select not supported yet")
+def test_query_select(test_table_sn):
+    numbers = [Decimal(i) for i in range(10)]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num, 'x': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Verify that we get back the numbers in their sorted order. By default,
+    # query returns all attributes:
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == numbers
+    got_x_attributes = [x['x'] for x in got_items]
+    assert got_x_attributes == numbers
+    # Select=ALL_ATTRIBUTES does exactly the same as the default - return
+    # all attributes:
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_ATTRIBUTES')['Items']
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == numbers
+    got_x_attributes = [x['x'] for x in got_items]
+    assert got_x_attributes == numbers
+    # Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
+    # is just for indexes, when IndexName is specified)
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='ALL_PROJECTED_ATTRIBUTES')
+    # Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
+    # or a ProjectionExpression appears, but beyond that it does nothing:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES')
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['x'])['Items']
+    expected_items = [{'x': i} for i in numbers]
+    assert got_items == expected_items
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression='x')['Items']
+    assert got_items == expected_items
+    # Select=COUNT just returns a count - not any items
+    got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='COUNT')
+    assert got['Count'] == len(numbers)
+    assert not 'Items' in got
+    # Check that a count is returned not only with Select=COUNT: without
+    # Select=COUNT we get the count as well, together with the items:
+    got = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
+    assert got['Count'] == len(numbers)
+    assert 'Items' in got
+    # Select with some unknown string generates a validation exception:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Select='UNKNOWN')
+
+# Test that the "Limit" parameter can be used to return only some of the
+# items in a single partition. The items returned are the first in the
+# sorted order.
+def test_query_limit(test_table_sn):
+    numbers = [Decimal(i) for i in range(10)]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Verify that we get back the numbers in their sorted order.
+    # First, no Limit so we should get all numbers (we have few of them, so
+    # it all fits in the default 1MB limitation)
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})['Items']
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == numbers
+    # Now try a few different Limit values, and verify that the query
+    # returns exactly the first Limit sorted numbers.
+    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
+        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)['Items']
+        assert len(got_items) == min(limit, len(numbers))
+        got_sort_keys = [x['c'] for x in got_items]
+        assert got_sort_keys == numbers[0:limit]
+    # Unfortunately, the boto3 library forbids a Limit of 0 on its own,
+    # before even sending a request, so we can't test how the server responds.
+    with pytest.raises(ParamValidationError):
+        test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=0)
+
+# In test_query_limit we tested just that Limit allows stopping the result
+# after the right number of items. Here we test that such a stopped result
+# can be resumed, via the LastEvaluatedKey/ExclusiveStartKey paging mechanism.
+def test_query_limit_paging(test_table_sn):
+    numbers = [Decimal(i) for i in range(20)]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Verify that full_query() returns all these numbers, in sorted order.
+    # full_query() will do a query with the given limit, and resume it again
+    # and again until the last page.
+    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
+        got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit)
+        got_sort_keys = [x['c'] for x in got_items]
+        assert got_sort_keys == numbers
+
+# Test that the ScanIndexForward parameter works, and can be used to
+# return items sorted in reverse order. Combining this with Limit can
+# be used to return the last items instead of the first items of the
+# partition.
+@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
+def test_query_reverse(test_table_sn):
+    numbers = [Decimal(i) for i in range(20)]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    # Verify that we get back the numbers in their sorted order or reverse
+    # order, depending on the ScanIndexForward parameter being True or False.
+    # First, no Limit so we should get all numbers (we have few of them, so
+    # it all fits in the default 1MB limitation)
+    reversed_numbers = list(reversed(numbers))
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=True)['Items']
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == numbers
+    got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False)['Items']
+    got_sort_keys = [x['c'] for x in got_items]
+    assert got_sort_keys == reversed_numbers
+    # Now try a few different Limit values, and verify that the query
+    # returns exactly the first Limit sorted numbers - in regular or
+    # reverse order, depending on ScanIndexForward.
+    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
+        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=True)['Items']
+        assert len(got_items) == min(limit, len(numbers))
+        got_sort_keys = [x['c'] for x in got_items]
+        assert got_sort_keys == numbers[0:limit]
+        got_items = test_table_sn.query(KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, Limit=limit, ScanIndexForward=False)['Items']
+        assert len(got_items) == min(limit, len(numbers))
+        got_sort_keys = [x['c'] for x in got_items]
+        assert got_sort_keys == reversed_numbers[0:limit]
+
+# Test that paging also works properly with reverse order
+# (ScanIndexForward=false), i.e., reverse-order queries can be resumed
+@pytest.mark.xfail(reason="ScanIndexForward not supported yet")
+def test_query_reverse_paging(test_table_sn):
+    numbers = [Decimal(i) for i in range(20)]
+    # Insert these numbers, in random order, into one partition:
+    p = random_string()
+    items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
+    with test_table_sn.batch_writer() as batch:
+        for item in items:
+            batch.put_item(item)
+    reversed_numbers = list(reversed(numbers))
+    # Verify that with ScanIndexForward=False, full_query() returns all
+    # these numbers in reversed sorted order - getting pages of Limit items
+    # at a time and resuming the query.
+    for limit in [1, 2, 3, 7, 10, 17, 100, 10000]:
+        got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ScanIndexForward=False, Limit=limit)
+        got_sort_keys = [x['c'] for x in got_items]
+        assert got_sort_keys == reversed_numbers
--- a/alternator-test/test_returnvalues.py
+++ b/alternator-test/test_returnvalues.py
@@ -0,0 +1,226 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the ReturnValues parameter for the different update operations
+# (PutItem, UpdateItem, DeleteItem).
+
+import pytest
+from botocore.exceptions import ClientError
+from util import random_string
+
+# Test trivial support for the ReturnValues parameter in PutItem, UpdateItem
+# and DeleteItem - test that "NONE" works (and changes nothing), while a
+# completely unsupported value gives an error.
+# This test is useful to check that before the ReturnValues parameter is fully
+# implemented, it returns an error when a still-unsupported ReturnValues
+# option is attempted in the request - instead of simply being ignored.
+def test_trivial_returnvalues(test_table_s):
+    # PutItem:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
+    assert not 'Attributes' in ret
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
+    # UpdateItem:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert not 'Attributes' in ret
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
+            UpdateExpression='SET a = a + :val',
+            ExpressionAttributeValues={':val': 1})
+    # DeleteItem:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
+    assert not 'Attributes' in ret
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
+
+# Test the ReturnValues parameter on a PutItem operation. Only two settings
+# are supported for this parameter for this operation: NONE (the default)
+# and ALL_OLD.
+@pytest.mark.xfail(reason="ReturnValues not supported")
+def test_put_item_returnvalues(test_table_s):
+    # By default, the previous value of an item is not returned:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'})
+    assert not 'Attributes' in ret
+    # Using ReturnValues=NONE is the same:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='NONE')
+    assert not 'Attributes' in ret
+    # With ReturnValues=ALL_OLD, the old value of the item is returned
+    # in an "Attributes" attribute:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_OLD')
+    assert ret['Attributes'] == {'p': p, 'a': 'hi'}
+    # Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
+    # are supported by other operations but not by PutItem:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_OLD')
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='ALL_NEW')
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='UPDATED_NEW')
+    # Obviously, an unsupported setting such as "DOG" also results in an error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='DOG')
+    # The ReturnValues value is case sensitive, so while "NONE" is supported
+    # (and tested above), "none" isn't:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.put_item(Item={'p': p, 'a': 'hello'}, ReturnValues='none')
+
+# Test the ReturnValues parameter on a DeleteItem operation. Only two settings
+# are supported for this parameter for this operation: NONE (the default)
+# and ALL_OLD.
+@pytest.mark.xfail(reason="ReturnValues not supported")
+def test_delete_item_returnvalues(test_table_s):
+    # By default, the previous value of an item is not returned:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.delete_item(Key={'p': p})
+    assert not 'Attributes' in ret
+    # Using ReturnValues=NONE is the same:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='NONE')
+    assert not 'Attributes' in ret
+    # With ReturnValues=ALL_OLD, the old value of the item is returned
+    # in an "Attributes" attribute:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi'})
+    ret=test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_OLD')
+    assert ret['Attributes'] == {'p': p, 'a': 'hi'}
+    # Other ReturnValue options - UPDATED_OLD, ALL_NEW, UPDATED_NEW,
+    # are supported by other operations but not by DeleteItem:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_OLD')
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='ALL_NEW')
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='UPDATED_NEW')
+    # Obviously, an unsupported setting such as "DOG" also results in an error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='DOG')
+    # The ReturnValues value is case sensitive, so while "NONE" is supported
+    # (and tested above), "none" isn't:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.delete_item(Key={'p': p}, ReturnValues='none')
+
+# Test the ReturnValues parameter on an UpdateItem operation. All five
+# settings are supported for this parameter for this operation: NONE
+# (the default), ALL_OLD, UPDATED_OLD, ALL_NEW and UPDATED_NEW.
+@pytest.mark.xfail(reason="ReturnValues not supported")
+def test_update_item_returnvalues(test_table_s):
+    # By default, the previous value of an item is not returned:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert not 'Attributes' in ret
+
+    # Using ReturnValues=NONE is the same:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='NONE',
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert not 'Attributes' in ret
+
+    # With ReturnValues=ALL_OLD, the entire old value of the item (even
+    # attributes we did not modify) is returned in an "Attributes" attribute:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_OLD',
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'dog'}
+
+    # With ReturnValues=UPDATED_OLD, only the overwritten attributes of the
+    # old item are returned in an "Attributes" attribute:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
+        UpdateExpression='SET b = :val, c = :val2',
+        ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
+    assert ret['Attributes'] == {'b': 'dog'}
+    # Even if an update overwrites an attribute by the same value again,
+    # this is considered an update, and the old value (identical to the
+    # new one) is returned:
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert ret['Attributes'] == {'b': 'cat'}
+    # Deleting an attribute also counts as overwriting it, of course:
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_OLD',
+        UpdateExpression='REMOVE b')
+    assert ret['Attributes'] == {'b': 'cat'}
+
+    # With ReturnValues=ALL_NEW, the entire new value of the item (including
+    # old attributes we did not modify) is returned:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='ALL_NEW',
+        UpdateExpression='SET b = :val',
+        ExpressionAttributeValues={':val': 'cat'})
+    assert ret['Attributes'] == {'p': p, 'a': 'hi', 'b': 'cat'}
+
+    # With ReturnValues=UPDATED_NEW, only the new values of the updated
+    # attributes are returned. Note that "updated attributes" means
+    # the newly set attributes - it doesn't require that these attributes
+    # have any previous values.
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hi', 'b': 'dog'})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
+        UpdateExpression='SET b = :val, c = :val2',
+        ExpressionAttributeValues={':val': 'cat', ':val2': 'hello'})
+    assert ret['Attributes'] == {'b': 'cat', 'c': 'hello'}
+    # Deleting an attribute also counts as overwriting it, but the deleted
+    # attribute is not returned in the response - so it's empty in this case.
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
+        UpdateExpression='REMOVE b')
+    assert not 'Attributes' in ret
+    # In the above examples, UPDATED_NEW is not useful because it just
+    # returns the new values we already know from the request... UPDATED_NEW
+    # becomes more useful in read-modify-write operations:
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 1})
+    ret=test_table_s.update_item(Key={'p': p}, ReturnValues='UPDATED_NEW',
+        UpdateExpression='SET a = a + :val',
+        ExpressionAttributeValues={':val': 1})
+    assert ret['Attributes'] == {'a': 2}
+
+    # An unsupported setting such as "DOG" also results in an error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, ReturnValues='DOG',
+            UpdateExpression='SET a = a + :val',
+            ExpressionAttributeValues={':val': 1})
+    # The ReturnValues value is case sensitive, so while "NONE" is supported
+    # (and tested above), "none" isn't:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, ReturnValues='none',
+            UpdateExpression='SET a = a + :val',
+            ExpressionAttributeValues={':val': 1})
--- a/alternator-test/test_scan.py
+++ b/alternator-test/test_scan.py
@@ -0,0 +1,252 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the Scan operation
+
+import pytest
+from botocore.exceptions import ClientError
+from util import random_string, full_scan, full_scan_and_count, multiset
+from boto3.dynamodb.conditions import Attr
+
+# Test that scanning works fine with/without pagination
+def test_scan_basic(filled_test_table):
+    test_table, items = filled_test_table
+    for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:
+        pos = None
+        got_items = []
+        while True:
+            if limit:
+                response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)
+                assert len(response['Items']) <= limit
+            else:
+                response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()
+            pos = response.get('LastEvaluatedKey', None)
+            got_items += response['Items']
+            if not pos:
+                break
+
+        assert len(items) == len(got_items)
+        assert multiset(items) == multiset(got_items)
+
+def test_scan_with_paginator(dynamodb, filled_test_table):
+    test_table, items = filled_test_table
+    paginator = dynamodb.meta.client.get_paginator('scan')
+
+    got_items = []
+    for page in paginator.paginate(TableName=test_table.name):
+        got_items += page['Items']
+
+    assert len(items) == len(got_items)
+    assert multiset(items) == multiset(got_items)
+
+    for page_size in [1, 17, 1234]:
+        got_items = []
+        for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):
+            got_items += page['Items']
+
+        assert len(items) == len(got_items)
+        assert multiset(items) == multiset(got_items)
+
+# Although partitions are scanned in seemingly-random order, inside a
+# partition items must be returned by Scan sorted in sort-key order.
+# This test verifies this, for string sort key. We'll need separate
+# tests for the other sort-key types (number and binary)
+def test_scan_sort_order_string(filled_test_table):
+    test_table, items = filled_test_table
+    got_items = full_scan(test_table)
+    assert len(items) == len(got_items)
+    # Extract just the sort key ("c") from the partition "long"
+    items_long = [x['c'] for x in items if x['p'] == 'long']
+    got_items_long = [x['c'] for x in got_items if x['p'] == 'long']
+    # Verify that got_items_long are already sorted (in string order)
+    assert sorted(got_items_long) == got_items_long
+    # Verify that got_items_long are a sorted version of the expected items_long
+    assert sorted(items_long) == got_items_long
+
+# Test Scan with the AttributesToGet parameter. Result should include the
+# selected attributes only - if one wants the key attributes as well, one
+# needs to select them explicitly. When no key attributes are selected,
+# some items may have *none* of the selected attributes. Those items are
+# returned too, as empty items - they are not outright missing.
+def test_scan_attributes_to_get(dynamodb, filled_test_table):
+    table, items = filled_test_table
+    for wanted in [ ['another'],       # only non-key attributes (one item doesn't have it!)
+                    ['c', 'another'],  # a key attribute (sort key) and non-key
+                    ['p', 'c'],        # entire key
+                    ['nonexistent']    # none of the items have this attribute!
+                   ]:
+        print(wanted)
+        got_items = full_scan(table, AttributesToGet=wanted)
+        expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+        assert multiset(expected_items) == multiset(got_items)
+
+def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):
+    table, items = filled_test_table
+    scan_filter = {
+        "attribute" : {
+            "AttributeValueList" : [ "xxxxx" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+
+    got_items = full_scan(table, ScanFilter=scan_filter)
+    expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]
+    assert multiset(expected_items) == multiset(got_items)
+
+    scan_filter = {
+        "another" : {
+            "AttributeValueList" : [ "y" ],
+            "ComparisonOperator": "EQ"
+        },
+        "attribute" : {
+            "AttributeValueList" : [ "xxxxx" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+
+    got_items = full_scan(table, ScanFilter=scan_filter)
+    expected_items = [item for item in items if "attribute" in item.keys() and "another" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]
+    assert multiset(expected_items) == multiset(got_items)
+
+# Test that FilterExpression works as expected
+@pytest.mark.xfail(reason="FilterExpression not supported yet")
+def test_scan_filter_expression(filled_test_table):
+    test_table, items = filled_test_table
+
+    got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))
+    print(got_items)
+    assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)
+
+    got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
+    print(got_items)
+    assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
+
+def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
+    table, items = filled_test_table
+    scan_filter_p = {
+        "p" : {
+            "AttributeValueList" : [ "7" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+    scan_filter_c = {
+        "c" : {
+            "AttributeValueList" : [ "9" ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+    scan_filter_p_and_attribute = {
+        "p" : {
+            "AttributeValueList" : [ "7" ],
+            "ComparisonOperator": "EQ"
+        },
+        "attribute" : {
+            "AttributeValueList" : [ "x"*7 ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+    scan_filter_c_and_another = {
+        "c" : {
+            "AttributeValueList" : [ "9" ],
+            "ComparisonOperator": "EQ"
+        },
+        "another" : {
+            "AttributeValueList" : [ "y"*16 ],
+            "ComparisonOperator": "EQ"
+        }
+    }
+
+    # Filtering on the hash key
+    got_items = full_scan(table, ScanFilter=scan_filter_p)
+    expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]
+    assert multiset(expected_items) == multiset(got_items)
+
+    # Filtering on the sort key
+    got_items = full_scan(table, ScanFilter=scan_filter_c)
+    expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]
+    assert multiset(expected_items) == multiset(got_items)
+
+    # Filtering on the hash key and an attribute
+    got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)
+    expected_items = [item for item in items if "p" in item.keys() and "attribute" in item.keys() and item["p"] == "7" and item["attribute"] == "x"*7]
+    assert multiset(expected_items) == multiset(got_items)
+
+    # Filtering on the sort key and an attribute
+    got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
+    expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
+    assert multiset(expected_items) == multiset(got_items)
+
+# Test the "Select" parameter of Scan. The default Select mode,
+# ALL_ATTRIBUTES, returns items with all their attributes. Other modes
+# allow returning just specific attributes or just counting the results
+# without returning items at all.
+@pytest.mark.xfail(reason="Select not supported yet")
+def test_scan_select(filled_test_table):
+    test_table, items = filled_test_table
+    got_items = full_scan(test_table)
+    # By default, a scan returns all the items, with all their
+    # attributes:
+    got_items = full_scan(test_table)
+    assert multiset(items) == multiset(got_items)
+    # Select=ALL_ATTRIBUTES does exactly the same as the default - return
+    # all attributes:
+    got_items = full_scan(test_table, Select='ALL_ATTRIBUTES')
+    assert multiset(items) == multiset(got_items)
+    # Select=ALL_PROJECTED_ATTRIBUTES is not allowed on a base table (it
+    # is just for indexes, when IndexName is specified)
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_scan(test_table, Select='ALL_PROJECTED_ATTRIBUTES')
+    # Select=SPECIFIC_ATTRIBUTES requires that either an AttributesToGet
+    # or ProjectionExpression appears, but then really does nothing beyond
+    # what AttributesToGet and ProjectionExpression already do:
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_scan(test_table, Select='SPECIFIC_ATTRIBUTES')
+    wanted = ['c', 'another']
+    got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', AttributesToGet=wanted)
+    expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
+    assert multiset(expected_items) == multiset(got_items)
+    got_items = full_scan(test_table, Select='SPECIFIC_ATTRIBUTES', ProjectionExpression=','.join(wanted))
+    assert multiset(expected_items) == multiset(got_items)
+    # Select=COUNT just returns a count - not any items
+    (got_count, got_items) = full_scan_and_count(test_table, Select='COUNT')
+    assert got_count == len(items)
+    assert got_items == []
+    # Check that we also get a count in regular scans - not just with
+    # Select=COUNT, but without Select=COUNT we get both items and count:
+    (got_count, got_items) = full_scan_and_count(test_table)
+    assert got_count == len(items)
+    assert multiset(items) == multiset(got_items)
+    # Select with some unknown string generates a validation exception:
+    with pytest.raises(ClientError, match='ValidationException'):
+        full_scan(test_table, Select='UNKNOWN')
+
+# Test parallel scan, i.e., the Segment and TotalSegments options.
+# In the following test we check that these parameters allow splitting
+# a scan into multiple parts, and that these parts are in fact disjoint,
+# and their union is the entire contents of the table. We do not actually
+# try to run these queries in *parallel* in this test.
+@pytest.mark.xfail(reason="parallel scan not supported yet")
+def test_scan_parallel(filled_test_table):
+    test_table, items = filled_test_table
+    for nsegments in [1, 2, 17]:
+        print('Testing TotalSegments={}'.format(nsegments))
+        got_items = []
+        for segment in range(nsegments):
+            got_items.extend(full_scan(test_table, TotalSegments=nsegments, Segment=segment))
+        # The following comparison verifies that each of the expected items
+        # in items was returned in one - and just one - of the segments.
+        assert multiset(items) == multiset(got_items)
--- a/alternator-test/test_table.py
+++ b/alternator-test/test_table.py
@@ -0,0 +1,276 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for basic table operations: CreateTable, DeleteTable, ListTables.
+
+import pytest
+from botocore.exceptions import ClientError
+from util import list_tables, test_table_name, create_test_table, random_string
+
+# Utility function for creating a table with a given name and some valid
+# schema. This function initiates the table's creation, but doesn't
+# wait for the table to actually become ready.
+def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):
+    return dynamodb.create_table(
+        TableName=name,
+        BillingMode=BillingMode,
+        KeySchema=[
+            {
+                'AttributeName': 'p',
+                'KeyType': 'HASH'
+            },
+            {
+                'AttributeName': 'c',
+                'KeyType': 'RANGE'
+            }
+        ],
+        AttributeDefinitions=[
+            {
+                'AttributeName': 'p',
+                'AttributeType': 'S'
+            },
+            {
+                'AttributeName': 'c',
+                'AttributeType': 'S'
+            },
+        ],
+        **kwargs
+    )
+
+# Utility function for creating a table with a given name, and then deleting
+# it immediately, waiting for these operations to complete. Since the wait
+# uses DescribeTable, this function requires all of CreateTable, DescribeTable
+# and DeleteTable to work correctly.
+# Note that in DynamoDB, table deletion takes a very long time, so tests
+# successfully using this function are very slow.
+def create_and_delete_table(dynamodb, name, **kwargs):
+    table = create_table(dynamodb, name, **kwargs)
+    table.meta.client.get_waiter('table_exists').wait(TableName=name)
+    table.delete()
+    table.meta.client.get_waiter('table_not_exists').wait(TableName=name)
+
+##############################################################################
+
+# Test creating a table, and then deleting it, waiting for each operation
+# to have completed before proceeding. Since the wait uses DescribeTable,
+# this test requires all of CreateTable, DescribeTable and DeleteTable to
+# function properly in their basic use cases.
+# Unfortunately, this test is extremely slow with DynamoDB because deleting
+# a table there takes a very long time to complete.
+def test_create_and_delete_table(dynamodb):
+    create_and_delete_table(dynamodb, 'alternator_test')
+
+# DynamoDB documentation specifies that table names must be 3-255 characters,
+# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should
+# be rejected, and no table be created.
+def test_create_table_unsupported_names(dynamodb):
+    from botocore.exceptions import ParamValidationError, ClientError
+    # Interestingly, the boto library tests for names shorter than the
+    # minimum length (3 characters) immediately, and failure results in
+    # ParamValidationError. But the other invalid names are passed to
+    # DynamoDB, which returns an HTTP response code, which results in a
+    # ClientError exception.
+    with pytest.raises(ParamValidationError):
+        create_table(dynamodb, 'n')
+    with pytest.raises(ParamValidationError):
+        create_table(dynamodb, 'nn')
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, 'n' * 256)
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, 'nyh@test')
+
+# On the other hand, names following the above rules should be accepted. Even
+# names which the Scylla rules forbid, such as a name starting with .
+def test_create_and_delete_table_non_scylla_name(dynamodb):
+    create_and_delete_table(dynamodb, '.alternator_test')
+
+# Names with 255 characters are allowed in Dynamo, but they are not currently
+# supported in Scylla because we create a directory whose name is the table's
+# name followed by 33 bytes (underscore and UUID). So currently, we only
+# correctly support names with length up to 222.
+def test_create_and_delete_table_very_long_name(dynamodb):
+    # In the future, this should work:
+    #create_and_delete_table(dynamodb, 'n' * 255)
+    # But for now, only 222 works:
+    create_and_delete_table(dynamodb, 'n' * 222)
+    # We cannot test the following on DynamoDB because it will succeed
+    # (DynamoDB allows up to 255 bytes)
+    #with pytest.raises(ClientError, match='ValidationException'):
+    #   create_table(dynamodb, 'n' * 223)
+
+# Test that creating a table with an invalid schema returns a
+# ValidationException error.
+def test_create_table_invalid_schema(dynamodb):
+    # The name of the table "created" by this test shouldn't matter, the
+    # creation should not succeed anyway.
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'p', 'KeyType': 'HASH' },
+                { 'AttributeName': 'c', 'KeyType': 'HASH' }
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'p', 'AttributeType': 'S' },
+                { 'AttributeName': 'c', 'AttributeType': 'S' },
+            ],
+        )
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'p', 'KeyType': 'RANGE' },
+                { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'p', 'AttributeType': 'S' },
+                { 'AttributeName': 'c', 'AttributeType': 'S' },
+            ],
+        )
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'c', 'KeyType': 'RANGE' }
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'c', 'AttributeType': 'S' },
+            ],
+        )
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'c', 'KeyType': 'HASH' },
+                { 'AttributeName': 'p', 'KeyType': 'RANGE' },
+                { 'AttributeName': 'z', 'KeyType': 'RANGE' }
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'c', 'AttributeType': 'S' },
+                { 'AttributeName': 'p', 'AttributeType': 'S' },
+                { 'AttributeName': 'z', 'AttributeType': 'S' }
+            ],
+        )
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'c', 'KeyType': 'HASH' },
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'z', 'AttributeType': 'S' }
+            ],
+        )
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(
+            TableName='name_doesnt_matter',
+            BillingMode='PAY_PER_REQUEST',
+            KeySchema=[
+                { 'AttributeName': 'k', 'KeyType': 'HASH' },
+            ],
+            AttributeDefinitions=[
+                { 'AttributeName': 'k', 'AttributeType': 'Q' }
+            ],
+        )
+
+# Test that trying to create a table that already exists fails in the
+# appropriate way (ResourceInUseException)
+def test_create_table_already_exists(dynamodb, test_table):
+    with pytest.raises(ClientError, match='ResourceInUseException'):
+        create_table(dynamodb, test_table.name)
+
+# Test that BillingMode error path works as expected - only the values
+# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires
+# ProvisionedThroughput to be set, the latter forbids it.
+# If BillingMode is outright missing, it defaults (as the original
+# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.
+def test_create_table_billing_mode_errors(dynamodb, test_table):
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, test_table_name(), BillingMode='unknown')
+    # billing mode is case-sensitive
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')
+    # PAY_PER_REQUEST cannot come with a ProvisionedThroughput:
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, test_table_name(),
+            BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})
+    # On the other hand, PROVISIONED requires ProvisionedThroughput:
+    # By the way, ProvisionedThroughput not only needs to appear, it must
+    # have both ReadCapacityUnits and WriteCapacityUnits - but we can't test
+    # this with boto3, because boto3 has its own verification that if
+    # ProvisionedThroughput is given, it must have the correct form.
+    with pytest.raises(ClientError, match='ValidationException'):
+        create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')
+    # If BillingMode is completely missing, it defaults to PROVISIONED, so
+    # ProvisionedThroughput is required
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.create_table(TableName=test_table_name(),
+            KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
+            AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
+
+# Our first implementation had a special column name called "attrs" where
+# we stored a map for all non-key columns. If the user tried to name one
+# of the key columns with this same name, the result was a disaster - Scylla
+# went into a bad state after trying to write data with two updates to
+# same-named columns.
+special_column_name1 = 'attrs'
+special_column_name2 = ':attrs'
+@pytest.fixture(scope="session")
+def test_table_special_column_name(dynamodb):
+    table = create_test_table(dynamodb,
+        KeySchema=[
+            { 'AttributeName': special_column_name1, 'KeyType': 'HASH' },
+            { 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }
+        ],
+        AttributeDefinitions=[
+            { 'AttributeName': special_column_name1, 'AttributeType': 'S' },
+            { 'AttributeName': special_column_name2, 'AttributeType': 'S' },
+        ],
+    )
+    yield table
+    table.delete()
+@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")
+def test_create_table_special_column_name(test_table_special_column_name):
+    s = random_string()
+    c = random_string()
+    h = random_string()
+    expected = {special_column_name1: s, special_column_name2: c, 'hello': h}
+    test_table_special_column_name.put_item(Item=expected)
+    got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']
+    assert got == expected
+
+# Test that all tables we create are listed, and pagination works properly.
+# Note that the DynamoDB setup we run this against may have hundreds of
+# other tables, for all we know. We just need to check that the tables we
+# created are indeed listed.
+def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):
+    my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}
+    for limit in [1, 2, 3, 4, 50, 100]:
+        print("testing limit={}".format(limit))
+        list_tables_set = set(list_tables(dynamodb, limit))
+        assert my_tables_set.issubset(list_tables_set)
+
+# Test that pagination limit is validated
+def test_list_tables_wrong_limit(dynamodb):
+    # lower limit (min. 1) is imposed by boto3 library checks
+    with pytest.raises(ClientError, match='ValidationException'):
+        dynamodb.meta.client.list_tables(Limit=101)
--- a/alternator-test/test_update_expression.py
+++ b/alternator-test/test_update_expression.py
@@ -0,0 +1,854 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Tests for the UpdateItem operations with an UpdateExpression parameter
+
+import random
+import string
+import pytest
+from botocore.exceptions import ClientError
+from decimal import Decimal
+from util import random_string
+
+# The simplest test of using UpdateExpression to set a top-level attribute,
+# instead of the older AttributeUpdates parameter.
+# Checks only one "SET" action in an UpdateExpression.
+def test_update_expression_set(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val1',
+        ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
+
+# An empty UpdateExpression is NOT allowed, and generates a "The expression
+# can not be empty" error. This contrasts with an empty AttributeUpdates which
+# is allowed, and results in the creation of an empty item if it didn't exist
+# yet (see test_empty_update()).
+def test_update_expression_empty(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='')
+
+# A basic test with multiple SET actions in one expression
+def test_update_expression_set_multi(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET x = :val1, y = :val1',
+        ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}
+
+# SET can be used to copy an existing attribute to a new one
+def test_update_expression_set_copy(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
+    # Copying a non-existing attribute generates an error
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')
+    # It turns out that attributes to be copied are read before the SET
+    # starts to write, so "SET x = :val1, y = x" does not work...
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})
+    # SET z=z does nothing if z exists, or fails if it doesn't
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')
+    # We can also use name references in either LHS or RHS of SET, e.g.,
+    # SET #one = #two. We need to also account for the references used in the
+    # RHS when we want to complain about unused names in ExpressionAttributeNames.
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
+         ExpressionAttributeNames={'#one': 'c', '#two': 'a'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
+             ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})
+
+# Test for read-before-write action where the value to be read is nested inside a - operator
+def test_update_expression_set_nested_copy(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',
+         ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',
+         ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}
+
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',
+         ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',
+         ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}
+
+# Test for getting a key value with read-before-write
+def test_update_expression_set_key(test_table_sn):
+    p = random_string()
+    test_table_sn.update_item(Key={'p': p, 'c': 7})
+    test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',
+         ExpressionAttributeNames={'#n': 'n', '#p': 'p'})
+    test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',
+         ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})
+    assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}
+
+# Simple test for the "REMOVE" action
+def test_update_expression_remove(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}
+
+# Demonstrate that although all DynamoDB examples give UpdateExpression
+# action names in uppercase - e.g., "SET", it can actually be any case.
+def test_update_expression_action_case(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
+
+# Demonstrate that whitespace is ignored in UpdateExpression parsing.
+def test_update_expression_action_whitespace(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='  set   b=:val1  ', ExpressionAttributeValues={':val1': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
+
+# In UpdateExpression, an attribute name can appear directly in the expression
+# (without the "#placeholder" notation) only if it is a single "token" as
+# determined by DynamoDB's lexical analyzer rules: such a token is composed of
+# alphanumeric characters and underscores, and its first character must be
+# alphabetic. Other names cause the parser to see multiple tokens and produce
+# syntax errors.
+def test_update_expression_name_token(test_table_s):
+    p = random_string()
+    # Alphanumeric names starting with an alphabetical character work
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3
+    # But alphanumeric names cannot start with an underscore or a digit.
+    # DynamoDB's lexical analyzer doesn't recognize them, and produces
+    # a ValidationException looking like:
+    #   Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})
+    # Various other non-alphanumeric characters split a token and are NOT allowed
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})
+
+    # In addition to literal names, DynamoDB also allows references to any
+    # name, using the "#reference" syntax. It turns out the reference name is
+    # also a token following the rules above, with one interesting point:
+    # since "#" already started the token, the next character may be any
+    # alphanumeric character or underscore and need not be alphabetic.
+    # Note that the reference target - the actual attribute name - can include
+    # absolutely any characters, and we use silly_name below as an example.
+    silly_name = '3can include any character!.#='
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})
+    # Just a "#" is not enough as a token. Interestingly, DynamoDB will
+    # find the bad name in ExpressionAttributeNames before it actually tries
+    # to parse UpdateExpression, but we can verify the parse fails too by
+    # using a valid but irrelevant name in ExpressionAttributeNames:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})
+
+    # There are also value references, ":reference", for the right-hand side
+    # of an assignment. Their naming rules are similar to those of "#reference".
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})
+    # Just a ":" is not enough as a token.
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})
+    # Trying to use a :reference on the left-hand side of an assignment does
+    # not work. In DynamoDB, it's a different type of token (and generates a
+    # syntax error).
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
+
+# Multiple actions are allowed in one expression, but actions are divided
+# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear
+# once.
+def test_update_expression_multi(test_table_s):
+    p = random_string()
+    # We can have two SET actions in one SET clause:
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}
+    # But not two SET clauses - we get the error 'The "SET" section can only be used once in an update expression'
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
+    # We can have a REMOVE and a SET clause (note no comma between clauses):
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}
+    # The same clause (e.g., SET) cannot be used twice, even if interleaved with something else
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
+
+# Trying to modify the same attribute twice in the same update is forbidden.
+# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two
+# document paths overlap with each other; must remove or rewrite one of
+# these paths; path one: [a], path two: [a]".
+# It is actually good for Scylla that such updates are forbidden: had we
+# allowed "SET a=:v REMOVE a", the result would be surprising. Data wins
+# over a delete with the same timestamp, so "a" would be set despite the
+# REMOVE action appearing later in the expression.
+def test_update_expression_multi_overlap(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
+    # Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" is allowed:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})
+    # It's also not allowed to set 'a' twice in the same clause
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})
+    # Obviously, the paths are compared after the name references are evaluated
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})
+
+# The problem isn't just with identical paths - we can't modify two paths that
+# "overlap" in the sense that one is the ancestor of the other.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_multi_overlap_nested(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*overlap'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',
+            ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})
+    test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})
+    with pytest.raises(ClientError, match='ValidationException.*overlap'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',
+            ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
+
+# In the previous tests we saw that *modifying* the same attribute twice in
+# the same update is forbidden. But it is allowed to *read* an attribute in
+# the same update that also modifies it, and we check this here.
+def test_update_expression_multi_with_copy(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
+    # "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()
+    # the value of 'a' is read before the actual REMOVE operation happens.
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}
+
+
+# Test case where a :val1 is referenced, without being defined
+def test_update_expression_set_missing_value(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = :val1',
+            ExpressionAttributeValues={':val2': 4})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = :val1')
+
+# It is forbidden for ExpressionAttributeValues to contain values not used
+# by the expression. DynamoDB produces an error like: "Value provided in
+# ExpressionAttributeValues unused in expressions: keys: {:val1}"
+def test_update_expression_spurious_value(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',
+            ExpressionAttributeValues={':val1': 3, ':val2': 4})
+
+# Test case where a #name is referenced, without being defined
+def test_update_expression_set_missing_name(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET #name = :val1',
+            ExpressionAttributeValues={':val1': 4},
+            ExpressionAttributeNames={'#wrongname': 'hello'})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET #name = :val1',
+            ExpressionAttributeValues={':val1': 4})
+
+# It is forbidden for ExpressionAttributeNames to contain names not used
+# by the expression. DynamoDB produces an error like: "Value provided in
+# ExpressionAttributeNames unused in expressions: keys: {#b}"
+def test_update_expression_spurious_name(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
+            ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},
+            ExpressionAttributeValues={':val1': 3})
+
+# Test that the key attributes (hash key or sort key) cannot be modified
+# by an update
+def test_update_expression_cannot_modify_key(test_table):
+    p = random_string()
+    c = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
+    with pytest.raises(ClientError, match='ValidationException.*key'):
+        test_table.update_item(Key={'p': p, 'c': c},
+            UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
+    # As a sanity check, verify we *can* modify a non-key column
+    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}
+    test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')
+    assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}
+
+# Test that trying to start an expression with some nonsense like HELLO
+# instead of SET, REMOVE, ADD or DELETE, fails.
+def test_update_expression_non_existant_clause(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='HELLO b = :val1',
+            ExpressionAttributeValues={':val1': 4})
+
+# Test support for "SET a = :val1 + :val2", "SET a = :val1 - :val2"
+# Only exactly these combinations work - e.g., it's a syntax error to
+# try to add three. Trying to add a string fails.
+def test_update_expression_plus_basic(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val1 + :val2',
+        ExpressionAttributeValues={':val1': 4, ':val2': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val1 - :val2',
+        ExpressionAttributeValues={':val1': 5, ':val2': 2})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
+    # Only the addition of exactly two values is supported!
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = :val1 + :val2 + :val3',
+            ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})
+    # Only numeric values can be added - other things like strings or lists
+    # cannot be added, and we get an error like "Incorrect operand type for
+    # operator or function; operator or function: +, operand type: S".
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = :val1 + :val2',
+            ExpressionAttributeValues={':val1': 'dog', ':val2': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = :val1 + :val2',
+            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
+
+# While most of the Alternator code just saves high-precision numbers
+# unchanged, the "+" and "-" operations need to calculate with them, and
+# we should check the calculation isn't done with some lower-precision
+# representation, e.g., double
+def test_update_expression_plus_precision(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val1 + :val2',
+        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val2 - :val1',
+        ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
+
+# Test support for "SET a = b + :val2" et al., i.e., a version of the
+# above test_update_expression_plus_basic with read before write.
+def test_update_expression_plus_rmw(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 2})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = a + :val1',
+        ExpressionAttributeValues={':val1': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = :val1 + a',
+        ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = :val1 + a',
+        ExpressionAttributeValues={':val1': 1})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = b + a')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19
+
+# Test the list_append() function in SET, for the most basic use case of
+# concatenating two value references. Because this is the first test of
+# functions in SET, we also test some generic features of how functions
+# are parsed.
+def test_update_expression_list_append_basic(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(:val1, :val2)',
+        ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}
+    # Unlike the operation name "SET", function names are case-sensitive!
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',
+            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
+    # As usual, spaces are ignored by the parser
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(  :val1 ,   :val2  )',
+        ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}
+    # The list_append function only allows two parameters. The parser can
+    # correctly parse fewer or more, but then an error is generated: "Invalid
+    # UpdateExpression: Incorrect number of operands for operator or function;
+    # operator or function: list_append, number of operands: 1".
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = list_append(:val1)',
+            ExpressionAttributeValues={':val1': ['a']})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = list_append(:val1, :val2, :val3)',
+            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})
+    # If list_append is used on a value that isn't a list, we get the
+    # error: "Invalid UpdateExpression: Incorrect operand type for operator
+    # or function; operator or function: list_append, operand type: S"
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = list_append(:val1, :val2)',
+            ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})
+
+# Additional list_append() tests, also using attribute paths as parameters
+# (i.e., read-modify-write).
+def test_update_expression_list_append(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = :val1',
+        ExpressionAttributeValues={':val1': ['hi', 2]})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2]
+    # Often, list_append is used to append items to a list attribute
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(a, :val1)',
+        ExpressionAttributeValues={':val1': [4, 'hello']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']
+    # But it can also be used to just concatenate in other ways:
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(:val1, a)',
+        ExpressionAttributeValues={':val1': ['dog']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = list_append(a, :val1)',
+        ExpressionAttributeValues={':val1': ['cat']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET c = list_append(a, b)')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']
+    # As usual, #references are allowed instead of inline names:
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET #name1 = list_append(#name2,:val1)',
+        ExpressionAttributeValues={':val1': [8]},
+        ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]
+
+# Test the "if_not_exists" function in SET
+# The test also checks additional features of function-call parsing.
+def test_update_expression_if_not_exists(test_table_s):
+    p = random_string()
+    # Since attribute a doesn't exist, set it:
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = if_not_exists(a, :val1)',
+        ExpressionAttributeValues={':val1': 2})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
+    # Now the attribute does exist, so set does nothing:
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = if_not_exists(a, :val1)',
+        ExpressionAttributeValues={':val1': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
+    # if_not_exists can also be used to check one attribute and set another,
+    # but note that if_not_exists(a, :val) means a's value if it exists,
+    # otherwise :val!
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = if_not_exists(c, :val1)',
+        ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = if_not_exists(c, :val1)',
+        ExpressionAttributeValues={':val1': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = if_not_exists(a, :val1)',
+        ExpressionAttributeValues={':val1': 6})
+    # Note that because 'a' does exist, its value is copied, overwriting b's
+    # value:
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2
+    # The parser expects function parameters to be value references, paths,
+    # or nested function calls. Anything else causes a syntax error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = if_not_exists(non@sense, :val1)',
+            ExpressionAttributeValues={':val1': 6})
+    # if_not_exists() requires that the first parameter be a path. However,
+    # the parser doesn't know this, and also accepts a value reference or a
+    # function call as a function parameter. If we try one of these, the
+    # parser succeeds, but we get a later error, looking like:
+    # "Invalid UpdateExpression: Operator or function requires a document
+    # path; operator or function: if_not_exists"
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',
+            ExpressionAttributeValues={':val1': 6, ':val2': 3})
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = if_not_exists(:val2, :val1)',
+            ExpressionAttributeValues={':val1': 6, ':val2': 3})
+    # Surprisingly, if the wrong argument is a :val value reference, the
+    # parser first tries to look it up in ExpressionAttributeValues (and
+    # fails if it's missing), before realizing any value reference would be
+    # wrong... So the following fails like the above does - but with a
+    # different error message (which we do not check here): "Invalid
+    # UpdateExpression: An expression attribute value used in expression
+    # is not defined; attribute value: :val2"
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET b = if_not_exists(:val2, :val1)',
+            ExpressionAttributeValues={':val1': 6})
+
+# When the expression parser parses a function call f(value, value), each
+# value may itself be a function call - ad infinitum. So expressions like
+# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper
+# nesting.
+@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")
+def test_update_expression_function_nesting(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
+            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
+            ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']
+    # I don't understand why the following expression isn't accepted, but it
+    # isn't! It produces an "Invalid UpdateExpression: The function is not
+    # allowed to be used this way in an expression; function: list_append".
+    # I don't know how to explain it. In any case, the *parsing* works -
+    # this is not a syntax error - the failure is in some verification later.
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',
+                ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})
+    # Ditto, the following passes the parser but fails some later check with
+    # the same error message as above.
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',
+                ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})
+
+# Verify how in SET expressions, "+" (or "-") nests with functions.
+# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax
+# error on the "+"). This means that the parser has two separate rules:
+# 1.  set_action: SET path = value + value
+# 2.  value: VALREF | NAME | NAME (value, ...)
+def test_update_expression_function_plus_nesting(test_table_s):
+    p = random_string()
+    # As explained above, this - with the "+" outside the function call - works:
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',
+            ExpressionAttributeValues={':val1': 2, ':val2': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
+    # ...but this - with the "+" inside a function parameter - is a syntax
+    # error:
+    with pytest.raises(ClientError, match='ValidationException'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',
+                ExpressionAttributeValues={':val1': 5, ':val2': 4})
+
+# This test tries to use an undefined function "f". This, obviously, fails,
+# but were we to actually print the error we would see "Invalid
+# UpdateExpression: Invalid function name; function: f". Not a syntax error.
+# This means that the parser accepts any alphanumeric name as a function
+# name, and only later use of this function fails because it's not one of
+# the supported functions.
+def test_update_expression_unknown_function(test_table_s):
+    p = random_string()
+    with pytest.raises(ClientError, match='ValidationException.*f'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = f(b,c,d)')
+    with pytest.raises(ClientError, match='ValidationException.*f123_hi'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = f123_hi(b,c,d)')
+    # Just like unreferenced column names parsed by the DynamoDB parser,
+    # function names must also start with an alphabetic character. Trying
+    # to use _f as a function name will result in an actual syntax error,
+    # on the "_" token.
+    with pytest.raises(ClientError, match='ValidationException.*yntax error'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='SET a = _f(b,c,d)')
+
+# Test "ADD" operation for numbers
+def test_update_expression_add_numbers(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='ADD a :val1',
+        ExpressionAttributeValues={':val1': 4})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7
+    # If the value to be added isn't a number, we get an error like "Invalid
+    # UpdateExpression: Incorrect operand type for operator or function;
+    # operator: ADD, operand type: STRING".
+    with pytest.raises(ClientError, match='ValidationException.*type'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='ADD a :val1',
+            ExpressionAttributeValues={':val1': 'hello'})
+    # Similarly, if the attribute we're adding to isn't a number, we get an
+    # error like "An operand in the update expression has an incorrect data
+    # type"
+    with pytest.raises(ClientError, match='ValidationException.*type'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='ADD b :val1',
+            ExpressionAttributeValues={':val1': 1})
+
+# Test "ADD" operation for sets
+def test_update_expression_add_sets(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='ADD a :val1',
+        ExpressionAttributeValues={':val1': set(['pig'])})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])
+
+    # TODO: right now this test won't detect duplicated values in the returned result,
+    # because boto3 parses a set out of the returned JSON anyway. This check should leverage
+    # a lower-level API (if one exists) to ensure that the JSON contains no duplicates
+    # in the set representation. It has been verified manually.
+    test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='ADD a :val1',
+        ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])
+
+    # The value to be added needs to be a set of the same type - it can't
+    # be a single element or anything else. If the value has the wrong type,
+    # we get an error like "Invalid UpdateExpression: Incorrect operand type
+    # for operator or function; operator: ADD, operand type: STRING".
+    with pytest.raises(ClientError, match='ValidationException.*type'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='ADD a :val1',
+            ExpressionAttributeValues={':val1': 'hello'})
+
+# Test "DELETE" operation for sets
+def test_update_expression_delete_sets(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='DELETE a :val1',
+        ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
+    # Deleting an element not present in the set is not an error - it just
+    # does nothing
+    test_table_s.update_item(Key={'p': p},
+        UpdateExpression='DELETE a :val1',
+        ExpressionAttributeValues={':val1': set(['pig'])})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
+    # The value to be deleted must be a set of the same type - it can't
+    # be a single element or anything else. If the value has the wrong type,
+    # we get an error like "Invalid UpdateExpression: Incorrect operand type
+    # for operator or function; operator: DELETE, operand type: STRING".
+    with pytest.raises(ClientError, match='ValidationException.*type'):
+        test_table_s.update_item(Key={'p': p},
+            UpdateExpression='DELETE a :val1',
+            ExpressionAttributeValues={':val1': 'hello'})
+
+######## Tests for paths and nested attribute updates:
+
+# A dot inside a name in ExpressionAttributeNames is a literal dot, and
+# results in a top-level attribute with an actual dot in its name - not
+# a nested attribute path.
+def test_update_expression_dot_in_name(test_table_s):
+    p = random_string()
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
+        ExpressionAttributeValues={':val1': 3},
+        ExpressionAttributeNames={'#a': 'a.b'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
+
+# A basic test for direct update of a nested attribute: One of the top-level
+# attributes is itself a document, and we update only one of that document's
+# nested attributes.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_nested_attribute_dot(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
+        ExpressionAttributeValues={':val1': 7})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}
+    # Of course we can also add new nested attributes, not just modify
+    # existing ones:
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',
+        ExpressionAttributeValues={':val1': 3})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}
+
+# Similar test, for a list: one of the top-level attributes is a list, we
+# can update one of its items.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_nested_attribute_index(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',
+        ExpressionAttributeValues={':val1': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}
+
+# Test that, just as with top-level attributes, setting a nested attribute
+# replaces its old value - potentially an entire nested document - with
+# the new value (which may have a different type).
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_nested_different_type(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
+        ExpressionAttributeValues={':val1': 7})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}
+
+# Yet another test of a nested attribute update. This one uses a deeper
+# level of nesting (dots and indexes), and adds #name references to the mix.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_nested_deep(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',
+        ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==  {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}
+    # A deep path can also appear on the right-hand-side of an assignment
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',
+        ExpressionAttributeNames={'#name': 'x'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] ==  9
+
+# A REMOVE operation can be used to remove nested attributes, and also
+# individual list items.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_update_expression_nested_remove(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==  {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}
+
+# The DynamoDB documentation specifies: "When you use SET to update a list
+# element, the contents of that element are replaced with the new data that
+# you specify. If the element does not already exist, SET will append the
+# new element at the end of the list."
+# So if we take a three-element list a, and set a[7], the new element
+# will be put at the end of the list, not position 7 specifically.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_nested_attribute_update_array_out_of_bounds(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',
+        ExpressionAttributeValues={':val1': 'hello'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}
+    # The DynamoDB documentation also says: "If you add multiple elements
+    # in a single SET operation, the elements are sorted in order by element
+    # number."
+    test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',
+        ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})
+    assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
+
+# Test what happens if we try to write to a.b, which would only make sense if
+# a were a nested document, but a doesn't exist, or exists and is NOT a nested
+# document but rather a scalar or list or something.
+# DynamoDB actually detects this case and prints an error:
+#   ClientError: An error occurred (ValidationException) when calling the
+#   UpdateItem operation: The document path provided in the update expression
+#   is invalid for update
+# Because Scylla doesn't read before write, it cannot detect this as an error,
+# so we'll probably want to allow for that possibility as well.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_nested_attribute_update_bad_path_dot(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})
+    with pytest.raises(ClientError, match='ValidationException.*path'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
+            ExpressionAttributeValues={':val1': 7})
+    with pytest.raises(ClientError, match='ValidationException.*path'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',
+            ExpressionAttributeValues={':val1': 7})
+    with pytest.raises(ClientError, match='ValidationException.*path'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',
+            ExpressionAttributeValues={':val1': 7})
+
+
+# Similarly for other types of bad paths - using [0] on something which
+# isn't an array.
+@pytest.mark.xfail(reason="nested updates not yet implemented")
+def test_nested_attribute_update_bad_path_array(test_table_s):
+    p = random_string()
+    test_table_s.put_item(Item={'p': p, 'a': 'hello'})
+    with pytest.raises(ClientError, match='ValidationException.*path'):
+        test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',
+            ExpressionAttributeValues={':val1': 7})
--- a/alternator-test/util.py
+++ b/alternator-test/util.py
@@ -0,0 +1,141 @@
+# Copyright 2019 ScyllaDB
+#
+# This file is part of Scylla.
+#
+# Scylla is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Scylla is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+
+# Various utility functions which are useful for multiple tests
+
+import string
+import random
+import collections
+import time
+
+def random_string(length=10, chars=string.ascii_uppercase + string.digits):
+    return ''.join(random.choice(chars) for x in range(length))
+
+def random_bytes(length=10):
+    return bytearray(random.getrandbits(8) for _ in range(length))
+
+# Utility functions for scan and query into an array of items:
+# TODO: add to full_scan and full_query by default ConsistentRead=True, as
+# it's not useful for tests without it!
+def full_scan(table, **kwargs):
+    response = table.scan(**kwargs)
+    items = response['Items']
+    while 'LastEvaluatedKey' in response:
+        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
+        items.extend(response['Items'])
+    return items
+
+# full_scan_and_count returns both items and count as returned by the server.
+# Note that count isn't simply len(items) - the server returns them
+# independently. e.g., with Select='COUNT' the items are not returned, but
+# count is.
+def full_scan_and_count(table, **kwargs):
+    response = table.scan(**kwargs)
+    items = []
+    count = 0
+    if 'Items' in response:
+        items.extend(response['Items'])
+    if 'Count' in response:
+        count = count + response['Count']
+    while 'LastEvaluatedKey' in response:
+        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
+        if 'Items' in response:
+            items.extend(response['Items'])
+        if 'Count' in response:
+            count = count + response['Count']
+    return (count, items)
+
+# Utility function for fetching the entire results of a query into an array of items
+def full_query(table, **kwargs):
+    response = table.query(**kwargs)
+    items = response['Items']
+    while 'LastEvaluatedKey' in response:
+        response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
+        items.extend(response['Items'])
+    return items
+
+# To compare two lists of items (each is a dict) without regard for order,
+# "==" is not good enough because it will fail if the order is different.
+# The following function, multiset(), converts the list into a multiset
+# (set with duplicates) where order doesn't matter, so the multisets can
+# be compared.
+
+def freeze(item):
+    if isinstance(item, dict):
+        return frozenset((key, freeze(value)) for key, value in item.items())
+    elif isinstance(item, list):
+        return tuple(freeze(value) for value in item)
+    return item
+
+def multiset(items):
+    return collections.Counter([freeze(item) for item in items])
+
+
+test_table_prefix = 'alternator_test_'
+def test_table_name():
+    current_ms = int(round(time.time() * 1000))
+    # In the off chance that test_table_name() is called twice in the same millisecond...
+    if test_table_name.last_ms >= current_ms:
+        current_ms = test_table_name.last_ms + 1
+    test_table_name.last_ms = current_ms
+    return test_table_prefix + str(current_ms)
+test_table_name.last_ms = 0
+
+def create_test_table(dynamodb, **kwargs):
+    name = test_table_name()
+    print("fixture creating new table {}".format(name))
+    table = dynamodb.create_table(TableName=name,
+        BillingMode='PAY_PER_REQUEST', **kwargs)
+    waiter = table.meta.client.get_waiter('table_exists')
+    # recheck every second instead of the default, lower, frequency. This can
+    # save a few seconds on AWS with its very slow table creation, but can
+    # save more on tests on Scylla with its faster table creation turnaround.
+    waiter.config.delay = 1
+    waiter.config.max_attempts = 200
+    waiter.wait(TableName=name)
+    return table
+
+# DynamoDB's ListTables request returns up to a single page of table names
+# (e.g., up to 100) and it is up to the caller to call it again and again
+# to get the next page. This is a utility function which calls it repeatedly
+# as much as necessary to get the entire list.
+# We deliberately return a list and not a set, because we want the caller
+# to be able to recognize bugs in ListTables which cause the same table
+# to be returned twice.
+def list_tables(dynamodb, limit=100):
+    ret = []
+    pos = None
+    while True:
+        if pos:
+            page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos)
+        else:
+            page = dynamodb.meta.client.list_tables(Limit=limit)
+        results = page.get('TableNames', None)
+        assert(results)
+        ret = ret + results
+        newpos = page.get('LastEvaluatedTableName', None)
+        if not newpos:
+            break
+        # It doesn't make sense for Dynamo to tell us we need more pages, but
+        # not send anything in *this* page!
+        assert len(results) > 0
+        assert newpos != pos
+        # Note that we only checked that we got back tables, not that we got
+        # any new tables not already in ret. So a buggy implementation might
+        # still cause an endless loop getting the same tables again and again.
+        pos = newpos
+    return ret
--- a/alternator/auth.cc
+++ b/alternator/auth.cc
@@ -0,0 +1,147 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "alternator/error.hh"
+#include "log.hh"
+#include <string>
+#include <string_view>
+#include <gnutls/crypto.h>
+#include <seastar/util/defer.hh>
+#include "hashers.hh"
+#include "bytes.hh"
+#include "alternator/auth.hh"
+#include <fmt/format.h>
+#include "auth/common.hh"
+#include "auth/password_authenticator.hh"
+#include "auth/roles-metadata.hh"
+#include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
+
+namespace alternator {
+
+static logging::logger alogger("alternator-auth");
+
+static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
+    hmac_sha256_digest digest;
+    int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
+    if (ret) {
+        throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
+    }
+    return digest;
+}
+
+static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
+    auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
+    auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
+    auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
+    auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
+    return signing;
+}
+
+static std::string apply_sha256(std::string_view msg) {
+    sha256_hasher hasher;
+    hasher.update(msg.data(), msg.size());
+    return to_hex(hasher.finalize());
+}
+
+static std::string format_time_point(db_clock::time_point tp) {
+    time_t time_point_repr = db_clock::to_time_t(tp);
+    std::string time_point_str;
+    time_point_str.resize(17);
+    ::tm time_buf;
+    // strftime prints the terminating null character as well
+    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));
+    time_point_str.resize(16);
+    return time_point_str;
+}
+
+void check_expiry(std::string_view signature_date) {
+    //FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
+    std::string expiration_str = format_time_point(db_clock::now() - 15min);
+    std::string validity_str = format_time_point(db_clock::now() + 15min);
+    if (signature_date < expiration_str) {
+        throw api_error("InvalidSignatureException",
+                fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
+                signature_date, expiration_str));
+    }
+    if (signature_date > validity_str) {
+        throw api_error("InvalidSignatureException",
+                fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
+                signature_date, validity_str));
+    }
+}
+
+std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
+        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
+        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
+    auto amz_date_it = signed_headers_map.find("x-amz-date");
+    if (amz_date_it == signed_headers_map.end()) {
+        throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
+    }
+    std::string_view amz_date = amz_date_it->second;
+    check_expiry(amz_date);
+    std::string_view datestamp = amz_date.substr(0, 8);
+    if (datestamp != orig_datestamp) {
+        throw api_error("InvalidSignatureException",
+                format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
+                        orig_datestamp, datestamp));
+    }
+    std::string_view canonical_uri = "/";
+
+    std::stringstream canonical_headers;
+    for (const auto& header : signed_headers_map) {
+        canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
+    }
+
+    std::string payload_hash = apply_sha256(body_content);
+    std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
+
+    std::string_view algorithm = "AWS4-HMAC-SHA256";
+    std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
+    std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope,  apply_sha256(canonical_request));
+
+    hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
+    hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
+
+    return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
+}
+
+future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
+    static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
+            auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
+
+    auto cl = auth::password_authenticator::consistency_for_user(username);
+    auto timeout = auth::internal_distributed_timeout_config();
+    return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
+        auto res = f.get0();
+        auto salted_hash = std::optional<sstring>();
+        if (res->empty()) {
+            throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
+        }
+        salted_hash = res->one().get_opt<sstring>("salted_hash");
+        if (!salted_hash) {
+            throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
+        }
+        return make_ready_future<std::string>(*salted_hash);
+    });
+}
+
+}
--- a/alternator/auth.hh
+++ b/alternator/auth.hh
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <string>
+#include <string_view>
+#include <array>
+#include "gc_clock.hh"
+#include "utils/loading_cache.hh"
+
+namespace cql3 {
+class query_processor;
+}
+
+namespace alternator {
+
+using hmac_sha256_digest = std::array<char, 32>;
+
+using key_cache = utils::loading_cache<std::string, std::string>;
+
+std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
+        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
+        std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
+
+future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
+
+}
--- a/alternator/base64.cc
+++ b/alternator/base64.cc
@@ -0,0 +1,111 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+// The DynamoDB API dictates that "binary" (a.k.a. "bytes" or "blob") values
+// be encoded in the JSON API as base64-encoded strings. This is code to
+// convert byte arrays to base64-encoded strings, and back.
+
+#include "base64.hh"
+
+#include <cstring> // strlen(), used in the static_assert below
+
+
+// Arrays for quickly converting to and from an integer between 0 and 63,
+// and the character used in base64 encoding to represent it.
+static class base64_chars {
+public:
+    static constexpr const char* to =
+            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+    uint8_t from[256]; // one slot for each possible unsigned char value
+    base64_chars() {
+        static_assert(strlen(to) == 64);
+        for (int i = 0; i < 256; i++) {
+            from[i] = 255; // signal invalid character
+        }
+        for (int i = 0; i < 64; i++) {
+            from[(unsigned) to[i]] = i;
+        }
+    }
+} base64_chars;
+
+std::string base64_encode(bytes_view in) {
+    std::string ret;
+    ret.reserve(((4 * in.size() / 3) + 3) & ~3);
+    int i = 0;
+    unsigned char chunk3[3]; // chunk of input
+    for (auto byte : in) {
+        chunk3[i++] = byte;
+        if (i == 3) {
+            ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
+            ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
+            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
+            ret += base64_chars.to[ chunk3[2] & 0x3f ];
+            i = 0;
+        }
+    }
+    if (i) {
+        // i can be 1 or 2.
+        for(int j = i; j < 3; j++)
+            chunk3[j] = '\0';
+        ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
+        ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
+        if (i == 2) {
+            ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
+        } else {
+            ret += '=';
+        }
+        ret += '=';
+    }
+    return ret;
+}
+
+bytes base64_decode(std::string_view in) {
+    int i = 0;
+    int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
+    std::string ret;
+    ret.reserve(in.size() * 3 / 4);
+    for (unsigned char c : in) {
+        uint8_t dc = base64_chars.from[c];
+        if (dc == 255) {
+            // Any unexpected character, including the "=" character usually
+            // used for padding, signals the end of the decode.
+            break;
+        }
+        chunk4[i++] = dc;
+        if (i == 4) {
+            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
+            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
+            ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
+            i = 0;
+        }
+    }
+    if (i) {
+        // i can be 2 or 3, meaning 1 or 2 more output characters
+        if (i>=2)
+            ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
+        if (i==3)
+            ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
+    }
+    // FIXME: This copy is sad. The problem is that we need to return "bytes",
+    // but "bytes" doesn't have an efficient append the way std::string does.
+    // To fix this we need to use bytes' "uninitialized" feature.
+    return bytes(ret.begin(), ret.end());
+}
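Reviewer note: the encode/decode scheme in base64.cc can be tried outside the Scylla tree with the minimal sketch below. It is not the patch's code: it uses std::string in place of Scylla's `bytes`, and the `sketch` namespace and the names `b64_encode`, `b64_decode` and `make_reverse_table` are hypothetical. Like the patch, the decoder stops at the first byte outside the alphabet (including "="); the reverse table has 256 entries so that every byte value is a valid index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

namespace sketch {

constexpr std::string_view alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Reverse table with one slot per possible byte value; 0xff marks bytes
// that are not part of the alphabet (including '=').
inline std::array<uint8_t, 256> make_reverse_table() {
    std::array<uint8_t, 256> t;
    t.fill(0xff);
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        t[static_cast<unsigned char>(alphabet[i])] = static_cast<uint8_t>(i);
    }
    return t;
}

inline std::string b64_encode(std::string_view in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    for (; i + 3 <= in.size(); i += 3) {
        uint32_t n = (uint8_t(in[i]) << 16) | (uint8_t(in[i + 1]) << 8) | uint8_t(in[i + 2]);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is zero-filled and padded with '='.
    std::size_t rem = in.size() - i;
    if (rem) {
        uint32_t n = uint8_t(in[i]) << 16;
        if (rem == 2) {
            n |= uint8_t(in[i + 1]) << 8;
        }
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += rem == 2 ? alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

inline std::string b64_decode(std::string_view in) {
    static const std::array<uint8_t, 256> rev = make_reverse_table();
    std::string out;
    uint32_t acc = 0; // bit accumulator; only its low bits are ever read
    int bits = 0;     // number of not-yet-emitted bits in acc
    for (unsigned char c : in) {
        uint8_t v = rev[c];
        if (v == 0xff) {
            break; // padding or any unexpected byte ends the decode
        }
        acc = (acc << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xff);
        }
    }
    return out;
}

} // namespace sketch
```

The bit accumulator in `b64_decode` is a design choice of this sketch rather than of the patch, which decodes in explicit 4-character chunks; both drop a dangling single character at the end of the input.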
--- a/alternator/base64.hh
+++ b/alternator/base64.hh
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <string_view>
+#include "bytes.hh"
+#include "rjson.hh"
+
+std::string base64_encode(bytes_view);
+
+bytes base64_decode(std::string_view);
+
+inline bytes base64_decode(const rjson::value& v) {
+  return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
+}
--- a/alternator/conditions.cc
+++ b/alternator/conditions.cc
@@ -0,0 +1,564 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <set>
+#include <map>
+#include <string_view>
+#include "alternator/conditions.hh"
+#include "alternator/error.hh"
+#include "cql3/constants.hh"
+#include <unordered_map>
+#include "rjson.hh"
+#include "serialization.hh"
+#include "base64.hh"
+#include <stdexcept>
+
+namespace alternator {
+
+static logging::logger clogger("alternator-conditions");
+
+comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
+    static std::unordered_map<std::string, comparison_operator_type> ops = {
+            {"EQ", comparison_operator_type::EQ},
+            {"NE", comparison_operator_type::NE},
+            {"LE", comparison_operator_type::LE},
+            {"LT", comparison_operator_type::LT},
+            {"GE", comparison_operator_type::GE},
+            {"GT", comparison_operator_type::GT},
+            {"IN", comparison_operator_type::IN},
+            {"NULL", comparison_operator_type::IS_NULL},
+            {"NOT_NULL", comparison_operator_type::NOT_NULL},
+            {"BETWEEN", comparison_operator_type::BETWEEN},
+            {"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
+            {"CONTAINS", comparison_operator_type::CONTAINS},
+            {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
+    };
+    if (!comparison_operator.IsString()) {
+        throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
+    }
+    std::string op = comparison_operator.GetString();
+    auto it = ops.find(op);
+    if (it == ops.end()) {
+        throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
+    }
+    return it->second;
+}
+
+static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
+    bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
+    auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
+    bytes raw_value = serialize_item(value);
+    auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
+    return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
+}
+
+static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
+    bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));
+    auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
+    return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
+}
+
+::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
+    clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
+    auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
+    for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
+        std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
+        const rjson::value& condition = it->value;
+
+        const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
+        const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
+        comparison_operator_type op = get_comparison_operator(comp_definition);
+
+        if (op != comparison_operator_type::EQ) {
+            throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
+        }
+        if (attr_list.Size() != 1) {
+            throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
+        }
+        if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {
+            // Primary key restriction
+            filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
+        } else {
+            // Regular column restriction
+            filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
+        }
+
+    }
+    return filtering_restrictions;
+}
+
+namespace {
+
+struct size_check {
+    // True iff size passes this check.
+    virtual bool operator()(rapidjson::SizeType size) const = 0;
+    // Check description, such that format("expected array {}", check.what()) is human-readable.
+    virtual sstring what() const = 0;
+};
+
+class exact_size : public size_check {
+    rapidjson::SizeType _expected;
+  public:
+    explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
+    bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
+    sstring what() const override { return format("of size {}", _expected); }
+};
+
+struct empty : public size_check {
+    bool operator()(rapidjson::SizeType size) const override { return size < 1; }
+    sstring what() const override { return "to be empty"; }
+};
+
+struct nonempty : public size_check {
+    bool operator()(rapidjson::SizeType size) const override { return size > 0; }
+    sstring what() const override { return "to be non-empty"; }
+};
+
+} // anonymous namespace
+
+// Check that array has the expected number of elements
+static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
+    if (!array || !array->IsArray()) {
+        throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
+    }
+    if (!expected(array->Size())) {
+        throw api_error("ValidationException",
+                        format("{} operator requires AttributeValueList {}, instead found list size {}",
+                               op, expected.what(), array->Size()));
+    }
+}
+
+struct rjson_engaged_ptr_comp {
+    bool operator()(const rjson::value* p1, const rjson::value* p2) const {
+        return rjson::single_value_comp()(*p1, *p2);
+    }
+};
+
+// It's not enough to compare underlying JSON objects when comparing sets,
+// as internally they're stored in an array, and the order of elements is
+// not important in set equality. See issue #5021
+static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
+    if (set1.Size() != set2.Size()) {
+        return false;
+    }
+    std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
+    for (auto it = set1.Begin(); it != set1.End(); ++it) {
+        set1_raw.insert(&*it);
+    }
+    for (const auto& a : set2.GetArray()) {
+        if (set1_raw.count(&a) == 0) {
+            return false;
+        }
+    }
+    return true;
+}
+
+// Check if two JSON-encoded values match with the EQ relation
+static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
+    if (!v1) {
+        return false;
+    }
+    if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
+        auto it1 = v1->MemberBegin();
+        auto it2 = v2.MemberBegin();
+        if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
+            return check_EQ_for_sets(it1->value, it2->value);
+        }
+    }
+    return *v1 == v2;
+}
+
+// Check if two JSON-encoded values match with the NE relation
+static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
+    return !check_EQ(v1, v2); // null is unequal to anything; sets compare regardless of element order.
+}
+
+// Check if two JSON-encoded values match with the BEGINS_WITH relation
+static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
+    // BEGINS_WITH requires that its single operand (v2) be a string or
+    // binary - otherwise it's a validation error. However, problems with
+    // the stored attribute (v1) will just return false (no match).
+    if (!v2.IsObject() || v2.MemberCount() != 1) {
+        throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
+    }
+    auto it2 = v2.MemberBegin();
+    if (it2->name != "S" && it2->name != "B") {
+        throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
+    }
+
+
+    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
+        return false;
+    }
+    auto it1 = v1->MemberBegin();
+    if (it1->name != it2->name) {
+        return false;
+    }
+    if (it2->name == "S") {
+        std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
+        std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
+        return val1.substr(0, val2.size()) == val2;
+    } else /* it2->name == "B" */ {
+        // TODO (optimization): Check the begins_with condition directly on
+        // the base64-encoded string, without making a decoded copy.
+        bytes val1 = base64_decode(it1->value);
+        bytes val2 = base64_decode(it2->value);
+        return val1.substr(0, val2.size()) == val2;
+    }
+}
+
+static std::string_view to_string_view(const rjson::value& v) {
+    return std::string_view(v.GetString(), v.GetStringLength());
+}
+
+static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {
+    return (type2 == "S" && type1 == "SS") || (type2 == "N" && type1 == "NS") || (type2 == "B" && type1 == "BS");
+}
+
+// Check if two JSON-encoded values match with the CONTAINS relation
+static bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
+    if (!v1) {
+        return false;
+    }
+    const auto& kv1 = *v1->MemberBegin();
+    const auto& kv2 = *v2.MemberBegin();
+    if (kv2.name != "S" && kv2.name != "N" &&  kv2.name != "B") {
+        throw api_error("ValidationException",
+                        format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "
+                               "got {} instead", kv2.name));
+    }
+    if (kv1.name == "S" && kv2.name == "S") {
+        return to_string_view(kv1.value).find(to_string_view(kv2.value)) != std::string_view::npos;
+    } else if (kv1.name == "B" && kv2.name == "B") {
+        return base64_decode(kv1.value).find(base64_decode(kv2.value)) != bytes::npos;
+    } else if (is_set_of(kv1.name, kv2.name)) {
+        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
+            if (*i == kv2.value) {
+                return true;
+            }
+        }
+    } else if (kv1.name == "L") {
+        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {
+            if (!i->IsObject() || i->MemberCount() != 1) {
+                clogger.error("check_CONTAINS received a list whose element is malformed");
+                return false;
+            }
+            const auto& el = *i->MemberBegin();
+            if (el.name == kv2.name && el.value == kv2.value) {
+                return true;
+            }
+        }
+    }
+    return false;
+}
+
+// Check if two JSON-encoded values match with the NOT_CONTAINS relation
+static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {
+    if (!v1) {
+        return false;
+    }
+    return !check_CONTAINS(v1, v2);
+}
+
+// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
+static bool check_IN(const rjson::value* val, const rjson::value& array) {
+    if (!array[0].IsObject() || array[0].MemberCount() != 1) {
+        throw api_error("ValidationException",
+                        format("IN operator encountered malformed AttributeValue: {}", array[0]));
+    }
+    const auto& type = array[0].MemberBegin()->name;
+    if (type != "S" && type != "N" && type != "B") {
+        throw api_error("ValidationException",
+                        "IN operator requires AttributeValueList elements to be of type String, Number, or Binary");
+    }
+    if (!val) {
+        return false;
+    }
+    bool have_match = false;
+    for (const auto& elem : array.GetArray()) {
+        if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
+            throw api_error("ValidationException",
+                            "IN operator requires all AttributeValueList elements to have the same type");
+        }
+        if (!have_match && *val == elem) {
+            // Can't return yet, must check types of all array elements. <sigh>
+            have_match = true;
+        }
+    }
+    return have_match;
+}
+
+static bool check_NULL(const rjson::value* val) {
+    return val == nullptr;
+}
+
+static bool check_NOT_NULL(const rjson::value* val) {
+    return val != nullptr;
+}
+
+// Check if two JSON-encoded values match with cmp.
+template <typename Comparator>
+bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
+    if (!v2.IsObject() || v2.MemberCount() != 1) {
+        throw api_error("ValidationException",
+                        format("{} requires a single AttributeValue of type String, Number, or Binary",
+                               cmp.diagnostic));
+    }
+    const auto& kv2 = *v2.MemberBegin();
+    if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
+        throw api_error("ValidationException",
+                        format("{} requires a single AttributeValue of type String, Number, or Binary",
+                               cmp.diagnostic));
+    }
+    if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
+        return false;
+    }
+    const auto& kv1 = *v1->MemberBegin();
+    if (kv1.name != kv2.name) {
+        return false;
+    }
+    if (kv1.name == "N") {
+        return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
+    }
+    if (kv1.name == "S") {
+        return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
+                   std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
+    }
+    if (kv1.name == "B") {
+        return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
+    }
+    clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
+    return false;
+}
+
+struct cmp_lt {
+    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
+    static constexpr const char* diagnostic = "LT operator";
+};
+
+struct cmp_le {
+    // bytes only has <, so we cannot use <=.
+    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs || lhs == rhs; }
+    static constexpr const char* diagnostic = "LE operator";
+};
+
+struct cmp_ge {
+    // bytes only has <, so we cannot use >=.
+    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs || lhs == rhs; }
+    static constexpr const char* diagnostic = "GE operator";
+};
+
+struct cmp_gt {
+    // bytes only has <, so we cannot use >.
+    template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
+    static constexpr const char* diagnostic = "GT operator";
+};
+
+// True if v is between lb and ub, inclusive.  Throws if lb > ub.
+template <typename T>
+bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
+    if (ub < lb) {
+        throw api_error("ValidationException",
+                        format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
+    }
+    return cmp_ge()(v, lb) && cmp_le()(v, ub);
+}
+
+static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
+    if (!v) {
+        return false;
+    }
+    if (!v->IsObject() || v->MemberCount() != 1) {
+        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
+    }
+    if (!lb.IsObject() || lb.MemberCount() != 1) {
+        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
+    }
+    if (!ub.IsObject() || ub.MemberCount() != 1) {
+        throw api_error("ValidationException", format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
+    }
+
+    const auto& kv_v = *v->MemberBegin();
+    const auto& kv_lb = *lb.MemberBegin();
+    const auto& kv_ub = *ub.MemberBegin();
+    if (kv_lb.name != kv_ub.name) {
+        throw api_error(
+                "ValidationException",
+                format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
+                       kv_lb.name, kv_ub.name));
+    }
+    if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
+        return false;
+    }
+    if (kv_v.name == "N") {
+        const char* diag = "BETWEEN operator";
+        return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
+    }
+    if (kv_v.name == "S") {
+        return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
+                             std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
+                             std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
+    }
+    if (kv_v.name == "B") {
+        return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
+    }
+    throw api_error("ValidationException",
+        format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
+               kv_lb.name));
+}
+
+// Verify one Expect condition on one attribute (whose content is "got")
+// for the verify_expected() below.
+// This function returns true or false depending on whether the condition
+// succeeded - it does not throw ConditionalCheckFailedException.
+// However, it may throw ValidationException on input validation errors.
+static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
+    const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
+    const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
+    const rjson::value* value = rjson::find(condition, "Value");
+    const rjson::value* exists = rjson::find(condition, "Exists");
+    // There are three types of conditions that Expected supports:
+    // A value, not-exists, and a comparison of some kind. Each allows
+    // and requires a different combination of parameters in the request.
+    if (value) {
+        if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
+            throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
+        }
+        if (comparison_operator) {
+            throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
+        }
+        return check_EQ(got, *value);
+    } else if (exists) {
+        if (comparison_operator) {
+            throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
+        }
+        if (!exists->IsBool() || exists->GetBool() != false) {
+            throw api_error("ValidationException", "Exists!=false requires Value");
+        }
+        // Remember Exists=false, so we're checking that the attribute does *not* exist:
+        return !got;
+    } else {
+        if (!comparison_operator) {
+            throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
+        }
+        comparison_operator_type op = get_comparison_operator(*comparison_operator);
+        switch (op) {
+        case comparison_operator_type::EQ:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_EQ(got, (*attribute_value_list)[0]);
+        case comparison_operator_type::NE:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_NE(got, (*attribute_value_list)[0]);
+        case comparison_operator_type::LT:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
+        case comparison_operator_type::LE:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_compare(got, (*attribute_value_list)[0], cmp_le{});
+        case comparison_operator_type::GT:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
+        case comparison_operator_type::GE:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
+        case comparison_operator_type::BEGINS_WITH:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
+        case comparison_operator_type::IN:
+            verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
+            return check_IN(got, *attribute_value_list);
+        case comparison_operator_type::IS_NULL:
+            verify_operand_count(attribute_value_list, empty(), *comparison_operator);
+            return check_NULL(got);
+        case comparison_operator_type::NOT_NULL:
+            verify_operand_count(attribute_value_list, empty(), *comparison_operator);
+            return check_NOT_NULL(got);
+        case comparison_operator_type::BETWEEN:
+            verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
+            return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
+        case comparison_operator_type::CONTAINS:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_CONTAINS(got, (*attribute_value_list)[0]);
+        case comparison_operator_type::NOT_CONTAINS:
+            verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
+            return check_NOT_CONTAINS(got, (*attribute_value_list)[0]);
+        }
+        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
+    }
+}
+
+// Verify that the existing values of the item (previous_item) match the
+// conditions given by the Expected and ConditionalOperator parameters
+// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
+// This function will throw a ConditionalCheckFailedException API error
+// if the values do not match the condition, or ValidationException if there
+// are errors in the format of the condition itself.
+void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
+    const rjson::value* expected = rjson::find(req, "Expected");
+    if (!expected) {
+        return;
+    }
+    if (!expected->IsObject()) {
+        throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
+    }
+    // ConditionalOperator can be "AND" for requiring all conditions, or
+    // "OR" for requiring one condition, and defaults to "AND" if missing.
+    const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
+    bool require_all = true;
+    if (conditional_operator) {
+        if (!conditional_operator->IsString()) {
+            throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
+        }
+        std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
+        if (s == "AND") {
+            // require_all is already true
+        } else if (s == "OR") {
+            require_all = false;
+        } else {
+            throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
+        }
+        if (expected->GetObject().ObjectEmpty()) {
+            throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
+        }
+    }
+
+    for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
+        const rjson::value* got = nullptr;
+        if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
+            got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));
+        }
+        bool success = verify_expected_one(it->value, got);
+        if (success && !require_all) {
+            // When !require_all, one success is enough!
+            return;
+        } else if (!success && require_all) {
+            // When require_all, one failure is enough!
+            throw api_error("ConditionalCheckFailedException", "Failed condition.");
+        }
+    }
+    // If we got here and require_all, none of the checks failed, so succeed.
+    // If we got here and !require_all, all of the checks failed, so fail.
+    if (!require_all) {
+        throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");
+    }
+}
+
+}
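Reviewer note: the AND/OR handling at the end of verify_expected() can be illustrated in isolation. The sketch below is not the patch's code; `sketch::check_all_or_any`, `sketch::passes` and `sketch::condition_failed` are hypothetical names, and each check is modelled as a plain callable returning bool instead of a JSON condition. It mirrors the early-exit rule: with AND the first failure ends evaluation, with OR the first success does.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

namespace sketch {

// Stand-in for the ConditionalCheckFailedException API error.
struct condition_failed : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs the checks in order. With require_all (AND) the first failing check
// throws; without it (OR) the first passing check returns immediately.
inline void check_all_or_any(const std::vector<std::function<bool()>>& checks, bool require_all) {
    for (const auto& check : checks) {
        bool ok = check();
        if (ok && !require_all) {
            return; // OR: one success is enough
        }
        if (!ok && require_all) {
            throw condition_failed("Failed condition.");
        }
    }
    // AND: nothing failed, so succeed. OR: nothing succeeded, so fail.
    if (!require_all) {
        throw condition_failed("None of the ORed conditions were successful.");
    }
}

// Convenience wrapper that reports the outcome as a bool.
inline bool passes(const std::vector<std::function<bool()>>& checks, bool require_all) {
    try {
        check_all_or_any(checks, require_all);
        return true;
    } catch (const condition_failed&) {
        return false;
    }
}

} // namespace sketch
```

An empty list passes under AND and fails under OR in this sketch; the patch never reaches that OR case, because it rejects a ConditionalOperator combined with an empty Expected object before evaluating anything.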
--- a/alternator/conditions.hh
+++ b/alternator/conditions.hh
@@ -0,0 +1,49 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file contains definitions and functions related to placing conditions
+ * on Alternator queries (equivalent of CQL's restrictions).
+ *
+ * With conditions, it's possible to add criteria to selection requests (Scan, Query)
+ * and use them for narrowing down the result set, by means of filtering or indexing.
+ *
+ * Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
+ */
+
+#pragma once
+
+#include "cql3/restrictions/statement_restrictions.hh"
+#include "serialization.hh"
+
+namespace alternator {
+
+enum class comparison_operator_type {
+    EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, NOT_CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
+};
+
+comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
+
+::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
+
+void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
+
+}
--- a/alternator/error.hh
+++ b/alternator/error.hh
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <seastar/http/httpd.hh>
+#include "seastarx.hh"
+
+namespace alternator {
+
+// DynamoDB's error messages are described in detail in
+// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
+// An error message has a "type", e.g., "ResourceNotFoundException", a coarser
+// HTTP code (almost always 400), and a human-readable message. Eventually these
+// will be wrapped into a JSON object returned to the client.
+class api_error : public std::exception {
+public:
+    using status_type = httpd::reply::status_type;
+    status_type _http_code;
+    std::string _type;
+    std::string _msg;
+    api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
+        : _http_code(std::move(http_code))
+        , _type(std::move(type))
+        , _msg(std::move(msg))
+    { }
+    api_error() = default;
+    virtual const char* what() const noexcept override { return _msg.c_str(); }
+};
+
+}
+
--- a/alternator/executor.cc
+++ b/alternator/executor.cc
--- a/alternator/executor.hh
+++ b/alternator/executor.hh
@@ -0,0 +1,71 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <seastar/core/future.hh>
+#include <seastar/http/httpd.hh>
+#include "seastarx.hh"
+#include <seastar/json/json_elements.hh>
+
+#include "service/storage_proxy.hh"
+#include "service/migration_manager.hh"
+#include "service/client_state.hh"
+
+#include "stats.hh"
+
+namespace alternator {
+
+class executor {
+    service::storage_proxy& _proxy;
+    service::migration_manager& _mm;
+
+public:
+    using client_state = service::client_state;
+    stats _stats;
+    static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
+    static constexpr auto KEYSPACE_NAME = "alternator";
+
+    executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
+
+    future<json::json_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> list_tables(client_state& client_state, std::string content);
+    future<json::json_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
+    future<json::json_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+    future<json::json_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, std::string content);
+
+    future<> start();
+    future<> stop() { return make_ready_future<>(); }
+
+    future<> maybe_create_keyspace();
+
+    static tracing::trace_state_ptr maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
+};
+
+}
--- a/alternator/expressions.cc
+++ b/alternator/expressions.cc
@@ -0,0 +1,98 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "expressions.hh"
+#include "alternator/expressionsLexer.hpp"
+#include "alternator/expressionsParser.hpp"
+
+#include <seastarx.hh>
+
+#include <seastar/core/print.hh>
+#include <seastar/util/log.hh>
+
+#include <functional>
+
+namespace alternator {
+
+template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
+Result do_with_parser(std::string input, Func&& f) {
+    expressionsLexer::InputStreamType input_stream{
+        reinterpret_cast<const ANTLR_UINT8*>(input.data()),
+        ANTLR_ENC_UTF8,
+        static_cast<ANTLR_UINT32>(input.size()),
+        nullptr };
+    expressionsLexer lexer(&input_stream);
+    expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
+    expressionsParser parser(&tstream);
+
+    auto result = f(parser);
+    return result;
+}
+
+parsed::update_expression
+parse_update_expression(std::string query) {
+    try {
+        return do_with_parser(query,  std::mem_fn(&expressionsParser::update_expression));
+    } catch (...) {
+        throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
+    }
+}
+
+std::vector<parsed::path>
+parse_projection_expression(std::string query) {
+    try {
+        return do_with_parser(query,  std::mem_fn(&expressionsParser::projection_expression));
+    } catch (...) {
+        throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
+    }
+}
+
+template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
+template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
+
+namespace parsed {
+
+void update_expression::add(update_expression::action a) {
+    std::visit(overloaded {
+        [&] (action::set&)    { seen_set = true; },
+        [&] (action::remove&) { seen_remove = true; },
+        [&] (action::add&)    { seen_add = true; },
+        [&] (action::del&)    { seen_del = true; }
+    }, a._action);
+    _actions.push_back(std::move(a));
+}
+
+void update_expression::append(update_expression other) {
+    if ((seen_set && other.seen_set) ||
+        (seen_remove && other.seen_remove) ||
+        (seen_add && other.seen_add) ||
+        (seen_del && other.seen_del)) {
+        throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
+    }
+    std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
+    seen_set |= other.seen_set;
+    seen_remove |= other.seen_remove;
+    seen_add |= other.seen_add;
+    seen_del |= other.seen_del;
+}
+
+} // namespace parsed
+} // namespace alternator
--- a/alternator/expressions.g
+++ b/alternator/expressions.g
@@ -0,0 +1,214 @@
+/*
+ * Copyright 2019 ScyllaDB
+ *
+ * This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
+ * top-level directory for licensing information.
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * The DynamoDB protocol is based on JSON, and most DynamoDB requests
+ * describe the operation and its parameters via JSON objects such as maps
+ * and lists. Nevertheless, in some types of requests an "expression" is
+ * passed as a single string, and we need to parse this string. These
+ * cases include:
+ *  1. Attribute paths, such as "a[3].b.c", are used in projection
+ *     expressions as well as inside other expressions described below.
+ *  2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
+ *     used in conditional updates, filters, and other places.
+ *  3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
+ *
+ * All these expression syntaxes are very simple: Most of them could be
+ * parsed as regular expressions, and the parenthesized condition expression
+ * could be done with a simple hand-written lexical analyzer and recursive-
+ * descent parser. Nevertheless, we decided to specify these parsers in the
+ * ANTLR3 language already used in the Scylla project, hopefully making these
+ * parsers easier to reason about, and easier to change if needed - and
+ * reducing the amount of boiler-plate code.
+ */
+
+grammar expressions;
+
+options {
+    language = Cpp;
+}
+
+@parser::namespace{alternator}
+@lexer::namespace{alternator}
+
+/* TODO: explain what these traits things are. I haven't seen them explained
+ * in any document... Compilation fails without these because definitions
+ * of "expressionsLexerTraits" and "expressionsParserTraits" are needed.
+ */
+@lexer::traits {
+    class expressionsLexer;
+    class expressionsParser;
+    typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
+}
+@parser::traits {
+    typedef expressionsLexerTraits expressionsParserTraits;
+}
+
+@lexer::header {
+	#include "alternator/expressions.hh"
+	// ANTLR generates a bunch of unused variables and functions. Yuck...
+    #pragma GCC diagnostic ignored "-Wunused-variable"
+    #pragma GCC diagnostic ignored "-Wunused-function"
+}
+@parser::header {
+	#include "expressionsLexer.hpp"
+}
+
+/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
+ * token was unexpected, where, and so on, but then dutifully writes these
+ * error messages to the standard error, and returns from the parser as if
+ * everything was fine, with a half-constructed output object! If we define
+ * the "displayRecognitionError" method, it will be called upon to build this
+ * error message, and we can instead throw an exception to stop the parsing
+ * immediately. This is good enough for now, for our simple needs, but if
+ * we ever want to show more information about the syntax error, Cql3.g
+ * contains an elaborate implementation (it would be nice if we could reuse
+ * it, not duplicate it).
+ * Unfortunately, we have to repeat the same definition twice - once for the
+ * parser, and once for the lexer.
+ */
+@parser::context {
+    void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
+        throw expressions_syntax_error("syntax error");
+    }
+}
+@lexer::context {
+    void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
+        throw expressions_syntax_error("syntax error");
+    }
+}
+
+/*
+ * Lexical analysis phase, i.e., splitting the input into tokens.
+ * Lexical analyzer rules have names starting with capital letters.
+ * "fragment" rules do not generate tokens, and are just aliases used to
+ * make other rules more readable.
+ * Characters *not* listed here, e.g., '=', '(', etc., will be handled
+ * as individual tokens in their own right.
+ * Whitespace spans are skipped, so do not generate tokens.
+ */
+WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
+
+/* shortcuts for case-insensitive keywords */
+fragment A:('a'|'A');
+fragment B:('b'|'B');
+fragment C:('c'|'C');
+fragment D:('d'|'D');
+fragment E:('e'|'E');
+fragment F:('f'|'F');
+fragment G:('g'|'G');
+fragment H:('h'|'H');
+fragment I:('i'|'I');
+fragment J:('j'|'J');
+fragment K:('k'|'K');
+fragment L:('l'|'L');
+fragment M:('m'|'M');
+fragment N:('n'|'N');
+fragment O:('o'|'O');
+fragment P:('p'|'P');
+fragment Q:('q'|'Q');
+fragment R:('r'|'R');
+fragment S:('s'|'S');
+fragment T:('t'|'T');
+fragment U:('u'|'U');
+fragment V:('v'|'V');
+fragment W:('w'|'W');
+fragment X:('x'|'X');
+fragment Y:('y'|'Y');
+fragment Z:('z'|'Z');
+/* These keywords must appear before the generic NAME token below,
+ * because NAME matches too, and the first to match wins.
+ */
+SET: S E T;
+REMOVE: R E M O V E;
+ADD: A D D;
+DELETE: D E L E T E;
+
+fragment ALPHA: 'A'..'Z' | 'a'..'z';
+fragment DIGIT: '0'..'9';
+fragment ALNUM: ALPHA | DIGIT | '_';
+INTEGER: DIGIT+;
+NAME: ALPHA ALNUM*;
+NAMEREF: '#' ALNUM+;
+VALREF: ':' ALNUM+;
+
+/*
+ * Parsing phase - parsing the string of tokens generated by the lexical
+ * analyzer defined above.
+ */
+
+path_component: NAME | NAMEREF;
+path returns [parsed::path p]:
+    root=path_component           { $p.set_root($root.text); }
+    (   '.' name=path_component   { $p.add_dot($name.text); }
+      | '[' INTEGER ']'           { $p.add_index(std::stoi($INTEGER.text)); }
+    )*;
+
+update_expression_set_value returns [parsed::value v]:
+      VALREF                             { $v.set_valref($VALREF.text); }
+    | path                               { $v.set_path($path.p); }
+    | NAME                               { $v.set_func_name($NAME.text); }
+     '(' x=update_expression_set_value   { $v.add_func_parameter($x.v); }
+     (',' x=update_expression_set_value  { $v.add_func_parameter($x.v); })*
+     ')'
+    ;
+
+update_expression_set_rhs returns [parsed::set_rhs rhs]:
+    v=update_expression_set_value  { $rhs.set_value(std::move($v.v)); }
+    (   '+' v=update_expression_set_value  { $rhs.set_plus(std::move($v.v)); }
+      | '-' v=update_expression_set_value  { $rhs.set_minus(std::move($v.v)); }
+    )?
+    ;
+
+update_expression_set_action returns [parsed::update_expression::action a]:
+    path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
+
+update_expression_remove_action returns [parsed::update_expression::action a]:
+    path { $a.assign_remove($path.p); };
+
+update_expression_add_action returns [parsed::update_expression::action a]:
+    path VALREF { $a.assign_add($path.p, $VALREF.text); };
+
+update_expression_delete_action returns [parsed::update_expression::action a]:
+    path VALREF { $a.assign_del($path.p, $VALREF.text); };
+
+update_expression_clause returns [parsed::update_expression e]:
+      SET s=update_expression_set_action { $e.add(s); }
+      (',' s=update_expression_set_action { $e.add(s); })*
+    | REMOVE r=update_expression_remove_action { $e.add(r); }
+      (',' r=update_expression_remove_action { $e.add(r); })*
+    | ADD a=update_expression_add_action { $e.add(a); }
+      (',' a=update_expression_add_action { $e.add(a); })*
+    | DELETE d=update_expression_delete_action { $e.add(d); }
+      (',' d=update_expression_delete_action { $e.add(d); })*
+    ;
+
+// Note the "EOF" token at the end of the update expression. We want the
+// parser to match the entire string given to it - not just its beginning!
+update_expression returns [parsed::update_expression e]:
+    (update_expression_clause { e.append($update_expression_clause.e); })* EOF;
+
+projection_expression returns [std::vector<parsed::path> v]:
+    p=path      { $v.push_back(std::move($p.p)); }
+    (',' p=path { $v.push_back(std::move($p.p)); } )* EOF;
--- a/alternator/expressions.hh
+++ b/alternator/expressions.hh
@@ -0,0 +1,41 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <string>
+#include <stdexcept>
+#include <vector>
+
+#include "expressions_types.hh"
+
+namespace alternator {
+
+class expressions_syntax_error : public std::runtime_error {
+public:
+    using runtime_error::runtime_error;
+};
+
+parsed::update_expression parse_update_expression(std::string query);
+std::vector<parsed::path> parse_projection_expression(std::string query);
+
+
+} /* namespace alternator */
--- a/alternator/expressions_types.hh
+++ b/alternator/expressions_types.hh
@@ -0,0 +1,166 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <vector>
+#include <string>
+#include <variant>
+
+/*
+ * Parsed representation of expressions and their components.
+ *
+ * Types in the alternator::parsed namespace are used for holding the parse
+ * tree - objects generated by the Antlr rules after parsing an expression.
+ * Because of the way Antlr works, all these objects are default-constructed
+ * first, and then assigned when the rule is completed, so all these types
+ * have only default constructors - but setter functions to set them later.
+ */
+
+namespace alternator {
+namespace parsed {
+
+// "path" is an attribute's path in a document, e.g., a.b[3].c.
+class path {
+    // All paths have a "root", a top-level attribute, and any number of
+    // "dereference operators" - each either an index (e.g., "[2]") or a
+    // dot (e.g., ".xyz").
+    std::string _root;
+    std::vector<std::variant<std::string, unsigned>> _operators;
+public:
+    void set_root(std::string root) {
+        _root = std::move(root);
+    }
+    void add_index(unsigned i) {
+        _operators.emplace_back(i);
+    }
+    void add_dot(std::string name) {
+        _operators.emplace_back(std::move(name));
+    }
+    const std::string& root() const {
+        return _root;
+    }
+    bool has_operators() const {
+        return !_operators.empty();
+    }
+};
+
+// "value" is a value used in the right hand side of an assignment
+// expression, "SET a = ...". It can be a reference to a value included in
+// the request (":val"), a path to an attribute from the existing item
+// (e.g., "a.b[3].c"), or a function of other such values.
+// Note that the real right-hand-side of an assignment is actually a bit
+// more general - it allows either a value, or a value+value or value-value -
+// see class set_rhs below.
+struct value {
+    struct function_call {
+        std::string _function_name;
+        std::vector<value> _parameters;
+    };
+    std::variant<std::string, path, function_call> _value;
+    void set_valref(std::string s) {
+        _value = std::move(s);
+    }
+    void set_path(path p) {
+        _value = std::move(p);
+    }
+    void set_func_name(std::string s) {
+        _value = function_call {std::move(s), {}};
+    }
+    void add_func_parameter(value v) {
+        std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
+    }
+};
+
+// The right-hand-side of a SET in an update expression can be either a
+// single value (see above), or value+value, or value-value.
+class set_rhs {
+public:
+    char _op;  // '+', '-', or 'v'
+    value _v1;
+    value _v2;
+    void set_value(value&& v1) {
+        _op = 'v';
+        _v1 = std::move(v1);
+    }
+    void set_plus(value&& v2) {
+        _op = '+';
+        _v2 = std::move(v2);
+    }
+    void set_minus(value&& v2) {
+        _op = '-';
+        _v2 = std::move(v2);
+    }
+};
+
+class update_expression {
+public:
+    struct action {
+        path _path;
+        struct set {
+            set_rhs _rhs;
+        };
+        struct remove {
+        };
+        struct add {
+            std::string _valref;
+        };
+        struct del {
+            std::string _valref;
+        };
+        std::variant<set, remove, add, del> _action;
+
+        void assign_set(path p, set_rhs rhs) {
+            _path = std::move(p);
+            _action = set { std::move(rhs) };
+        }
+        void assign_remove(path p) {
+            _path = std::move(p);
+            _action = remove { };
+        }
+        void assign_add(path p, std::string v) {
+            _path = std::move(p);
+            _action = add { std::move(v) };
+        }
+        void assign_del(path p, std::string v) {
+            _path = std::move(p);
+            _action = del { std::move(v) };
+        }
+    };
+private:
+    std::vector<action> _actions;
+    bool seen_set = false;
+    bool seen_remove = false;
+    bool seen_add = false;
+    bool seen_del = false;
+public:
+    void add(action a);
+    void append(update_expression other);
+    bool empty() const {
+        return _actions.empty();
+    }
+    const std::vector<action>& actions() const {
+        return _actions;
+    }
+};
+
+} // namespace parsed
+} // namespace alternator
--- a/alternator/rjson.cc
+++ b/alternator/rjson.cc
@@ -0,0 +1,172 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "rjson.hh"
+#include "error.hh"
+#include <seastar/core/print.hh>
+
+namespace rjson {
+
+static allocator the_allocator;
+
+std::string print(const rjson::value& value) {
+    string_buffer buffer;
+    writer writer(buffer);
+    value.Accept(writer);
+    return std::string(buffer.GetString());
+}
+
+rjson::value copy(const rjson::value& value) {
+    return rjson::value(value, the_allocator);
+}
+
+rjson::value parse(const std::string& str) {
+    return parse_raw(str.c_str(), str.size());
+}
+
+rjson::value parse_raw(const char* c_str, size_t size) {
+    rjson::document d;
+    d.Parse(c_str, size);
+    if (d.HasParseError()) {
+        throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));
+    }
+    rjson::value& v = d;
+    return std::move(v);
+}
+
+rjson::value& get(rjson::value& value, rjson::string_ref_type name) {
+    auto member_it = value.FindMember(name);
+    if (member_it != value.MemberEnd())
+        return member_it->value;
+    else {
+        throw rjson::error(format("JSON parameter {} not found", name));
+    }
+}
+
+const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {
+    auto member_it = value.FindMember(name);
+    if (member_it != value.MemberEnd())
+        return member_it->value;
+    else {
+        throw rjson::error(format("JSON parameter {} not found", name));
+    }
+}
+
+rjson::value from_string(const std::string& str) {
+    return rjson::value(str.c_str(), str.size(), the_allocator);
+}
+
+rjson::value from_string(const sstring& str) {
+    return rjson::value(str.c_str(), str.size(), the_allocator);
+}
+
+rjson::value from_string(const char* str, size_t size) {
+    return rjson::value(str, size, the_allocator);
+}
+
+const rjson::value* find(const rjson::value& value, string_ref_type name) {
+    auto member_it = value.FindMember(name);
+    return member_it != value.MemberEnd() ? &member_it->value : nullptr;
+}
+
+rjson::value* find(rjson::value& value, string_ref_type name) {
+    auto member_it = value.FindMember(name);
+    return member_it != value.MemberEnd() ? &member_it->value : nullptr;
+}
+
+void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {
+    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);
+}
+
+void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {
+    base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);
+}
+
+void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {
+    base.AddMember(name, std::move(member), the_allocator);
+}
+
+void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {
+    base.AddMember(name, rjson::value(member), the_allocator);
+}
+
+void push_back(rjson::value& base_array, rjson::value&& item) {
+    base_array.PushBack(std::move(item), the_allocator);
+
+}
+
+bool single_value_comp::operator()(const rjson::value& r1, const rjson::value& r2) const {
+   auto r1_type = r1.GetType();
+   auto r2_type = r2.GetType();
+
+   // null is the smallest type and compares with every other type; nothing is less than null
+   if (r1_type == rjson::type::kNullType || r2_type == rjson::type::kNullType) {
+       return r1_type < r2_type;
+   }
+   // only null, true, and false are comparable with each other, other types are not compatible
+   if (r1_type != r2_type) {
+       if (r1_type > rjson::type::kTrueType || r2_type > rjson::type::kTrueType) {
+           throw rjson::error(format("Types are not comparable: {} {}", r1, r2));
+       }
+   }
+
+   switch (r1_type) {
+   case rjson::type::kNullType:
+       // fall-through
+   case rjson::type::kFalseType:
+       // fall-through
+   case rjson::type::kTrueType:
+       return r1_type < r2_type;
+   case rjson::type::kObjectType:
+       throw rjson::error("Object type comparison is not supported");
+   case rjson::type::kArrayType:
+       throw rjson::error("Array type comparison is not supported");
+   case rjson::type::kStringType: {
+       const size_t r1_len = r1.GetStringLength();
+       const size_t r2_len = r2.GetStringLength();
+       size_t len = std::min(r1_len, r2_len);
+       int result = std::strncmp(r1.GetString(), r2.GetString(), len);
+       return result < 0 || (result == 0 && r1_len < r2_len);
+   }
+   case rjson::type::kNumberType: {
+       if (r1.IsInt() && r2.IsInt()) {
+           return r1.GetInt() < r2.GetInt();
+       } else if (r1.IsUint() && r2.IsUint()) {
+           return r1.GetUint() < r2.GetUint();
+       } else if (r1.IsInt64() && r2.IsInt64()) {
+           return r1.GetInt64() < r2.GetInt64();
+       } else if (r1.IsUint64() && r2.IsUint64()) {
+           return r1.GetUint64() < r2.GetUint64();
+       } else {
+           // it's safe to call GetDouble() on any number type
+           return r1.GetDouble() < r2.GetDouble();
+       }
+   }
+   default:
+       return false;
+   }
+}
+
+} // end namespace rjson
+
+std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {
+    return os << rjson::print(v);
+}
--- a/alternator/rjson.hh
+++ b/alternator/rjson.hh
@@ -0,0 +1,163 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+/*
+ * rjson is a wrapper over the rapidjson library, providing fast JSON parsing and generation.
+ *
+ * rapidjson has strict copy elision policies, which, among other things, involve
+ * using provided char arrays without copying them and allowing objects to be copied only explicitly.
+ * As such, one should be careful when passing strings with limited lifetime
+ * (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
+ * may end up relying on dangling char pointers. All rjson functions that create JSON values from
+ * strings have two APIs: one for string_ref_type (more optimal, used when the string is known to live
+ * at least as long as the object, e.g. a static char array) and one for std::strings. The more optimal
+ * variants should be used *only* if the lifetime of the string is guaranteed, otherwise it will
+ * result in undefined behaviour.
+ * Also, bear in mind that methods exposed by rjson::value are generic, but some of them
+ * work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
+ * Examples of such mismatched usage are calling MemberCount() on a JSON value not of object type
+ * or calling Size() on a non-array value.
+ */
+
+#include <string>
+#include <stdexcept>
+
+namespace rjson {
+class error : public std::exception {
+    std::string _msg;
+public:
+    error() = default;
+    error(const std::string& msg) : _msg(msg) {}
+
+    virtual const char* what() const noexcept override { return _msg.c_str(); }
+};
+}
+
+// rapidjson configuration macros
+#define RAPIDJSON_HAS_STDSTRING 1
+// Default rapidjson policy is to use assert() - which is dangerous for two reasons:
+// 1. assert() can be turned off with -DNDEBUG
+// 2. assert() crashes a program
+// Fortunately, the default policy can be overridden, and so rapidjson errors will
+// throw an rjson::error exception instead.
+#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
+
+#include <rapidjson/document.h>
+#include <rapidjson/writer.h>
+#include <rapidjson/stringbuffer.h>
+#include <rapidjson/error/en.h>
+#include <seastar/core/sstring.hh>
+#include "seastarx.hh"
+
+namespace rjson {
+
+using allocator = rapidjson::CrtAllocator;
+using encoding = rapidjson::UTF8<>;
+using document = rapidjson::GenericDocument<encoding, allocator>;
+using value = rapidjson::GenericValue<encoding, allocator>;
+using string_ref_type = value::StringRefType;
+using string_buffer = rapidjson::GenericStringBuffer<encoding>;
+using writer = rapidjson::Writer<string_buffer, encoding>;
+using type = rapidjson::Type;
+
+// Returns an object representing JSON's null
+inline rjson::value null_value() {
+    return rjson::value(rapidjson::kNullType);
+}
+
+// Returns an empty JSON object - {}
+inline rjson::value empty_object() {
+    return rjson::value(rapidjson::kObjectType);
+}
+
+// Returns an empty JSON array - []
+inline rjson::value empty_array() {
+    return rjson::value(rapidjson::kArrayType);
+}
+
+// Returns an empty JSON string - ""
+inline rjson::value empty_string() {
+    return rjson::value(rapidjson::kStringType);
+}
+
+// Convert the JSON value to a string with JSON syntax, the opposite of parse().
+// The representation is dense - without any redundant indentation.
+std::string print(const rjson::value& value);
+
+// Copies given JSON value - involves allocation
+rjson::value copy(const rjson::value& value);
+
+// Parses a JSON value from given string or raw character array.
+// The string/char array liveness does not need to be persisted,
+// as both parse() and parse_raw() will allocate member names and values.
+// Throws rjson::error if parsing failed.
+rjson::value parse(const std::string& str);
+rjson::value parse_raw(const char* c_str, size_t size);
+
+// Creates a JSON value (of JSON string type) out of internal string representations.
+// The string value is copied, so str's liveness does not need to be persisted.
+rjson::value from_string(const std::string& str);
+rjson::value from_string(const sstring& str);
+rjson::value from_string(const char* str, size_t size);
+
+// Returns a pointer to JSON member if it exists, nullptr otherwise
+rjson::value* find(rjson::value& value, rjson::string_ref_type name);
+const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);
+
+// Returns a reference to JSON member if it exists, throws otherwise
+rjson::value& get(rjson::value& value, rjson::string_ref_type name);
+const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);
+
+// Sets a member in given JSON object by moving the member - allocates the name.
+// Throws if base is not a JSON object.
+void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
+
+// Sets a string member in given JSON object by assigning its reference - allocates the name.
+// NOTICE: member string liveness must be ensured to be at least as long as base's.
+// Throws if base is not a JSON object.
+void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
+
+// Sets a member in given JSON object by moving the member.
+// NOTICE: name liveness must be ensured to be at least as long as base's.
+// Throws if base is not a JSON object.
+void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
+
+// Sets a string member in given JSON object by assigning its reference.
+// NOTICE: name liveness must be ensured to be at least as long as base's.
+// NOTICE: member liveness must be ensured to be at least as long as base's.
+// Throws if base is not a JSON object.
+void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
+
+// Adds a value to a JSON list by moving the item to its end.
+// Throws if base_array is not a JSON array.
+void push_back(rjson::value& base_array, rjson::value&& item);
+
+struct single_value_comp {
+    bool operator()(const rjson::value& r1, const rjson::value& r2) const;
+};
+
+} // end namespace rjson
+
+namespace std {
+std::ostream& operator<<(std::ostream& os, const rjson::value& v);
+}
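Note on the liveness rules documented in rjson.hh above. The following is a minimal standalone sketch, not code from this patch: it does not use rapidjson, and the names owning_value, borrowing_value, make_owned and make_borrowed_static are hypothetical stand-ins. It only illustrates the difference between an API that copies a string (as the from_string() comments describe) and one that merely keeps a reference to it (as the string_ref_type overloads are documented to do).

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a value that copies its string.
struct owning_value {
    std::string data;
};

// Hypothetical stand-in for a value that only references its string.
struct borrowing_value {
    std::string_view data;
};

// Safe: the bytes of the local string are copied into the returned value,
// so nothing refers to `local` after this function returns.
owning_value make_owned() {
    std::string local = "temporary";
    return owning_value{local};
}

// Safe: the referenced array has static storage duration, so it outlives
// every value that refers to it.
borrowing_value make_borrowed_static() {
    static const char name[] = "static";
    return borrowing_value{name};
}

// Returning borrowing_value{local} for a local std::string would instead
// leave `data` dangling; that is the case the header comment warns about.
void check_examples() {
    assert(make_owned().data == "temporary");
    assert(make_borrowed_static().data == "static");
}
```

Under these assumptions, the borrowing variant avoids one allocation and copy, which is presumably why the header calls it "more optimal", at the price of the caller having to guarantee the lifetime.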
--- a/alternator/serialization.cc
+++ b/alternator/serialization.cc
@@ -0,0 +1,261 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "base64.hh"
+#include "log.hh"
+#include "serialization.hh"
+#include "error.hh"
+#include "rapidjson/writer.h"
+#include "concrete_types.hh"
+#include "cql3/type_json.hh"
+
+static logging::logger slogger("alternator-serialization");
+
+namespace alternator {
+
+type_info type_info_from_string(std::string type) {
+    static thread_local const std::unordered_map<std::string, type_info> type_infos = {
+        {"S", {alternator_type::S, utf8_type}},
+        {"B", {alternator_type::B, bytes_type}},
+        {"BOOL", {alternator_type::BOOL, boolean_type}},
+        {"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
+    };
+    auto it = type_infos.find(type);
+    if (it == type_infos.end()) {
+        return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
+    }
+    return it->second;
+}
+
+type_representation represent_type(alternator_type atype) {
+    static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
+        {alternator_type::S, {"S", utf8_type}},
+        {alternator_type::B, {"B", bytes_type}},
+        {alternator_type::BOOL, {"BOOL", boolean_type}},
+        {alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
+    };
+    auto it = type_representations.find(atype);
+    if (it == type_representations.end()) {
+        throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
+    }
+    return it->second;
+}
+
+struct from_json_visitor {
+    const rjson::value& v;
+    bytes_ostream& bo;
+
+    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
+    void operator()(const string_type_impl& t) {
+        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
+    }
+    void operator()(const bytes_type_impl& t) const {
+        bo.write(base64_decode(v));
+    }
+    void operator()(const boolean_type_impl& t) const {
+        bo.write(boolean_type->decompose(v.GetBool()));
+    }
+    void operator()(const decimal_type_impl& t) const {
+        bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
+    }
+    // default
+    void operator()(const abstract_type& t) const {
+        bo.write(from_json_object(t, Json::Value(rjson::print(v)), cql_serialization_format::internal()));
+    }
+};
+
+bytes serialize_item(const rjson::value& item) {
+    if (item.IsNull() || item.MemberCount() != 1) {
+        throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
+    }
+    auto it = item.MemberBegin();
+    type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
+
+    if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
+        slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
+        return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
+    }
+
+    bytes_ostream bo;
+    bo.write(bytes{int8_t(type_info.atype)});
+    visit(*type_info.dtype, from_json_visitor{it->value, bo});
+
+    return bytes(bo.linearize());
+}
+
+struct to_json_visitor {
+    rjson::value& deserialized;
+    const std::string& type_ident;
+    bytes_view bv;
+
+    void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
+    void operator()(const decimal_type_impl& t) const {
+        auto s = to_json_string(*decimal_type, bytes(bv));
+        //FIXME(sarna): unnecessary copy
+        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
+    }
+    void operator()(const string_type_impl& t) {
+        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
+    }
+    void operator()(const bytes_type_impl& t) const {
+        std::string b64 = base64_encode(bv);
+        rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
+    }
+    // default
+    void operator()(const abstract_type& t) const {
+        rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));
+    }
+};
+
+rjson::value deserialize_item(bytes_view bv) {
+    rjson::value deserialized(rapidjson::kObjectType);
+    if (bv.empty()) {
+        throw api_error("ValidationException", "Serialized value empty");
+    }
+
+    alternator_type atype = alternator_type(bv[0]);
+    bv.remove_prefix(1);
+
+    if (atype == alternator_type::NOT_SUPPORTED_YET) {
+        slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
+        return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());
+    }
+    type_representation type_representation = represent_type(atype);
+    visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
+
+    return deserialized;
+}
+
+std::string type_to_string(data_type type) {
+    static thread_local std::unordered_map<data_type, std::string> types = {
+        {utf8_type, "S"},
+        {bytes_type, "B"},
+        {boolean_type, "BOOL"},
+        {decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
+    };
+    auto it = types.find(type);
+    if (it == types.end()) {
+        throw std::runtime_error(format("Unknown type {}", type->name()));
+    }
+    return it->second;
+}
+
+bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
+    std::string column_name = column.name_as_text();
+    std::string expected_type = type_to_string(column.type);
+
+    const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));
+    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
+        throw api_error("ValidationException",
+                format("Missing or invalid value object for key column {}: {}", column_name, item));
+    }
+    return get_key_from_typed_value(key_typed_value, column, expected_type);
+}
+
+bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {
+    auto it = key_typed_value.MemberBegin();
+    if (it->name.GetString() != expected_type) {
+        throw api_error("ValidationException",
+                format("Type mismatch: expected type {} for key column {}, got type {}",
+                        expected_type, column.name_as_text(), it->name.GetString()));
+    }
+    if (column.type == bytes_type) {
+        return base64_decode(it->value);
+    } else {
+        return column.type->from_string(it->value.GetString());
+    }
+
+}
+
+rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
+    if (column.type == bytes_type) {
+        std::string b64 = base64_encode(cell);
+        return rjson::from_string(b64);
+    } else if (column.type == utf8_type) {
+        return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
+    } else if (column.type == decimal_type) {
+        // FIXME: use specialized Alternator number type, not the more
+        // general "decimal_type". A dedicated type can be more efficient
+        // in storage space and in parsing speed.
+        auto s = to_json_string(*decimal_type, bytes(cell));
+        return rjson::from_string(s);
+    } else {
+        // We shouldn't get here, we shouldn't see such key columns.
+        throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));
+    }
+}
+
+
+partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
+    std::vector<bytes> raw_pk;
+    // FIXME: this is a loop, but we really allow only one partition key column.
+    for (const column_definition& cdef : schema->partition_key_columns()) {
+        bytes raw_value = get_key_column_value(item, cdef);
+        raw_pk.push_back(std::move(raw_value));
+    }
+    return partition_key::from_exploded(raw_pk);
+}
+
+clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
+    if (schema->clustering_key_size() == 0) {
+        return clustering_key::make_empty();
+    }
+    std::vector<bytes> raw_ck;
+    // FIXME: this is a loop, but we really allow only one clustering key column.
+    for (const column_definition& cdef : schema->clustering_key_columns()) {
+        bytes raw_value = get_key_column_value(item, cdef);
+        raw_ck.push_back(std::move(raw_value));
+    }
+
+    return clustering_key::from_exploded(raw_ck);
+}
+
+big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
+    if (!v.IsObject() || v.MemberCount() != 1) {
+        throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
+    }
+    auto it = v.MemberBegin();
+    if (it->name != "N") {
+        throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
+    }
+    if (it->value.IsNumber()) {
+        // FIXME(sarna): should use big_decimal constructor with numeric values directly:
+        return big_decimal(rjson::print(it->value));
+    }
+    if (!it->value.IsString()) {
+        throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
+    }
+    return big_decimal(it->value.GetString());
+}
+
+const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {
+    if (!v.IsObject() || v.MemberCount() != 1) {
+        return {"", nullptr};
+    }
+    auto it = v.MemberBegin();
+    const std::string it_key = it->name.GetString();
+    if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
+        return {"", nullptr};
+    }
+    return std::make_pair(it_key, &(it->value));
+}
+
+}
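Note on the layout used by serialize_item() and deserialize_item() above: the first byte is the alternator_type tag and the remaining bytes are the payload. The following is a minimal standalone sketch of that tag-prefix layout only. It is not code from this patch: it uses std::string in place of Scylla's bytes type, skips the per-type visitors, and the names tag, encode and decode are hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Mirrors the enumerators of alternator_type; the name `tag` is hypothetical.
enum class tag : int8_t { S, B, BOOL, N, NOT_SUPPORTED_YET };

// Writes the one-byte type tag followed by the payload bytes.
std::string encode(tag t, const std::string& payload) {
    std::string out;
    out.push_back(static_cast<char>(t));
    out += payload;
    return out;
}

// Splits a serialized value back into its tag and payload.
// Throws on empty input, like the "Serialized value empty" check above.
std::pair<tag, std::string> decode(const std::string& bytes) {
    if (bytes.empty()) {
        throw std::invalid_argument("Serialized value empty");
    }
    return {static_cast<tag>(bytes[0]), bytes.substr(1)};
}
```

In this sketch, decode(encode(tag::N, "3.14")) yields the pair (tag::N, "3.14"). The real functions additionally convert the payload between JSON and the column's data_type.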
--- a/alternator/serialization.hh
+++ b/alternator/serialization.hh
@@ -0,0 +1,72 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <string>
+#include <string_view>
+#include "types.hh"
+#include "schema.hh"
+#include "keys.hh"
+#include "rjson.hh"
+#include "utils/big_decimal.hh"
+
+namespace alternator {
+
+enum class alternator_type : int8_t {
+    S, B, BOOL, N, NOT_SUPPORTED_YET
+};
+
+struct type_info {
+    alternator_type atype;
+    data_type dtype;
+};
+
+struct type_representation {
+    std::string ident;
+    data_type dtype;
+};
+
+type_info type_info_from_string(std::string type);
+type_representation represent_type(alternator_type atype);
+
+bytes serialize_item(const rjson::value& item);
+rjson::value deserialize_item(bytes_view bv);
+
+std::string type_to_string(data_type type);
+
+bytes get_key_column_value(const rjson::value& item, const column_definition& column);
+bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);
+rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
+
+partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
+clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
+
+// If v encodes a number (i.e., it is a {"N": ...} object), returns a big_decimal representing it. Otherwise,
+// throws an api_error of type ValidationException, with diagnostic as the message prefix.
+big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
+
+// Checks whether a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or likewise "NS" or "BS")
+// and returns the set's type and a pointer to that set. If the object does not encode a set,
+// the returned value is {"", nullptr}.
+const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v);
+
+}
--- a/alternator/server.cc
+++ b/alternator/server.cc
@@ -0,0 +1,314 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "alternator/server.hh"
+#include "log.hh"
+#include <seastar/http/function_handlers.hh>
+#include <seastar/json/json_elements.hh>
+#include <seastarx.hh>
+#include "error.hh"
+#include "rjson.hh"
+#include "auth.hh"
+#include <cctype>
+#include "cql3/query_processor.hh"
+
+static logging::logger slogger("alternator-server");
+
+using namespace httpd;
+
+namespace alternator {
+
+static constexpr auto TARGET = "X-Amz-Target";
+
+inline std::vector<std::string_view> split(std::string_view text, char separator) {
+    std::vector<std::string_view> tokens;
+    if (text == "") {
+        return tokens;
+    }
+
+    while (true) {
+        auto pos = text.find_first_of(separator);
+        if (pos != std::string_view::npos) {
+            tokens.emplace_back(text.data(), pos);
+            text.remove_prefix(pos + 1);
+        } else {
+            tokens.emplace_back(text);
+            break;
+        }
+    }
+    return tokens;
+}
+
+// DynamoDB HTTP error responses are structured as described in
+// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
+// Our handlers throw an exception to report an error. If the exception
+// is of type alternator::api_error, it is unwrapped and properly reported to
+// the user directly. Other exceptions are unexpected, and reported as
+// Internal Server Error.
+class api_handler : public handler_base {
+public:
+    api_handler(const future_json_function& _handle) : _f_handle(
+         [_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
+         return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {
+             if (resf.failed()) {
+                 // Exceptions of type api_error are wrapped as JSON and
+                 // returned to the client as expected. Other types of
+                 // exceptions are unexpected, and returned to the user
+                 // as an internal server error:
+                 api_error ret;
+                 try {
+                     resf.get();
+                 } catch (api_error &ae) {
+                     ret = ae;
+                 } catch (rjson::error & re) {
+                     ret = api_error("ValidationException", re.what());
+                 } catch (...) {
+                     ret = api_error(
+                             "Internal Server Error",
+                             format("Internal server error: {}", std::current_exception()),
+                             reply::status_type::internal_server_error);
+                 }
+                 // FIXME: what is this version number?
+                 rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +
+                         "\"message\":\"" + ret._msg + "\"}";
+                 rep->_status = ret._http_code;
+                 slogger.trace("api_handler error case: {}", rep->_content);
+                 return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
+             }
+             slogger.trace("api_handler success case");
+             auto res = resf.get0();
+             if (res._body_writer) {
+                 rep->write_body("json", std::move(res._body_writer));
+             } else {
+                 rep->_content += res._res;
+             }
+             return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
+         });
+    }), _type("json") { }
+
+    api_handler(const api_handler&) = default;
+    future<std::unique_ptr<reply>> handle(const sstring& path,
+            std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
+        return _f_handle(std::move(req), std::move(rep)).then(
+                [this](std::unique_ptr<reply> rep) {
+                    rep->done(_type);
+                    return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
+                });
+    }
+
+protected:
+    future_handler_function _f_handle;
+    sstring _type;
+};
+
+class health_handler : public handler_base {
+    virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
+        rep->set_status(reply::status_type::ok);
+        rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
+        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
+    }
+};
+
+future<> server::verify_signature(const request& req) {
+    if (!_enforce_authorization) {
+        slogger.debug("Skipping authorization");
+        return make_ready_future<>();
+    }
+    auto host_it = req._headers.find("Host");
+    if (host_it == req._headers.end()) {
+        throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
+    }
+    auto authorization_it = req._headers.find("Authorization");
+    if (authorization_it == req._headers.end()) {
+        throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
+    }
+    std::string host = host_it->second;
+    std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
+    std::string credential;
+    std::string user_signature;
+    std::string signed_headers_str;
+    std::vector<std::string_view> signed_headers;
+    for (std::string_view entry : credentials_raw) {
+        std::vector<std::string_view> entry_split = split(entry, '=');
+        if (entry_split.size() != 2) {
+            if (entry != "AWS4-HMAC-SHA256") {
+                throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
+            }
+            continue;
+        }
+        std::string_view auth_value = entry_split[1];
+        // Commas appear as an additional (quite redundant) delimiter
+        if (!auth_value.empty() && auth_value.back() == ',') {
+            auth_value.remove_suffix(1);
+        }
+        if (entry_split[0] == "Credential") {
+            credential = std::string(auth_value);
+        } else if (entry_split[0] == "Signature") {
+            user_signature = std::string(auth_value);
+        } else if (entry_split[0] == "SignedHeaders") {
+            signed_headers_str = std::string(auth_value);
+            signed_headers = split(auth_value, ';');
+            std::sort(signed_headers.begin(), signed_headers.end());
+        }
+    }
+    std::vector<std::string_view> credential_split = split(credential, '/');
+    if (credential_split.size() != 5) {
+        throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
+    }
+    std::string user(credential_split[0]);
+    std::string datestamp(credential_split[1]);
+    std::string region(credential_split[2]);
+    std::string service(credential_split[3]);
+
+    std::map<std::string_view, std::string_view> signed_headers_map;
+    for (const auto& header : signed_headers) {
+        signed_headers_map.emplace(header, std::string_view());
+    }
+    for (auto& header : req._headers) {
+        std::string header_str;
+        header_str.resize(header.first.size());
+        std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
+        auto it = signed_headers_map.find(header_str);
+        if (it != signed_headers_map.end()) {
+            it->second = std::string_view(header.second);
+        }
+    }
+
+    auto cache_getter = [] (std::string username) {
+        return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
+    };
+    return _key_cache.get_ptr(user, cache_getter).then([this, &req,
+                                                    user = std::move(user),
+                                                    host = std::move(host),
+                                                    datestamp = std::move(datestamp),
+                                                    signed_headers_str = std::move(signed_headers_str),
+                                                    signed_headers_map = std::move(signed_headers_map),
+                                                    region = std::move(region),
+                                                    service = std::move(service),
+                                                    user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
+        std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
+                datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
+
+        if (signature != std::string_view(user_signature)) {
+            _key_cache.remove(user);
+            throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
+        }
+    });
+}
+
+future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
+    _executor.local()._stats.total_operations++;
+    sstring target = req->get_header(TARGET);
+    std::vector<std::string_view> split_target = split(target, '.');
+    //NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
+    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
+    slogger.trace("Request: {} {}", op, req->content);
+    return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
+        auto callback_it = _callbacks.find(op);
+        if (callback_it == _callbacks.end()) {
+            _executor.local()._stats.unsupported_operations++;
+            throw api_error("UnknownOperationException",
+                    format("Unsupported operation {}", op));
+        }
+        //FIXME: Client state can provide more context, e.g. client's endpoint address
+        // We use unique_ptr because client_state cannot be moved or copied
+        return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
+            client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
+            tracing::trace_state_ptr trace_state = executor::maybe_trace_query(*client_state, op, req->content);
+            tracing::trace(trace_state, op);
+            return callback_it->second(_executor.local(), *client_state, trace_state, std::move(req)).finally([trace_state] {});
+        });
+    });
+}
+
+void server::set_routes(routes& r) {
+    api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
+        return handle_api_request(std::move(req));
+    });
+
+    r.add(operation_type::POST, url("/"), req_handler);
+    r.add(operation_type::GET, url("/"), new health_handler);
+}
+
+//FIXME: A way to immediately invalidate the cache should be considered,
+// e.g. when the system table which stores the keys is changed.
+// For now, this propagation may take up to 1 minute.
+server::server(seastar::sharded<executor>& e)
+        : _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
+      , _callbacks{
+        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) {
+            return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req), trace_state = std::move(trace_state)] () mutable { return e.create_table(client_state, std::move(trace_state), req->content); }); }
+        },
+        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_table(client_state, std::move(trace_state), req->content); }},
+        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_table(client_state, std::move(trace_state), req->content); }},
+        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.put_item(client_state, std::move(trace_state), req->content); }},
+        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.update_item(client_state, std::move(trace_state), req->content); }},
+        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.get_item(client_state, std::move(trace_state), req->content); }},
+        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.delete_item(client_state, std::move(trace_state), req->content); }},
+        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
+        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.scan(client_state, std::move(trace_state), req->content); }},
+        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
+        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, std::move(trace_state), req->content); }},
+        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, std::move(trace_state), req->content); }},
+        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, std::unique_ptr<request> req) { return e.query(client_state, std::move(trace_state), req->content); }},
+    } {
+}
+
+future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {
+    _enforce_authorization = enforce_authorization;
+    if (!port && !https_port) {
+        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
+                " must be specified in order to init an alternator HTTP server instance"));
+    }
+    return seastar::async([this, addr, port, https_port, creds] {
+        try {
+            _executor.invoke_on_all([] (executor& e) {
+                return e.start();
+            }).get();
+
+            if (port) {
+                _control.start().get();
+                _control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
+                _control.listen(socket_address{addr, *port}).get();
+                slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);
+            }
+            if (https_port) {
+                _https_control.start().get();
+                _https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
+                _https_control.server().invoke_on_all([creds] (http_server& serv) {
+                    return serv.set_tls_credentials(creds->build_server_credentials());
+                }).get();
+
+                _https_control.listen(socket_address{addr, *https_port}).get();
+                slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
+            }
+        } catch (...) {
+            slogger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
+                    addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
+            std::throw_with_nested(std::runtime_error(
+                    format("Failed to set up Alternator HTTP server on {} port {}, TLS port {}",
+                            addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF")));
+        }
+    });
+}
+
+}
+
--- a/alternator/server.hh
+++ b/alternator/server.hh
@@ -0,0 +1,54 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "alternator/executor.hh"
+#include <seastar/core/future.hh>
+#include <seastar/http/httpd.hh>
+#include <seastar/net/tls.hh>
+#include <optional>
+#include <alternator/auth.hh>
+
+namespace alternator {
+
+class server {
+    using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, tracing::trace_state_ptr, std::unique_ptr<request>)>;
+    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
+
+    seastar::httpd::http_server_control _control;
+    seastar::httpd::http_server_control _https_control;
+    seastar::sharded<executor>& _executor;
+    key_cache _key_cache;
+    bool _enforce_authorization;
+    alternator_callbacks_map _callbacks;
+public:
+    server(seastar::sharded<executor>& executor);
+
+    seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);
+private:
+    void set_routes(seastar::httpd::routes& r);
+    future<> verify_signature(const seastar::httpd::request& r);
+    future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);
+};
+
+}
+
--- a/alternator/stats.cc
+++ b/alternator/stats.cc
@@ -0,0 +1,98 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "stats.hh"
+
+#include <seastar/core/metrics.hh>
+
+namespace alternator {
+
+const char* ALTERNATOR_METRICS = "alternator";
+
+stats::stats() : api_operations{} {
+    // Register the per-operation counters and latency histograms, labeled by operation name ("op").
+    seastar::metrics::label op("op");
+
+    _metrics.add_group("alternator", {
+#define OPERATION(name, CamelCaseName) \
+                seastar::metrics::make_total_operations("operation", api_operations.name, \
+                        seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
+#define OPERATION_LATENCY(name, CamelCaseName) \
+                seastar::metrics::make_histogram("op_latency", \
+                        seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
+            OPERATION(batch_write_item, "BatchWriteItem")
+            OPERATION(create_backup, "CreateBackup")
+            OPERATION(create_global_table, "CreateGlobalTable")
+            OPERATION(create_table, "CreateTable")
+            OPERATION(delete_backup, "DeleteBackup")
+            OPERATION(delete_item, "DeleteItem")
+            OPERATION(delete_table, "DeleteTable")
+            OPERATION(describe_backup, "DescribeBackup")
+            OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
+            OPERATION(describe_endpoints, "DescribeEndpoints")
+            OPERATION(describe_global_table, "DescribeGlobalTable")
+            OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
+            OPERATION(describe_limits, "DescribeLimits")
+            OPERATION(describe_table, "DescribeTable")
+            OPERATION(describe_time_to_live, "DescribeTimeToLive")
+            OPERATION(get_item, "GetItem")
+            OPERATION(list_backups, "ListBackups")
+            OPERATION(list_global_tables, "ListGlobalTables")
+            OPERATION(list_tables, "ListTables")
+            OPERATION(list_tags_of_resource, "ListTagsOfResource")
+            OPERATION(put_item, "PutItem")
+            OPERATION(query, "Query")
+            OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
+            OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
+            OPERATION(scan, "Scan")
+            OPERATION(tag_resource, "TagResource")
+            OPERATION(transact_get_items, "TransactGetItems")
+            OPERATION(transact_write_items, "TransactWriteItems")
+            OPERATION(untag_resource, "UntagResource")
+            OPERATION(update_continuous_backups, "UpdateContinuousBackups")
+            OPERATION(update_global_table, "UpdateGlobalTable")
+            OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
+            OPERATION(update_item, "UpdateItem")
+            OPERATION(update_table, "UpdateTable")
+            OPERATION(update_time_to_live, "UpdateTimeToLive")
+            OPERATION_LATENCY(put_item_latency, "PutItem")
+            OPERATION_LATENCY(get_item_latency, "GetItem")
+            OPERATION_LATENCY(delete_item_latency, "DeleteItem")
+            OPERATION_LATENCY(update_item_latency, "UpdateItem")
+    });
+    _metrics.add_group("alternator", {
+            seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
+                    seastar::metrics::description("number of unsupported operations via Alternator API")),
+            seastar::metrics::make_total_operations("total_operations", total_operations,
+                    seastar::metrics::description("number of total operations via Alternator API")),
+            seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
+                    seastar::metrics::description("number of performed read-before-write operations")),
+            seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
+                    seastar::metrics::description("number of rows read during filtering operations")),
+            seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
+                    seastar::metrics::description("number of rows read and matched during filtering operations")),
+            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
+                    seastar::metrics::description("number of rows read and dropped during filtering operations")),
+    });
+}
+
+
+}
--- a/alternator/stats.hh
+++ b/alternator/stats.hh
@@ -0,0 +1,95 @@
+/*
+ * Copyright 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <cstdint>
+
+#include <seastar/core/metrics_registration.hh>
+#include "seastarx.hh"
+#include "utils/estimated_histogram.hh"
+#include "cql3/stats.hh"
+
+namespace alternator {
+
+// Object holding per-shard statistics related to Alternator.
+// While this object is alive, these metrics are also registered to be
+// visible by the metrics REST API, with the "alternator" prefix.
+class stats {
+public:
+    stats();
+    // Count of DynamoDB API operations, by operation type
+    struct {
+        uint64_t batch_get_item = 0;
+        uint64_t batch_write_item = 0;
+        uint64_t create_backup = 0;
+        uint64_t create_global_table = 0;
+        uint64_t create_table = 0;
+        uint64_t delete_backup = 0;
+        uint64_t delete_item = 0;
+        uint64_t delete_table = 0;
+        uint64_t describe_backup = 0;
+        uint64_t describe_continuous_backups = 0;
+        uint64_t describe_endpoints = 0;
+        uint64_t describe_global_table = 0;
+        uint64_t describe_global_table_settings = 0;
+        uint64_t describe_limits = 0;
+        uint64_t describe_table = 0;
+        uint64_t describe_time_to_live = 0;
+        uint64_t get_item = 0;
+        uint64_t list_backups = 0;
+        uint64_t list_global_tables = 0;
+        uint64_t list_tables = 0;
+        uint64_t list_tags_of_resource = 0;
+        uint64_t put_item = 0;
+        uint64_t query = 0;
+        uint64_t restore_table_from_backup = 0;
+        uint64_t restore_table_to_point_in_time = 0;
+        uint64_t scan = 0;
+        uint64_t tag_resource = 0;
+        uint64_t transact_get_items = 0;
+        uint64_t transact_write_items = 0;
+        uint64_t untag_resource = 0;
+        uint64_t update_continuous_backups = 0;
+        uint64_t update_global_table = 0;
+        uint64_t update_global_table_settings = 0;
+        uint64_t update_item = 0;
+        uint64_t update_table = 0;
+        uint64_t update_time_to_live = 0;
+
+        utils::estimated_histogram put_item_latency;
+        utils::estimated_histogram get_item_latency;
+        utils::estimated_histogram delete_item_latency;
+        utils::estimated_histogram update_item_latency;
+    } api_operations;
+    // Miscellaneous event counters
+    uint64_t total_operations = 0;
+    uint64_t unsupported_operations = 0;
+    uint64_t reads_before_write = 0;
+    // CQL-derived stats
+    cql3::cql_stats cql_stats;
+private:
+    // The metric_groups object keeps this stats object's metrics registered
+    // for as long as the stats object is alive.
+    seastar::metrics::metric_groups _metrics;
+};
+
+}
--- a/api/api-doc/cache_service.json
+++ b/api/api-doc/cache_service.json
@@ -13,7 +13,7 @@
            {
               "method":"GET",
               "summary":"get row cache save period in seconds",
-               "type":"int",
+               "type": "long",
               "nickname":"get_row_cache_save_period_in_seconds",
               "produces":[
                  "application/json"
@@ -35,7 +35,7 @@
                     "description":"row cache save period in seconds",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -48,7 +48,7 @@
            {
               "method":"GET",
               "summary":"get key cache save period in seconds",
-               "type":"int",
+               "type": "long",
               "nickname":"get_key_cache_save_period_in_seconds",
               "produces":[
                  "application/json"
@@ -70,7 +70,7 @@
                     "description":"key cache save period in seconds",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -83,7 +83,7 @@
            {
               "method":"GET",
               "summary":"get counter cache save period in seconds",
-               "type":"int",
+               "type": "long",
               "nickname":"get_counter_cache_save_period_in_seconds",
               "produces":[
                  "application/json"
@@ -105,7 +105,7 @@
                     "description":"counter cache save period in seconds",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -118,7 +118,7 @@
            {
               "method":"GET",
               "summary":"get row cache keys to save",
-               "type":"int",
+               "type": "long",
               "nickname":"get_row_cache_keys_to_save",
               "produces":[
                  "application/json"
@@ -140,7 +140,7 @@
                     "description":"row cache keys to save",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -153,7 +153,7 @@
            {
               "method":"GET",
               "summary":"get key cache keys to save",
-               "type":"int",
+               "type": "long",
               "nickname":"get_key_cache_keys_to_save",
               "produces":[
                  "application/json"
@@ -175,7 +175,7 @@
                     "description":"key cache keys to save",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -188,7 +188,7 @@
            {
               "method":"GET",
               "summary":"get counter cache keys to save",
-               "type":"int",
+               "type": "long",
               "nickname":"get_counter_cache_keys_to_save",
               "produces":[
                  "application/json"
@@ -210,7 +210,7 @@
                     "description":"counter cache keys to save",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -448,7 +448,7 @@
        {
          "method": "GET",
          "summary": "Get key entries",
-          "type": "int",
+          "type": "long",
          "nickname": "get_key_entries",
          "produces": [
            "application/json"
@@ -568,7 +568,7 @@
        {
          "method": "GET",
          "summary": "Get row entries",
-          "type": "int",
+          "type": "long",
          "nickname": "get_row_entries",
          "produces": [
            "application/json"
@@ -688,7 +688,7 @@
        {
          "method": "GET",
          "summary": "Get counter entries",
-          "type": "int",
+          "type": "long",
          "nickname": "get_counter_entries",
          "produces": [
            "application/json"
--- a/api/api-doc/column_family.json
+++ b/api/api-doc/column_family.json
@@ -121,7 +121,7 @@
                     "description":"The minimum number of sstables in queue before compaction kicks off",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -172,7 +172,7 @@
                     "description":"The maximum number of sstables in queue before compaction kicks off",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -223,7 +223,7 @@
                     "description":"The maximum number of sstables in queue before compaction kicks off",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  },
                  {
@@ -231,7 +231,7 @@
                     "description":"The minimum number of sstables in queue before compaction kicks off",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -544,7 +544,7 @@
               "summary":"sstable count for each level. empty unless leveled compaction is used",
               "type":"array",
               "items":{
-                  "type":"int"
+                  "type": "long"
               },
               "nickname":"get_sstable_count_per_level",
               "produces":[
@@ -636,7 +636,7 @@
                     "description":"Duration (in milliseconds) of monitoring operation",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  },
                  {
@@ -644,7 +644,7 @@
                    "description":"number of the top partitions to list",
                    "required":false,
                    "allowMultiple":false,
-                    "type":"int",
+                    "type": "long",
                    "paramType":"query"
                 },
                 {
@@ -652,7 +652,7 @@
                    "description":"capacity of stream summary: determines amount of resources used in query processing",
                    "required":false,
                    "allowMultiple":false,
-                    "type":"int",
+                    "type": "long",
                    "paramType":"query"
                 }
              ]
@@ -921,7 +921,7 @@
            {
               "method":"GET",
               "summary":"Get memtable switch count",
-               "type":"int",
+               "type": "long",
               "nickname":"get_memtable_switch_count",
               "produces":[
                  "application/json"
@@ -945,7 +945,7 @@
            {
               "method":"GET",
               "summary":"Get all memtable switch count",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_memtable_switch_count",
               "produces":[
                  "application/json"
@@ -1082,7 +1082,7 @@
            {
               "method":"GET",
               "summary":"Get read latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_read_latency",
               "produces":[
                  "application/json"
@@ -1235,7 +1235,7 @@
            {
               "method":"GET",
               "summary":"Get all read latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_read_latency",
               "produces":[
                  "application/json"
@@ -1251,7 +1251,7 @@
            {
               "method":"GET",
               "summary":"Get range latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_range_latency",
               "produces":[
                  "application/json"
@@ -1275,7 +1275,7 @@
            {
               "method":"GET",
               "summary":"Get all range latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_range_latency",
               "produces":[
                  "application/json"
@@ -1291,7 +1291,7 @@
            {
               "method":"GET",
               "summary":"Get write latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_write_latency",
               "produces":[
                  "application/json"
@@ -1444,7 +1444,7 @@
            {
               "method":"GET",
               "summary":"Get all write latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_write_latency",
               "produces":[
                  "application/json"
@@ -1460,7 +1460,7 @@
            {
               "method":"GET",
               "summary":"Get pending flushes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_pending_flushes",
               "produces":[
                  "application/json"
@@ -1484,7 +1484,7 @@
            {
               "method":"GET",
               "summary":"Get all pending flushes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_pending_flushes",
               "produces":[
                  "application/json"
@@ -1500,7 +1500,7 @@
            {
               "method":"GET",
               "summary":"Get pending compactions",
-               "type":"int",
+               "type": "long",
               "nickname":"get_pending_compactions",
               "produces":[
                  "application/json"
@@ -1524,7 +1524,7 @@
            {
               "method":"GET",
               "summary":"Get all pending compactions",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_pending_compactions",
               "produces":[
                  "application/json"
@@ -1540,7 +1540,7 @@
            {
               "method":"GET",
               "summary":"Get live ss table count",
-               "type":"int",
+               "type": "long",
               "nickname":"get_live_ss_table_count",
               "produces":[
                  "application/json"
@@ -1564,7 +1564,7 @@
            {
               "method":"GET",
               "summary":"Get all live ss table count",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_live_ss_table_count",
               "produces":[
                  "application/json"
@@ -1580,7 +1580,7 @@
            {
               "method":"GET",
               "summary":"Get live disk space used",
-               "type":"int",
+               "type": "long",
               "nickname":"get_live_disk_space_used",
               "produces":[
                  "application/json"
@@ -1604,7 +1604,7 @@
            {
               "method":"GET",
               "summary":"Get all live disk space used",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_live_disk_space_used",
               "produces":[
                  "application/json"
@@ -1620,7 +1620,7 @@
            {
               "method":"GET",
               "summary":"Get total disk space used",
-               "type":"int",
+               "type": "long",
               "nickname":"get_total_disk_space_used",
               "produces":[
                  "application/json"
@@ -1644,7 +1644,7 @@
            {
               "method":"GET",
               "summary":"Get all total disk space used",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_total_disk_space_used",
               "produces":[
                  "application/json"
@@ -2100,7 +2100,7 @@
            {
               "method":"GET",
               "summary":"Get speculative retries",
-               "type":"int",
+               "type": "long",
               "nickname":"get_speculative_retries",
               "produces":[
                  "application/json"
@@ -2124,7 +2124,7 @@
            {
               "method":"GET",
               "summary":"Get all speculative retries",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_speculative_retries",
               "produces":[
                  "application/json"
@@ -2204,7 +2204,7 @@
            {
               "method":"GET",
               "summary":"Get row cache hit out of range",
-               "type":"int",
+               "type": "long",
               "nickname":"get_row_cache_hit_out_of_range",
               "produces":[
                  "application/json"
@@ -2228,7 +2228,7 @@
            {
               "method":"GET",
               "summary":"Get all row cache hit out of range",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_row_cache_hit_out_of_range",
               "produces":[
                  "application/json"
@@ -2244,7 +2244,7 @@
            {
               "method":"GET",
               "summary":"Get row cache hit",
-               "type":"int",
+               "type": "long",
               "nickname":"get_row_cache_hit",
               "produces":[
                  "application/json"
@@ -2268,7 +2268,7 @@
            {
               "method":"GET",
               "summary":"Get all row cache hit",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_row_cache_hit",
               "produces":[
                  "application/json"
@@ -2284,7 +2284,7 @@
            {
               "method":"GET",
               "summary":"Get row cache miss",
-               "type":"int",
+               "type": "long",
               "nickname":"get_row_cache_miss",
               "produces":[
                  "application/json"
@@ -2308,7 +2308,7 @@
            {
               "method":"GET",
               "summary":"Get all row cache miss",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_row_cache_miss",
               "produces":[
                  "application/json"
@@ -2324,7 +2324,7 @@
            {
               "method":"GET",
               "summary":"Get cas prepare",
-               "type":"int",
+               "type": "long",
               "nickname":"get_cas_prepare",
               "produces":[
                  "application/json"
@@ -2348,7 +2348,7 @@
            {
               "method":"GET",
               "summary":"Get cas propose",
-               "type":"int",
+               "type": "long",
               "nickname":"get_cas_propose",
               "produces":[
                  "application/json"
@@ -2372,7 +2372,7 @@
            {
               "method":"GET",
               "summary":"Get cas commit",
-               "type":"int",
+               "type": "long",
               "nickname":"get_cas_commit",
               "produces":[
                  "application/json"
--- a/api/api-doc/compaction_manager.json
+++ b/api/api-doc/compaction_manager.json
@@ -118,7 +118,7 @@
        {
          "method": "GET",
          "summary": "Get pending tasks",
-          "type": "int",
+          "type": "long",
          "nickname": "get_pending_tasks",
          "produces": [
            "application/json"
@@ -181,7 +181,7 @@
        {
          "method": "GET",
          "summary": "Get bytes compacted",
-          "type": "int",
+          "type": "long",
          "nickname": "get_bytes_compacted",
          "produces": [
            "application/json"
@@ -197,7 +197,7 @@
         "description":"A row merged information",
         "properties":{
            "key":{
-               "type":"int",
+               "type": "long",
               "description":"The number of sstable"
            },
            "value":{
--- a/api/api-doc/failure_detector.json
+++ b/api/api-doc/failure_detector.json
@@ -110,7 +110,7 @@
            {
               "method":"GET",
               "summary":"Get count down endpoint",
-               "type":"int",
+               "type": "long",
               "nickname":"get_down_endpoint_count",
               "produces":[
                  "application/json"
@@ -126,7 +126,7 @@
            {
               "method":"GET",
               "summary":"Get count up endpoint",
-               "type":"int",
+               "type": "long",
               "nickname":"get_up_endpoint_count",
               "produces":[
                  "application/json"
@@ -180,11 +180,11 @@
                    "description": "The endpoint address"
                },
                "generation": {
-                    "type": "int",
+                    "type": "long",
                    "description": "The heart beat generation"
                },
                "version": {
-                    "type": "int",
+                    "type": "long",
                    "description": "The heart beat version"
                },
                "update_time": {
@@ -209,7 +209,7 @@
           "description": "Holds a version value for an application state",
               "properties": {
                "application_state": {
-                    "type": "int",
+                    "type": "long",
                    "description": "The application state enum index"
                },
                "value": {
@@ -217,7 +217,7 @@
                    "description": "The version value"
                },
                "version": {
-                    "type": "int",
+                    "type": "long",
                    "description": "The application state version"
                }
            }
--- a/api/api-doc/gossiper.json
+++ b/api/api-doc/gossiper.json
@@ -75,7 +75,7 @@
            {
               "method":"GET",
               "summary":"Returns files which are pending for archival attempt. Does NOT include failed archive attempts",
-               "type":"int",
+               "type": "long",
               "nickname":"get_current_generation_number",
               "produces":[
                  "application/json"
@@ -99,7 +99,7 @@
            {
               "method":"GET",
               "summary":"Get heart beat version for a node",
-               "type":"int",
+               "type": "long",
               "nickname":"get_current_heart_beat_version",
               "produces":[
                  "application/json"
--- a/api/api-doc/hinted_handoff.json
+++ b/api/api-doc/hinted_handoff.json
@@ -99,7 +99,7 @@
        {
          "method": "GET",
          "summary": "Get create hint count",
-          "type": "int",
+          "type": "long",
          "nickname": "get_create_hint_count",
          "produces": [
            "application/json"
@@ -123,7 +123,7 @@
        {
          "method": "GET",
          "summary": "Get not stored hints count",
-          "type": "int",
+          "type": "long",
          "nickname": "get_not_stored_hints_count",
          "produces": [
            "application/json"
--- a/api/api-doc/messaging_service.json
+++ b/api/api-doc/messaging_service.json
@@ -191,7 +191,7 @@
            {
               "method":"GET",
               "summary":"Get the version number",
-               "type":"int",
+               "type": "long",
               "nickname":"get_version",
               "produces":[
                  "application/json"
--- a/api/api-doc/storage_proxy.json
+++ b/api/api-doc/storage_proxy.json
@@ -105,7 +105,7 @@
            {
               "method":"GET",
               "summary":"Get the max hint window",
-               "type":"int",
+               "type": "long",
               "nickname":"get_max_hint_window",
               "produces":[
                  "application/json"
@@ -128,7 +128,7 @@
                     "description":"max hint window in ms",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -141,7 +141,7 @@
            {
               "method":"GET",
               "summary":"Get max hints in progress",
-               "type":"int",
+               "type": "long",
               "nickname":"get_max_hints_in_progress",
               "produces":[
                  "application/json"
@@ -164,7 +164,7 @@
                     "description":"max hints in progress",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -177,7 +177,7 @@
            {
               "method":"GET",
               "summary":"get hints in progress",
-               "type":"int",
+               "type": "long",
               "nickname":"get_hints_in_progress",
               "produces":[
                  "application/json"
@@ -602,7 +602,7 @@
        {
          "method": "GET",
          "summary": "Get cas write metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_cas_write_metrics_unfinished_commit",
          "produces": [
            "application/json"
@@ -632,7 +632,7 @@
        {
          "method": "GET",
          "summary": "Get cas write metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_cas_write_metrics_condition_not_met",
          "produces": [
            "application/json"
@@ -647,7 +647,7 @@
        {
          "method": "GET",
          "summary": "Get cas read metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_cas_read_metrics_unfinished_commit",
          "produces": [
            "application/json"
@@ -671,28 +671,13 @@
        }
      ]
    },
-    {
-      "path": "/storage_proxy/metrics/cas_read/condition_not_met",
-      "operations": [
-        {
-          "method": "GET",
-          "summary": "Get cas read metrics",
-          "type": "int",
-          "nickname": "get_cas_read_metrics_condition_not_met",
-          "produces": [
-            "application/json"
-          ],
-          "parameters": []
-        }
-      ]
-    },
    {
      "path": "/storage_proxy/metrics/read/timeouts",
      "operations": [
        {
          "method": "GET",
          "summary": "Get read metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_read_metrics_timeouts",
          "produces": [
            "application/json"
@@ -707,7 +692,7 @@
        {
          "method": "GET",
          "summary": "Get read metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_read_metrics_unavailables",
          "produces": [
            "application/json"
@@ -842,7 +827,7 @@
        {
          "method": "GET",
          "summary": "Get range metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_range_metrics_timeouts",
          "produces": [
            "application/json"
@@ -857,7 +842,7 @@
        {
          "method": "GET",
          "summary": "Get range metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_range_metrics_unavailables",
          "produces": [
            "application/json"
@@ -902,7 +887,7 @@
        {
          "method": "GET",
          "summary": "Get write metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_write_metrics_timeouts",
          "produces": [
            "application/json"
@@ -917,7 +902,7 @@
        {
          "method": "GET",
          "summary": "Get write metrics",
-          "type": "int",
+          "type": "long",
          "nickname": "get_write_metrics_unavailables",
          "produces": [
            "application/json"
@@ -1023,7 +1008,7 @@
            {
               "method":"GET",
               "summary":"Get read latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_read_latency",
               "produces":[
                  "application/json"
@@ -1055,7 +1040,7 @@
            {
               "method":"GET",
               "summary":"Get write latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_write_latency",
               "produces":[
                  "application/json"
@@ -1087,7 +1072,7 @@
            {
               "method":"GET",
               "summary":"Get range latency",
-               "type":"int",
+               "type": "long",
               "nickname":"get_range_latency",
               "produces":[
                  "application/json"
--- a/api/api-doc/storage_service.json
+++ b/api/api-doc/storage_service.json
@@ -458,7 +458,7 @@
            {
               "method":"GET",
               "summary":"Return the generation value for this node.",
-               "type":"int",
+               "type": "long",
               "nickname":"get_current_generation_number",
               "produces":[
                  "application/json"
@@ -646,7 +646,7 @@
            {
               "method":"POST",
               "summary":"Trigger a cleanup of keys on a single keyspace",
-               "type":"int",
+               "type": "long",
               "nickname":"force_keyspace_cleanup",
               "produces":[
                  "application/json"
@@ -678,7 +678,7 @@
            {
               "method":"GET",
               "summary":"Scrub (deserialize + reserialize at the latest version, skipping bad rows if any) the given keyspace. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false",
-               "type":"int",
+               "type": "long",
               "nickname":"scrub",
               "produces":[
                  "application/json"
@@ -726,7 +726,7 @@
            {
               "method":"GET",
               "summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first.",
-               "type":"int",
+               "type": "long",
               "nickname":"upgrade_sstables",
               "produces":[
                  "application/json"
@@ -800,7 +800,7 @@
               "summary":"Return an array with the ids of the currently active repairs",
               "type":"array",
               "items":{
-                  "type":"int"
+                  "type": "long"
               },
               "nickname":"get_active_repair_async",
               "produces":[
@@ -816,7 +816,7 @@
            {
               "method":"POST",
               "summary":"Invoke repair asynchronously. You can track repair progress by using the get supplying id",
-               "type":"int",
+               "type": "long",
               "nickname":"repair_async",
               "produces":[
                  "application/json"
@@ -947,7 +947,7 @@
                     "description":"The repair ID to check for status",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -1277,18 +1277,18 @@
                  },
                  {
                     "name":"dynamic_update_interval",
-                     "description":"integer, in ms (default 100)",
+                     "description":"interval in ms (default 100)",
                     "required":false,
                     "allowMultiple":false,
-                     "type":"integer",
+                     "type":"long",
                     "paramType":"query"
                  },
                  {
                     "name":"dynamic_reset_interval",
-                     "description":"integer, in ms (default 600,000)",
+                     "description":"interval in ms (default 600,000)",
                     "required":false,
                     "allowMultiple":false,
-                     "type":"integer",
+                     "type":"long",
                     "paramType":"query"
                  },
                  {
@@ -1493,7 +1493,7 @@
                     "description":"Stream throughput",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -1501,7 +1501,7 @@
            {
               "method":"GET",
               "summary":"Get stream throughput mb per sec",
-               "type":"int",
+               "type": "long",
               "nickname":"get_stream_throughput_mb_per_sec",
               "produces":[
                  "application/json"
@@ -1517,7 +1517,7 @@
            {
               "method":"GET",
               "summary":"get compaction throughput mb per sec",
-               "type":"int",
+               "type": "long",
               "nickname":"get_compaction_throughput_mb_per_sec",
               "produces":[
                  "application/json"
@@ -1539,7 +1539,7 @@
                     "description":"compaction throughput",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -1943,7 +1943,7 @@
            {
               "method":"GET",
               "summary":"Returns the threshold for warning of queries with many tombstones",
-               "type":"int",
+               "type": "long",
               "nickname":"get_tombstone_warn_threshold",
               "produces":[
                  "application/json"
@@ -1965,7 +1965,7 @@
                     "description":"tombstone debug threshold",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -1978,7 +1978,7 @@
            {
               "method":"GET",
               "summary":"",
-               "type":"int",
+               "type": "long",
               "nickname":"get_tombstone_failure_threshold",
               "produces":[
                  "application/json"
@@ -2000,7 +2000,7 @@
                     "description":"tombstone debug threshold",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -2013,7 +2013,7 @@
            {
               "method":"GET",
               "summary":"Returns the threshold for rejecting queries due to a large batch size",
-               "type":"int",
+               "type": "long",
               "nickname":"get_batch_size_failure_threshold",
               "produces":[
                  "application/json"
@@ -2035,7 +2035,7 @@
                     "description":"batch size debug threshold",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -2059,7 +2059,7 @@
                     "description":"throttle in kb",
                     "required":true,
                     "allowMultiple":false,
-                     "type":"int",
+                     "type": "long",
                     "paramType":"query"
                  }
               ]
@@ -2072,7 +2072,7 @@
            {
               "method":"GET",
               "summary":"Get load",
-               "type":"int",
+               "type": "long",
               "nickname":"get_metrics_load",
               "produces":[
                  "application/json"
@@ -2088,7 +2088,7 @@
            {
               "method":"GET",
               "summary":"Get exceptions",
-               "type":"int",
+               "type": "long",
               "nickname":"get_exceptions",
               "produces":[
                  "application/json"
@@ -2104,7 +2104,7 @@
            {
               "method":"GET",
               "summary":"Get total hints in progress",
-               "type":"int",
+               "type": "long",
               "nickname":"get_total_hints_in_progress",
               "produces":[
                  "application/json"
@@ -2120,7 +2120,7 @@
            {
               "method":"GET",
               "summary":"Get total hints",
-               "type":"int",
+               "type": "long",
               "nickname":"get_total_hints",
               "produces":[
                  "application/json"
--- a/api/api-doc/stream_manager.json
+++ b/api/api-doc/stream_manager.json
@@ -32,7 +32,7 @@
            {
               "method":"GET",
               "summary":"Get number of active outbound streams",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_active_streams_outbound",
               "produces":[
                  "application/json"
@@ -48,7 +48,7 @@
            {
               "method":"GET",
               "summary":"Get total incoming bytes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_total_incoming_bytes",
               "produces":[
                  "application/json"
@@ -72,7 +72,7 @@
            {
               "method":"GET",
               "summary":"Get all total incoming bytes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_total_incoming_bytes",
               "produces":[
                  "application/json"
@@ -88,7 +88,7 @@
            {
               "method":"GET",
               "summary":"Get total outgoing bytes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_total_outgoing_bytes",
               "produces":[
                  "application/json"
@@ -112,7 +112,7 @@
            {
               "method":"GET",
               "summary":"Get all total outgoing bytes",
-               "type":"int",
+               "type": "long",
               "nickname":"get_all_total_outgoing_bytes",
               "produces":[
                  "application/json"
@@ -154,7 +154,7 @@
               "description":"The peer"
            },
            "session_index":{
-               "type":"int",
+               "type": "long",
               "description":"The session index"
            },
            "connecting":{
@@ -211,7 +211,7 @@
               "description":"The ID"
            },
            "files":{
-               "type":"int",
+               "type": "long",
               "description":"Number of files to transfer. Can be 0 if nothing to transfer for some streaming request."
            },
            "total_size":{
@@ -242,7 +242,7 @@
               "description":"The peer address"
            },
            "session_index":{
-               "type":"int",
+               "type": "long",
               "description":"The session index"
            },
            "file_name":{
--- a/api/api-doc/system.json
+++ b/api/api-doc/system.json
@@ -52,6 +52,21 @@
            }
         ]
      },
+      {
+         "path":"/system/uptime_ms",
+         "operations":[
+            {
+               "method":"GET",
+               "summary":"Get system uptime, in milliseconds",
+               "type":"long",
+               "nickname":"get_system_uptime",
+               "produces":[
+                  "application/json"
+               ],
+               "parameters":[]
+            }
+         ]
+      },
      {
         "path":"/system/logger/{name}",
         "operations":[
--- a/api/api_init.hh
+++ b/api/api_init.hh
@@ -23,6 +23,8 @@
 #include "service/storage_proxy.hh"
 #include <seastar/http/httpd.hh>

+namespace service { class load_meter; }
+
 namespace api {

 struct http_context {
@@ -31,9 +33,11 @@ struct http_context {
    httpd::http_server_control http_server;
    distributed<database>& db;
    distributed<service::storage_proxy>& sp;
+    service::load_meter& lmeter;
    http_context(distributed<database>& _db,
-            distributed<service::storage_proxy>& _sp)
-            : db(_db), sp(_sp) {
+            distributed<service::storage_proxy>& _sp,
+            service::load_meter& _lm)
+            : db(_db), sp(_sp), lmeter(_lm) {
    }
 };

--- a/api/column_family.cc
+++ b/api/column_family.cc
@@ -26,7 +26,7 @@
 #include "sstables/sstables.hh"
 #include "utils/estimated_histogram.hh"
 #include <algorithm>
-
+#include "db/system_keyspace_view_types.hh"
 #include "db/data_listeners.hh"

 extern logging::logger apilog;
@@ -53,8 +53,7 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
    return std::make_tuple(name.substr(0, pos), name.substr(end));
 }

-const utils::UUID& get_uuid(const sstring& name, const database& db) {
-    auto [ks, cf] = parse_fully_qualified_cf_name(name);
+const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
    try {
        return db.find_uuid(ks, cf);
    } catch (std::out_of_range& e) {
@@ -62,6 +61,11 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
    }
 }

+const utils::UUID& get_uuid(const sstring& name, const database& db) {
+    auto [ks, cf] = parse_fully_qualified_cf_name(name);
+    return get_uuid(ks, cf, db);
+}
+
 future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
    auto uuid = get_uuid(name, ctx.db.local());

@@ -71,28 +75,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
 }

 future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,
-        int64_t column_family::stats::*f) {
+        int64_t column_family_stats::*f) {
    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
        return cf.get_stats().*f;
    }, std::plus<int64_t>());
 }

 future<json::json_return_type>  get_cf_stats(http_context& ctx,
-        int64_t column_family::stats::*f) {
+        int64_t column_family_stats::*f) {
    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
        return cf.get_stats().*f;
    }, std::plus<int64_t>());
 }

 static future<json::json_return_type>  get_cf_stats_count(http_context& ctx, const sstring& name,
-        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
        return (cf.get_stats().*f).hist.count;
    }, std::plus<int64_t>());
 }

 static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const sstring& name,
-        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    auto uuid = get_uuid(name, ctx.db.local());
    return ctx.db.map_reduce0([uuid, f](database& db) {
        // Histograms information is sample of the actual load
@@ -108,14 +112,14 @@ static future<json::json_return_type>  get_cf_stats_sum(http_context& ctx, const


 static future<json::json_return_type>  get_cf_stats_count(http_context& ctx,
-        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
        return (cf.get_stats().*f).hist.count;
    }, std::plus<int64_t>());
 }

 static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const sstring& name,
-        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    utils::UUID uuid = get_uuid(name, ctx.db.local());
    return ctx.db.map_reduce0([f, uuid](const database& p) {
        return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -126,7 +130,7 @@ static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const
    });
 }

-static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    std::function<utils::ihistogram(const database&)> fun = [f] (const database& db)  {
        utils::ihistogram res;
        for (auto i : db.get_column_families()) {
@@ -142,7 +146,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
 }

 static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
-        utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+        utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    utils::UUID uuid = get_uuid(name, ctx.db.local());
    return ctx.db.map_reduce0([f, uuid](const database& p) {
        return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -153,7 +157,7 @@ static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& c
    });
 }

-static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
+static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
    std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db)  {
        utils::rate_moving_average_and_histogram res;
        for (auto i : db.get_column_families()) {
@@ -250,12 +254,11 @@ class sum_ratio {
    uint64_t _n = 0;
    T _total = 0;
 public:
-    future<> operator()(T value) {
+    void operator()(T value) {
        if (value > 0) {
            _total += value;
            _n++;
        }
-        return make_ready_future<>();
    }
    // Returns average value of all registered ratios.
    T get() && {
@@ -404,11 +407,11 @@ void set_column_family(http_context& ctx, routes& r) {
    });

    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
+        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
    });

    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
+        return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
    });

    // FIXME: this refers to partitions, not rows.
@@ -453,67 +456,67 @@ void set_column_family(http_context& ctx, routes& r) {
    });

    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
+        return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
    });

    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, &column_family::stats::pending_flushes);
+        return get_cf_stats(ctx, &column_family_stats::pending_flushes);
    });

    cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
+        return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
    });

    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_count(ctx, &column_family::stats::reads);
+        return get_cf_stats_count(ctx, &column_family_stats::reads);
    });

    cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
+        return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
    });

    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_count(ctx, &column_family::stats::writes);
+        return get_cf_stats_count(ctx, &column_family_stats::writes);
    });

    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
+        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
    });

    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
+        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
    });

    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
+        return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
    });

    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
+        return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
    });

    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, &column_family::stats::writes);
+        return get_cf_histogram(ctx, &column_family_stats::writes);
    });

    cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
+        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
    });

    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
+        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
    });

    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
+        return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
    });

    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, &column_family::stats::writes);
+        return get_cf_histogram(ctx, &column_family_stats::writes);
    });

    cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
+        return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
    });

    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -529,11 +532,11 @@ void set_column_family(http_context& ctx, routes& r) {
    });

    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
+        return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
    });

    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
+        return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
    });

    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -792,25 +795,25 @@ void set_column_family(http_context& ctx, routes& r) {

    });

-    cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        //auto id = get_uuid(req->param["name"], ctx.db.local());
-        return make_ready_future<json::json_return_type>(0);
+    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
+        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+            return cf.get_stats().estimated_cas_prepare;
+        },
+        utils::estimated_histogram_merge, utils_json::estimated_histogram());
    });

-    cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        //auto id = get_uuid(req->param["name"], ctx.db.local());
-        return make_ready_future<json::json_return_type>(0);
+    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
+        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+            return cf.get_stats().estimated_cas_propose;
+        },
+        utils::estimated_histogram_merge, utils_json::estimated_histogram());
    });

-    cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        //auto id = get_uuid(req->param["name"], ctx.db.local());
-        return make_ready_future<json::json_return_type>(0);
+    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
+        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
+            return cf.get_stats().estimated_cas_commit;
+        },
+        utils::estimated_histogram_merge, utils_json::estimated_histogram());
    });

    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -821,11 +824,11 @@ void set_column_family(http_context& ctx, routes& r) {
    });

    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
+        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
    });

    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
-        return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
+        return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
    });

    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -843,13 +846,28 @@ void set_column_family(http_context& ctx, routes& r) {
        return true;
    });

-    cf::get_built_indexes.set(r, [](const_req) {
-        // FIXME
-        // Currently there are no index support
-        return std::vector<sstring>();
+    cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
+        auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
+        return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
+            std::set<sstring> vp;
+            for (auto b : vb) {
+                if (b.view.first == ks) {
+                    vp.insert(b.view.second);
+                }
+            }
+            std::vector<sstring> res;
+            auto uuid = get_uuid(ks, cf_name, ctx.db.local());
+            column_family& cf = ctx.db.local().find_column_family(uuid);
+            res.reserve(cf.get_index_manager().list_indexes().size());
+            for (auto&& i : cf.get_index_manager().list_indexes()) {
+                if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
+                    res.emplace_back(i.metadata().name());
+                }
+            }
+            return make_ready_future<json::json_return_type>(res);
+        });
    });

-
    cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
        // FIXME
        // Currently there are no information on the compression
--- a/api/column_family.hh
+++ b/api/column_family.hh
@@ -109,9 +109,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
 }

 future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,
-        int64_t column_family::stats::*f);
+        int64_t column_family_stats::*f);

 future<json::json_return_type>  get_cf_stats(http_context& ctx,
-        int64_t column_family::stats::*f);
+        int64_t column_family_stats::*f);

 }
--- a/api/compaction_manager.cc
+++ b/api/compaction_manager.cc
@@ -74,13 +74,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {

    cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
        return ctx.db.map_reduce0([&ctx](database& db) {
-            std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> tasks;
-            return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
-                table& cf = *i.second.get();
-                tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
-                return make_ready_future<>();
-            }).then([&tasks] {
-                return tasks;
+            return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
+                return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
+                    table& cf = *i.second.get();
+                    tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
+                    return make_ready_future<>();
+                }).then([&tasks] {
+                    return std::move(tasks);
+                });
            });
        }, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
                [](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
--- a/api/storage_proxy.cc
+++ b/api/storage_proxy.cc
@@ -81,12 +81,9 @@ void set_storage_proxy(http_context& ctx, routes& r) {
        return make_ready_future<json::json_return_type>(0);
    });

-    sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {
-        //TBD
-        // FIXME
-        // hinted handoff is not supported currently,
-        // so we should return false
-        return make_ready_future<json::json_return_type>(false);
+    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req)  {
+        auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
+        return make_ready_future<json::json_return_type>(enabled);
    });

    sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {
@@ -250,68 +247,40 @@ void set_storage_proxy(http_context& ctx, routes& r) {
        });
    });

-    sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return 0
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
    });

-    sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return 0
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
    });

-    sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return 0
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
    });

-    sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return 0
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
    });

-    sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
    });

-    sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
    });

-    sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
    });

-    sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
    });

-    sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
-    });
-
-    sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(0);
+    sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
+        return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
    });

    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -382,19 +351,11 @@ void set_storage_proxy(http_context& ctx, routes& r) {
        return sum_timer_stats(ctx.sp, &proxy::stats::write);
    });
    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return empty moving average
-
-        return make_ready_future<json::json_return_type>(get_empty_moving_average());
+        return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
    });

    sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
-        //TBD
-        // FIXME
-        // cas is not supported yet, so just return empty moving average
-
-        return make_ready_future<json::json_return_type>(get_empty_moving_average());
+        return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
    });

    sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
--- a/api/storage_service.cc
+++ b/api/storage_service.cc
@@ -27,6 +27,7 @@
 #include <boost/range/adaptor/map.hpp>
 #include <boost/range/adaptor/filtered.hpp>
 #include "service/storage_service.hh"
+#include "service/load_meter.hh"
 #include "db/commitlog/commitlog.hh"
 #include "gms/gossiper.hh"
 #include "db/system_keyspace.hh"
@@ -55,26 +56,22 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
    throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
 }

-static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
-    std::vector<ss::token_range> res;
-    for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
-        ss::token_range r;
-        r.start_token = d._start_token;
-        r.end_token = d._end_token;
-        r.endpoints = d._endpoints;
-        r.rpc_endpoints = d._rpc_endpoints;
-        for (auto det : d._endpoint_details) {
-            ss::endpoint_detail ed;
-            ed.host = det._host;
-            ed.datacenter = det._datacenter;
-            if (det._rack != "") {
-                ed.rack = det._rack;
-            }
-            r.endpoint_details.push(ed);
+static ss::token_range token_range_endpoints_to_json(const dht::token_range_endpoints& d) {
+    ss::token_range r;
+    r.start_token = d._start_token;
+    r.end_token = d._end_token;
+    r.endpoints = d._endpoints;
+    r.rpc_endpoints = d._rpc_endpoints;
+    for (auto det : d._endpoint_details) {
+        ss::endpoint_detail ed;
+        ed.host = det._host;
+        ed.datacenter = det._datacenter;
+        if (det._rack != "") {
+            ed.rack = det._rack;
        }
-        res.push_back(r);
+        r.endpoint_details.push(ed);
    }
-    return res;
+    return r;
 }

 void set_storage_service(http_context& ctx, routes& r) {
@@ -176,13 +173,13 @@ void set_storage_service(http_context& ctx, routes& r) {
        return make_ready_future<json::json_return_type>(res);
    });

-    ss::describe_any_ring.set(r, [&ctx](const_req req) {
-        return describe_ring("");
+    ss::describe_any_ring.set(r, [&ctx](std::unique_ptr<request> req) {
+        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(""), token_range_endpoints_to_json));
    });

-    ss::describe_ring.set(r, [&ctx](const_req req) {
-        auto keyspace = validate_keyspace(ctx, req.param);
-        return describe_ring(keyspace);
+    ss::describe_ring.set(r, [&ctx](std::unique_ptr<request> req) {
+        auto keyspace = validate_keyspace(ctx, req->param);
+        return make_ready_future<json::json_return_type>(stream_range_as_array(service::get_local_storage_service().describe_ring(keyspace), token_range_endpoints_to_json));
    });

    ss::get_host_id_map.set(r, [](const_req req) {
@@ -192,11 +189,11 @@ void set_storage_service(http_context& ctx, routes& r) {
    });

    ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
+        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
    });

-    ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
-        return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
+    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<request> req) {
+        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {
            std::vector<ss::map_string_double> res;
            for (auto i : load_map) {
                ss::map_string_double val;
@@ -254,6 +251,9 @@ void set_storage_service(http_context& ctx, routes& r) {
        if (column_family.empty()) {
            resp = service::get_local_storage_service().take_snapshot(tag, keynames);
        } else {
+            if (keynames.empty()) {
+                throw httpd::bad_param_exception("The keyspace of column families must be specified");
+            }
            if (keynames.size() > 1) {
                throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
            }
@@ -304,17 +304,24 @@ void set_storage_service(http_context& ctx, routes& r) {
        if (column_families.empty()) {
            column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
        }
-        return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
-            std::vector<column_family*> column_families_vec;
-            auto& cm = db.get_compaction_manager();
-            for (auto cf : column_families) {
-                column_families_vec.push_back(&db.find_column_family(keyspace, cf));
+        return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
+                column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
+            if (!is_cleanup_allowed) {
+                return make_exception_future<json::json_return_type>(
+                        std::runtime_error("Can not perform cleanup operation when topology changes"));
            }
-            return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
-                return cm.perform_cleanup(cf);
+            return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
+                std::vector<column_family*> column_families_vec;
+                auto& cm = db.get_compaction_manager();
+                for (auto cf : column_families) {
+                    column_families_vec.push_back(&db.find_column_family(keyspace, cf));
+                }
+                return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
+                    return cm.perform_cleanup(cf);
+                });
+            }).then([]{
+                return make_ready_future<json::json_return_type>(0);
            });
-        }).then([]{
-            return make_ready_future<json::json_return_type>(0);
        });
    });

@@ -598,9 +605,7 @@ void set_storage_service(http_context& ctx, routes& r) {
    });

    ss::join_ring.set(r, [](std::unique_ptr<request> req) {
-        return service::get_local_storage_service().join_ring().then([] {
-            return make_ready_future<json::json_return_type>(json_void());
-        });
+        return make_ready_future<json::json_return_type>(json_void());
    });

    ss::is_joined.set(r, [] (std::unique_ptr<request> req) {
@@ -860,7 +865,7 @@ void set_storage_service(http_context& ctx, routes& r) {
    });

    ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
-        return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
+        return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
    });

    ss::get_exceptions.set(r, [](const_req req) {
--- a/api/system.cc
+++ b/api/system.cc
@@ -30,6 +30,10 @@ namespace api {
 namespace hs = httpd::system_json;

 void set_system(http_context& ctx, routes& r) {
+    hs::get_system_uptime.set(r, [](const_req req) {
+        return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
+    });
+
    hs::get_all_logger_names.set(r, [](const_req req) {
        return logging::logger_registry().get_all_logger_names();
    });
--- a/atomic_cell.cc
+++ b/atomic_cell.cc
@@ -21,8 +21,8 @@

 #include "atomic_cell.hh"
 #include "atomic_cell_or_collection.hh"
+#include "counters.hh"
 #include "types.hh"
-#include "types/collection.hh"

 /// LSA mirator for cells with irrelevant type
 ///
@@ -148,35 +148,6 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,
 {
 }

-static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
-{
-    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
-    auto ti = data::type_info::make_collection();
-    data::cell::context ctx(f, ti);
-    auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
-    auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
-    return collection_mutation_view { dv };
-}
-
-collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
-    return get_collection_mutation_view(_data.get());
-}
-
-collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
-    : _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
-{
-}
-
-collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
-    : _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
-{
-}
-
-collection_mutation::operator collection_mutation_view() const
-{
-    return get_collection_mutation_view(_data.get());
-}
-
 bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
 {
    auto ptr_a = _data.get();
@@ -231,7 +202,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
    size_t external_value_size = 0;
    if (flags.get<data::cell::tags::external_data>()) {
        if (flags.get<data::cell::tags::collection>()) {
-            external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
+            external_value_size = as_collection_mutation().data.size_bytes();
        } else {
            auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
            external_value_size = cell_view.value_size();
@@ -244,6 +215,61 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
        + imr_object_type::size_overhead + external_value_size;
 }

+std::ostream&
+operator<<(std::ostream& os, const atomic_cell_view& acv) {
+    if (acv.is_live()) {
+        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
+            acv.is_counter_update()
+                    ? "counter_update_value=" + to_sstring(acv.counter_update_value())
+                    : to_hex(acv.value().linearize()),
+            acv.timestamp(),
+            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
+            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
+    } else {
+        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
+            acv.timestamp(), acv.deletion_time().time_since_epoch().count());
+    }
+}
+
+std::ostream&
+operator<<(std::ostream& os, const atomic_cell& ac) {
+    return os << atomic_cell_view(ac);
+}
+
+std::ostream&
+operator<<(std::ostream& os, const atomic_cell_view::printer& acvp) {
+    auto& type = acvp._type;
+    auto& acv = acvp._cell;
+    if (acv.is_live()) {
+        std::ostringstream cell_value_string_builder;
+        if (type.is_counter()) {
+            if (acv.is_counter_update()) {
+                cell_value_string_builder << "counter_update_value=" << acv.counter_update_value();
+            } else {
+                cell_value_string_builder << "shards: ";
+                counter_cell_view::with_linearized(acv, [&cell_value_string_builder] (counter_cell_view& ccv) {
+                    cell_value_string_builder << ::join(", ", ccv.shards());
+                });
+            }
+        } else {
+            cell_value_string_builder << type.to_string(acv.value().linearize());
+        }
+        return fmt_print(os, "atomic_cell{{{},ts={:d},expiry={:d},ttl={:d}}}",
+            cell_value_string_builder.str(),
+            acv.timestamp(),
+            acv.is_live_and_has_ttl() ? acv.expiry().time_since_epoch().count() : -1,
+            acv.is_live_and_has_ttl() ? acv.ttl().count() : 0);
+    } else {
+        return fmt_print(os, "atomic_cell{{DEAD,ts={:d},deletion_time={:d}}}",
+            acv.timestamp(), acv.deletion_time().time_since_epoch().count());
+    }
+}
+
+std::ostream&
+operator<<(std::ostream& os, const atomic_cell::printer& acp) {
+    return operator<<(os, static_cast<const atomic_cell_view::printer&>(acp));
+}
+
 std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
    if (!p._cell._data.get()) {
        return os << "{ null atomic_cell_or_collection }";
@@ -253,9 +279,9 @@ std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::prin
    if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
        os << "collection ";
        auto cmv = p._cell.as_collection_mutation();
-        os << to_hex(cmv.data.linearize());
+        os << collection_mutation_view::printer(*p._cdef.type, cmv);
    } else {
-        os << p._cell.as_atomic_cell(p._cdef);
+        os << atomic_cell_view::printer(*p._cdef.type, p._cell.as_atomic_cell(p._cdef));
    }
    return os << " }";
 }
--- a/atomic_cell.hh
+++ b/atomic_cell.hh
@@ -153,6 +153,14 @@ public:
    }

    friend std::ostream& operator<<(std::ostream& os, const atomic_cell_view& acv);
+
+    class printer {
+        const abstract_type& _type;
+        const atomic_cell_view& _cell;
+    public:
+        printer(const abstract_type& type, const atomic_cell_view& cell) : _type(type), _cell(cell) {}
+        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
+    };
 };

 class atomic_cell_mutable_view final : public basic_atomic_cell_view<mutable_view::yes> {
@@ -219,30 +227,12 @@ public:
    static atomic_cell make_live_uninitialized(const abstract_type& type, api::timestamp_type timestamp, size_t size);
    friend class atomic_cell_or_collection;
    friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
-};

-class collection_mutation_view;
-
-// Represents a mutation of a collection.  Actual format is determined by collection type,
-// and is:
-//   set:  list of atomic_cell
-//   map:  list of pair<atomic_cell, bytes> (for key/value)
-//   list: tbd, probably ugly
-class collection_mutation {
-public:
-    using imr_object_type =  imr::utils::object<data::cell::structure>;
-    imr_object_type _data;
-
-    collection_mutation() {}
-    collection_mutation(const collection_type_impl&, collection_mutation_view v);
-    collection_mutation(const collection_type_impl&, bytes_view bv);
-    operator collection_mutation_view() const;
-};
-
-
-class collection_mutation_view {
-public:
-    atomic_cell_value_view data;
+    class printer : atomic_cell_view::printer {
+    public:
+        printer(const abstract_type& type, const atomic_cell_view& cell) : atomic_cell_view::printer(type, cell) {}
+        friend std::ostream& operator<<(std::ostream& os, const printer& acvp);
+    };
 };

 class column_definition;
--- a/atomic_cell_hash.hh
+++ b/atomic_cell_hash.hh
@@ -34,14 +34,12 @@ template<>
 struct appending_hash<collection_mutation_view> {
    template<typename Hasher>
    void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
-      cell.data.with_linearized([&] (bytes_view cell_bv) {
-        auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
-        auto m_view = ctype->deserialize_mutation_form(cell_bv);
-        ::feed_hash(h, m_view.tomb);
-        for (auto&& key_and_value : m_view.cells) {
-            ::feed_hash(h, key_and_value.first);
-            ::feed_hash(h, key_and_value.second, cdef);
-        }
+        cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
+            ::feed_hash(h, m_view.tomb);
+            for (auto&& key_and_value : m_view.cells) {
+                ::feed_hash(h, key_and_value.first);
+                ::feed_hash(h, key_and_value.second, cdef);
+            }
      });
    }
 };
--- a/atomic_cell_or_collection.hh
+++ b/atomic_cell_or_collection.hh
@@ -22,6 +22,7 @@
 #pragma once

 #include "atomic_cell.hh"
+#include "collection_mutation.hh"
 #include "schema.hh"
 #include "hashing.hh"

--- a/auth/role_manager.hh
+++ b/auth/role_manager.hh
@@ -33,6 +33,7 @@

 #include "auth/resource.hh"
 #include "seastarx.hh"
+#include "exceptions/exceptions.hh"

 namespace auth {

@@ -52,9 +53,9 @@ struct role_config_update final {
 ///
 /// A logical argument error for a role-management operation.
 ///
-class roles_argument_exception : public std::invalid_argument {
+class roles_argument_exception : public exceptions::invalid_request_exception {
 public:
-    using std::invalid_argument::invalid_argument;
+    using exceptions::invalid_request_exception::invalid_request_exception;
 };

 class role_already_exists : public roles_argument_exception {
--- a/auth/service.cc
+++ b/auth/service.cc
@@ -39,7 +39,7 @@
 #include "db/consistency_level_type.hh"
 #include "exceptions/exceptions.hh"
 #include "log.hh"
-#include "service/migration_listener.hh"
+#include "service/migration_manager.hh"
 #include "utils/class_registrator.hh"
 #include "database.hh"

@@ -77,17 +77,23 @@ private:
    void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}

    void on_drop_keyspace(const sstring& ks_name) override {
-        _authorizer.revoke_all(
+        // Do it in the background.
+        (void)_authorizer.revoke_all(
                auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
            // Nothing.
+        }).handle_exception([] (std::exception_ptr e) {
+            log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);
        });
    }

    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
-        _authorizer.revoke_all(
+        // Do it in the background.
+        (void)_authorizer.revoke_all(
                auth::make_data_resource(
                        ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
            // Nothing.
+        }).handle_exception([] (std::exception_ptr e) {
+            log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);
        });
    }

@@ -108,14 +114,14 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n
 service::service(
        permissions_cache_config c,
        cql3::query_processor& qp,
-        ::service::migration_manager& mm,
+        ::service::migration_notifier& mn,
        std::unique_ptr<authorizer> z,
        std::unique_ptr<authenticator> a,
        std::unique_ptr<role_manager> r)
            : _permissions_cache_config(std::move(c))
            , _permissions_cache(nullptr)
            , _qp(qp)
-            , _migration_manager(mm)
+            , _mnotifier(mn)
            , _authorizer(std::move(z))
            , _authenticator(std::move(a))
            , _role_manager(std::move(r))
@@ -135,18 +141,19 @@ service::service(
 service::service(
        permissions_cache_config c,
        cql3::query_processor& qp,
+        ::service::migration_notifier& mn,
        ::service::migration_manager& mm,
        const service_config& sc)
            : service(
                      std::move(c),
                      qp,
-                      mm,
+                      mn,
                      create_object<authorizer>(sc.authorizer_java_name, qp, mm),
                      create_object<authenticator>(sc.authenticator_java_name, qp, mm),
                      create_object<role_manager>(sc.role_manager_java_name, qp, mm)) {
 }

-future<> service::create_keyspace_if_missing() const {
+future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {
    auto& db = _qp.db();

    if (!db.has_keyspace(meta::AUTH_KS)) {
@@ -160,15 +167,15 @@ future<> service::create_keyspace_if_missing() const {

        // We use min_timestamp so that default keyspace metadata will loose with any manual adjustments.
        // See issue #2129.
-        return _migration_manager.announce_new_keyspace(ksm, api::min_timestamp, false);
+        return mm.announce_new_keyspace(ksm, api::min_timestamp, false);
    }

    return make_ready_future<>();
 }

-future<> service::start() {
-    return once_among_shards([this] {
-        return create_keyspace_if_missing();
+future<> service::start(::service::migration_manager& mm) {
+    return once_among_shards([this, &mm] {
+        return create_keyspace_if_missing(mm);
    }).then([this] {
        return _role_manager->start().then([this] {
            return when_all_succeed(_authorizer->start(), _authenticator->start());
@@ -177,7 +184,7 @@ future<> service::start() {
        _permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
    }).then([this] {
        return once_among_shards([this] {
-            _migration_manager.register_listener(_migration_listener.get());
+            _mnotifier.register_listener(_migration_listener.get());
            return make_ready_future<>();
        });
    });
@@ -186,9 +193,9 @@ future<> service::start() {
 future<> service::stop() {
    // Only one of the shards has the listener registered, but let's try to
    // unregister on each one just to make sure.
-    _migration_manager.unregister_listener(_migration_listener.get());
-
-    return _permissions_cache->stop().then([this] {
+    return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {
+        return _permissions_cache->stop();
+    }).then([this] {
        return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
    });
 }
--- a/auth/service.hh
+++ b/auth/service.hh
@@ -28,6 +28,7 @@
 #include <seastar/core/future.hh>
 #include <seastar/core/sstring.hh>
 #include <seastar/util/bool_class.hh>
+#include <seastar/core/sharded.hh>

 #include "auth/authenticator.hh"
 #include "auth/authorizer.hh"
@@ -42,6 +43,7 @@ class query_processor;

 namespace service {
 class migration_manager;
+class migration_notifier;
 class migration_listener;
 }

@@ -76,13 +78,15 @@ public:
 ///
 /// All state associated with access-control is stored externally to any particular instance of this class.
 ///
-class service final {
+/// peering_sharded_service inheritance is needed to be able to access the shard-local authentication service
+/// given an object from another shard. Used for bouncing LWT requests to the correct shard.
+class service final : public seastar::peering_sharded_service<service> {
    permissions_cache_config _permissions_cache_config;
    std::unique_ptr<permissions_cache> _permissions_cache;

    cql3::query_processor& _qp;

-    ::service::migration_manager& _migration_manager;
+    ::service::migration_notifier& _mnotifier;

    std::unique_ptr<authorizer> _authorizer;

@@ -97,7 +101,7 @@ public:
    service(
            permissions_cache_config,
            cql3::query_processor&,
-            ::service::migration_manager&,
+            ::service::migration_notifier&,
            std::unique_ptr<authorizer>,
            std::unique_ptr<authenticator>,
            std::unique_ptr<role_manager>);
@@ -110,10 +114,11 @@ public:
    service(
            permissions_cache_config,
            cql3::query_processor&,
+            ::service::migration_notifier&,
            ::service::migration_manager&,
            const service_config&);

-    future<> start();
+    future<> start(::service::migration_manager&);

    future<> stop();

@@ -159,7 +164,7 @@ public:
 private:
    future<bool> has_existing_legacy_users() const;

-    future<> create_keyspace_if_missing() const;
+    future<> create_keyspace_if_missing(::service::migration_manager& mm) const;
 };

 future<bool> has_superuser(const service&, const authenticated_user&);
--- a/auth/standard_role_manager.cc
+++ b/auth/standard_role_manager.cc
@@ -101,8 +101,8 @@ static future<std::optional<record>> find_record(cql3::query_processor& qp, std:
        return std::make_optional(
                record{
                        row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
-                        row.get_as<bool>("is_superuser"),
-                        row.get_as<bool>("can_login"),
+                        row.get_or<bool>("is_superuser", false),
+                        row.get_or<bool>("can_login", false),
                        (row.has("member_of")
                                 ? row.get_set<sstring>("member_of")
                                 : role_set())});
@@ -203,7 +203,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
            internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
        return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
            role_config config;
-            config.is_superuser = row.get_as<bool>("super");
+            config.is_superuser = row.get_or<bool>("super", false);
            config.can_login = true;

            return do_with(
--- a/build_id.cc
+++ b/build_id.cc
@@ -0,0 +1,71 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+#include "build_id.hh"
+#include <fmt/printf.h>
+#include <link.h>
+#include <seastar/core/align.hh>
+#include <sstream>
+
+using namespace seastar;
+
+static const Elf64_Nhdr* get_nt_build_id(dl_phdr_info* info) {
+    auto base = info->dlpi_addr;
+    const auto* h = info->dlpi_phdr;
+    auto num_headers = info->dlpi_phnum;
+    for (int i = 0; i != num_headers; ++i, ++h) {
+        if (h->p_type != PT_NOTE) {
+            continue;
+        }
+
+        auto* p = reinterpret_cast<const char*>(base) + h->p_vaddr;
+        auto* e = p + h->p_memsz;
+        while (p != e) {
+            const auto* n = reinterpret_cast<const Elf64_Nhdr*>(p);
+            if (n->n_type == NT_GNU_BUILD_ID) {
+                return n;
+            }
+
+            p += sizeof(Elf64_Nhdr);
+
+            p += n->n_namesz;
+            p = align_up(p, 4);
+
+            p += n->n_descsz;
+            p = align_up(p, 4);
+        }
+    }
+
+    assert(0 && "no NT_GNU_BUILD_ID note");
+}
+
+static int callback(dl_phdr_info* info, size_t size, void* data) {
+    std::string& ret = *(std::string*)data;
+    std::ostringstream os;
+
+    // The first DSO is always the main program, which has an empty name.
+    assert(strlen(info->dlpi_name) == 0);
+
+    auto* n = get_nt_build_id(info);
+    auto* p = reinterpret_cast<const char*>(n);
+
+    p += sizeof(Elf64_Nhdr);
+
+    p += n->n_namesz;
+    p = align_up(p, 4);
+
+    const char* desc = p;
+    for (unsigned i = 0; i < n->n_descsz; ++i) {
+        fmt::fprintf(os, "%02x", (unsigned char)*(desc + i));
+    }
+    ret = os.str();
+    return 1;
+}
+
+std::string get_build_id() {
+    std::string ret;
+    int r = dl_iterate_phdr(callback, &ret);
+    assert(r == 1);
+    return ret;
+}
--- a/build_id.hh
+++ b/build_id.hh
@@ -0,0 +1,9 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+#pragma once
+
+#include <string>
+
+std::string get_build_id();
--- a/bytes_ostream.hh
+++ b/bytes_ostream.hh
@@ -38,6 +38,7 @@ class bytes_ostream {
 public:
    using size_type = bytes::size_type;
    using value_type = bytes::value_type;
+    using fragment_type = bytes_view;
    static constexpr size_type max_chunk_size() { return 128 * 1024; }
 private:
    static_assert(sizeof(value_type) == 1, "value_type is assumed to be one byte long");
@@ -93,6 +94,29 @@ public:
            return _current != other._current;
        }
    };
+    using const_iterator = fragment_iterator;
+
+    class output_iterator {
+    public:
+        using iterator_category = std::output_iterator_tag;
+        using difference_type = std::ptrdiff_t;
+        using value_type = bytes_ostream::value_type;
+        using pointer = bytes_ostream::value_type*;
+        using reference = bytes_ostream::value_type&;
+
+        friend class bytes_ostream;
+
+    private:
+        bytes_ostream* _ostream = nullptr;
+
+    private:
+        explicit output_iterator(bytes_ostream& os) : _ostream(&os) { }
+
+    public:
+        reference operator*() const { return *_ostream->write_place_holder(1); }
+        output_iterator& operator++() { return *this; }
+        output_iterator operator++(int) { return *this; }
+    };
 private:
    inline size_type current_space_left() const {
        if (!_current) {
@@ -289,6 +313,11 @@ public:
        return _size;
    }

+    // For the FragmentRange concept
+    size_type size_bytes() const {
+        return _size;
+    }
+
    bool empty() const {
        return _size == 0;
    }
@@ -326,6 +355,8 @@ public:
    fragment_iterator begin() const { return { _begin.get() }; }
    fragment_iterator end() const { return { nullptr }; }

+    output_iterator write_begin() { return output_iterator(*this); }
+
    boost::iterator_range<fragment_iterator> fragments() const {
        return { begin(), end() };
    }
--- a/cache_flat_mutation_reader.hh
+++ b/cache_flat_mutation_reader.hh
@@ -61,6 +61,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
        // - _last_row points at a direct predecessor of the next row which is going to be read.
        //   Used for populating continuity.
        // - _population_range_starts_before_all_rows is set accordingly
+        // - _underlying is engaged and fast-forwarded
        reading_from_underlying,

        end_of_stream
@@ -99,7 +100,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
    // forward progress is not guaranteed in case iterators are getting constantly invalidated.
    bool _lower_bound_changed = false;

+    // Points to the underlying reader conforming to _schema,
+    // either to *_underlying_holder or _read_context->underlying().underlying().
+    flat_mutation_reader* _underlying = nullptr;
+    std::optional<flat_mutation_reader> _underlying_holder;
+
    future<> do_fill_buffer(db::timeout_clock::time_point);
+    future<> ensure_underlying(db::timeout_clock::time_point);
    void copy_from_cache_to_buffer();
    future<> process_static_row(db::timeout_clock::time_point);
    void move_to_end();
@@ -186,23 +193,22 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
        return make_ready_future<>();
    } else {
        _read_context->cache().on_row_miss();
-        return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
-            if (sr) {
-                assert(sr->is_static_row());
-                maybe_add_to_cache(sr->as_static_row());
-                push_mutation_fragment(std::move(*sr));
-            }
-            maybe_set_static_row_continuous();
+        return ensure_underlying(timeout).then([this, timeout] {
+            return (*_underlying)(timeout).then([this] (mutation_fragment_opt&& sr) {
+                if (sr) {
+                    assert(sr->is_static_row());
+                    maybe_add_to_cache(sr->as_static_row());
+                    push_mutation_fragment(std::move(*sr));
+                }
+                maybe_set_static_row_continuous();
+            });
        });
    }
 }

 inline
 void cache_flat_mutation_reader::touch_partition() {
-    if (_snp->at_latest_version()) {
-        rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
-        _snp->tracker()->touch(last_dummy);
-    }
+    _snp->touch();
 }

 inline
@@ -232,14 +238,36 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
    });
 }

+inline
+future<> cache_flat_mutation_reader::ensure_underlying(db::timeout_clock::time_point timeout) {
+    if (_underlying) {
+        return make_ready_future<>();
+    }
+    return _read_context->ensure_underlying(timeout).then([this, timeout] {
+        flat_mutation_reader& ctx_underlying = _read_context->underlying().underlying();
+        if (ctx_underlying.schema() != _schema) {
+            _underlying_holder = make_delegating_reader(ctx_underlying);
+            _underlying_holder->upgrade_schema(_schema);
+            _underlying = &*_underlying_holder;
+        } else {
+            _underlying = &ctx_underlying;
+        }
+    });
+}
+
 inline
 future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
    if (_state == state::move_to_underlying) {
+        if (!_underlying) {
+            return ensure_underlying(timeout).then([this, timeout] {
+                return do_fill_buffer(timeout);
+            });
+        }
        _state = state::reading_from_underlying;
        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
        auto end = _next_row_in_range ? position_in_partition(_next_row.position())
                                      : position_in_partition(_upper_bound);
-        return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
+        return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
            return read_from_underlying(timeout);
        });
    }
@@ -280,7 +308,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin

 inline
 future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
-    return consume_mutation_fragments_until(_read_context->underlying().underlying(),
+    return consume_mutation_fragments_until(*_underlying,
        [this] { return _state != state::reading_from_underlying || is_buffer_full(); },
        [this] (mutation_fragment mf) {
            _read_context->cache().on_row_miss();
--- a/canonical_mutation.cc
+++ b/canonical_mutation.cc
@@ -35,6 +35,7 @@
 #include "idl/uuid.dist.impl.hh"
 #include "idl/keys.dist.impl.hh"
 #include "idl/mutation.dist.impl.hh"
+#include <iostream>

 canonical_mutation::canonical_mutation(bytes data)
        : _data(std::move(data))
@@ -79,7 +80,8 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {

    if (version == m.schema()->version()) {
        auto partition_view = mutation_partition_view::from_view(mv.partition());
-        m.partition().apply(*m.schema(), partition_view, *m.schema());
+        mutation_application_stats app_stats;
+        m.partition().apply(*m.schema(), partition_view, *m.schema(), app_stats);
    } else {
        column_mapping cm = mv.mapping();
        converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
@@ -88,3 +90,81 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
    }
    return m;
 }
+
+static sstring bytes_to_text(bytes_view bv) {
+    sstring ret(sstring::initialized_later(), bv.size());
+    std::copy_n(reinterpret_cast<const char*>(bv.data()), bv.size(), ret.data());
+    return ret;
+}
+
+std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm) {
+    auto in = ser::as_input_stream(cm._data);
+    auto mv = ser::deserialize(in, boost::type<ser::canonical_mutation_view>());
+    column_mapping mapping = mv.mapping();
+    auto partition_view = mutation_partition_view::from_view(mv.partition());
+    fmt::print(os, "{{canonical_mutation: ");
+    fmt::print(os, "table_id {} schema_version {} ", mv.table_id(), mv.schema_version());
+    fmt::print(os, "partition_key {} ", mv.key());
+    class printing_visitor : public mutation_partition_view_virtual_visitor {
+        std::ostream& _os;
+        const column_mapping& _cm;
+        bool _first = true;
+        bool _in_row = false;
+    private:
+        void print_separator() {
+            if (!_first) {
+                fmt::print(_os, ", ");
+            }
+            _first = false;
+        }
+    public:
+        printing_visitor(std::ostream& os, const column_mapping& cm) : _os(os), _cm(cm) {}
+        virtual void accept_partition_tombstone(tombstone t) override {
+            print_separator();
+            fmt::print(_os, "partition_tombstone {}", t);
+        }
+        virtual void accept_static_cell(column_id id, atomic_cell ac) override {
+            print_separator();
+            auto&& entry = _cm.static_column_at(id);
+            fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
+        }
+        virtual void accept_static_cell(column_id id, collection_mutation_view cmv) override {
+            print_separator();
+            auto&& entry = _cm.static_column_at(id);
+            fmt::print(_os, "static column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
+        }
+        virtual void accept_row_tombstone(range_tombstone rt) override {
+            print_separator();
+            fmt::print(_os, "row tombstone {}", rt);
+        }
+        virtual void accept_row(position_in_partition_view pipv, row_tombstone rt, row_marker rm, is_dummy, is_continuous) override {
+            if (_in_row) {
+                fmt::print(_os, "}}, ");
+            }
+            fmt::print(_os, "{{row {} tombstone {} marker {}", pipv, rt, rm);
+            _in_row = true;
+            _first = false;
+        }
+        virtual void accept_row_cell(column_id id, atomic_cell ac) override {
+            print_separator();
+            auto&& entry = _cm.regular_column_at(id);
+            fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), atomic_cell::printer(*entry.type(), ac));
+        }
+        virtual void accept_row_cell(column_id id, collection_mutation_view cmv) override {
+            print_separator();
+            auto&& entry = _cm.regular_column_at(id);
+            fmt::print(_os, "column {} {}", bytes_to_text(entry.name()), collection_mutation_view::printer(*entry.type(), cmv));
+        }
+        void finalize() {
+            if (_in_row) {
+                fmt::print(_os, "}}");
+            }
+        }
+    };
+    printing_visitor pv(os, mapping);
+    partition_view.accept(mapping, pv);
+    pv.finalize();
+    fmt::print(os, "}}");
+    return os;
+}
+
--- a/canonical_mutation.hh
+++ b/canonical_mutation.hh
@@ -26,6 +26,7 @@
 #include "database_fwd.hh"
 #include "mutation_partition_visitor.hh"
 #include "mutation_partition_serializer.hh"
+#include <iosfwd>

 // Immutable mutation form which can be read using any schema version of the same table.
 // Safe to access from other shards via const&.
@@ -52,4 +53,5 @@ public:

    const bytes& representation() const { return _data; }

+    friend std::ostream& operator<<(std::ostream& os, const canonical_mutation& cm);
 };
--- a/cdc/cdc.cc
+++ b/cdc/cdc.cc
@@ -0,0 +1,835 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <utility>
+#include <algorithm>
+
+#include <boost/range/irange.hpp>
+#include <seastar/util/defer.hh>
+#include <seastar/core/thread.hh>
+
+#include "cdc/cdc.hh"
+#include "bytes.hh"
+#include "database.hh"
+#include "db/config.hh"
+#include "dht/murmur3_partitioner.hh"
+#include "partition_slice_builder.hh"
+#include "schema.hh"
+#include "schema_builder.hh"
+#include "service/migration_listener.hh"
+#include "service/storage_service.hh"
+#include "types/tuple.hh"
+#include "cql3/statements/select_statement.hh"
+#include "cql3/multi_column_relation.hh"
+#include "cql3/tuples.hh"
+#include "log.hh"
+#include "json.hh"
+
+using locator::snitch_ptr;
+using locator::token_metadata;
+using locator::topology;
+using seastar::sstring;
+using service::migration_notifier;
+using service::storage_proxy;
+
+namespace std {
+
+template<> struct hash<std::pair<net::inet_address, unsigned int>> {
+    std::size_t operator()(const std::pair<net::inet_address, unsigned int> &p) const {
+        return std::hash<net::inet_address>{}(p.first) ^ std::hash<int>{}(p.second);
+    }
+};
+
+}
+
+using namespace std::chrono_literals;
+
+static logging::logger cdc_log("cdc");
+
+namespace cdc {
+static schema_ptr create_log_schema(const schema&, std::optional<utils::UUID> = {});
+static schema_ptr create_stream_description_table_schema(const schema&, std::optional<utils::UUID> = {});
+static future<> populate_desc(db_context ctx, const schema& s);
+}
+
+class cdc::cdc_service::impl : service::migration_listener::empty_listener {
+    friend cdc_service;
+    db_context _ctxt;
+    bool _stopped = false;
+public:
+    impl(db_context ctxt)
+        : _ctxt(std::move(ctxt))
+    {
+        _ctxt._migration_notifier.register_listener(this);
+    }
+    ~impl() {
+        assert(_stopped);
+    }
+
+    future<> stop() {
+        return _ctxt._migration_notifier.unregister_listener(this).then([this] {
+            _stopped = true;
+        });
+    }
+
+    void on_before_create_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
+        if (schema.cdc_options().enabled()) {
+            auto& db = _ctxt._proxy.get_db().local();
+            auto logname = log_name(schema.cf_name());
+            if (!db.has_schema(schema.ks_name(), logname)) {
+                // in seastar thread
+                auto log_schema = create_log_schema(schema);
+                auto stream_desc_schema = create_stream_description_table_schema(schema);
+                auto& keyspace = db.find_keyspace(schema.ks_name());
+
+                auto log_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), log_schema, timestamp);
+                auto stream_mut = db::schema_tables::make_create_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
+
+                mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
+                mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
+            }
+        }
+    }
+
+    void on_before_update_column_family(const schema& new_schema, const schema& old_schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
+        bool is_cdc = new_schema.cdc_options().enabled();
+        bool was_cdc = old_schema.cdc_options().enabled();
+
+        // We need to create or modify the log & stream schemas iff either the cdc status changed (was != is)
+        // or cdc is enabled now, since in that case any change to the base schema also affects the
+        // columns of the derived tables, etc.
+        if (was_cdc || is_cdc) {
+            auto logname = log_name(old_schema.cf_name());
+            auto descname = desc_name(old_schema.cf_name());
+            auto& db = _ctxt._proxy.get_db().local();
+            auto& keyspace = db.find_keyspace(old_schema.ks_name());
+            auto log_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), logname).schema() : nullptr;
+            auto stream_desc_schema = was_cdc ? db.find_column_family(old_schema.ks_name(), descname).schema() : nullptr;
+
+            if (!is_cdc) {
+                auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
+                auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
+
+                mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
+                mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
+                return;
+            }
+
+            auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt);
+            auto new_stream_desc_schema = create_stream_description_table_schema(new_schema, stream_desc_schema ? std::make_optional(stream_desc_schema->id()) : std::nullopt);
+
+            auto log_mut = log_schema 
+                ? db::schema_tables::make_update_table_mutations(keyspace.metadata(), log_schema, new_log_schema, timestamp, false)
+                : db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_log_schema, timestamp)
+                ;
+            auto stream_mut = stream_desc_schema 
+                ? db::schema_tables::make_update_table_mutations(keyspace.metadata(), stream_desc_schema, new_stream_desc_schema, timestamp, false)
+                : db::schema_tables::make_create_table_mutations(keyspace.metadata(), new_stream_desc_schema, timestamp)
+                ;
+
+            mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
+            mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
+        }
+    }
+
+    void on_before_drop_column_family(const schema& schema, std::vector<mutation>& mutations, api::timestamp_type timestamp) override {
+        if (schema.cdc_options().enabled()) {
+            auto logname = log_name(schema.cf_name());
+            auto descname = desc_name(schema.cf_name());
+            auto& db = _ctxt._proxy.get_db().local();
+            auto& keyspace = db.find_keyspace(schema.ks_name());
+            auto log_schema = db.find_column_family(schema.ks_name(), logname).schema();
+            auto stream_desc_schema = db.find_column_family(schema.ks_name(), descname).schema();
+
+            auto log_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), log_schema, timestamp);
+            auto stream_mut = db::schema_tables::make_drop_table_mutations(keyspace.metadata(), stream_desc_schema, timestamp);
+
+            mutations.insert(mutations.end(), std::make_move_iterator(log_mut.begin()), std::make_move_iterator(log_mut.end()));
+            mutations.insert(mutations.end(), std::make_move_iterator(stream_mut.begin()), std::make_move_iterator(stream_mut.end()));
+        }
+    }
+
+    void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {
+        // This callback is invoked on all shards. Only do the work once.
+        if (engine().cpu_id() != 0) {
+            return;
+        }
+        auto& db = _ctxt._proxy.get_db().local();
+        auto& cf = db.find_column_family(ks_name, cf_name);
+        auto schema = cf.schema();
+        if (schema->cdc_options().enabled()) {
+            populate_desc(_ctxt, *schema).get();
+        }
+    }
+
+    void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override {
+        on_create_column_family(ks_name, cf_name);
+    }
+
+    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {}
+
+    future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
+        lowres_clock::time_point timeout,
+        std::vector<mutation>&& mutations
+    );
+
+    template<typename Iter>
+    future<> append_mutations(Iter i, Iter e, schema_ptr s, lowres_clock::time_point, std::vector<mutation>&);
+};
+
+cdc::cdc_service::cdc_service(service::storage_proxy& proxy)
+    : cdc_service(db_context::builder(proxy).build())
+{}
+
+cdc::cdc_service::cdc_service(db_context ctxt)
+    : _impl(std::make_unique<impl>(std::move(ctxt)))
+{
+    _impl->_ctxt._proxy.set_cdc_service(this);
+}
+
+future<> cdc::cdc_service::stop() {
+    return _impl->stop();
+}
+
+cdc::cdc_service::~cdc_service() = default;
+
+cdc::options::options(const std::map<sstring, sstring>& map) {
+    if (map.find("enabled") == std::end(map)) {
+        return;
+    }
+
+    for (auto& p : map) {
+        if (p.first == "enabled") {
+            _enabled = p.second == "true";
+        } else if (p.first == "preimage") {
+            _preimage = p.second == "true";
+        } else if (p.first == "postimage") {
+            _postimage = p.second == "true";
+        } else if (p.first == "ttl") {
+            _ttl = std::stoi(p.second);
+        } else {
+            throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
+        }
+    }
+}
+
+std::map<sstring, sstring> cdc::options::to_map() const {
+    if (!_enabled) {
+        return {};
+    }
+    return {
+        { "enabled", _enabled ? "true" : "false" },
+        { "preimage", _preimage ? "true" : "false" },
+        { "postimage", _postimage ? "true" : "false" },
+        { "ttl", std::to_string(_ttl) },
+    };
+}
+
+sstring cdc::options::to_sstring() const {
+    return json::to_json(to_map());
+}
+
+bool cdc::options::operator==(const options& o) const {
+    return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
+}
+bool cdc::options::operator!=(const options& o) const {
+    return !(*this == o);
+}
+
+namespace cdc {
+
+using operation_native_type = std::underlying_type_t<operation>;
+using column_op_native_type = std::underlying_type_t<column_op>;
+
+sstring log_name(const sstring& table_name) {
+    static constexpr auto cdc_log_suffix = "_scylla_cdc_log";
+    return table_name + cdc_log_suffix;
+}
+
+sstring desc_name(const sstring& table_name) {
+    static constexpr auto cdc_desc_suffix = "_scylla_cdc_desc";
+    return table_name + cdc_desc_suffix;
+}
+
+static schema_ptr create_log_schema(const schema& s, std::optional<utils::UUID> uuid) {
+    schema_builder b(s.ks_name(), log_name(s.cf_name()));
+    b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
+    b.with_column("stream_id", uuid_type, column_kind::partition_key);
+    b.with_column("time", timeuuid_type, column_kind::clustering_key);
+    b.with_column("batch_seq_no", int32_type, column_kind::clustering_key);
+    b.with_column("operation", data_type_for<operation_native_type>());
+    b.with_column("ttl", long_type);
+    auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
+        for (const auto& column : columns) {
+            auto type = column.type;
+            if (is_data_col) {
+                type = tuple_type_impl::get_instance({ /* op */ data_type_for<column_op_native_type>(), /* value */ type, /* ttl */long_type});
+            }
+            b.with_column("_" + column.name(), type);
+        }
+    };
+    add_columns(s.partition_key_columns());
+    add_columns(s.clustering_key_columns());
+    add_columns(s.static_columns(), true);
+    add_columns(s.regular_columns(), true);
+
+    if (uuid) {
+        b.set_uuid(*uuid);
+    }
+    
+    return b.build();
+}
+
+static schema_ptr create_stream_description_table_schema(const schema& s, std::optional<utils::UUID> uuid) {
+    schema_builder b(s.ks_name(), desc_name(s.cf_name()));
+    b.set_comment(sprint("CDC description for %s.%s", s.ks_name(), s.cf_name()));
+    b.with_column("node_ip", inet_addr_type, column_kind::partition_key);
+    b.with_column("shard_id", int32_type, column_kind::partition_key);
+    b.with_column("created_at", timestamp_type, column_kind::clustering_key);
+    b.with_column("stream_id", uuid_type);
+
+    if (uuid) {
+        b.set_uuid(*uuid);
+    }
+
+    return b.build();
+}
+
+// This function assumes setup_stream_description_table was called on |s| before the call to this
+// function.
+static future<> populate_desc(db_context ctx, const schema& s) {
+    auto& db = ctx._proxy.get_db().local();
+    auto desc_schema =
+        db.find_schema(s.ks_name(), desc_name(s.cf_name()));
+    auto log_schema =
+        db.find_schema(s.ks_name(), log_name(s.cf_name()));
+    auto belongs_to = [&](const gms::inet_address& endpoint,
+                          const unsigned int shard_id,
+                          const int shard_count,
+                          const unsigned int ignore_msb_bits,
+                          const utils::UUID& stream_id) {
+        const auto log_pk = partition_key::from_singular(*log_schema,
+                                                         data_value(stream_id));
+        const auto token = ctx._partitioner.decorate_key(*log_schema, log_pk).token();
+        if (ctx._token_metadata.get_endpoint(ctx._token_metadata.first_token(token)) != endpoint) {
+            return false;
+        }
+        const auto owning_shard_id = dht::murmur3_partitioner(shard_count, ignore_msb_bits).shard_of(token);
+        return owning_shard_id == shard_id;
+    };
+
+    std::vector<mutation> mutations;
+    const auto ts = api::new_timestamp();
+    const auto ck = clustering_key::from_single_value(
+            *desc_schema, timestamp_type->decompose(ts));
+    auto cdef = desc_schema->get_column_definition(to_bytes("stream_id"));
+
+    for (const auto& dc : ctx._token_metadata.get_topology().get_datacenter_endpoints()) {
+        for (const auto& endpoint : dc.second) {
+            const auto decomposed_ip = inet_addr_type->decompose(endpoint.addr());
+            const unsigned int shard_count = ctx._snitch->get_shard_count(endpoint);
+            const unsigned int ignore_msb_bits = ctx._snitch->get_ignore_msb_bits(endpoint);
+            for (unsigned int shard_id = 0; shard_id < shard_count; ++shard_id) {
+                const auto pk = partition_key::from_exploded(
+                        *desc_schema, { decomposed_ip, int32_type->decompose(static_cast<int>(shard_id)) });
+                mutations.emplace_back(desc_schema, pk);
+
+                auto stream_id = utils::make_random_uuid();
+                while (!belongs_to(endpoint, shard_id, shard_count, ignore_msb_bits, stream_id)) {
+                    stream_id = utils::make_random_uuid();
+                }
+                auto value = atomic_cell::make_live(*uuid_type,
+                                                    ts,
+                                                    uuid_type->decompose(stream_id));
+                mutations.back().set_cell(ck, *cdef, std::move(value));
+            }
+        }
+    }
+    return ctx._proxy.mutate(std::move(mutations),
+                             db::consistency_level::QUORUM,
+                             db::no_timeout,
+                             nullptr,
+                             empty_service_permit());
+}
+
+db_context::builder::builder(service::storage_proxy& proxy)
+    : _proxy(proxy)
+{}
+
+db_context::builder& db_context::builder::with_migration_notifier(service::migration_notifier& migration_notifier) {
+    _migration_notifier = migration_notifier;
+    return *this;
+}
+
+db_context::builder& db_context::builder::with_token_metadata(locator::token_metadata& token_metadata) {
+    _token_metadata = token_metadata;
+    return *this;
+}
+
+db_context::builder& db_context::builder::with_snitch(locator::snitch_ptr& snitch) {
+    _snitch = snitch;
+    return *this;
+}
+
+db_context::builder& db_context::builder::with_partitioner(dht::i_partitioner& partitioner) {
+    _partitioner = partitioner;
+    return *this;
+}
+
+db_context db_context::builder::build() {
+    return db_context{
+        _proxy,
+        _migration_notifier ? _migration_notifier->get() : service::get_local_storage_service().get_migration_notifier(),
+        _token_metadata ? _token_metadata->get() : service::get_local_storage_service().get_token_metadata(),
+        _snitch ? _snitch->get() : locator::i_endpoint_snitch::get_local_snitch_ptr(),
+        _partitioner ? _partitioner->get() : dht::global_partitioner()
+    };
+}
+
+class transformer final {
+public:
+    using streams_type = std::unordered_map<std::pair<net::inet_address, unsigned int>, utils::UUID>;
+private:
+    db_context _ctx;
+    schema_ptr _schema;
+    schema_ptr _log_schema;
+    utils::UUID _time;
+    bytes _decomposed_time;
+    ::shared_ptr<const transformer::streams_type> _streams;
+    const column_definition& _op_col;
+    ttl_opt _cdc_ttl_opt;
+
+    clustering_key set_pk_columns(const partition_key& pk, int batch_no, mutation& m) const {
+        const auto log_ck = clustering_key::from_exploded(
+                *m.schema(), { _decomposed_time, int32_type->decompose(batch_no) });
+        auto pk_value = pk.explode(*_schema);
+        size_t pos = 0;
+        for (const auto& column : _schema->partition_key_columns()) {
+            assert (pos < pk_value.size());
+            auto cdef = m.schema()->get_column_definition(to_bytes("_" + column.name()));
+            auto value = atomic_cell::make_live(*column.type,
+                                                _time.timestamp(),
+                                                bytes_view(pk_value[pos]),
+                                                _cdc_ttl_opt);
+            m.set_cell(log_ck, *cdef, std::move(value));
+            ++pos;
+        }
+        return log_ck;
+    }
+
+    void set_operation(const clustering_key& ck, operation op, mutation& m) const {
+        m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op)), _cdc_ttl_opt));
+    }
+
+    partition_key stream_id(const net::inet_address& ip, unsigned int shard_id) const {
+        auto it = _streams->find(std::make_pair(ip, shard_id));
+        if (it == std::end(*_streams)) {
+            throw std::runtime_error(format("No stream found for node {} and shard {}", ip, shard_id));
+        }
+        return partition_key::from_exploded(*_log_schema, { uuid_type->decompose(it->second) });
+    }
+public:
+    transformer(db_context ctx, schema_ptr s, ::shared_ptr<const transformer::streams_type> streams)
+        : _ctx(ctx)
+        , _schema(std::move(s))
+        , _log_schema(ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
+        , _time(utils::UUID_gen::get_time_UUID())
+        , _decomposed_time(timeuuid_type->decompose(_time))
+        , _streams(std::move(streams))
+        , _op_col(*_log_schema->get_column_definition(to_bytes("operation")))
+    {
+        if (_schema->cdc_options().ttl()) {
+            _cdc_ttl_opt = std::chrono::seconds(_schema->cdc_options().ttl());
+        }
+    }
+
+    // TODO: is pre-image data based on the query enough? We only have actual column data. Do we need
+    // more details like tombstones/ttl? Probably not, but keep in mind.
+    mutation transform(const mutation& m, const cql3::untyped_result_set* rs = nullptr) const {
+        auto& t = m.token();
+        auto&& ep = _ctx._token_metadata.get_endpoint(
+                _ctx._token_metadata.first_token(t));
+        if (!ep) {
+            throw std::runtime_error(format("No owner found for key {}", m.decorated_key()));
+        }
+        auto shard_id = dht::murmur3_partitioner(_ctx._snitch->get_shard_count(*ep), _ctx._snitch->get_ignore_msb_bits(*ep)).shard_of(t);
+        mutation res(_log_schema, stream_id(ep->addr(), shard_id));
+        auto& p = m.partition();
+        if (p.partition_tombstone()) {
+            // Partition deletion
+            auto log_ck = set_pk_columns(m.key(), 0, res);
+            set_operation(log_ck, operation::partition_delete, res);
+        } else if (!p.row_tombstones().empty()) {
+            // range deletion
+            int batch_no = 0;
+            for (auto& rt : p.row_tombstones()) {
+                auto set_bound = [&] (const clustering_key& log_ck, const clustering_key_prefix& ckp) {
+                    auto exploded = ckp.explode(*_schema);
+                    size_t pos = 0;
+                    for (const auto& column : _schema->clustering_key_columns()) {
+                        if (pos >= exploded.size()) {
+                            break;
+                        }
+                        auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
+                        auto value = atomic_cell::make_live(*column.type,
+                                                            _time.timestamp(),
+                                                            bytes_view(exploded[pos]),
+                                                            _cdc_ttl_opt);
+                        res.set_cell(log_ck, *cdef, std::move(value));
+                        ++pos;
+                    }
+                };
+                {
+                    auto log_ck = set_pk_columns(m.key(), batch_no, res);
+                    set_bound(log_ck, rt.start);
+                    // TODO: separate inclusive/exclusive range
+                    set_operation(log_ck, operation::range_delete_start, res);
+                    ++batch_no;
+                }
+                {
+                    auto log_ck = set_pk_columns(m.key(), batch_no, res);
+                    set_bound(log_ck, rt.end);
+                    // TODO: separate inclusive/exclusive range
+                    set_operation(log_ck, operation::range_delete_end, res);
+                    ++batch_no;
+                }
+            }
+        } else {
+            // should be update or deletion
+            int batch_no = 0;
+            for (const rows_entry& r : p.clustered_rows()) {
+                auto ck_value = r.key().explode(*_schema);
+
+                std::optional<clustering_key> pikey;
+                const cql3::untyped_result_set_row * pirow = nullptr;
+
+                if (rs) {
+                    for (auto& utr : *rs) {
+                        bool match = true;
+                        for (auto& c : _schema->clustering_key_columns()) {
+                            auto rv = utr.get_view(c.name_as_text());
+                            auto cv = r.key().get_component(*_schema, c.component_index());
+                            if (rv != cv) {
+                                match = false;
+                                break;
+                            }
+                        }
+                        if (match) {
+                            pikey = set_pk_columns(m.key(), batch_no, res);
+                            set_operation(*pikey, operation::pre_image, res);
+                            pirow = &utr;
+                            ++batch_no;
+                            break;
+                        }
+                    }
+                }
+
+                auto log_ck = set_pk_columns(m.key(), batch_no, res);
+
+                size_t pos = 0;
+                for (const auto& column : _schema->clustering_key_columns()) {
+                    assert (pos < ck_value.size());
+                    auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
+                    res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
+
+                    if (pirow) {
+                        assert(pirow->has(column.name_as_text()));
+                        res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos]), _cdc_ttl_opt));
+                    }
+
+                    ++pos;
+                }
+
+                std::vector<bytes_opt> values(3);
+
+                auto process_cells = [&](const row& r, column_kind ckind) {
+                    r.for_each_cell([&](column_id id, const atomic_cell_or_collection& cell) {
+                        auto& cdef = _schema->column_at(ckind, id);
+                        auto* dst = _log_schema->get_column_definition(to_bytes("_" + cdef.name()));
+                        // todo: collections.
+                        if (cdef.is_atomic()) {
+                            column_op op;
+
+                            values[1] = values[2] = std::nullopt;
+                            auto view = cell.as_atomic_cell(cdef);
+                            if (view.is_live()) {
+                                op = column_op::set;
+                                values[1] = view.value().linearize();
+                                if (view.is_live_and_has_ttl()) {
+                                    values[2] = long_type->decompose(data_value(view.ttl().count()));
+                                }
+                            } else {
+                                op = column_op::del;
+                            }
+
+                            values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(op)));
+                            res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
+
+                            if (pirow && pirow->has(cdef.name_as_text())) {
+                                values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(column_op::set)));
+                                values[1] = pirow->get_blob(cdef.name_as_text());
+                                values[2] = std::nullopt;
+
+                                assert(std::addressof(res.partition().clustered_row(*_log_schema, *pikey)) != std::addressof(res.partition().clustered_row(*_log_schema, log_ck)));
+                                assert(pikey->explode() != log_ck.explode());
+                                res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values), _cdc_ttl_opt));
+                            }
+                        } else {
+                            cdc_log.warn("Non-atomic cell ignored {}.{}:{}", _schema->ks_name(), _schema->cf_name(), cdef.name_as_text());
+                        }
+                    });
+                };
+
+                process_cells(r.row().cells(), column_kind::regular_column);
+                process_cells(p.static_row().get(), column_kind::static_column);
+
+                set_operation(log_ck, operation::update, res);
+                ++batch_no;
+            }
+        }
+
+        return res;
+    }
+
+    static db::timeout_clock::time_point default_timeout() {
+        return db::timeout_clock::now() + 10s;
+    }
+
+    future<lw_shared_ptr<cql3::untyped_result_set>> pre_image_select(
+            service::client_state& client_state,
+            db::consistency_level cl,
+            const mutation& m)
+    {
+        auto& p = m.partition();
+        if (p.partition_tombstone() || !p.row_tombstones().empty() || p.clustered_rows().empty()) {
+            return make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>();
+        }
+
+        dht::partition_range_vector partition_ranges{dht::partition_range(m.decorated_key())};
+
+        auto&& pc = _schema->partition_key_columns();
+        auto&& cc = _schema->clustering_key_columns();
+
+        std::vector<query::clustering_range> bounds;
+        if (cc.empty()) {
+            bounds.push_back(query::clustering_range::make_open_ended_both_sides());
+        } else {
+            for (const rows_entry& r : p.clustered_rows()) {
+                auto& ck = r.key();
+                bounds.push_back(query::clustering_range::make_singular(ck));
+            }
+        }
+
+        std::vector<const column_definition*> columns;
+        columns.reserve(_schema->all_columns().size());
+
+        std::transform(pc.begin(), pc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
+        std::transform(cc.begin(), cc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
+
+        query::column_id_vector static_columns, regular_columns;
+
+        auto sk = column_kind::static_column;
+        auto rk = column_kind::regular_column;
+        // TODO: this assumes all rows in the mutation touch the same set of columns (only the first clustered row is inspected). This might not be true, and we may need a more involved set operation here.
+        for (auto& [r, cids, kind] : { std::tie(p.static_row().get(), static_columns, sk), std::tie(p.clustered_rows().begin()->row().cells(), regular_columns, rk) }) {
+            r.for_each_cell([&](column_id id, const atomic_cell_or_collection&) {
+                auto& cdef = _schema->column_at(kind, id);
+                cids.emplace_back(id);
+                columns.emplace_back(&cdef);
+            });
+        }
+
+        auto selection = cql3::selection::selection::for_columns(_schema, std::move(columns));
+        auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), selection->get_query_options());
+        auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_partitions);
+
+        return _ctx._proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
+                [s = _schema, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
+                    cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
+                    query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *s, *selection));
+                    auto result_set = builder.build();
+                    if (!result_set || result_set->empty()) {
+                        return {};
+                    }
+                    return make_lw_shared<cql3::untyped_result_set>(*result_set);
+        });
+    }
+};
+
+// This class is used to build a mapping from <node ip, shard id> to stream_id.
+// It is used as a consumer for rows returned by the query to the CDC Description Table.
+class streams_builder {
+    const schema& _schema;
+    transformer::streams_type _streams;
+    net::inet_address _node_ip = net::inet_address();
+    unsigned int _shard_id = 0;
+    api::timestamp_type _latest_row_timestamp = api::min_timestamp;
+    utils::UUID _latest_row_stream_id = utils::UUID();
+public:
+    streams_builder(const schema& s) : _schema(s) {}
+
+    void accept_new_partition(const partition_key& key, uint32_t row_count) {
+        auto exploded = key.explode(_schema);
+        _node_ip = value_cast<net::inet_address>(inet_addr_type->deserialize(exploded[0]));
+        _shard_id = static_cast<unsigned int>(value_cast<int>(int32_type->deserialize(exploded[1])));
+        _latest_row_timestamp = api::min_timestamp;
+        _latest_row_stream_id = utils::UUID();
+    }
+
+    void accept_new_partition(uint32_t row_count) {
+        assert(false);
+    }
+
+    void accept_new_row(
+            const clustering_key& key,
+            const query::result_row_view& static_row,
+            const query::result_row_view& row) {
+        auto row_iterator = row.iterator();
+        api::timestamp_type timestamp = value_cast<db_clock::time_point>(
+                timestamp_type->deserialize(key.explode(_schema)[0])).time_since_epoch().count();
+        if (timestamp <= _latest_row_timestamp) {
+            return;
+        }
+        _latest_row_timestamp = timestamp;
+        for (auto&& cdef : _schema.regular_columns()) {
+            if (cdef.name_as_text() != "stream_id") {
+                row_iterator.skip(cdef);
+                continue;
+            }
+            auto val_opt = row_iterator.next_atomic_cell();
+            assert(val_opt);
+            val_opt->value().with_linearized([&] (bytes_view bv) {
+                _latest_row_stream_id = value_cast<utils::UUID>(uuid_type->deserialize(bv));
+            });
+        }
+    }
+
+    void accept_new_row(const query::result_row_view& static_row, const query::result_row_view& row) {
+        assert(false);
+    }
+
+    void accept_partition_end(const query::result_row_view& static_row) {
+        _streams.emplace(std::make_pair(_node_ip, _shard_id), _latest_row_stream_id);
+    }
+
+    transformer::streams_type build() {
+        return std::move(_streams);
+    }
+};
+
+static future<::shared_ptr<transformer::streams_type>> get_streams(
+        db_context ctx,
+        const sstring& ks_name,
+        const sstring& cf_name,
+        lowres_clock::time_point timeout,
+        service::query_state& qs) {
+    auto s =
+        ctx._proxy.get_db().local().find_schema(ks_name, desc_name(cf_name));
+    query::read_command cmd(
+            s->id(),
+            s->version(),
+            partition_slice_builder(*s).with_no_static_columns().build());
+    return ctx._proxy.query(
+            s,
+            make_lw_shared(std::move(cmd)),
+            {dht::partition_range::make_open_ended_both_sides()},
+            db::consistency_level::QUORUM,
+            {timeout, qs.get_permit(), qs.get_client_state()}).then([s = std::move(s)] (auto qr) mutable {
+        return query::result_view::do_with(*qr.query_result,
+                [s = std::move(s)] (query::result_view v) {
+            auto slice = partition_slice_builder(*s)
+                    .with_no_static_columns()
+                    .build();
+            streams_builder builder{ *s };
+            v.consume(slice, builder);
+            return ::make_shared<transformer::streams_type>(builder.build());
+        });
+    });
+}
+
+template <typename Func>
+future<std::vector<mutation>>
+transform_mutations(std::vector<mutation>& muts, decltype(muts.size()) batch_size, Func&& f) {
+    return parallel_for_each(
+            boost::irange(static_cast<decltype(muts.size())>(0), muts.size(), batch_size),
+            std::move(f))
+        .then([&muts] () mutable { return std::move(muts); });
+}
+
+} // namespace cdc
+
+future<std::tuple<std::vector<mutation>, cdc::result_callback>>
+cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
+    // we do all this because in the case of batches, we can have mixed schemas.
+    auto e = mutations.end();
+    auto i = std::find_if(mutations.begin(), e, [](const mutation& m) {
+        return m.schema()->cdc_options().enabled();
+    });
+
+    if (i == e) {
+        return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
+    }
+
+    mutations.reserve(2 * mutations.size());
+
+    return do_with(std::move(mutations), service::query_state(service::client_state::for_internal_calls(), empty_service_permit()), [this, timeout, i](std::vector<mutation>& mutations, service::query_state& qs) {
+        return transform_mutations(mutations, 1, [this, &mutations, timeout, &qs] (int idx) {
+            auto& m = mutations[idx];
+            auto s = m.schema();
+
+            if (!s->cdc_options().enabled()) {
+                return make_ready_future<>();
+            }
+            // for batches/multiple mutations this is super inefficient. either partition the mutation set by schema
+            // and re-use streams, or probably better: add a cache so this lookup is a noop on second mutation
+            return get_streams(_ctxt, s->ks_name(), s->cf_name(), timeout, qs).then([this, s = std::move(s), &qs, &mutations, idx](::shared_ptr<transformer::streams_type> streams) mutable {
+                auto& m = mutations[idx]; // re-fetching should not really be needed because of reserve, but let's be conservative
+                transformer trans(_ctxt, s, streams);
+
+                if (!s->cdc_options().preimage()) {
+                    mutations.emplace_back(trans.transform(m));
+                    return make_ready_future<>();
+                }
+
+                // Note: a further improvement here would be to coalesce the pre-image selects into one
+                // iff a batch contains several modifications to the same table. On the other hand,
+                // batches are rare(?), so this is premature.
+                auto f = trans.pre_image_select(qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m);
+                return f.then([trans = std::move(trans), &mutations, idx] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
+                    mutations.push_back(trans.transform(mutations[idx], rs.get()));
+                });
+            });
+        }).then([](std::vector<mutation> mutations) {
+            return make_ready_future<std::tuple<std::vector<mutation>, cdc::result_callback>>(std::make_tuple(std::move(mutations), result_callback{}));
+        });
+    });
+}
+
+bool cdc::cdc_service::needs_cdc_augmentation(const std::vector<mutation>& mutations) const {
+    return std::any_of(mutations.begin(), mutations.end(), [](const mutation& m) {
+        return m.schema()->cdc_options().enabled();
+    });
+}
+
+future<std::tuple<std::vector<mutation>, cdc::result_callback>>
+cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations) {
+    return _impl->augment_mutation_call(timeout, std::move(mutations));
+}
--- a/cdc/cdc.hh
+++ b/cdc/cdc.hh
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <functional>
+#include <optional>
+#include <map>
+#include <string>
+#include <vector>
+
+#include <seastar/core/future.hh>
+#include <seastar/core/lowres_clock.hh>
+#include <seastar/core/shared_ptr.hh>
+#include <seastar/core/sstring.hh>
+
+#include "exceptions/exceptions.hh"
+#include "timestamp.hh"
+#include "cdc_options.hh"
+
+class schema;
+using schema_ptr = seastar::lw_shared_ptr<const schema>;
+
+namespace locator {
+
+class snitch_ptr;
+class token_metadata;
+
+} // namespace locator
+
+namespace service {
+
+class migration_notifier;
+class storage_proxy;
+class query_state;
+
+} // namespace service
+
+namespace dht {
+
+class i_partitioner;
+
+} // namespace dht
+
+class mutation;
+class partition_key;
+
+namespace cdc {
+
+class db_context;
+
+// Callback to be invoked on mutation finish to fix
+// the whole bit about post-image.
+// TODO: decide on what the parameters are to be for this.
+using result_callback = std::function<future<>()>;
+
+/// \brief CDC service, responsible for schema listeners
+///
+/// CDC service will listen for schema changes and, iff CDC is enabled/changed,
+/// create/modify/delete the corresponding log tables etc. as part of the schema change.
+///
+class cdc_service {
+    class impl;
+    std::unique_ptr<impl> _impl;
+public:
+    future<> stop();
+    cdc_service(service::storage_proxy&);
+    cdc_service(db_context);
+    ~cdc_service();
+
+    // If any of the mutations are cdc enabled, optionally selects preimage, and adds the
+    // appropriate augments to set the log entries.
+    // Iff post-image is enabled for any of these, a non-empty callback is also
+    // returned to be invoked post the mutation query.
+    future<std::tuple<std::vector<mutation>, result_callback>> augment_mutation_call(
+        lowres_clock::time_point timeout,
+        std::vector<mutation>&& mutations
+        );
+    bool needs_cdc_augmentation(const std::vector<mutation>&) const;
+};
+
+struct db_context final {
+    service::storage_proxy& _proxy;
+    service::migration_notifier& _migration_notifier;
+    locator::token_metadata& _token_metadata;
+    locator::snitch_ptr& _snitch;
+    dht::i_partitioner& _partitioner;
+
+    class builder final {
+        service::storage_proxy& _proxy;
+        std::optional<std::reference_wrapper<service::migration_notifier>> _migration_notifier;
+        std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
+        std::optional<std::reference_wrapper<locator::snitch_ptr>> _snitch;
+        std::optional<std::reference_wrapper<dht::i_partitioner>> _partitioner;
+    public:
+        builder(service::storage_proxy& proxy);
+
+        builder& with_migration_notifier(service::migration_notifier& migration_notifier);
+        builder& with_token_metadata(locator::token_metadata& token_metadata);
+        builder& with_snitch(locator::snitch_ptr& snitch);
+        builder& with_partitioner(dht::i_partitioner& partitioner);
+
+        db_context build();
+    };
+};
+
+// cdc log table operation
+enum class operation : int8_t {
+    // note: these values will eventually be read by a third party, probably not privy to this
+    // enum decl, so don't change the constant values (or the datatype).
+    pre_image = 0, update = 1, row_delete = 2, range_delete_start = 3, range_delete_end = 4, partition_delete = 5
+};
+
+// cdc log data column operation
+enum class column_op : int8_t {
+    // same as "operation". Do not edit values or type unless you _really_ want to.
+    set = 0, del = 1, add = 2,
+};
+
+seastar::sstring log_name(const seastar::sstring& table_name);
+
+seastar::sstring desc_name(const seastar::sstring& table_name);
+
+} // namespace cdc
--- a/cdc/cdc_options.hh
+++ b/cdc/cdc_options.hh
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include <map>
+#include <seastar/core/sstring.hh>
+#include "seastarx.hh"
+
+namespace cdc {
+
+class options final {
+    bool _enabled = false;
+    bool _preimage = false;
+    bool _postimage = false;
+    int _ttl = 86400; // 24h in seconds
+public:
+    options() = default;
+    options(const std::map<sstring, sstring>& map);
+
+    std::map<sstring, sstring> to_map() const;
+    sstring to_sstring() const;
+
+    bool enabled() const { return _enabled; }
+    bool preimage() const { return _preimage; }
+    bool postimage() const { return _postimage; }
+    int ttl() const { return _ttl; }
+
+    bool operator==(const options& o) const;
+    bool operator!=(const options& o) const;
+};
+
+} // namespace cdc
--- a/cell_locking.hh
+++ b/cell_locking.hh
@@ -68,7 +68,7 @@ public:
    public:
        explicit iterator(const mutation_partition& mp)
            : _mp(mp)
-            , _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
+            , _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row().get())
        { }

        iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)
--- a/collection_mutation.cc
+++ b/collection_mutation.cc
@@ -0,0 +1,476 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "types/collection.hh"
+#include "types/user.hh"
+#include "concrete_types.hh"
+#include "atomic_cell_or_collection.hh"
+#include "mutation_partition.hh"
+#include "compaction_garbage_collector.hh"
+#include "combine.hh"
+
+#include "collection_mutation.hh"
+
+collection_mutation::collection_mutation(const abstract_type& type, collection_mutation_view v)
+    : _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator())) {}
+
+collection_mutation::collection_mutation(const abstract_type& type, const bytes_ostream& data)
+    : _data(imr_object_type::make(data::cell::make_collection(fragment_range_view(data)), &type.imr_state().lsa_migrator())) {}
+
+static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
+{
+    auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
+    auto ti = data::type_info::make_collection();
+    data::cell::context ctx(f, ti);
+    auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
+    auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
+    return collection_mutation_view { dv };
+}
+
+collection_mutation::operator collection_mutation_view() const
+{
+    return get_collection_mutation_view(_data.get());
+}
+
+collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
+    return get_collection_mutation_view(_data.get());
+}
+
+bool collection_mutation_view::is_empty() const {
+    auto in = collection_mutation_input_stream(data);
+    auto has_tomb = in.read_trivial<bool>();
+    return !has_tomb && in.read_trivial<uint32_t>() == 0;
+}
+
+template <typename F>
+GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
+static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
+    auto in = collection_mutation_input_stream(data);
+    auto has_tomb = in.read_trivial<bool>();
+    if (has_tomb) {
+        auto ts = in.read_trivial<api::timestamp_type>();
+        auto ttl = in.read_trivial<gc_clock::duration::rep>();
+        tomb.apply(tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))});
+    }
+
+    auto nr = in.read_trivial<uint32_t>();
+    for (uint32_t i = 0; i != nr; ++i) {
+        auto& type_info = read_cell_type_info(in);
+        auto vsize = in.read_trivial<uint32_t>();
+        auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
+        if (value.is_live(tomb, now, false)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone tomb, gc_clock::time_point now) const {
+    return visit(type, make_visitor(
+    [&] (const collection_type_impl& ctype) {
+        auto& type_info = ctype.value_comparator()->imr_state().type_info();
+        return ::is_any_live(data, tomb, now, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
+            auto key_size = in.read_trivial<uint32_t>();
+            in.skip(key_size);
+            return type_info;
+        });
+    },
+    [&] (const user_type_impl& utype) {
+        return ::is_any_live(data, tomb, now, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
+            auto key_size = in.read_trivial<uint32_t>();
+            auto key = in.read(key_size);
+            return utype.type(deserialize_field_index(key))->imr_state().type_info();
+        });
+    },
+    [&] (const abstract_type& o) -> bool {
+        throw std::runtime_error(format("collection_mutation_view::is_any_live: unknown type {}", o.name()));
+    }
+    ));
+}
+
+template <typename F>
+GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, collection_mutation_input_stream&>)
+static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
+    auto in = collection_mutation_input_stream(data);
+    api::timestamp_type max = api::missing_timestamp;
+    auto has_tomb = in.read_trivial<bool>();
+    if (has_tomb) {
+        max = std::max(max, in.read_trivial<api::timestamp_type>());
+        (void)in.read_trivial<gc_clock::duration::rep>();
+    }
+
+    auto nr = in.read_trivial<uint32_t>();
+    for (uint32_t i = 0; i != nr; ++i) {
+        auto& type_info = read_cell_type_info(in);
+        auto vsize = in.read_trivial<uint32_t>();
+        auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
+        max = std::max(value.timestamp(), max);
+    }
+
+    return max;
+}
+
+
+api::timestamp_type collection_mutation_view::last_update(const abstract_type& type) const {
+    return visit(type, make_visitor(
+    [&] (const collection_type_impl& ctype) {
+        auto& type_info = ctype.value_comparator()->imr_state().type_info();
+        return ::last_update(data, [&type_info] (collection_mutation_input_stream& in) -> const data::type_info& {
+            auto key_size = in.read_trivial<uint32_t>();
+            in.skip(key_size);
+            return type_info;
+        });
+    },
+    [&] (const user_type_impl& utype) {
+        return ::last_update(data, [&utype] (collection_mutation_input_stream& in) -> const data::type_info& {
+            auto key_size = in.read_trivial<uint32_t>();
+            auto key = in.read(key_size);
+            return utype.type(deserialize_field_index(key))->imr_state().type_info();
+        });
+    },
+    [&] (const abstract_type& o) -> api::timestamp_type {
+        throw std::runtime_error(format("collection_mutation_view::last_update: unknown type {}", o.name()));
+    }
+    ));
+}
+
+std::ostream& operator<<(std::ostream& os, const collection_mutation_view::printer& cmvp) {
+    fmt::print(os, "{{collection_mutation_view ");
+    cmvp._cmv.with_deserialized(cmvp._type, [&os, &type = cmvp._type] (const collection_mutation_view_description& cmvd) {
+        bool first = true;
+        fmt::print(os, "tombstone {}", cmvd.tomb);
+        visit(type, make_visitor(
+        [&] (const collection_type_impl& ctype) {
+            auto&& key_type = ctype.name_comparator();
+            auto&& value_type = ctype.value_comparator();
+            for (auto&& [key, value] : cmvd.cells) {
+                if (!first) {
+                    fmt::print(os, ", ");
+                }
+                fmt::print(os, "{}: {}", key_type->to_string(key), atomic_cell_view::printer(*value_type, value));
+                first = false;
+            }
+        },
+        [&] (const user_type_impl& utype) {
+            for (auto&& [raw_idx, value] : cmvd.cells) {
+                if (!first) {
+                    fmt::print(os, ", ");
+                }
+                auto idx = deserialize_field_index(raw_idx);
+                fmt::print(os, "{}: {}", utype.field_name_as_string(idx), atomic_cell_view::printer(*utype.type(idx), value));
+                first = false;
+            }
+        },
+        [&] (const abstract_type& o) {
+            // Not throwing exception in this likely-to-be debug context
+            fmt::print(os, "attempted to pretty-print collection_mutation_view_description with type {}", o.name());
+        }
+        ));
+    });
+    fmt::print(os, "}}");
+    return os;
+}
+
+
+collection_mutation_description
+collection_mutation_view_description::materialize(const abstract_type& type) const {
+    collection_mutation_description m;
+    m.tomb = tomb;
+    m.cells.reserve(cells.size());
+
+    visit(type, make_visitor(
+    [&] (const collection_type_impl& ctype) {
+        auto& value_type = *ctype.value_comparator();
+        for (auto&& e : cells) {
+            m.cells.emplace_back(to_bytes(e.first), atomic_cell(value_type, e.second));
+        }
+    },
+    [&] (const user_type_impl& utype) {
+        for (auto&& e : cells) {
+            m.cells.emplace_back(to_bytes(e.first), atomic_cell(*utype.type(deserialize_field_index(e.first)), e.second));
+        }
+    },
+    [&] (const abstract_type& o) {
+        throw std::runtime_error(format("attempted to materialize collection_mutation_view_description with type {}", o.name()));
+    }
+    ));
+
+    return m;
+}
+
+bool collection_mutation_description::compact_and_expire(column_id id, row_tombstone base_tomb, gc_clock::time_point query_time,
+    can_gc_fn& can_gc, gc_clock::time_point gc_before, compaction_garbage_collector* collector)
+{
+    bool any_live = false;
+    auto t = tomb;
+    tombstone purged_tomb;
+    if (tomb <= base_tomb.regular()) {
+        tomb = tombstone();
+    } else if (tomb.deletion_time < gc_before && can_gc(tomb)) {
+        purged_tomb = tomb;
+        tomb = tombstone();
+    }
+    t.apply(base_tomb.regular());
+    utils::chunked_vector<std::pair<bytes, atomic_cell>> survivors;
+    utils::chunked_vector<std::pair<bytes, atomic_cell>> losers;
+    for (auto&& name_and_cell : cells) {
+        atomic_cell& cell = name_and_cell.second;
+        auto cannot_erase_cell = [&] {
+            return cell.deletion_time() >= gc_before || !can_gc(tombstone(cell.timestamp(), cell.deletion_time()));
+        };
+
+        if (cell.is_covered_by(t, false) || cell.is_covered_by(base_tomb.shadowable().tomb(), false)) {
+            continue;
+        }
+        if (cell.has_expired(query_time)) {
+            if (cannot_erase_cell()) {
+                survivors.emplace_back(std::make_pair(
+                    std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
+            } else if (collector) {
+                losers.emplace_back(std::pair(
+                        std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
+            }
+        } else if (!cell.is_live()) {
+            if (cannot_erase_cell()) {
+                survivors.emplace_back(std::move(name_and_cell));
+            } else if (collector) {
+                losers.emplace_back(std::move(name_and_cell));
+            }
+        } else {
+            any_live |= true;
+            survivors.emplace_back(std::move(name_and_cell));
+        }
+    }
+    if (collector) {
+        collector->collect(id, collection_mutation_description{purged_tomb, std::move(losers)});
+    }
+    cells = std::move(survivors);
+    return any_live;
+}
+
+template <typename Iterator>
+static collection_mutation serialize_collection_mutation(
+        const abstract_type& type,
+        const tombstone& tomb,
+        boost::iterator_range<Iterator> cells) {
+    auto element_size = [] (size_t c, auto&& e) -> size_t {
+        return c + 8 + e.first.size() + e.second.serialize().size();
+    };
+    auto size = accumulate(cells, (size_t)4, element_size);
+    size += 1;
+    if (tomb) {
+        size += sizeof(tomb.timestamp) + sizeof(tomb.deletion_time);
+    }
+    bytes_ostream ret;
+    ret.reserve(size);
+    auto out = ret.write_begin();
+    *out++ = bool(tomb);
+    if (tomb) {
+        write(out, tomb.timestamp);
+        write(out, tomb.deletion_time.time_since_epoch().count());
+    }
+    auto writeb = [&out] (bytes_view v) {
+        serialize_int32(out, v.size());
+        out = std::copy_n(v.begin(), v.size(), out);
+    };
+    // FIXME: overflow?
+    serialize_int32(out, boost::distance(cells));
+    for (auto&& kv : cells) {
+        auto&& k = kv.first;
+        auto&& v = kv.second;
+        writeb(k);
+
+        writeb(v.serialize());
+    }
+    return collection_mutation(type, ret);
+}
+
+collection_mutation collection_mutation_description::serialize(const abstract_type& type) const {
+    return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
+}
+
+collection_mutation collection_mutation_view_description::serialize(const abstract_type& type) const {
+    return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
+}
+
+template <typename C>
+GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
+static collection_mutation_view_description
+merge(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type) {
+    using element_type = std::pair<bytes_view, atomic_cell_view>;
+
+    auto compare = [&] (const element_type& e1, const element_type& e2) {
+        return key_type.less(e1.first, e2.first);
+    };
+
+    auto merge = [] (const element_type& e1, const element_type& e2) {
+        // FIXME: use std::max()?
+        return std::make_pair(e1.first, compare_atomic_cell_for_merge(e1.second, e2.second) > 0 ? e1.second : e2.second);
+    };
+
+    // applied to a tombstone, returns a predicate checking whether a cell is killed by
+    // the tombstone
+    auto cell_killed = [] (const std::optional<tombstone>& t) {
+        return [&t] (const element_type& e) {
+            if (!t) {
+                return false;
+            }
+            // tombstone wins if timestamps equal here, unlike row tombstones
+            if (t->timestamp < e.second.timestamp()) {
+                return false;
+            }
+            return true;
+            // FIXME: should we consider TTLs too?
+        };
+    };
+
+    collection_mutation_view_description merged;
+    merged.cells.reserve(a.cells.size() + b.cells.size());
+
+    combine(a.cells.begin(), std::remove_if(a.cells.begin(), a.cells.end(), cell_killed(b.tomb)),
+            b.cells.begin(), std::remove_if(b.cells.begin(), b.cells.end(), cell_killed(a.tomb)),
+            std::back_inserter(merged.cells),
+            compare,
+            merge);
+    merged.tomb = std::max(a.tomb, b.tomb);
+
+    return merged;
+}
+
+collection_mutation merge(const abstract_type& type, collection_mutation_view a, collection_mutation_view b) {
+    return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
+        return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
+            return visit(type, make_visitor(
+            [&] (const collection_type_impl& ctype) {
+                return merge(std::move(a_view), std::move(b_view), *ctype.name_comparator());
+            },
+            [&] (const user_type_impl& utype) {
+                return merge(std::move(a_view), std::move(b_view), *short_type);
+            },
+            [] (const abstract_type& o) -> collection_mutation_view_description {
+                throw std::runtime_error(format("collection_mutation merge: unknown type: {}", o.name()));
+            }
+            )).serialize(type);
+        });
+    });
+}
+
+template <typename C>
+GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
+static collection_mutation_view_description
+difference(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type)
+{
+    collection_mutation_view_description diff;
+    diff.cells.reserve(std::max(a.cells.size(), b.cells.size()));
+
+    auto it = b.cells.begin();
+    for (auto&& c : a.cells) {
+        while (it != b.cells.end() && key_type.less(it->first, c.first)) {
+            ++it;
+        }
+        if (it == b.cells.end() || !key_type.equal(it->first, c.first)
+            || compare_atomic_cell_for_merge(c.second, it->second) > 0) {
+
+            auto cell = std::make_pair(c.first, c.second);
+            diff.cells.emplace_back(std::move(cell));
+        }
+    }
+    if (a.tomb > b.tomb) {
+        diff.tomb = a.tomb;
+    }
+
+    return diff;
+}
+
+collection_mutation difference(const abstract_type& type, collection_mutation_view a, collection_mutation_view b)
+{
+    return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
+        return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
+            return visit(type, make_visitor(
+            [&] (const collection_type_impl& ctype) {
+                return difference(std::move(a_view), std::move(b_view), *ctype.name_comparator());
+            },
+            [&] (const user_type_impl& utype) {
+                return difference(std::move(a_view), std::move(b_view), *short_type);
+            },
+            [] (const abstract_type& o) -> collection_mutation_view_description {
+                throw std::runtime_error(format("collection_mutation difference: unknown type: {}", o.name()));
+            }
+            )).serialize(type);
+        });
+    });
+}
+
+template <typename F>
+GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, collection_mutation_input_stream&>)
+static collection_mutation_view_description
+deserialize_collection_mutation(collection_mutation_input_stream& in, F&& read_kv) {
+    collection_mutation_view_description ret;
+
+    auto has_tomb = in.read_trivial<bool>();
+    if (has_tomb) {
+        auto ts = in.read_trivial<api::timestamp_type>();
+        auto ttl = in.read_trivial<gc_clock::duration::rep>();
+        ret.tomb = tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))};
+    }
+
+    auto nr = in.read_trivial<uint32_t>();
+    ret.cells.reserve(nr);
+    for (uint32_t i = 0; i != nr; ++i) {
+        ret.cells.push_back(read_kv(in));
+    }
+
+    assert(in.empty());
+    return ret;
+}
+
+collection_mutation_view_description
+deserialize_collection_mutation(const abstract_type& type, collection_mutation_input_stream& in) {
+    return visit(type, make_visitor(
+    [&] (const collection_type_impl& ctype) {
+        // value_comparator(), ugh
+        auto& type_info = ctype.value_comparator()->imr_state().type_info();
+        return deserialize_collection_mutation(in, [&type_info] (collection_mutation_input_stream& in) {
+            // FIXME: we could probably avoid the need for size
+            auto ksize = in.read_trivial<uint32_t>();
+            auto key = in.read(ksize);
+            auto vsize = in.read_trivial<uint32_t>();
+            auto value = atomic_cell_view::from_bytes(type_info, in.read(vsize));
+            return std::make_pair(key, value);
+        });
+    },
+    [&] (const user_type_impl& utype) {
+        return deserialize_collection_mutation(in, [&utype] (collection_mutation_input_stream& in) {
+            // FIXME: we could probably avoid the need for size
+            auto ksize = in.read_trivial<uint32_t>();
+            auto key = in.read(ksize);
+            auto vsize = in.read_trivial<uint32_t>();
+            auto value = atomic_cell_view::from_bytes(
+                    utype.type(deserialize_field_index(key))->imr_state().type_info(), in.read(vsize));
+            return std::make_pair(key, value);
+        });
+    },
+    [&] (const abstract_type& o) -> collection_mutation_view_description {
+        throw std::runtime_error(format("deserialize_collection_mutation: unknown type {}", o.name()));
+    }
+    ));
+}
--- a/collection_mutation.hh
+++ b/collection_mutation.hh
@@ -0,0 +1,139 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#pragma once
+
+#include "utils/chunked_vector.hh"
+#include "schema_fwd.hh"
+#include "gc_clock.hh"
+#include "atomic_cell.hh"
+#include "cql_serialization_format.hh"
+#include "marshal_exception.hh"
+#include "utils/linearizing_input_stream.hh"
+#include <iosfwd>
+
+class abstract_type;
+class bytes_ostream;
+class compaction_garbage_collector;
+class row_tombstone;
+
+class collection_mutation;
+
+// An auxiliary struct used to (de)construct collection_mutations.
+// Unlike collection_mutation, which is a serialized blob, this struct allows easy inspection of the logical units
+// of information (tombstone and cells) inside the mutation.
+struct collection_mutation_description {
+    tombstone tomb;
+    // FIXME: use iterators?
+    // we never iterate over `cells` more than once, so there is no need to store them in memory.
+    // In some cases instead of constructing the `cells` vector, it would be more efficient to provide
+    // a one-time-use forward iterator which returns the cells.
+    utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
+
+    // Expires cells based on query_time. Expires tombstones based on can_gc and gc_before.
+    // Removes cells covered by tomb or this->tomb.
+    bool compact_and_expire(column_id id, row_tombstone tomb, gc_clock::time_point query_time,
+        can_gc_fn&, gc_clock::time_point gc_before, compaction_garbage_collector* collector = nullptr);
+
+    // Packs the data to a serialized blob.
+    collection_mutation serialize(const abstract_type&) const;
+};
+
+// Similar to collection_mutation_description, except that it doesn't store the cells' data, only observes it.
+struct collection_mutation_view_description {
+    tombstone tomb;
+    // FIXME: use iterators? See the fixme in collection_mutation_description; the same considerations apply here.
+    utils::chunked_vector<std::pair<bytes_view, atomic_cell_view>> cells;
+
+    // Copies the observed data, storing it in a collection_mutation_description.
+    collection_mutation_description materialize(const abstract_type&) const;
+
+    // Packs the data to a serialized blob.
+    collection_mutation serialize(const abstract_type&) const;
+};
+
+using collection_mutation_input_stream = utils::linearizing_input_stream<atomic_cell_value_view, marshal_exception>;
+
+// Given a linearized collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
+// The struct is an observer of the data given by the collection_mutation_view and is only valid while the
+// passed in `collection_mutation_input_stream` is alive.
+// The function needs to be given the type of stored data to reconstruct the structural information.
+collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, collection_mutation_input_stream&);
+
+class collection_mutation_view {
+public:
+    atomic_cell_value_view data;
+
+    // Is this a noop mutation?
+    bool is_empty() const;
+
+    // Are any of the stored cells live (neither deleted nor expired) at the time point `tp`,
+    // given the later of the tombstones `t` and the one stored in the mutation (if any)?
+    // Requires a type to reconstruct the structural information.
+    bool is_any_live(const abstract_type&, tombstone t = tombstone(), gc_clock::time_point tp = gc_clock::time_point::min()) const;
+
+    // The maximum of timestamps of the mutation's cells and tombstone.
+    api::timestamp_type last_update(const abstract_type&) const;
+
+    // Given a function that operates on a collection_mutation_view_description,
+    // calls it on the corresponding description of `this`.
+    template <typename F>
+    inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
+        auto stream = collection_mutation_input_stream(data);
+        return f(deserialize_collection_mutation(type, stream));
+    }
+
+    class printer {
+        const abstract_type& _type;
+        const collection_mutation_view& _cmv;
+    public:
+        printer(const abstract_type& type, const collection_mutation_view& cmv)
+                : _type(type), _cmv(cmv) {}
+        friend std::ostream& operator<<(std::ostream& os, const printer& cmvp);
+    };
+};
+
+// A serialized mutation of a collection of cells.
+// Used to represent mutations of collections (lists, maps, sets) or non-frozen user defined types.
+// It contains a sequence of cells, each representing a mutation of a single entry (element or field) of the collection.
+// Each cell has an associated 'key' (or 'path'). The meaning of each (key, cell) pair is:
+//  for sets: the key is the serialized set element, the cell contains no data (except liveness information),
+//  for maps: the key is the serialized map element's key, the cell contains the serialized map element's value,
+//  for lists: the key is a timeuuid identifying the list entry, the cell contains the serialized value,
+//  for user types: the key is an index identifying the field, the cell contains the value of the field.
+//  The mutation may also contain a collection-wide tombstone.
+class collection_mutation {
+public:
+    using imr_object_type = imr::utils::object<data::cell::structure>;
+    imr_object_type _data;
+
+    collection_mutation() {}
+    collection_mutation(const abstract_type&, collection_mutation_view);
+    collection_mutation(const abstract_type& type, const bytes_ostream& data);
+    operator collection_mutation_view() const;
+};
+
+collection_mutation merge(const abstract_type&, collection_mutation_view, collection_mutation_view);
+
+collection_mutation difference(const abstract_type&, collection_mutation_view, collection_mutation_view);
+
+// Serializes the given collection of cells to a sequence of bytes ready to be sent over the CQL protocol.
+bytes serialize_for_cql(const abstract_type&, collection_mutation_view, cql_serialization_format);
--- a/compaction_garbage_collector.hh
+++ b/compaction_garbage_collector.hh
@@ -22,7 +22,7 @@
 #pragma once

 #include "schema.hh"
-#include "types/collection.hh"
+#include "collection_mutation.hh"

 class atomic_cell;
 class row_marker;
@@ -31,6 +31,6 @@ class compaction_garbage_collector {
 public:
    virtual ~compaction_garbage_collector() = default;
    virtual void collect(column_id id, atomic_cell) = 0;
-    virtual void collect(column_id id, collection_type_impl::mutation) = 0;
+    virtual void collect(column_id id, collection_mutation_description) = 0;
    virtual void collect(row_marker) = 0;
 };
--- a/compound.hh
+++ b/compound.hh
@@ -74,8 +74,8 @@ private:
     *   <len(value1)><value1><len(value2)><value2>...<len(value_n)><value_n>
     *
     */
-    template<typename RangeOfSerializedComponents>
-    static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out) {
+    template<typename RangeOfSerializedComponents, typename CharOutputIterator>
+    static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out) {
        for (auto&& val : values) {
            assert(val.size() <= std::numeric_limits<size_type>::max());
            write<size_type>(out, size_type(val.size()));
--- a/compound_compat.hh
+++ b/compound_compat.hh
@@ -248,15 +248,16 @@ private:
    static size_t size(const data_value& val) {
        return val.serialized_size();
    }
-    template<typename Value, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
-    static void write_value(Value&& val, bytes::iterator& out) {
+    template<typename Value, typename CharOutputIterator, typename = std::enable_if_t<!std::is_same<data_value, std::decay_t<Value>>::value>>
+    static void write_value(Value&& val, CharOutputIterator& out) {
        out = std::copy(val.begin(), val.end(), out);
    }
-    static void write_value(const data_value& val, bytes::iterator& out) {
+    template <typename CharOutputIterator>
+    static void write_value(const data_value& val, CharOutputIterator& out) {
        val.serialize(out);
    }
-    template<typename RangeOfSerializedComponents>
-    static void serialize_value(RangeOfSerializedComponents&& values, bytes::iterator& out, bool is_compound) {
+    template<typename RangeOfSerializedComponents, typename CharOutputIterator>
+    static void serialize_value(RangeOfSerializedComponents&& values, CharOutputIterator& out, bool is_compound) {
        if (!is_compound) {
            auto it = values.begin();
            write_value(std::forward<decltype(*it)>(*it), out);
--- a/concrete_types.hh
+++ b/concrete_types.hh
@@ -92,14 +92,17 @@ struct duration_type_impl final : public concrete_type<cql_duration> {

 struct timestamp_type_impl final : public simple_type_impl<db_clock::time_point> {
    timestamp_type_impl();
+    static db_clock::time_point from_sstring(sstring_view s);
 };

 struct simple_date_type_impl final : public simple_type_impl<uint32_t> {
    simple_date_type_impl();
+    static uint32_t from_sstring(sstring_view s);
 };

 struct time_type_impl final : public simple_type_impl<int64_t> {
    time_type_impl();
+    static int64_t from_sstring(sstring_view s);
 };

 struct string_type_impl : public concrete_type<sstring> {
@@ -125,8 +128,11 @@ struct date_type_impl final : public concrete_type<db_clock::time_point> {
    date_type_impl();
 };

+using timestamp_date_base_class = concrete_type<db_clock::time_point>;
+
 struct timeuuid_type_impl final : public concrete_type<utils::UUID> {
    timeuuid_type_impl();
+    static utils::UUID from_sstring(sstring_view s);
 };

 struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_int> {
@@ -135,10 +141,13 @@ struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_

 struct inet_addr_type_impl final : public concrete_type<seastar::net::inet_address> {
    inet_addr_type_impl();
+    static sstring to_sstring(const seastar::net::inet_address& addr);
+    static seastar::net::inet_address from_sstring(sstring_view s);
 };

 struct uuid_type_impl final : public concrete_type<utils::UUID> {
    uuid_type_impl();
+    static utils::UUID from_sstring(sstring_view s);
 };

 template <typename Func> using visit_ret_type = std::invoke_result_t<Func, const ascii_type_impl&>;
@@ -239,3 +248,28 @@ static inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
    }
    __builtin_unreachable();
 }
+
+template <typename Func> struct data_value_visitor {
+    const void* v;
+    Func& f;
+    auto operator()(const empty_type_impl& t) { return f(t, v); }
+    auto operator()(const counter_type_impl& t) { return f(t, v); }
+    auto operator()(const reversed_type_impl& t) { return f(t, v); }
+    template <typename T> auto operator()(const T& t) {
+        return f(t, reinterpret_cast<const typename T::native_type*>(v));
+    }
+};
+
+// Given an abstract_type and a void pointer to an object of that
+// type, call f with the runtime type of t and v cast to the
+// corresponding native type.
+// This takes an abstract_type and a void pointer instead of a
+// data_value to support reversed_type_impl without requiring that
+// each visitor create a new data_value just to recurse.
+template <typename Func> inline auto visit(const abstract_type& t, const void* v, Func&& f) {
+    return ::visit(t, data_value_visitor<Func>{v, f});
+}
+
+template <typename Func> inline auto visit(const data_value& v, Func&& f) {
+    return ::visit(*v.type(), v._value, f);
+}
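The void-pointer visit overload added above can be illustrated with a minimal standalone sketch. It is not the project's code: the three-member type hierarchy, the `kind` tag, `describer` and `describe` are hypothetical stand-ins invented for illustration, and only the shape of `data_value_visitor` and the two `visit` overloads follows the patch. The sketch also uses `static_cast` where the patch uses `reinterpret_cast`; both are valid for converting from `const void*`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified type hierarchy (not the real abstract_type).
struct abstract_type {
    enum class kind { int32, text, empty };
    kind k;
    explicit abstract_type(kind k_) : k(k_) {}
};
struct int32_type_impl final : abstract_type {
    using native_type = int32_t;
    int32_type_impl() : abstract_type(kind::int32) {}
};
struct text_type_impl final : abstract_type {
    using native_type = std::string;
    text_type_impl() : abstract_type(kind::text) {}
};
struct empty_type_impl final : abstract_type {  // has no native_type
    empty_type_impl() : abstract_type(kind::empty) {}
};

// Type-only visit: dispatch on the runtime kind, pass the concrete type to f.
template <typename Func>
auto visit(const abstract_type& t, Func&& f) {
    switch (t.k) {
    case abstract_type::kind::int32: return f(static_cast<const int32_type_impl&>(t));
    case abstract_type::kind::text:  return f(static_cast<const text_type_impl&>(t));
    case abstract_type::kind::empty: return f(static_cast<const empty_type_impl&>(t));
    }
    __builtin_unreachable();
}

// Adapter: types without a native_type get the raw pointer, all others
// get the pointer converted to their native type.
template <typename Func>
struct data_value_visitor {
    const void* v;
    Func& f;
    auto operator()(const empty_type_impl& t) { return f(t, v); }
    template <typename T> auto operator()(const T& t) {
        return f(t, static_cast<const typename T::native_type*>(v));
    }
};

// Type-and-value visit: f receives the concrete type and a typed pointer.
template <typename Func>
auto visit(const abstract_type& t, const void* v, Func&& f) {
    return visit(t, data_value_visitor<Func>{v, f});
}

// Hypothetical visitor that renders a value as text.
struct describer {
    std::string operator()(const int32_type_impl&, const int32_t* v) const {
        return "int32:" + std::to_string(*v);
    }
    std::string operator()(const text_type_impl&, const std::string* v) const {
        return "text:" + *v;
    }
    std::string operator()(const empty_type_impl&, const void*) const {
        return "empty";
    }
};

std::string describe(const abstract_type& t, const void* v) {
    return visit(t, v, describer{});
}
```

The caller never names the native type: `describe(some_type, ptr)` picks the overload of `describer` from the runtime kind alone. Every overload has to return the same type, because both `visit` templates deduce a single `auto` return type across all switch branches.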
--- a/conf/scylla.yaml
+++ b/conf/scylla.yaml
@@ -25,15 +25,19 @@
 # multiple tokens per node, see http://cassandra.apache.org/doc/latest/operating
 num_tokens: 256

+# Directory where Scylla should store all its files, which are commitlog,
+# data, hints, view_hints and saved_caches subdirectories. All of these
+# subdirectories can be overridden by the respective options below.
+# If unset, the value defaults to /var/lib/scylla
+# workdir: /var/lib/scylla
+
 # Directory where Scylla should store data on disk.
-# If not set, the default directory is /var/lib/scylla/data.
-data_file_directories:
-    - /var/lib/scylla/data
+# data_file_directories:
+#    - /var/lib/scylla/data

 # commit log.  when running on magnetic HDD, this should be a
 # separate spindle than the data directories.
-# If not set, the default directory is /var/lib/scylla/commitlog.
-commitlog_directory: /var/lib/scylla/commitlog
+# commitlog_directory: /var/lib/scylla/commitlog

 # commitlog_sync may be either "periodic" or "batch."
 #
@@ -112,6 +116,9 @@ read_request_timeout_in_ms: 5000

 # How long the coordinator should wait for writes to complete
 write_request_timeout_in_ms: 2000
+# how long a coordinator should continue to retry a CAS operation
+# that contends with other proposals for the same row
+cas_contention_timeout_in_ms: 1000

 # phi value that must be reached for a host to be marked down.
 # most users should never need to adjust this.
@@ -238,7 +245,10 @@ batch_size_fail_threshold_in_kb: 50
 # broadcast_rpc_address: 1.2.3.4

 # Uncomment to enable experimental features
-# experimental: true
+# experimental_features:
+#     - cdc
+#     - lwt
+#     - udf

 # The directory where hints files are stored if hinted handoff is enabled.
 # hints_directory: /var/lib/scylla/hints
@@ -257,24 +267,6 @@ batch_size_fail_threshold_in_kb: 50
 # created until it has been seen alive and gone down again.
 # max_hint_window_in_ms: 10800000 # 3 hours

-# Maximum throttle in KBs per second, per delivery thread.  This will be
-# reduced proportionally to the number of nodes in the cluster.  (If there
-# are two nodes in the cluster, each delivery thread will use the maximum
-# rate; if there are three, each will throttle to half of the maximum,
-# since we expect two nodes to be delivering hints simultaneously.)
-# hinted_handoff_throttle_in_kb: 1024
-# Number of threads with which to deliver hints;
-# Consider increasing this number when you have multi-dc deployments, since
-# cross-dc handoff tends to be slower
-# max_hints_delivery_threads: 2
-
-###################################################
-## Not currently supported, reserved for future use
-###################################################
-
-# Maximum throttle in KBs per second, total. This will be
-# reduced proportionally to the number of nodes in the cluster.
-# batchlog_replay_throttle_in_kb: 1024

 # Validity period for permissions cache (fetching permissions can be an
 # expensive operation depending on the authorizer, CassandraAuthorizer is
@@ -302,120 +294,6 @@ batch_size_fail_threshold_in_kb: 50
 #
 partitioner: org.apache.cassandra.dht.Murmur3Partitioner

-# Maximum size of the key cache in memory.
-#
-# Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
-# minimum, sometimes more. The key cache is fairly tiny for the amount of
-# time it saves, so it's worthwhile to use it at large numbers.
-# The row cache saves even more time, but must contain the entire row,
-# so it is extremely space-intensive. It's best to only use the
-# row cache if you have hot rows or static rows.
-#
-# NOTE: if you reduce the size, you may not get you hottest keys loaded on startup.
-#
-# Default value is empty to make it "auto" (min(5% of Heap (in MB), 100MB)). Set to 0 to disable key cache.
-# key_cache_size_in_mb:
-
-# Duration in seconds after which Scylla should
-# save the key cache. Caches are saved to saved_caches_directory as
-# specified in this configuration file.
-#
-# Saved caches greatly improve cold-start speeds, and is relatively cheap in
-# terms of I/O for the key cache. Row cache saving is much more expensive and
-# has limited use.
-#
-# Default is 14400 or 4 hours.
-# key_cache_save_period: 14400
-
-# Number of keys from the key cache to save
-# Disabled by default, meaning all keys are going to be saved
-# key_cache_keys_to_save: 100
-
-# Maximum size of the row cache in memory.
-# NOTE: if you reduce the size, you may not get you hottest keys loaded on startup.
-#
-# Default value is 0, to disable row caching.
-# row_cache_size_in_mb: 0
-
-# Duration in seconds after which Scylla should
-# save the row cache. Caches are saved to saved_caches_directory as specified
-# in this configuration file.
-#
-# Saved caches greatly improve cold-start speeds, and is relatively cheap in
-# terms of I/O for the key cache. Row cache saving is much more expensive and
-# has limited use.
-#
-# Default is 0 to disable saving the row cache.
-# row_cache_save_period: 0
-
-# Number of keys from the row cache to save
-# Disabled by default, meaning all keys are going to be saved
-# row_cache_keys_to_save: 100
-
-# Maximum size of the counter cache in memory.
-#
-# Counter cache helps to reduce counter locks' contention for hot counter cells.
-# In case of RF = 1 a counter cache hit will cause Scylla to skip the read before
-# write entirely. With RF > 1 a counter cache hit will still help to reduce the duration
-# of the lock hold, helping with hot counter cell updates, but will not allow skipping
-# the read entirely. Only the local (clock, count) tuple of a counter cell is kept
-# in memory, not the whole counter, so it's relatively cheap.
-#
-# NOTE: if you reduce the size, you may not get you hottest keys loaded on startup.
-#
-# Default value is empty to make it "auto" (min(2.5% of Heap (in MB), 50MB)). Set to 0 to disable counter cache.
-# NOTE: if you perform counter deletes and rely on low gcgs, you should disable the counter cache.
-# counter_cache_size_in_mb:
-
-# Duration in seconds after which Scylla should
-# save the counter cache (keys only). Caches are saved to saved_caches_directory as
-# specified in this configuration file.
-#
-# Default is 7200 or 2 hours.
-# counter_cache_save_period: 7200
-
-# Number of keys from the counter cache to save
-# Disabled by default, meaning all keys are going to be saved
-# counter_cache_keys_to_save: 100
-
-# The off-heap memory allocator.  Affects storage engine metadata as
-# well as caches.  Experiments show that JEMAlloc saves some memory
-# than the native GCC allocator (i.e., JEMalloc is more
-# fragmentation-resistant).
-# 
-# Supported values are: NativeAllocator, JEMallocAllocator
-#
-# If you intend to use JEMallocAllocator you have to install JEMalloc as library and
-# modify cassandra-env.sh as directed in the file.
-#
-# Defaults to NativeAllocator
-# memory_allocator: NativeAllocator
-
-# saved caches
-# If not set, the default directory is /var/lib/scylla/saved_caches.
-# saved_caches_directory: /var/lib/scylla/saved_caches
-
-
-
-# For workloads with more data than can fit in memory, Scylla's
-# bottleneck will be reads that need to fetch data from
-# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
-# order to allow the operations to enqueue low enough in the stack
-# that the OS and drives can reorder them. Same applies to
-# "concurrent_counter_writes", since counter writes read the current
-# values before incrementing and writing them back.
-#
-# On the other hand, since writes are almost never IO bound, the ideal
-# number of "concurrent_writes" is dependent on the number of cores in
-# your system; (8 * number_of_cores) is a good rule of thumb.
-# concurrent_reads: 32
-# concurrent_writes: 32
-# concurrent_counter_writes: 32
-
-# Total memory to use for sstable-reading buffers.  Defaults to
-# the smaller of 1/4 of heap or 512MB.
-# file_cache_size_in_mb: 512
-
 # Total space to use for commitlogs.
 #
 # If space gets above this value (it will round up to the next nearest
@@ -427,28 +305,6 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 # available for Scylla.
 commitlog_total_space_in_mb: -1

-# A fixed memory pool size in MB for for SSTable index summaries. If left
-# empty, this will default to 5% of the heap size. If the memory usage of
-# all index summaries exceeds this limit, SSTables with low read rates will
-# shrink their index summaries in order to meet this limit.  However, this
-# is a best-effort process. In extreme conditions Scylla may need to use
-# more than this amount of memory.
-# index_summary_capacity_in_mb:
-
-# How frequently index summaries should be resampled.  This is done
-# periodically to redistribute memory from the fixed-size pool to sstables
-# proportional their recent read rates.  Setting to -1 will disable this
-# process, leaving existing index summaries at their current sampling level.
-# index_summary_resize_interval_in_minutes: 60
-
-# Whether to, when doing sequential writing, fsync() at intervals in
-# order to force the operating system to flush the dirty
-# buffers. Enable this to avoid sudden dirty buffer flushing from
-# impacting read latencies. Almost always a good idea on SSDs; not
-# necessarily on platters.
-# trickle_fsync: false
-# trickle_fsync_interval_in_kb: 10240
-
 # TCP port, for commands and data
 # For security reasons, you should not expose this port to the internet.  Firewall it if needed.
 # storage_port: 7000
@@ -461,91 +317,21 @@ commitlog_total_space_in_mb: -1
 # listen_interface: eth0
 # listen_interface_prefer_ipv6: false

-# Internode authentication backend, implementing IInternodeAuthenticator;
-# used to allow/disallow connections from peer nodes.
-# internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
-
 # Whether to start the native transport server.
 # Please note that the address on which the native transport is bound is the
 # same as the rpc_address. The port however is different and specified below.
 # start_native_transport: true

-# The maximum threads for handling requests when the native transport is used.
-# This is similar to rpc_max_threads though the default differs slightly (and
-# there is no native_transport_min_threads, idle threads will always be stopped
-# after 30 seconds).
-# native_transport_max_threads: 128
-#
 # The maximum size of allowed frame. Frame (requests) larger than this will
 # be rejected as invalid. The default is 256MB.
 # native_transport_max_frame_size_in_mb: 256

-# The maximum number of concurrent client connections.
-# The default is -1, which means unlimited.
-# native_transport_max_concurrent_connections: -1
-
-# The maximum number of concurrent client connections per source ip.
-# The default is -1, which means unlimited.
-# native_transport_max_concurrent_connections_per_ip: -1
-
 # Whether to start the thrift rpc server.
 # start_rpc: true

 # enable or disable keepalive on rpc/native connections
 # rpc_keepalive: true

-# Scylla provides two out-of-the-box options for the RPC Server:
-#
-# sync  -> One thread per thrift connection. For a very large number of clients, memory
-#          will be your limiting factor. On a 64 bit JVM, 180KB is the minimum stack size
-#          per thread, and that will correspond to your use of virtual memory (but physical memory
-#          may be limited depending on use of stack space).
-#
-# hsha  -> Stands for "half synchronous, half asynchronous." All thrift clients are handled
-#          asynchronously using a small number of threads that does not vary with the amount
-#          of thrift clients (and thus scales well to many clients). The rpc requests are still
-#          synchronous (one thread per active request). If hsha is selected then it is essential
-#          that rpc_max_threads is changed from the default value of unlimited.
-#
-# The default is sync because on Windows hsha is about 30% slower.  On Linux,
-# sync/hsha performance is about the same, with hsha of course using less memory.
-#
-# Alternatively,  can provide your own RPC server by providing the fully-qualified class name
-# of an o.a.c.t.TServerFactory that can create an instance of it.
-# rpc_server_type: sync
-
-# Uncomment rpc_min|max_thread to set request pool size limits.
-#
-# Regardless of your choice of RPC server (see above), the number of maximum requests in the
-# RPC thread pool dictates how many concurrent requests are possible (but if you are using the sync
-# RPC server, it also dictates the number of clients that can be connected at all).
-#
-# The default is unlimited and thus provides no protection against clients overwhelming the server. You are
-# encouraged to set a maximum that makes sense for you in production, but do keep in mind that
-# rpc_max_threads represents the maximum number of client requests this server may execute concurrently.
-#
-# rpc_min_threads: 16
-# rpc_max_threads: 2048
-
-# uncomment to set socket buffer sizes on rpc connections
-# rpc_send_buff_size_in_bytes:
-# rpc_recv_buff_size_in_bytes:
-
-# Uncomment to set socket buffer size for internode communication
-# Note that when setting this, the buffer size is limited by net.core.wmem_max
-# and when not setting it it is defined by net.ipv4.tcp_wmem
-# See:
-# /proc/sys/net/core/wmem_max
-# /proc/sys/net/core/rmem_max
-# /proc/sys/net/ipv4/tcp_wmem
-# /proc/sys/net/ipv4/tcp_rmem
-# and: man tcp
-# internode_send_buff_size_in_bytes:
-# internode_recv_buff_size_in_bytes:
-
-# Frame size for thrift (maximum message length).
-# thrift_framed_transport_size_in_mb: 15
-
 # Set to true to have Scylla create a hard link to each sstable
 # flushed or streamed locally in a backups/ subdirectory of the
 # keyspace data.  Removing these links is the operator's
@@ -588,30 +374,6 @@ commitlog_total_space_in_mb: -1
 # column_index_size_in_kb: 64


-# Number of simultaneous compactions to allow, NOT including
-# validation "compactions" for anti-entropy repair.  Simultaneous
-# compactions can help preserve read performance in a mixed read/write
-# workload, by mitigating the tendency of small sstables to accumulate
-# during a single long running compactions. The default is usually
-# fine and if you experience problems with compaction running too
-# slowly or too fast, you should look at
-# compaction_throughput_mb_per_sec first.
-#
-# concurrent_compactors defaults to the smaller of (number of disks,
-# number of cores), with a minimum of 2 and a maximum of 8.
-# 
-# If your data directories are backed by SSD, you should increase this
-# to the number of cores.
-#concurrent_compactors: 1
-
-# Throttles compaction to the given total throughput across the entire
-# system. The faster you insert data, the faster you need to compact in
-# order to keep the sstable count down, but in general, setting this to
-# 16 to 32 times the rate you are inserting data is more than sufficient.
-# Setting this to 0 disables throttling. Note that this account for all types
-# of compaction, including validation compaction.
-# compaction_throughput_mb_per_sec: 16
-
 # Log a warning when writing partitions larger than this value
 # compaction_large_partition_warning_threshold_mb: 1000

@@ -624,18 +386,6 @@ commitlog_total_space_in_mb: -1
 # Log a warning when row number is larger than this value
 # compaction_rows_count_warning_threshold: 100000

-# When compacting, the replacement sstable(s) can be opened before they
-# are completely written, and used in place of the prior sstables for
-# any range that has been written. This helps to smoothly transfer reads 
-# between the sstables, reducing page cache churn and keeping hot rows hot
-# sstable_preemptive_open_interval_in_mb: 50
-
-# Throttles all streaming file transfer between the datacenters,
-# this setting allows users to throttle inter dc stream throughput in addition
-# to throttling all network stream traffic as configured with
-# stream_throughput_outbound_megabits_per_sec
-# inter_dc_stream_throughput_outbound_megabits_per_sec:
-
 # How long the coordinator should wait for seq or index scans to complete
 # range_request_timeout_in_ms: 10000
 # How long the coordinator should wait for writes to complete
@@ -650,88 +400,23 @@ commitlog_total_space_in_mb: -1
 # The default timeout for other, miscellaneous operations
 # request_timeout_in_ms: 10000

-# Enable operation timeout information exchange between nodes to accurately
-# measure request timeouts.  If disabled, replicas will assume that requests
-# were forwarded to them instantly by the coordinator, which means that
-# under overload conditions we will waste that much extra time processing 
-# already-timed-out requests.
-#
-# Warning: before enabling this property make sure to ntp is installed
-# and the times are synchronized between the nodes.
-# cross_node_timeout: false
-
-# Enable socket timeout for streaming operation.
-# When a timeout occurs during streaming, streaming is retried from the start
-# of the current file. This _can_ involve re-streaming an important amount of
-# data, so you should avoid setting the value too low.
-# Default value is 0, which never timeout streams.
-# streaming_socket_timeout_in_ms: 0
-
-# controls how often to perform the more expensive part of host score
-# calculation
-# dynamic_snitch_update_interval_in_ms: 100 
-
-# controls how often to reset all host scores, allowing a bad host to
-# possibly recover
-# dynamic_snitch_reset_interval_in_ms: 600000
-
-# if set greater than zero and read_repair_chance is < 1.0, this will allow
-# 'pinning' of replicas to hosts in order to increase cache capacity.
-# The badness threshold will control how much worse the pinned host has to be
-# before the dynamic snitch will prefer other replicas over it.  This is
-# expressed as a double which represents a percentage.  Thus, a value of
-# 0.2 means Scylla would continue to prefer the static snitch values
-# until the pinned host was 20% worse than the fastest.
-# dynamic_snitch_badness_threshold: 0.1
-
-# request_scheduler -- Set this to a class that implements
-# RequestScheduler, which will schedule incoming client requests
-# according to the specific policy. This is useful for multi-tenancy
-# with a single Scylla cluster.
-# NOTE: This is specifically for requests from the client and does
-# not affect inter node communication.
-# org.apache.cassandra.scheduler.NoScheduler - No scheduling takes place
-# org.apache.cassandra.scheduler.RoundRobinScheduler - Round robin of
-# client requests to a node with a separate queue for each
-# request_scheduler_id. The scheduler is further customized by
-# request_scheduler_options as described below.
-# request_scheduler: org.apache.cassandra.scheduler.NoScheduler
-
-# Scheduler Options vary based on the type of scheduler
-# NoScheduler - Has no options
-# RoundRobin
-#  - throttle_limit -- The throttle_limit is the number of in-flight
-#                      requests per client.  Requests beyond 
-#                      that limit are queued up until
-#                      running requests can complete.
-#                      The value of 80 here is twice the number of
-#                      concurrent_reads + concurrent_writes.
-#  - default_weight -- default_weight is optional and allows for
-#                      overriding the default which is 1.
-#  - weights -- Weights are optional and will default to 1 or the
-#               overridden default_weight. The weight translates into how
-#               many requests are handled during each turn of the
-#               RoundRobin, based on the scheduler id.
-#
-# request_scheduler_options:
-#    throttle_limit: 80
-#    default_weight: 5
-#    weights:
-#      Keyspace1: 1
-#      Keyspace2: 5
-
-# request_scheduler_id -- An identifier based on which to perform
-# the request scheduling. Currently the only valid option is keyspace.
-# request_scheduler_id: keyspace
-
 # Enable or disable inter-node encryption. 
 # You must also generate keys and provide the appropriate key and trust store locations and passwords. 
-# No custom encryption options are currently enabled. The available options are:
 #
 # The available internode options are : all, none, dc, rack
 # If set to dc scylla  will encrypt the traffic between the DCs
 # If set to rack scylla  will encrypt the traffic between the racks
 #
+# SSL/TLS algorithm and ciphers used can be controlled by 
+# the priority_string parameter. Info on priority string
+# syntax and values is available at:
+#   https://gnutls.org/manual/html_node/Priority-Strings.html
+#
+# The require_client_auth parameter allows you to
+# restrict access to the service based on certificate
+# validation. Clients must provide a certificate
+# accepted by the trust store in use to connect.
+# 
 # server_encryption_options:
 #    internode_encryption: none
 #    certificate: conf/scylla.crt
--- a/configure.py
+++ b/configure.py
@@ -144,8 +144,12 @@ def flag_supported(flag, compiler):

 def gold_supported(compiler):
    src_main = 'int main(int argc, char **argv) { return 0; }'
-    if try_compile_and_link(source=src_main, flags=['-fuse-ld=gold'], compiler=compiler):
-        return '-fuse-ld=gold'
+    link_flags = ['-fuse-ld=gold']
+    if try_compile_and_link(source=src_main, flags=link_flags, compiler=compiler):
+        threads_flag = '-Wl,--threads'
+        if try_compile_and_link(source=src_main, flags=link_flags + [threads_flag], compiler=compiler):
+            link_flags.append(threads_flag)
+        return ' '.join(link_flags)
    else:
        print('Note: gold not found; using default system linker')
        return ''
@@ -257,136 +261,142 @@ modes = {
 }

 scylla_tests = [
-    'tests/mutation_test',
-    'tests/mvcc_test',
-    'tests/mutation_fragment_test',
-    'tests/flat_mutation_reader_test',
-    'tests/schema_registry_test',
-    'tests/canonical_mutation_test',
-    'tests/range_test',
-    'tests/types_test',
-    'tests/keys_test',
-    'tests/partitioner_test',
-    'tests/frozen_mutation_test',
-    'tests/serialized_action_test',
-    'tests/hint_test',
-    'tests/clustering_ranges_walker_test',
-    'tests/perf/perf_mutation',
-    'tests/lsa_async_eviction_test',
-    'tests/lsa_sync_eviction_test',
-    'tests/row_cache_alloc_stress',
-    'tests/perf_row_cache_update',
-    'tests/perf/perf_hash',
-    'tests/perf/perf_cql_parser',
-    'tests/perf/perf_simple_query',
-    'tests/perf/perf_fast_forward',
-    'tests/perf/perf_cache_eviction',
-    'tests/cache_flat_mutation_reader_test',
-    'tests/row_cache_stress_test',
-    'tests/memory_footprint',
-    'tests/perf/perf_sstable',
-    'tests/cql_query_test',
-    'tests/secondary_index_test',
-    'tests/json_cql_query_test',
-    'tests/filtering_test',
-    'tests/storage_proxy_test',
-    'tests/schema_change_test',
-    'tests/mutation_reader_test',
-    'tests/mutation_query_test',
-    'tests/row_cache_test',
-    'tests/test-serialization',
-    'tests/broken_sstable_test',
-    'tests/sstable_test',
-    'tests/sstable_datafile_test',
-    'tests/sstable_3_x_test',
-    'tests/sstable_mutation_test',
-    'tests/sstable_resharding_test',
-    'tests/memtable_test',
-    'tests/commitlog_test',
-    'tests/cartesian_product_test',
-    'tests/hash_test',
-    'tests/map_difference_test',
-    'tests/message',
-    'tests/gossip',
-    'tests/gossip_test',
-    'tests/compound_test',
-    'tests/config_test',
-    'tests/gossiping_property_file_snitch_test',
-    'tests/ec2_snitch_test',
-    'tests/gce_snitch_test',
-    'tests/snitch_reset_test',
-    'tests/network_topology_strategy_test',
-    'tests/query_processor_test',
-    'tests/batchlog_manager_test',
-    'tests/bytes_ostream_test',
-    'tests/UUID_test',
-    'tests/murmur_hash_test',
-    'tests/allocation_strategy_test',
-    'tests/logalloc_test',
-    'tests/log_heap_test',
-    'tests/managed_vector_test',
-    'tests/crc_test',
-    'tests/checksum_utils_test',
-    'tests/flush_queue_test',
-    'tests/dynamic_bitset_test',
-    'tests/auth_test',
-    'tests/idl_test',
-    'tests/range_tombstone_list_test',
-    'tests/anchorless_list_test',
-    'tests/database_test',
-    'tests/nonwrapping_range_test',
-    'tests/input_stream_test',
-    'tests/virtual_reader_test',
-    'tests/view_schema_test',
-    'tests/view_build_test',
-    'tests/view_complex_test',
-    'tests/counter_test',
-    'tests/cell_locker_test',
-    'tests/row_locker_test',
-    'tests/streaming_histogram_test',
-    'tests/duration_test',
-    'tests/vint_serialization_test',
-    'tests/continuous_data_consumer_test',
-    'tests/compress_test',
-    'tests/chunked_vector_test',
-    'tests/loading_cache_test',
-    'tests/castas_fcts_test',
-    'tests/big_decimal_test',
-    'tests/aggregate_fcts_test',
-    'tests/role_manager_test',
-    'tests/caching_options_test',
-    'tests/auth_resource_test',
-    'tests/cql_auth_query_test',
-    'tests/enum_set_test',
-    'tests/extensions_test',
-    'tests/cql_auth_syntax_test',
-    'tests/querier_cache',
-    'tests/limiting_data_source_test',
-    'tests/meta_test',
-    'tests/imr_test',
-    'tests/partition_data_test',
-    'tests/reusable_buffer_test',
-    'tests/mutation_writer_test',
-    'tests/observable_test',
-    'tests/transport_test',
-    'tests/fragmented_temporary_buffer_test',
-    'tests/json_test',
-    'tests/auth_passwords_test',
-    'tests/multishard_mutation_query_test',
-    'tests/top_k_test',
-    'tests/utf8_test',
-    'tests/small_vector_test',
-    'tests/data_listeners_test',
-    'tests/truncation_migration_test',
-    'tests/like_matcher_test',
+    'test/boost/UUID_test',
+    'test/boost/aggregate_fcts_test',
+    'test/boost/allocation_strategy_test',
+    'test/boost/anchorless_list_test',
+    'test/boost/auth_passwords_test',
+    'test/boost/auth_resource_test',
+    'test/boost/auth_test',
+    'test/boost/batchlog_manager_test',
+    'test/boost/big_decimal_test',
+    'test/boost/broken_sstable_test',
+    'test/boost/bytes_ostream_test',
+    'test/boost/cache_flat_mutation_reader_test',
+    'test/boost/caching_options_test',
+    'test/boost/canonical_mutation_test',
+    'test/boost/cartesian_product_test',
+    'test/boost/castas_fcts_test',
+    'test/boost/cdc_test',
+    'test/boost/cell_locker_test',
+    'test/boost/checksum_utils_test',
+    'test/boost/chunked_vector_test',
+    'test/boost/clustering_ranges_walker_test',
+    'test/boost/commitlog_test',
+    'test/boost/compound_test',
+    'test/boost/compress_test',
+    'test/boost/config_test',
+    'test/boost/continuous_data_consumer_test',
+    'test/boost/counter_test',
+    'test/boost/cql_auth_query_test',
+    'test/boost/cql_auth_syntax_test',
+    'test/boost/cql_query_test',
+    'test/boost/crc_test',
+    'test/boost/data_listeners_test',
+    'test/boost/database_test',
+    'test/boost/duration_test',
+    'test/boost/dynamic_bitset_test',
+    'test/boost/enum_option_test',
+    'test/boost/enum_set_test',
+    'test/boost/extensions_test',
+    'test/boost/filtering_test',
+    'test/boost/flat_mutation_reader_test',
+    'test/boost/flush_queue_test',
+    'test/boost/fragmented_temporary_buffer_test',
+    'test/boost/frozen_mutation_test',
+    'test/boost/gossip_test',
+    'test/boost/gossiping_property_file_snitch_test',
+    'test/boost/hash_test',
+    'test/boost/idl_test',
+    'test/boost/input_stream_test',
+    'test/boost/json_cql_query_test',
+    'test/boost/keys_test',
+    'test/boost/like_matcher_test',
+    'test/boost/limiting_data_source_test',
+    'test/boost/linearizing_input_stream_test',
+    'test/boost/loading_cache_test',
+    'test/boost/log_heap_test',
+    'test/boost/logalloc_test',
+    'test/boost/managed_vector_test',
+    'test/boost/map_difference_test',
+    'test/boost/memtable_test',
+    'test/boost/meta_test',
+    'test/boost/multishard_mutation_query_test',
+    'test/boost/murmur_hash_test',
+    'test/boost/mutation_fragment_test',
+    'test/boost/mutation_query_test',
+    'test/boost/mutation_reader_test',
+    'test/boost/mutation_test',
+    'test/boost/mutation_writer_test',
+    'test/boost/mvcc_test',
+    'test/boost/network_topology_strategy_test',
+    'test/boost/nonwrapping_range_test',
+    'test/boost/observable_test',
+    'test/boost/partitioner_test',
+    'test/boost/querier_cache_test',
+    'test/boost/query_processor_test',
+    'test/boost/range_test',
+    'test/boost/range_tombstone_list_test',
+    'test/boost/reusable_buffer_test',
+    'test/boost/role_manager_test',
+    'test/boost/row_cache_test',
+    'test/boost/schema_change_test',
+    'test/boost/schema_registry_test',
+    'test/boost/secondary_index_test',
+    'test/boost/serialization_test',
+    'test/boost/serialized_action_test',
+    'test/boost/small_vector_test',
+    'test/boost/snitch_reset_test',
+    'test/boost/sstable_3_x_test',
+    'test/boost/sstable_datafile_test',
+    'test/boost/sstable_mutation_test',
+    'test/boost/sstable_resharding_test',
+    'test/boost/sstable_test',
+    'test/boost/storage_proxy_test',
+    'test/boost/top_k_test',
+    'test/boost/transport_test',
+    'test/boost/truncation_migration_test',
+    'test/boost/types_test',
+    'test/boost/user_function_test',
+    'test/boost/user_types_test',
+    'test/boost/utf8_test',
+    'test/boost/view_build_test',
+    'test/boost/view_complex_test',
+    'test/boost/view_schema_test',
+    'test/boost/vint_serialization_test',
+    'test/boost/virtual_reader_test',
+    'test/manual/ec2_snitch_test',
+    'test/manual/gce_snitch_test',
+    'test/manual/gossip',
+    'test/manual/hint_test',
+    'test/manual/imr_test',
+    'test/manual/json_test',
+    'test/manual/message',
+    'test/manual/partition_data_test',
+    'test/manual/row_locker_test',
+    'test/manual/streaming_histogram_test',
+    'test/perf/perf_cache_eviction',
+    'test/perf/perf_cql_parser',
+    'test/perf/perf_fast_forward',
+    'test/perf/perf_hash',
+    'test/perf/perf_mutation',
+    'test/perf/perf_row_cache_update',
+    'test/perf/perf_simple_query',
+    'test/perf/perf_sstable',
+    'test/tools/cql_repl',
+    'test/unit/lsa_async_eviction_test',
+    'test/unit/lsa_sync_eviction_test',
+    'test/unit/memory_footprint_test',
+    'test/unit/row_cache_alloc_stress_test',
+    'test/unit/row_cache_stress_test',
 ]

 perf_tests = [
-    'tests/perf/perf_mutation_readers',
-    'tests/perf/perf_checksum',
-    'tests/perf/perf_mutation_fragment',
-    'tests/perf/perf_idl',
-    'tests/perf/perf_vint',
+    'test/perf/perf_mutation_readers',
+    'test/perf/perf_checksum',
+    'test/perf/perf_mutation_fragment',
+    'test/perf/perf_idl',
+    'test/perf/perf_vint',
 ]

 apps = [
@@ -429,8 +439,6 @@ arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', def
                        help='Path to DPDK SDK target location (e.g. <DPDK SDK dir>/x86_64-native-linuxapp-gcc)')
 arg_parser.add_argument('--debuginfo', action='store', dest='debuginfo', type=int, default=1,
                        help='Enable(1)/disable(0)compiler debug information generation')
-arg_parser.add_argument('--compress-exec-debuginfo', action='store', dest='compress_exec_debuginfo', type=int, default=1,
-                        help='Enable(1)/disable(0) debug information compression in executables')
 arg_parser.add_argument('--static-stdc++', dest='staticcxx', action='store_true',
                        help='Link libgcc and libstdc++ statically')
 arg_parser.add_argument('--static-thrift', dest='staticthrift', action='store_true',
@@ -453,6 +461,8 @@ arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_i
                        help='enable allocation failure injection')
 arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
                        help='path to antlr3 executable')
+arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default='ragel',
+                        help='path to ragel executable')
 args = arg_parser.parse_args()

 defines = ['XXH_PRIVATE_API',
@@ -466,6 +476,8 @@ cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassa
 scylla_core = (['database.cc',
                'table.cc',
                'atomic_cell.cc',
+                'collection_mutation.cc',
+                'connection_notifier.cc',
                'hashers.cc',
                'schema.cc',
                'frozen_schema.cc',
@@ -485,6 +497,7 @@ scylla_core = (['database.cc',
                'utils/buffer_input_stream.cc',
                'utils/limiting_data_source.cc',
                'utils/updateable_value.cc',
+                'utils/directories.cc',
                'mutation_partition.cc',
                'mutation_partition_view.cc',
                'mutation_partition_serializer.cc',
@@ -505,6 +518,8 @@ scylla_core = (['database.cc',
                'sstables/partition.cc',
                'sstables/compaction.cc',
                'sstables/compaction_strategy.cc',
+                'sstables/size_tiered_compaction_strategy.cc',
+                'sstables/leveled_compaction_strategy.cc',
                'sstables/compaction_manager.cc',
                'sstables/integrity_checked_file_impl.cc',
                'sstables/prepended_input_stream.cc',
@@ -513,6 +528,8 @@ scylla_core = (['database.cc',
                'transport/event_notifier.cc',
                'transport/server.cc',
                'transport/messages/result_message.cc',
+                'cdc/cdc.cc',
+                'cql3/type_json.cc',
                'cql3/abstract_marker.cc',
                'cql3/attributes.cc',
                'cql3/cf_name.cc',
@@ -524,7 +541,9 @@ scylla_core = (['database.cc',
                'cql3/sets.cc',
                'cql3/tuples.cc',
                'cql3/maps.cc',
+                'cql3/functions/user_function.cc',
                'cql3/functions/functions.cc',
+                'cql3/functions/aggregate_fcts.cc',
                'cql3/functions/castas_fcts.cc',
                'cql3/statements/cf_prop_defs.cc',
                'cql3/statements/cf_statement.cc',
@@ -533,14 +552,18 @@ scylla_core = (['database.cc',
                'cql3/statements/create_table_statement.cc',
                'cql3/statements/create_view_statement.cc',
                'cql3/statements/create_type_statement.cc',
+                'cql3/statements/create_function_statement.cc',
                'cql3/statements/drop_index_statement.cc',
                'cql3/statements/drop_keyspace_statement.cc',
                'cql3/statements/drop_table_statement.cc',
                'cql3/statements/drop_view_statement.cc',
                'cql3/statements/drop_type_statement.cc',
+                'cql3/statements/drop_function_statement.cc',
                'cql3/statements/schema_altering_statement.cc',
                'cql3/statements/ks_prop_defs.cc',
+                'cql3/statements/function_statement.cc',
                'cql3/statements/modification_statement.cc',
+                'cql3/statements/cas_request.cc',
                'cql3/statements/parsed_statement.cc',
                'cql3/statements/property_definitions.cc',
                'cql3/statements/update_statement.cc',
@@ -578,6 +601,10 @@ scylla_core = (['database.cc',
                'service/priority_manager.cc',
                'service/migration_manager.cc',
                'service/storage_proxy.cc',
+                'service/paxos/proposal.cc',
+                'service/paxos/prepare_response.cc',
+                'service/paxos/paxos_state.cc',
+                'service/paxos/prepare_summary.cc',
                'cql3/operator.cc',
                'cql3/relation.cc',
                'cql3/column_identifier.cc',
@@ -716,6 +743,7 @@ scylla_core = (['database.cc',
                'tracing/trace_keyspace_helper.cc',
                'tracing/trace_state.cc',
                'tracing/tracing_backend_registry.cc',
+                'tracing/traced_file.cc',
                'table_helper.cc',
                'range_tombstone.cc',
                'range_tombstone_list.cc',
@@ -733,6 +761,7 @@ scylla_core = (['database.cc',
                'utils/ascii.cc',
                'utils/like_matcher.cc',
                'mutation_writer/timestamp_based_splitting_writer.cc',
+                'lua.cc',
                ] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
               )

@@ -772,6 +801,34 @@ api = ['api/api.cc',
       'api/api-doc/config.json',
       ]

+alternator = [
+       'alternator/server.cc',
+       'alternator/executor.cc',
+       'alternator/stats.cc',
+       'alternator/base64.cc',
+       'alternator/serialization.cc',
+       'alternator/expressions.cc',
+       Antlr3Grammar('alternator/expressions.g'),
+       'alternator/conditions.cc',
+       'alternator/rjson.cc',
+       'alternator/auth.cc',
+]
+
+redis = [
+        'redis/service.cc',
+        'redis/server.cc',
+        'redis/query_processor.cc',
+        'redis/protocol_parser.rl',
+        'redis/keyspace_utils.cc',
+        'redis/options.cc',
+        'redis/stats.cc',
+        'redis/mutation_utils.cc',
+        'redis/query_utils.cc',
+        'redis/abstract_command.cc',
+        'redis/command_factory.cc',
+        'redis/commands.cc',
+        ]
+
 idls = ['idl/gossip_digest.idl.hh',
        'idl/uuid.idl.hh',
        'idl/range.idl.hh',
@@ -796,77 +853,80 @@ idls = ['idl/gossip_digest.idl.hh',
        'idl/consistency_level.idl.hh',
        'idl/cache_temperature.idl.hh',
        'idl/view.idl.hh',
+        'idl/messaging_service.idl.hh',
+        'idl/paxos.idl.hh',
        ]

 headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])

 scylla_tests_generic_dependencies = [
-    'tests/cql_test_env.cc',
-    'tests/test_services.cc',
+    'test/lib/cql_test_env.cc',
+    'test/lib/test_services.cc',
 ]

 scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
-    'tests/cql_assertions.cc',
-    'tests/result_set_assertions.cc',
-    'tests/mutation_source_test.cc',
-    'tests/data_model.cc',
-    'tests/exception_utils.cc',
-    'tests/random_schema.cc',
+    'test/lib/cql_assertions.cc',
+    'test/lib/result_set_assertions.cc',
+    'test/lib/mutation_source_test.cc',
+    'test/lib/data_model.cc',
+    'test/lib/exception_utils.cc',
+    'test/lib/random_schema.cc',
 ]

 deps = {
-    'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api,
+    'scylla': idls + ['main.cc', 'release.cc', 'build_id.cc'] + scylla_core + api + alternator + redis,
 }

 pure_boost_tests = set([
-    'tests/map_difference_test',
-    'tests/keys_test',
-    'tests/compound_test',
-    'tests/range_tombstone_list_test',
-    'tests/anchorless_list_test',
-    'tests/nonwrapping_range_test',
-    'tests/test-serialization',
-    'tests/range_test',
-    'tests/crc_test',
-    'tests/checksum_utils_test',
-    'tests/managed_vector_test',
-    'tests/dynamic_bitset_test',
-    'tests/idl_test',
-    'tests/cartesian_product_test',
-    'tests/streaming_histogram_test',
-    'tests/duration_test',
-    'tests/vint_serialization_test',
-    'tests/compress_test',
-    'tests/chunked_vector_test',
-    'tests/big_decimal_test',
-    'tests/caching_options_test',
-    'tests/auth_resource_test',
-    'tests/enum_set_test',
-    'tests/cql_auth_syntax_test',
-    'tests/meta_test',
-    'tests/observable_test',
-    'tests/json_test',
-    'tests/auth_passwords_test',
-    'tests/top_k_test',
-    'tests/small_vector_test',
-    'tests/like_matcher_test',
+    'test/boost/anchorless_list_test',
+    'test/boost/auth_passwords_test',
+    'test/boost/auth_resource_test',
+    'test/boost/big_decimal_test',
+    'test/boost/caching_options_test',
+    'test/boost/cartesian_product_test',
+    'test/boost/checksum_utils_test',
+    'test/boost/chunked_vector_test',
+    'test/boost/compound_test',
+    'test/boost/compress_test',
+    'test/boost/cql_auth_syntax_test',
+    'test/boost/crc_test',
+    'test/boost/duration_test',
+    'test/boost/dynamic_bitset_test',
+    'test/boost/enum_option_test',
+    'test/boost/enum_set_test',
+    'test/boost/idl_test',
+    'test/boost/keys_test',
+    'test/boost/like_matcher_test',
+    'test/boost/linearizing_input_stream_test',
+    'test/boost/map_difference_test',
+    'test/boost/meta_test',
+    'test/boost/nonwrapping_range_test',
+    'test/boost/observable_test',
+    'test/boost/range_test',
+    'test/boost/range_tombstone_list_test',
+    'test/boost/serialization_test',
+    'test/boost/small_vector_test',
+    'test/boost/top_k_test',
+    'test/boost/vint_serialization_test',
+    'test/manual/json_test',
+    'test/manual/streaming_histogram_test',
 ])

 tests_not_using_seastar_test_framework = set([
-    'tests/perf/perf_mutation',
-    'tests/lsa_async_eviction_test',
-    'tests/lsa_sync_eviction_test',
-    'tests/row_cache_alloc_stress',
-    'tests/perf_row_cache_update',
-    'tests/perf/perf_hash',
-    'tests/perf/perf_cql_parser',
-    'tests/message',
-    'tests/perf/perf_cache_eviction',
-    'tests/row_cache_stress_test',
-    'tests/memory_footprint',
-    'tests/gossip',
-    'tests/perf/perf_sstable',
-    'tests/small_vector_test',
+    'test/boost/small_vector_test',
+    'test/manual/gossip',
+    'test/manual/message',
+    'test/perf/perf_cache_eviction',
+    'test/perf/perf_cql_parser',
+    'test/perf/perf_hash',
+    'test/perf/perf_mutation',
+    'test/perf/perf_row_cache_update',
+    'test/perf/perf_sstable',
+    'test/unit/lsa_async_eviction_test',
+    'test/unit/lsa_sync_eviction_test',
+    'test/unit/memory_footprint_test',
+    'test/unit/row_cache_alloc_stress_test',
+    'test/unit/row_cache_stress_test',
 ]) | pure_boost_tests

 for t in tests_not_using_seastar_test_framework:
@@ -887,28 +947,29 @@ perf_tests_seastar_deps = [
 for t in perf_tests:
    deps[t] = [t + '.cc'] + scylla_tests_dependencies + perf_tests_seastar_deps

-deps['tests/sstable_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
-deps['tests/sstable_datafile_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
-deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
+deps['test/boost/sstable_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
+deps['test/boost/sstable_datafile_test'] += ['test/lib/sstable_utils.cc', 'test/lib/normalizing_reader.cc']
+deps['test/boost/mutation_reader_test'] += ['test/lib/sstable_utils.cc']

-deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
-deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
-deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
-deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
-deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
-deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
-deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
-deps['tests/perf/perf_fast_forward'] += ['release.cc']
-deps['tests/perf/perf_simple_query'] += ['release.cc']
-deps['tests/meta_test'] = ['tests/meta_test.cc']
-deps['tests/imr_test'] = ['tests/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
-deps['tests/reusable_buffer_test'] = ['tests/reusable_buffer_test.cc']
-deps['tests/utf8_test'] = ['utils/utf8.cc', 'tests/utf8_test.cc']
-deps['tests/small_vector_test'] = ['tests/small_vector_test.cc']
-deps['tests/multishard_mutation_query_test'] += ['tests/test_table.cc']
-deps['tests/vint_serialization_test'] = ['tests/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
+deps['test/boost/bytes_ostream_test'] = ['test/boost/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
+deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
+deps['test/boost/UUID_test'] = ['utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
+deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
+deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
+deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
+deps['test/boost/anchorless_list_test'] = ['test/boost/anchorless_list_test.cc']
+deps['test/perf/perf_fast_forward'] += ['release.cc']
+deps['test/perf/perf_simple_query'] += ['release.cc']
+deps['test/boost/meta_test'] = ['test/boost/meta_test.cc']
+deps['test/manual/imr_test'] = ['test/manual/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
+deps['test/boost/reusable_buffer_test'] = ['test/boost/reusable_buffer_test.cc']
+deps['test/boost/utf8_test'] = ['utils/utf8.cc', 'test/boost/utf8_test.cc']
+deps['test/boost/small_vector_test'] = ['test/boost/small_vector_test.cc']
+deps['test/boost/multishard_mutation_query_test'] += ['test/boost/test_table.cc']
+deps['test/boost/vint_serialization_test'] = ['test/boost/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
+deps['test/boost/linearizing_input_stream_test'] = ['test/boost/linearizing_input_stream_test.cc']

-deps['tests/duration_test'] += ['tests/exception_utils.cc']
+deps['test/boost/duration_test'] += ['test/lib/exception_utils.cc']

 deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']

@@ -951,9 +1012,13 @@ modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)

 gold_linker_flag = gold_supported(compiler=args.cxx)

-dbgflag = '-g' if args.debuginfo else ''
+dbgflag = '-g -gz' if args.debuginfo else ''
 tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'

+# Strip if debuginfo is disabled, otherwise we end up with partial
+# debug info from the libraries we static link with
+regular_link_rule = 'link' if args.debuginfo else 'link_stripped'
+
 if args.so:
    args.pie = '-shared'
    args.fpie = '-fpic'
@@ -970,6 +1035,10 @@ else:
 optional_packages = [['libsystemd', 'libsystemd-daemon']]
 pkgs = []

+# Lua can be provided by lua53 package on Debian-like
+# systems and by Lua on others.
+pkgs.append('lua53' if have_pkg('lua53') else 'lua')
+

 def setup_first_pkg_of_list(pkglist):
    # The HAVE_pkg symbol is taken from the first alternative
@@ -1060,23 +1129,6 @@ scylla_release = file.read().strip()

 extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""

-seastar_flags = []
-if args.dpdk:
-    # fake dependencies on dpdk, so that it is built before anything else
-    seastar_flags += ['--enable-dpdk']
-if args.gcc6_concepts:
-    seastar_flags += ['--enable-gcc6-concepts']
-if args.alloc_failure_injector:
-    seastar_flags += ['--enable-alloc-failure-injector']
-if args.split_dwarf:
-    seastar_flags += ['--split-dwarf']
-
-# We never compress debug info in debug mode
-modes['debug']['cxxflags'] += ' -gz'
-# We compress it by default in release mode
-flag_dest = 'cxx_ld_flags' if args.compress_exec_debuginfo else 'cxxflags'
-modes['release'][flag_dest] += ' -gz'
-
 for m in ['debug', 'release', 'sanitize']:
    modes[m]['cxxflags'] += ' ' + dbgflag

@@ -1085,27 +1137,56 @@ seastar_cflags += ' -Wno-error'
 if args.target != '':
    seastar_cflags += ' -march=' + args.target
 seastar_ldflags = args.user_ldflags
-seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' % (seastar_ldflags),
-                  '--c++-dialect=gnu++17', '--use-std-optional-variant-stringview=1', '--optflags=%s' % (modes['release']['cxx_ld_flags']), ]

 libdeflate_cflags = seastar_cflags
 zstd_cflags = seastar_cflags + ' -Wno-implicit-fallthrough'

-status = subprocess.call([args.python, './configure.py'] + seastar_flags, cwd='seastar')
+MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }

-if status != 0:
-    print('Seastar configuration failed')
-    sys.exit(1)
+def configure_seastar(build_dir, mode):
+    seastar_build_dir = os.path.join(build_dir, mode, 'seastar')

+    seastar_cmake_args = [
+        '-DCMAKE_BUILD_TYPE={}'.format(MODE_TO_CMAKE_BUILD_TYPE[mode]),
+        '-DCMAKE_C_COMPILER={}'.format(args.cc),
+        '-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
+        '-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
+        '-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
+        '-DSeastar_CXX_DIALECT=gnu++17',
+        '-DSeastar_STD_OPTIONAL_VARIANT_STRINGVIEW=ON',
+        '-DSeastar_UNUSED_RESULT_ERROR=ON',
+    ]
+    if args.dpdk:
+        seastar_cmake_args += ['-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']
+    if args.gcc6_concepts:
+        seastar_cmake_args += ['-DSeastar_GCC6_CONCEPTS=ON']
+    if args.split_dwarf:
+        seastar_cmake_args += ['-DSeastar_SPLIT_DWARF=ON']
+    if args.alloc_failure_injector:
+        seastar_cmake_args += ['-DSeastar_ALLOC_FAILURE_INJECTION=ON']

-pc = {mode: 'build/{}/seastar.pc'.format(mode) for mode in build_modes}
+    seastar_cmd = ['cmake', '-G', 'Ninja', os.path.relpath('seastar', seastar_build_dir)] + seastar_cmake_args
+    cmake_dir = seastar_build_dir
+    if args.dpdk:
+        # need to cook first
+        cmake_dir = 'seastar' # required by cooking.sh
+        relative_seastar_build_dir = os.path.join('..', seastar_build_dir)  # relative to seastar/
+        seastar_cmd = ['./cooking.sh', '-i', 'dpdk', '-d', relative_seastar_build_dir, '--'] + seastar_cmd[4:]
+
+    print(seastar_cmd)
+    os.makedirs(seastar_build_dir, exist_ok=True)
+    subprocess.check_call(seastar_cmd, shell=False, cwd=cmake_dir)
+
+for mode in build_modes:
+    configure_seastar('build', mode)
+
+pc = {mode: 'build/{}/seastar/seastar.pc'.format(mode) for mode in build_modes}
 ninja = find_executable('ninja') or find_executable('ninja-build')
 if not ninja:
    print('Ninja executable (ninja or ninja-build) not found on PATH\n')
    sys.exit(1)

-def query_seastar_flags(seastar_pc_file, link_static_cxx=False):
-    pc_file = os.path.join('seastar', seastar_pc_file)
+def query_seastar_flags(pc_file, link_static_cxx=False):
    cflags = pkg_config(pc_file, '--cflags', '--static')
    libs = pkg_config(pc_file, '--libs', '--static')

@@ -1119,8 +1200,6 @@ for mode in build_modes:
    modes[mode]['seastar_cflags'] = seastar_cflags
    modes[mode]['seastar_libs'] = seastar_libs

-MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
-
 # We need to use experimental features of the zstd library (to use our own allocators for the (de)compression context),
 # which are available only when the library is linked statically.
 def configure_zstd(build_dir, mode):
@@ -1190,6 +1269,11 @@ if args.antlr3_exec:
 else:
    antlr3_exec = "antlr3"

+if args.ragel_exec:
+    ragel_exec = args.ragel_exec
+else:
+    ragel_exec = "ragel"
+
 for mode in build_modes:
    configure_zstd(outdir, mode)

@@ -1206,6 +1290,7 @@ with open(buildfile_tmp, 'w') as f:
        cxx = {cxx}
        cxxflags = {user_cflags} {warnings} {defines}
        ldflags = {gold_linker_flag} {user_ldflags}
+        ldflags_build = {gold_linker_flag}
        libs = {libs}
        pool link_pool
            depth = {link_pool_depth}
@@ -1224,6 +1309,11 @@ with open(buildfile_tmp, 'w') as f:
            command = {ninja} -C $subdir $target
            restat = 1
            description = NINJA $out
+        rule ragel
+            # sed away a bug in ragel 7 that emits some extraneous _nfa* variables
+            # (the $$ is collapsed to a single one by ninja)
+            command = {ragel_exec} -G2 -o $out $in && sed -i -e '1h;2,$$H;$$!d;g' -re 's/static const char _nfa[^;]*;//g' $out
+            description = RAGEL $out
        rule run
            command = $in > $out
            description = GEN $out
@@ -1243,7 +1333,7 @@ with open(buildfile_tmp, 'w') as f:
            libs_{mode} = -l{fmt_lib}
            seastar_libs_{mode} = {seastar_libs}
            rule cxx.{mode}
-              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
+              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
              description = CXX $out
              depfile = $out.d
            rule link.{mode}
@@ -1254,6 +1344,10 @@ with open(buildfile_tmp, 'w') as f:
              command = $cxx  $ld_flags_{mode} -s $ldflags -o $out $in $libs $libs_{mode}
              description = LINK (stripped) $out
              pool = link_pool
+            rule link_build.{mode}
+              command = $cxx  $ld_flags_{mode} $ldflags_build -o $out $in $libs $libs_{mode}
+              description = LINK (build) $out
+              pool = link_pool
            rule ar.{mode}
              command = rm -f $out; ar cr $out $in; ranlib $out
              description = AR $out
@@ -1288,8 +1382,10 @@ with open(buildfile_tmp, 'w') as f:
        swaggers = {}
        serializers = {}
        thrifts = set()
+        ragels = {}
        antlr3_grammars = set()
-        seastar_dep = 'seastar/build/{}/libseastar.a'.format(mode)
+        seastar_dep = 'build/{}/seastar/libseastar.a'.format(mode)
+        seastar_testing_dep = 'build/{}/seastar/libseastar_testing.a'.format(mode)
        for binary in build_artifacts:
            if binary in other:
                continue
@@ -1313,12 +1409,12 @@ with open(buildfile_tmp, 'w') as f:
                    'zstd/lib/libzstd.a',
                ]])
                objs.append('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o')
-                if binary.startswith('tests/'):
+                if binary.startswith('test/'):
                    local_libs = '$seastar_libs_{} $libs'.format(mode)
                    if binary in pure_boost_tests:
                        local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
                    if binary not in tests_not_using_seastar_test_framework:
-                        pc_path = os.path.join('seastar', pc[mode].replace('seastar.pc', 'seastar-testing.pc'))
+                        pc_path = pc[mode].replace('seastar.pc', 'seastar-testing.pc')
                        local_libs += ' ' + pkg_config(pc_path, '--libs', '--static')
                    if has_thrift:
                        local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
@@ -1327,12 +1423,12 @@ with open(buildfile_tmp, 'w') as f:
                    # So we strip the tests by default; The user can very
                    # quickly re-link the test unstripped by adding a "_g"
                    # to the test name, e.g., "ninja build/release/testname_g"
-                    f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep))
+                    f.write('build $builddir/{}/{}: {}.{} {} | {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
                    f.write('   libs = {}\n'.format(local_libs))
-                    f.write('build $builddir/{}/{}_g: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
+                    f.write('build $builddir/{}/{}_g: {}.{} {} | {} {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep, seastar_testing_dep))
                    f.write('   libs = {}\n'.format(local_libs))
                else:
-                    f.write('build $builddir/{}/{}: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
+                    f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, regular_link_rule, mode, str.join(' ', objs), seastar_dep))
                    if has_thrift:
                        f.write('   libs =  {} {} $seastar_libs_{} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system'), mode))
            for src in srcs:
@@ -1345,6 +1441,9 @@ with open(buildfile_tmp, 'w') as f:
                elif src.endswith('.json'):
                    hh = '$builddir/' + mode + '/gen/' + src + '.hh'
                    swaggers[hh] = src
+                elif src.endswith('.rl'):
+                    hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
+                    ragels[hh] = src
                elif src.endswith('.thrift'):
                    thrifts.add(src)
                elif src.endswith('.g'):
@@ -1355,7 +1454,7 @@ with open(buildfile_tmp, 'w') as f:
        compiles['$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'] = 'utils/gz/gen_crc_combine_table.cc'
        f.write('build {}: run {}\n'.format('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.cc',
                                            '$builddir/' + mode + '/utils/gz/gen_crc_combine_table'))
-        f.write('build {}: link.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
+        f.write('build {}: link_build.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
                                                '$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'))
        f.write('   libs = $seastar_libs_{}\n'.format(mode))
        f.write(
@@ -1373,6 +1472,7 @@ with open(buildfile_tmp, 'w') as f:
            gen_headers += g.headers('$builddir/{}/gen'.format(mode))
        gen_headers += list(swaggers.keys())
        gen_headers += list(serializers.keys())
+        gen_headers += list(ragels.keys())
        gen_headers_dep = ' '.join(gen_headers)

        for obj in compiles:
@@ -1386,6 +1486,9 @@ with open(buildfile_tmp, 'w') as f:
        for hh in serializers:
            src = serializers[hh]
            f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh, src))
+        for hh in ragels:
+            src = ragels[hh]
+            f.write('build {}: ragel {}\n'.format(hh, src))
        for thrift in thrifts:
            outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
            f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -1399,25 +1502,33 @@ with open(buildfile_tmp, 'w') as f:
            for cc in grammar.sources('$builddir/{}/gen'.format(mode)):
                obj = cc.replace('.cpp', '.o')
                f.write('build {}: cxx.{} {} || {}\n'.format(obj, mode, cc, ' '.join(serializers)))
-                if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
-                    # Parsers end up using huge amounts of stack space and overflowing their stack
-                    f.write('  obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
+                if cc.endswith('Parser.cpp'):
+                    # Unoptimized parsers end up using huge amounts of stack space and overflowing their stack
+                    flags = '-O1'
+                    if has_sanitize_address_use_after_scope:
+                        flags += ' -fno-sanitize-address-use-after-scope'
+                    f.write('  obj_cxxflags = %s\n' % flags)
        for hh in headers:
            f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} || {gen_headers_dep}\n'.format(
                    mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))

-        f.write('build seastar/build/{mode}/libseastar.a: ninja | always\n'
+        f.write('build build/{mode}/seastar/libseastar.a: ninja | always\n'
                .format(**locals()))
        f.write('  pool = submodule_pool\n')
-        f.write('  subdir = seastar/build/{mode}\n'.format(**locals()))
-        f.write('  target = seastar seastar_testing\n'.format(**locals()))
-        f.write('build seastar/build/{mode}/apps/iotune/iotune: ninja\n'
+        f.write('  subdir = build/{mode}/seastar\n'.format(**locals()))
+        f.write('  target = seastar\n'.format(**locals()))
+        f.write('build build/{mode}/seastar/libseastar_testing.a: ninja\n'
                .format(**locals()))
        f.write('  pool = submodule_pool\n')
-        f.write('  subdir = seastar/build/{mode}\n'.format(**locals()))
+        f.write('  subdir = build/{mode}/seastar\n'.format(**locals()))
+        f.write('  target = seastar_testing\n'.format(**locals()))
+        f.write('build build/{mode}/seastar/apps/iotune/iotune: ninja\n'
+                .format(**locals()))
+        f.write('  pool = submodule_pool\n')
+        f.write('  subdir = build/{mode}/seastar\n'.format(**locals()))
        f.write('  target = iotune\n'.format(**locals()))
        f.write(textwrap.dedent('''\
-            build build/{mode}/iotune: copy seastar/build/{mode}/apps/iotune/iotune
+            build build/{mode}/iotune: copy build/{mode}/seastar/apps/iotune/iotune
            ''').format(**locals()))
        f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE | always\n'.format(**locals()))
        f.write('  pool = submodule_pool\n')
@@ -1438,7 +1549,7 @@ with open(buildfile_tmp, 'w') as f:
        rule configure
          command = {python} configure.py $configure_args
          generator = 1
-        build build.ninja: configure | configure.py seastar/configure.py
+        build build.ninja: configure | configure.py SCYLLA-VERSION-GEN
        rule cscope
            command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
            description = CSCOPE
@@ -1447,6 +1558,10 @@ with open(buildfile_tmp, 'w') as f:
            command = rm -rf build
            description = CLEAN
        build clean: clean
+        rule mode_list
+            command = echo {modes_list}
+            description = List configured modes
+        build mode_list: mode_list
        default {modes_list}
        ''').format(modes_list=' '.join(default_modes), **globals()))
    f.write(textwrap.dedent('''\
--- a/connection_notifier.cc
+++ b/connection_notifier.cc
@@ -0,0 +1,71 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "connection_notifier.hh"
+#include "db/query_context.hh"
+#include "cql3/constants.hh"
+#include "database.hh"
+#include "service/storage_proxy.hh"
+
+#include <stdexcept>
+
+namespace db::system_keyspace {
+extern const char *const CLIENTS;
+}
+
+static sstring to_string(client_type ct) {
+    switch (ct) {
+        case client_type::cql: return "cql";
+        case client_type::thrift: return "thrift";
+        case client_type::alternator: return "alternator";
+        default: throw std::runtime_error("Invalid client_type");
+    }
+}
+
+future<> notify_new_client(client_data cd) {
+    // FIXME: consider prepared statement
+    const static sstring req
+            = format("INSERT INTO system.{} (address, port, client_type, shard_id, protocol_version, username) "
+                     "VALUES (?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
+    
+    return db::execute_cql(req,
+            std::move(cd.ip), cd.port, to_string(cd.ct), cd.shard_id,
+            cd.protocol_version.has_value() ? data_value(*cd.protocol_version) : data_value::make_null(int32_type),
+            cd.username.value_or("anonymous")).discard_result();
+}
+
+future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port) {
+    // FIXME: consider prepared statement
+    const static sstring req
+            = format("DELETE FROM system.{} where address=? AND port=? AND client_type=?;",
+                     db::system_keyspace::CLIENTS);
+    return db::execute_cql(req, addr.addr(), port, to_string(ct)).discard_result();
+}
+
+future<> clear_clientlist() {
+    auto& db_local = service::get_storage_proxy().local().get_db().local();
+    return db_local.truncate(
+            db_local.find_keyspace(db::system_keyspace_name()),
+            db_local.find_column_family(db::system_keyspace_name(),
+                    db::system_keyspace::CLIENTS),
+            [] { return make_ready_future<db_clock::time_point>(db_clock::now()); },
+            false /* with_snapshot */);
+}
--- a/connection_notifier.hh
+++ b/connection_notifier.hh
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2019 ScyllaDB
+ */
+
+/*
+ * This file is part of Scylla.
+ *
+ * Scylla is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Affero General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Scylla is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with Scylla.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#pragma once
+
+#include "gms/inet_address.hh"
+#include <seastar/core/sstring.hh>
+#include <optional>
+
+enum class client_type {
+    cql = 0,
+    thrift,
+    alternator,
+};
+
+// Representation of a row in `system.clients'. std::optionals are for nullable cells.
+struct client_data {
+    gms::inet_address ip;
+    int32_t port;
+    client_type ct;
+    int32_t shard_id;  /// ID of server-side shard which is processing the connection.
+
+    // `optional' column means that it's nullable (possibly because it's
+    // unimplemented yet). If you want to fill ("implement") any of them,
+    // remember to update the query in `notify_new_client()'.
+    std::optional<sstring> connection_stage;
+    std::optional<sstring> driver_name;
+    std::optional<sstring> driver_version;
+    std::optional<sstring> hostname;
+    std::optional<int32_t> protocol_version;
+    std::optional<sstring> ssl_cipher_suite;
+    std::optional<bool> ssl_enabled;
+    std::optional<sstring> ssl_protocol;
+    std::optional<sstring> username;
+};
+
+future<> notify_new_client(client_data cd);
+future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port);
+
+future<> clear_clientlist();
--- a/converting_mutation_partition_applier.hh
+++ b/converting_mutation_partition_applier.hh
@@ -21,6 +21,9 @@

 #pragma once

+#include "types/user.hh"
+#include "concrete_types.hh"
+
 #include "mutation_partition_view.hh"
 #include "mutation_partition.hh"
 #include "schema.hh"
@@ -35,8 +38,8 @@ class converting_mutation_partition_applier : public mutation_partition_visitor
    const column_mapping& _visited_column_mapping;
    deletable_row* _current_row;
 private:
-    static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
-        return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
+    static bool is_compatible(const column_definition& new_def, const abstract_type& old_type, column_kind kind) {
+        return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(old_type);
    }
    static atomic_cell upgrade_cell(const abstract_type& new_type, const abstract_type& old_type, atomic_cell_view cell,
                                    atomic_cell::collection_member cm = atomic_cell::collection_member::no) {
@@ -49,32 +52,59 @@ private:
            return atomic_cell(new_type, cell);
        }
    }
-    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
+    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, atomic_cell_view cell) {
        if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
            return;
        }
-        dst.apply(new_def, upgrade_cell(*new_def.type, *old_type, cell));
+        dst.apply(new_def, upgrade_cell(*new_def.type, old_type, cell));
    }
-    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
+    static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, collection_mutation_view cell) {
        if (!is_compatible(new_def, old_type, kind)) {
            return;
        }
-      cell.data.with_linearized([&] (bytes_view cell_bv) {
-        auto new_ctype = static_pointer_cast<const collection_type_impl>(new_def.type);
-        auto old_ctype = static_pointer_cast<const collection_type_impl>(old_type);
-        auto old_view = old_ctype->deserialize_mutation_form(cell_bv);

-        collection_type_impl::mutation new_view;
+      cell.with_deserialized(old_type, [&] (collection_mutation_view_description old_view) {
+        collection_mutation_description new_view;
        if (old_view.tomb.timestamp > new_def.dropped_at()) {
            new_view.tomb = old_view.tomb;
        }
-        for (auto& c : old_view.cells) {
-            if (c.second.timestamp() > new_def.dropped_at()) {
-                new_view.cells.emplace_back(c.first, upgrade_cell(*new_ctype->value_comparator(), *old_ctype->value_comparator(), c.second, atomic_cell::collection_member::yes));
+
+        visit(old_type, make_visitor(
+            [&] (const collection_type_impl& old_ctype) {
+                assert(new_def.type->is_collection()); // because is_compatible
+                auto& new_ctype = static_cast<const collection_type_impl&>(*new_def.type);
+
+                auto& new_value_type = *new_ctype.value_comparator();
+                auto& old_value_type = *old_ctype.value_comparator();
+
+                for (auto& c : old_view.cells) {
+                    if (c.second.timestamp() > new_def.dropped_at()) {
+                        new_view.cells.emplace_back(c.first, upgrade_cell(
+                                new_value_type, old_value_type, c.second, atomic_cell::collection_member::yes));
+                    }
+                }
+            },
+            [&] (const user_type_impl& old_utype) {
+                assert(new_def.type->is_user_type()); // because is_compatible
+                auto& new_utype = static_cast<const user_type_impl&>(*new_def.type);
+
+                for (auto& c : old_view.cells) {
+                    if (c.second.timestamp() > new_def.dropped_at()) {
+                        auto idx = deserialize_field_index(c.first);
+                        assert(idx < new_utype.size() && idx < old_utype.size());
+
+                        new_view.cells.emplace_back(c.first, upgrade_cell(
+                                *new_utype.type(idx), *old_utype.type(idx), c.second, atomic_cell::collection_member::yes));
+                    }
+                }
+            },
+            [&] (const abstract_type& o) {
+                throw std::runtime_error(format("not a multi-cell type: {}", o.name()));
            }
-        }
+        ));
+
        if (new_view.tomb || !new_view.cells.empty()) {
-            dst.apply(new_def, new_ctype->serialize_mutation_form(std::move(new_view)));
+            dst.apply(new_def, new_view.serialize(*new_def.type));
        }
      });
    }
@@ -100,7 +130,7 @@ public:
        const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
        if (def) {
-            accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
+            accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), cell);
        }
    }

@@ -108,7 +138,7 @@ public:
        const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
        if (def) {
-            accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
+            accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), collection);
        }
    }

@@ -131,7 +161,7 @@ public:
        const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
        if (def) {
-            accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
+            accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), cell);
        }
    }

@@ -139,7 +169,7 @@ public:
        const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
        const column_definition* def = _p_schema.get_column_definition(col.name());
        if (def) {
-            accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
+            accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), collection);
        }
    }

@@ -147,9 +177,9 @@ public:
    // Cells must have monotonic names.
    static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const column_definition& old_def, const atomic_cell_or_collection& cell) {
        if (new_def.is_atomic()) {
-            accept_cell(dst, kind, new_def, old_def.type, cell.as_atomic_cell(old_def));
+            accept_cell(dst, kind, new_def, *old_def.type, cell.as_atomic_cell(old_def));
        } else {
-            accept_cell(dst, kind, new_def, old_def.type, cell.as_collection_mutation());
+            accept_cell(dst, kind, new_def, *old_def.type, cell.as_collection_mutation());
        }
    }
 };
--- a/cql3/Cql.g
+++ b/cql3/Cql.g
@@ -43,12 +43,14 @@ options {
 #include "cql3/statements/create_table_statement.hh"
 #include "cql3/statements/create_view_statement.hh"
 #include "cql3/statements/create_type_statement.hh"
+#include "cql3/statements/create_function_statement.hh"
 #include "cql3/statements/drop_type_statement.hh"
 #include "cql3/statements/alter_type_statement.hh"
 #include "cql3/statements/property_definitions.hh"
 #include "cql3/statements/drop_index_statement.hh"
 #include "cql3/statements/drop_table_statement.hh"
 #include "cql3/statements/drop_view_statement.hh"
+#include "cql3/statements/drop_function_statement.hh"
 #include "cql3/statements/truncate_statement.hh"
 #include "cql3/statements/raw/update_statement.hh"
 #include "cql3/statements/raw/insert_statement.hh"
@@ -243,10 +245,14 @@ struct uninitialized {
        return res;
    }

-    bool convert_boolean_literal(std::string_view s) {
-        std::string lower_s(s.size(), '\0');
+    sstring to_lower(std::string_view s) {
+        sstring lower_s(s.size(), '\0');
        std::transform(s.cbegin(), s.cend(), lower_s.begin(), &::tolower);
-        return lower_s == "true";
+        return lower_s;
+    }
+
+    bool convert_boolean_literal(std::string_view s) {
+        return to_lower(s) == "true";
    }

    void add_raw_update(std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>,::shared_ptr<cql3::operation::raw_update>>>& operations,
@@ -348,9 +354,9 @@ cqlStatement returns [shared_ptr<raw::parsed_statement> stmt]
    | st25=createTypeStatement         { $stmt = st25; }
    | st26=alterTypeStatement          { $stmt = st26; }
    | st27=dropTypeStatement           { $stmt = st27; }
-#if 0
    | st28=createFunctionStatement     { $stmt = st28; }
    | st29=dropFunctionStatement       { $stmt = st29; }
+#if 0
    | st30=createAggregateStatement    { $stmt = st30; }
    | st31=dropAggregateStatement      { $stmt = st31; }
 #endif
@@ -524,6 +530,7 @@ usingClauseObjective[::shared_ptr<cql3::attributes::raw> attrs]
 */
 updateStatement returns [::shared_ptr<raw::update_statement> expr]
    @init {
+        bool if_exists = false;
        auto attrs = ::make_shared<cql3::attributes::raw>();
        std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, ::shared_ptr<cql3::operation::raw_update>>> operations;
    }
@@ -531,13 +538,14 @@ updateStatement returns [::shared_ptr<raw::update_statement> expr]
      ( usingClause[attrs] )?
      K_SET columnOperation[operations] (',' columnOperation[operations])*
      K_WHERE wclause=whereClause
-      ( K_IF conditions=updateConditions )?
+      ( K_IF (K_EXISTS{ if_exists = true; } | conditions=updateConditions) )?
      {
          return ::make_shared<raw::update_statement>(std::move(cf),
                                                  std::move(attrs),
                                                  std::move(operations),
                                                  std::move(wclause),
-                                                  std::move(conditions));
+                                                  std::move(conditions),
+                                                  if_exists);
     }
    ;

@@ -581,6 +589,7 @@ deleteSelection returns [std::vector<::shared_ptr<cql3::operation::raw_deletion>
 deleteOp returns [::shared_ptr<cql3::operation::raw_deletion> op]
    : c=cident                { $op = ::make_shared<cql3::operation::column_deletion>(std::move(c)); }
    | c=cident '[' t=term ']' { $op = ::make_shared<cql3::operation::element_deletion>(std::move(c), std::move(t)); }
+    | c=cident '.' field=ident { $op = ::make_shared<cql3::operation::field_deletion>(std::move(c), std::move(field)); }
    ;

 usingClauseDelete[::shared_ptr<cql3::attributes::raw> attrs]
@@ -683,54 +692,56 @@ dropAggregateStatement returns [DropAggregateStatement expr]
      )?
      { $expr = new DropAggregateStatement(fn, argsTypes, argsPresent, ifExists); }
    ;
+#endif

-createFunctionStatement returns [CreateFunctionStatement expr]
+createFunctionStatement returns [shared_ptr<cql3::statements::create_function_statement> expr]
    @init {
-        boolean orReplace = false;
-        boolean ifNotExists = false;
+        bool or_replace = false;
+        bool if_not_exists = false;

-        boolean deterministic = true;
-        List<ColumnIdentifier> argsNames = new ArrayList<>();
-        List<CQL3Type.Raw> argsTypes = new ArrayList<>();
+        std::vector<shared_ptr<cql3::column_identifier>> arg_names;
+        std::vector<shared_ptr<cql3_type::raw>> arg_types;
+        bool called_on_null_input = false;
    }
-    : K_CREATE (K_OR K_REPLACE { orReplace = true; })?
-      ((K_NON { deterministic = false; })? K_DETERMINISTIC)?
-      K_FUNCTION
-      (K_IF K_NOT K_EXISTS { ifNotExists = true; })?
+    : K_CREATE
+        // "OR REPLACE" and "IF NOT EXISTS" cannot be used together
+        ((K_OR K_REPLACE { or_replace = true; } K_FUNCTION)
+         | (K_FUNCTION K_IF K_NOT K_EXISTS { if_not_exists = true; })
+         | K_FUNCTION)
      fn=functionName
      '('
        (
-          k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); }
-          ( ',' k=ident v=comparatorType { argsNames.add(k); argsTypes.add(v); } )*
+          k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); }
+          ( ',' k=ident v=comparatorType { arg_names.push_back(k); arg_types.push_back(v); } )*
        )?
      ')'
+      ( (K_RETURNS K_NULL) | (K_CALLED { called_on_null_input = true; })) K_ON K_NULL K_INPUT
      K_RETURNS rt = comparatorType
      K_LANGUAGE language = IDENT
      K_AS body = STRING_LITERAL
-      { $expr = new CreateFunctionStatement(fn, $language.text.toLowerCase(), $body.text, deterministic, argsNames, argsTypes, rt, orReplace, ifNotExists); }
+      { $expr = ::make_shared<cql3::statements::create_function_statement>(std::move(fn), to_lower($language.text), $body.text, std::move(arg_names), std::move(arg_types), std::move(rt), called_on_null_input, or_replace, if_not_exists); }
    ;

-dropFunctionStatement returns [DropFunctionStatement expr]
+dropFunctionStatement returns [shared_ptr<cql3::statements::drop_function_statement> expr]
    @init {
-        boolean ifExists = false;
-        List<CQL3Type.Raw> argsTypes = new ArrayList<>();
-        boolean argsPresent = false;
+        bool if_exists = false;
+        std::vector<shared_ptr<cql3_type::raw>> arg_types;
+        bool args_present = false;
    }
    : K_DROP K_FUNCTION
-      (K_IF K_EXISTS { ifExists = true; } )?
+      (K_IF K_EXISTS { if_exists = true; } )?
      fn=functionName
      (
        '('
          (
-            v=comparatorType { argsTypes.add(v); }
-            ( ',' v=comparatorType { argsTypes.add(v); } )*
+            v=comparatorType { arg_types.push_back(v); }
+            ( ',' v=comparatorType { arg_types.push_back(v); } )*
          )?
        ')'
-        { argsPresent = true; }
+        { args_present = true; }
      )?
-      { $expr = new DropFunctionStatement(fn, argsTypes, argsPresent, ifExists); }
+      { $expr = ::make_shared<cql3::statements::drop_function_statement>(std::move(fn), std::move(arg_types), args_present, if_exists); }
    ;
-#endif

 /**
 * CREATE KEYSPACE [IF NOT EXISTS] <KEYSPACE> WITH attr1 = value1 AND attr2 = value2;
@@ -1396,8 +1407,9 @@ columnOperation[operations_type& operations]

 columnOperationDifferentiator[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
    : '=' normalColumnOperation[operations, key]
-    | '[' k=term ']' specializedColumnOperation[operations, key, k, false]
-    | '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' specializedColumnOperation[operations, key, k, true]
+    | '[' k=term ']' collectionColumnOperation[operations, key, k, false]
+    | '.' field=ident udtColumnOperation[operations, key, field]
+    | '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' collectionColumnOperation[operations, key, k, true]
    ;

 normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
@@ -1440,31 +1452,38 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
      }
    ;

-specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
-                                                 shared_ptr<cql3::operation::raw_update>>>& operations,
-                           shared_ptr<cql3::column_identifier::raw> key,
-                           shared_ptr<cql3::term::raw> k,
-                           bool by_uuid]
-
+collectionColumnOperation[operations_type& operations,
+                          shared_ptr<cql3::column_identifier::raw> key,
+                          shared_ptr<cql3::term::raw> k,
+                          bool by_uuid]
    : '=' t=term
      {
          add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
      }
    ;

+udtColumnOperation[operations_type& operations,
+                   shared_ptr<cql3::column_identifier::raw> key,
+                   shared_ptr<cql3::column_identifier> field]
+    : '=' t=term
+      {
+          add_raw_update(operations, std::move(key), make_shared<cql3::operation::set_field>(std::move(field), std::move(t)));
+      }
+    ;
+
 columnCondition[conditions_type& conditions]
    // Note: we'll reject duplicates later
    : key=cident
-        ( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, *op)); }
+        ( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, {}, *op)); }
        | K_IN
-            ( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(values)); }
-            | marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(marker)); }
+            ( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, {}, values)); }
+            | marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, marker, {})); }
            )
        | '[' element=term ']'
-            ( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::collection_condition(t, element, *op)); }
+            ( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, element, *op)); }
            | K_IN
-                ( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, values)); }
-                | marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, marker)); }
+                ( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, {}, values)); }
+                | marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, marker, {})); }
                )
            )
        )
@@ -1732,8 +1751,8 @@ basic_unreserved_keyword returns [sstring str]
        | K_INITCOND
        | K_RETURNS
        | K_LANGUAGE
-        | K_NON
-        | K_DETERMINISTIC
+        | K_CALLED
+        | K_INPUT
        | K_JSON
        | K_CACHE
        | K_BYPASS
@@ -1872,11 +1891,11 @@ K_STYPE:       S T Y P E;
 K_FINALFUNC:   F I N A L F U N C;
 K_INITCOND:    I N I T C O N D;
 K_RETURNS:     R E T U R N S;
+K_CALLED:      C A L L E D;
+K_INPUT:       I N P U T;
 K_LANGUAGE:    L A N G U A G E;
-K_NON:         N O N;
 K_OR:          O R;
 K_REPLACE:     R E P L A C E;
-K_DETERMINISTIC: D E T E R M I N I S T I C;
 K_JSON:        J S O N;
 K_DEFAULT:     D E F A U L T;
 K_UNSET:       U N S E T;
--- a/cql3/abstract_marker.cc
+++ b/cql3/abstract_marker.cc
@@ -45,6 +45,7 @@
 #include "cql3/lists.hh"
 #include "cql3/maps.hh"
 #include "cql3/sets.hh"
+#include "cql3/user_types.hh"
 #include "types/list.hh"

 namespace cql3 {
@@ -54,7 +55,7 @@ abstract_marker::abstract_marker(int32_t bind_index, ::shared_ptr<column_specifi
    , _receiver{std::move(receiver)}
 { }

-void abstract_marker::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
+void abstract_marker::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
    bound_names->add(_bind_index, _receiver);
 }

@@ -68,19 +69,22 @@ abstract_marker::raw::raw(int32_t bind_index)

 ::shared_ptr<term> abstract_marker::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
 {
-    auto receiver_type = ::dynamic_pointer_cast<const collection_type_impl>(receiver->type);
-    if (receiver_type == nullptr) {
-        return ::make_shared<constants::marker>(_bind_index, receiver);
+    if (receiver->type->is_collection()) {
+        if (receiver->type->get_kind() == abstract_type::kind::list) {
+            return ::make_shared<lists::marker>(_bind_index, receiver);
+        } else if (receiver->type->get_kind() == abstract_type::kind::set) {
+            return ::make_shared<sets::marker>(_bind_index, receiver);
+        } else if (receiver->type->get_kind() == abstract_type::kind::map) {
+            return ::make_shared<maps::marker>(_bind_index, receiver);
+        }
+        assert(0);
    }
-    if (receiver_type->get_kind() == abstract_type::kind::list) {
-        return ::make_shared<lists::marker>(_bind_index, receiver);
-    } else if (receiver_type->get_kind() == abstract_type::kind::set) {
-        return ::make_shared<sets::marker>(_bind_index, receiver);
-    } else if (receiver_type->get_kind() == abstract_type::kind::map) {
-        return ::make_shared<maps::marker>(_bind_index, receiver);
+
+    if (receiver->type->is_user_type()) {
+        return ::make_shared<user_types::marker>(_bind_index, receiver);
    }
-    assert(0);
-    return shared_ptr<term>();
+
+    return ::make_shared<constants::marker>(_bind_index, receiver);
 }

 assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {
--- a/cql3/abstract_marker.hh
+++ b/cql3/abstract_marker.hh
@@ -57,7 +57,7 @@ protected:
 public:
    abstract_marker(int32_t bind_index, ::shared_ptr<column_specification>&& receiver);

-    virtual void collect_marker_specification(::shared_ptr<variable_specifications> bound_names) override;
+    virtual void collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) override;

    virtual bool contains_bind_marker() const override;

--- a/cql3/attributes.cc
+++ b/cql3/attributes.cc
@@ -120,7 +120,7 @@ int32_t attributes::get_time_to_live(const query_options& options) {
    return ttl;
 }

-void attributes::collect_marker_specification(::shared_ptr<variable_specifications> bound_names) {
+void attributes::collect_marker_specification(lw_shared_ptr<variable_specifications> bound_names) {
    if (_timestamp) {
        _timestamp->collect_marker_specification(bound_names);
    }