"
This series fixes a hang in multishard_writer when an error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead
Fixes #6241
Refs: #6248
"
* asias-stream_fix_multishard_writer_hang:
repair: Fix hang when the writer is dead
mutation_writer_test: Add test_multishard_writer_producer_aborts
multishard_writer: Abort the queue attached to consumers when producer fails
(cherry picked from commit 8925e00e96)
If no keyspace is specified when taking a snapshot, there will be a segfault
because keynames is unconditionally dereferenced. Let's return an error
because a keyspace must be specified when column families are specified.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)
Fixes #6336.
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently however this is not
guaranteed as the schema constructor sorts key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, the schemas of
which are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.
This is a suspected cause of #5856, although we don't have hard proof.
Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
unstable at 17 elements, and the failing schema had a
clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
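The difference can be sketched as follows; the `column` pair type and `sort_columns` helper are hypothetical stand-ins for illustration, not the actual schema code:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for key columns: (column_id, name) pairs that
// compare by id only, so columns sharing an id compare as equal.
using column = std::pair<int, std::string>;

std::vector<column> sort_columns(std::vector<column> cols) {
    // std::stable_sort preserves the relative order of equal elements,
    // so columns with the same id keep their with_column() order.
    // std::sort gives no such guarantee for equal elements.
    std::stable_sort(cols.begin(), cols.end(),
        [](const column& a, const column& b) { return a.first < b.first; });
    return cols;
}
```

With `std::sort`, the relative order of `{0, "b"}` and `{0, "a"}` below would be unspecified; with `std::stable_sort` it is guaranteed.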
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such an SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).
SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).
This commit does two things:
1. for all SSTables created after this commit, include a new feature
flag, CorrectUDTsInCollections, the presence of which implies that frozen
UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
nested inside collections are always frozen, even if they don't have
the tag. This assumption is safe to make, because at the time of
this commit, Scylla does not allow non-frozen (multi-cell) types inside
collections or UDTs, and because of point 1 above.
There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.
Fixes #6130.
[avi: adjusted sstable file paths]
(cherry picked from commit 3d811e2f95)
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.
n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)
One year later, the user wants to upgrade n1, n2, n3 to a new version.
When n3 does a rolling restart with the new version, n3 will use a new
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.
Such unnecessary marking of nodes as down can cause availability issues.
For example:
DC1: n1, n2
DC2: n3, n4
When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.
To fix, we can start the node with a gossip generation within
MAX_GENERATION_DIFFERENCE of the generations known to the other nodes.
Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.
Fixes #5164
(cherry picked from commit 743b529c2b)
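A minimal sketch of the acceptance check involved; the `accepts_generation` helper and the constant's value are assumptions for illustration, not the actual gossiper code:

```cpp
#include <cstdint>

// Assumed value for illustration: one year's worth of seconds.
constexpr int64_t MAX_GENERATION_DIFFERENCE = 86400LL * 365;

// A peer ignores a proposed generation that is too far ahead of the one
// it currently knows, so a node restarting much later must start with a
// generation within MAX_GENERATION_DIFFERENCE of what the peers expect.
bool accepts_generation(int64_t known_generation, int64_t proposed_generation) {
    return proposed_generation - known_generation <= MAX_GENERATION_DIFFERENCE;
}
```

In the scenario above, g3' minus g1 (or g2) exceeds the limit, so the update is rejected and n3 is marked down.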
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.
Consider the following:
- n1, n2, n3 in the cluster
- n3 shuts itself down
- n3 sends the shutdown verb to n1 and n2
- n1 and n2 set n3's status to SHUTDOWN and force the heartbeat version to
INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
storage_service::prepare_to_join,
- n3 receives the response from n1, in gossiper::handle_ack_msg; since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber 1; fiber 1 finishes before fiber 2 and
sets _in_shadow_round = false
- n3 receives the response from n2, in gossiper::handle_ack_msg; since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, applying the application state about n3 into
endpoint_state_map; at this point endpoint_state_map contains
information, including about n3 itself, learned from n2
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
with new generation number generated correctly in
storage_service::prepare_to_join, but in
maybe_initialize_local_state(generation_nbr), it will not set new
generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop; in gossiper::run,
hbs.update_heart_beat() sets the heartbeat to a number starting
from 0
- n1 and n2 will not get updates from n3 because they use the same
generation number but n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even though n3 is alive.
To fix, always use the new generation number.
Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently the next request may overwrite the tracing state of the previous
one, causing, in the best case, the wrong trace to be taken, or a crash if
the overwritten pointer is freed prematurely.
Fixes #6014
(cherry picked from commit 866c04dd64)
Message-Id: <20200324144003.GA20781@scylladb.com>
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index - in which case there's
no need to fetch it. However, the check was incorrectly assuming
that any restriction is eligible for indexing, while it's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.
Fixes #5708
Tests: unit(dev)
(cherry picked from commit 767ff59418)
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug: we first add at least one endpoint
to the list, and only then compare the list size with the replication
factor. If RF=0 this comparison never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.
Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>
(cherry picked from commit ac6f64a885)
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:
1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.
To fix, resize the _regions vector outside the lock.
Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>
(cherry picked from commit c020b4e5e2)
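The ordering fix can be sketched with standard types (a mutex standing in for the LSA reclaim lock; all names here are hypothetical):

```cpp
#include <mutex>
#include <vector>

std::vector<int> regions;        // stand-in for the LSA _regions vector
std::mutex reclaiming_lock;      // stand-in for the reclaim lock

void track_region(int r) {
    // Grow capacity *before* taking the lock: the reallocation may need
    // memory to be reclaimed, which is impossible while the lock is held.
    if (regions.size() == regions.capacity()) {
        regions.reserve(regions.empty() ? 16 : regions.capacity() * 2);
    }
    std::lock_guard<std::mutex> guard(reclaiming_lock);
    regions.push_back(r);        // guaranteed not to reallocate here
}
```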
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.
To prevent that we need to set the following macros as follows:
unset `_unique_build_ids`
set `_no_recompute_build_ids` to 1
Fixes #5881
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
It seems like *.service conflicts at install time because the file is
installed twice, by both debian/*.service and debian/scylla-server.install.
We don't need to use *.install, so we can just drop the line.
Fixes #5640
(cherry picked from commit 29285b28e2)
There may be other commitlog writes waiting for zeroing to complete, so
not using the proper scheduling class causes a priority inversion.
Fixes #5858.
Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.
When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.
When a hint's destination node is no longer a replica for the mutation,
the hint is sent to all of the mutation's current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as a "delta" mutation,
which can cause data corruption.
This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.
Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
"
Throw an error in case we hit an invalid time UUID
rather than hitting an assert.
Fixes #5552
(Ref #5588 that was dequeued and fixed here)
Test: UUID_test, cql_query_test(debug)
"
* 'validate-time-uuid' of https://github.com/bhalevy/scylla:
cql3: abstract_function_selector: provide assignment_testable_source_context
test: cql_query_test: add time uuid validation tests
cql3: time_uuid_fcts: validate timestamp arg
cql3: make_max_timeuuid_fct: delete outdated FIXME comment
cql3: time_uuid_fcts: validate time UUID
test: UUID_test: add tests for time uuid
utils: UUID: create_time assert nanos_since validity
utils/UUID_gen: make_nanos_since
utils: UUID: assert UUID.is_timestamp
(cherry picked from commit 3343baf159)
Conflicts:
cql3/functions/time_uuid_fcts.hh
tests/cql_query_test.cc
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition tombstone also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.
Branches 3.1,3.2,3.3
Fixes #5793
Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'
Fixes #5531
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 474ffb6e54)
CQL transport code relies on an exception's C++ type to create the correct
reply, but in LWT we converted some mutation_timeout exceptions to the more
generic request_timeout while forwarding them, which broke the protocol.
Do not drop type information.
Fixes #5598.
Message-Id: <20200115180313.GQ9084@scylladb.com>
(cherry picked from commit 51281bc8ad)
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.
Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).
I checked that --pids-limit is supported by old versions of
docker and by podman.
Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>
(cherry picked from commit 897320f6ab)
This patch affects LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.
When executing a prepared query with a bound value
of `(None,)` (tuple with null, example for Python driver), it is
serialized not as NULL but as "empty" value (serialization
format differs in each case).
Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, translated
to non-empty `bytes_opt` with empty byte-string value later.
Account for this case too in the CAS condition evaluation code.
Example of a problem this patch aims to fix:
Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.
Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.
Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)
Fixes: #5710
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
The table::flush_streaming_mutations function dates from the days when
streamed data went to memtables. After switching to the new streaming,
data goes directly to sstables, so the sstables generated by
table::flush_streaming_mutations will be empty.
It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the set of sstables is empty.
The steps are:
- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges
In summary, this patch will avoid a lot of cache invalidation for
streaming.
Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
This assert, added by 060e3f8, is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.
Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
Since dpkg does not re-install conffiles when they are removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.
Fixes #5734
(cherry picked from commit 43097854a5)
awk returns a float value on Debian, which causes a postinst script failure
since we compare it as an integer.
Replace it with sed + bash.
Fixes #5569
(cherry picked from commit 5627888b7c)
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.
Fixes #5682
Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
We would sometimes produce an unnecessary extra 0xff prefix byte.
The new encoding matches what cassandra does.
This was both an efficiency and a correctness issue, as using a varint in a
key could produce different tokens.
Fixes #5656
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.
Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.
This patch therefore makes eventually() more patient, by a factor of
32.
Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>
(cherry picked from commit ec5b721db7)
Scylla 3.2 doesn't support UDF, so do not accept UDF as a valid option
to experimental_features.
Fixes #5645.
No fix is needed on master, which does support UDF.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
We need to add '~' to handle rcX versions correctly on Debian variants
(merged at ae33e9f), but when we moved to the relocatable package we
mistakenly dropped the code, so add it again.
Fixes #5641
(cherry picked from commit dd81fd3454)
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.
Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744
(cherry picked from commit 9b379e3d63)
Consider this:
1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to an error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down, which calls
wait_for_writer_done(). A duplicate partition_end is written.
To fix, track the partition_start and partition_end written, avoiding
unpaired writes.
Backports: 3.1 and 3.2
Fixes: #5527
(cherry picked from commit 401854dbaf)
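The pairing logic can be sketched as follows (hypothetical names; the real tracking lives in the repair row-level writer):

```cpp
// Track whether a partition is currently open so teardown never writes
// an unpaired partition_end.
struct repair_writer_state {
    bool partition_open = false;
    int starts = 0;
    int ends = 0;

    void write_partition_start() {
        partition_open = true;
        ++starts;
    }

    // Called both on the normal path and from wait_for_writer_done();
    // only emits partition_end when a partition_start is pending.
    void maybe_write_partition_end() {
        if (partition_open) {
            ++ends;
            partition_open = false;
        }
    }
};
```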
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.
To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.
Fixes #5593.
Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
(cherry picked from commit 6c84dd0045)
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.
This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.
Fixes #5191.
(cherry picked from commit f0d8dd4094)
This is part of original commit 52b48b415c
("Test that schema digests with UDFs don't change"). It is needed to
test tables with CDC enabled.
Ref #5191.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink, so
we were doing a dirty hack using an RPM scriptlet, but it causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)
To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create a symlink for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.
Fixes #5522
Fixes #4585
Fixes #4611
(cherry picked from commit 263385cb4b)
Similar to trace_state, keep a shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed, so it can be used
during shutdown.
Fixes #5243
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 7aef39e400)
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.
Fix by evaluating full_slice before moving the schema.
Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.
Fixes #5419.
(cherry picked from commit 85822c7786)
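The hazard and the fix look roughly like this (simplified types of my own, not the actual reader code):

```cpp
#include <memory>
#include <string>
#include <utility>

struct schema {
    std::string full_slice() const { return "full-slice"; }
};

std::string make_reader(std::shared_ptr<schema> s, std::string slice) {
    return slice;  // stand-in for the real reader construction
}

// Buggy shape: make_reader(std::move(s), s->full_slice()) -- the order
// in which the two arguments are evaluated is unspecified, so s may
// already be moved-from when full_slice() runs (observed on aarch64,
// not on x86_64).
std::string make_reader_fixed(std::shared_ptr<schema> s) {
    auto&& slice = s->full_slice();  // evaluate before moving the schema
    return make_reader(std::move(s), std::move(slice));
}
```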
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).
When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.
In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).
This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:
INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS
Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}
Tests: unit(dev), dtest(consistency_test.py)
Fixes #5477.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit c451f6d82a)
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was
scheduled to be part of the 3.1 release; however, by accident it was not
backported to 3.1.
In order for repair to be compatible between 3.1 and 3.2, we need to drop
working_row_buf_nr in the 3.2 release.
Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
(cherry picked from commit 7322b749e0)
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE(). This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.
Fixes #5400.
Tests: unit(dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit 27b8b6fe9d)
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.
Fixes #5338
"
* 'vecexper' of https://github.com/dekimir/scylla:
config: Add `experimental_features` option
utils: Add enum_option
(cherry picked from commit 63474a3380)
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
CREATE INDEX index1 ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
skip-read-validation -node 127.0.0.1;
Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.
Refs #5409
Fixes #4615
Fixes #5418
(cherry picked from commit 79c3a508f4)
Fixes #5211
In 79935df959 the replay apply-call was
changed from one with no continuation to one with a continuation, but
the frozen mutation arg was still just a lambda local.
Change to use do_with for this case as well.
Message-Id: <20191203162606.1664-1-calle@scylladb.com>
(cherry picked from commit 56a5e0a251)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op, which were introduced in
the "Repair switch to rpc stream" series, the rx_hashes_nr metrics are not
updated correctly.
In the test we have 3 nodes and run repair on node3; we make sure the
following metrics are correct:
assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
node3_metrics['scylla_repair_tx_row_bytes'])
Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
(cherry picked from commit 6ec602ff2c)
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:
// This works
$ addr2line -e build/release/scylla 0x1234567
$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug
// now this fails
$ addr2line -e build/release/scylla 0x1234567
I think the issue is
https://sourceware.org/bugzilla/show_bug.cgi?id=23652
Fixes #5289
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
(cherry picked from commit 8599f8205b)
Since 90d6c0b, the cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).
Fix by destroying the coroutine first.
Fixes #5327.
Tests:
- row_cache_test (dev)
Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e3d025d014)
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.
Fix the problem by appending the test repeat cycle in the XML filename
as follows:
$ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test
$ ls -1 *.xml
jenkins_test.release.vint_serialization_test.0.boost.xml
jenkins_test.release.vint_serialization_test.1.boost.xml
jenkins_test.release.vint_serialization_test.2.boost.xml
Fixes #5303.
Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
(cherry picked from commit 505f2c1008)
Merged patch set by Piotr Dulikowski:
This change corrects the condition under which a row was considered expired
by its TTL.
The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.
Fixes #4263.
Fixes #5290.
Tests:
unit(dev)
python test described in issue #5290
(cherry picked from commit 9b9609c65b)
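The inconsistency boils down to one comparison operator; a minimal sketch with hypothetical helper names:

```cpp
#include <cstdint>

// Cell and row expiry must use the same, non-strict comparison: a value
// is expired once expiry_timestamp <= now. The bug used a strict `<` for
// rows, so non-key cells read as null for one second before the row
// itself disappeared.
bool cell_expired(int64_t expiry_timestamp, int64_t now) {
    return expiry_timestamp <= now;
}

bool row_expired(int64_t expiry_timestamp, int64_t now) {
    return expiry_timestamp <= now;  // was: expiry_timestamp < now
}
```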
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).
Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...
This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF name, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching such regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.
Fixes #5280.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
(cherry picked from commit 2fb2eb27a2)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.
Steps to reproduce:
- create a table with NullCompactionStrategy
- insert a row, flush memtables
- insert a row, flush memtables
- restart Scylla
- turn tracing on
- select * from table
The trace would only report DMA reads from one of the two sstables.
Kudos to @denesb for catching this.
Related issue: #4908
(cherry picked from commit a67e887dea)
Serialize provided partition_key in such a way that the serialized value
will hash to the same token as the original key. This way when system.paxos
table is updated the update is shard local.
Message-Id: <20191114135449.GU10922@scylladb.com>
"
When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example:
cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>);
cqlsh:b> insert into t JSON '{"k": "insert_json"}';
cqlsh:b> select * from t;
k | l | m | s | u
-------------------+------+------+------+------
insert_json | [] | {} | {} |
This PR fixes this.
Resolves #5246 and closes #5270.
"
* 'frozen-json' of https://github.com/kbr-/scylla:
tests: add null/unset frozen collection/UDT INSERT JSON test
cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
cql3: decouple execute from term binding in user_type::setter
* seastar 75e189c6ba...6f0ef32514 (6):
> Merge "Add named semaphores" from Piotr
> parallel_for_each_state: pass rvalue reference to add_future
> future: Pass rvalue to uninitialized_wrapper::uninitialized_set.
> dependencies: Add libfmt-dev to debian
> log: Fix logger behavior when logging both to stdout and syslog.
> README.md: list Scylla among the projects using Seastar
If CONSISTENCY is set to SERIAL or LOCAL SERIAL, all write requests must
fail according to Cassandra's documentation. However, batched writes
bypass this check. Fix this.
This patch resurrects Cassandra's code validating a consistency level
for CAS requests. Basically, it makes CAS requests use a special
function instead of validate_for_write to make error messages more
coherent.
Note, we don't need to resurrect requireNetworkTopologyStrategy as
EACH_QUORUM should work just fine for both CAS and non-CAS writes.
Looks like it is just an artefact of a rebase in the Cassandra
repository.
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).
Luckily "dnf install" does not update an already-installed package. Had
that been the case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.
So, to prevent missing packages in the frozen toolchain translating to
an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.
Fixes #5222.
Tests: reloc/build_reloc.sh.
MV backpressure code frees mutations for delayed client replies early
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this is not the case. Fix the code to free only when all replies
have actually been received.
Fixes #5242
Message-Id: <20191113142117.GA14484@scylladb.com>
Resharding is responsible for scheduling the deletion of the resharded
sstables, but it was not refreshing the cache on the shards those
sstables belong to, which means the cache incorrectly kept references
to them even after they were deleted. The consequence is that sstables
deleted by resharding do not have their disk space freed until the cache
is refreshed by a subsequent procedure that triggers it.
Fixes #5261.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191107193550.7860-1-raphaelsc@scylladb.com>
* 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev:
cql: rename modification_statement::_sets_a_collection to _selects_a_collection
cql: rename _column_conditions to _regular_conditions
cql: remove unnecessary optional around prefetch_data
"
Use a fixed-size, rather than a dynamically growing, bitset for the
column mask. This avoids unnecessary memory reallocation in the most
common case.
"
* 'column_set' of ssh://github.com/scylladb/scylla-dev:
schema: pre-allocate the bitset of column_set
schema: introduce schema::all_columns_count()
schema: rename column_mask to column_set
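A minimal sketch of the idea in Python rather than Scylla's C++ (ColumnSet and its methods are illustrative names): the bitset is sized once from the schema's total column count, so marking columns never reallocates.

```python
class ColumnSet:
    """A set of column ids kept in a bitmask sized once at construction."""

    def __init__(self, all_columns_count: int):
        self._count = all_columns_count   # fixed; mirrors all_columns_count()
        self._bits = 0                    # one bit per column id

    def set(self, column_id: int) -> None:
        assert 0 <= column_id < self._count
        self._bits |= 1 << column_id

    def test(self, column_id: int) -> bool:
        return bool((self._bits >> column_id) & 1)

cs = ColumnSet(all_columns_count=8)
cs.set(3)
print(cs.test(3), cs.test(4))   # True False
```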
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
- memtable_partition_writes - number of write operations performed
on partitions in memtables,
- memtable_partition_hits - number of write operations performed
on partitions that previously existed in a memtable,
- memtable_row_writes - number of row write operations performed
in memtables,
- memtable_row_hits - number of row write operations that overwrote
rows previously present in a memtable.
Tests: unit(release)
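A hedged sketch of how such counters could be maintained (illustrative Python, not Scylla's memtable code) - a "hit" is a write that finds its partition or row already present:

```python
class MemtableStats:
    def __init__(self):
        self.partition_writes = 0
        self.partition_hits = 0
        self.row_writes = 0
        self.row_hits = 0

class Memtable:
    def __init__(self, stats):
        self._data = {}      # partition key -> {clustering key: value}
        self._stats = stats

    def write(self, pk, ck, value):
        self._stats.partition_writes += 1
        part = self._data.get(pk)
        if part is None:
            part = self._data[pk] = {}
        else:
            self._stats.partition_hits += 1   # partition already existed
        self._stats.row_writes += 1
        if ck in part:
            self._stats.row_hits += 1         # row is being overwritten
        part[ck] = value

stats = MemtableStats()
mt = Memtable(stats)
mt.write("p1", "c1", 1)   # new partition, new row
mt.write("p1", "c1", 2)   # partition hit, row hit
mt.write("p1", "c2", 3)   # partition hit, new row
print(stats.partition_writes, stats.partition_hits,
      stats.row_writes, stats.row_hits)   # 3 2 3 1
```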
Merged patch series from Dejan Mircevski. Implements the "LT" and "GT"
operators of the Expected update option (i.e., conditional updates),
and enables the pre-existing tests for them.
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
This is merely to avoid confusion: we use the _sets prefix to indicate that
there are operations over static/regular columns (_sets_static_columns,
_sets_regular_columns), but _sets_a_collection is set for both operations
and conditions. So let's rename it to _selects_a_collection and add some
comments.
It's weird that modification_statement has _static_conditions for
conditions on static columns and _column_conditions for conditions on
regular columns, as if conditions on static columns are not column
conditions. Let's rename _column_conditions to _regular_conditions to
avoid confusion.
Before this commit, an empty non-null value was created for
frozen collection/UDT columns when an INSERT JSON statement was executed
with the value left unspecified or set to null.
This was incompatible with Cassandra which inserted a null (dead cell).
Fixes #5270.
The --pkg option on install.sh was introduced for .deb packaging since it
requires a different install directory for each subpackage.
But we are actually able to use "debian/tmp" as a shared install directory;
then we can specify the file owner of the package using .install files.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191030203142.31743-1-syuu@scylladb.com>
This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines
the RPM repository used to install Scylla from.
When building a new Docker image, users can specify the argument by
passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build
command. If the argument is not specified, the same RPM repository is
used as before, retaining the old default behavior.
We intend to use this in release engineering infrastructure to specify
RPM repositories for nightly builds of release branches (for example,
3.1.x), which are currently only using the stable RPMs.
Code for check_LT(), check_GT(), etc. will be nearly identical, so
factor it out into a single function that takes a comparator object.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
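The factoring can be sketched like this (hypothetical Python shape; the real code compares DynamoDB-typed values in C++):

```python
import operator

def check_compare(stored, expected, cmp):
    # One shared body for all comparison operators; a missing attribute
    # never satisfies an ordering comparison.
    if stored is None:
        return False
    return cmp(stored, expected)

def check_LT(stored, expected):
    return check_compare(stored, expected, operator.lt)

def check_GT(stored, expected):
    return check_compare(stored, expected, operator.gt)

print(check_LT(1, 2), check_GT(1, 2))   # True False
```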
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().
This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure. It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
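The idea, transliterated to Python (rjson and GetString are the C++ names; this helper is only an analogy for doing the length-aware decode in one place instead of at every call site):

```python
import base64

def base64_decode_json_value(value: str) -> bytes:
    """Decode a base64 string taken from a JSON value.

    validate=True rejects characters outside the base64 alphabet rather
    than silently discarding them.
    """
    return base64.b64decode(value, validate=True)

print(base64_decode_json_value("aGVsbG8="))   # b'hello'
```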
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Merged patch series from Piotr Sarna:
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.
Fixes #5248
"type" label is already in use for the counter type ("derive", "gauge",
etc). Using the same label for "cas" / "non-cas" overwrites it. Let's
instead call the new label "conditional" and use "yes" / "no" for its
value, as suggested by Kostja.
Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>
"
When a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode.
In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations.
In the current implementation, when the end of a partition is encountered while in gallop mode, the gallop mode is ended unconditionally.
A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.
As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.
Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)
test name before after improvement
one_row 49.070ns 48.287ns 1.60%
single_active 61.574us 61.235us 0.55%
many_overlapping 488.193us 514.977us -5.49%
disjoint_interleaved 57.462us 57.111us 0.61%
disjoint_ranges 56.545us 56.006us 0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us 36.36%
Same results, normalized per mutation fragment:
test name before after improvement
one_row 16.36ns 16.10ns 1.60%
single_active 109.46ns 108.86ns 0.55%
many_overlapping 216.97ns 228.88ns -5.49%
disjoint_interleaved 102.15ns 101.53ns 0.61%
disjoint_ranges 100.52ns 99.57ns 0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
Tests: unit(release)
Fixes #3593.
"
* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
mutation_reader: gallop mode microbenchmark
mutation_reader: combined reader gallop tests
mutation_reader: gallop mode for combined reader
mutation_reader: refactor prepare_next
Update the previous results dictionary using the update_metrics method.
It calls metric_source.query_list to get a list of results (similar to discover()), then for each line in the response it updates the results dictionary.
New results may be appended depending on the do_append parameter (True by default).
Previously, with prometheus, each metric.update called query_list, resulting in O(n^2) queries when all metrics were updated, as in the scylla_top dtest - causing a test timeout when testing the debug build.
(E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)
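A rough Python sketch of the batched update (FakeSource, the line format, and the dictionary shape are illustrative, not scyllatop's actual code) - one query per refresh whose response is fanned out to all metrics:

```python
def update_metrics(metric_source, results, do_append=True):
    """Refresh all metrics from a single query_list() response."""
    for line in metric_source.query_list():
        name, value = line.split(None, 1)
        if name in results:
            results[name] = value       # update a known metric
        elif do_append:
            results[name] = value       # append a newly discovered metric

class FakeSource:
    def query_list(self):
        # One response covering every metric, fetched once per refresh.
        return ["reads 10", "writes 20"]

results = {"reads": "0"}
update_metrics(FakeSource(), results)
print(results)   # {'reads': '10', 'writes': '20'}
```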
This patch adds "type" label to the following CQL metrics:
inserts
updates
deletes
batches
statements_in_batches
The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.
Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.
As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.
Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)
test name before after improvement
one_row 49.070ns 48.287ns 1.60%
single_active 61.574us 61.235us 0.55%
many_overlapping 488.193us 514.977us -5.49%
disjoint_interleaved 57.462us 57.111us 0.61%
disjoint_ranges 56.545us 56.006us 0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us 36.36%
Same results, normalized per mutation fragment:
test name before after improvement
one_row 16.36ns 16.10ns 1.60%
single_active 109.46ns 108.86ns 0.55%
many_overlapping 216.97ns 228.88ns -5.49%
disjoint_interleaved 102.15ns 101.53ns 0.61%
disjoint_ranges 100.52ns 99.57ns 0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
When a single reader contributes a stream of fragments
and keeps winning over other readers, mutation_reader_merger will
enter gallop mode, in which it is assumed that the reader will keep
winning over other readers. Currently, a reader needs to contribute
3 fragments to enter that mode.
In gallop mode, fragments returned by the galloping reader will be
compared with the best fragment from _fragment_heap. If it wins, the
fragment is directly returned. Otherwise, gallop mode ends and
merging is performed as in the general case, which involves heap
operations.
In the current implementation, when the end of a partition is
encountered while in gallop mode, the gallop mode is ended
unconditionally.
Fixes #3593.
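A toy model of gallop mode in a k-way merge (illustrative Python, not Scylla's combined reader): after one input wins GALLOP_THRESHOLD comparisons in a row, items are taken from it directly, skipping the heap until it stops winning.

```python
import heapq

GALLOP_THRESHOLD = 3   # wins in a row before switching to gallop mode

def merge(streams):
    iters = [iter(s) for s in streams]
    heap = []
    for i, it in enumerate(iters):
        v = next(it, None)
        if v is not None:
            heapq.heappush(heap, (v, i))
    out = []
    last_winner, streak = None, 0
    while heap:
        v, i = heapq.heappop(heap)
        out.append(v)
        streak = streak + 1 if i == last_winner else 1
        last_winner = i
        nxt = next(iters[i], None)
        if nxt is None:
            last_winner, streak = None, 0
            continue
        # Gallop mode: while this stream keeps beating the heap's best,
        # emit from it directly and skip all heap operations.
        while streak >= GALLOP_THRESHOLD and (not heap or nxt <= heap[0][0]):
            out.append(nxt)
            nxt = next(iters[i], None)
            if nxt is None:
                break
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))
        else:
            last_winner, streak = None, 0
    return out

print(merge([[1, 2, 3, 4, 5, 6], [10, 11]]))   # [1, 2, 3, 4, 5, 6, 10, 11]
```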
Move out logic responsible for adding readers at partition boundary
into `maybe_add_readers_at_partition_boundary`, and advancing one reader
into `prepare_one`. This will allow to reuse this logic outside
`prepare_next`.
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.
This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.
Tests: unit (dev)
Recently, `scylla memory` started to go beyond just providing raw stats
about the occupancy of the various memory pools, to additionally also
provide an overview of the "usual suspects" that cause memory pressure.
As part of this, recently 46341bd63f
added a section of the coordinator stats. This patch continues this
trend and adds a replica section, with the "usual suspects":
* read concurrency semaphores
* execution stages
* read/write operations
Example:
Replica:
Read Concurrency Semaphores:
user sstable reads: 0/100, remaining mem: 84347453 B, queued: 0
streaming sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0
system sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0
Execution Stages:
data query stage:
03 "service_level_sg_0" 4967
Total 4967
mutation query stage:
Total 0
apply stage:
03 "service_level_sg_0" 12608
06 "statement" 3509
Total 16117
Tables - Ongoing Operations:
pending writes phaser (top 10):
2 ks.table1
2 Total (all)
pending reads phaser (top 10):
3380 ks.table2
898 ks.table1
410 ks.table3
262 ks.table4
17 ks.table8
2 system_auth.roles
4969 Total (all)
pending streams phaser (top 10):
0 Total (all)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>
This patch adds the following per table stats:
cas_prepare_latency
cas_propose_latency
cas_commit_latency
They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed
by Cassandra.
This patch implements accounting of Cassandra's metrics related to
lightweight transactions, namely:
cas_read_latency transactional read latency (histogram)
cas_write_latency transactional write latency (histogram)
cas_read_timeouts number of transactional read timeouts
cas_write_timeouts number of transactional write timeouts
cas_read_unavailable number of transactional read
unavailable errors
cas_write_unavailable number of transactional write
unavailable errors
cas_read_unfinished_commit number of transaction commit attempts
that occurred on read
cas_write_unfinished_commit number of transaction commit attempts
that occurred on write
cas_write_condition_not_met number of transaction preconditions
that did not match current values
cas_read_contention how many contended reads were
encountered (histogram)
cas_write_contention how many contended writes were
encountered (histogram)
Pass contention by reference to begin_and_repair_paxos(), where it is
incremented on every sleep. Rationale: we want to account the total
number of times query() / cas() had to sleep, either directly or within
begin_and_repair_paxos(), no matter if the function failed or succeeded.
Even though every Scylla version has its own scylla-gdb.py, because we
don't backport any fixes or improvements, practically we end up always
using master's version when debugging older versions of Scylla too. This
is made harder by the fact that both Scylla's and its dependencies'
(most notably that of libstdc++ and boost) code is constantly changing
between releases, requiring edits to scylla-gdb.py to make it usable
with past releases.
This patch attempts to make it easier to use scylla-gdb.py with past
releases, more specifically Scylla 3.0. This is achieved by wrapping
problematic lines in a `try: except:` and putting the backward
compatible version in the `except:` clause. These lines have comments
with the version they provide support for, so they can be removed when
said version is not supported anymore.
I did not attempt to provide full coverage, I only fixed up problems
that surfaced when using my favourite commands with 3.0.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>
The loop that collects the results of the checksum calculations also logs
any errors. The error logging includes `checksums[0]`, which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However, this is
only true when the checksum calculation is successful; it is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of a failed checksum calculation,
the result of `checksums[0]` is extracted only once.
Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
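The single-extraction pitfall can be modeled in Python (seastar futures allow their result or exception to be taken exactly once; OneShotFuture is an illustrative stand-in): the fix amounts to extracting each result once, inside the loop, and reusing the extracted value afterwards.

```python
class OneShotFuture:
    """A future whose value or exception may be extracted only once."""

    def __init__(self, value=None, error=None):
        self._value, self._error, self._taken = value, error, False

    def get(self):
        assert not self._taken, "result extracted twice"
        self._taken = True
        if self._error is not None:
            raise self._error
        return self._value

def collect(checksums):
    results = []
    for i, f in enumerate(checksums):
        try:
            results.append(("ok", f.get()))   # extract each future once
        except Exception as e:
            # Log the failure using the already-extracted error; never
            # probe the future again after this point.
            print(f"checksum on node {i} failed: {e}")
            results.append(("err", e))
    # Later code consults results[0], not checksums[0].
    return results

out = collect([OneShotFuture(error=RuntimeError("io error")),
               OneShotFuture(value=0xdead)])
print(out[0][0], out[1])   # err ('ok', 57005)
```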
* seastar 2963970f6b...75e189c6ba (7):
> posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests
> README.md: Add redpanda and smf to 'Projects using Seastar'
> unix_domain_test: don't assume that at temporary_buffer is null terminated
> socket_address: Use offsetof instead of null pointer
> README: add projects using seastar section to readme
> Adjustments for glibc 2.30 and hwloc 2.0
> Mark future::failed() as const
We may want to change the paxos table format and the internode protocol,
so hide LWT behind an experimental flag for now.
Message-Id: <20191029102725.GM2866@scylladb.com>
Currently, end-of-stream validation is done in the destructor,
but the validator may be destructed prematurely, e.g. on an
exception, as seen in https://github.com/scylladb/scylla/issues/5215
This patch adds an on_end_of_stream() method explicitly called by
consume_pausable_in_thread. Also, the respective concepts for
PartitionFilter, MutationFragmentFilter and a new one for the
on_end_of_stream method were unified as FlattenedConsumerFilter.
Refs #5215
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)
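The pattern can be sketched in Python (StreamValidator is illustrative, not the actual validator): validation moves out of the destructor, which also runs during exception unwinding when the stream is legitimately incomplete, into an explicit call made only after a successful full pass.

```python
class StreamValidator:
    def __init__(self):
        self._saw_end_of_partition = False

    def on_mutation_fragment(self, frag):
        if frag == "partition_end":
            self._saw_end_of_partition = True

    def on_end_of_stream(self):
        # Explicit end-of-stream check; deliberately NOT done in __del__,
        # which would also fire when an exception cuts the stream short.
        if not self._saw_end_of_partition:
            raise ValueError("stream ended mid-partition")

v = StreamValidator()
for frag in ["partition_start", "row", "partition_end"]:
    v.on_mutation_fragment(frag)
v.on_end_of_stream()   # passes: the stream was complete
print("ok")
```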
There are a few issues at the CQL layer, because of which the result of
a CAS request execution may differ between Scylla and Cassandra. Mostly,
it happens when static columns are involved. The goal of this patch set
is to fix these issues, thus making Scylla's implementation of CAS yield
the same results as Cassandra's.
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:
Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.
Note that in this patch set it is still unconditional. Adding option support
comes in next set.
Uses code more or less derived from alternator to select the pre-image,
using the raw query interface, so it should add fairly low overhead to
query generation.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
Even if no rows match clustering key restrictions of a conditional
statement with static columns conditions, we still must include the
static column value into the CAS failure result set. For example,
the following conditional DELETE statement
create table t(k int, c int, s int static, v int, primary key(k, c));
insert into t(k, s) values(1, 1);
delete v from t where k=1 and c=1 if v=1 and s=1;
must return
[applied=False, v=null, s=1]
not just
[applied=False, v=null, s=null]
To fix that, set partition_slice::option::always_return_static_content
for querying rows used for checking conditions so that we have the
static row in update_parameters::prefetch_data even if no regular row
matches clustering column restrictions. Plus modify cas_request::
applies_to() so that it sets is_in_cas_result_set flag for the static
row in case there are static column conditions, but the result set
happens to be empty.
As pointed out by Tomek, there's another reason to set partition_slice::
option::always_return_static_content apart from building a correct
result set on CAS failure. There could be a batch with two statements,
one with clustering key restrictions which select no row, and another
statement with only static column conditions. If we didn't enable this
flag, we wouldn't get a static row even if it exists, and static column
conditions would evaluate as if the static row didn't exist, for
example, the following batch
create table t(k int, c int, s int static, primary key(k, c));
insert into t(k, s) values(1, 1);
begin batch
insert into t(k, c) values(1, 1) if not exists
update t set s = 2 where k = 1 if s = 1
apply batch;
would fail although it clearly must succeed.
A SELECT statement that has clustering key restrictions isn't supposed
to return static content if no regular row matches the restrictions,
see #589. However, for the CAS statement we do need to return static
content on failure so this patch adds a flag that allows the caller to
override this behavior.
Apart from conditional statements, there may be other reading statements
in a batch, e.g. manipulating lists. We must not include rows fetched
for them into the CAS result set. For instance, the following CAS batch:
create table t(p int, c int, i int, l list<int>, primary key(p, c));
insert into t(p, c, i) values(1, 1, 1)
insert into t(p, c, i, l) values(1, 1, 1, [1, 2, 3])
begin batch
update t set i=3 where p=1 and c=1 if i=2
update t set l=l-[2] where p=1 and c=2
apply batch;
is supposed to return
[applied] | p | c | i
----------+---+---+---
False | 1 | 1 | 1
not
[applied] | p | c | i
----------+---+---+---
False | 1 | 1 | 1
False | 1 | 2 | 1
To filter out such collateral rows from the result set, let's mark rows
checked by conditional statements with a special flag.
If a CQL statement only updates static columns, i.e. has no clustering
key restrictions, we still fetch a regular row so that we can check it
against EXISTS condition. In this case we must be especially careful: we
can't simply pass the row to modification_statement::applies_to, because
it may turn out that the row has no static columns set, i.e. there is
in fact no static row in the partition. So we filter out such rows without
static columns right in cas_request::applies_to before passing them
further to modification_statement::applies_to.
Example:
create table t(p int, c int, s int static, primary key(p, c));
insert into t(p, c) values(1, 1);
insert into t(p, s) values(1, 1) if not exists;
The conditional statement must succeed in this case.
In case a CQL statement has only static column conditions, we must
ignore clustering key restrictions.
Example:
create table t(p int, c int, s int static, v int, primary key(p, c));
insert into t(p, s) values(1, 1);
update t set v=1 where p=1 and c=1 if s=1;
This conditional statement must successfully insert row (p=1, c=1, v=1)
into the table even though there's no regular row with p=1 and c=1 in
the table before it's executed, because the statement condition only
applies to the static column s, which exists and matches.
If a modification statement doesn't have a clustering column restriction
while the table has static columns, then EXISTS condition just needs to
check if there's a static row in the partition, i.e. it doesn't need to
select any regular rows. Let's treat such an EXISTS condition like a static
column condition so that we can ignore its clustering key range while
checking CAS conditions.
This will allow us to add helper methods and store extra info in each
row. For example, we can add a method for checking if a row has static
columns. Also, to build CAS result set, we need to differentiate rows
fetched to check conditions from those fetched for reading operations.
Using struct as row container will allow us to store this information in
each prefetched row.
Currently, we set _sets_regular_columns/_sets_static_columns flags when
adding regular/static conditions to modification_statement. We use them
in applies_only_to_static_columns() function that returns true iff
_sets_static_columns is set and _sets_regular_columns is clear. We
assume that if this function returns true then the statement only deals
with static columns and so must not have clustering key restrictions.
Usually, that's true, but there's one exception: DELETE FROM ...
statement that deletes whole rows. Technically, this statement doesn't
have any column operations, i.e. _sets_regular_columns flag is clear.
So if such a statement happens to have a static condition, we will
assume that it only applies to static columns and mistakenly raise an
error.
Example:
create table t(k int, c int, s int static, v int, primary key(k, c));
delete from t where k=1 and c=1 if s=1;
To fix this, let's not set the above mentioned flags when adding
conditions and instead check if _column_conditions array is empty in
applies_only_to_static_columns().
modification_statement::process_where_clause() assumes that both
operations and conditions have been added to the statement when it's
called: it uses this information to raise an error in case the statement
restrictions are incompatible with operations or conditions. Currently,
operations are set before this function is called, but not conditions.
This results in "Invalid restrictions on clustering columns since
the {} statement modifies only static columns" error while trying to
execute the following statements:
create table t(k int, c int, s int static, v int, primary key(k, c));
delete s from t where k=1 and c=1 if v=1;
update t set s=1 where k=1 and c=1 if v=1;
Fix this by always initializing conditions before processing WHERE
clause.
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:
(gdb) scylla smp-queues
10747 17 -> 3 ++++++++++++++++++++++++++++++++++++++++
721 17 -> 19 ++
247 17 -> 20 +
233 17 -> 10 +
210 17 -> 14 +
205 17 -> 4 +
204 17 -> 5 +
198 17 -> 16 +
197 17 -> 6 +
189 17 -> 11 +
181 17 -> 1 +
179 17 -> 13 +
176 17 -> 2 +
173 17 -> 0 +
163 17 -> 8 +
1 17 -> 9 +
Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.
To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. Practically the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))
mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type
Test: mutation_test(dev, debug)
"
* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
mutation_test: test_udt_mutations: use int64_t constants for long_type
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: fixup udt comment
Based on a mutation, creates a pre-image select operation.
Note, this uses raw proxy query to shortcut parsing etc,
instead of trying to cache by generated query. The hypothesis is that
this is essentially faster.
The routine assumes all rows in a mutation touch same static/regular
columns. If this is not always true it will need additional
calculations.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Support single-statement conditional updates as well as batches.
This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().
Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.
We can no longer assume that the columns fetched by the prefetch operation
are non-frozen collections. IF EXISTS/IF NOT EXISTS condition
fetches all columns; besides, a column may be needed to check another
condition.
When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.
The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).
The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.
This neither improves clarity nor maintainability.
cql_statement_opt_metadata is an interim node
in cql (prepared) statement hierarchy parenting
modification_statement and batch_statement. If there
is an IF condition in such statements, they return a result set,
and thus have result set metadata.
The metadata itself is filled in a subsequent patch.
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
since a condition is evaluated for a single row/cell, so
allowing multiple rows to match the WHERE clause would create
ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete
We must not allow statements like
create table t(p int, c int, v int, primary key (p, c));
delete from t where p=1 and c>0 if v=1;
because there may be more than one row in a partition satisfying the
WHERE clause, in which case it's unclear which of them should satisfy
the IF condition: all or just one.
Raising an error on such a statement is consistent with Cassandra's
behavior.
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.
Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
Currently the code that manipulates mutations during writes needs to
check what kind of mutations they are and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
This patch adds all functionality needed for the Paxos protocol. The
implementation does not strictly adhere to the Paxos paper, since the original
paper allows setting a value only once, while for LWT we need to be able
to make another Paxos round after "learn" phase completes, which requires
things like repair to be introduced.
The Paxos protocol has three stages: prepare, accept, learn. This patch adds
an RPC verb for each of those stages. To be term-compatible with Cassandra,
the patch calls those stages: prepare, propose, commit.
The Paxos protocol relies on replicas having state that persists over
crashes/restarts. This patch defines such state and stores it in the
database itself, in the paxos table, to make it persistent.
The stored state is:
in_progress_ballot - promised ballot
proposal - accepted value
proposal_ballot - the ballot of the accepted value
most_recent_commit - most recently learned value
most_recent_commit_at - the ballot of the most recently learned value
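The persisted state can be sketched as a plain struct (a simplified illustration mirroring the columns above; the real state lives in a system table and uses timeuuid ballots and frozen mutations, not these stand-in types):

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Stand-ins: a ballot is really a time-based UUID, a value is really a
// frozen mutation. Plain types keep the sketch self-contained.
using ballot = uint64_t;
using value = std::string;

// Per-partition Paxos state persisted in the paxos table, one field
// per column listed above.
struct paxos_state {
    ballot in_progress_ballot{};                 // promised ballot
    std::optional<value> proposal;               // accepted value
    std::optional<ballot> proposal_ballot;       // ballot of the accepted value
    std::optional<value> most_recent_commit;     // most recently learned value
    std::optional<ballot> most_recent_commit_at; // its ballot
};
```

A freshly created state has no accepted or committed value, only a zeroed promise.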
This patch adds two data structures that will be used by paxos. First
one is "proposal" which contains a ballot and a mutation representing
a value paxos protocol is trying to set. Second one is
"prepare_response" which is a value returned by paxos prepare stage.
It contains the currently accepted value (if any) and the most recently
learned value (again, if any). The latter is used to "repair" replicas
that missed a previous "learn" message.
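The two structures can be sketched roughly as follows (names follow the commit text; the real definitions carry frozen mutations and timeuuid ballots rather than these stand-in types):

```cpp
#include <cstdint>
#include <optional>
#include <string>

using ballot = uint64_t;      // really a time-based UUID
using mutation = std::string; // really a frozen mutation

// "proposal": a ballot plus the value (mutation) Paxos tries to set.
struct proposal {
    ballot b;
    mutation update;
};

// "prepare_response": what a replica returns from the prepare stage.
struct prepare_response {
    std::optional<proposal> accepted;           // currently accepted value, if any
    std::optional<proposal> most_recent_commit; // most recently learned value, if any
};
```

The coordinator uses `most_recent_commit` to re-send a learn message to replicas that missed it before starting its own round.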
Otherwise they are decomposed and serialized as 4-byte int32.
For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}
and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously. It is also unnecessary, since _callbacks can be
initialized in the constructor.
Fixes #5220.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.
Fixes #4908.
"
* 'traced_file' of https://github.com/kbr-/scylla:
sstables: report sstable index file I/O in CQL tracing
sstables: report sstable data file I/O in CQL tracing
tracing: add traced_file class
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.
I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.
Fixes #2201.
"
* 'udt' of https://github.com/kbr-/scylla: (64 commits)
tests: too many UDT fields check test
collection_mutation: add a FIXME.
tests: add a non-frozen UDT materialized view test
tests: add a UDT mutation test.
tests: add a non-frozen UDT "JSON INSERT" test.
tests: add a non-frozen UDT to for_each_schema_change.
tests: more non-frozen UDT tests.
tests: move some UDT tests from cql_query_test.cc to new file.
types: handle trailing nulls in tuples/UDTs better.
cql3: enable deleting single fields of non-frozen UDTs.
cql3: enable setting single fields of a non-frozen UDT.
cql3: enable non-frozen UDTs.
cql3: introduce user_types::marker.
cql3: generalize function_call::make_terminal to UDTs.
cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
cql3: use a dedicated setter operation for inserting user types.
cql3: introduce user_types::value.
types: introduce to_bytes_opt_vec function.
cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
...
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:
$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT
healthy: localhost:8000
This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
Comparing user types after adding new fields was bugged.
In the following scenario:
create type ut (a int);
create table cf (a int primary key, b frozen<ut>);
insert into cf (a, b) values (0, (0));
alter type ut add b int;
select * from cf where b = {a:0,b:null};
the row with a = 0 should be returned, even though the value stored
in the database is shorter (by one null) than the value given by the
user. Until now it wouldn't have been.
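The intended comparison semantics can be sketched with plain optionals (a simplified model: the real code compares serialized field bytes, and `udt_equal` here is a hypothetical helper):

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

using field = std::optional<int>; // one UDT field; nullopt models null

// Compare two UDT values field by field, treating fields missing from
// the shorter value as nulls, so a value written before
// "ALTER TYPE ut ADD b int" still equals {a:0, b:null}.
bool udt_equal(const std::vector<field>& a, const std::vector<field>& b) {
    std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        field fa = i < a.size() ? a[i] : field{};
        field fb = i < b.size() ? b[i] : field{};
        if (fa != fb) {
            return false; // a genuinely differing field value
        }
    }
    return true;
}
```

The shorter stored value is padded with nulls, so equality holds; a real non-null mismatch still fails.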
Add a cluster feature for non-frozen UDTs.
If the cluster supports non-frozen UDTs, do not return an error
message when trying to create a table with a non-frozen user type.
cql3::user_types::marker is a dedicated cql3::abstract_marker for user
type placeholders in prepared CQL queries. When bound, it returns a
user_types::value.
Previously it returned vector<cql3::raw_value>, even though we don't use
unset values when setting a UDT value (fields that are not provided
become nulls; that's how C* does it).
This simplifies future implementation of user_types::{value, setter}.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.
For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
These functions are used to translate field indices, which are used to
identify fields inside UDTs, from/to a serialized representation to be
stored inside sstables and mutations.
They do it in a way that is compatible with C*.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.
The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.
Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.
Remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.
Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.
`collection_type_impl::deserialize_mutation_form`
became a free standing function `deserialize_collection_mutation`
with similar benefits. Actually, no one needs to call this function
manually because of the next paragraph.
A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.
serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
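The linearize-then-deserialize pattern that `with_deserialized` abstracts can be sketched generically (stand-in types; the real method operates on fragmented buffers and collection mutation views):

```cpp
#include <string>
#include <vector>

// Stand-ins: a fragmented buffer and its parsed (deserialized) form.
using fragmented_buffer = std::vector<std::string>;
struct mutation_description {
    std::string bytes; // parsed payload, simplified to the raw bytes
};

mutation_description deserialize(const std::string& linearized) {
    return mutation_description{linearized};
}

// with_deserialized: every caller used to linearize the fragments and
// then call deserialize; this wraps both steps around a callback.
template <typename Func>
auto with_deserialized(const fragmented_buffer& buf, Func&& f) {
    std::string linearized;
    for (const auto& frag : buf) {
        linearized += frag; // glue fragments into contiguous bytes
    }
    return f(deserialize(linearized));
}
```

Callers then pass only the code that actually inspects the deserialized mutation.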
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.
Additional documentation has been written for these classes.
Related function implementations were moved to collection_mutation.cc.
This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
The classes 'collection_mutation' and 'collection_mutation_view'
were moved to a separate header, collection_mutation.hh.
Implementations of functions that operate on these classes,
including some methods of collection_type_impl, were moved
to a separate compilation unit, collection_mutation.cc.
This makes it easier to modify these structures in future commits
in order to generalize them for non-frozen User Defined Types.
Some additional documentation has been written for collection_mutation.
Merged patch series from Piotr Sarna:
This series couples system_auth.roles with authorization routines
in alternator. The `salted_hash` field, which is every user's hashed
password, is used as a secret key for the signature generation
in alternator.
This series also adds related expiration verifications for alternator
signatures.
It also comes with more test cases and docs updates.
Tests: alternator(local, remote), manual
Piotr Sarna (11):
alternator: add extracting key from system_auth.roles
alternator: futurize verify_signature function
alternator: move the api handler to a separate function
alternator: use keys from system_auth.roles for authorization
alternator: add key cache to authorization
alternator-test: add a wrong password test
alternator: verify that the signature has not expired
alternator: add additional datestamp verification
alternator-test: add tests for expired signatures
docs: update alternator entry for authorization
alternator-test: add authorization to README
alternator-test/conftest.py | 2 +-
alternator-test/test_authorization.py | 44 ++++++++-
alternator-test/test_describe_endpoints.py | 2 +-
alternator/auth.hh | 15 ++-
alternator/server.hh | 10 +-
alternator/auth.cc | 62 +++++++++++-
alternator/server.cc | 106 ++++++++++++---------
alternator-test/README.md | 28 ++++++
docs/alternator/alternator.md | 7 +-
9 files changed, 221 insertions(+), 55 deletions(-)
Commit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.
This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.
Fixes #4855
Tests: unit (dev)
From Shlomi:
4 node cluster Node A, B, C, D (Node A: seed)
cassandra-stress write n=10000000 -pop seq=1..10000000 -node <seed-node>
cassandra-stress read duration=10h -pop seq=1..10000000 -node <seed-node>
while read is progressing
Node D: nodetool decommission
Node A: nodetool status node - wait for UL
Node A: nodetool cleanup (while decommission progresses)
I get the error on c-s once decommission ends
java.io.IOException: Operation x0 on key(s) [383633374d31504b5030]: Data returned was not validated
The problem is that when a node gets new ranges, e.g., a bootstrapping node, or
the existing nodes after a node is removed or decommissioned, nodetool cleanup will
remove data within the new ranges which the node just got from other nodes.
To fix, we should reject nodetool cleanup when there are pending ranges on that node.
Note, rejecting nodetool cleanup is not full protection, because new ranges
can be assigned to the node while cleanup is still in progress. However, it is
a good start to reject until we have a full protection solution.
Refs: #5045
Scylla 3.1.0 broke the serialization format for TTLs. Later versions
corrected it, but if a cluster was originally installed as 3.1.0,
it will use the broken serialization forever. This configuration option
allows upgrades from 3.1.0 to succeed, by enabling the broken format
even for later versions.
Scylla 3.1.0 inadvertently changed the serialization format of TTLs
(internally represented as gc_clock::duration) from 32-bit to 64-bit,
as part of preparation for Y2038 (which comes earlier for TTLed cells).
This breaks mutations transported in a mixed cluster.
To fix this, we revert back to the 32-bit format, unless we're in a 3.1.0-
heritage cluster, in which case we use the 64-bit format. Overflow of
a TTL is not a concern, since TTLs are capped to 20 years by the TTL layer.
An assertion is added to verify this.
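The overflow argument is easy to verify numerically; a sketch of the kind of assertion the patch adds (names hypothetical, seconds-based arithmetic assumed):

```cpp
#include <cstdint>

// TTLs are capped to 20 years by the TTL layer (expressed in seconds).
constexpr int64_t max_ttl_seconds = int64_t{20} * 365 * 24 * 60 * 60;

// A signed 32-bit seconds count holds up to ~68 years, so a capped TTL
// always fits the 32-bit wire format.
constexpr bool ttl_fits_32bit_format() {
    return max_ttl_seconds <= INT32_MAX;
}
static_assert(ttl_fits_32bit_format(), "capped TTL must fit 32-bit serialization");
```

Since 20 years is about 630 million seconds and INT32_MAX is about 2.1 billion, the margin is comfortable.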
This patch only defines a variable to indicate we're in
a 3.1.0 heritage cluster, but a way to set it is left to
a later patch.
* seastar 6bcb17c964...2963970f6b (4):
> Merge "IPv6 scope support and network interface impl" from Calle
> noncopyable_function: do not copy uninitialized data
> Merge "Move smp and smp queue out of reactor" from Asias
> Consolidate posix socket implementations
The README paragraph informs about turning on authorization with:
alternator-enforce-authorization: true
and has a short note on how to set up the secret key for tests.
The first test case ensures that expired signatures are not accepted,
while the second one checks that signatures with dates that reach out
too far into the future are also refused.
The authorization signature contains both a full obligatory date header
and a shortened datestamp - an additional verification step ensures that
the shortened stamp matches the full date.
AWS signatures have a 15min expiration policy. For compatibility,
the same policy is applied for alternator requests. The policy also
ensures that signatures dated more than 15 minutes into the future
are treated as unsafe and thus not accepted.
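The policy amounts to a symmetric 15-minute acceptance window around the server clock; a minimal sketch (the real code first parses the AWS-format date header, and the function name here is hypothetical):

```cpp
#include <chrono>

using wall_clock = std::chrono::system_clock;

// Accept a signature only if its timestamp is within 15 minutes of
// "now" in either direction: older means expired, newer than that is
// treated as unsafe.
bool signature_within_policy(wall_clock::time_point signature_time,
                             wall_clock::time_point now) {
    const auto limit = std::chrono::minutes(15);
    const auto diff = now - signature_time;
    return diff <= limit && -diff <= limit;
}
```

A request dated 10 minutes ago passes; one dated 16 minutes ago, or 16 minutes into the future, is refused.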
The additional test case submits a request as a user that is expected
to exist (in the local setup), but the provided password is incorrect.
It also updates test_wrong_key_access so it uses an empty string
for trying to authenticate as a nonexistent user - in order to cover
more corner cases.
In order to avoid fetching keys from system_auth.roles system table
on every request, a cache layer is introduced. And in order not to
reinvent the wheel, the existing implementation of loading_cache
with max size 1024 and a 1 minute timeout is used.
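The behavior being reused — load on miss, per-entry timeout — can be sketched with a toy synchronous cache (the real code uses Scylla's loading_cache, which is asynchronous and size-bounded; the class and names here are hypothetical):

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using cache_clock = std::chrono::steady_clock;

// Toy load-on-miss cache with per-entry expiry.
class key_cache {
    struct entry {
        std::string key; // the cached salted_hash
        cache_clock::time_point loaded_at;
    };
    std::unordered_map<std::string, entry> _entries;
    std::function<std::string(const std::string&)> _load; // e.g. read system_auth.roles
    std::chrono::seconds _ttl;
public:
    key_cache(std::function<std::string(const std::string&)> load,
              std::chrono::seconds ttl)
        : _load(std::move(load)), _ttl(ttl) {}

    std::string get(const std::string& user) {
        auto now = cache_clock::now();
        auto it = _entries.find(user);
        if (it != _entries.end() && now - it->second.loaded_at < _ttl) {
            return it->second.key; // fresh hit: no table read
        }
        auto key = _load(user);    // miss or expired: reload from the table
        _entries[user] = entry{key, now};
        return key;
    }
};

// Demo: with a counting loader, repeated lookups within the TTL load once.
inline int loads_for_three_gets() {
    int loads = 0;
    key_cache c([&loads](const std::string&) { ++loads; return std::string("hash"); },
                std::chrono::seconds(60));
    c.get("alternator");
    c.get("alternator");
    c.get("alternator");
    return loads;
}
```

Three authorization checks for the same user within the timeout cost one table read.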
Instead of having a hardcoded secret key, the server now verifies
an actual key extracted from system_auth.roles system table.
This commit comes with a test update - instead of 'whatever':'whatever',
the credentials used for a local run are 'alternator':'secret_pass',
which matches the initial contents of system_auth.roles table,
which acts as a key store.
Fixes #5046
The lambda used for handling the api request has grown a little bit
too large, so it's moved to a separate method. Along with it,
the callbacks are now remembered inside the class itself.
The verify_signature utility will later be coupled with Scylla
authorization. In order to prepare for that, it is first transformed
into a function that returns future<>, and it also becomes a member
of class server. The reason for making it a member function is that
it will make it easier to implement a server-local key cache.
As a first step towards coupling alternator authorization with Scylla
authorization, a helper function for extracting the key (salted_hash)
belonging to the user is added.
Argument evaluation order is UB, so it's not guaranteed that
c->make_garbage_collected_sstable_writer() is called before
compaction is moved to run().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191023052647.9066-1-raphaelsc@scylladb.com>
Make it possible to compare multi-cell lists and sets serialized
as maps with literal values and serialize them to network using
a standard format (vector of values).
This is a pre-requisite patch for column condition evaluation
in light-weight transactions.
Merged patch series from Botond Dénes:
This series extends the existing docs/debugging.md with a detailed guide
on how to debug Scylla coredumps. The intended target audience is
developers who are debugging their first core, hence the level of
details (hopefully enough). That said this should be just as useful for
seasoned debuggers just quickly looking up some snippet they can't
remember exactly. A Troubleshooting chapter is also added in this
series for commonly-met problems.
I decided to create this guide after myself having struggled for more
than a day on just opening(!) a coredump that was produced on Ubuntu.
As my main source, I used the How-to-debug-a-coredump page from the
internal wiki which contains much useful information on debugging
coredumps, however I found it to be missing some crucial information, as
well as being very terse, thus being primarily useful for experienced
debuggers who can fill in the blanks. The reason I'm not extending said
wiki page is that I think this information should not be hidden in some
internal wiki page. Also, docs/debugging.md now seems to be a much
better base for such a document. This document was started as a
comprehensive debugging manual for beginners (but not just).
You will notice that the information on how to debug cores from
CentOS/Redhat is quite sparse. This is because I have no experience
with such cores, so for now the respective chapters are just stubs. I
intend to complete them in the future after having gained the necessary
experience and knowledge, however those being in possession of said
knowledge are more than welcome to send a patch. :)
Botond Dénes (4):
docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
docs/debugging.md: fix formatting issues
docs/debugging.md: add 'Debugging coredumps' subchapter
docs/debugging.md: add 'Throubleshooting' subchapter
docs/debugging.md | 240 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 228 insertions(+), 12 deletions(-)
Add lua as a dependency in preparation for UDF. This is the first
patch, since it has to go in first to allow for a frozen toolchain
update.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
[avi: update frozen toolchain image]
Message-Id: <20191018231442.11864-2-espindola@scylladb.com>
Multi-cell lists and maps may be stored in different formats: as sorted
vectors of pairs of values, when retrieved from storage, or as sorted
vectors of values, when created from parser literals or supplied as
parameter values.
Implement a specialized compare for use when receiver and parameter
representation don't match.
Add helpers.
The problem is that backlog tracker is not being updated properly after
incremental compaction.
When replacing sstables earlier, we tell the backlog tracker that we're done
with exhausted sstables[1], but we *don't* tell it about the new, sealed
sstables created that will replace the exhausted ones.
[1]: exhausted sstable is one that can be replaced earlier by compaction.
We need to notify backlog tracker about every sstable replacement which
was triggered by incremental compaction.
Otherwise, the backlog for a table that enables incremental compaction will
be lower than it actually should be. That's because new sstables being
tracked as partial decrease the backlog, whereas the exhausted ones
increase it.
The formula for a table's backlog is basically:
backlog(sstable set + compacting(1) - partial(2))
(1) compacting includes all compaction's input sstables, but the
exhausted ones are removed from it (correct behavior).
(2) partial includes all compaction's output sstables, but the ones
that replaced the exhausted sstables aren't removed from it (incorrect
behavior).
This problem is fixed by making the backlog tracker *fully* aware of the early
replacement, not only the exhausted sstables, but also the new sstables
that replaced the exhausted ones. The new sstables need to be moved
inside the tracker from partial state to the regular one.
Fixes #5157.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191016002838.23811-1-raphaelsc@scylladb.com>
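A toy model of the accounting above (counting sstables instead of weighting their sizes; `on_early_replacement` is a hypothetical name) shows why both sides of an early replacement must be reported to the tracker:

```cpp
#include <set>

using sstable_id = int;

// Simplified tracker: backlog ~ backlog(sstable set + compacting - partial).
struct backlog_tracker {
    std::set<sstable_id> all;        // the table's sstable set
    std::set<sstable_id> compacting; // compaction inputs not yet exhausted
    std::set<sstable_id> partial;    // compaction outputs not yet sealed

    int backlog() const {
        return int(all.size() + compacting.size()) - int(partial.size());
    }

    // Early replacement during incremental compaction: the exhausted
    // input leaves the compacting set, and the sealed output that
    // replaces it must be moved from the partial state to the regular
    // one. Forgetting the partial.erase() is exactly the bug fixed here.
    void on_early_replacement(sstable_id exhausted, sstable_id replacement) {
        compacting.erase(exhausted);
        all.erase(exhausted);
        partial.erase(replacement);
        all.insert(replacement);
    }
};

// Demo: one of two inputs is exhausted and replaced by a sealed output.
inline int backlog_after_replacement() {
    backlog_tracker t;
    t.all = {1, 2};
    t.compacting = {1, 2};
    t.partial = {3};
    t.on_early_replacement(1, 3);
    return t.backlog();
}
```

With the fix, the replacement sstable stops subtracting from the backlog once it is sealed.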
Rather than passing a pointer to a cql_stats member corresponding to
the statement type, pass a reference to a cql_stats object and use
statement_type, which is already stored in modification_statement, for
determining which counter to increment. This will allow us to account
conditional statements, which will have a separate set of counters,
right in modification_statement::execute() - all we'll need to do is
add the new counters and bump them in case execute_with_condition is
called.
While we are at it, remove extra inclusions from statement_type.hh so as
not to introduce any extra dependencies for cql_stats.hh users.
Message-Id: <20191022092258.GC21588@esperanza>
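The refactor boils down to picking the counter inside the statement from its own statement_type rather than having callers pass a pointer to one counter; a simplified sketch (enum values and field names are illustrative, not the real cql_stats layout):

```cpp
#include <cstdint>

enum class statement_type { insert, update, del };

// Simplified cql_stats: one counter per statement type.
struct cql_stats {
    uint64_t inserts = 0, updates = 0, deletes = 0;

    // The statement passes a reference to the whole object plus its
    // statement_type; the right counter is chosen here, which makes it
    // easy to later add a parallel set of conditional-statement counters.
    void account(statement_type t) {
        switch (t) {
        case statement_type::insert: ++inserts; break;
        case statement_type::update: ++updates; break;
        case statement_type::del:    ++deletes; break;
        }
    }
};

// Demo: two UPDATE statements bump only the update counter.
inline uint64_t demo_two_updates() {
    cql_stats stats;
    stats.account(statement_type::update);
    stats.account(statement_type::update);
    return stats.updates;
}
```

The caller no longer needs to know which counter corresponds to its statement.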
Merged patch series from Avi Kivity:
The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.
Change it to be optional by allocating it as an external object rather
than inlined into mutation_partition. This adds overhead when the
static row is present (17 bytes for the reference, back reference,
and lsa allocator overhead).
perf_simple_query appears to be marginally (2%) faster. Footprint is
reduced by ~9% for a cache entry, 12% in memtables. More details are
provided in the patch commitlog.
Tests: unit (debug)
Avi Kivity (4):
managed_ref: add get() accessor
managed_ref: add external_memory_usage()
mutation_partition: introduce lazy_row
mutation_partition: make static_row optional to reduce memory
footprint
cell_locking.hh | 2 +-
converting_mutation_partition_applier.hh | 4 +-
mutation_partition.hh | 284 ++++++++++++++++++++++-
partition_builder.hh | 4 +-
utils/managed_ref.hh | 12 +
flat_mutation_reader.cc | 2 +-
memtable.cc | 2 +-
mutation_partition.cc | 45 +++-
mutation_partition_serializer.cc | 2 +-
partition_version.cc | 4 +-
tests/multishard_mutation_query_test.cc | 2 +-
tests/mutation_source_test.cc | 2 +-
tests/mutation_test.cc | 12 +-
tests/sstable_mutation_test.cc | 10 +-
14 files changed, 355 insertions(+), 32 deletions(-)
"
The node startup code (in particular the functions storage_service::prepare_to_join and storage_service::join_token_ring) is complicated and hard to understand.
This patch set aims to simplify it at least a bit by removing some dead code, moving code around so it's easier to understand and adding some comments that explain what the code does.
I did it to help me prepare for implementing generation and gossiping of CDC streams.
"
* 'bootstrap-refactors' of https://github.com/kbr-/scylla:
storage_service: more comments in join_token_ring
db: remove system_keyspace::update_local_tokens
db: improve documentation for update_tokens and get_saved_tokens in system_keyspace
storage_service: remove storage_service::_is_bootstrap_mode.
storage_service: simplify storage_service::bootstrap method
storage_service: fix typo in handle_state_moving
storage_service: remove unnecessary use of stringstream
storage_service: remove redundant call to update_tokens during join_token_ring
storage_service: remove storage_service::set_tokens method.
storage_service: remove is_survey_mode
storage_service::handle_state_normal: tokens_to_update* -> owned_tokens
storage_service::handle_state_normal: remove local_tokens_to_remove
db::system_keyspace::update_tokens: take tokens by const ref
db::system_keyspace::prepare_tokens: make static, take tokens by const ref
token_metadata::update_normal_tokens: take tokens by const ref
Assume n1 and n2 in a cluster with generation numbers g1 and g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1' which is time based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.
To fix, check the generation drift with generation value this node would
get if this node were restarted.
This is a backport of CASSANDRA-10969.
Fixes #5164
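With generations modeled as seconds-since-epoch integers, the before/after checks look roughly like this (function names are hypothetical; the sketch follows the commit text and CASSANDRA-10969):

```cpp
#include <cstdint>

constexpr int64_t MAX_GENERATION_DIFFERENCE = 365LL * 24 * 60 * 60; // ~1 year

// Buggy check: the reference point is the peer's last known generation,
// which can be arbitrarily old, so a legitimate reboot after >1 year of
// cluster uptime gets rejected.
bool rejected_before_fix(int64_t incoming, int64_t last_known) {
    return incoming > last_known + MAX_GENERATION_DIFFERENCE;
}

// Fixed check: compare against the time-based generation this node
// would pick if it were restarted right now.
bool rejected_after_fix(int64_t incoming, int64_t our_generation_if_restarted) {
    return incoming > our_generation_if_restarted + MAX_GENERATION_DIFFERENCE;
}
```

In the two-year-uptime scenario above, the old check wrongly rejects the rebooted node while the new one accepts it.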
The flag did nothing. It was used in one place to check if there's a
bug, but it can easily be proven by reading the code that the check
would never pass.
The storage_service::bootstrap method took a parameter: tokens to
bootstrap with. However, this method is only called in one place
(join_token_ring) with only one parameter: _bootstrap_tokens. It doesn't
make sense to call this method anywhere else with any other parameter.
This commit also adds a comment explaining what the method does and
moves it into the private section of storage_service.
When a non-seed node was bootstrapping, system_keyspace::update_tokens
was called twice: first right after the tokens were generated (or
received if we were replacing a different node) in the call to
`bootstrap`, and then later in join_token_ring. The second call was
redundant.
The join_token_ring call was also redundant if we were not bootstrapping
and had tokens saved previously (e.g. when restarting). In that case we
would have read them from LOCAL and then saved the same tokens again.
This commit removes the redundant call and inserts calls to
update_tokens where they are necessary, when new tokens are generated.
The aim is to make the code easier to understand.
It also adds a comment which explains why the tokens don't need to be
generated in one of the cases.
After commit 36ccf72f3c, this method
was used only in one place.
Its name did not make it obvious what it does or when it is safe to call it.
This commit pulls out the code from set_tokens to the point where it was
called (join_token_ring). The code can only be understood in
context.
This code was also saving the tokens to the LOCAL table before
retrieving them from this table again. There is no point in doing that:
1. there are no races, since when join_token_ring is running, it is the
only function which can call system_keyspace::update_tokens (which saves them to the
LOCAL table). There can be no multiple instances of join_token_ring.
2. Even if there was a race, this wouldn't fix anything. The tokens we
retrieve from LOCAL by calling get_local_tokens().get0() could already
be different in the LOCAL table when the get0() returns.
Replace the two variables:
tokens_to_update_in_metadata
tokens_to_update_in_system_keyspace
which were exactly the same, with one variable owned_tokens.
The new name describes what the variable IS instead of what it's used for.
Add a comment to clarify what "owned" means: those are the tokens the
node chose and any collision was resolved positively for this node.
Move the variable definition further down in the code, where it's
actually needed.
Merged patch series from Piotr Sarna:
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view_info altogether, but nonetheless the bug needs to be fixed first.
Branch: master, 3.1
Tests: unit(dev) + manual confirmation that a correct legacy column is picked
Merge a patch series from Piotr Jastrzębski (haaawk):
This PR introduces CDC in its minimal version.
It is possible now to create a table with CDC enabled or to enable/disable
CDC on existing table. There is a management of CDC log and description
related to enabling/disabling CDC for a table.
For now only primary key of the changed data is logged.
To be able to co-locate cdc streams with related base table partitions it
was needed to propagate the information about the number of shards per node.
This was done through gossip.
There is an assumption that all the nodes use the same value for
sharding_ignore_msb_bits. If it does not hold we would have to gossip
sharding_ignore_msb_bits around together with the number of shards.
Fixes #4986.
Tests: unit(dev, release, debug)
Currently, the function that generates the graph edges (and vertices)
with a breadth-first traversal of the object graph accidentally uses the
object that is the starting point of the graph as the "to" part of each
edge. This results in the graph having each of its edges point to the
starting point, as if all objects in it referenced said object directly.
Fix by using the currently examined object instead.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191018113019.95093-1-bdenes@scylladb.com>
To the 'Debugging Scylla with GDB' chapter. The '### Debugging
relocatable binaries built with the toolchain' subchapter is demoted to
be just a section in this new subchapter. It is also renamed to
'Relocatable binaries'.
This subchapter intends to be a complete guide on how to debug coredumps
from how to obtain the correct version of all the binaries all the way
to how to correctly open the core with GDB.
* seastar e888b1df...6bcb17c9 (4):
> iotune: don't crash in sequential read test if hitting EOF
> Remove FindBoost.cmake from install files
> Merge "Move reactor backend out of reactor" from Asias
> fair_queue: Add fair_queue.cc
This patch wraps announce_migration logic into a lambda
that will be used both when cdc is used and when it's not.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
At the moment, this test only checks that table
creation and alteration sets cdc_options property
on a table correctly.
Future patches will extend this test to cover more
CDC aspects.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
We would like to share with other nodes
the value of ignore_msb_bits property used by the node.
This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.
To be able to generate such stream_id we need
to know ignore_msb_bits property value for each node.
IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
We would like to share with other nodes
the number of shards available at the node.
This is needed because CDC will operate on
streams of changes. Each shard on each node
will have its own stream that will be identified
by a stream_id. Stream_id will be selected in
such a way that using stream_id as partition key
will locate partition identified by stream_id on
a node and shard that the stream belongs to.
To be able to generate such stream_id we need
to know how many shards are on each node.
IMPORTANT NOTE: At this point CDC does not support
topology changes. It will work only on a stable cluster.
Support for topology modifications will be added in
later steps.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Refactor modification_statement to enable lightweight
transaction implementation.
This patch set re-arranges the logic of
modification_statement::get_mutations() and uses
a column mask to identify the columns to prefetch.
It also pre-computes a few modification statement properties
at prepare, assuming the prepared statement is invalidated if
the underlying schema changes.
They are used more extensively with introduction of lightweight
transactions, and pre-computing makes it easier to reason about
complexity of the scenarios where they are involved.
Pre-compute column mask of columns to prefetch when preparing
a modification statement and use it to build partition_slice
object for read command. Fetch only the required columns.
Lightweight transactions build on this by adding the
columns used in conditions and in the CAS result set to the column
mask of columns to read. Batch statements unite all column
masks to build a single relation for all rows modified by
conditional statements of a batch.
Refactor get_mutations() so that the read command and
apply_updates() functions can be used in lightweight transactions.
Move read_command creation to its own method, as well as apply_updates().
Rewrite get_mutations() using the new API.
Avoid unnecessary shared pointers.
Introduce a column definition ordinal_id and use it in boosted
update_parameters::prefetch_data as a column index of a full row.
Lightweight transactions prefetch data and return a result set.
Make sure update_parameters::prefetch_data can serve as a
single representation of prefetched list cells as well as
condition cells and as a CAS result set.
I have a lot of plans for column_definition::ordinal_id, it
simplifies a lot of operations with columns and will also be
used for building a bitset of columns used in a query
or in multiple queries of a batch.
In modification_statement/batch_statement, we need to prefetch data to
1) apply list operations
2) evaluate CAS conditions
3) return CAS result set.
Boost update_parameters::prefetch_data to serve as a single result set
for all of the above. In case of a batch, store multiple rows for
multiple clustering keys involved in the batch.
Use an ordered set for columns and rows to make sure the CAS result set
(item 3 above) is returned to the client in an ordered manner.
Deserialize the primary key and add it to result set rows since
it is returned to the client as part of CAS result set.
Index columns using ordinal_id - this allows having a single
set for all columns and makes columns easy to look up.
Remove an extra memcpy to build view objects when looking
up a cell by primary key, use partition_key/clustering_key
objects for lookup.
Get rid of an unnecessary optional around
update_parameters::prefetch_data.
update_parameters won't own prefetch_data in the future anyway,
since prefetch_data can be shared among multiple modification
statements of a batch, each statement having its own options
and hence its own update_parameters instance.
Move prefetch_data_builder class from modification_statement.cc
to update_parameters.cc.
We're going to share the same builder to build a result set
for condition evaluation and to apply updates of batch statements, so we
need to share it.
No other changes.
Make sure every column in the schema, be it a column of partition
key, clustering key, static or regular one, has a unique ordinal
identifier.
This makes it easy to compute the set of columns used in a query,
as well as index row cells.
Allow to get column definition in schema by ordinal id.
"
Incremental compaction code to release exhausted sstables was inefficient because
it was basically preventing any release from ever happening. So a new solution is
implemented that makes the incremental compaction approach actually efficient while
being cautious about not introducing data resurrection. This solution consists of
storing GC'able tombstones in a temporary sstable and keeping it until the end of
compaction. Overhead is avoided by not enabling it for strategies that don't work
with runs composed of multiple fragments.
Fixes #4531.
tests: unit, longevity 1TB for incremental compaction
"
* 'fix_incremental_compaction_efficiency/v6' of https://github.com/raphaelsc/scylla:
tests: Check that partition is not resurrected on compaction failure
tests: Add sstable compaction test for gc-only mutation compactor consumer
sstables: Fix Incremental Compaction Efficiency
Introduce `scylla generate_object_graph`, a command which generates a
visual object graph, where vertices are objects and edges are
references. The graph starts from the object specified by the user. The
graph allows visual inspection of the object graph and hopefully allows
the user to identify the object in question.
Add the `--resolve` flag to `scylla find`. When specified, `scylla find`
will attempt to resolve the first pointer in the found objects as a vtable
pointer. If successful the pointer as well as the resolved symbol will
be added to the listing.
In the listing of `scylla fiber` also print the starting task (as the
first item).
This mini-series contains assorted improvements that I found very useful
while debugging OOM crashes in the past weeks:
* A wrapper for `std::list`.
* A wrapper for `std::variant`.
* Making `scylla find` usable from python code.
* Improvements to `scylla sstables` and `scylla task_histogram`
commands.
* The `$downcast_vptr()` convenience function.
* The `$dereference_lw_shared_ptr()` convenience function.
Convenience functions in gdb are similar to commands, with some key
differences:
* They have a defined argument list.
* They can return values.
* They can be part of any gdb expression in which functions are allowed.
This makes them very useful for doing operations on values and then
returning them so that the developer can use them in the gdb shell.
The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.
Change it to be optional by using lazy_row instead of row. Some call
sites treewide were adjusted to deal with the extra indirection.
perf_simple_query appears to improve by 2%, from 163 krps to 165 krps,
though it's hard to be sure due to noisy measurements.
memory_footprint comparisons (before/after):
mutation footprint: mutation footprint:
- in cache: 1096 - in cache: 992
- in memtable: 854 - in memtable: 750
- in sstable: 351 - in sstable: 351
- frozen: 540 - frozen: 540
- canonical: 827 - canonical: 827
- query result: 342 - query result: 342
sizeof(cache_entry) = 112 sizeof(cache_entry) = 112
-- sizeof(decorated_key) = 36 -- sizeof(decorated_key) = 36
-- sizeof(cache_link_type) = 32 -- sizeof(cache_link_type) = 32
-- sizeof(mutation_partition) = 200 -- sizeof(mutation_partition) = 96
-- -- sizeof(_static_row) = 112 -- -- sizeof(_static_row) = 8
-- -- sizeof(_rows) = 24 -- -- sizeof(_rows) = 24
-- -- sizeof(_row_tombstones) = 40 -- -- sizeof(_row_tombstones) = 40
sizeof(rows_entry) = 232 sizeof(rows_entry) = 232
sizeof(lru_link_type) = 16 sizeof(lru_link_type) = 16
sizeof(deletable_row) = 168 sizeof(deletable_row) = 168
sizeof(row) = 112 sizeof(row) = 112
sizeof(atomic_cell_or_collection) = 8 sizeof(atomic_cell_or_collection) = 8
Tests: unit (dev)
lazy_row adds indirection to the row class, in order to reduce storage requirements
when the row is not present. The intent is to use it for the static row, which is
not present in many schemas, and is often not present in writes even in schemas that
have a static row.
Indirection is done using managed_ref, which is lsa-compatible.
lazy_row implements most of row's methods, and a few more:
- get(), get_existing(), and maybe_create(): bypass the abstraction and the
underlying row
- some methods that accept a row parameter also have an overload with a lazy_row
parameter
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."
* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
docs: mv coding-style.md docs/
rm README-DPDK.md
docs: mv IDL.md docs/
Allow returning fewer random clustering keys than requested since
the schema may limit the total number we can generate, for example,
if there is only one boolean clustering column.
Fixes #5161
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.
When investigating OOMs, a prominent pattern is a size class that has
exploded, using up most of the available memory alone. If one is lucky,
the objects causing the OOM are instances of some virtual class, making
their identification easy. Other times the objects are referenced by
instances of some virtual class, allowing their identification with some
work. However there are cases where neither these objects nor their
direct referrers are instances of virtual classes. This is the case
`scylla generate_object_graph` intends to help with.
scylla generate_object_graph, as its name suggests, generates the
object graph of the requested object. The object graph is a directed
graph, where vertices are objects and edges are references between them,
going from the referrer to the referee. The vertices contain information
like the address of the object, its size, whether it is live or not
and, if applicable, the address and symbol name of its vtable. The edges
contain the list of offsets at which the referrer holds references. The
generated graph is an image, which allows visual inspection of the
object graph, allowing the developer to notice patterns and hopefully
identify the problematic objects.
The graph is generated with the help of Graphviz. The command
generates `.dot` files which can be converted to images with the help of
the `dot` utility. The command can do this itself if the output file is one of
the supported image formats (e.g. `png`); otherwise only the `.dot` file
is generated, leaving the actual image generation to the user.
Add `--resolve` flag, which will make the command attempt to resolve the
first pointer of the found objects as a vtable pointer. If this is
successful the vtable pointer as well as the symbol name will be added
to the listing. This in particular makes backtracing continuation chains
a breeze, as the continuation object the searched one depends on can be
found at glance in the resulting listing (instead of having to manually
probe each item).
The arguments of `scylla find` are now parsed via `argparse`. While at
it, support for all the size classes supported by the underlying `find`
command was added, in addition to `w` and `g`. However, the syntax of
specifying the size class has changed: it now has to be
specified with the `-s|--size` command line argument, instead of passing
`-w` or `-g`.
Or in other words, the task that is the argument of the search. Example:
(gdb) scylla fiber 0x60001a305910
Starting task: (task*) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16
#0 (task*) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16
#1 (task*) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16
#2 (task*) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16
This code is currently duplicated in `find_vptrs()` and
`scylla_task_histogram`. Refactor it out into a function.
The code is also improved in two ways:
* Make the search stricter, ensuring (hopefully) that indeed the
executable's text section is found, not that of the first object in
the `gdb file` listing.
* Throw an exception in the case when the search fails.
We don't want to add shared sstables to table's backlog tracker because:
1) table's backlog tracker only influences regular compaction
2) shared sstables are never regular compacted; they're worked on by
resharding, which has its own backlog tracker.
Such sstables belong to more than one shard, meaning that currently
they're added to backlog tracker of all shards that own them.
But the thing is that such sstables end up being resharded on a shard
that may be completely random. So increasing the backlog of all shards
such sstables belong to won't lead to faster resharding. Also, the table's
backlog tracker is supposed to deal only with regular compaction.
Accounting for shared sstables in table's tracker may lead to incorrect
speed up of regular compactions because the controller is not aware
that some relevant part of the backlog is due to pending resharding.
The fix is to ignore sstables that will be resharded and let the
table's backlog tracker account only for sstables that can be worked on
by regular compaction, relying on resharding controlling itself
with its own tracker.
NOTE: this doesn't fix the resharding controlling issue completely,
as described in #4952. We'll still need to throttle regular compaction
on behalf of resharding. So subsequent work may be about:
- move resharding to its own priority class, perhaps streaming.
- make a resharding's backlog tracker accounts for sstables in all of
its pending jobs, not only the ongoing ones (currently limited to 1 by shard).
- limit compaction shares when resharding is in progress.
THIS patch only fixes the issue in which the controller for regular compaction
shouldn't account for sstables completely exclusive to resharding.
Fixes #5077.
Refs #4952.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>
Incremental compaction efficiency depends on all references to compacted
sstables being released, because the file descriptors of sstable
components are only closed once the sstable object is destructed.
Incremental compaction is not working for major compaction because a reference
to released sstables is being kept in the compaction manager, which prevents
their disk usage from being released. So the space amplification would be
the same as with a non-incremental approach, i.e. it needs twice the amount of
used disk space for the table(s). With this issue fixed, the database
becomes very major-compaction friendly, the space requirement becoming very
low: roughly a constant, the number of fragments currently being compacted
multiplied by the fragment size (1GB by default), for each table involved.
Fixes #5140.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20191003211927.24153-1-raphaelsc@scylladb.com>
Make sure gc'able-tombstone-only sstable is properly generated with data that
comes from regular compaction's input sstable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Compaction prevents data resurrection from happening by checking that there's
no way a data shadowed by a GC'able tombstone will survive alone, after
a failure for example.
Consider the following scenario:
We have two runs A and B, each divided to 5 fragments, A1..A5, B1..B5.
They have the following token ranges:
A: A1=[0, 3] A2=[4, 7] A3=[8, 11] A4=[12, 15] A5=[16,18]
B is the same as A's ranges, offset by 1:
B: B1=[1,4] B2=[5,8] B3=[9,12] B4=[13,16] B5=[17,19]
Let's say we have finished flushing output up to position 10 in the compaction.
We are currently working on A3 and B3, so obviously those cannot be deleted.
Because B2 overlaps with A3, we cannot delete B2 either.
Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after
B2 is gone, dead data in A3 could be resurrected *on failure*.
Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2.
Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so
we can't delete A1. So we can't delete any fragment.
The problem with this approach is obvious: fragments can potentially not be
released due to data dependencies, so incremental compaction efficiency is
severely reduced.
To fix it, let's not purge GC'able tombstones right away in the mutation
compactor step. Instead, let's have compaction write them to a separate
sstable run that is deleted at the end of compaction.
By making sure that tombstone information from all compacting sstables is not
lost, we no longer need incremental compaction to impose lots of
restrictions on which fragments can be released. Now, any sstable whose data
is safe in a new sstable can be released right away. In addition, incremental
compaction will only take place if the compaction procedure is working with
at least one multi-fragment sstable run.
Fixes #4531.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
* seastar 1f68be436f...e888b1df9c (8):
> sharded: Make map work with mapper that returns a future
> cmake: Remove FindBoost.cmake
> Reduce noncopyable_function instruction cache footprint
> doc: add Loops section to the tutorial
> Merge "Move file related code out of reactor" from Asias
> Merge "Move the io_queue code out of reactor" from Asias
> cmake: expose seastar_perf_testing lib
> future: class doc: explain why discarding a future is bad
- main.cc now includes new file io_queue.hh
- perf tests now include seastar perf utilities via user, not
system, includes since those are not exported
Merged patch set from Piotr Sarna:
Refs #5046
This commit adds handling "Authorization:" header in incoming requests.
The signature sent in the authorization is recomputed server-side
and compared with what the client sent. In case of a mismatch,
UnrecognizedClientException is returned.
The signature computation is based on boto3 Python implementation
and uses gnutls to compute HMAC hashes.
This series is rebased on a previous HTTPS series in order to ease
merging these two. As such, it depends on the HTTPS series being
merged first.
Tests: alternator(local, remote)
The series also comes with a simple authorization test and a docs update.
Piotr Sarna (6):
alternator: migrate split() function to string_view
alternator: add computing the auth signature
config: add alternator_enforce_authorization entry
alternator: add verifying the auth signature
alternator-test: add a basic authorization test case
docs: update alternator authorization entry
alternator-test/test_authorization.py | 34 ++++++++
configure.py | 1 +
alternator/{server.hh => auth.hh} | 22 ++---
alternator/server.hh | 3 +-
db/config.hh | 1 +
alternator/auth.cc | 88 ++++++++++++++++++++
alternator/server.cc | 112 +++++++++++++++++++++++---
db/config.cc | 1 +
main.cc | 2 +-
docs/alternator/alternator.md | 7 +-
10 files changed, 241 insertions(+), 30 deletions(-)
create mode 100644 alternator-test/test_authorization.py
copy alternator/{server.hh => auth.hh} (58%)
create mode 100644 alternator/auth.cc
Before this change, when populating non-system keyspaces, each data
directory was scanned and for each entry (keyspace directory),
a keyspace was populated. This was done in a serial fashion - populating
of one keyspace was not started until the previous one was done.
Loading keyspaces in such a fashion can introduce unnecessary waiting
in the case of a large number of keyspaces in one data directory. The population
process is I/O intensive and barely uses CPU.
This change enables parallel loading of keyspaces per data directory.
Populating the next keyspace does not wait for the previous one.
A benchmark was performed measuring startup time, with the following
setup:
- 1 data directory,
- 200 keyspaces,
- 2 tables in each keyspace, with the following schema:
CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b))
WITH CLUSTERING ORDER BY (b DESC),
- 1024 rows in each table, with values (i, 2*i, 3*i) for i in 0..1023,
- ran on 6-core virtual machine running on i7-8750H CPU,
- compiled in dev mode,
- parameters: --smp 6 --max-io-requests 4 --developer-mode=yes
--datadir $DIR --commitlog-directory $DIR
--hints-directory $DIR --view-hints-directory $DIR
The benchmark tested:
- boot time, by comparing timestamp of the first message in log,
and timestamp of the following message:
"init - Scylla version ... initialization completed."
- keyspace population time, by comparing timestamps of messages:
"init - loading non-system sstables"
and
"init - starting view update generator"
The benchmark was run 5 times for sequential and parallel version,
with the following results:
- sequential: boot 31.620s, keyspace population 6.051s
- parallel: boot 29.966s, keyspace population 4.360s
Keyspace population time decreased by ~27.95%, and overall boot time
by ~5.23%.
Tests: unit(release)
Fixes #2007
The signature sent in the "Authorization:" header is now verified
by computing the signature server-side with a matching secret key
and confirming that the signatures match.
Currently the secret key is hardcoded to be "whatever" in order
to work with current tests, but it should be replaced
by a proper key store.
Refs #5046
The config entry will be used to turn authorization for alternator
requests on and off. The default is currently off, since the key store
is not implemented yet.
A function for computing the auth signature from user requests
is added, along with helper functions. The implementation
is based on gnutls's HMAC.
Refs #5046
The implementation of string split was based on sstring type for
simplicity, but it turns out that more generic std::string_view
will be beneficial later to avoid unneeded string copying.
Unfortunately boost::split does not cooperate well with string views,
so a simple manual implementation is provided instead.
Schema changes can have big effects on performance; typically they should
be a rare event.
It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.
After this patch the metrics would look like:
scylla_database_schema_changed{shard="0",type="derive"} 2
Fixes #4785
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
We can use reader::peek() to check if the reader contains any data.
If not, do not open the rpc stream connection. This helps reduce
port usage.
Refs: #4943
Both in a single-statement transaction and in a batch
we expect that serial consistency is provided. Move the
check to query_options class and make it available for
reuse.
Keep get_serial_consistency() around for use in
transport/server.cc.
Message-Id: <20191006154532.54856-2-kostja@scylladb.com>
When another node is reported to be down, view updates queued
for it are cancelled, but some of them may already be initiated.
Right now, cancelling such a write resulted in an exception,
but on conceptual level it's not really an exception, since
this behaviour is expected.
Previous version of this patch was based on introducing a special
exception type that was later handled specially, but it's not clear
if it's a good direction. Instead, this patch simply makes this
path non-exceptional, as was originally done by Nadav in the first
version of the series that introduced handling unstarted write
cancellations. Additionally, a message containing the information
that a write is cancelled is logged with debug level.
README.md has 3 fixes applied:
- s/alternator_tls_port/alternator_https_port
- conf directory is mentioned more explicitly
- it now correctly states that the self-signed certificate
warning *is* explicitly ignored in tests
Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
memtable flush may allow stale data to be populated into cache.
Fixes #5135, Cache reads may miss some writes if schema alter followed by a
read happened concurrently with preempted partition entry update.
Fixes #5127, Cache populating read concurrent with schema alter may use the
wrong schema version to interpret sstable data.
Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
fail or cause a node crash after schema alter.
"
* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
tests: row_cache_stress_test: Verify all entries are evictable at the end
tests: row_cache_stress_test: Exercise single-partition reads
tests: row_cache_stress_test: Add periodic schema alters
tests: memtable_snapshot_source: Allow changing the schema
tests: simple_schema: Prepare for schema altering
row_cache: Record upgraded schema in memtable entries during update
memtable: Extract memtable_entry::upgrade_schema()
row_cache, mvcc: Prevent locked snapshots from being evicted
row_cache: Make evict() not use invalidate_unwrapped()
mvcc: Introduce partition_snapshot::touch()
row_cache, mvcc: Do not upgrade schema of entries which are being updated
row_cache: Use the correct schema version to populate the partition entry
delegating_reader: Optimize fill_buffer()
row_cache, memtable: Use upgrade_schema()
flat_mutation_reader: Introduce upgrade_schema()
Merged patch series from Piotr Sarna:
This series adds HTTPS support for Alternator.
The series comes with --https option added to alternator-test, which makes
the test harness run all the tests with HTTPS instead of HTTP. All the tests
pass, albeit with security warnings that a self-signed x509 certificate was
used and it should not be trusted.
Fixes #5042
Refs scylladb/seastar#685
Patches:
docs: update alternator entry on HTTPS
alternator-test: suppress the "Unverified HTTPS request" warning
alternator-test: add HTTPS info to README.md
alternator-test: add HTTPS to test_describe_endpoints
alternator-test: add --https parameter
alternator: add HTTPS support
config: add alternator HTTPS port
* seastar c21a7557f9...1f68be436f (6):
> scheduling: Add per scheduling group data support
> build: Include dpdk as a single object in libseastar.a
> sharded: fix foreign_ptr's move assignment
> build: Fix DPDK libraries linking in pkg-config file
> http server: https using tls support
> Make output_stream blurb Doxygen
The BEGINS_WITH condition in conditional updates (via Expected) requires
that the given operand be either a string or a binary. Any other operand
should result in a validation exception - not a failed condition as we
generate now.
This patch fixes the test for this case so it will succeed against
Amazon DynamoDB (before this patch it fails - this failure was masked by
a typo before commit 332ffa77ea). The patch
then fixes our code to handle this case correctly.
Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad
type in the operand is now handled differently from a bad type in the
attribute's value. We add another check to the test to verify that this
is the case.
Fixes #5141
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191006080553.4135-1-nyh@scylladb.com>
When debugging one constantly has to inspect objects for which only
a "virtual pointer" is available, that is, a pointer that points to a
common parent class or interface.
Finding the concrete type and downcasting the pointer is easy enough, but
why do it manually when it is possible to automate it trivially?
$downcast_vptr() returns any virtual pointer given to it, casted to the
actual concrete object.
Example:
(gdb) p $1
$2 = (flat_mutation_reader::impl *) 0x60b03363b900
(gdb) p $downcast_vptr(0x60b03363b900)
$3 = (combined_mutation_reader *) 0x60b03363b900
# The return value can also be dereferenced on the spot.
(gdb) p *$downcast_vptr($1)
$4 = {<flat_mutation_reader::impl> = {_vptr.impl = 0x46a3ea8 <vtable
for combined_mutation_reader+16>, _buffer = {_impl = {<std::al...
Dereferencing a `seastar::lw_shared_ptr` is another tedious manual
task. The stored pointer (`_p`) has to be casted to the right subclass
of `lw_shared_ptr_counter_base`, which involves inspecting the code,
then writing a cast expression that gdb is willing to parse. This
is something machines are so much better at doing.
`$dereference_lw_shared_ptr` returns a pointer to the actual pointed-to
object, given an instance of `seastar::lw_shared_ptr`.
Example:
(gdb) p $1._read_context
$2 = {_p = 0x60b00b068600}
(gdb) p $dereference_lw_shared_ptr($1._read_context)
$3 = {<seastar::enable_lw_shared_from_this<cache::read_context>>
= {<seastar::lw_shared_ptr_counter_base> = {_count = 1}, ...
Make all the parameters of the sampling tweakable via command line
arguments. I strived to keep full backward compatibility, but due to the
limitations of `argparse` there is one "breaking" change. The optional
positional size argument is now a non-positional argument as `argparse`
doesn't support optional positional arguments.
Added documentation for both the command itself as well as for all the
arguments.
make_single_key_reader() currently doesn't actually create
single-partition readers because it doesn't set
mutation_reader::forwarding::no when it creates individual
readers. The readers will default to mutation_reader::forwarding::yes
and actually create scanning readers in preparation for
fast-forwarding across partitions.
Fix by passing mutation_reader::forwarding::no.
Currently, methods of simple_schema assume that the table's schema doesn't
change. Accessors like get_value() assume that rows were generated
using simple_schema::_s. Because of that, the column_definition& for
the "v" column is cached in the instance. That column_definition&
cannot be used to access objects created with a different schema
version. To allow using simple_schema after schema changes,
column_definition& caching is now tagged with the table schema version
of origin. Methods which access schema-dependent objects, like
get_value(), now accept a schema& corresponding to the objects.
Also, it's now possible to tell simple_schema to use a different
schema version in its generator methods.
Cache update may defer in the middle of moving of partition entry
from a flushed memtable to the cache. If the schema was changed since
the entry was written, it upgrades the schema of the partition_entry
first but doesn't update the schema_ptr in memtable_entry. The entry
is removed from the memtable afterward. If a memtable reader
encounters such an entry, it will try to upgrade it assuming it's
still at the old schema.
That is undefined behavior in general, which may include:
- read failures due to bad_alloc, if fixed-size cells are interpreted
as variable-sized cells, and we misinterpret a value for a huge
size
- wrong read results
- node crash
This doesn't result in a permanent corruption, restarting the node
should help.
It's more likely to happen the more rows there are in a
partition. It's unlikely to happen with single-row partitions.
Introduced in 70c7277.
Fixes #5128.
If the whole partition entry is evicted while being updated from the
memtable, a subsequent read may populate the partition using the old
version of data if it attempts to do it before cache update advances
past that partition. Partial eviction is not affected because
populating reads will notice that there is a newer snapshot
corresponding to the updater.
This can happen only in OOM situations where the whole cache gets evicted.
Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.
Introduced in 70c7277.
Fixes #5134.
invalidate_unwrapped() calls cache_entry::evict(), which cannot be
called concurrently with cache update. invalidate() serializes it
properly by calling do_update(), but evict() doesn't. The purpose of
evict() is to stress eviction in tests, which can happen concurrently
with cache update. Switch it to use memory reclaimer, so that it's
both correct and more realistic.
evict() is used only in tests.
When a read enters a partition entry in the cache, it first upgrades
it to the current schema of the cache. The same happens when an entry
is updated after a memtable flush. Upgrading the entry is currently
performed by squashing all versions and replacing them with a single
upgraded version. That has a side effect of detaching all snapshots
from the partition entry. Partition entry update on memtable flush is
writing into a snapshot. If that snapshot is detached by a schema
upgrade, the entry will be missing writes from the memtable which fall
into continuous ranges in that entry which have not yet been updated.
This can happen only if the update of the entry is preempted and the
schema was altered during that, and a read hit that partition before
the update went past it.
Affects only tables with multi-row partitions, which are the only ones
that can experience the update of partition entry being preempted.
The problem is fixed by locking updated entries and not upgrading
schema of locked entries. cache_entry::read() is prepared for this,
and will upgrade on-the-fly to the cache's schema.
Fixes #5135
The sstable reader which populates the partition entry in the cache is
using the schema of the partition entry snapshot, which will be the
schema of the cache at the time the partition was entered. If there
was a schema change after the cache reader entered the partition but
before it created the sstable reader, the cache populating reader will
interpret sstable fragments using the wrong schema version. That is
more likely if partitions have many rows, and the front of the
partition is populated. With single-row partitions that's unlikely to
happen.
That is undefined behavior in general, which may include:
- read failures due to bad_alloc, if fixed-size cells are
interpreted as variable-sized cells, and we misinterpret
a value for a huge size
- wrong read results
- node crash
This doesn't result in a permanent corruption, restarting the node
should help.
Fixes #5127.
Use move_buffer_content_to() which is faster than fill_buffer_from()
because it doesn't involve popping and pushing the fragments across
buffers. We save on size estimation costs.
Running with --https and a self-signed certificate results in a flood
of expected warnings, that the connection is not to be trusted.
These warnings are silenced, as users running a local test with --https
usually use self-signed certificates.
The test_describe_endpoints test spawns another client connection
to the cluster, so it needs to be HTTPS-aware in order to work properly
with --https parameter.
Running with --https parameter will result in sending the requests
via HTTPS instead of HTTP. By default, port 8043 is used for a local
cluster. Before running pytest --https, make sure that Scylla
was properly configured to initialize an HTTPS alternator server
by providing the alternator_tls_port parameter.
The HTTPS-based connection runs with verification disabled,
otherwise it would not work with self-signed certificates,
which are useful for tests.
By providing a server based on a TLS socket, it's now possible
to serve HTTPS requests in alternator. The HTTPS server is enabled
by setting its port in scylla.yaml: alternator_tls_port=XXXX.
Alternator TLS relies on the existing TLS configuration,
which is provided by certificate, keyfile, truststore, priority_string
options.
Fixes #5042
The test test_update_expression_function_nesting() fails because DynamoDB
doesn't allow an expression like list_append(list_append(:val1, :val2), :val3)
but Alternator doesn't check for this (and supports this expression).
The "xfail" message was outdated, suggesting that the test fails because
the "SET" expression isn't supported - but it is. So replace the message
by a more accurate one.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190915104708.30471-1-nyh@scylladb.com>
Merged patch set from Dejan Mircevski implementing some of the
missing operators for Expected: NE, IN, NULL and NOT_NULL.
Patches:
alternator: Factor out Expected operand checks
alternator: Implement NOT_NULL operator in Expected
alternator: Implement NULL operator in Expected
alternator: Fix expected_1_null testcase
alternator: Implement IN operator in Expected
alternator: Implement NE operator in Expected
alternator: Factor out common code in Expected
Frozen empty lists/maps/sets are not equal to a null value,
while multi-cell empty lists/maps/sets are equal to null values.
Return a NULL value for an empty multi-cell set or list
if we know the receiver is not frozen - this makes it
easy to compare the parameter with the receiver.
Add a test case for inserting an empty list or set
- the result is indistinguishable from NULL value.
Message-Id: <20191003092157.92294-2-kostja@scylladb.com>
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.
The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().
File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).
This fixes issues #4685 and #4836.
Tested as follows:
1) Patched the code in order to trigger the race with (a lot) higher
probability and running slightly modified hinted handoff replace
dtest with a debug binary for 100 times. Side effect of this
testing was discovering of #4836.
2) Using the same patch as above tested that there are no crashes and
nodes survive stop/start sequences (they did not without this series)
in the context of all hinted handoff dtests. Ran the whole set of
tests with dev binary for 10 times.
"
* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
hinted handoff: make taking file_update_mutex safe
db::hints::manager::drain_for(): fix alignment
db::hints::manager: serialize calls to drain_for()
db::hints: cosmetics: identation and missing method qualifier
The operation after gate.enter() in tracker::start() can fail and throw,
we should call gate.leave() in such case to avoid unbalanced enter and
leave calls. tracker::done() has similar issue too.
Fix it by removing the gate enter and leave logic in tracker start and
done. A helper tracker::run() is introduced to take care of the gate and
repair status.
In addition, the error log is improved. It now logs exceptions on all
shards in the summary. e.g.,
[shard 0] repair - repair id 1 failed: std::runtime_error
({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)})
Fixes #5074
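As a sketch of the fix (hypothetical minimal types; the real tracker uses seastar::gate and futures), a run() helper pairs every enter() with a leave() even when the wrapped operation throws:

```cpp
#include <cassert>
#include <stdexcept>

// Minimal synchronous stand-in for a gate (hypothetical; not the
// seastar::gate API).
struct gate {
    int entered = 0;
    void enter() { ++entered; }
    void leave() { --entered; }
};

// run() guarantees that every enter() is paired with a leave(),
// even when the wrapped operation throws.
template <typename Func>
auto run(gate& g, Func func) {
    g.enter();
    try {
        auto result = func();
        g.leave();
        return result;
    } catch (...) {
        g.leave();  // keep enter/leave balanced on failure
        throw;
    }
}
```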
Currently, the population stat is not increased for entries that are
evicted immediately on insert, however the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.
Fixes: #5123
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
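The imbalance can be sketched with a toy counter (hypothetical names); the fix is simply to make the increment unconditional so it always pairs with the eviction-side decrement:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the accounting fix (hypothetical names): the population
// counter is incremented for every insert, even when the entry is
// evicted immediately, because eviction always decrements it.
struct cache_stats {
    uint64_t population = 0;
    void on_insert() { ++population; }  // now unconditional
    void on_evict()  { --population; }
};

// Insert an entry that may be evicted right away.
void insert_entry(cache_stats& st, bool evict_immediately) {
    st.on_insert();         // increment first, unconditionally
    if (evict_immediately) {
        st.on_evict();      // paired decrement keeps the balance
    }
}
```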
Put all AttributeValueList size verification under
verify_operand_count(), rather than have some cases invoke
verify_operand_count() while others verify it in check_*() functions.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Add check_IN() and a switch case that invokes it. Reactivate IN
tests. Add a testcase for non-scalar attribute values.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Recognize "NE" as a new operator type, add check_NE() function, invoke
it in verify_expected_one(), and reactivate NE tests.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Operand-count verification will be repeated a lot as more operators
are implemented, so factor it out into verify_operand_count().
Also move `got` null checks to check_* functions, which reduces
duplication at call sites.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
While a managed_ref emulates a reference more closely than it does
a pointer, it is still nullable, so add a get() (similar to
unique_ptr::get()) that can return nullptr if the reference is null.
The immediate use will be mutation_partition::_static_row, which
is often empty and takes up about 10% of a cache entry.
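A minimal sketch of the new accessor (a hypothetical simplification; the real managed_ref manages LSA-allocated memory):

```cpp
#include <cassert>
#include <cstddef>

// get() mirrors unique_ptr::get() and may return nullptr when the
// reference is null, so callers can test for emptiness without
// dereferencing.
template <typename T>
struct managed_ref {
    T* _ptr = nullptr;
    explicit operator bool() const { return _ptr != nullptr; }
    T& operator*() const { return *_ptr; }
    T* get() const { return _ptr; }
};
```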
The example Python code had wrong indentation, and wouldn't actually
work if naively copy-pasted. Noticed by Noam Hasson.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190929091440.28042-1-nyh@scylladb.com>
"
This is a collection of assorted patches that will be needed for LWT.
Most of them are trivial, but one touches a lot of files, so it has a
good chance of causing rebase headaches (I already had to rebase it on
top of Alternator). Let's push them earlier instead of carrying them in
the lwt branch.
"
* 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev:
lwt: make _last_timestamp_micros static
lwt: Add client_state::get_timestamp_for_paxos() function
lwt: Pass client_state reference all the way to storage_proxy::query
exceptions: Add a constructor for unavailable_exception that allows providing a custom message
serializer: Add std::variant support
lwt: Add missing functions to utils/UUID_gen.hh
"
This is the second version of the patch series. The previous one was just the second patch; this one adds more tests and another patch to make it easier to test that the new code has the same behavior as the old one.
"
* 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla:
types: Simplify and explain from_varint_to_integer
Add more cast tests
Affects single-partition reads only.
Refs #5113
When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.
For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.
The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.
At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.
Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).
This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.
It's also faster to drop sstables based on the keys than the bloom
filter.
Tests:
- unit (dev)
- manual using cqlsh
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
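The filtering step can be sketched as follows (hypothetical names, with int keys standing in for tokens):

```cpp
#include <cassert>

// Sketch: before consulting the bloom filter, drop sstables whose key
// range cannot contain the queried key, so non-overlapping sstables
// from repair or streaming are excluded with certainty.
struct sstable_meta {
    int first_key;
    int last_key;
    // Bloom filters may report false positives, never false negatives.
    bool bloom_may_contain(int) const { return true; }
};

inline bool may_contain(const sstable_meta& sst, int key) {
    if (key < sst.first_key || key > sst.last_key) {
        return false;                   // certain: key outside sstable range
    }
    return sst.bloom_may_contain(key);  // probabilistic fallback
}
```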
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.
The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.
A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.
Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:
(downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
/ _components->summary.header.sampling_level
Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.
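The arithmetic above can be checked directly, using the figures quoted in this message (22-byte summary entries, the default 1:20000 ratio, ~300-byte partitions):

```cpp
#include <cassert>
#include <cstdint>

// One 22-byte summary entry covers 22 * 20000 = 440000 bytes of data
// file; at ~300 bytes per partition that is ~1466 partitions per index
// page, an order of magnitude more than the old multiplier of 128.
constexpr uint64_t partitions_per_index_page(uint64_t summary_entry_bytes,
                                             uint64_t ratio,
                                             uint64_t avg_partition_bytes) {
    return summary_entry_bytes * ratio / avg_partition_bytes;
}
```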
Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.
The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.
Fixes #5112.
Refs #4994.
The output of test_key_count_estimation:
Before:
count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1
After:
count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0
Tests:
- unit (dev)
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
`dbuild` was recently (24c732057) updated to run in interactive mode
when given no arguments; we can now update the README to mention that.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copied
or destroyed, they can cause leaks or premature frees of the schema
in its original shard since reference count operations are not atomic.
Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.
Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)
Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test
Copying schema_ptrs across shards results in memory corruption since
lw_shared_ptr does not use atomic operations for reference counts.
Prevent that by converting schema_ptr:s to global_schema_ptr:s before
shipping them across shards in the map operation, and converting them
back to local schema_ptr:s in the reduce operation.
This allows keys from different stages in the schema's life to compare equal.
This is safe since the partition key cannot change, unlike the rest of the schema.
More importantly, it will allow us to compare keys made local after a pass through
global_schema_ptr, which does not guarantee that the schema_ptr conversion will be
the same even when starting with the same global_schema_ptr.
Throwing move constructors are a pain, so we should try to make
them noexcept. Currently, global_schema_ptr's move constructor
throws an exception if used illegally (moving from a different shard);
this patch changes it to an assert, on the grounds that this error
is impossible to recover from.
The direct motivation for the patch is the desire to store objects
containing a global_schema_ptr in a chunked_vector, to move lists
of partition keys across shards for the toppartitions functionality.
chunked_vector currently requires noexcept move constructors for its
value_type.
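A sketch of the change under hypothetical names: the move constructor asserts instead of throwing, so the type can be declared noexcept and checked against chunked_vector's requirement with a type trait:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical simplification of global_schema_ptr: the move
// constructor asserts instead of throwing on a cross-shard move.
struct global_ptr {
    static int this_shard;  // stand-in for the current-shard lookup
    int owner_shard;

    explicit global_ptr(int shard) : owner_shard(shard) {}

    global_ptr(global_ptr&& o) noexcept : owner_shard(o.owner_shard) {
        // Moving from a foreign shard is unrecoverable: abort, don't throw.
        assert(owner_shard == this_shard);
    }
};
int global_ptr::this_shard = 0;
```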
When a user type changes we were not recreating other user types that
use it. This patch series fixes that and makes it clear which code is
responsible for it.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.
At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.
Fixes #5049
If each client_state has its own copy of the variable two clients may
generate timestamps that clash and needlessly create contention. Making
the variable shared between all client_state on the same shard will make
sure this will not happen to two clients on the same shard. It may still
happen for two clients on two different shards or two different nodes.
Paxos needs a unique timestamp that is greater than some other
timestamp, so that the next round has a better chance of succeeding.
Add a function that returns such a timestamp.
client_state holds a state to generate monotonically increasing unique
timestamp. Queries with a SERIAL consistency level need it to generate
a paxos round.
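A shard-local generator with this behavior might look like the following sketch (hypothetical names, not the actual client_state API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch: returns a timestamp that is both greater than the last one
// handed out on this shard and greater than a caller-supplied lower
// bound from the previous round. The state is shared by all clients on
// the shard, so two clients on the same shard cannot clash.
struct timestamp_gen {
    static int64_t last_micros;

    static int64_t next(int64_t now_micros, int64_t greater_than) {
        int64_t t = std::max({now_micros, last_micros + 1, greater_than + 1});
        last_micros = t;
        return t;
    }
};
int64_t timestamp_gen::last_micros = 0;
```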
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.
At runtime a user type has a direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.
Fixes #5049
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The way schema changes propagate is by editing the system tables and
comparing the before and after state.
When a user type A uses another user type B and we modify B, the
representation of A in the system table doesn't change, so this code
was not producing any changes on the diff that the receiving side
uses.
Deleting it makes it clear that it is the receiver's responsibility to
handle dependent user types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With this patch db::cql_type_parser::raw_builder creates a local copy
of the list of existing types and uses that internally. By doing that
build() should have no observable behavior other than returning the
new types.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We were never passing a null pointer and never saving a copy of the
lw_shared_ptr. Passing a reference is more flexible as not all callers
are required to hold the user_types_metadata in a lw_shared_ptr.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
* seastar b56a8c5045...c21a7557f9 (3):
> net: socket::{set,get}_reuseaddr() should not be virtual
> iotune: print verbose message in case of shutdown errors
> iotune: close test file on shutdown
Fixes #4946.
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Look up the view_update_write_response_handler id before calling timeout_cb and tolerate it not being found.
Just log a warning if this happened.
Fixes #5032
"
Currently affects only counter tables.
Introduced in 27014a2.
mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.
We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.
This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.
Fixes #5095.
"
* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
mvcc: Fix incorrect schema verison being used to copy the mutation when applying
mutation_partition: Track and validate schema version in debug builds
tests: Use the correct schema to access mutation_partition
Currently affects only counter tables.
Introduced in 27014a2.
mutation_partition(s, mp) is incorrect, because it uses s to interpret
mp, while it should use mp_schema.
We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during an alter, when we receive the
mutation from a node which hasn't processed the schema change yet.
This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.
Fixes #5095.
This patch makes mutation_partition validate the invariant that it's
supposed to be accessed only with the schema version which it conforms
to.
Refs #5095
* seastar e51a1a8ed9...b56a8c5045 (3):
> net: add support for UNIX-domain sockets
> future: Warn on promise::set_exception with no corresponding future or task
> Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael
Handle a race where a write handler is removed from _response_handlers
but not yet from _view_update_handlers_list.
Fixes #5032
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refactor remove_response_handler_entry out of remove_response_handler,
to be called on a valid iterator found by _response_handlers.find(id).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Help identify cases like seen in #5032 where the handler id
wasn't found from the on_down -> timeout_cb path.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
compaction_manager::perform_sstable_upgrade() fails when it feeds
the compaction mechanism with shared sstables. Shared sstables should
be ignored when performing upgrade, letting reshard pick them up in
parallel instead. Whenever a shared sstable is brought up either
on restart or via refresh, the reshard procedure kicks in.
Reshard picks the highest supported format, so the upgrade for
shared sstables will naturally take place.
Fixes #5056.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
- Update the outdated comments in do_stop_gossiping. It was
storage_service not storage_proxy that used the lock. More
importantly, storage_service does not use it any more.
- Drop the unused timer_callback_lock and timer_callback_unlock API
- Use with_semaphore to make sure the semaphore usage is balanced.
- Add log in gossiper::do_stop_gossiping when it tries to take the
semaphore to help debug hang during the shutdown.
Refs: #4891
Refs: #4971
A documentation file that is intended to be a place for anything
debugging related: getting started tutorial, tips and tricks and
advanced guides.
For now it contains a short introduction, some selected links to
more in-depth documentation and some tips and tricks that I could think
of off the top of my head.
One of those tricks describes how to load cores obtained from
relocatable packages inside the `dbuild` container. I originally
intended to add that to `tools/toolchain/README.md` but was convinced
that `docs/debugging.md` would be a better place for this.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>
Recently we have seen a case where the population stat of the cache was
corrupt, either due to misaccounting or some more serious corruption.
When debugging something like that it would have been useful to know how
many items have been inserted to the cache. I also believe that such a
counter could be useful generally as well.
Refs: #4918
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>
"
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.
It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.
Refs #4978
Tests:
- unit (dev, release, debug)
"
* 'assert-no-xshard-lsa-locking' of https://github.com/tgrabiec/scylla:
lsa: Assert no cross-shard region locking
tests: Make managed_vector_test a seastar test
* seastar 2a526bb120...e51a1a8ed9 (2):
> rpc: introduce rpc::tuple as a way to move away from variadic future
> shared_future: don't warn on broken futures
Make it easier for the IDE to resolve references to the seastar
namespace. In any case include files should be stand-alone and not
depend on previously included files.
The build directory is meaningless, since it is typically some
directory in a continuous integration server. That means someone
debugging the relocatable package needs to issue the gdb command
'set substitute-path' with the correct arguments, or they lose
source debugging. Doing so in the relocatable package build saves
this step.
The default build is not modified, since a typical local build
benefits from having the paths hardcoded, as the debugger will
find the sources automatically.
We observed an abort on bad_alloc which was not caused by real OOM,
but could be explained by cache region being locked from a different
shard, which is not allowed, concurrently with memory reclamation.
It's impossible now to prove this, or, if that was indeed the case, to
determine which code path was attempting such lock. This patch adds an
assert which would catch such incorrect locking at the attempt.
Refs #4978
LCS demotes an SSTable from a given level when it thinks that level is inactive.
Inactive level means N rounds (compaction attempt) without any activity in it,
in other words, no SSTable has been promoted to it.
The problem happens because the metadata that tracks inactiveness of each level
can be incorrectly updated when there's an ongoing compaction. LCS has parallel
compaction disabled. So if a table finds itself running a long operation like
cleanup that blocks minor compaction, LCS could incorrectly think that many
levels need demotion, and by the time cleanup finishes, some demotions would
incorrectly take place.
This problem is fixed by only updating the counter that tracks inactiveness
when compaction completes, so it's not incorrectly updated when there's an
ongoing compaction for the table.
Fixes #4919.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190917235708.8131-1-raphaelsc@scylladb.com>
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This, combined
with a large number of token ranges, can lead to
an infinite recursion. The algorithm multiplies by a factor of
2 (actually a shift left by one) the number of requested
tokens in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call
thus, through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).
This patch prevents the problem from happening: it limits the growth of
the concurrency counter to twice the last number of ranges returned
by the query_ranges_to_vnodes_generator, and also makes sure it does not
get stuck at zero.
Testing: * Unit test in dev mode.
* Modified add 50 dtest that reproduce the problem
Fixes #4944
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
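The wrap-around is easy to reproduce with a 32-bit counter, and a clamp of the shape described above (hypothetical helper names) avoids both failure modes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A 32-bit counter doubled each iteration wraps to zero after 32
// shifts and then stays there (0 * 2 == 0), turning the bounded
// recursion into an infinite one.
inline uint32_t double_unchecked(uint32_t n) {
    return n << 1;
}

// Shape of the fix (hypothetical names): the counter can neither get
// stuck at zero nor grow beyond twice the number of ranges returned
// by the previous iteration.
inline uint32_t double_clamped(uint32_t n, uint32_t last_returned) {
    uint32_t next = std::min(n * 2, 2 * last_returned);
    return std::max<uint32_t>(next, 1);
}
```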
Before this patch, if the _gate is closed, with_gate throws and
forward_to is not executed. When the promise<> p is destroyed it marks
its _task as a broken promise.
What happens next depends on the branch.
On master, we warn when the shared_future is destroyed, so this patch
changes the warning from a broken_promise to a gate closed.
On 3.1, we warn when the promises in shared_future::_peers are
destroyed since they no longer have a future attached: The future that
was attached was the "auto f" just before the with_gate call, and it
is destroyed when with_gate throws. The net result is that this patch
fixes the warning in 3.1.
I will send a patch to seastar to make the warning on master more
consistent with the warning in 3.1.
Fixes #4394
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190917211915.117252-1-espindola@scylladb.com>
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.
The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns the backlog needed to
create a certain number of shares. That scans the existing control
points, but when we run without the controller there are no control
points and we crash.
Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.
Fixes #5016
Signed-off-by: Glauber Costa <glauber@scylladb.com>
gdb searches for libthread_db.so using its canonical name, libthread_db.so.1,
rather than the file name libthread_db-1.0.so, so use that name to store the
file in the archive.
Fixes #4996.
* seastar b3fb4aaab3...84d8e9fe9b (8):
> Use aio fsync if available
> Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb
> lz4: use LZ4_decompress_safe
> reactor: document seastar::remove_file()
> core/file.hh: remove redundant std::move()
> core/{file,sstring}: do not add `const` to return value
> http/api_docs: always call parent constructor
> Add input_stream blurb
Currently, if updating bookkeeping operations for view building fails,
we log the error message and continue. However, during shutdown,
some errors are more likely to happen due to existing issues
like #4384. To differentiate actual errors from semi-expected
errors during shutdown, the latter are now logged with a warning
level instead of error.
Fixes #4954
Shutdown routines are usually implemented via the deferred_action
mechanism, which runs a function in its destructor. We thus expect
the function to be noexcept, but unfortunately it's not always
the case. Throwing in the destructor results in terminating the program
anyway, but before we do that, the exception can be logged so it's
easier to investigate and pinpoint the issue.
Example output before the patch:
INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
terminate called without an active exception
Aborting on shard 0.
Backtrace:
0x000000000184a9ad
(...)
Example output after the patch:
INFO 2019-09-10 12:49:05,858 [shard 0] view - Stopping view builder
ERROR 2019-09-10 12:49:05,858 [shard 0] init - Unexpected error on shutdown: std::runtime_error (Hello there!)
terminate called without an active exception
Aborting on shard 0.
Backtrace:
0x000000000184a9ad
(...)
This simplifies the implementation of from_varint_to_integer and
avoids using the fact that a static_cast from cpp_int to uint64_t
seems to just keep the low 64 bits.
The boost release notes
(https://www.boost.org/users/history/version_1_67_0.html) imply that
the conversion function should return the maximum value a uint64_t can
hold if the original value is too large.
The idea of using a bitwise AND with ~0 is a suggestion from the boost
release notes.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
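The two conversion semantics discussed above can be illustrated in
Python (a sketch of the arithmetic only, not of Boost's cpp_int API):

```python
U64_MAX = (1 << 64) - 1

def truncate_to_u64(value):
    # Keep only the low 64 bits - what the static_cast appeared to do.
    return value & U64_MAX

def saturate_to_u64(value):
    # What the Boost 1.67 release notes describe: out-of-range values
    # are clamped to the maximum value a uint64_t can hold.
    return U64_MAX if value > U64_MAX else max(value, 0)
```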
Update current results dictionary using the Metric.discover method.
New results are added and missing results are marked as absent.
(Both full metrics or specific keys)
Previously, with Prometheus, each metric.update call invoked query_list,
resulting in O(n^2) work when all metrics were updated, as in the scylla_top
dtest - causing a test timeout when testing the debug build.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
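A minimal Python sketch of the difference (MetricSource and its method
are hypothetical stand-ins for the Prometheus query helper):

```python
class MetricSource:
    """Hypothetical stand-in for the Prometheus query helper; counts
    how many times the full sample list is fetched."""
    def __init__(self, samples):
        self.samples = samples
        self.fetches = 0

    def query_list(self):
        self.fetches += 1
        return dict(self.samples)

def update_all_naive(source, names):
    # One full query per metric: O(n^2) total work for n metrics.
    return {name: source.query_list().get(name) for name in names}

def update_all_batched(source, names):
    # One full query shared by all metrics: O(n) total work.
    snapshot = source.query_list()
    return {name: snapshot.get(name) for name in names}
```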
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.
The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.
Fixes #4982.
Tests:
- unit (release)
- manual, replay with memtable flush failing and without failing
Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
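The back-pressure idea can be illustrated with a bounded queue in
Python (a loose analogy; memtable accounting in Scylla is not literally
a queue):

```python
import queue

# A tiny budget standing in for the memtable space limit.
memtable_space = queue.Queue(maxsize=2)

memtable_space.put_nowait("mutation-1")
memtable_space.put_nowait("mutation-2")

# A path that respects back-pressure stops (here: fails fast) once the
# budget is exhausted; a bypass would keep allocating and risk OOM.
try:
    memtable_space.put_nowait("mutation-3")
    overflowed = False
except queue.Full:
    overflowed = True  # replay must wait for a flush before continuing
```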
If the user supplies the 'replication_factor' to the 'NetworkTopologyStrategy' class,
it will expand into a replication factor for each existing DC for their convenience.
Resolves #4210.
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
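A sketch in Python of the option expansion (the function name is
illustrative; Scylla's actual implementation is in C++):

```python
def expand_replication_factor(options, datacenters):
    """Expand a single 'replication_factor' option into one entry per
    existing DC, leaving any explicit per-DC entries alone."""
    if "replication_factor" not in options:
        return dict(options)
    expanded = {k: v for k, v in options.items() if k != "replication_factor"}
    rf = options["replication_factor"]
    for dc in datacenters:
        expanded.setdefault(dc, rf)  # don't override explicit settings
    return expanded
```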
This reverts commit 7f64a6ec4b.
Fixes #5011
The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.
The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.
Mention on the top-level README.md that Scylla by default is compatible
with Cassandra, but also has experimental support for DynamoDB's API.
Provide links to alternator/alternator.md and alternator/getting-started.md
with more information about this feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190911080913.10141-1-nyh@scylladb.com>
"
In this patch set, written by Piotr Sarna and myself, we add Alternator - a new
Scylla feature adding compatibility with the API of Amazon DynamoDB(TM).
DynamoDB's API uses JSON-encoded requests and responses which are sent over
an HTTP or HTTPS transport. It is described in detail on Amazon's site:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/
Our goal is that any application written to use Amazon DynamoDB could
be run, unmodified, against Scylla with Alternator enabled. However, at this
stage the Alternator implementation is incomplete, and some of DynamoDB's
API features are not yet supported. The extent of Alternator's compatibility
with DynamoDB is described in the document docs/alternator/alternator.md
included in this patch set. The same document also describes Alternator's
design (and also points to a longer design document).
By default, Scylla continues to listen only to Cassandra API requests and not
DynamoDB API requests. To enable DynamoDB-API compatibility, you must set
the alternator-port configuration option (via command line or YAML) to the port on
which you wish to listen for DynamoDB API requests. For more information, see
docs/alternator/alternator.md. The document docs/alternator/getting-started.md
also contains some examples of how to get started with Alternator.
"
* 'alternator' of https://github.com/nyh/scylla: (272 commits)
Added comments about DAX, monitoring and more
alternator: fix usage of client_state
alternator-test: complete test_expected.py for rest of comparison operators
alternator-test: reproduce bug in Expected with EQ of set value
alternator: implement the Expected request parameter
alternator: add returning PAY_PER_REQUEST billing mode
alternator: update docs/alternator.md on GSI/LSI situation
Alternator: Add getting started document for alternator
move alternator.md to its own directory
alternator-test: add xfail test for GSI with 2 regular columns
alternator/executor.cc: Latencies should use steady_clock
alternator-test: fix LSI tests
alternator-test: fix test_describe_endpoints.py for AWS run
alternator-test: test_describe_endpoints.py without configuring AWS
alternator: run local tests without configuring AWS
alternator-test: add LSI tests
alternator-test: bump create table time limit to 200s
alternator: add basic LSI support
alternator: rename reserved column name "attrs"
alternator: migrate make_map_element_restriction to string view
...
This patch adds tests for all the missing comparison operators in the
Expected parameter (the old-style parameter for conditional operations).
All these new tests are now xfailing on Alternator (and succeeding on
DynamoDB), because these operators are not yet implemented in Alternator
(we only implemented EQ and BEGINS_WITH, so far - the rest are easy but
need to be implemented).
The test_expected.py is now hopefully comprehensive, covering the entire
feature set of the "Expected" parameter and all its various cases and
subcases.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190910092208.23461-1-nyh@scylladb.com>
Our implementation of the "EQ" operator in Expected (conditional
operation) just compares the JSON representation of the values.
This is almost always correct, but unfortunately incorrect for
sets - where two equal sets can have a different serialization
order.
This patch just adds an (xfailing) test for this bug.
The bug itself can be fixed in the future in one of several ways
including changing the implementation of EQ, or changing the
serialization of sets so they'll always be sorted in the same
way.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909125147.16484-1-nyh@scylladb.com>
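The bug is easy to demonstrate in Python: two serializations of the
same set compare unequal as JSON strings even though the sets are
equal:

```python
import json

a = ["dog", "cat"]   # a string set, serialized in one order
b = ["cat", "dog"]   # the same set, serialized in another order

# Comparing the JSON representations says "different"...
assert json.dumps(a) != json.dumps(b)
# ...but as sets the two values are equal.
assert set(a) == set(b)
```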
In this patch we implement the Expected parameter for the UpdateItem,
PutItem and DeleteItem operations. This parameter allows a conditional
update - i.e., do an update only if the existing value of the item
matches some condition.
This is the older form of conditional updates, but is still used by many
applications, including Amazon's Tic-Tac-Toe demo.
As usual, we do not yet provide isolation guarantees for read-modify-write
operations - the item is simply read before the modification, and there is
no protection against concurrent operation. This will of course need to be
addressed in the future.
The Expected parameter has a relatively large number of variations, and most
of them are supported by this code, except that currently only two comparison
operators are supported (EQ and BEGINS_WITH) out of the 13 listed in the
documentation. The rest will be implemented later.
This patch also includes comprehensive tests for the Expected feature.
These tests are almost exhaustive, except for one missing part (labeled FIXME) -
among the 13 comparison operations, the tests only check the EQ and BEGINS_WITH
operators. We'll later need to add checks to the rest of them as well.
As usual, all the tests pass on Amazon DynamoDB, and after this patch all
of them succeed on Alternator too.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190905125558.29133-1-nyh@scylladb.com>
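A simplified Python sketch of evaluating the Expected parameter with
the two supported operators (it uses plain Python values rather than
DynamoDB's typed attribute encoding):

```python
def check_expected(item, expected):
    """Evaluate an 'Expected' clause supporting the EQ and BEGINS_WITH
    comparison operators (a simplified sketch)."""
    for attr, cond in expected.items():
        op = cond.get("ComparisonOperator", "EQ")
        if "AttributeValueList" in cond:
            value = cond["AttributeValueList"][0]
        else:
            value = cond["Value"]
        actual = item.get(attr)
        if op == "EQ":
            if actual != value:
                return False
        elif op == "BEGINS_WITH":
            if not (isinstance(actual, str) and actual.startswith(value)):
                return False
        else:
            raise NotImplementedError(op)
    return True
```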
In order for Spark jobs to work correctly, a hardcoded PAY_PER_REQUEST
billing mode entry is returned when describing a table with
a DescribeTable request.
Also, one test case in test_describe_table.py is no longer marked XFAIL.
Message-Id: <a4e6d02788d8be48b389045e6ff8c1628240197c.1567688894.git.sarna@scylladb.com>
This patch adds a getting started document for alternator,
it explains how to start up a cluster that has an alternator
API port open and how to test that it works using either an
application or some simple and minimal python scripts.
The goal of the document is to get a user to have an up and
running docker based cluster with alternator support in the
shortest time possible.
As part of trying to make alternator more accessible
to users, we expect more documents to be created so
it seems like a good idea to give all of the alternator
docs their own directory.
When updating the second regular base column that is also a view
key, the code in Scylla will assume it only needs to update an entry
instead of replacing an old one. This leads to inconsistencies
exposed in the test case.
Message-Id: <5dfeb9f61f986daa6e480e9da4c7aabb5a09a4ec.1567599461.git.sarna@scylladb.com>
LSI tests are amended, so they no longer needlessly XPASS:
* two xpassing tests are no longer marked XFAIL
* there's an additional test for partial projection
that succeeds on DynamoDB but does not yet work in alternator
Message-Id: <0418186cb6c8a91de84837ffef9ac0947ea4e3d3.1567585915.git.sarna@scylladb.com>
The previous patch fixed test_describe_endpoints.py for a local run
without an AWS configuration. But when running with "--aws", we do
need to use that AWS configuration, and this patch fixes this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.
The previous patch already fixed this for most tests, this patch fixes the
same issue in test_describe_endpoints.py, which had a separate copy of the
problematic code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.
Also modified the README to be clearer, and more focused on the local
runs.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
Unfortunately the previous 100s limit proved to be not enough
for creating tables with both local and global indexes attached
to them. Empirically 200s was chosen as a safe default,
as the longest test oscillated around 100s with a deviation of 10s.
With this patch, LocalSecondaryIndexes can be added to a table
during its creation. The implementation is heavily shared
with GlobalSecondaryIndexes and as such suffers from the same TODOs:
projections, describing more details in DescribeTable, etc.
We currently reserve the column name "attrs" for a map of attributes,
so the user is not allowed to use this name as a name of a key.
We plan to lift this reservation in a future patch, but until we do,
let's at least choose a more obscure name to forbid - in this patch ":attrs".
It is even less likely that a user will want to use this specific name
as a column name.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190903133508.2033-1-nyh@scylladb.com>
Currently, we reserve the name ATTRS_COLUMN_NAME ("attrs") - the user
cannot use it as a key column name (key of the base table or GSI or LSI)
because we use this name for the attribute map we add to the schema.
Currently, if the user does attempt to create such a key column, the
result is undefined (sometimes corrupt sstables, sometimes outright crashes).
This patch fixes it to become a clean error, saying that this column name is
currently reserved.
The test test_create_table_special_column_name now cleanly fails, instead
of crashing Scylla, so it is converted from "skip" to "xfail".
Eventually we need to solve this issue completely (e.g., in rare cases
rename columns to allow us to reserve a name like ATTRS_COLUMN_NAME,
or alternatively, instead of using a fixed name ATTRS_COLUMN_NAME pick a
different one different from the key column names). But until we do,
better fail with a clear error instead of a crash.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190901102832.7452-1-nyh@scylladb.com>
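The clean-error check can be sketched as (names are illustrative):

```python
ATTRS_COLUMN_NAME = "attrs"

class ValidationException(Exception):
    pass

def validate_key_column_name(name):
    # Refuse the reserved internal name with a clean error, instead of
    # letting table creation proceed and corrupt data or crash.
    if name == ATTRS_COLUMN_NAME:
        raise ValidationException(
            "Column name '%s' is currently reserved" % name)
    return name
```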
The file initially consists of a very simple case that succeeds
with `--aws` and expectedly fails without it, because the expression
is not implemented yet.
This adds "alternator-address" and "alternator-port" configuration
options to the Docker image, so people can enable Alternator with
"docker run" with:
docker run --name some-scylla -d <image> --alternator-port=8080
Message-Id: <20190902110920.19269-1-penberg@scylladb.com>
When an unsupported expression parameter is encountered -
KeyConditionExpression, ConditionExpression or FilterExpression
are such - alternator will return an error instead of ignoring
the parameter.
This patch makes two changes to the alternator stats:
1. It adds an estimated_histogram for the get, put, update and delete
operations
2. It changes the metrics naming, so that the operation becomes a label,
which makes the metrics easier to handle, aggregate and display.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The test test_gsi_3, which involves creating a GSI with two key columns that weren't
previously a base key, now passes, so drop the "xfail" marker.
We still have problems with such materialized views, but not in the simple
scenario tested by test_gsi_3.
Later we should create a new test for the scenario which still fails, if
any.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Creating an underlying materialized view with 2 regular base columns
is risky in Scylla, as the second column's liveness will not be correctly
taken into account when ensuring view row liveness.
Still, in case specific conditions are met:
* the regular base column value is always present in the base row
* no TTLs are involved
then the materialized view will behave as expected.
Creating a GSI with 2 base regular columns issues a warning,
as it should be performed with care.
Message-Id: <5ce8642c1576529d43ea05e5c4bab64d122df829.1567159633.git.sarna@scylladb.com>
It is important that BillingMode should default to PROVISIONED, as it
does on DynamoDB. This allows old clients, which don't specify
BillingMode at all, to specify ProvisionedThroughput as allowed with
PROVISIONED.
Also added a test case for this case (where BillingMode is absent).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829193027.7982-1-nyh@scylladb.com>
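A sketch of the BillingMode handling in Python, combining the default
described here with the validation of accepted values described
elsewhere in this log (names are illustrative):

```python
class ValidationException(Exception):
    pass

def validate_billing_mode(request):
    # An absent BillingMode defaults to PROVISIONED, as on DynamoDB,
    # so old clients that never send it keep working.
    mode = request.get("BillingMode", "PROVISIONED")
    if mode not in ("PROVISIONED", "PAY_PER_REQUEST"):
        raise ValidationException("Unknown BillingMode: " + mode)
    return mode
```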
When querying on a missing index, DynamoDB returns different errors in
case the entire table is missing (ResourceNotFoundException) or the table
exists and just the index is missing (ValidationException). We didn't
make this distinction, and always returned ValidationException, but this
confuses clients that expect ResourceNotFoundException - e.g., Amazon's
Tic-Tac-Toe demo.
This patch adds a test for the first case (the completely missing table) -
we already had a test for the second case - and returns the correct
error codes. As usual the test passes against DynamoDB as well as Alternator,
ensure they behave the same.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829174113.5558-1-nyh@scylladb.com>
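The error-code choice can be sketched as:

```python
def index_query_error(existing_tables, table):
    """Pick the DynamoDB error class for a query on a missing index
    (a sketch; names are illustrative)."""
    if table not in existing_tables:
        return "ResourceNotFoundException"  # the whole table is missing
    return "ValidationException"            # table exists, index doesn't
```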
We needlessly split the trace-level log message for the request to two
messages - one containing just the operation's name, and one with the
parameters. Moreover we printed them in the opposite order (parameters
first, then the operation). So this patch combines them into one log
message.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829165341.3600-1-nyh@scylladb.com>
Alternator puts in the Scylla table a column called "attrs" for all the
non-key attributes. If the user happens to choose the same name, "attrs",
for one of the key columns, the result of writing two different columns
with the same name is a mess and corrupt sstables.
This test reproduces this bug (and works against DynamoDB of course).
Because the test doesn't cleanly fail, but rather leaves Scylla in a bad
state from which it can't fully recover, the test is marked as "skip"
until we fix this bug.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190828135644.23248-1-nyh@scylladb.com>
Updating key columns is not allowed in UpdateItem requests,
but the series introducing GSI support for regular columns
also introduced redundant duplicate checks of this kind.
This condition is already checked in resolve_update_path helper function
and existing test_update_expression_cannot_modify_key test makes sure that
the condition is checked.
Message-Id: <00f83ab631f93b263003fb09cd7b055bee1565cd.1567086111.git.sarna@scylladb.com>
The test test_update_expression_cannot_modify_key() verifies that an
update expression cannot modify one of the key columns. The existing
test only tried the SET and REMOVE actions - this patch makes the
test more complete by also testing the ADD and DELETE actions.
This patch also makes the expected exception more picky - we now
expect that the exception message contains the word "key" (as it,
indeed, does on both DynamoDB and Alternator). If we get any other
exception, there may be a problem.
The test passed before this patch, and passes now as well - it's just
stricter now.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829135650.30928-1-nyh@scylladb.com>
The code previously used clustering_key::from_singular() to compute
a clustering key value. It works fine, but has two issues:
1. involves one redundant deserialization stage compared to
from_single_value
2. does not work with compound clustering keys, which can appear
when using indexes
With more GSI features implemented, tests with XPASS status are promoted
to being enabled.
One test case (test_gsi_describe) is partially done as DescribeTable
now contains index names, but we could try providing more attributes
(e.g. IndexSizeBytes and ItemCount from the test case), so the test
is left in the XFAIL state.
The DescribeTable request now contains the list of index names
as well. None of the attributes of the list are marked as 'required'
in the documentation, so currently the implementation provides
index names only.
In order to be able to create a Global Secondary Index over a regular
column, this column is upgraded from being a map entry to being a full
member of the schema. As such, it's possible to use this column
definition in the underlying materialized view's key.
In order to prepare alternator for adding regular columns to schema,
i.e. in order to create a materialized view over them,
the code is changed so that updating no longer assumes that only keys
are included in the table schema.
Since in the future we may want to have more regular columns
in alternator tables' schemas, the code is changed accordingly,
so all regular columns will be fetched instead of just the attribute
map.
If no regular column attributes are passed to PutItem, the attr
collector serializes an empty collection mutation nonetheless
and sends it. It's redundant, so instead, if the attr collector
is empty, the collection does not get serialized and sent to replicas.
Keeping an instance of client_state is a convenient way of being able
to use tracing for alternator. It's also currently used in paging,
so adding a client state to executor removes the need of keeping
a dummy value.
String views used in JSON serialization should use not only the pointer
returned by rapidjson, but also the string length, as it may contain
\0 characters.
Additionally, one unnecessary copy is elided.
Add a link to a longer document (currently, around 40 pages) about
DynamoDB's features and how we implemented or may implement them in
Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-2-nyh@scylladb.com>
If a user tries to create a table with an unsupported feature -
a local secondary index, a user-defined encryption key or streams
(CDC) support - let's refuse the table creation, so the application
doesn't continue thinking this feature is available to it.
The "Tags" feature is also not supported, but it is more harmless
(it is used mostly for accounting purposes) so we do not fail the
table creation because of it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818125528.9091-1-nyh@scylladb.com>
In CQL, before a user can create a table, they must create a keyspace to
contain this table and, among other things, specify this keyspace's RF.
But in the DynamoDB API, there is no "create keyspace" operation - the
user just creates a table, and there is no way, and no opportunity,
to specify the requested RF. Presumably, Amazon always uses the same
RF for all tables, most likely 3, although this is not officially
documented anywhere.
The existing code creates the keyspace during Scylla boot, with RF=1.
This RF=1 always works, and is a good choice for a one-node test run,
but was a really bad choice for a real cluster with multiple nodes, so
this patch fixes this choice:
With this patch, the keyspace creation is delayed - it doesn't happen
when the first node of the cluster boots, but only when the user creates
the first table. Presumably, at that time, the cluster is already up,
so at that point we can make the obvious choice automatically: a one-node
cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of
RF is logged - and the choice of RF=1 is considered a warning.
Note that with this patch, keyspace creation is still automatic as it
was before. The user may manually create the keyspace via CQL, to
override this automatic choice. In the future we may also add additional
keyspace configuration options via configuration flags or new REST
requests, and the keyspace management code will also likely change
as we start to support clusters with multiple regions and global
tables. But for now, I think the automatic method is easiest for
users who want to test-drive Alternator without reading lengthy
instructions on how to set up the keyspace.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190820180610.5341-1-nyh@scylladb.com>
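The RF choice can be sketched as (what a two-node cluster gets is this
sketch's assumption - capped at the node count):

```python
def choose_replication_factor(live_nodes):
    # One node -> RF=1, three or more nodes -> RF=3; the in-between
    # case is capped at the node count (an assumption of this sketch).
    return min(live_nodes, 3)
```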
We allow BillingMode to be set to either PAY_PER_REQUEST (the default)
or PROVISIONED, although neither mode is fully implemented: In the former
case the payment isn't accounted, and in the latter case the throughput
limits are not enforced.
But other settings for BillingMode are now refused, and we add a new test
to verify that.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190818122919.8431-1-nyh@scylladb.com>
The alternator tests want to exercise many of the DynamoDB API features,
so they need a recent enough version of the client libraries, boto3
and botocore. In particular, only in botocore 1.12.54, released a year
ago, was support for BillingMode added - and we rely on this to create
pay-per-request tables for our tests.
Instead of letting the user run with an old version of this library and
get dozens of mysterious errors, in this patch we add a test to conftest.py
which cleanly aborts the test if the libraries aren't new enough, and
recommends a "pip" command to upgrade these libraries.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190819121831.26101-1-nyh@scylladb.com>
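The version check boils down to a numeric, component-wise comparison -
a naive string comparison would wrongly sort "1.9" after "1.12". A
sketch:

```python
def version_at_least(version, minimum="1.12.54"):
    """Compare dotted version strings numerically, component by component."""
    def parts(v):
        return tuple(int(x) for x in v.split("."))
    return parts(version) >= parts(minimum)
```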
The DescribeTable operation is currently implemented to return the
minimal information that libraries and applications usually need from
it, namely verifying that some table exists. However, this operation
is actually supposed to return a lot more information fields (e.g.,
the size of the table, its creation date, and more) which we currently
don't return.
This patch adds a new test file, test_describe_table.py, testing all
these additional attributes that DescribeTable is supposed to return.
Several of the tests are marked xfail (expected to fail) because we
did not implement these attributes yet.
The test is exhaustive except for attributes that have to do with four
major features which will be tested together with these features: GSI,
LSI, streams (CDC), and backup/restore.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190816132546.2764-1-nyh@scylladb.com>
Currently Alternator starts all Scylla requests (including both reads
and writes) without any timeout set. Because of bugs and/or network
problems, requests can theoretically hang and waste Scylla resources
for hours, long after the client has given up on them and closed their
connection.
The DynamoDB protocol doesn't let a user specify which timeout to use,
so we should just use something "reasonable", in this patch 10 seconds.
Remember that all DynamoDB read and write requests are small (even scans
just scan a small piece), so 10 seconds should be above and beyond
anything we actually expect to see in practice.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190812105132.18651-1-nyh@scylladb.com>
So far we had the "--alternator-port" option allowing the user to configure
the port on which the Alternator server listens, but the server always
listened on any address. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.
So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
It turns out that recent rjson patches introduced some buggy
tabs instead of spaces due to bad IDE configuration. The indentation
is restored to spaces.
Until now, filtering in alternator was possible only for non-key
column equality relations. This commit adds support for equality
relations for key columns.
Alternator allows passing hash and sort key restrictions
as filters - it is, however, better to incorporate these restrictions
directly into partition and clustering ranges, if possible.
It's also necessary, as optimizations inside restrictions_filter
assume that it will not be fed unneeded rows - e.g. if filtering
is not needed on partition key restrictions, they will not be checked.
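The split described above can be sketched as:

```python
def split_restrictions(conditions, key_columns):
    """Split equality conditions into key restrictions (to be folded
    into partition/clustering ranges) and residual post-filters."""
    key_restrictions = {c: v for c, v in conditions.items() if c in key_columns}
    post_filters = {c: v for c, v in conditions.items() if c not in key_columns}
    return key_restrictions, post_filters
```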
Currently the only utility function for getting key bytes
from JSON was to parse a document with the following format:
"key_column_name" : { "key_column_type" : VALUE }.
However, it's also useful to parse only the inner document, i.e.:
{ "key_column_type" : VALUE }.
Three metrics related to filtering are added to alternator:
- total rows read during filtering operations
- rows read and matched by filtering
- rows read and dropped by filtering
Some underlying operations (e.g. paging) make use of cql_stats
structure from CQL3. As such, cql_stats structure is added
to alternator stats in order to gather and use these statistics.
Read-before-write stat counters were already introduced, but the metrics
need to be added to a metric group as well in order to be available
for users.
This patch adds partial support for GSI (Global Secondary Index) in
Alternator, implemented using a materialized view in Scylla.
This initial version only supports the specific cases of the index indexing
a column which was already part of the base table's key - e.g., indexing
what used to be a sort key (clustering key) in the base table. Indexing
of non-key attributes (which today live in a map) is not yet supported in
this version.
Creation of a table with GSIs is supported, and so is deleting the table.
UpdateTable which adds a GSI to an existing table is not yet supported.
Query and Scan operations on the index are supported.
DescribeTable does not yet list the GSIs as it should.
Seven previously-failing tests now pass, so their "xfail" tag is removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808090256.12374-1-nyh@scylladb.com>
The rapidjson library needs to be used with caution in order to
provide maximum performance and avoid undefined behavior.
Comments added to rjson.hh describe provided methods and potential
pitfalls to avoid.
Message-Id: <ba94eda81c8dd2f772e1d336b36cae62d39ed7e1.1565270214.git.sarna@scylladb.com>
With libjsoncpp we were forced to work around the problem of
non-noexcept constructors by using an intermediate unique pointer.
Objects provided by rapidjson have correct noexcept specifiers,
so the workaround can be dropped.
Profiling alternator indicated that JSON parsing takes up a fair amount
of CPU, and as such should be optimized. libjsoncpp is a standard
library for handling JSON objects, but it also proves slower than
rapidjson, which is hereby used instead.
The results indicated that libjsoncpp used roughly 30% of CPU
for a single-shard alternator instance under stress, while rapidjson
dropped that usage to 18% without optimizations.
Future optimizations should include eliding object copying, string copying
and perhaps experimenting with different JSON allocators.
Migrating from libjsoncpp to rapidjson proved to be beneficial
for parsing performance. As a first step, a set of helper functions
is provided to ease the migration process.
error.hh file implicitly assumed that seastar:: namespace is available
when it's included, which is not always the case. To remedy that,
seastar::httpd namespace is used explicitly.
Our CreateTable handler assumed that the function
migration_manager::announce_new_column_family()
returns a failed future if the table already exists. But in some of
our code branches, this is not the case - the function itself throws
instead of returning a failed future. The solution is to use
seastar::futurize_apply() to handle both possibilities (direct exception
or future holding an exception).
This fixes a failure of the test_table.py::test_create_table_already_exists
test case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This adds a new document, docs/alternator.md, about Alternator.
The scope of this document should be expanded in the future. We begin
here by introducing Alternator and its current compatibility level with
Amazon DynamoDB, but it should later grow to explain the design of Alternator
and how it maps the DynamoDB data model onto Scylla's.
Whether this document should remain a short high-level overview, or a long
and detailed design document, remains an open question.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190805085340.17543-1-nyh@scylladb.com>
The function attrs_type() returns a supposed singleton, but because
it is a seastar::shared_ptr we can't use the same one for multiple
threads, and need to use a separate one per thread.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804163933.13772-1-nyh@scylladb.com>
The CQL type singletons like utf8_type et al. are separate for separate
shards and cannot be used across shards. So whatever hash tables we use
to find them, also needs to be per-shard. If we fail to do this, we
get errors running the debug build with multiple shards.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190804165904.14204-1-nyh@scylladb.com>
Expand the GSI test suite. The most important new test is
test_gsi_key_not_in_index(), where the index's key includes just one of
the base table's key columns, but not a second one. In this case, the
Scylla implementation will nevertheless need to add the second key column
to the view (as a clustering key), even though it isn't considered a key
column by the DynamoDB API.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190718085606.7763-1-nyh@scylladb.com>
Our ListTables implementation uses get_column_families(), which lists both
base tables and materialized views. We will use materialized views to
implement DynamoDB's secondary indexes, and those should not be listed in
the results of ListTables.
The patch also includes a test for this.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-2-nyh@scylladb.com>
The list_tables() utility function was used only in test_table.py
but I want to use it elsewhere too (in GSI test) so let's move it
to util.py.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190717133103.26321-1-nyh@scylladb.com>
As in the case of set_diff, an exception message in set_sum should include
the user-provided request (ADD) rather than our internal helper function
set_sum.
Although we do not support GSI yet, until now we silently ignored
CreateTable's GSI parameter, and the user wouldn't know the table
wasn't created as intended.
In this patch, GSI is still unsupported, but now CreateTable will
fail with an error message that GSI is not supported.
We need to change some of the tests which test the error path, and
expect an error - but should not consider a table creation error
as the expected error.
After this patch, test_gsi.py still fails all the tests on
Alternator, but much more quickly :-)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711161420.18547-1-nyh@scylladb.com>
The test case for adding two sets with common values is added.
This case is a stub, because boto3 transforms the result into a Python
set, which removes duplicates on its own. A proper TODO is left
in order to migrate this case to a lower-level API and check
the returned JSON directly for lack of duplicates.
The Query operation's conditions can be used to search for a particular
hash key or both hash and sort keys - but not any other combinations.
We previously forgot to verify most errors, so in this patch we add
missing verifications - and tests to confirm we fail the query when
DynamoDB does.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711132720.17248-1-nyh@scylladb.com>
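The allowed key-condition combinations can be sketched as a small validation function (hypothetical names, for illustration only; not Alternator's actual code):

```python
def validate_key_conditions(conditions, hash_key, sort_key):
    # Only two combinations are legal in a Query: the hash key alone,
    # or the hash key together with the sort key. Anything else -
    # sort key alone, or a non-key attribute - must be rejected.
    names = set(conditions)
    if names == {hash_key}:
        return
    if sort_key is not None and names == {hash_key, sort_key}:
        return
    raise ValueError("Query conditions may cover only the hash key, "
                     "or the hash and sort keys together")

validate_key_conditions({"p": "x"}, "p", "c")            # accepted
validate_key_conditions({"p": "x", "c": "y"}, "p", "c")  # accepted
rejected = []
for bad in ({"c": "y"}, {"p": "x", "other": "z"}):
    try:
        validate_key_conditions(bad, "p", "c")
        rejected.append(False)
    except ValueError:
        rejected.append(True)
print(rejected)
```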
Add more tests for GSI - tests that DescribeTable describes the GSI,
and test the case of more than one GSI for a base table.
Unfortunately, creating an empty table with two GSIs routinely takes
more than a full minute (!) on DynamoDB, so because we now have a
test with two GSIs, I had to increase the timeout in create_test_table().
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190711112911.14703-1-nyh@scylladb.com>
The holds_path() utility function is actually used to check whether a value
needs a read before write, so it is renamed to the more fitting
check_needs_read_before_write.
Alternator currently keeps an item's attributes inside a map, and we
had a serious bug in the way we build mutations for this map:
We didn't know there was a requirement to build this mutation sorted by
the attribute's name. When we neglect to do this sorting, this confuses
Scylla's merging algorithms, which assume collection cells are thus
sorted, and the result can be duplicate cells in a collection, and the
visible effect is a mutation that seems to be ignored - because both
old and new values exist in the collection.
So this patch includes a new helper class, "attribute_collector", which
helps collect attribute updates (put and del) and extract them in correctly
sorted order. This helper class also eliminates some duplication of
arcane code to create collection cells or deletions of collection cells.
This patch includes a simple test that previously failed, and one xfail
test that failed just because of this bug (this was the test that exposed
this bug). Both tests now succeed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190709160858.6316-1-nyh@scylladb.com>
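The idea behind the helper class can be sketched in Python (a simplified model of attribute_collector's role, not the actual C++ code, which builds Scylla collection mutations):

```python
class AttributeCollector:
    """Gather puts and deletes of attributes, then emit them sorted by
    attribute name, as Scylla's collection-merging code requires."""

    def __init__(self):
        self._ops = {}

    def put(self, name, value):
        self._ops[name] = ("put", value)

    def delete(self, name):
        self._ops[name] = ("del", None)

    def extract(self):
        # Emitting cells in sorted name order is the crucial step:
        # unsorted cells confuse the merge and leave duplicates behind.
        return [(name,) + op for name, op in sorted(self._ops.items())]

c = AttributeCollector()
c.put("zebra", 1)
c.delete("apple")
c.put("mango", 2)
print(c.extract())
```

Regardless of insertion order, the extracted operations come out sorted by attribute name.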
This patch adds what is hopefully an exhaustive test suite for the
global secondary indexing (GSI) feature, and all its various
complications and corner cases of how GSIs can be created, deleted,
named, written, read, and more (the tests are heavily documented to
explain what they are testing).
All these tests pass on DynamoDB, and fail on Alternator, so they are
marked "xfail". As we develop the GSI feature in Alternator piece by
piece, we should make these tests start to pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708160145.13865-1-nyh@scylladb.com>
This adds another test for BatchWriteItem: that if one of the operations is
invalid - e.g., has a wrong key type - the entire batch is rejected, and
none of its operations are performed - even the valid ones.
The test succeeds, because we already handle this case correctly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190707134610.30613-1-nyh@scylladb.com>
Test an operation like SET #one = #two, where the RHS has a reference
to a name, rather than the name itself. Also verify that DynamoDB
gives an error if ExpressionAttributeNames includes names not needed
by either the left or right hand side of such assignments.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708133311.11843-1-nyh@scylladb.com>
In order to serve update requests that depend on read-before-write,
a proper helper function which fetches the existing item with a given
key from the database is added.
This read-before-write mechanism is not considered safe, because it
provides no linearizability guarantees and offers no synchronization
protection. As such, it should be considered a placeholder that works
fine on a single machine and/or with no concurrent access to the same key.
The calculate_value utility function is going to need more context
in order to resolve paths present in the right-hand side of update_item
operators: update_info and schema.
Historically, resolving a path checked for key columns, which are not
allowed to be on the left-hand side of the assignment. However, path
resolving will now also be used for the right-hand side, where it should
be allowed to use the key value.
In order to implement read-before-write in the future, calculate_value
now accepts an additional parameter: previous_item. If read-before-write
was performed, previous_item will contain an item for the given key
which already exists in the database at the time of the update.
This patch moves the create_test_table() utility function, which creates
a test table with a unique name, from the fixtures (conftest.py) to
util.py. This will allow reusing this function in tests which need to
create tables but not through the existing fixtures. In particular
we will need to do this for GSI (global secondary index) tests
in the next patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708104438.5830-1-nyh@scylladb.com>
The tests we had for BatchWriteItem's refusal to accept duplicate keys
only used test_table_s, with just a hash key. This patch adds tests
for test_table, i.e., a table with both hash and sort keys - to check
that we check duplicates in that case correctly as well.
Moreover, the expanded tests also verify that although identical
keys are not allowed, keys with just one component (hash or sort key)
the same but the other not the same - are fine.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705191737.22235-1-nyh@scylladb.com>
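The duplicate-key rule being tested can be sketched as follows (hypothetical helper; the real check lives in Alternator's C++ BatchWriteItem handler):

```python
def find_duplicate_keys(requests, key_attrs):
    # Reject a batch only when the *full* key (hash and sort together)
    # repeats; sharing just one key component is fine.
    seen = set()
    for item in requests:
        full_key = tuple(item[a] for a in key_attrs)
        if full_key in seen:
            return full_key   # this duplicate fails the whole batch
        seen.add(full_key)
    return None

items = [{"p": "a", "c": 1},   # same hash as next item - allowed
         {"p": "a", "c": 2},
         {"p": "b", "c": 1}]   # same sort as first item - allowed
print(find_duplicate_keys(items, ("p", "c")))                       # None
dup = find_duplicate_keys(items + [{"p": "a", "c": 1}], ("p", "c"))
print(dup)                                                          # ('a', 1)
```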
Even when running against a local Alternator, Boto3 wants to know the
region name, and AWS credentials, even though they aren't actually needed.
For a local run, we can supply garbage values for these settings, to
allow a user who never configured AWS to run tests locally.
Running against "--aws" will, of course, still require the user to
configure AWS.
Also modified the README to be clearer, and more focused on the local
runs.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708121420.7485-1-nyh@scylladb.com>
For "--aws" tests, use the default region chosen by the user in the
AWS configuration (~/.aws/config or environment variable), instead of
hard-coding "us-east-1".
Patch by Pekka Enberg.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190708105852.6313-1-nyh@scylladb.com>
Calculating value represented as 'v1 + v2' or 'v1 - v2' was previously
implemented with a double type, which offers limited precision.
From now on, these computations are based on big_decimal, which
allows returning values without losing precision.
This patch depends on 'add big_decimal arithmetic operators' series.
Message-Id: <f741017fe3d3287fa70618068bdc753bfc903e74.1562318971.git.sarna@scylladb.com>
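The precision difference can be demonstrated with Python's decimal module standing in for Scylla's big_decimal:

```python
from decimal import Decimal

big, one = 10**25, 1
# With doubles (the old implementation), the small addend vanishes,
# because a double has only ~15-16 significant decimal digits:
lossy = float(big) + float(one)
assert lossy == float(big)
# With decimal arithmetic (as big_decimal provides), nothing is lost:
exact = Decimal(big) + Decimal(one)
print(exact)
```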
Move some common utility functions to a common file "util.py"
instead of repeating them in many test files.
The utility functions include random_string(), random_bytes(),
full_scan(), full_query(), and multiset() (the more general
version, which also supports freezing nested dicts).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190705081013.1796-1-nyh@scylladb.com>
The idiomatic way to use an std::variant depending on the type it holds is to use
std::visit. This modern API makes it unnecessary to write many boiler-plate
functions to test and cast the type of the variant, and makes it impossible
to forget one of the options. So in this patch we throw out the old ways,
and welcome the new.
Thanks to Piotr Sarna for the idea.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190704205625.20300-1-nyh@scylladb.com>
This patch adds to Alternator an implementation of the BatchGetItem
operation, which allows starting a number of GetItem requests in parallel
in a single request.
The implementation is almost complete - the only missing feature is the
ability to ask only for non-top-level attributes in ProjectionExpression.
Everything else should work, and this patch also includes tests which,
as usual, pass on DynamoDB and now also on Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Amazingly, it appears we never tested booting Alternator a second time :-)
Our initialization code creates a new keyspace, and was supposed to ignore
the error if this keyspace already existed - but we thought the error would
come as an exceptional future, which it didn't - it came as a thrown
exception. So we need to change handle_exception() to a try/catch.
With this patch, I can kill Alternator and it will correctly start again.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Operations which take a key as parameter, namely GetItem, UpdateItem,
DeleteItem and BatchWriteItem's DeleteRequest, already fail if the given
key is missing one of the necessary key attributes, or has the wrong types
for them. But they should also fail if the given key has spurious
attributes beyond those actually needed in a key.
So this patch adds this check, and tests to confirm that we do these checks
correctly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
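The check being added amounts to requiring that the supplied key has exactly the table's key attributes, no more and no fewer (illustrative sketch with hypothetical names):

```python
def check_key(key, expected_attrs):
    # A key must contain exactly the table's key attributes:
    # no missing ones, and no spurious extras.
    given, expected = set(key), set(expected_attrs)
    if given - expected:
        raise ValueError(f"spurious key attributes: {sorted(given - expected)}")
    if expected - given:
        raise ValueError(f"missing key attributes: {sorted(expected - given)}")

check_key({"p": "x", "c": "y"}, ["p", "c"])   # exact match - accepted
errors = []
for bad in ({"p": "x"},                                # missing sort key
            {"p": "x", "c": "y", "extra": 1}):         # spurious attribute
    try:
        check_key(bad, ["p", "c"])
    except ValueError as e:
        errors.append(str(e).split(":")[0])
print(errors)
```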
The PutItem operation, and also the PutRequest of BatchWriteItem, are
supposed to completely replace the item - not to merge the new value with
the previous value. We implemented this wrongly - we just wrote the new
item, forgetting the tombstone needed to remove the old item.
So this patch fixes these operations, and adds tests which confirm the
fix (as usual, these tests pass on DynamoDB, failed on Alternator before
this patch, and pass after the patch).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add support for the DeleteItem operation, which deletes an item.
The basic deletion operation is supported. Still not supported are:
1. Parameters to conditionally delete (ConditionalExpression or Expected)
2. Parameters to return pre-delete content
3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In BatchWriteItem, we currently only support the PutRequest operation.
If a user tries to use DeleteRequest (which we don't support yet), he
will get a bizarre error. Let's test the request type more carefully,
and print a better error message. This will also be the place where
eventually we'll actually implement the DeleteRequest.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds more comprehensive tests for the BatchWriteItem operation,
in a new file batch_test.py. The one test we already had for it was also
moved from test_item.py here.
Some of the tests still xfail for two reasons:
1. Support for the DeleteRequest operation of BatchWriteItem is missing.
2. Tests that forbid duplicate keys in the same request are missing.
As usual, all tests succeed on DynamoDB, and hopefully (I tried...)
cover all the BatchWriteItem features.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DynamoDB has two similar parameters - AttributesToGet and
ProjectionExpression - which are supported by the GetItem, Scan and
Query operations. Until now we supported only the older AttributesToGet,
and this patch adds support to the newer ProjectionExpression.
Besides having a different syntax, the main difference between
AttributesToGet and ProjectionExpression is that the latter also
allows fetching only a specific nested attribute, e.g., a.b[3].c.
We do not support this feature yet, although it would not be
hard to add it: With our current data representation, it means
fetching the top-level attribute 'a', whose value is JSON, and then
post-filtering it to take out only the '.b[3].c'. We'll do that
later.
This patch also adds more test cases to test_projection_expression.py.
All tests except three which check the nested attributes now pass,
and those three xfail (they succeed on DynamoDB, and fail as expected
on Alternator), reminding us what still needs to be done.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
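The post-filtering approach described above - fetch the top-level attribute, then walk the requested path inside it - can be sketched like this (a toy path walker, not Alternator's parser):

```python
import re

def extract_path(item, path):
    # Walk a document along a path like 'a.b[3].c': split the path
    # into attribute-name steps and [index] steps, then descend.
    steps = re.findall(r'[^.\[\]]+|\[\d+\]', path)
    value = item
    for step in steps:
        if step.startswith('['):
            value = value[int(step[1:-1])]   # list index
        else:
            value = value[step]              # map/attribute lookup
    return value

doc = {"a": {"b": [0, 1, 2, {"c": "found"}]}}
print(extract_path(doc, "a.b[3].c"))
```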
Our GetItem, Query and Scan implementations support the AttributesToGet
parameter to fetch only a subset of the attributes, but we don't yet
support the more elaborate ProjectionExpression parameter, which is
similar but has a different syntax and also allows specifying nested
document paths.
This patch adds extensive testing of all the ProjectionExpression features.
All these tests pass against DynamoDB, but fail against the current
Alternator so they are marked "xfail". These tests will be helpful for
developing the ProjectionExpression feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The AttributesToGet parameter - saying which attributes to fetch for each
item - is already supported in the GetItem, Query and Scan operations.
However, we only had a test for it for Scan. This patch adds
similar tests also for the GetItem and Query operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test for overwriting a top-level attribute which contains
a nested document - here, overwriting it by just a string.
This test passes. In the current implementation we don't yet support
updates to specific attribute paths (e.g. a.b[3].c) but we do support
well writing and over-writing top-level attributes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch implements the last (finally!) syntactic feature of the
UpdateExpression - the ability to do SET a=val1+val2 (where, as
before, each of the values can be a reference to a value, an
attribute path, or a function call).
The implementation is not perfect: It adds the values as double-precision
numbers, which can lose precision. So the patch adds a new test which
checks that the precision isn't lost - a test that currently fails
(xfail) on Alternator, but passes on DynamoDB. The pre-existing test
for adding small integers now passes on Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the previous patch we added function-call support in the UpdateExpression
parser. In this patch we add support for one such function - list_append().
This function takes two values, confirms they are lists, and concatenates
them. After this patch only one function remains unimplemented:
if_not_exists().
We also split the test we already had for list_append() into two tests:
One uses only value references (":val") and passes after this patch.
The second test also uses references to other attributes and will only
work after we start supporting read-modify-write.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Until this patch, in update expressions like "SET a = :val", we only
allowed the right-hand-side of the assignment to be a reference to a
value stored in the request - like ":val" in the above example.
But DynamoDB also allows the value to be an attribute path (e.g.,
"a.b[3].c"), or a function of a bunch of other values.
This patch adds support for parsing all these value types.
This patch only adds the correct parsing of these additional types of
values, but they are still not supported: reading existing attributes
(i.e., read-modify-write operations) is still not supported, and
neither of the two functions which UpdateExpression needs to support
is supported yet. Nevertheless, the parsing is now correct, and the
"unknown_function" test starts to pass.
Note that DynamoDB allows the right-hand side of an assignment to be
not only a single value, but also value+value and value-value. This
possibility is not yet supported by the parser and will be added
later.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test cases verify that equality-based filtering on non-key
attributes works fine. The file also contains test stubs for key filtering
and non-equality attribute filtering.
Filled test table used to have identical non-key attributes for all
rows. These values are now diversified in order to allow writing
filtering test cases.
Filtering is currently only implemented for the equality operator
on non-key attributes.
Next steps (TODO) involve:
1. Implementing filtering for key restrictions
2. Implementing non-key attribute filtering for operators other than EQ.
It, in turn, may involve introducing a 'map value restrictions' notion
to Scylla, since it currently only allows equality restrictions on map
values (alternator attributes are currently kept in a CQL map).
3. Implementing FilterExpression in addition to deprecated QueryFilter
Before this patch, we read either an attribute name like "name" or
a reference to one "#name", as one type of token - NAME.
However, while attribute paths indeed can use either one, in some other
contexts - such as a function name - only "name" is allowed, so we
need to distinguish between two types of tokens: NAME and NAMEREF.
While separating those, I noticed that we incorrectly allowed a "#"
followed by *zero* alphanumeric characters to be considered a NAMEREF,
which it shouldn't be. In other words, NAMEREF should have ALNUM+, not ALNUM*.
Same for VALREF, which can't be just a ":" with nothing after it.
So this patch fixes these mistakes, and adds tests for them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
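The ALNUM+ versus ALNUM* distinction can be shown with plain regular expressions (a sketch of the lexer rule; the real lexer is written in ANTLR3):

```python
import re

# NAMEREF must be '#' followed by at least one alphanumeric character
# (ALNUM+, not ALNUM*), and VALREF likewise for ':' - a bare '#' or
# ':' is not a valid token.
NAMEREF = re.compile(r'#[A-Za-z0-9]+\Z')
VALREF = re.compile(r':[A-Za-z0-9]+\Z')

name_results = [bool(NAMEREF.match(s)) for s in ("#name", "#", "name")]
val_results = [bool(VALREF.match(s)) for s in (":val", ":")]
print(name_results, val_results)
```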
DynamoDB complains, and fails an update, if the update contains in
ExpressionAttributeNames or ExpressionAttributeValues names which aren't
used by the expression.
Let's do the same, although sadly this means more work to track which
of the references we've seen and which we haven't.
This patch makes two previously xfail (expected fail) tests become
successful tests on Alternator (they always succeeded against DynamoDB).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
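The bookkeeping described above - tracking which references the expression actually used and rejecting leftovers - can be sketched as (hypothetical helper, for illustration):

```python
def find_unused(used_refs, names, values):
    # Any entry of ExpressionAttributeNames or ExpressionAttributeValues
    # that the expression never referenced is an error in DynamoDB.
    unused = (set(names) | set(values)) - set(used_refs)
    return sorted(unused)

used = {"#a", ":v"}               # collected while parsing the expression
names = {"#a": "attr", "#b": "other"}
values = {":v": 1}
print(find_unused(used, names, values))   # ['#b'] - would fail the request
```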
The existing tests in test_update_expression.py thoroughly tested the
UpdateExpression features which we currently support. But tests for
features which Alternator *doesn't* yet support were partial.
In this patch, we add a large number of new tests to
test_update_expression.py aiming to cover ALL the features of
UpdateExpression, regardless of whether we already support it in
Alternator or not. Every single feature and esoteric edge-case I could
discover is covered in these tests - and as far as I know these tests
now cover the *entire* UpdateExpression feature. All the tests succeed
on DynamoDB, and confirm our understanding of what DynamoDB actually does
on all these cases.
After this patch, test_update_expression.py is a whopper, with 752 lines of
code and 37 separate test functions. 23 out of these 37 tests are still
"xfail" - they succeed on DynamoDB but fail on Alternator, because of
several features we are still missing. Those missing features include
direct updates of nested attributes, read-modify-write updates (e.g.,
"SET a=b" or "SET a=a+1"), functions (e.g., "SET a = list_append(a, :val)"),
the ADD and DELETE operations on sets, and various other small missing
pieces.
The benefit of this whopper test is two-fold: First, it will allow us
to test our implementation as we continue to fill it (i.e., "test-
driven development"). Second, all these tested edge cases basically
"reverse engineer" how DynamoDB's expression parser is supposed to work,
and we will need this knowledge to implement the still-missing features of
UpdateExpression.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds an extensive array of tests for UpdateItem's UpdateExpression
support, which was introduced in the previous patch.
The tests include verification of various edge cases of the parser, support
for ":value" and "#name" references, functioning SET and REMOVE operations,
combinations of multiple such operations, and much more.
As usual, all these tests were run and succeed on DynamoDB, as well as on
Alternator - to confirm Alternator behaves the same as DynamoDB.
There are two tests marked "xfail" (expected to fail), because Alternator
still doesn't support the attribute copy syntax (e.g., "SET a = b",
doing a read-before-write).
There are some additional areas which we don't support - such as the DELETE
and ADD operations or SET with functions - but those areas aren't yet tested
in these tests.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
For the UpdateItem operation, so far we supported updates via the
AttributeUpdates parameter, specifying which attributes to set or remove
and how. But this parameter is considered deprecated, and DynamoDB supports
a more elaborate way to modify attributes, via an "UpdateExpression".
In the previous patch we added a function to parse such an UpdateExpression,
and in this patch we use the result of this parsing to actually perform
the required updates.
UpdateExpression is only partially supported after this patch. The basic
"SET" and "REMOVE" operations are supported, but various other cases aren't
fully supported and will be fixed in followup patches. The following
patch will add extensive tests to confirm exactly what works correctly
with the new UpdateExpression support.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The DynamoDB protocol is based on JSON, and most DynamoDB requests describe
the operation and its parameters via JSON objects such as maps and lists.
However, in some types of requests an "expression" is passed as a single
string, and we need to parse this string. These cases include:
1. Attribute paths, such as "a[3].b.c", are used in projection
expressions as well as inside other expressions described below.
2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
used in conditional updates, filters, and other places.
3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
This patch introduces the framework to parse these expressions, and
an implementation of parsing update expressions. These update expressions
will be used in the UpdateItem operation in the next patch.
All these expression syntaxes are very simple: Most of them could be
parsed as regular expressions, or at most a simple hand-written lexical
analyzer and recursive-descent parser. Nevertheless, we decided to specify
these parsers in the same ANTLR3 language already used in the Scylla
project for parsing CQL, hopefully making these parsers easier to reason
about, and easier to change if needed - and reducing the amount of boiler-
plate code.
The parsing of update expressions is mostly complete, except that in SET
actions, only the "path = value" form is supported and not yet forms
such as "path1 = path2" (which does read-before-write) or
"path1 = path1 + value" or "path = function(...)".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
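The "SET path = value" and "REMOVE path" subset described above can be illustrated with a toy tokenizer and parser (a sketch only; Alternator specifies its parsers in ANTLR3, not in code like this):

```python
import re

# Tokens: keywords, name/value references like '#n' or ':v', plain
# names, and the punctuation '=', ',' and '.'.
TOKEN = re.compile(r'\s*(SET|REMOVE|[#:]?[A-Za-z0-9_]+|[=,.])')

def parse_update_expression(expr):
    tokens = TOKEN.findall(expr)
    actions, i = [], 0
    while i < len(tokens):
        if tokens[i] == "SET":
            i += 1
            while i < len(tokens):
                path, eq, value = tokens[i:i + 3]
                assert eq == "="           # only 'path = value' here
                actions.append(("set", path, value))
                i += 3
                if i < len(tokens) and tokens[i] == ",":
                    i += 1                 # another SET action follows
                else:
                    break
        elif tokens[i] == "REMOVE":
            actions.append(("remove", tokens[i + 1]))
            i += 2
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")
    return actions

actions = parse_update_expression("SET a = :x, #n = :y REMOVE b")
print(actions)
```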
We need to write more tests for various case of handling
nested documents and nested attributes. Let's collect them
all in the same test file.
This patch mostly moves existing code, but also adds one
small test, test_nested_document_attribute_write, which
just writes a nested document and reads it back (it's
mostly covered by the existing test_put_and_get_attribute_types,
but is specifically about a nested document).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We usually run Alternator tests against the local Alternator - testing
against AWS DynamoDB is rarer, and usually just done when writing the
test. So let's make "pytest" without parameters default to testing locally.
To test against AWS, use "pytest --aws" explicitly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Attributes for reads (GetItem, Query, Scan, ...) and writes (PutItem,
UpdateItem, ...) are now serialized and deserialized in binary form
instead of raw JSON, provided that their type is S, B, BOOL or N.
Optimized serialization for the rest of the types will be introduced
as follow-ups.
Message-Id: <6aa9979d5db22ac42be0a835f8ed2931dae208c1.1559646761.git.sarna@scylladb.com>
Attributes used to be written into the database in raw JSON format,
which is far from optimal. This patch introduces more robust
serialization routines for simple alternator types: S, B, BOOL, N.
Serialization uses the first byte to encode attribute type
and follows with serializing data in binary form.
More complex types (sets, lists, etc.) are currently still
serialized in raw JSON and will be optimized in follow-up patches.
Message-Id: <10955606455bbe9165affb8ac8fba4d9e7c3705f.1559646761.git.sarna@scylladb.com>
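The type-tagged layout - first byte encodes the type, the rest the value - can be sketched like this (tag bytes and the N encoding here are invented for illustration; the real on-disk format may differ):

```python
def serialize(attr_type, value):
    if attr_type == "S":
        return b"S" + value.encode("utf-8")
    if attr_type == "B":
        return b"B" + value
    if attr_type == "BOOL":
        return b"T" if value else b"F"
    if attr_type == "N":
        # Store the decimal string; a real implementation would use a
        # proper binary decimal encoding.
        return b"N" + str(value).encode("ascii")
    raise ValueError("complex types still go through raw JSON")

def deserialize(buf):
    tag, rest = buf[:1], buf[1:]
    if tag == b"S":
        return ("S", rest.decode("utf-8"))
    if tag == b"B":
        return ("B", rest)
    if tag in (b"T", b"F"):
        return ("BOOL", tag == b"T")
    if tag == b"N":
        return ("N", rest.decode("ascii"))
    raise ValueError("unknown type tag")

print(deserialize(serialize("S", "hello")))
print(deserialize(serialize("BOOL", True)))
```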
For some unknown reason we put the list of alternator source files
in configure.py inside the "api" list. Let's move it into a separate
list.
We could have just put it in the scylla_core list, but that would cause
frequent and annoying patch conflicts when people add alternator source
files and Scylla core source files concurrently.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
So far for UpdateItem we only supported the old-style AttributeUpdates
parameter, not the newer UpdateExpression. This patch begins the path
to supporting UpdateExpression. First, trying to use *both* parameters
should result in an error, and this patch does this (and tests this).
Second, passing neither parameter is allowed, and should result in
an *empty* item being created.
Finally, since today we do not yet support UpdateExpression, this patch
will cause UpdateItem to fail if UpdateExpression is used, instead of
silently ignoring it as we did so far.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
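The three cases - both parameters, one of them, or neither - can be sketched as a small dispatcher (hypothetical request layout and mode names, for illustration):

```python
def choose_update_mode(request):
    # AttributeUpdates and UpdateExpression are mutually exclusive;
    # passing neither means "create an empty item".
    has_old = "AttributeUpdates" in request
    has_new = "UpdateExpression" in request
    if has_old and has_new:
        raise ValueError("cannot use both AttributeUpdates and UpdateExpression")
    if has_new:
        return "update-expression"
    if has_old:
        return "attribute-updates"
    return "empty-item"

print(choose_update_mode({}))                        # empty-item
print(choose_update_mode({"AttributeUpdates": {}}))  # attribute-updates
try:
    choose_update_mode({"AttributeUpdates": {}, "UpdateExpression": "SET a=:v"})
    both_ok = True
except ValueError:
    both_ok = False
print(both_ok)   # False: using both is an error
```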
This patch adds two simple tests for nested documents, which pass:
test_nested_document_attribute_overwrite() tests what happens when
we UpdateItem a top-level attribute to a dictionary. We already tested
this works on an empty item in a previous test, but now we check what
happens when the attribute already existed, and already was a dictionary,
and now we update it to a new dictionary. In the test attribute a was
{b:3, c:4} and now we update it to {c:5}. The test verifies that the new
dictionary completely replaces the old one - the two are not merged.
The new value of the attribute is just {c:5}, *not* {b:3, c:5}.
The second test verifies that the AttributeUpdates parameter of
UpdateItem cannot be used to update just a nested attribute.
Any dots in the attribute name are treated as actual dots - not
as part of a path of attribute names.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Comparing two lists of items without regard for order is not trivial.
For this reason some tests in test_query.py only compare arrays of sort
keys, and those tests are fine.
But other tests used a trick of converting each list of items into a
set via set_of_frozen_elements() and comparing these sets. This trick is
almost correct, but it can miss cases where items repeat.
So in this patch, we replace the set_of_frozen_elements() approach by
a similar one using a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter". This is the same approach
we also started to use in test_scan.py in a recent patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
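The multiset comparison can be sketched as follows (simplified to flat dicts; the util.py version also supports freezing nested dicts):

```python
from collections import Counter

def multiset(items):
    # Freeze each item into a hashable form and count occurrences, so
    # lists compare equal regardless of order but *with* repetitions.
    return Counter(frozenset(item.items()) for item in items)

a = [{"p": 1}, {"p": 2}, {"p": 1}]
b = [{"p": 2}, {"p": 1}, {"p": 1}]   # same items, different order
c = [{"p": 1}, {"p": 2}]             # same *set*, different multiplicity
eq_ab = multiset(a) == multiset(b)
eq_ac = multiset(a) == multiset(c)
print(eq_ab, eq_ac)   # a set-based comparison would wrongly equate a and c
```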
Remove the incomplete and unused function to convert DynamoDB type names
to ScyllaDB type objects:
DynamoDB has a different set of types relevant for keys and for attributes.
We already have a separate function, parse_key_type(), for parsing key
types, and for attributes - we don't currently parse the type names at
all (we just save them as JSON strings), so the function we removed here
wasn't used, and was in fact #if'ed out. It was never completed, and it now
started to decay (the type for numbers is wrong), so we're better off
completely removing it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch implements a fully working number type for keys, and now
Alternator fully and correctly supports every key type - strings, byte
arrays, and numbers.
The patch also adds a test which verifies that Scylla correctly sorts
number sort keys, and also correctly retrieves them to the full precision
guaranteed by DynamoDB (38 decimal digits).
The implementation uses Scylla's "decimal" type, which supports arbitrary
precision decimal floating point, and in particular supports the precision
specified by DynamoDB. However, "decimal" is actually over-qualified for
this use, so might not be optimal for the more specific requirements of
DynamoDB. So a FIXME is left to optimize this case in the future.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
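The two properties the test checks - 38 decimal digits of precision, and numeric rather than lexicographic ordering of number sort keys - can be illustrated with Python's decimal module standing in for Scylla's "decimal" type:

```python
from decimal import Decimal, getcontext

getcontext().prec = 38   # the precision DynamoDB guarantees for numbers

# 38 significant digits survive arithmetic at this precision...
n = Decimal("9" * 38)
assert str(+n) == "9" * 38   # unary + applies the context precision
# ...and number sort keys compare numerically, not lexicographically:
keys = [Decimal("10"), Decimal("9"), Decimal("-1.5"), Decimal("0.25")]
print(sorted(keys))
```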
Comparing two lists of items without regard for order is not trivial.
test_scan.py currently has two ways of doing this, both unsatisfactory:
1. We convert each list to a set via set_of_frozen_elements(), and compare
the sets. But this comparison can miss cases where items repeat.
2. We use sorted() on the list. This doesn't work on Python 3 because
it removed the ability to compare (with "<") dictionaries.
So in this patch, we replace both by a new approach, similar to the first
one except we use a multiset (set with repetitions) instead of a set.
A multiset in Python is "collections.Counter".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Creating and deleting tables is the slowest part of our tests,
so we should lower the number of tables our tests create.
We had a "test_2_tables" fixture as a way to create two
tables, but since our tests already create other tables
for testing different key types, it's faster to reuse those
tables - instead of creating two more unused tables.
On my system, a "pytest --local", running all 38 tests
locally, drops from 25 seconds to 20 seconds.
As a bonus, we also have one fewer fixture ;-)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
to 1024 bytes, and the entire item to 400 KB which therefore also
limits the size of one attribute. This test checks that we can
reach up to these limits, with binary keys and attributes.
The test does *not* check what happens once we exceed these
limits. In such a case, DynamoDB throws an error (I checked that
manually) but Alternator currently simply succeeds. If in the
future we decide to add artificial limits to Alternator as well,
we should add such tests as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"len" is an unfortunate choice for a variable name, in case one
day the implementation may want to call the built-in "len" function.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have a test for *string* sort-key ordering of items returned
by the Scan operation, and this test adds a similar test for the Query
operation. We verify that items are retrieved in the desired sorted
order (sorted by the aptly-named sort key) and not in creation order
or any other wrong order.
But beyond just checking that Query works as expected (it should,
given it uses the same machinery as Scan), the nice thing about this
test is that it doesn't create a new table - it uses a shared table
and creates one random partition inside it. This makes this test
faster and easier to write (no need for a new fixture), and most
importantly - easily allows us to write similar tests for other
key types.
So this patch also tests the correct ordering of *binary* sort keys.
It helped expose bugs in previous versions of the binary key implementation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Simple tests for item operations (PutItem, GetItem) with binary key instead
of string for the hash and sort keys. We need to be able to store such
keys, and then retrieve them correctly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Until now we only supported string for key columns (hash or sort key).
This patch adds support for the bytes type (a.k.a binary or blob) as well.
The last missing type to be supported in keys is the number type.
Note that in JSON, bytes values are represented with base64 encoding,
so we need to decode them before storing the decoded value, and re-encode
when the user retrieves the value. The decoding is important not just
for saving storage space (the encoded form is 4/3 the size of the decoded)
but also for correct *sorting* of the binary keys.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The DynamoDB API uses base64 encoding to encode binary blobs as JSON
strings. So we need functions to do these conversions.
This code was "inspired" by https://github.com/ReneNyffenegger/cpp-base64
but doesn't actually copy code from it.
I didn't write any specific unit tests for this code, but it will be
exercised and tested in a following patch which tests Alternator's use
of these functions.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
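In Python terms (standing in for the C++ helpers, which are not shown here), the round trip and the 4/3 size overhead look like this:

```python
import base64

raw = b'\x00\x01\xfe\xff'                        # a binary key value
encoded = base64.b64encode(raw).decode('ascii')  # what travels in the JSON
assert base64.b64decode(encoded) == raw          # decoded before storing

# The encoded form is 4/3 the size of the raw bytes (plus up to two
# '=' padding bytes): 300 raw bytes become exactly 400 base64 characters.
assert len(base64.b64encode(b'x' * 300)) == 400
```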
BEGINS_WITH behaves in a special way when a key postfix
consists of <255> bytes. The initial test does not use that
and instead checks UTF-8 characters, but once bytes type
is implemented for keys, it should also test specifically for
corner cases, like strings that consist of <255> bytes only.
Message-Id: <fe10d7addc1c9d095f7a06f908701bb2990ce6fe.1558603189.git.sarna@scylladb.com>
BEGINS_WITH statement increments a string in order to compute
the upper bound for a clustering range of a query.
Unfortunately, previous implementation was not correct,
as it appended a <0> byte if the last character was <255>,
instead of incrementing a last-but-one character.
If the string contains <255> bytes only, the returned upper bound
is infinite.
Message-Id: <3a569f08f61fca66cc4f5d9e09a7188f6daad578.1558524028.git.sarna@scylladb.com>
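The corrected increment logic described above can be sketched like this (a sketch, not the actual C++ code; the function name is invented):

```python
def begins_with_upper_bound(prefix: bytes):
    """Smallest byte string greater than every string starting with
    `prefix`, or None if no finite bound exists (prefix is all <255>)."""
    buf = bytearray(prefix)
    # Trailing <255> bytes cannot be incremented - drop them and
    # increment the last-but-one (or earlier) byte instead.
    while buf and buf[-1] == 0xff:
        buf.pop()
    if not buf:
        return None  # all bytes were <255>: the upper bound is infinite
    buf[-1] += 1
    return bytes(buf)

assert begins_with_upper_bound(b'abc') == b'abd'
assert begins_with_upper_bound(b'ab\xff') == b'ac'   # not b'ab\xff\x00'!
assert begins_with_upper_bound(b'\xff\xff') is None  # unbounded range
```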
We had several places in the code that need to parse the
ConsistentRead flag in the request. Let's add a function
that does this, and while at it, checks for more error
cases and also returns LOCAL_QUORUM and LOCAL_ONE instead
of QUORUM and ONE.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
As Shlomi suggested in the past, it is more likely that when we
eventually support global tables, we will use LOCAL_QUORUM,
not QUORUM. So let's switch to that now.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
So far, all of the tests in test_item.py (for PutItem, GetItem, UpdateItem),
were arbitrarily done on a test table with both hash key and sort key
(both with string type). While this covers most of the code paths, we still
need to verify that the case where there is *not* a sort key, also works
fine. E.g., maybe we have a bug where a missing clustering key is handled
incorrectly or an error is incorrectly reported in that case?
In this patch we add tests for the hash-key-only case, and see that
it already works correctly. No bug :-)
We add a new fixture test_table_s for creating a test table with just
a single string key. Later we'll probably add more of these test tables
for additional key types.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Another type of key type error can be to forget part of the key
(the hash or sort key). Let's test that too (it already works correctly,
no need to patch the code).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When a table has a hash key or sort key of a certain type (this can
be string, bytes, or number), one cannot try to choose an item using
values of different types.
We previously did not handle this case gracefully, and PutItem handled
it particularly badly - writing malformed data to the sstable and basically
hanging Scylla. In this patch we fix the pk_from_json() and ck_from_json()
functions to verify the expected type, and fail gracefully if the user
sent the wrong type.
This patch also adds tests for these failures, for the GetItem, PutItem,
and UpdateItem operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
According to the documentation, trying to GetItem a non-existent item
should result in an empty response - NOT a response with an empty "Item"
map as we do before this patch.
This patch fixes this case, and adds a test case for it. As usual,
we verify that the test case also works on Amazon DynamoDB, to verify
DynamoDB really behaves the way we think it does.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If an empty item (i.e., no attributes except the key) is created, or an item
becomes empty (by deleting its existing attributes), the empty item must be
maintained - it cannot just disappear. To do this in Scylla, we must add a
row marker - otherwise an empty attribute map is not enough to keep the
row alive.
This patch includes 4 test cases for all the various ways an item can be
created empty or an existing item emptied, and verifies that the empty item
can be correctly retrieved (as usual, to verify that our expectation of
"correctness" is indeed correct, we run the same tests against DynamoDB).
All these 4 tests failed before this patch, and now succeed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
These lines of code were superfluous and their result unused: the
make_item_mutation() function finds the pk and ck on its own.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a statistics framework to Alternator: Executor has (for
each shard) a _stats object which contains counters for various events,
and also is in charge of making these counters visible via Scylla's regular
metrics API (http://localhost:9180/metrics).
This patch includes a counter for each of DynamoDB's operation types,
and we increase the ones we support when handled. We also added counters
for total operations and unsupported operations (operation types we don't
yet handle). In the future we can easily add many more counters: Define
the counter in stats.hh, export it in stats.cc, and increment it
where relevant in executor.cc (or server.cc).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Ask to retrieve only an attribute name which *none* of the items have.
The result should be a silly list of empty items, and indeed it is.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Use full_scan() in another test instead of open-coding the scan.
There are two more tests that could have used full_scan(), but
since they seem to be specifically adding more assertions or
using a different API ("paginators"), I decided to leave them
as-is. But new tests should use full_scan().
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This is a short, but extensive, test of the AttributesToGet parameter
to Scan, which allows selecting only some of the attributes for output.
The AttributesToGet parameter has several non-obvious behaviors. Firstly,
it doesn't require that any key attributes be selected. So since each
item may have different non-key attributes, some scanned items may
be missing some of the selected columns, and some of the items may
even be missing *all* the selected columns - in which case DynamoDB
returns an empty item (and doesn't entirely skip this item). This
test covers all these cases, and it adds yet another item to the
'filled_test_table' fixture, one which has different attributes,
so we can see these issues.
As usual, this test passes in both DynamoDB and Alternator, to
assure we correspond to the *right* behavior, not just what we
think is right.
This test actually exposed a bug in the way our code returned
empty items (items which had none of the selected columns),
a bug which was fixed by the previous patch.
Instead of having yet another copy of table-scanning code, this
patch adds a utility function full_scan(), to scan an entire
table (with optional extra parameters for the scan) and return
the result as an array. We also simplify existing tests in
test_scan.py by using this new function.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
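Such a helper can be sketched roughly as follows (a sketch assuming a boto3 Table resource; the real full_scan() in test_scan.py may differ in details):

```python
def full_scan(table, **kwargs):
    """Scan `table` to completion, following LastEvaluatedKey paging,
    and return all items in one list."""
    response = table.scan(**kwargs)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'],
                              **kwargs)
        items.extend(response['Items'])
    return items
```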
* tag 'dbuild-image-help-usage-v1' of github.com:bhalevy/scylla:
dbuild: add usage
dbuild: add help option
dbuild: list available images when no image arg is given
dbuild: add --image option
When a Scan selects only certain attributes, and none of the key
attributes are selected, for some of the scanned items *nothing*
will remain to be output, but still Dynamo outputs an empty item
in this case. Our code had a bug where after each item we "moved"
the object leaving behind a null object, not an empty map, so a
completely empty item wasn't output as an empty map as expected,
and resulted in boto3 failing to parse the response.
This simple one-line patch fixes the bug, by resetting the item
to an empty map after moving it out.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Instead of blindly returning "localhost:8000" in response to
DescribeEndpoints and for sure causing us problems in the future,
the right thing to do is to return the same domain name which the
user originally used to get to us, be it "localhost:8000" or
"some.domain.name:1234". But how can we know what this domain name
was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header,
and the DynamoDB driver I tested (boto3) adds it as expected,
indeed with the expected value of "localhost:8000" on my local setup.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
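In outline, the handler just echoes the Host header back (a sketch; the response shape follows the public DescribeEndpoints API, but the function and its request representation are invented here):

```python
def describe_endpoints(request_headers):
    # HTTP/1.1 makes the Host header mandatory, so it is always available
    # and holds whatever name:port the client used to reach this server.
    return {'Endpoints': [{'Address': request_headers['Host'],
                           'CachePeriodInMinutes': 1440}]}

assert describe_endpoints({'Host': 'localhost:8000'}) == \
    {'Endpoints': [{'Address': 'localhost:8000',
                    'CachePeriodInMinutes': 1440}]}
```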
Although different partitions are returned by a Scan in (seemingly)
random order, items in a single partition need to be returned sorted
by their sort key. This adds a test to verify this.
This patch adds to the filled_test_table fixture, which until now
had just one item in each partition, another partition (with the key
"long") with 164 additional items. The test_scan_sort_order_string
test then scans this table, and verifies that the items are really
returned in sorted order.
The sort order is, of course, string order. So we have the first
item with sort key "1", then "10", then "100", then "101", "102",
etc. When we implement numeric keys we'll need to add a version
of this test which uses a numeric clustering key and verifies the
sort order is numeric.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Because of a typo, we incorrectly set the table's sort key as a second
partition key column instead of a clustering key column. This has bad
but subtle consequences - such as that the items are *not* sorted
according to the sort key. So in this patch we fix the typo.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DescribeEndpoints is not a very important API (and by default, clients
don't use it) but I wanted to understand how DynamoDB responds to it,
and what better way than to write a test :-)
And then, if we already have a test, let's implement this request in
Scylla as well. This is a silly implementation, which always returns
"localhost:8000". In the future, this will need to be configurable -
we're not supposed here to return *this* server's IP address, but rather
a domain name which can be used to get to all servers.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"
Currently, GDB scripts locate sstables by scanning the heap for
bag_sstable_set containers. That has disadvantages:
- not all containers are considered
- it's extremely slow on large heaps
- fragile, new containers can be added, and we won't even know
This series fixes all above by adding a per-shard sstable tracker
which tracks sstable objects in a linked-list.
"
* 'sstable-tracker' of github.com:tgrabiec/scylla:
gdb: Use sstable tracker to get the list of sstables
gdb: Make intrusive_list recognize member_hook links
sstables: Track whether sstable was already open or not
sstables: Track all instances of sstable objects
sstables: Make sstable object not movable
sstables: Move constructor out of line
Most of the request types need a TableName parameter, specifying the
name of the table they operate on. There's a lot of boilerplate code
required to get this table name and verify that it is valid (the parameter
exists, is a string, passes DynamoDB's naming rules, and the table
actually exists), which resulted in a lot of code duplication - and
in some cases missing checks.
So this patch introduces two utility functions, get_table_name()
and get_table(), to fetch a table name or the schema of an existing
table, from the request, with all necessary validation. If validation
fails, the appropriate api_error() is thrown so the user gets the
right error message.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Remove unused random-string code from conftest.py, and also add a
TODO comment how we should speed up filled_test_table fixture by
using a batch write - when that becomes available in Alternator.
(right now this fixture takes almost 4 seconds to prepare on a local
Alternator, and a whopping 3 minutes (!) to prepare on DynamoDB).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test test_put_and_get_attribute_types needlessly named all the
different attributes and their variables, causing a lot of repetition
and chance for mistakes when adding additional attributes to the test.
In this rewrite, we only have a list of items, and automatically build
attributes with them as values (using sequential names for the attributes)
and check we read back the same item (Python's dict equality operator
checks the equality recursively, as expected).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Although we planned to initially support only string types, it turns out
that for attributes (*not* the key), we actually support all types already,
including all scalar types (string, number, bool, binary and null) and
more complex types (list, nested document, and sets).
This adds a test which PutItem's these types and verifies that we can
retrieve them.
Note that this test deals with top-level attributes only. There is no
attempt to modify only a nested attribute (and with the current code,
it wouldn't work).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In our tests, we cannot really assume that ListTables returns *only*
the tables we created for the test, or even that a page size of 100 will
be enough to list our 3 tables. The issue is that on a shared DynamoDB, or
in hypothetical cases where multiple tests are run in parallel, or previous
tests had catastrophic errors and failed to clean up, we have no idea how
many unrelated tables there are in the system. There may be hundreds of
them. So every ListTables test will need to use paging.
So in this re-implementation, we begin with a list_tables() utility function
which calls ListTables multiple times to fetch all tables, and return the
resulting list (we assume this list isn't so huge it becomes unreasonable
to hold it in memory). We then use this utility function to fetch the table
list with various page sizes, and check that the test tables we created are
listed in the resulting list.
There's no longer a separate test for "all" tables (really was a page of 100
tables) and smaller pages (1,2,3,4) - we now have just one test that does the
page sizes 1,2,3,4, 50 and 100.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
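The paging loop behind such a list_tables() helper might look like this (a sketch against the DynamoDB ListTables API; the helper's real signature in the tests may differ):

```python
def list_tables(client, page_size=100):
    """Call ListTables repeatedly, following LastEvaluatedTableName,
    and return the complete list of table names."""
    tables = []
    kwargs = {'Limit': page_size}
    while True:
        response = client.list_tables(**kwargs)
        tables.extend(response['TableNames'])
        if 'LastEvaluatedTableName' not in response:
            return tables
        kwargs['ExclusiveStartTableName'] = response['LastEvaluatedTableName']
```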
This patch cleans up some comments and reorganizes some functions in
conftest.py, where the test_table fixture was defined. The goal is to
later add additional types of test tables with different schemas (e.g.,
just a partition key, different key types, etc.) without too much
code duplication.
This patch doesn't change anything functional in the tests, and they
still pass ("pytest --local" runs all tests against the local Alternator).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The ck_from_json() utility function is easier to use if it handles
the no-clustering-key case itself, instead of requiring every caller
to handle the no-clustering-key case separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
So far we supported UpdateItem only with PUT operations - this patch
adds support for DELETE operations, to delete specific attributes from
an item.
Only the case of a missing value is supported. DynamoDB also provides
the ability to pass the old value, and only perform the deletion if
the value and/or its type is still up-to-date - but we don't support
this yet and fail such requests if attempted.
This patch also includes a test for this case in alternator-test/
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add initial tests for UpdateItem. Only the features currently supported
by our code (only string attributes, only "PUT" action) are tested.
As usual, this test (like all others) was tested to pass on both DynamoDB
and Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add an initial UpdateItem implementation. As with PutItem and GetItem,
we are still limited to string attributes. This initial implementation
of UpdateItem implements only the "PUT" action (not "DELETE" and
certainly not "ADD") and not any of the more advanced options.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
All operation-generated error messages should have the 400 HTTP error
code. It's a real nag to have to type it every time. So make it the
default.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Without special options, PutItem should return nothing (an empty
JSON result). Previously we had trouble doing this, because instead
of returning an empty JSON result, we converted an empty string into
JSON :-) So the existing code had an ugly workaround which worked,
sort of, for the Python driver but not for the Java driver.
The correct fix, in this patch, is to invent a new type json_string
which is a string *already* in JSON and doesn't need further conversion,
so we can use it to return the empty result. PutItem now works from
YCSB's Java driver.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Although we would like to allow table names up to 222 bytes, this is not
currently possible because Scylla tacks an additional 33 bytes on to create
a directory name, and directory names are limited to 255 bytes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The supported key types are just S(tring), B(lob), or N(umber).
Other types are valid for attributes, but not for keys, and should
not be accepted. And wrong types used should result in the appropriate
user-visible error.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
To be correct, CreateTable's input parsing needs to work in reverse from
what it did: First, the key columns are listed in KeySchema, and then
each of these (and potentially more, e.g., from indexes) needs to appear
in AttributeDefinitions.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Without any arguments, PutItem should return no data at all. But somehow,
for reasons I don't understand, the boto3 driver gets confused from an
empty JSON thinking it isn't JSON at all. If we return a structure with
an empty "attributes" field, boto3 is happy.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add an initial implementation of DeleteTable, enough to make
pytest --local test_table.py::test_create_and_delete_table
pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When given an unknown operation (we didn't implement yet many of them...)
we should throw the appropriate api_error, not some random exception.
This allows the client to understand the operation is not supported
and stop retrying - instead of retrying thinking this was a weird
internal error.
For example the test
pytest --local test_table.py::test_create_and_delete_table
now fails immediately, saying "Unsupported operation DeleteTable".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The structure's name in DescribeTable's output is supposed to be called
"Table", not "TableDescription". Putting it in the wrong place caused the
driver's table creation waiters to fail.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Validate the table name in CreateTable, and if it doesn't fit DynamoDB's
requirement, return the appropriate error as drivers expect.
With this patch, test_table.py::test_create_table_unsupported_names
now passes (albeit with a one minute pause - this is a bug with keep-alive
support...).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Check the expected error message to contain just ValidationException
instead of an overly specific text message from DynamoDB, so we aren't
so constrained in our own messages' wording.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Dynamo allows table names up to 255 characters, but when this is tested on
Alternator, the results are disastrous: mkdir with such a long directory
name fails, Scylla considers this an unrecoverable "I/O error", and exits
the server.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Start to use "test fixtures" defined in conftest.py: The connection to
the DynamoDB API, and also temporary tables, can be reused between multiple
tests.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This initial implementation is enough to pass a test of getting a
failure for a non-existent table -
test_table.py::test_describe_table_non_existent_table
and to recognize an existing table. But it's still missing a lot
of fields for an existing table (among others, the schema).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Exceptions from the handlers need to be output in a certain way - as
a JSON with specific fields - as DynamoDB drivers expect them to be.
If a handler throws an alternator::api_error with these specific fields,
they are output, but any other exception is converted into the same
format as an "Internal Error".
After this patch, executor code can throw an alternator::api_error and
the client will receive this error in the right format.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DynamoDB error messages are returned in JSON format and expect specific
information: Some HTTP error code (often but not always 400), a string
error "type" and a user-readable message. Code that wants to return
user-visible exceptions should use this type, and in the next patch we
will translate it to the appropriate JSON string.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
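The wire format is a JSON body of this general shape (field names per the public DynamoDB protocol; the helper below is only an illustration, not the actual api_error type):

```python
import json

def make_api_error(error_type, message, http_code=400):
    # DynamoDB drivers parse the error type from the '__type' field and
    # show 'message' to the user.
    body = json.dumps({
        '__type': 'com.amazonaws.dynamodb.v20120810#' + error_type,
        'message': message,
    })
    return http_code, body

code, body = make_api_error('ResourceNotFoundException',
                            'Requested resource not found')
assert code == 400
assert json.loads(body)['__type'].endswith('ResourceNotFoundException')
```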
The "Timestamp" type returned for CreationDateTime can be one of several
things but if it is a number, it is supposed to be the time in *seconds*
since the epoch - not in milliseconds. Returning milliseconds as we
wrongly did causes boto3 (AWS's Python driver) to throw a parse exception
on this response.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.
With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The interface works on port 8000 by default and provides
the most basic alternator operations - it's an incomplete
set without validation, meant to allow testing as early as possible.
Some sstable objects correspond to sstables which are being written
and are not sealed yet. Such sstables don't have all the fields
filled-in. Tools which calculate statistics (like GDB scripts) need to
distinguish such sstables.
There is no reason to keep parts of the Scylla Metadata component in memory
after it is read, parsed, and its information fed into the SSTable.
We have seen systems in which the Scylla metadata component is one
of the heaviest memory users, more than the Summary and Filter.
In particular, we use the token metadata, which is the largest part of the
Scylla component, to calculate a single piece of information: the shards that are
responsible for this SSTable. Once we do that, we never use it again.
Tests: unit (release/debug), + manual scylla write load + reshard.
Fixes #4951
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Introduce the mutation_fragment_stream_validator class and use it as a
filter to flat_mutation_reader::consume_in_thread from
sstable::write_components to validate partition region and optionally
clustering key monotonicity.
Fixes #4803
Key monotonicity validation incurs overhead, as it must store the last
key and compare against it, so provide an option to enable/disable it
(disabled by default).
Refs #4804
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Storing and comparing keys is expensive.
Add a flag to enable/disable this feature (disabled by default).
Without the flag, only the partition region monotonicity is
validated, allowing repeated clustering rows, regardless of
clustering keys.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The respective constructor is explicit.
Define this assignment operator to be used by flat_mutation_reader
mutation_fragment_stream_validator filter so that it can use
mutation_fragment::position() verbatim and keep its internal
state as position_in_partition.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Recently we started to use more the concept of metric labels - several
metrics which share the same name, but differ in the value of some label
such a "group" (for different scheduling groups).
This patch documents this feature in docs/metrics.md, gives the example of
scheduling groups, and explains a couple more relevant Prometheus syntax
tricks.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
* seastar cb7026c16f...b3fb4aaab3 (10):
> Revert "scheduling groups: Adding per scheduling group data support"
> scheduling groups: Adding per scheduling group data support
> rpc: check that two servers are not created with the same streaming id
> future: really ignore exceptions in ignore_ready_future
> iostream: Constify eof() function
> apply.hh: add missing #include for size_t
> scheduling_group_demo: add explicit yields since future::get() no longer does
> Fix buffer size used when calling accept4()
> future-util: reduce allocations and continuations in parallel_for_each
> rpc: lz4_decompressor: Add a static constexpr variable declaration for Cpp14 compatibility
Previously, the gate could get
closed too early, which would result in shutting down the server
before it had an opportunity to respond to the client.
Refs #4818
"
The release notes for boost 1.67.0 includes:
Breaking Change: When converting a multiprecision integer to a narrower type, if the value is too large (or negative) to fit in the smaller type, then the result is either the maximum (or minimum) value of the target type.
Since we just moved out of boost 1.66, we have to update our code.
This fixes issue #4960
"
* 'espindola/fix-4960' of https://github.com/espindola/scylla:
types: fix varint to integer conversion
types: extract a from_varint_to_integer from make_castas_fctn_from_decimal_to_integer
types: fix decimal to integer conversion
types: extract helper for converting a decimal to a cppint
types: rename and detemplate make_castas_fctn_from_decimal_to_integer
"
With this patch series one has to be explicit to create a date_type_impl and now there is only the one documented difference between date_type_impl and timestamp_type_impl.
"
* 'espindola/simplify-date-type' of https://github.com/espindola/scylla:
types: Reduce duplication around date_type_impl
types: Don't use date_type_native_type when we want a timestamp
types: Remove timestamp_native_type
types: Don't specialize data_type_for for db_clock::time_point
types: Make it harder to create date_type
According to the comments, the only difference between date_type_impl
and timestamp_type_impl is the comparison function.
This patch makes that explicit by merging all code paths except:
* The warning when converting between the two
* The compare function
The date_type_impl type can still be user visible via very old
sstables or via the thrift protocol. It is not clear if we still need
to support either, but with this patch it is easy to do so.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
In these cases it is pretty clear that the original code wanted to
create a timestamp_type data_value but was creating a date_type one
because of the old defaults.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Now that we know that anything expecting a date_type has been
converted to date_type_native_type, switch to using
db_clock::time_point when we want a timestamp_type.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This also moves every user to date_type_native_type. A followup patch
will convert to timestamp_type when appropriate.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
date_type was replaced with timestamp_type, but it was very easy to
create a date_type instead of a timestamp_type by accident.
This patch changes the code so that a date_type is no longer
implicitly used when constructing a data_value. All existing code that
was depending on this is converted to explicitly using
date_type_native_type. A followup patch will convert to timestamp_type
when appropriate.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Commit 7e3805ed3d removed the load balancing code from the cql
server, but it did not remove most of the cruft that load balancing
introduced. Most of the complexity (and probably the main reason the
code never worked properly) is around the service::client_state class, which
is copied before being passed to the request processor (because in the past
the processing could have happened on another shard) and then merged back
into the "master copy", because request processing may have changed it.
This commit removes all this copying. The client state is passed as a
reference all the way to the lowest layer that needs it, and its copy
construction is removed to make sure nobody copies it by mistake.
tests: dev, default c-s load of 3 node cluster
Message-Id: <20190906083050.GA21796@scylladb.com>
"
This avoids a double dispatch on _kind and also removes a few shared_ptr copies.
The extra work was a small regression from the recent types refactoring.
"
* 'espindola/optimize_type_find' of https://github.com/espindola/scylla:
types: optimize type find implementation
types: Avoid shared_ptr copies
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.
Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.
Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
The previous code was using the boost::multiprecision::cpp_int to
integer conversion, but that doesn't have the same semantics as CQL
for signed numbers.
This fixes the dtest cql_cast_test.py:CQLCastTest.cast_varint_test.
Fixes #4960
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The previous code was using the boost::multiprecision::cpp_rational to
integer conversion, but that doesn't have the same semantics as CQL.
This patch avoids creating a cpp_rational in the first place and works
just with integers.
This fixes the dtest cql_cast_test.py:CQLCastTest.cast_decimal_test.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.
Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).
Fixes #4912.
Tests: unit (dev)
Currently the background schema sync (push/pull) uses frozen mutation to
send the schema mutations over the wire to the remote node. For this to
work correctly, both nodes have to have the exact same schema for the
system schema tables, as attempting to unpack the frozen mutation with
the wrong schema leads to undefined behaviour.
To avoid this, and to ensure that syncing schema between nodes with
different schema table versions is well defined, we migrate the background
schema sync to use canonical mutations for the transfer of the schema
mutations. Canonical mutations are immune to this problem, as they
support deserialization with any version of the schema, older or newer.
The foreground schema sync mechanisms -- the on-demand schema pulls on
reads and writes -- already use canonical mutations to transmit the
schema mutations.
It is important to note that due to this change, column-level
incompatibilities between the schema mutations and the schema used to
deserialize them will be hidden. This is undesired and should be fixed
in a follow-up (#4956). Table-level incompatibilities are detected, and
schema mutations containing such incompatibilities will be rejected just like before.
This patch adds canonical mutation support to the two background schema
sync verbs:
* `DEFINITIONS_UPDATE` (schema push)
* `MIGRATION_REQUEST` (schema pull)
Both verbs still support the old frozen mutation schema transfer, although
that path is now much less efficient. After all nodes are upgraded, the
pull verb can effectively avoid sending frozen mutations altogether,
completely migrating to canonical mutations. Unfortunately this was not
possible for the push verb, so that one now has an overhead, as it needs
to send both the frozen and canonical mutations.
Fixes: #4273
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.
Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.
Fixes #4948
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
The verbs are:
* DEFINITIONS_UPDATE (push)
* MIGRATION_REQUEST (pull)
Support was added in a backward-compatible way. The push verb sends
both the old frozen mutation parameter and the new optional canonical
mutation parameter. It is expected that new nodes will use the latter,
while old nodes will fall back to the former. The pull verb has a new
optional `options` parameter, which for now contains a single flag:
`remote_supports_canonical_mutation_retval`. This flag, if set, means
that the remote node supports the new canonical mutation return value,
thus the old frozen mutations return value can be left empty.
In preparation for the schema push/pull migrating to use canonical
mutations, convert the method producing the schema mutations to return a
vector of canonical mutations. The only user, MIGRATION_REQUEST verb,
converts the canonical mutations back to frozen mutations. This is very
inefficient, but this path will only be used in mixed clusters. After
all nodes are upgraded the verb will be sending the canonical mutations
directly instead.
This turns find into a template so there is only one switch over the
kind of each type in the search.
To evaluate the change in code size, I added [[noinline]] to
find and obtained the following results.
The release column in the before case has an extra term
because the functions are sufficiently complex to trigger gcc to split
them into hot + cold parts.
before:
                        dev                        release (hot + cold split)
  find                  0x35f = 863                0x3d5 + 0x112 = 1255
  references_duration   0x62 + 0x22 + 0x8 = 140    0x55 + 0x1f + 0x2a + 0x8 = 166
  references_user_type  0x6b + 0x26 + 0x111 = 418  0x65 + 0x1f + 0x32 + 0x11b = 465
after:
                        dev                        release
  find                  0xd6 + 0x1b4 = 650         0xd2 + 0x1f5 = 711
  references_duration   0x13 = 19                  0x13 = 19
  references_user_type  0x1a = 26                  0x21 = 33
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
They are somewhat expensive (in code size at least) and not needed
everywhere.
Inside the getter the variables are 'const data_type&', so we can
return that. Everything still works when a copy is needed, but in code
that just wants to check a property we avoid the copy.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
During CQL request processing, a gate is used to ensure that
the connection is not shut down until all ongoing requests
are done. However, the gate might have been left too early
if the database was not ready to respond immediately - which
could result in trying to respond to an already closed connection
later. This issue is solved by postponing leaving the gate
until the continuation chain that handles the request is finished.
Refs #4808
* 'cleanup_sstables' of https://github.com/asias/scylla:
sstables: Move leveled_compaction_strategy implementation to source file
sstables: Include dht/i_partitioner.hh for dht::partition_range
Since nonroot mode requires running everything as a non-privileged user,
most of the setup scripts are not able to work in nonroot mode.
We only provide the following functions in nonroot mode:
- EC2 check
- IO setup
- Node exporter installer
- Dev mode setup
The rest of the functions will be skipped by scylla_setup.
To implement nonroot mode in the setup scripts, scylla_util provides
utility functions that abstract the differences in directory structure between
a normal installation and nonroot mode.
Since systemd units can override parameters using drop-in units, we don't need
mustache templates for them.
Also, drop the --disttype and --target options from install.sh since they are
no longer required, and introduce --sysconfdir instead for non-redhat distributions.
Since ac9b115, we have switched to install.sh on Debian, so we no longer rely on
.deb-specific packaging scripts.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Merged patch series by Amnon Heiman:
This patch fixes a bug where a map is held on the stack and then used
by a future.
Instead, the map is now moved into the relevant lambda function.
Fixes #4824
Stopping services, which occurs in the destructor of a deferred_action,
should not throw, or it will end the program with
terminate(). The view builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
To fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.
Fixes #4875
To prevent termination with SIGILL, tighten the instruction set
support checks. First, check for CLMUL too. Second, add a check in
scylla_prepare to catch the problem early.
Fixes #4921.
Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them.
There is a check in main(), but that happens after the code is running and it may
already be too late. Add a check in scylla_prepare which runs before the main
executable.
"
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.
We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.
Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.
In true systemd fashion, the hierarchy is implicit in the file names of the
slice files: a "-" symbol defines the hierarchy, so the files that this
patch introduces, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.
Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.
Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.
Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.
One can then run something like:
sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
(or even better, the backup tool can itself be a systemd timer)
"
* 'slices' of https://github.com/glommer/scylla:
systemd: put scylla processes in systemd slices.
move postinst steps to an external script
"
The warning for discarded futures will only become useful once we can
silence all present warnings and flip the flag to turn it into an error.
Then it will start being useful in finding new, accidental discarding of
futures.
This series silences all remaining warnings in the Scylla codebase. For
those cases where it was obvious that the future is discarded on
purpose, with the author having taken all necessary precautions (handling
exceptions), the warning was simply silenced by casting the future to void and
adding a relevant comment. Where the discarding seems to have been done
in error, I have fixed the code not to discard it. To the rest of the
sites I added a FIXME to fix the discarding.
"
* 'resolve-discarded-future-warnings/v4.2' of https://github.com/denesb/scylla:
treewide: silence discarded future warnings for questionable discards
treewide: silence discarded future warnings for legit discards
tests: silence discarded future warnings
tests/cql_query_test.cc: convert some tests to thread
This patch silences the remaining discarded future warnings, those
where it cannot be determined with reasonable confidence that the discard
was indeed the actual intent of the author, or where the discarding of the
future could lead to problems. For all those places a FIXME is added,
with the intent that these will soon be followed up with an actual fix.
I deliberately haven't fixed any of these, even if the fix seems
trivial. It is too easy to overlook a bad fix mixed in with so many
mechanical changes.
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they took the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
Some tests are currently discarding futures unjustifiably, however
adding code to wait on these futures is quite inconvenient due to the
continuation style code of these tests. Convert them to run in a seastar
thread to make the fix easier.
Introduced in c96ee98.
We call update_schema_version() after features are enabled and we
recalculate the schema version. This method does not update gossip,
though. The node will still use its database::version() to decide on
syncing, so it will not sync and will stay inconsistent in gossip until the
next schema change.
We should call update_schema_version_and_announce() instead so that
the gossip state is also updated.
There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unblock it.
Fixes #4647.
Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
* seastar afc5bbf511...20bfd61955 (18):
> reactor: closing file used to check if direct_io is supported
> future: set_coroutine(): s/state()/_state/
> tests/perf/perf_test.hh: suppress discarded future warning
> tests: rpc: fix memory leak in timeout wraparound tests
> Revert "future-util: reduce allocations and continuations in parallel_for_each"
> reactor: fix rename_priority_class() build failure in C++14 mode
> future: mark future_state_base::failed() as unlikely
> future-util: reduce allocations and continuations in parallel_for_each
> future-utils: generalize when_all_estimate_vector_capacity()
> output_stream: Add comment on sequentiality
> docs/tutorial.md: minor cleanups in first section
> core: fix a race in execution stages (Fixes#4856, fixes#4766)
> semaphore: use semaphore's clock type in with_semaphore()/get_units()
> future: fix doxygen documentation for promise<>
> sharded: fixed detecting stop method when building with clang
> reactor: fixed locking error in rename_priority_class
> Assert that append_challenged_posix_file_impl are closed.
> rpc: correctly handle huge timeouts
Merged patch series from Amnon Heiman <amnon@scylladb.com>
This patch adds an implementation of the get built indexes API and removes a
FIXME.
The API returns a list of secondary indexes that belong to a column family
and have already been fully built.
Example:
CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);
$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
The sum_ratio struct is a helper struct that is used when calculating a
ratio over multiple shards.
Originally it was created thinking that it might need to use a future; in
practice the future was never used and was ignored.
This patch removes the future from the implementation and eliminates an
unhandled-future warning from the compilation.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds an implementation of the get built indexes API and removes a
FIXME.
The API returns the list of the built secondary indexes belonging to a column family.
Example:
CREATE KEYSPACE scylla_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE scylla_demo.mytableID ( uid uuid, text text, time timeuuid, PRIMARY KEY (uid, time) );
CREATE index on scylla_demo.mytableID (time);
$ curl -X GET 'http://localhost:10000/column_family/built_indexes/scylla_demo%3Amytableid'
["mytableid_time_idx"]
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
When a role is created through the `create role` statement, the
'is_superuser' and 'can_login' columns are set to false by default.
Likewise, `list roles`, `alter roles` and `* roles` operations
expect to find a boolean when reading the same columns.
This is not the case, though, when a user inserts directly into
`system_auth.roles` and doesn't set those columns. Even though
manually creating roles is not a desired day-to-day operation,
it is an insert just like any other and it should work.
`* roles` operations, on the other hand, are not prepared for
this deviation. If a user manually creates a role and doesn't
set boolean values for those columns, `* roles` will return all
sorts of errors. This happens because `* roles` explicitly
expects a boolean and casts to it.
This patch makes `* roles` friendlier by considering the
boolean variable `false` - inside the `* roles` context - if the
actual value is `null`; it won't change the stored `null` value.
Fixes #4280
Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20190816032617.61680-1-juliana@scylladb.com>
The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no
longer relevant for Scylla deployments, as Red Hat dropped support for
the file system in 2017.
Message-Id: <20190823114013.31112-1-penberg@scylladb.com>
The priority class the shard reader was created with was hardcoded to be
`service::get_local_sstable_query_read_priority()`. At the time this
code was written, priority classes could not be passed to other shards,
so this method, receiving its priority class parameters from another
shard, could not use it. This is now fixed, so we can just use whatever
the caller wants us to use.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>
Cartesian products (generated by IN restrictions) can grow very large,
even for short queries. This can overwhelm server resources.
Add limit checking for cartesian products, and configuration items for
users that are not satisfied with the default of 100 records fetched.
Fixes #4752.
Tests: unit (dev), manual test with SIGHUP.
Cartesian products (via IN restrictions) make it easy to generate huge
primary key sets with simple queries, overflowing server resources. Limit
them in the coordinator and report an exception instead of trying to
execute a query that would consume all of our memory.
A unit test is added.
We need a way to configure the cql interpreter and runtime. So far we relied
on accessing the configuration class via various backdoors, but that causes
its own problems around initialization order and testability. To avoid that,
this patch adds an empty cql_config class and propagates it from main.cc
(and from tests) to the cql interpreter via the query_options class, which is
already passed everywhere.
Later patches will fill it with contents.
This was broken since the type refactoring. It was checking the static
type, which is always abstract_type. Unfortunately we only had dtests
for this.
This can probably be optimized to avoid the double switch over kind,
but it is probably better to do the simple fix first.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190821155354.47704-1-espindola@scylladb.com>
Currently we create a regex from the LIKE pattern for every row
considered during filtering, even though the pattern is always the
same. This is wasteful, especially since we require costly
optimization in the regex compiler. Fix it by reusing the regex
whenever the pattern is unchanged since the last call.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The loop over view update handlers used a guard to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacing a 'for' loop with a 'while' loop, which moves
the iterator increment inside the scope in which it's still
guarded and valid.
Fixes #4866
Currently, seastar is built in seastar/build/{mode}. This means we have two build
directories: build/{mode} and seastar/build/{mode}.
This patch changes that to have only a single build directory (build/{mode}). It
does that by calling Seastar's cmake directly instead of through Seastar's
./configure.py. However, to support dpdk, if that is enabled it calls cmake
through Seastar's ./cooking.sh (similar to what Seastar's ./configure.py does).
All ./configure.py flags are translated to cmake variables, in the same way that
Seastar does.
Contains fix from Rafael to pass the flags for the correct mode.
This clarifies that "rows" are clustering rows and that there is no
information about individual collection elements.
The patch also documents some properties common to all these tables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190820171204.48739-1-espindola@scylladb.com>
The endpoint directories scanned by space_watchdog may get deleted
by manager::drain_for().
If a deleted directory is given to lister::scan_dir(), this will result
in an exception, and as a consequence space_watchdog will skip this round
and hinted handoff will be disabled (for all agents, including MVs)
for the whole space_watchdog round.
Let's make sure this doesn't happen by serializing the scanning and deletion
using end_point_hints_manager::file_update_mutex.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
end_point_hints_manager::file_update_mutex is taken by space_watchdog,
but while space_watchdog is waiting for it, the corresponding
end_point_hints_manager instance may get destroyed by manager::drain_for()
or by manager::stop().
This will end up in a use-after-free.
Let's change the end_point_hints_manager's API in a way that would prevent
such an unsafe locking:
- Introduce the with_file_update_mutex().
- Make end_point_hints_manager::file_update_mutex() method private.
Fixes #4685
Fixes #4836
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
If drain_for() is running concurrently with itself (one instance for the local
node and one for some other node), erasing elements from the _ep_managers
map may lead to a use-after-free.
Let's serialize drain_for() calls with a semaphore.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.
Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.
Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.
We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.
dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).
Fixes #4673.
Non-full prefix keys are currently not handled correctly as all keys
are treated as if they were full prefixes, and therefore they represent
a point in the key space. Non-full prefixes however represent a
sub-range of the key space and therefore require null extending before
they can be treated as a point.
As a quick reminder, `key` is used to trim the clustering ranges such
that they only cover positions >= the key. Thus,
`trim_clustering_row_ranges_to()` does the equivalent of intersecting
each range with (key, inf). When `key` is a prefix, this would exclude
all positions that are prefixed by key as well, which is not desired.
Fixes: #4839
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190819134950.33406-1-bdenes@scylladb.com>
"
Follow-up to #4610, where a review comment asked for test coverage on all types. Existing tests cover all the types admissible in LIKE, while this PR adds coverage for all inadmissible types.
Tests: unit (dev)
"
* 'like-nonstring' of https://github.com/dekimir/scylla:
cql_query_test: Add LIKE tests for all types
cql_query_test: Remove LIKE-nonstring-pattern case
cql_query_test: Move a testcase elsewhere in file
In b197924, we changed the shutdown process not to rely on the global
reactor-defined exit, but instead added a local variable to hold the
shutdown state. However, we did not propagate that state everywhere,
and now streaming processes are not able to abort.
Fix that by enhancing stop_signal with a sharded<abort_source> member
that can be propagated to services. Propagate it to storage_service
and thence to boot_strapper and range_streamer so that streaming
processes can be aborted.
Fixes #4674
Fixes #4501
Tests: unit (dev), manual bootstrap test
"
Streamed view updates parasitized on the writing io priority, which is
reserved for user writes; they are now properly bound to the streaming
write priority.
Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}
Tests: unit(dev)
"
* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
db,view: wrap view update generation in stream scheduling group
database: assign proper io priority for streaming view updates
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.
Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.
Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.
We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.
dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).
Fixes #4673.
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.
We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.
Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by SystemD in the form of slices: a
hierarchical structure to which controllers can be attached.
In true systemd fashion, the hierarchy is implicit in the file names of the
slice files: a "-" symbol defines the hierarchy, so the files that this
patch introduces, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.
Later we mark the Services needed to run scylla as belonging to one
or the other through the Slice= directive.
Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.
Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.
One can then run something like:
```
sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
```
(or even better, the backup tool can itself be a systemd timer)
Changes from last version:
- No longer use the CPUQuota
- Minor typo fixes
- postinstall fixup for small machines
Benchmark results:
==================
Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs).
We have to fill the cache as we read, so this should stress CPU, memory and
disk I/O.
cassandra-stress command:
```
cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\)
```
Baseline results:
```
Results:
Op rate : 13,830 op/s [READ: 13,830 op/s]
Partition rate : 13,830 pk/s [READ: 13,830 pk/s]
Row rate : 13,830 row/s [READ: 13,830 row/s]
Latency mean : 1.4 ms [READ: 1.4 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.8 ms [READ: 2.8 ms]
Latency 99.9th percentile : 3.4 ms [READ: 3.4 ms]
Latency max : 12.0 ms [READ: 12.0 ms]
Total partitions : 4,149,130 [READ: 4,149,130]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Question 1:
===========
Does putting scylla in a special slice affect its performance ?
Results with Scylla running in a slice:
```
Results:
Op rate : 13,811 op/s [READ: 13,811 op/s]
Partition rate : 13,811 pk/s [READ: 13,811 pk/s]
Row rate : 13,811 row/s [READ: 13,811 row/s]
Latency mean : 1.4 ms [READ: 1.4 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.2 ms [READ: 2.2 ms]
Latency 99th percentile : 2.6 ms [READ: 2.6 ms]
Latency 99.9th percentile : 3.3 ms [READ: 3.3 ms]
Latency max : 23.2 ms [READ: 23.2 ms]
Total partitions : 4,151,409 [READ: 4,151,409]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion* : No significant change
Question 2:
===========
What happens when there is a CPU hog running on the same server as scylla?
CPU hog:
```
taskset -c 0 /bin/sh -c "while true; do true; done" &
taskset -c 1 /bin/sh -c "while true; do true; done" &
taskset -c 2 /bin/sh -c "while true; do true; done" &
taskset -c 3 /bin/sh -c "while true; do true; done" &
sleep 330
```
Scenario 1: CPU hog runs freely:
```
Results:
Op rate : 2,939 op/s [READ: 2,939 op/s]
Partition rate : 2,939 pk/s [READ: 2,939 pk/s]
Row rate : 2,939 row/s [READ: 2,939 row/s]
Latency mean : 6.8 ms [READ: 6.8 ms]
Latency median : 5.3 ms [READ: 5.3 ms]
Latency 95th percentile : 11.0 ms [READ: 11.0 ms]
Latency 99th percentile : 14.9 ms [READ: 14.9 ms]
Latency 99.9th percentile : 17.1 ms [READ: 17.1 ms]
Latency max : 26.3 ms [READ: 26.3 ms]
Total partitions : 884,460 [READ: 884,460]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Scenario 2: CPU hog runs inside scylla-helper slice
```
Results:
Op rate : 13,527 op/s [READ: 13,527 op/s]
Partition rate : 13,527 pk/s [READ: 13,527 pk/s]
Row rate : 13,527 row/s [READ: 13,527 row/s]
Latency mean : 1.5 ms [READ: 1.5 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile : 3.8 ms [READ: 3.8 ms]
Latency max : 18.7 ms [READ: 18.7 ms]
Total partitions : 4,069,934 [READ: 4,069,934]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion*: With systemd slice we can keep the performance very close to
baseline
Question 3:
===========
What happens when there is an I/O hog running on the same server as scylla?
I/O hog: (Data in the cluster is 2x size of memory)
```
while true; do
find /var/lib/scylla/data -type f -exec grep glauber {} +
done
```
Scenario 1: I/O hog runs freely:
```
Results:
Op rate : 7,680 op/s [READ: 7,680 op/s]
Partition rate : 7,680 pk/s [READ: 7,680 pk/s]
Row rate : 7,680 row/s [READ: 7,680 row/s]
Latency mean : 2.6 ms [READ: 2.6 ms]
Latency median : 1.3 ms [READ: 1.3 ms]
Latency 95th percentile : 7.8 ms [READ: 7.8 ms]
Latency 99th percentile : 10.9 ms [READ: 10.9 ms]
Latency 99.9th percentile : 16.9 ms [READ: 16.9 ms]
Latency max : 40.8 ms [READ: 40.8 ms]
Total partitions : 2,306,723 [READ: 2,306,723]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Scenario 2: I/O hog runs in the scylla-helper systemd slice:
```
Results:
Op rate : 13,277 op/s [READ: 13,277 op/s]
Partition rate : 13,277 pk/s [READ: 13,277 pk/s]
Row rate : 13,277 row/s [READ: 13,277 row/s]
Latency mean : 1.5 ms [READ: 1.5 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile : 3.5 ms [READ: 3.5 ms]
Latency max : 183.4 ms [READ: 183.4 ms]
Total partitions : 3,984,080 [READ: 3,984,080]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion*: With the systemd slice we keep performance very close to the
baseline
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Propagate the abort_source from main() into boot_strapper and range_stream and
check for aborts at strategic points. This includes aborting running stream_plans
and aborting sleeps between retries.
Fixes #4674
In order to propagate stop signals, expose them as sharded<abort_source>. This
allows propagating the signal to all shards, and integrating it with
sleep_abortable().
Because sharded<abort_source>::stop() will block, we'll now require stop_signal
to run in a thread (which is already the case).
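The sleep_abortable() integration can be illustrated without Seastar. Below is a minimal plain-C++ stand-in (not Seastar's actual abort_source API; names and semantics are a sketch) showing the core idea: request_abort() wakes an abortable sleep early instead of letting a retry delay run to completion:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal stand-in for an abort source: request_abort() wakes any
// sleep_abortable() call early instead of letting it sleep the full duration.
class abort_source {
    std::mutex _mtx;
    std::condition_variable _cv;
    std::atomic<bool> _aborted{false};
public:
    void request_abort() {
        {
            // Take the lock so a waiter cannot miss the notification
            // between its predicate check and going to sleep.
            std::lock_guard<std::mutex> lk(_mtx);
            _aborted = true;
        }
        _cv.notify_all();
    }

    bool aborted() const { return _aborted; }

    // Returns true if the full duration elapsed, false if aborted early.
    bool sleep_abortable(std::chrono::milliseconds d) {
        std::unique_lock<std::mutex> lk(_mtx);
        return !_cv.wait_for(lk, d, [this] { return _aborted.load(); });
    }
};
```

In Seastar the same pattern is spread across shards via sharded<abort_source>, so one stop signal can interrupt sleeps and stream plans everywhere.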
This testcase was previously commented out, pending a fix that cannot
be made. Currently it is impossible to validate the marker-value type
at filtering time. The value is entered into the options object under
its presumed type of string, regardless of what it was made from.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Somehow this test case sits in the middle of LIKE-operator tests:
test_alter_type_on_compact_storage_with_no_regular_columns_does_not_crash
Move it so LIKE test cases are contiguous.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
There are systemd-related steps done in both rpm and deb builds.
Move that to a script so we avoid duplication.
The tests are so far a bit specific to the distributions, so they
need to be adapted a bit.
Also note that this also fixes a bug with rpm as a side-effect:
rpm does not call daemon-reload after potentially changing the
systemd files (it is only implied during postun operations, that
happen during uninstall). daemon-reload was called explicitly for
debian packages, and now it is called for both.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch fixes a bug where a map was held on the stack and then used
by a future after the enclosing frame returned.
Instead, the map is now wrapped with do_with, tying its lifetime to the future.
Fixes#4824
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
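do_with moves an object into the continuation chain so it outlives the enclosing frame. A plain-C++ analogue of the fix (hypothetical names, not the Scylla code) moves ownership of the map into the asynchronous task instead of letting the task reference the stack copy:

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <string>

// Buggy pattern (sketch): capturing the stack map by reference would let
// the task read freed memory once count_entries() returns.
// Fix analogue: move the map into the task so its lifetime matches the
// future's -- which is what wrapping it with do_with achieves in Seastar.
std::future<std::size_t> count_entries() {
    std::map<std::string, int> m{{"a", 1}, {"b", 2}};
    // Move m into the lambda; the async task now owns it.
    return std::async(std::launch::async,
                      [owned = std::move(m)] { return owned.size(); });
}
```

Calling `count_entries().get()` is safe even though the original stack frame is long gone, because the task owns the map.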
2019-08-12 14:04:00 +03:00
733 changed files with 20047 additions and 6325 deletions
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
throw exceptions::invalid_request_exception(format("Cannot set the value of counter column {} (counters can only be incremented/decremented, not set)", receiver.name_as_text()));
throw exceptions::invalid_request_exception(format("Invalid restrictions on clustering columns since the {} statement modifies only static columns", type));
throw exceptions::invalid_request_exception(format("Cannot add new field {} of type {} to type {} as this would create a circular reference", _field_name->to_string(), _field_type->to_string(), _name.to_string()));
String.format("DELETE statements must restrict all PRIMARY KEY columns with equality relations in order " +
              "to use IF conditions, but column '%s' is not restricted", def.name));