Commit Graph

3370 Commits

Author SHA1 Message Date
Gleb Natapov
e0668f806a lwt: change format of partition key serialization for system.paxos table
Serialize the provided partition_key in such a way that the serialized value
will hash to the same token as the original key. This way, when the system.paxos
table is updated, the update is shard-local.

Message-Id: <20191114135449.GU10922@scylladb.com>
2019-11-14 15:07:16 +01:00
Avi Kivity
19b665ea6b Merge "Correctly handle null/unset frozen collection/UDT columns in INSERT JSON." from Kamil
"
When using INSERT JSON with frozen collection/UDT columns, if the columns were left unspecified or set to null, the statement would create an empty non-null value for these columns instead of using null values as it should have. For example:

cqlsh:b> create table t (k text primary key, l frozen<list<int>>, m frozen<map<int, int>>, s frozen<set<int>>, u frozen<ut>);
cqlsh:b> insert into t JSON '{"k": "insert_json"}';
cqlsh:b> select * from t;
 k           | l  | m  | s  | u
-------------+----+----+----+---
 insert_json | [] | {} | {} |

This PR fixes this.
Resolves #5246 and closes #5270.
"

* 'frozen-json' of https://github.com/kbr-/scylla:
  tests: add null/unset frozen collection/UDT INSERT JSON test
  cql3: correctly handle frozen null/unset collection/UDT columns in INSERT JSON
  cql3: decouple execute from term binding in user_type::setter
2019-11-14 15:29:30 +02:00
Tomasz Grabiec
f68e17eb52 Merge "Partition/row hit/miss counters for memtable write operations" from Piotr D.
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.
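
A minimal sketch of how such counters could be maintained, assuming a toy memtable keyed by partition key and clustering key in which every write touches exactly one row. ToyMemtable and MemtableStats are hypothetical names used only for illustration; only the four counter names mirror the metrics listed above.

```python
from dataclasses import dataclass


@dataclass
class MemtableStats:
    partition_writes: int = 0  # all writes to partitions
    partition_hits: int = 0    # writes to partitions that already existed
    row_writes: int = 0        # all row writes
    row_hits: int = 0          # row writes that overwrote an existing row


class ToyMemtable:
    """Hypothetical in-memory table: partition key -> {clustering key -> value}."""

    def __init__(self):
        self.partitions = {}
        self.stats = MemtableStats()

    def write(self, pk, ck, value):
        self.stats.partition_writes += 1
        if pk in self.partitions:
            self.stats.partition_hits += 1
        rows = self.partitions.setdefault(pk, {})

        self.stats.row_writes += 1
        if ck in rows:
            self.stats.row_hits += 1
        rows[ck] = value
```

With counters like these, the ratio of hits to writes gives a rough measure of how often writes reuse data already held in the memtable.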

Tests: unit(release)
2019-11-13 13:11:51 +01:00
Kamil Braun
d6446e352e tests: add null/unset frozen collection/UDT INSERT JSON test
When using INSERT JSON with null/unspecified frozen collection/UDT
columns, the columns should be set to null.

See #5270.
2019-11-12 18:24:47 +01:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-12 13:35:41 +01:00
Piotr Dulikowski
41cb16a526 tests/cql_query_test: add aggregate functions test
Adds a test for min, max and avg functions for those primitive types for
which those functions are working at the moment.
2019-11-12 13:01:34 +01:00
Nadav Har'El
3f859adebd Merge: Fix filtering static columns on empty partitions
Merged patch series from Piotr Sarna:

An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.
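
The behaviour described above can be sketched roughly as follows. This is an illustrative Python model, not the Scylla filtering code: filter_partition, the dict-based rows and the predicate callback are all assumptions made for the example.

```python
def filter_partition(static_row, rows, predicate):
    """Return the result rows of one partition that satisfy `predicate`.

    static_row: dict of static column values, or None if the partition has none.
    rows: list of dicts holding the regular (clustering) rows.
    """
    static = static_row or {}
    if not rows:
        # An otherwise empty partition is still represented by its static row,
        # so the predicate has to be checked against it as well.
        return [dict(static)] if static and predicate(static) else []
    merged = [{**static, **row} for row in rows]
    return [row for row in merged if predicate(row)]
```

The key point is the `if not rows` branch: skipping the predicate there would return the static row of an empty partition even when it does not match the restriction.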

Fixes #5248
2019-10-31 10:50:21 +02:00
Avi Kivity
398c482cd0 Merge "combined reader gallop mode" from Piotr
"
When a single reader contributes a stream of fragments and keeps winning over the other readers, mutation_reader_merger enters gallop mode, in which it assumes that the reader will keep winning. Currently, a reader needs to contribute 3 fragments to enter that mode.

In gallop mode, each fragment returned by the galloping reader is compared with the best fragment from _fragment_heap. If it wins, the fragment is returned directly. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations.

In the current implementation, when the end of a partition is encountered while in gallop mode, gallop mode ends unconditionally.
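
As a rough illustration of the idea described above, the following Python sketch merges sorted streams with a heap and switches to a heap-free fast path once one stream has won three times in a row. It is not the mutation_reader_merger code: merge_with_gallop, GALLOP_THRESHOLD and the tuple-based ordering are hypothetical choices made for this example, and partition boundaries are not modelled.

```python
import heapq

GALLOP_THRESHOLD = 3  # consecutive wins needed to enter gallop mode (value from the text)
_END = object()       # sentinel marking an exhausted reader


def merge_with_gallop(readers):
    """Merge sorted iterables, avoiding heap operations while one reader keeps winning."""
    iters = [iter(r) for r in readers]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, _END)
        if first is not _END:
            heap.append((first, idx))
    heapq.heapify(heap)

    merged = []
    last_winner, streak = None, 0
    while heap:
        value, idx = heapq.heappop(heap)
        merged.append(value)
        streak = streak + 1 if idx == last_winner else 1
        last_winner = idx

        candidate = next(iters[idx], _END)
        # Gallop mode: compare only against the best heap entry and emit directly
        # for as long as the galloping reader keeps winning.
        while (candidate is not _END and streak >= GALLOP_THRESHOLD
               and (not heap or (candidate, idx) <= heap[0])):
            merged.append(candidate)
            candidate = next(iters[idx], _END)
        # Gallop ended (or never started): fall back to the general heap path.
        if candidate is not _END:
            heapq.heappush(heap, (candidate, idx))
    return merged
```

In the fast path each emitted item costs a single comparison instead of a heap push and pop, which is where a saving for long runs from one reader would come from.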

A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.

Tests: unit(release)
Fixes #3593.
"

* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
  mutation_reader: gallop mode microbenchmark
  mutation_reader: combined reader gallop tests
  mutation_reader: gallop mode for combined reader
  mutation_reader: refactor prepare_next
2019-10-30 17:34:47 +02:00
Piotr Sarna
dd00470a44 tests: add a test case for filtering on static columns
The test case covers filtering with an empty partition.

Refs #5248
2019-10-30 15:34:10 +01:00
Tomasz Grabiec
9da3aec115 Merge "Mutation diff improvements" from Benny
- accept diff_command option
- standard input support
2019-10-30 13:40:58 +01:00
Piotr Dulikowski
81883a9f2e mutation_reader: gallop mode microbenchmark
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.

As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.

Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)

test name                            before    after     improvement
one_row                              49.070ns  48.287ns  1.60%
single_active                        61.574us  61.235us  0.55%
many_overlapping                     488.193us 514.977us -5.49%
disjoint_interleaved                 57.462us  57.111us  0.61%
disjoint_ranges                      56.545us  56.006us  0.95%
overlapping_partitions_disjoint_rows 127.039us 80.849us  36.36%

Same results, normalized per mutation fragment:

test name                            before   after    improvement
one_row                              16.36ns  16.10ns  1.60%
single_active                        109.46ns 108.86ns 0.55%
many_overlapping                     216.97ns 228.88ns -5.49%
disjoint_interleaved                 102.15ns 101.53ns 0.61%
disjoint_ranges                      100.52ns 99.57ns  0.95%
overlapping_partitions_disjoint_rows 246.38ns 156.80ns 36.36%

Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
2019-10-30 09:51:18 +01:00
Piotr Dulikowski
29d6842db9 mutation_reader: combined reader gallop tests 2019-10-30 09:51:18 +01:00
Avi Kivity
623071020e commitlog: change variadic stream in read_log_file to future<struct>
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.

This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.

Tests: unit (dev)
2019-10-29 19:25:12 +01:00
Tomasz Grabiec
c2a4c915f3 Merge "Fix a few issues with CAS requests" from Vladimir D.
There are a few issues at the CQL layer, because of which the result of
a CAS request execution may differ between Scylla and Cassandra. Mostly,
it happens when static columns are involved. The goal of this patch set
is to fix these issues, thus making Scylla's implementation of CAS yield
the same results as Cassandra's.
2019-10-29 11:50:15 +01:00
Nadav Har'El
d69ab1b588 CDC: (atomic) delta + (non-optional) pre-image data columns
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:

Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.

Note that in this patch set the writes are still unconditional. Option
support comes in the next set.

Uses code more or less derived from alternator to select the pre-image, using
the raw query interface, so query generation should add fairly little overhead.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
2019-10-29 09:39:28 +02:00
Calle Wilund
7db393fe12 cdc_test: Add helper methods + preimage test
Add filtering, sorting etc helpers + simple pre-image test

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-29 07:49:05 +01:00
Vladimir Davydov
e0b31dd273 query: add flag to return static row on partition with no rows
A SELECT statement that has clustering key restrictions isn't supposed
to return static content if no regular row matches the restrictions,
see #589. However, for the CAS statement we do need to return static
content on failure so this patch adds a flag that allows the caller to
override this behavior.
2019-10-28 21:50:44 +03:00
Calle Wilund
36328acf60 cql_assertions: Change signature to accept sstring 2019-10-28 06:16:12 +01:00
Benny Halevy
1895fb276e mutation_test: test_udt_mutations: use int64_t constants for long_type
Otherwise they are decomposed and serialized as 4-byte int32.

For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}

and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 20:51:29 +02:00
Benny Halevy
fec772538c mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:49:25 +02:00
Benny Halevy
9c8cf9f51d mutation_test: test_udt_mutations: fixup udt comment
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:47:43 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.
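
The wrapping approach can be pictured with a small sketch. This is a synchronous Python analogy, assuming a plain list as a stand-in for a trace session; TracedFile and its message texts are hypothetical and do not reproduce the traced_file class, which wraps operations that return futures.

```python
import io


class TracedFile:
    """Wrap a file-like object and record a trace message around each read."""

    def __init__(self, wrapped, trace, name):
        self._wrapped = wrapped
        self._trace = trace  # list standing in for a trace session (assumption)
        self._name = name

    def read(self, size=-1):
        self._trace.append(f"{self._name}: scheduling read of {size} bytes")
        data = self._wrapped.read(size)
        self._trace.append(f"{self._name}: finished read, got {len(data)} bytes")
        return data


trace = []
data_file = TracedFile(io.BytesIO(b"abcdef"), trace, "data file")
chunk = data_file.read(4)
```

Because the wrapper exposes the same read interface as the object it wraps, callers need no changes beyond constructing the wrapper.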

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Avi Kivity
2b856a7317 Merge "Support non-frozen UDTs." from Kamil
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.

I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.

Fixes #2201.
"

* 'udt' of https://github.com/kbr-/scylla: (64 commits)
  tests: too many UDT fields check test
  collection_mutation: add a FIXME.
  tests: add a non-frozen UDT materialized view test
  tests: add a UDT mutation test.
  tests: add a non-frozen UDT "JSON INSERT" test.
  tests: add a non-frozen UDT to for_each_schema_change.
  tests: more non-frozen UDT tests.
  tests: move some UDT tests from cql_query_test.cc to new file.
  types: handle trailing nulls in tuples/UDTs better.
  cql3: enable deleting single fields of non-frozen UDTs.
  cql3: enable setting single fields of a non-frozen UDT.
  cql3: enable non-frozen UDTs.
  cql3: introduce user_types::marker.
  cql3: generalize function_call::make_terminal to UDTs.
  cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
  cql3: use a dedicated setter operation for inserting user types.
  cql3: introduce user_types::value.
  types: introduce to_bytes_opt_vec function.
  cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
  cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
  ...
2019-10-26 22:53:37 +03:00
Botond Dénes
01e913397a tests: memtable_test: flush_reader_test: compare compacted mutations
To filter out artificial differences due to different representation of
an equivalent set of writes.

Fixes: #5207

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>
2019-10-26 18:14:18 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
2019-10-25 14:10:28 +02:00
Kamil Braun
394c36835a sstables: report sstable data file I/O in CQL tracing
Use tracing::make_traced_file when creating an sstable input_stream.
To achieve that, trace_state needs to be plumbed down through some
functions.
2019-10-25 14:10:28 +02:00
Kamil Braun
2889edea3e tests: too many UDT fields check test 2019-10-25 12:05:10 +02:00
Kamil Braun
45d2a96980 tests: add a non-frozen UDT materialized view test 2019-10-25 12:05:10 +02:00
Kamil Braun
e0c233ede1 tests: add a UDT mutation test. 2019-10-25 12:05:08 +02:00
Kamil Braun
a21d12faae tests: add a non-frozen UDT "JSON INSERT" test. 2019-10-25 12:04:44 +02:00
Kamil Braun
ae3464da45 tests: add a non-frozen UDT to for_each_schema_change. 2019-10-25 12:04:44 +02:00
Kamil Braun
b87b700e66 tests: more non-frozen UDT tests. 2019-10-25 12:04:44 +02:00
Kamil Braun
474742ac5d tests: move some UDT tests from cql_query_test.cc to new file. 2019-10-25 12:04:44 +02:00
Kamil Braun
e74b5deb5d cql3: enable non-frozen UDTs.
Add a cluster feature for non-frozen UDTs.

If the cluster supports non-frozen UDTs, do not return an error
message when trying to create a table with a non-frozen user type.
2019-10-25 12:04:44 +02:00
Kamil Braun
a8c7670722 types: add multi_cell field to user_type_impl.
is_value_compatible_with_internal and update_user_type were generalized
to the non-frozen case.

For now, all user_type_impls in the code are non-multi-cell (frozen).
This will be changed in future commits.
2019-10-25 12:04:44 +02:00
Kamil Braun
574e1cd514 tests: generalize timestamp_based_splitting_writer and bucket_writer to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
6da89e40df tests: generalize random_schema.cc:generate_collection to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
0fbfb67cbb tests: generalize mutation_test.cc summaries to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
05d4b2e1a4 tests: generalize data_model.cc:mutation_description::build to UDTs. 2019-10-25 12:04:44 +02:00
Kamil Braun
4374982de0 types: collection_type_impl::to_value becomes serialize_for_cql.
The purpose of collection_type_impl::to_value was to serialize a
collection for sending over CQL. The corresponding function in origin
is called serializeForNativeProtocol, but the name is a bit lengthy,
so I settled for serialize_for_cql.

The method now became a free-standing function, using the visit
function to perform a dispatch on the collection type instead
of a virtual call. This also makes it easier to generalize it to UDTs
in future commits.

Remove the old serialize_for_native_protocol with a FIXME: implement
inside. It was already implemented (to_value), just called differently.

Remove dead methods: enforce_limit and serialized_values. The
corresponding methods in C* are auxiliary methods used inside
serializeForNativeProtocol. In our case, the entire algorithm
is wholly written in serialize_for_cql.
2019-10-25 10:49:19 +02:00
Kamil Braun
bbdb438d89 collection_mutation: easier (de)serialization of collection_mutation(s).
`collection_type_impl::serialize_mutation_form`
became `collection_mutation(_view)_description::serialize`.

Previously callers had to cast their data_type down to collection_type
to use serialize_mutation_form. Now it's done inside `serialize`.
In the future `serialize` will be generalized to handle UDTs.

`collection_type_impl::deserialize_mutation_form`
became a free-standing function `deserialize_collection_mutation`
with similar benefits. Actually, no one needs to call this function
manually because of the next paragraph.

A common pattern consisting of linearizing data inside a `collection_mutation_view`
followed by calling `deserialize_mutation_form` has been abstracted out
as a `with_deserialized` method inside collection_mutation_view.
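
The linearize-then-deserialize pattern could look roughly like the sketch below. It is a Python analogy only: the fragment list, the JSON wire format and the class layout are assumptions for illustration, and only the with_deserialized name comes from the commit message.

```python
import json


class CollectionMutationView:
    """Hypothetical view over serialized data stored in several fragments."""

    def __init__(self, fragments):
        self._fragments = fragments  # possibly non-contiguous chunks of bytes

    def with_deserialized(self, func):
        linearized = b"".join(self._fragments)  # step 1: linearize the fragments
        description = json.loads(linearized)    # step 2: deserialize (toy format)
        return func(description)                # step 3: hand the result to the caller


view = CollectionMutationView([b'{"cells": [[1, ', b'"a"]]}'])
cell_count = view.with_deserialized(lambda d: len(d["cells"]))
```

Callers pass in only the code that consumes the deserialized form, so the two preparatory steps live in one place.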

serialize_mutation_form_only_live was removed,
because it hadn't been used anywhere.
2019-10-25 10:42:58 +02:00
Kamil Braun
b1d16c1601 types: move collection_type_impl::mutation(_view) out of collection_type_impl.
collection_type_impl::mutation became collection_mutation_description.
collection_type_impl::mutation_view became collection_mutation_view_description.
These classes now reside inside collection_mutation.hh.

Additional documentation has been written for these classes.

Related function implementations were moved to collection_mutation.cc.

This makes it easier to generalize these classes to non-frozen UDTs in future commits.
The new names (together with documentation) better describe their purpose.
2019-10-25 10:19:45 +02:00
Benny Halevy
3b3611b57a mutation_diff: standard input support
Also, now that the file name is properly quoted,
it may contain space characters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:58 +03:00
Benny Halevy
6feb4d5207 mutation_diff: accept diff_command option
To support using diff tools other than colordiff

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-23 08:29:47 +03:00
Nadav Har'El
51fc6c7a8e make static_row optional to reduce memory footprint
Merged patch series from Avi Kivity:

The static row can be rare: many tables don't have them, and tables
that do will often have mutations without them (if the static row
is rarely updated, it may be present in the cache and in readers,
but absent in memtable mutations). However, it always consumes ~100
bytes of memory, even if it is not present, due to row's overhead.

Change it to be optional by allocating it as an external object rather
than inlined into mutation_partition. This adds overhead when the
static row is present (17 bytes for the reference, back reference,
and lsa allocator overhead).

perf_simple_query appears to be marginally (2%) faster. Footprint is
reduced by ~9% for a cache entry, 12% in memtables. More details are
provided in the patch commitlog.

Tests: unit (debug)

Avi Kivity (4):
  managed_ref: add get() accessor
  managed_ref: add external_memory_usage()
  mutation_partition: introduce lazy_row
  mutation_partition: make static_row optional to reduce memory
    footprint

 cell_locking.hh                          |   2 +-
 converting_mutation_partition_applier.hh |   4 +-
 mutation_partition.hh                    | 284 ++++++++++++++++++++++-
 partition_builder.hh                     |   4 +-
 utils/managed_ref.hh                     |  12 +
 flat_mutation_reader.cc                  |   2 +-
 memtable.cc                              |   2 +-
 mutation_partition.cc                    |  45 +++-
 mutation_partition_serializer.cc         |   2 +-
 partition_version.cc                     |   4 +-
 tests/multishard_mutation_query_test.cc  |   2 +-
 tests/mutation_source_test.cc            |   2 +-
 tests/mutation_test.cc                   |  12 +-
 tests/sstable_mutation_test.cc           |  10 +-
 14 files changed, 355 insertions(+), 32 deletions(-)
2019-10-22 12:25:15 +03:00
Piotr Jastrzebski
2b26e3c904 test: change test_partition_key_logging to test_primary_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
997be35ef3 modification_statement: log in cdc clustering key of a change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
d8718a4ffc test: add test_partition_key_logging
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
a1edb68b16 test: check that alter table with cdc manages log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00
Piotr Jastrzebski
629cdb5065 test: check that drop table with cdc removes log and desc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 11:28:23 +02:00