Commit Graph

20011 Commits

Author SHA1 Message Date
Piotr Dulikowski
2a46a09e7c mutation_reader: refactor prepare_next
Move out logic responsible for adding readers at partition boundary
into `maybe_add_readers_at_partition_boundary`, and advancing one reader
into `prepare_one`. This allows reusing this logic outside
`prepare_next`.
2019-10-30 09:49:12 +01:00
Nadav Har'El
d69ab1b588 CDC: (atomic) delta + (non-optional) pre-image data columns
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:

Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.

Note that in this patch set it is still unconditional. Adding option support
comes in next set.

Uses code more or less derived from alternator to select pre-image, using
raw query interface. So should be fairly low overhead to query generation.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
2019-10-29 09:39:28 +02:00
Calle Wilund
7db393fe12 cdc_test: Add helper methods + preimage test
Add filtering, sorting etc helpers + simple pre-image test

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-29 07:49:05 +01:00
Botond Dénes
edc1750297 scylla-gdb.py: introduce scylla smp-queues
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:

    (gdb) scylla smp-queues
        10747 17 ->  3 ++++++++++++++++++++++++++++++++++++++++
          721 17 -> 19 ++
          247 17 -> 20 +
          233 17 -> 10 +
          210 17 -> 14 +
          205 17 ->  4 +
          204 17 ->  5 +
          198 17 -> 16 +
          197 17 ->  6 +
          189 17 -> 11 +
          181 17 ->  1 +
          179 17 -> 13 +
          176 17 ->  2 +
          173 17 ->  0 +
          163 17 ->  8 +
            1 17 ->  9 +

Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.

To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. Practically the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
2019-10-28 15:42:55 +02:00
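The histogram described in edc1750297 amounts to counting work items per (source shard, target shard) pair and printing the counts in descending order with a proportional bar. A minimal Python sketch of that counting and formatting step follows. It does not use the GDB API; `format_smp_histogram`, its tuple input, and the bar-scaling rule are assumptions made for illustration, not code taken from scylla-gdb.py.

```python
from collections import Counter


def format_smp_histogram(items, bar_width=40):
    """Return histogram lines for (source_shard, target_shard) pairs.

    `items` stands in for the async work items found in memory. In the
    real command they are discovered by scanning objects through GDB;
    here they are plain tuples (assumption for the sketch).
    """
    counts = Counter(items)
    if not counts:
        return []
    largest = max(counts.values())
    lines = []
    # most_common() yields buckets sorted by descending count.
    for (src, dst), n in counts.most_common():
        # Scale each bar to the largest bucket, showing at least one '+'.
        bar = "+" * max(1, n * bar_width // largest)
        lines.append(f"{n:>10} {src:>2} -> {dst:>2} {bar}")
    return lines


if __name__ == "__main__":
    sample = [(17, 3)] * 8 + [(17, 19)] * 2 + [(17, 9)]
    print("\n".join(format_smp_histogram(sample)))
```

A single dominant bucket, as in the sample output of the commit message, immediately points at the target shard whose queue is backed up.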
Tomasz Grabiec
3b37027598 Merge "lwt: implement basic lightweight transactions support" from Kostja
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
2019-10-28 11:45:28 +01:00
Tomasz Grabiec
f745819ed7 Merge "lwt: paxos protocol implementation" from Gleb
This is paxos implementation for LWT. LWT itself is not included in the
patch so the code is essentially not wired yet (except read path).
2019-10-28 11:29:40 +01:00
Avi Kivity
f8ba96efcf Merge "test_udt_mutations fixes" from Benny
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))

mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type

Test: mutation_test(dev, debug)
"

* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
  mutation_test: test_udt_mutations: use int64_t constants for long_type
  mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
  mutation_test: test_udt_mutations: fixup udt comment
2019-10-28 10:43:52 +02:00
Calle Wilund
36328acf60 cql_assertions: Change signature to accept sstring 2019-10-28 06:16:12 +01:00
Calle Wilund
7d98f735ee cdc: Add static columns to data/preimage mutations
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
19bba5608a cdc: Create and perform a pre-image select for mutations
As well as generate pre-image rows in resulting log mutation

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d4ee1938c7 cdc: Add modification record for regular atomic values in mutations
Fills in the data columns for regular columns iff they are
atomic (not unfrozen collections)
2019-10-28 06:16:12 +01:00
Calle Wilund
3fdcbd9dff cdc: Set row op in log
Adds actual operation (part delete, range delete, update) to
cdc log
2019-10-28 06:16:12 +01:00
Calle Wilund
8a6b72f47e cdc: Add pre-image select generator method
Based on a mutation, creates a pre-image select operation.

Note, this uses raw proxy query to shortcut parsing etc,
instead of trying to cache by generated query. Hypothesis is that
this is essentially faster.

The routine assumes all rows in a mutation touch same static/regular
columns. If this is not always true it will need additional
calculations.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
d74f32b07a cql3::untyped_result_set: Add constructor from cql3::result_set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Calle Wilund
3ed7a9dd69 cql3::untyped_result_set: Add view getter to make non-intrusive read cheaper
Also use in actual data conversion.
2019-10-28 06:16:12 +01:00
Calle Wilund
451bb7447d cdc: Add log / log data column operation types and make data cols tuples of these
Makes static/regular data columns tuple<op, value, ttl> as per spec.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-28 06:16:12 +01:00
Konstantin Osipov
e555dc502e lwt: implement basic lightweight transactions support
Support single-statement conditional updates as well as batches.

This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().

Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.

We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. An IF EXISTS/IF NOT EXISTS condition
fetches all columns; besides, a column may be needed to check another
condition.

When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.

The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).

The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
67e68dabf0 lwt: ensure we don't crash when we get a LIKE 2019-10-27 23:42:49 +03:00
Konstantin Osipov
f8f36d066c lwt: check for unsupported collection type in condition element access
We don't support conditions with element access on non-frozen UDTs,
check that only supported collection types are supplied.
2019-10-27 23:42:49 +03:00
Konstantin Osipov
c9f0adf616 lwt: rewrite cql3::raw::column_condition::prepare()
Restructure the code to avoid quite a bit of code duplication.
2019-10-27 23:42:47 +03:00
Konstantin Osipov
c2217df4d8 lwt: reorganize column_condition declaration and add comments 2019-10-27 23:42:03 +03:00
Konstantin Osipov
22b0240fe7 lwt: remove useless code in column_condition.hh
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.

This neither improves clarity nor maintainability.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
3e25b83391 lwt: propagate if_exists condition from the parser to AST
UPDATE ... IF EXISTS is legal, but IF EXISTS condition
was not propagated from the parser to AST (raw::update_statement).
2019-10-27 23:42:03 +03:00
Konstantin Osipov
df28985295 lwt: introduce cql_statement_opt_metadata
cql_statement_opt_metadata is an interim node
in cql (prepared) statement hierarchy parenting
modification_statement and batch_statement. If there
is IF condition in such statements, they return a result set,
and thus have a result set metadata.

The metadata itself is filled in a subsequent patch.
2019-10-27 23:42:03 +03:00
Vladimir Davydov
c8869e803e lwt: remove commented out validateWhereClauseForConditions
This logic was implemented in validate_where_clause_for_conditions()
method of modification_statement class.
2019-10-27 23:42:03 +03:00
Konstantin Osipov
eb5e82c6a1 lwt: add CAS where clause validation
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
  since a condition is evaluated for a single row/cell, so
  allowing multiple rows to match the WHERE clause would create
  ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete

  We must not allow statements like

  create table t(p int, c int, v int, primary key (p, c));
  delete from t where p=1 and c>0 if v=1;

  because there may be more than one row in a partition satisfying
  WHERE clause, in which case it's unclear which of them should satisfy
  IF condition: all or just one.

  Raising an error on such a statement is consistent with Cassandra's
  behavior.
2019-10-27 23:42:03 +03:00
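The checks listed in eb5e82c6a1 can be sketched as a small validator. This is a simplified Python illustration, not the code of `validate_where_clause_for_conditions()`: the `Restriction` type, the operator strings, and the error messages are hypothetical, and the sketch applies one rule to every conditional statement, whereas the commit distinguishes several cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restriction:
    """One WHERE-clause restriction on a clustering column (hypothetical type)."""
    column: str
    op: str  # one of "=", "IN", ">", ">=", "<", "<="


def validate_conditional_where(clustering_restrictions, has_conditions):
    """Raise ValueError if a conditional statement could match several rows."""
    if not has_conditions:
        return  # statements without IF are not restricted by this check
    for r in clustering_restrictions:
        if r.op == "IN":
            raise ValueError(
                f"IN on clustering column '{r.column}' is not supported with conditions")
        if r.op != "=":
            raise ValueError(
                f"non-EQ restriction on clustering column '{r.column}' "
                "is not supported with conditions")
```

With this sketch, the statement `delete from t where p=1 and c>0 if v=1` from the commit message corresponds to `validate_conditional_where([Restriction("c", ">")], True)`, which raises an error.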
Konstantin Osipov
203eb3eccc lwt: sleep a random amount of time when retrying CAS
Sleep a random interval between 0 and 100 ms before retrying CAS.
Reuse sleep function, make the distribution object thread local.
2019-10-27 23:42:03 +03:00
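The retry delay in 203eb3eccc is a uniformly random sleep between 0 and 100 ms, which spreads out contending proposers. A minimal Python sketch is shown below; `sleep_before_cas_retry` and its injectable `rng` and `sleep` parameters are hypothetical names used so the behavior can be exercised without actually waiting, and the thread-local distribution object mentioned in the commit is not modeled.

```python
import random
import time


def sleep_before_cas_retry(rng=random, sleep=time.sleep, max_ms=100):
    """Sleep a uniformly random interval in [0, max_ms] milliseconds.

    Returns the chosen delay in seconds so callers can log it.
    """
    delay = rng.uniform(0, max_ms) / 1000.0
    sleep(delay)
    return delay
```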
Konstantin Osipov
0674fab05c lwt: implement storage_proxy::cas()
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.

Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
2019-10-27 23:42:03 +03:00
Gleb Natapov
70adf65341 storage_proxy: make mutation holder responsible for mutation operation
Currently the code that manipulates mutations during write needs to
check what kind of mutations those are and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
2019-10-27 23:21:51 +03:00
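The encapsulation described in 70adf65341 can be illustrated with an abstract base class exposing the three operations named in the commit. The sketch below is in Python rather than C++, and everything beyond the three method names is an assumption: `RecordingHolder`, `send_write`, and the rule that unreachable targets receive a hint are illustrative only.

```python
from abc import ABC, abstractmethod


class MutationHolder(ABC):
    """High-level write code talks only to this interface."""

    @abstractmethod
    def apply_locally(self): ...

    @abstractmethod
    def apply_remotely(self, target): ...

    @abstractmethod
    def store_hint(self, target): ...


class RecordingHolder(MutationHolder):
    """Toy holder that records what was asked of it (illustrative only)."""

    def __init__(self):
        self.calls = []

    def apply_locally(self):
        self.calls.append("local")

    def apply_remotely(self, target):
        self.calls.append(f"remote:{target}")

    def store_hint(self, target):
        self.calls.append(f"hint:{target}")


def send_write(holder, local_node, targets, alive):
    """Route a write without inspecting what kind of mutation is held."""
    for t in targets:
        if t == local_node:
            holder.apply_locally()
        elif t in alive:
            holder.apply_remotely(t)
        else:
            holder.store_hint(t)  # assumption: dead targets get a hint
```

The point of the design is visible in `send_write`: it contains no branching on the mutation type, so a new kind of holder only needs to implement the three methods.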
Gleb Natapov
b3e01a45d7 lwt: storage_proxy: implement paxos protocol
This patch adds all functionality needed for Paxos protocol. The
implementation does not strictly adhere to Paxos paper since the original
paper allows setting a value only once, while for LWT we need to be able
to make another Paxos round after "learn" phase completes, which requires
things like repair to be introduced.
2019-10-27 23:21:51 +03:00
Gleb Natapov
8d6201a23b lwt: Add RPC verbs needed for paxos implementation
Paxos protocol has three stages: prepare, accept, learn. This patch adds
rpc verb for each of those stages. To stay compatible with Cassandra's
terminology, the patch calls those stages: prepare, propose, commit.
2019-10-27 23:21:51 +03:00
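The naming correspondence stated in 8d6201a23b can be written down as a lookup table. The Python names below (`PaxosStage`, `CASSANDRA_VERB`) are hypothetical; only the three stage names and the three Cassandra-compatible terms come from the commit message.

```python
from enum import Enum


class PaxosStage(Enum):
    PREPARE = "prepare"
    ACCEPT = "accept"
    LEARN = "learn"


# Stage name from the Paxos paper -> term used for the RPC verb.
CASSANDRA_VERB = {
    PaxosStage.PREPARE: "prepare",
    PaxosStage.ACCEPT: "propose",
    PaxosStage.LEARN: "commit",
}
```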
Gleb Natapov
d1774693bf lwt: Define state needed by paxos and persist it
Paxos protocol relies on replicas having a state that persists over
crashes/restarts. This patch defines such state and stores it in the
database itself in the paxos table to make it persistent.

The stored state is:
  in_progress_ballot    - promised ballot
  proposal              - accepted value
  proposal_ballot       - the ballot of the accepted value
  most_recent_commit    - most recently learned value
  most_recent_commit_at - the ballot of the most recently learned value
2019-10-27 23:21:51 +03:00
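The persisted state listed in d1774693bf maps naturally onto a record with five optional fields. The Python sketch below mirrors those field names; the use of integers for ballots and bytes for values is an assumption, and `handle_prepare` shows the textbook Paxos promise rule rather than the actual ScyllaDB handler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaxosState:
    in_progress_ballot: Optional[int] = None     # promised ballot
    proposal: Optional[bytes] = None             # accepted value
    proposal_ballot: Optional[int] = None        # ballot of the accepted value
    most_recent_commit: Optional[bytes] = None   # most recently learned value
    most_recent_commit_at: Optional[int] = None  # ballot of that learned value


def handle_prepare(state, ballot):
    """Promise `ballot` only if it is newer than any earlier promise."""
    if state.in_progress_ballot is not None and ballot <= state.in_progress_ballot:
        return False
    state.in_progress_ballot = ballot
    return True
```

Because a replica must never go back on a promise, `in_progress_ballot` has to survive restarts, which is why the commit stores this state in a table.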
Gleb Natapov
15b935b95d lwt: add data structures needed for paxos implementation
This patch adds two data structures that will be used by paxos. First
one is "proposal" which contains a ballot and a mutation representing
a value paxos protocol is trying to set. Second one is
"prepare_response" which is a value returned by paxos prepare stage.
It contains currently accepted value (if any) and most recently
learned value (again if any). The latter is used to "repair" replicas
that missed previous "learn" message.
2019-10-27 23:21:51 +03:00
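The two structures from 15b935b95d, and the way the most recently learned value can drive repair, are sketched below in Python. Integer ballots, byte-string mutations, and the helper `replicas_needing_repair` are assumptions for illustration; the commit only states what the structures contain and that the learned value is used to repair replicas.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Proposal:
    ballot: int      # assumption: ballots are comparable integers here
    mutation: bytes  # the value Paxos is trying to set


@dataclass(frozen=True)
class PrepareResponse:
    accepted: Optional[Proposal] = None            # currently accepted value, if any
    most_recent_commit: Optional[Proposal] = None  # most recently learned value, if any


def replicas_needing_repair(responses: Dict[str, PrepareResponse]) -> List[str]:
    """Return replicas whose learned value is older than the newest one seen."""
    commits = [r.most_recent_commit for r in responses.values()
               if r.most_recent_commit is not None]
    if not commits:
        return []
    newest = max(c.ballot for c in commits)
    return sorted(
        name for name, r in responses.items()
        if r.most_recent_commit is None or r.most_recent_commit.ballot < newest
    )
```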
Benny Halevy
1895fb276e mutation_test: test_udt_mutations: use int64_t constants for long_type
Otherwise they are decomposed and serialized as 4-byte int32.

For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}

and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 20:51:29 +02:00
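The width mismatch behind 1895fb276e can be reproduced in isolation: the same integer serializes to 4 bytes as an int32 and to 8 bytes as an int64, so the resulting byte strings never compare equal. The sketch uses Python's `struct` module with big-endian formats purely as an illustration; it is not the serialization code exercised by the test.

```python
import struct

value = 3

as_int32 = struct.pack(">i", value)  # 4-byte big-endian signed integer
as_int64 = struct.pack(">q", value)  # 8-byte big-endian signed integer

print(len(as_int32), as_int32.hex())
print(len(as_int64), as_int64.hex())
```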
Benny Halevy
fec772538c mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:49:25 +02:00
Benny Halevy
9c8cf9f51d mutation_test: test_udt_mutations: fixup udt comment
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-10-27 19:47:43 +02:00
Benny Halevy
76581e7f14 docs/debugging.md: fix gdb command for retrieving shared libraries information
The correct command is `info sharedlibrary`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191027153541.27286-1-bhalevy@scylladb.com>
2019-10-27 18:15:09 +02:00
Dejan Mircevski
2a136ba1bc alternator: Fix race condition in set_routes()
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously.  It is also unnecessary, since _callbacks can be
initialized in the constructor.

Fixes #5220.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-27 12:31:24 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Avi Kivity
2b856a7317 Merge "Support non-frozen UDTs." from Kamil
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.

I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.

Fixes #2201.
"

* 'udt' of https://github.com/kbr-/scylla: (64 commits)
  tests: too many UDT fields check test
  collection_mutation: add a FIXME.
  tests: add a non-frozen UDT materialized view test
  tests: add a UDT mutation test.
  tests: add a non-frozen UDT "JSON INSERT" test.
  tests: add a non-frozen UDT to for_each_schema_change.
  tests: more non-frozen UDT tests.
  tests: move some UDT tests from cql_query_test.cc to new file.
  types: handle trailing nulls in tuples/UDTs better.
  cql3: enable deleting single fields of non-frozen UDTs.
  cql3: enable setting single fields of a non-frozen UDT.
  cql3: enable non-frozen UDTs.
  cql3: introduce user_types::marker.
  cql3: generalize function_call::make_terminal to UDTs.
  cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
  cql3: use a dedicated setter operation for inserting user types.
  cql3: introduce user_types::value.
  types: introduce to_bytes_opt_vec function.
  cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
  cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
  ...
2019-10-26 22:53:37 +03:00
Piotr Sarna
657e7ef5a5 alternator: add alternator health check
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:

$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT

healthy: localhost:8000

This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
2019-10-26 18:14:18 +03:00
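The health check from 657e7ef5a5 can also be scripted. The Python sketch below treats a 200 response whose body starts with `healthy:` as healthy, which matches the sample response in the commit message; the function name, the default URL, the timeout, and the injectable `opener` parameter are assumptions added for illustration and testing.

```python
import urllib.request


def alternator_is_healthy(url="http://localhost:8000",
                          opener=urllib.request.urlopen, timeout=2.0):
    """Return True if a GET on the alternator port reports a healthy server."""
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and body.startswith("healthy:")
    except OSError:
        # Connection refused, timeouts and URLError all derive from OSError.
        return False
```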
Botond Dénes
01e913397a tests: memtable_test: flush_reader_test: compare compacted mutations
To filter out artificial differences due to different representation of
an equivalent set of writes.

Fixes: #5207

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191024103718.29266-1-bdenes@scylladb.com>
2019-10-26 18:14:18 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
2019-10-25 14:10:28 +02:00
Kamil Braun
394c36835a sstables: report sstable data file I/O in CQL tracing
Use tracing::make_traced_file when creating an sstable input_stream.
To achieve that, trace_state needs to be plumbed down through some
functions.
2019-10-25 14:10:28 +02:00
Kamil Braun
a8c9d1206a tracing: add traced_file class
This is a thin wrapper over the `seastar::file` class which adds
CQL trace messages before and after I/O operations.
2019-10-25 14:10:24 +02:00
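The wrapper pattern behind a8c9d1206a is a thin proxy that emits a trace message before and after each I/O call and delegates everything else. The Python sketch below is synchronous and wraps an ordinary file-like object, whereas the commit wraps future-returning operations of `seastar::file`; the class name `TracedFile`, the `trace` callback, and the message wording are hypothetical.

```python
import io


class TracedFile:
    """Proxy around a file-like object that traces read operations."""

    def __init__(self, f, trace, name):
        self._f = f
        self._trace = trace  # callable taking one message string
        self._name = name

    def read(self, size=-1):
        self._trace(f"{self._name}: starting read of {size} bytes")
        data = self._f.read(size)
        self._trace(f"{self._name}: finished read, got {len(data)} bytes")
        return data

    def __getattr__(self, attr):
        # Untraced operations are forwarded to the wrapped file unchanged.
        return getattr(self._f, attr)


if __name__ == "__main__":
    traced = TracedFile(io.BytesIO(b"hello world"), print, "data.db")
    traced.read(5)
```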
Kamil Braun
2889edea3e tests: too many UDT fields check test 2019-10-25 12:05:10 +02:00
Kamil Braun
adfc04ebec collection_mutation: add a FIXME.
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".
2019-10-25 12:05:10 +02:00
Kamil Braun
45d2a96980 tests: add a non-frozen UDT materialized view test 2019-10-25 12:05:10 +02:00
Kamil Braun
e0c233ede1 tests: add a UDT mutation test. 2019-10-25 12:05:08 +02:00
Kamil Braun
a21d12faae tests: add a non-frozen UDT "JSON INSERT" test. 2019-10-25 12:04:44 +02:00