Move out logic responsible for adding readers at partition boundary
into `maybe_add_readers_at_partition_boundary`, and advancing one reader
into `prepare_one`. This will allow to reuse this logic outside
`prepare_next`.
Merged patch series by Calle Wilund, with a few fixes by Piotr Jastrzębski:
Adds delta and pre-image data column writes for the atomic columns in a
cdc-enabled table.
Note that in this patch set it is still unconditional. Adding option support
comes in next set.
Uses code more or less derived from alternator to select pre-image, using
raw query interface. So should be fairly low overhead to query generation.
Pre-image and delta mutations are mixed in with the actual modification
mutations to generate the full cdc log (sans post-image).
Print a histogram of the number of async work items in the shard's
outgoing smp queues.
Example:
(gdb) scylla smp-queues
10747 17 -> 3 ++++++++++++++++++++++++++++++++++++++++
721 17 -> 19 ++
247 17 -> 20 +
233 17 -> 10 +
210 17 -> 14 +
205 17 -> 4 +
204 17 -> 5 +
198 17 -> 16 +
197 17 -> 6 +
189 17 -> 11 +
181 17 -> 1 +
179 17 -> 13 +
176 17 -> 2 +
173 17 -> 0 +
163 17 -> 8 +
1 17 -> 9 +
Useful for identifying the target shard, when `scylla task_histogram`
indicates a high number of async work items.
To produce the histogram the command goes over all virtual objects in
memory and identifies the source and target queues of each
`seastar::smp_message_queue::async_work_item` object. Practically the
source queue will always be that of the current shard. As this scales
with the number of virtual objects in memory, it can take some time to
run. An alternative implementation would be to instead read the actual
smp queues, but the code of that is scary so I went for the simpler and
more reliable solution.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191028132456.37796-1-bdenes@scylladb.com>
This patch set introduces light-weight transactions support to
ScyllaDB. It is a subset of the full series, which adds
basic LWT support and which has been reviewed thus far.
"
mutation_test/test_udt_mutations kept failing on my machine and I tracked it down to the 3rd patch in this series (use int64_t constants for long_type). While at it, this series also fixes a comment and the end iterator in BOOST_REQUIRE(std::all_of(...))
mutation_test: test_udt_mutations: fixup udt comment
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: use int64_t constants for long_type
Test: mutation_test(dev, debug)
"
* 'test_udt_mutations-fixes' of https://github.com/bhalevy/scylla:
mutation_test: test_udt_mutations: use int64_t constants for long_type
mutation_test: test_udt_mutations: fix end iterator in call to std::all_of
mutation_test: test_udt_mutations: fixup udt comment
Based on a mutation, creates a pre-image select operation.
Note, this uses raw proxy query to shortcut parsing etc,
instead of trying to cache by generated query. Hypothesis is that
this is essentially faster.
The routine assumes all rows in a mutation touch same static/regular
columns. If this is not always true it will need additional
calculations.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Support single-statement conditional updates and as well as batches.
This patch almost fully rewrites column_condition.cc, implementing
is_satisfied_by().
Most of the remaining complications in column_condition implementation
come from the need to properly handle frozen and multi-cell
collection in predicates - up until now it was not possible
to compare entire collection values between each other. This is further
complicated since multi-cell lists and sets are returned as maps.
We can no longer assume that the columns fetched by prefetch operation
are non-frozen collections. IF EXISTS/IF NOT EXISTS condition
fetches all columns, besides, a column may be needed to check other
condition.
When fetching the old row for LWT or to apply updates on list/columns,
we now calculate precisely the list of columns to fetch.
The primary key columns are also included in CAS batch result set,
and are thus also prefetched (the user needs them to figure out which
statements failed to apply).
The patch is cross-checked for compatibility with cassandra-3.11.4-1545-g86812fa502
but does deviate from the origin in handling of conditions on static
row cells. This is addressed in future series.
Each column_condition and raw::column_condition construction case had a
static method wrapping its constructor, simply supplying some defaults.
This neither improves clarity nor maintainability.
cql_statement_opt_metadata is an interim node
in cql (prepared) statement hierarchy parenting
modification_statement and batch_statement. If there
is IF condition in such statements, they return a result set,
and thus have a result set metadata.
The metadata itself is filled in a subsequent patch.
Add checks for conditional modification statement limitations:
- WHERE clustering_key IN (list) IF condition is not supported
since a conditions is evaluated for a single row/cell, so
allowing multiple rows to match the WHERE clause would create
ambiguity,
- the same is true for conditional range deletions.
- ensure all clustering restrictions are eq for conditional delete
We must not allow statements like
create table t(p int, c int, v int, primary key (p, c));
delete from t where p=1 and c>0 if v=1;
because there may be more than one statement in a partition satisfying
WHERE clause, in which case it's unclear which of them should satisfy
IF condition: all or just one.
Raising an error on such a statement is consistent with Cassandra's
behavior.
Introduce service::cas_request abstract base class
which can be used to parameterize Paxos logic.
Implement storage_proxy::cas() - compare and swap - the storage proxy
entry point for lightweight transactions.
Currently the code that manipulates mutations during write need to
check what kind of mutations are those and (sometimes) choose different
code paths. This patch encapsulates the differences in virtual
functions of mutation_holder object, so that high level code will not
concern itself with the details. The functions that are added:
apply_locally(), apply_remotely() and store_hint().
This patch adds all functionality needed for Paxos protocol. The
implementation does not strictly adhere to Paxos paper since the original
paper allows setting a value only once, while for LWT we need to be able
to make another Paxos round after "learn" phase completes, which requires
things like repair to be introduced.
Paxos protocol has three stages: prepare, accept, learn. This patch adds
rpc verb for each of those stages. To be term compatible with Cassandra
the patch calls those stages: prepare, propose, commit.
Paxos protocol relies on replicas having a state that persists over
crashes/restarts. This patch defines such state and stores it in the
database itself in the paxos table to make it persistent.
The stored state is:
in_progress_ballot - promised ballot
proposal - accepted value
proposal_ballot - the ballot of the accepted value
most_recent_commit - most recently learned value
most_recent_commit_at - the ballot of the most recently learned value
This patch add two data structures that will be used by paxos. First
one is "proposal" which contains a ballot and a mutation representing
a value paxos protocol is trying to set. Second one is
"prepare_response" which is a value returned by paxos prepare stage.
It contains currently accepted value (if any) and most recently
learned value (again if any). The later is used to "repair" replicas
that missed previous "learn" message.
Otherwise they are decomposed and serialized as 4-byte int32.
For example, on my machine cell[1] looked like this:
{0002, atomic_cell{0000000310600000;ts=0;expiry=-1,ttl=0}}
and it failed cells_equal against:
{0002, atomic_cell{0000000300000000;ts=0;expiry=-1,ttl=0}}
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously. It is also unnecessary, since _callbacks can be
initialized in the constructor.
Fixes#5220.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.
Fixes#4908.
"
* 'traced_file' of https://github.com/kbr-/scylla:
sstables: report sstable index file I/O in CQL tracing
sstables: report sstable data file I/O in CQL tracing
tracing: add traced_file class
"
This change allows creating tables with non-frozen UDT columns. Such columns can then have single fields modified or deleted.
I had to do some refactoring first. Please read the initial commit messages, they are pretty descriptive of what happened (read the commits in the order they are listed on my branch: https://github.com/kbr-/scylla/commits/udt, starting from kbr-@8eee36e, in order to understand them). I also wrote a bunch of documentation in the code.
Fixes#2201.
"
* 'udt' of https://github.com/kbr-/scylla: (64 commits)
tests: too many UDT fields check test
collection_mutation: add a FIXME.
tests: add a non-frozen UDT materialized view test
tests: add a UDT mutation test.
tests: add a non-frozen UDT "JSON INSERT" test.
tests: add a non-frozen UDT to for_each_schema_change.
tests: more non-frozen UDT tests.
tests: move some UDT tests from cql_query_test.cc to new file.
types: handle trailing nulls in tuples/UDTs better.
cql3: enable deleting single fields of non-frozen UDTs.
cql3: enable setting single fields of a non-frozen UDT.
cql3: enable non-frozen UDTs.
cql3: introduce user_types::marker.
cql3: generalize function_call::make_terminal to UDTs.
cql3: generalize insert_prepared_json_statement::execute_set_value to UDTs.
cql3: use a dedicated setter operation for inserting user types.
cql3: introduce user_types::value.
types: introduce to_bytes_opt_vec function.
cql3: make user_types::delayed_value::bind_internal return vector<bytes_opt>.
cql3: make cql3_type::raw_ut::to_string distinguish frozenness.
...
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:
$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT
healthy: localhost:8000
This commit comes with a test.
Fixes#5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
We could use iterators over cells instead of a vector of cells
in collection_mutation(_view)_description. Then some use cases could
provide iterators that construct the cells "on the fly".