Serialization header is a new component in Statistics.db introduced in
the SSTables 3.0 ('ma') format. It is essential for reading the data file as it
contains the base values used for delta-encoded values (timestamps,
TTLs, local deletion times) and a description of column types.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
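A minimal sketch of why the base values matter, assuming a simplified
model in which each timestamp is stored as its distance from a base kept
in the header. The names encoding_base, encode_timestamp and
decode_timestamp are hypothetical and are not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the base values kept in the serialization header.
struct encoding_base {
    int64_t min_timestamp;
    int32_t min_local_deletion_time;
    int32_t min_ttl;
};

// Writer side: store only the distance from the base value.
// Assumption of this sketch: values are never below the base.
inline uint64_t encode_timestamp(const encoding_base& base, int64_t timestamp) {
    assert(timestamp >= base.min_timestamp);
    return static_cast<uint64_t>(timestamp) - static_cast<uint64_t>(base.min_timestamp);
}

// Reader side: the delta alone is meaningless without the base from the header.
inline int64_t decode_timestamp(const encoding_base& base, uint64_t delta) {
    return static_cast<int64_t>(static_cast<uint64_t>(base.min_timestamp) + delta);
}
```

A reader that cannot load the header has no base to add the delta to,
which is why the data file cannot be decoded without it.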
A step to untie classes sstable_writer_m and sstable so that eventually
we could stop them being friends.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
SSTables 3.0 format makes a distinction between count of cells and count
of columns. In that sense, a column of a collection type counts as one
column but every atomic cell in it counts as a separate cell.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
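The distinction can be illustrated with a small sketch. The types
column_value and row_counts below are hypothetical simplifications and
not the structures used by the writer:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one column in a row.
struct column_value {
    bool is_collection;         // true for a collection-typed column
    std::size_t element_count;  // atomic cells inside the collection (ignored otherwise)
};

struct row_counts {
    std::size_t columns = 0;
    std::size_t cells = 0;
};

inline row_counts count_row(const std::vector<column_value>& row) {
    row_counts counts;
    for (const auto& col : row) {
        counts.columns += 1;  // a collection still counts as one column
        counts.cells += col.is_collection ? col.element_count : 1;
    }
    return counts;
}
```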
test_multishard_combining_reader_destroyed_with_pending_create_reader
was failing because it relied on smp == 3, and thus on the shard on which
the reader creation is blocked being shard 2. Since the test requires
smp >= 3 to run, we can hardcode this shard to be 2: if the
test runs at all, we are guaranteed to have smp >= 3.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <38883a1f4c18ca0cd065aa13826a4f1858353289.1525328233.git.bdenes@scylladb.com>
These tests are quite complicated and require intimate knowledge of how
foreign_reader and multishard_combining_reader operate. Knowing these
two objects is still required to understand the tests, but make it that
much easier by explaining how they were designed to test what they test.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8de580131a8652924de920c2bc68a98e579398ee.1525328226.git.bdenes@scylladb.com>
'shard' is a short-lived on-stack variable that gets captured by
reference by a continuation that gets executed on another shard.
Fixes a race condition that leads to a heap-use-after-free.
Message-Id: <20180502150507.2776-1-pdziepak@scylladb.com>
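The hazard can be sketched without Seastar. In the sketch below a plain
task queue stands in for work submitted to another shard; all names are
hypothetical, and capturing by value is assumed here as one common way
to avoid the dangling reference, not necessarily the exact change made
by the patch:

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for work submitted to another shard: the callable
// runs later, after the submitting function has already returned.
inline std::vector<std::function<unsigned()>>& pending_tasks() {
    static std::vector<std::function<unsigned()>> tasks;
    return tasks;
}

inline void submit_for_shard(unsigned shard) {
    // Capturing `shard` by reference ([&shard]) would leave the task holding
    // a dangling reference once this function returns. Capturing by value
    // gives the task its own copy.
    pending_tasks().push_back([shard] { return shard; });
}

// Runs every queued task and returns the shard ids the tasks observed.
inline std::vector<unsigned> run_pending_tasks() {
    std::vector<unsigned> results;
    for (auto& task : pending_tasks()) {
        results.push_back(task());
    }
    pending_tasks().clear();
    return results;
}
```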
The test_foreign_reader_destroyed_with_pending_read_ahead test currently
doesn't ensure that the objects in its scope are destroyed in the
correct order. This is necessary as there are several foreign pointers
to objects that live on remote shards and use each other. Since
foreign pointers destroy their managed object in the background we
cannot rely on them to reliably destroy objects in order, nor can we be
sure when the object they manage is actually destroyed.
So to work around that, ensure that the puppet_reader is destroyed before
the remote_control it references even has a chance of being destroyed.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <232eaa899878b03fb2a765c2916e4f05841472a3.1525269726.git.bdenes@scylladb.com>
Test for Scylla's default choice of secondary index name (we found one
small problem, see issue #3403, and left it commented out). Also test
the ability to give indices non-default names.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501153439.26619-1-nyh@scylladb.com>
Add a test that adding a secondary-index for an only partition key column
is not allowed (it would be redundant), but indexing one of several partition
key columns *is* allowed. This reproduced issue #3404, and verifies that
it was fixed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501121544.22869-2-nyh@scylladb.com>
Indexing an only partition key component is not allowed (because it would
be redundant), but it should be allowed to index one of several partition
key components. We had a bug in that case: the underlying materialized view
we created had the same column as both a partition key and a clustering
key, which resulted in an assertion failure. This patch fixes that.
Fixes #3404.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501121544.22869-1-nyh@scylladb.com>
The db/index directory contains just a few lines of code that exist
there for historical reasons. It's confusing that we have both db/index
and index/ directories related to secondary-indexing.
This patch moves what little is still in db/index/ to index/. In the
future we should probably get rid of the "secondary_index" class we had
there, but for now, let's at least not have a whole new directory for it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501101246.21143-1-nyh@scylladb.com>
"
Both multishard_combining_reader and foreign_reader use read-ahead in the
background to avoid blocking consumers. These read-aheads can still be
pending when the reader is destroyed and hence extra attention is needed
to avoid memory errors. Recent manual testing, done in the context of
testing code that is using the multishard reader, proved that these
cases were not handled correctly in the initial series introducing it
(2d126a79b).
This series introduces fixes and comprehensive tests for all problematic
scenarios:
1) multishard_combining_reader is destroyed with pending reader creation
on a remote shard.
2) foreign_reader is destroyed with pending read-ahead.
3) multishard_combining_reader is destroyed with pending read-ahead.
"
* 'multishard-reader-read-ahead-fixes/v2' of https://github.com/denesb/scylla:
test.py: add custom seastar flags for mutation_reader_test
test.py: move custom seastar flags for tests declarative
mutation_reader_test: add read-ahead related multishard reader tests
tests/mutation_reader_test: change recommended smp to 3
mutation_reader_test: fix name of existing multishard reader tests
simple_schema: add global_simple_schema
simple_schema.hh: remove unused include
multishard_combining_reader: prepare for read-ahead outliving the reader
foreign_reader: prepare for read-ahead outliving the reader
multishard_combining_reader: avoid creating the shard reader twice
multishard_combining_reader: read_ahead: don't assume reader is created
multishard_combining_reader: move read-ahead related methods
multishard_combining_reader: avoid looking up the shard reader twice
multishard_combining_reader: use optional for maybe created reader
Add tests for foreign_reader and multishard_combining_reader that check
that readers destroyed while there is pending read-ahead will not result
in use-after-free.
Specifically check that:
* multishard_combining_reader destroyed with pending reader creation
* foreign_reader destroyed with pending read-ahead
* multishard_combining_reader destroyed with pending read-ahead
do not result in use-after-free or SEGFAULT.
These tests try to do their best to check for correct behaviour with
various BOOST_REQUIRE* checks but they still heavily rely on ASAN to
detect any use-after-free, SEGFAULT or similar errors.
Of the test_multishard_combining_reader_reading_empty_table test.
Running this test with smp=3 instead of smp=2 helps detecting additional
read-ahead related memory problems.
Which allows a simple_schema instance to be transferred to another
shard. In fact a new simple_schema instance will be created on the
remote shard but it will use the same schema instance as the original
one.
When the multishard reader is destroyed there might be several pending
read-aheads running in the background. These read-aheads need their
associated reader to stay alive until after the read-ahead completes.
To solve this move the flat_mutation_reader into a struct and manage
this struct's lifetime through a shared pointer. Fibers associated with
read-aheads that might outlive the multishard reader will hold on to a
copy of the shared pointer, keeping the underlying reader alive until they
complete. To avoid doing any extra work a flag is added to this state
which is set when the multishard reader is destroyed. When this flag is
set, pending continuations will return early. All this is encapsulated
in multishard_combining_reader::shard_reader, so the multishard reader code
itself need not be changed.
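The lifetime scheme described above can be sketched in plain C++. This
is only an illustration: std::shared_ptr and std::function stand in for
the Seastar primitives, and reader_state and shard_reader_sketch are
hypothetical names, not the types added by the patch:

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for the state shared between the reader and its
// background read-aheads.
struct reader_state {
    int buffered = 0;        // stands in for the flat_mutation_reader
    bool abandoned = false;  // set when the owning reader is destroyed
};

class shard_reader_sketch {
    std::shared_ptr<reader_state> _state = std::make_shared<reader_state>();
public:
    ~shard_reader_sketch() { _state->abandoned = true; }

    // Returns the "continuation" of a read-ahead. It holds its own copy of
    // the shared pointer, so the state outlives the reader if needed.
    std::function<bool()> start_read_ahead() {
        return [state = _state] {
            if (state->abandoned) {
                return false;  // reader is gone: return early, do no work
            }
            ++state->buffered;
            return true;
        };
    }

    int buffered() const { return _state->buffered; }
};
```

A continuation that runs after the reader was destroyed still touches
valid memory, because its copy of the shared pointer keeps the state
alive, and it does no work because the flag is set.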
The foreign reader keeps track of ongoing read-aheads via a
foreign_ptr to the read-ahead's future on the remote shard. This pointer
is overwritten after each "remote call" to the remote reader with a
pointer to the new read-ahead's future.
There are several problems with the current implementation:
1) There is a new read-ahead launched after each "remote call"
unconditionally, even if the remote reader is at EOS. This will start
unnecessary read-ahead when the reader is already finished and may be
soon destroyed (legally) by the client.
2) The pointer to the remote read-ahead future is not set to nullptr
when a remote call is issued. Thus in the destructor, where we
attach a continuation to the read-ahead's future to extend the
reader's lifetime until after the read-ahead finishes, we might attach
a continuation to a future that already has one and run into a failed
assert().
To fix these issues reset the read-ahead pointer to nullptr each time a
remote call is issued and don't start a new read-ahead if the remote
reader is at EOS. This way we can ensure that when the reader is
destroyed we either have a valid and non-stale read-ahead future or none
at all and can reliably make a decision about whether we need to extend
the lifetime of the remote reader or not.
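The two rules of the fix can be sketched as follows. This is a
simplified, single-threaded model: std::unique_ptr stands in for the
foreign_ptr, and foreign_reader_sketch and its members are hypothetical
names:

```cpp
#include <memory>

// Hypothetical stand-in for the future of a read-ahead on the remote shard.
struct read_ahead_future {};

class foreign_reader_sketch {
    std::unique_ptr<read_ahead_future> _read_ahead;  // stands in for foreign_ptr
    int _read_aheads_started = 0;
public:
    // Models one "remote call"; `reached_eos` is what the remote reader reports.
    void remote_call(bool reached_eos) {
        // Rule 1: the call consumes the previous read-ahead future, so the
        // stale pointer is dropped as soon as the call is issued.
        _read_ahead.reset();
        // Rule 2: only start a new read-ahead if there is more to read.
        if (!reached_eos) {
            _read_ahead = std::make_unique<read_ahead_future>();
            ++_read_aheads_started;
        }
    }

    // What the destructor needs to know: is there a valid read-ahead whose
    // completion the remote reader has to outlive?
    bool has_pending_read_ahead() const { return _read_ahead != nullptr; }
    int read_aheads_started() const { return _read_aheads_started; }
};
```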
The multishard reader creates its shard readers on demand when they are
first attempted to be used. However at this time the reader might already
be in the process of being created, initiated by a previous read-ahead.
To avoid creating the shard reader twice, before creating the reader
check whether there are any read-aheads in progress. If there is one, it
has already created (is creating or will create) the reader, so
synchronise with the read-ahead. Synchronisation happens via a promise:
the read-ahead creates a promise which will be fulfilled when the reader
is created. A concurrent create_reader() call will wait on this promise
instead of attempting to create a new reader.
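A minimal single-threaded sketch of this synchronisation, with a list of
callbacks standing in for the promise. The class and member names are
hypothetical and an int stands in for the created reader:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

class shard_reader_creation_sketch {
    using callback = std::function<void(int)>;
    std::optional<int> _reader;      // set once creation has finished
    bool _creating = false;          // a creation is already in flight
    std::vector<callback> _waiters;  // stands in for the promise's waiters
    int _creations_started = 0;
public:
    // Called by read-ahead and by foreground reads alike.
    void create_reader(callback on_ready) {
        if (_reader) {
            on_ready(*_reader);
            return;
        }
        _waiters.push_back(std::move(on_ready));
        if (!_creating) {  // only the first caller starts the creation
            _creating = true;
            ++_creations_started;
        }
    }

    // Models the remote shard reporting that the reader now exists.
    void creation_finished(int reader_id) {
        assert(_creating && !_reader);
        _reader = reader_id;
        _creating = false;
        std::vector<callback> waiters = std::move(_waiters);
        _waiters.clear();
        for (auto& waiter : waiters) {
            waiter(reader_id);
        }
    }

    int creations_started() const { return _creations_started; }
};
```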
Currently it is assumed that when read_ahead is called the reader is
already created. Under most circumstances this will not be true. It was
blind (bad) luck that we didn't hit this before (during testing).
To the group of methods that do not assume the reader is already
created. A patch will follow that will update read_ahead() to not assume
that the reader is created.
After a little "research" [1] it turns out my initial fears were
completely without ground: std::optional::operator->() and
std::optional::operator*() don't involve an unnecessary branch and
thus there is no need to hand-roll an optional with a separate bool.
[1] http://en.cppreference.com/w/cpp/utility/optional/operator*
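A short sketch of the resulting pattern, assuming C++17. The names
fake_reader and lazily_created are hypothetical; the point is that the
emptiness check is written once, explicitly, and the dereference itself
adds none:

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the shard reader.
struct fake_reader {
    std::string name;
};

class lazily_created {
    std::optional<fake_reader> _reader;  // empty until first use
public:
    bool created() const { return _reader.has_value(); }

    fake_reader& get_or_create() {
        if (!_reader) {
            _reader.emplace(fake_reader{"shard-reader"});
        }
        // operator* performs no check of its own; dereferencing an empty
        // optional would be undefined behaviour, so the check above matters.
        return *_reader;
    }
};
```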
The mutation forwarding intermediary (src_addr) may not always know
about the schema which was used by the original coordinator. I think
this may be the cause of the "Schema version ... not found" error seen
in one of the clusters which entered some pathological state:
storage_proxy - Failed to apply mutation from 1.1.1.1#5: std::_Nested_exception<schema_version_loading_failed> (Failed to load schema version 32893223-a911-3a01-ad70-df1eb2a15db1): std::runtime_error (Schema version 32893223-a911-3a01-ad70-df1eb2a15db1 not found)
Fixes #3393.
Message-Id: <1524639030-1696-1-git-send-email-tgrabiec@scylladb.com>
Confirm that issue #2991 is indeed fixed - creating a secondary index
with IF NOT EXISTS ignores an already existing index, and dropping with
IF EXISTS ignores a non-existent index.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180430071714.10154-1-nyh@scylladb.com>
The existing test_secondary_index_case_sensitive only tested the
case-sensitive case of the column being indexed, and only in some
scenarios. Further testing exposed more bugs - issue #3388, issue #3391,
issue #3401. This patch adds tests which reproduced those bugs, and now
verifies their fix.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-9-nyh@scylladb.com>
test_case_sensitivity from tests/view_schema_test.cc was well-intentioned,
aiming to test from different angles the issue of non-lowercase (quoted)
column names and their interaction with materialized views.
But unfortunately, it didn't test anything! This is because the quotation
marks were forgotten, so all the identifiers in this test were folded to
lowercase, and the test didn't test non-lowercase identifiers like it
intended.
So this patch adds the missing quotes, to make this test great again.
After the patches for issues #3388 and #3391 which I sent earlier, the
test *passes* (before those patches, the fixed test did not pass -
the unfixed test trivially passed).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-8-nyh@scylladb.com>
When the secondary index code builds a "%s IS NOT NULL" clause for a
CQL statement, it needs to quote the column name if the name requires
quoting (i.e., it contains anything other than lowercase letters, digits and _).
Fixes #3401.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-7-nyh@scylladb.com>
We had another case-sensitivity bug in materialized views, where if
a case-sensitive (quoted) column name was listed explicitly on "SELECT"
(instead of implicitly, e.g., in "SELECT *") the column name was
incorrectly folded to lower-case and inserts would fail.
This patch fixes the code, where a "SELECT" statement was built using
the desired column names, but column names that needed quoting were
not being quoted. The bug was in a helper function build_select_statement()
which took column name strings and failed to quote them. We clean up this
function to take column definitions instead of strings - and take care
of the quoting itself. It also needs to quote the table's name in the
select statement being built.
Fixes #3391.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-6-nyh@scylladb.com>
Before this patch, if a materialized view is defined with the restriction
IS NOT NULL on a case-sensitive (quoted) column name, inserts fail with
a "restriction 'foobar IS NOT null' unknown column foobar" error, where
foobar is the lowercased version of the case-sensitive column name.
The problem is that the code uses single_column_relation::to_string()
to convert the relation into a CQL where clause. And indeed, this method
generates a CQL expression, but it calls column_identifier::raw::to_string()
to print identifiers. This is the wrong function - it doesn't quote
identifiers that need quoting because they are not lowercase.
So this patch uses column_identifier::raw::to_cql_string() (a method we
added in the previous patch) to generate the properly quoted CQL relation.
Fixes #3388
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-5-nyh@scylladb.com>
Implement a method column_identifier::raw::to_cql_string(). Exactly like
the one without "raw", this method quotes the identifier name as needed
for CQL. We'll need this method in a later patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-4-nyh@scylladb.com>
There is no reason for to_cql_string() and maybe_quote() to both
implement the same quoting algorithm. Use the latter to implement the
former.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-3-nyh@scylladb.com>
The utility function maybe_quote() is supposed to quote identifier names
(name of keyspace, table, or column) according to CQL rules, e.g., if the
name has any uppercase or non-alphanumeric characters, it needs to be
quoted. Unfortunately, it didn't quite do the right thing, so this patch
fixes that. This patch also adds a comment explaining what maybe_quote()
is supposed to do (until now, users could only guess).
Fixes #3400.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-2-nyh@scylladb.com>
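A rough sketch of the quoting rule described above. It is not the
patched function: maybe_quote_sketch and needs_quoting are hypothetical
names, the requirement that an unquoted name start with a lowercase
letter and the doubling of embedded double quotes are assumptions based
on general CQL rules, and reserved keywords are ignored:

```cpp
#include <string>

// Assumption: a name can stay unquoted only if it starts with a lowercase
// letter and contains nothing but lowercase letters, digits and '_'.
inline bool needs_quoting(const std::string& name) {
    if (name.empty() || name[0] < 'a' || name[0] > 'z') {
        return true;
    }
    for (char c : name) {
        bool plain = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!plain) {
            return true;
        }
    }
    return false;
}

inline std::string maybe_quote_sketch(const std::string& name) {
    if (!needs_quoting(name)) {
        return name;
    }
    std::string quoted = "\"";
    for (char c : name) {
        if (c == '"') {
            quoted += '"';  // assumption: embedded quotes are escaped by doubling
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```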
In commit d674b6f672, I fixed a case-sensitive
column name bug by avoiding CQL quoting of a column name
in create_index_statement.cc when building a "targets" option string.
However, there is also matching code in target_parser.hh to unquote
that option string. So this unquoting code is no longer necessary, and
should be dropped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-1-nyh@scylladb.com>
In the current code, if the base table has a compound partition key (i.e.,
multiple partition-key columns), searching its secondary indexes doesn't work.
There is no real reason for this; it was just a bug in preparing the
second query:
Every SI query is converted to two queries. The first queries the associated
materialized view, to find a list of primary keys. Those we need to use in a
second query, of the base table. The second query needs to list, as
restrictions, the keys found above. When a partition key is compound, its
components build one key and one restriction. But in the buggy code, we
incorrectly used each component as a separate (improperly formatted) key
and restriction, and obviously this didn't work.
This patch also adds a test that reproduces this problem and confirms its fix.
In the fixed code I also found another incorrect use of to_cql_string() (which
could break case-sensitive primary key column names) and changed it to
to_string().
Fixes #3210.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429124138.24406-1-nyh@scylladb.com>
We make multiple attempts to mark a node as alive. We do that by
sending an EchoMessage, and marking the node as alive upon receiving a
successful answer. In case there's a network partition and the nodes
can't reach each other, multiple messages may be delivered and
processed.
We can avoid processing duplicate EchoMessage replies by checking
whether we had already marked the node as alive.
Fixes #1184
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180428191942.31990-1-duarte@scylladb.com>
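The check can be sketched as follows. The class liveness_sketch is a
hypothetical simplification with nodes identified by a string; it only
shows that a duplicate reply becomes a no-op once the node is already
marked alive:

```cpp
#include <string>
#include <unordered_set>

class liveness_sketch {
    std::unordered_set<std::string> _alive;
    int _transitions = 0;  // how many times a node actually became alive
public:
    // Called for every successful echo reply. Returns true only when the
    // reply caused the node to be marked alive.
    bool on_echo_reply(const std::string& node) {
        if (_alive.count(node) != 0) {
            return false;  // already alive: ignore the duplicate reply
        }
        _alive.insert(node);
        ++_transitions;
        return true;
    }

    int transitions() const { return _transitions; }
};
```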
"
Recently many changes have landed in seastar for the I/O Scheduler. We
can now describe the I/O storage of a machine by its visible properties
like throughput and bandwidth instead of relying on an indirect
calculation.
For the instances we support, we can just measure that and start using
them right away.
A version of iotune that computes those properties is not yet ready, but
in its making I have noticed that we aren't really setting the nomerges
and scheduler properties of the disks under testing. We definitely
should, since that can influence the results. So this patchset also
starts doing that.
The commandline for iotunev2 shouldn't change much. When it is ready we
will just adjust this script once more.
"
* 'scylla_io_setup' of github.com:glommer/scylla:
scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler properties
scylla_lib: drop support for m3 and c3 AWS instance types
io_setup: call blocktune before tuning I/O
blocktune: allow it to be called as a library.
scripts: move scylla-blocktune to scripts location
* seastar 70aecca...ac02df7 (5):
> Merge "Prefix preprocessor definitions" from Jesse
> cmake: Do not enable warnings transitively
> posix: prevent unused variable warning
> build: Adjust DPDK options to fix compilation
> io_scheduler: adjust property names
DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro
references prefixed with SEASTAR_. Some may need to become
Scylla macros.