18318 Commits

Author SHA1 Message Date
Asias He
b2c110699e gms: Remove i_failure_detector.hh
It is not used any more.
2019-03-22 09:08:51 +08:00
Asias He
af579a055b gossip: Get rid of the gms::get_local_failure_detector static object
Store the failure_detector object inside gossiper object.

- The global sharded<failure_detector> object is gone.

- There is no need to initialize sharded<failure_detector> manually, which
simplifies the code in tests/cql_test_env.cc and init.cc.
2019-03-22 09:08:51 +08:00
Asias He
2b6a4050c2 dht: Do not use failure_detector::is_alive in failure_detector_source_filter
Switch failure_detector_source_filter to use get_local_gossiper::is_alive
directly since we are going to remove the static
gms::get_local_failure_detector object soon.
Pass the nodes that are down to the filter directly, so that
range_streamer does not depend on gossiper at all.
2019-03-22 08:26:47 +08:00
Asias He
9dbc4af1dd tests: Fix stop snitch in gossip_test.cc
It should stop the snitch, not the failure detector. Fix it up. We are going to
remove the static failure_detector object soon.
2019-03-22 08:26:47 +08:00
Asias He
967794798a gossiper: Do not use value_factory from storage_service object
Avoid using value_factory from storage_service inside gossiper.
2019-03-22 08:26:47 +08:00
Asias He
4a55617c6c gossiper: Use cfg options from _cfg instead of get_local_storage_service
Gossiper has db::config _cfg now, avoid using the
get_local_storage_service() to get config options.
2019-03-22 08:26:44 +08:00
Asias He
ee1227b3ae gossiper: Pass db::config object to gossiper class
Gossiper calls service::get_local_storage_service() to get cfg options.
To avoid cyclic dependency, pass the cfg object to gossiper directly.
2019-03-22 08:25:16 +08:00
Asias He
1652ee512a init: Pass gossiper object to init_ms_fd_gossiper
In order to avoid the usage of the static gossiper object returned from
get_local_gossiper().
2019-03-22 08:25:16 +08:00
Duarte Nunes
5752174762 Merge 'Use staging directory for uploaded sstables awaiting view updates' from Piotr
"
This series moves sstables uploaded via `nodetool refresh` to the
staging/ directory if view updates need to be generated from them.
Previous behavior (leaving these sstables in upload/ directory until
view updates are generated) might have caused sstables with
conflicting names to be mistakenly overwritten by the user.

Fixes #4047

Tests: unit (dev)
dtest: backup_restore_tests.py + backup_restore_tests.py modified with
       having materialized view definitions
"

* 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla:
  sstables: simplify requires_view_building
  loader: move uploaded view pending sstables to staging
2019-03-21 12:46:02 -03:00
Gleb Natapov
bb93d990ad messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream
Current code captures a reference to rpc::client in a continuation, but
there is no guarantee that the reference will be valid when the continuation runs.
Capture shared pointer to rpc::client instead.
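
The lifetime issue described above can be sketched outside of Seastar. This is a minimal illustration, not the actual messaging_service code: `client`, `deferred`, `schedule_open` and `drop_then_run` are hypothetical names, and a plain vector of callbacks stands in for continuations. The point is only that a callback holding a shared pointer by value keeps the object alive until it runs, whereas a captured reference would dangle once the caller drops its owner.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for rpc::client; not the real Seastar type.
struct client {
    std::string peer;
    bool open_stream() const { return !peer.empty(); }
};

// A toy "continuation queue": callbacks run later, after the caller returned.
using deferred = std::vector<std::function<bool()>>;

// Schedules work that needs the client. Capturing the shared_ptr by value
// keeps the client alive until the callback has run.
void schedule_open(deferred& queue, std::shared_ptr<client> c) {
    assert(c != nullptr);
    queue.push_back([c = std::move(c)] { return c->open_stream(); });
}

// Simulates the caller dropping its own reference before the callback runs.
bool drop_then_run(const std::string& peer) {
    deferred queue;
    auto c = std::make_shared<client>(client{peer});
    schedule_open(queue, c);
    c.reset();  // the lambda is now the only owner of the client
    return queue.front()();
}
```

Had `schedule_open` captured `client&` instead, the call inside `drop_then_run` would touch a destroyed object after `c.reset()`.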

Fixes #4350.

Message-Id: <20190314135538.GC21521@scylladb.com>
2019-03-21 12:46:01 -03:00
Tomasz Grabiec
69775c5721 row_cache: Fix abort in cache populating read concurrent with memtable flush
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:

     utils::phased_barrier::phase_type creation_phase() const {
         assert(_reader);
         return _reader_creation_phase;
     }

That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:

         if (!_reader || _reader_creation_phase != phase) {
            if (_last_key) {
                auto cmp = dht::ring_position_comparator(*_cache._schema);
                auto&& new_range = _range.split_after(*_last_key, cmp);
                if (!new_range) {
                    _reader = {};
                    return make_ready_future<mutation_fragment_opt>();
                }

Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.

If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.
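
The new meaning of creation_phase() can be sketched as follows. This is a simplified model under stated assumptions, not the row_cache code: `underlying_reader_sketch`, `reader`, `create_reader` and `clear_reader` are hypothetical names, and the phase type is assumed to be a plain unsigned integer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using phase_type = std::uint64_t;  // assumed representation of a phase

struct reader {};  // hypothetical placeholder for the underlying reader

class underlying_reader_sketch {
    std::optional<reader> _reader;
    phase_type _reader_creation_phase = 0;  // 0 means "never created"
public:
    void create_reader(phase_type phase) {
        assert(phase >= 1);  // the cache starts from phase 1
        _reader.emplace();
        _reader_creation_phase = phase;
    }
    // Clearing the reader deliberately keeps the recorded phase.
    void clear_reader() { _reader.reset(); }
    bool has_reader() const { return _reader.has_value(); }
    // "The phase in which the reader was last created, or 0".
    phase_type creation_phase() const { return _reader_creation_phase; }
};
```

With this shape, asking for the phase after the reader was cleared returns the last creation phase instead of aborting.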

Tests:

  - unit.row_cache_test (debug)

Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
2019-03-21 12:46:00 -03:00
Asias He
c0f744b407 storage_service: Wait for gossip to settle only if do_bind is set
In commit 71bf757b2c, we call
wait_for_gossip_to_settle() which takes some time to complete in
storage_service::prepare_to_join().

tests/cql_query_test calls init_server with do_bind == false, which in
turn calls storage_service::prepare_to_join(). Since the test has only
one node, there is no point in waiting for gossip to settle.

To make the cql_query_test fast again, do not call
wait_for_gossip_to_settle if do_bind is false.

Before this patch, cql_query_test takes forever to complete.
After it, the test takes 10s.

Tests: tests/cql_query_test
Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>
2019-03-21 12:46:00 -03:00
Avi Kivity
a9cf07369f Merge "Add local indexes" from Piotr
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"

* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
  tests: add cases for local index prefix optimization
  tests: add create/drop local index test case
  tests: add non-standard names cases to local index tests
  tests: add multi pk case for local index tests
  tests: add test for malformed local index definitions
  tests: add local index paging test
  tests: add local indexing test
  cql3: add CREATE INDEX syntax for local indexes
  cql3: use serialization function to create index target string
  index: add serialization function for index targets
  index: use proper local index target when adding index
  index: add parsing target column name from local index targets
  db: add checking for local index in schema tables
  index: add checking if serialized target implies local index
  index: enable parsing multi-key targets
  index: move target parser code to .cc file
  json: add non-throwing overload for to_json_value
  cql3: add checking for local indexes in has_supporting_index()
  cql3: move finding index restrictions to prepare stage
  cql3: add picking an index by score
  ...
2019-03-21 12:46:00 -03:00
Nadav Har'El
561c640ed1 materialized views: allow view without clustering columns
When a materialized view was created, the verification code artificially
forbade creating a view without a clustering key column. However, there
is no real reason to forbid this. In the trivial case, the original base
table might not have had a clustering key, and the view might want to use
the exact same key. In a more complex case, a view may want to have all the
primary key columns as *partition* key columns, and that should be fine.

The patch also includes a regression test, which failed before this patch,
and succeeds with it (we test that we can create materialized views in both
aforementioned scenarios, and these materialized views work as expected).

Duarte raised the opinion that the "trivial" case of a view table with
a key identical to that of the base should be disallowed. However, this
should be done, if at all (I think it shouldn't), in a follow-up patch,
which will implement the non-triviality requirement consistently (e.g.,
require view primary key to be different from base's, regardless of
the existence or non-existence of clustering columns).

Fixes #4340.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320122925.10108-1-nyh@scylladb.com>
2019-03-21 12:45:52 -03:00
Glauber Costa
34b640993f storage proxy: add tracepoints about delays
When we are tracing requests, we would like to know everything that
happened to a query that can contribute to it having increased
latencies.

We insert some of those latencies explicitly due to throttling, but we
do not log that into tracing.

In the case of storage proxy, we do have a log message at trace level
but that is rarely used: trace messages are too heavy of a hammer, there
is no way to specify specific queries, etc.

The correct place for that is CQL tracing. This patch moves that message
to CQL tracing. We also add a matching tracepoint assuring us that no
delay happened if that's the case.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190320163350.15075-1-glauber@scylladb.com>
2019-03-21 12:45:52 -03:00
Avi Kivity
eddb98e8c6 Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz
"
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of as static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in our schema, and
returning no rows as a result. Also, Cassandra, and Scylla tools,
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.

Tests:

  - unit (dev)
"

* tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla:
  tests: sstables: Test reading of static compact sstable generated by Cassandra
  tests: sstables: Add test for writing and reading of static compact tables
  sstables: mc: Write static compact tables the same way as Cassandra
  sstable: mc: writer: Set _static_row_written inside write_static_row()
  sstables: Add sstable::features()
  sstables: mc: writer: Prepare write_static_row() for working with any column_kind
  storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
  sstables: mc: writer: Build indexed_columns together with serialization_header
  sstables: mc: writer: De-optimize make_serialization_header()
  sstable: mc: writer: Move attaching of mc-specific components out of generic code
2019-03-21 12:45:51 -03:00
Piotr Sarna
9695a47e96 sstables: simplify requires_view_building
Since sstables uploaded via upload/ directory are no longer left there
awaiting view updates, the only remaining valid directory is staging/.
2019-03-20 13:47:21 +01:00
Botond Dénes
0c381572fd repair::row_level: pin table for local reads
The repair reader depends on the table object being alive, while it is
reading. However, for local reads, there was no synchronization between
the lifecycle of the repair reader and that of the table. In some cases
this can result in use-after-free. Solve by using the table's existing
mechanism for lifecycle extension: `read_in_progress()`.

For the non-local reader, when the local node's shard configuration is
different from the remote one's, this problem is already solved, as the
multishard streaming reader already pins table objects on the used
shards. This creates an inconsistency that might be surprising (in a bad
way). One reader takes care of pinning needed resources while the other
one doesn't. I was torn on how to reconcile this, and decided to go
with the simplest solution, explicitly pinning the table for local
reads, which preserves the inconsistency. It was suggested that this
inconsistency be remedied by building resource pinning into the local
reader as well [1] but there is opposition to this [2]. Adding a wrapper
reader which does just the resource pinning seems excessive, both in
code and runtime overhead.

Spotted while investigating repair-related crashes which occurred during
interrupted repairs.

Fixes: #4342

[1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050
[2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657

Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>
2019-03-20 14:45:22 +02:00
Piotr Sarna
986004a959 loader: move uploaded view pending sstables to staging
When loading tables uploaded via `nodetool refresh`, they used to be
left in upload/ directory if view updates would need to be generated
from them. Since view update generation is asynchronous, sstables
left in the directory could erroneously get overwritten if the user
uploads another batch of sstables and some of the names
collide.
To remedy this, uploaded sstables that need view updates are moved
to staging/ directory with a unique generation number, where they
await view update generation.

Fixes #4047
2019-03-20 13:44:29 +01:00
Juliana Oliveira
8cd6028d0d Dockerfile: remove cgroup volume mount
Mounting /sys/fs/cgroup inside the image causes docker cgroup to not
be mounted internally. Therefore, hosts cannot limit resources on
Scylla. This patch removes the cgroup volume mount, allowing folders
under /sys/fs/cgroup to be created inside docker.

Message-Id: <20190320122053.GA20256@shenzou.localdomain>
2019-03-20 14:30:27 +02:00
Nadav Har'El
7c874057f5 materialized_views: propagate "view virtual columns" between nodes
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.

What this patch does is to make ALL a function instead of constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
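
The idea of deriving the name list from a single source and caching it per thread can be sketched like this. It is an illustration only: `schema_sketch` is a hypothetical stand-in for the real schema type, and the table names listed are an illustrative subset rather than the full set of schema tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct schema_sketch { std::string name; };  // hypothetical stand-in for a schema

// Single source of truth: the list of schema tables appears only here.
const std::vector<schema_sketch>& all_tables() {
    static const std::vector<schema_sketch> tables = {
        schema_sketch{"keyspaces"},
        schema_sketch{"tables"},
        schema_sketch{"columns"},
        schema_sketch{"views"},
        schema_sketch{"view_virtual_columns"},
    };
    return tables;
}

// Names are derived from all_tables(), so the two lists cannot diverge.
// The thread_local vector is built once per thread, on first use.
const std::vector<std::string>& all_table_names() {
    static thread_local const std::vector<std::string> names = [] {
        std::vector<std::string> result;
        result.reserve(all_tables().size());
        for (const auto& s : all_tables()) {
            result.push_back(s.name);
        }
        return result;
    }();
    return names;
}
```

Repeated calls on the same thread return a reference to the same cached vector, so the derivation cost is paid once per thread.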

Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Nadav Har'El
ccf731a820 Materialized views: add metric for current flow-control delay
The materialized views flow control mechanism works by adding a certain
delay to each client request, designed to slow down the client to the
rate at which we can complete the background view work. Until now we could observe
this mechanism only indirectly, in whether or not it succeeded in keeping the
view backlog bounded; but we had no way to directly observe the delay that
we decided to add. In fact, we had a bug where this delay was constantly
zero, and we didn't even notice :-)

So in this patch we add a new metric,
scylla_storage_proxy_coordinator_last_mv_flow_control_delay

The metric is a floating point number, in units of seconds.

This metric is somewhat peculiar in that it always contains the *last* delay
used for some request - unlike other metrics it doesn't measure the "current"
value of something. Moreover, it can jump wildly because there is no
guarantee that each request's delay will be identical (in particular,
different requests may involve different base replicas which have different
view backlogs, so decide on different delays). In the future we may want
to supplement this metric with some sort of delay histogram. But even
this simple metric is already useful to debug certain scenarios and
understand if the materialized-views flow control is working or not.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227133630.26328-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Tomasz Grabiec
fbeae4ffeb toolchain: Install gdb in the image
Scylla built using the frozen toolchain needs to be debugged
on a system with matching libraries. It's easiest if it's also done on the same image.
Install gdb in the image so that it's always out there when we need it.

Fixes #4329

Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>
2019-03-20 13:35:26 +02:00
Piotr Sarna
41679de13e tests: add cases for local index prefix optimization
The cases check if incorporating clustering key prefix into
the indexed query works fine (i.e. does not require filtering
and returns proper rows).
2019-03-20 10:51:27 +01:00
Piotr Sarna
56a0e6d992 tests: add create/drop local index test case
2019-03-20 10:51:27 +01:00
Piotr Sarna
3c61c8e18a tests: add non-standard names cases to local index tests
New test cases cover case-sensitive column/table names and names with
non-alphanumeric characters like commas and parentheses.
2019-03-20 10:51:27 +01:00
Piotr Sarna
d664e0e522 tests: add multi pk case for local index tests
2019-03-20 10:51:27 +01:00
Piotr Sarna
3b39029924 tests: add test for malformed local index definitions
2019-03-20 10:51:27 +01:00
Piotr Sarna
4b82011cd3 tests: add local index paging test
2019-03-20 10:51:27 +01:00
Piotr Sarna
8836500fcd tests: add local indexing test
A test case for local indexing is added to the SI suite.
2019-03-20 10:51:27 +01:00
Piotr Sarna
cedec95f8d cql3: add CREATE INDEX syntax for local indexes
In order to create a local index, the syntax used is:
CREATE INDEX ON t((p1, p2, p3), v);

where (p1, p2, p3) are partition key columns (all of them),
and v is the indexed column.
2019-03-20 10:51:27 +01:00
Piotr Sarna
1fd61c5ac4 cql3: use serialization function to create index target string
Instead of building the string manually, a serialization function
is called to create a string out of index target list.
2019-03-20 10:51:27 +01:00
Piotr Sarna
757419b524 index: add serialization function for index targets
Since target_parser is responsible for deserializing target strings,
the function that serializes them belongs in the same class.
2019-03-20 10:51:26 +01:00
Piotr Sarna
074ed2c8a5 index: use proper local index target when adding index
With global indexes, target column name is always the same as the string
kept in 'options[target]' field. It's not the case for local indexes,
and so a proper extracting function is used to get the value.
2019-03-20 10:20:24 +01:00
Piotr Sarna
2fcae3d0ec index: add parsing target column name from local index targets
When (re)creating a local index, the target string needs to be used
to parse out the actual indexed column:
"(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column".
This column is later used to determine if an index should be applied
to a SELECT statement.
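
A rough sketch of the extraction, assuming the simplified textual form quoted above and column names that contain no commas or parentheses (the real parser has to cope with non-standard names, which this does not attempt). The function names `implies_local_index` and `indexed_column` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// In this simplified form, a target that starts with '(' embeds the base
// partition key and therefore describes a local index.
bool implies_local_index(std::string_view target) {
    return !target.empty() && target.front() == '(';
}

// Returns the indexed column, or nullopt if the target is malformed.
std::optional<std::string> indexed_column(std::string_view target) {
    if (target.empty()) {
        return std::nullopt;
    }
    if (!implies_local_index(target)) {
        return std::string(target);  // global index: the target is the column
    }
    const auto close = target.find(')');
    if (close == std::string_view::npos) {
        return std::nullopt;  // unterminated partition key list
    }
    // Expect ",<column>" right after the closing parenthesis.
    if (close + 1 >= target.size() || target[close + 1] != ',') {
        return std::nullopt;
    }
    const auto column = target.substr(close + 2);
    if (column.empty()) {
        return std::nullopt;
    }
    return std::string(column);
}
```

For a global index the whole target string is returned unchanged, matching the note that the column name equals the stored target in that case.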
2019-03-20 10:20:24 +01:00
Piotr Sarna
e0d7807eed db: add checking for local index in schema tables
Based on which targets the index has, it will be either local
or global - local indexes have their full base partition key
embedded in their targets.
2019-03-20 10:20:24 +01:00
Piotr Sarna
de5e5ee1a5 index: add checking if serialized target implies local index
This utility enables checking if the specified target indicates
having a local index, even before base table schema is known.
2019-03-20 10:20:24 +01:00
Piotr Sarna
5672edc149 index: enable parsing multi-key targets
Parsing index targets that consist of partition key columns
followed by clustering key columns is enabled.
2019-03-20 10:20:24 +01:00
Piotr Sarna
9782381dd4 index: move target parser code to .cc file
It will be useful later when expanding the implementation.
2019-03-20 10:20:24 +01:00
Piotr Sarna
25264d61ee json: add non-throwing overload for to_json_value
It will be needed later to avoid unnecessary try-catch blocks.
2019-03-20 10:20:24 +01:00
Piotr Sarna
b46ab76d4b cql3: add checking for local indexes in has_supporting_index()
With local indexes it's not sufficient to check if a single
restriction is supported by an index in order to decide
that it can be used, because local indexes can be leveraged
only when full partition key is properly restricted.

(It also serves as a great example why restrictions code
 would greatly benefit from a facelift! :) )
2019-03-20 10:20:24 +01:00
Piotr Sarna
87f6e37caa cql3: move finding index restrictions to prepare stage
Index restrictions that match a given index were recomputed
during execution stage, which is redundant and prone to errors.
Now, used index restrictions are cached in a prepared statement.
2019-03-20 10:20:22 +01:00
Piotr Sarna
9823898b27 cql3: add picking an index by score
Instead of choosing the first index that we find (in column def order),
the index with highest score is picked. Currently local indexes
score higher than global ones if restrictions allow local indexing
to be applied.
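
Score-based selection could look roughly like the sketch below. The numeric scores and the names `index_candidate`, `score` and `pick_index` are assumptions made for illustration; the message only states that local indexes outrank global ones when the restrictions allow them to be used.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct index_candidate {
    std::string name;
    bool is_local;
};

// Hypothetical scores: 0 = unusable, 1 = global index,
// 2 = local index with the full partition key restricted.
int score(const index_candidate& idx, bool partition_key_fully_restricted) {
    if (idx.is_local) {
        return partition_key_fully_restricted ? 2 : 0;
    }
    return 1;
}

// Picks the highest-scoring usable index. The strict comparison keeps the
// first candidate among equal scores, preserving the original order.
std::optional<index_candidate> pick_index(const std::vector<index_candidate>& candidates,
                                          bool partition_key_fully_restricted) {
    std::optional<index_candidate> best;
    int best_score = 0;
    for (const auto& idx : candidates) {
        const int s = score(idx, partition_key_fully_restricted);
        if (s > best_score) {
            best = idx;
            best_score = s;
        }
    }
    return best;
}
```

When the partition key is not fully restricted, a local index scores zero here and a global index on the same column wins instead.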
2019-03-20 10:20:02 +01:00
Piotr Sarna
2f173f7ed8 cql3: add handling paging state for local indexes
When computing paging state for local indexes, the partition
and clustering keys are different than with global ones:
 - partition key is the same as base's
 - clustering key starts with the indexed column
2019-03-20 10:20:02 +01:00
Piotr Sarna
75dd964751 cql3: add handling partition slices for local indexes
For local indexes, a slice will consist of the indexed column
followed by base clustering columns.
2019-03-20 10:20:01 +01:00
Piotr Sarna
b12162c8f5 cql3: add returning correct partition ranges for local indexes
Local indexes always share the partition range with their base.
2019-03-20 09:51:46 +01:00
Piotr Sarna
da8e8f18b3 cql3: make read_posting_list a member function
It already accepts several arguments that can be extracted from 'this',
and more will be added in the future.
New parameters include lambdas prepared during prepare stage
that define how to extract partition/clustering key ranges depending
on which index is used, so keeping it a static function will result
in unbounded number of parameters with complex types, which will
in turn make the function header almost illegible for a reader.
Hence, read_posting_list becomes a member function with easy access
to any data prepared during prepare stage.
2019-03-20 09:51:46 +01:00
Piotr Sarna
85017c5ad4 cql3: look for indexed column definition only once
There's no need to look for the column definition inside a loop.
2019-03-20 09:51:46 +01:00
Piotr Sarna
8002471c81 cql3: allow index target to keep multiple columns
Instead of having just one column definition, index target is now
a variant of either single column definition or a vector of them.
The vector is expected to be used when part of a target definition
is enclosed in parentheses:
 $ CREATE INDEX ON t((p),v);
or
 $ CREATE INDEX ON t((p1,p2), v);
etc.

This feature will allow providing (possibly composite) base partition key
to CREATE INDEX statement, which will result in creating a local index.
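
The variant-based target can be sketched as below. Here `column_name` is a hypothetical stand-in for a column definition, and `index_target_value`, `is_multi_column` and `column_count` are illustrative names rather than the cql3 types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using column_name = std::string;  // stand-in for a column definition

// Either a single column, or a parenthesized group of columns.
using index_target_value = std::variant<column_name, std::vector<column_name>>;

bool is_multi_column(const index_target_value& v) {
    return std::holds_alternative<std::vector<column_name>>(v);
}

// Number of columns a target element carries.
std::size_t column_count(const index_target_value& v) {
    return std::visit([](const auto& value) -> std::size_t {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, column_name>) {
            return 1;
        } else {
            return value.size();
        }
    }, v);
}
```

Under this model, `CREATE INDEX ON t((p1,p2), v)` would yield two target elements: a vector holding `p1` and `p2`, followed by the single column `v`.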
2019-03-20 09:51:46 +01:00
Piotr Sarna
a45022dbc7 docs: document index target serialization
Index target serialization format is extended for the purpose
of local indexing. Both new and old formats are described
in docs.
2019-03-20 09:51:46 +01:00