Commit Graph

24659 Commits

Author SHA1 Message Date
Kamil Braun
f0842ba34e sstable_set: new reader for TWCS single partition queries
This commit introduces a new implementation of `create_single_key_sstable_reader`
in `time_series_sstable_set` dedicated for TWCS-created sstables.

It uses the fact that such sstables are mostly disjoint with respect to
contained `position_in_partition`s in order to decrease the number of
sstable readers that are opened at the same time.

The implementation uses `clustering_order_reader_merger` under the hood.

The reader assumes that the schema does not have static columns and none
of the queried sstable contain partition tombstones; also, it assumes
that the sstables have the min/max clustering key metadata in order for
the implementation to be efficient. Thus, if we detect that some of
these assumptions aren't true, we fall back to the old implementation.
2020-12-18 16:33:27 +01:00
Kamil Braun
b41139a07f mutation_reader_test: test clustering_order_reader_merger with time_series_sstable_set 2020-12-18 16:33:27 +01:00
Kamil Braun
d0548aa77f sstable_set: introduce min_position_reader_queue
This is a queue of readers of sstables in a time_series_sstable_set,
returning the readers in order of the smallest position_in_partition
that the sstables have. It uses the min/max clustering key sstable
metadata.

The readers are opened lazily, at the moment of being returned.
2020-12-18 16:33:27 +01:00
Kamil Braun
52697022b0 sstable_set: introduce time_series_sstable_set
At this moment it is a slightly less efficient version of
bag_sstable_set, but in following commits we will use the new data
structures to gain advantage in single partition queries
for sstables created by TimeWindowCompactionStrategy.
2020-12-18 16:33:27 +01:00
Kamil Braun
2a160dd909 sstables: add min_position and max_position accessors
The methods return a lower-bound and an upper-bound for the
position-in-partitions appearing in a given sstable.
2020-12-18 16:33:27 +01:00
Kamil Braun
fe26da82ba sstable_set: make create_single_key_sstable_reader a virtual method
... of sstable_set_impl.

Soon we shall provide a specialized implementation in one of the
`sstable_set_impl` derived classes.

The existing implementation is used as the default one.
2020-12-18 12:31:16 +01:00
Kamil Braun
5e846b33b8 clustering_order_reader_merger: fix the 0 readers case
With 0 readers the merger would produce a `partition_end` fragment
when it should immediately return `end_of_stream` instead.
2020-12-18 12:30:40 +01:00
Gleb Natapov
37368726c9 migration_manager: remove unused announce() variant
Message-Id: <20201216153150.GG3244976@scylladb.com>
2020-12-16 18:14:07 +02:00
Konstantin Osipov
2c46938c2a commitlog: avoid a syscall in a most common case of segment recycle
When recycling a segment in O_DSYNC mode if the size of the segment
is neither shrunk nor grown, avoid calling file::truncate() or
file::allocate().

Message-Id: <20201215182332.1017339-2-kostja@scylladb.com>
2020-12-16 14:57:36 +02:00
Avi Kivity
fdb47c954d Merge "idl: allow IDL compiler to parse const specifiers for template arguments" from Pavel S
"
This patch series consists of the following patches:

1. The first one turned out to be a massive rewrite of almost
everything in `idl-compiler.py`. It aims to decouple parser
structures from the internal representation which is used
in the code-generation itself.

Prior to the patch everything was working with raw token lists and
the code was extremely fragile and hard to understand and modify.

Moreover, every change in the parser code caused a cascade effect
of breaking things at many different places, since they were relying
on the exact format of output produced by parsing rules.

Now there is a bunch of supplementary AST structures which provide
hierarchical and strongly typed structure as the output of parsing
routine.
It is much easier to verify (by the means of `isinstance`, for example)
and extend since the internal structures used in code-generation are
decoupled from the structure of parsing rules, which are now controlled
by custom parse actions providing high-level abstractions.

It is tested manually by checking that the old code produces exactly
the same autogenerated sources for all Scylla IDLs as the new one.

2 and 3. Cosmetics changes only: fixed a few typos and moved from
old-fashioned `string.Template` to python f-strings.

This improves readability of the idl-compiler code by a lot.

Only one non-functional whitespace change introduced.

4. This patch adds a very basic support for the parser to
understand `const` specifier in case it's used with a template
parameter for a data member in a class, e.g.

    struct my_struct {
        std::vector<const raft::log_entry> entries;
    };

It actually does two things:
* Adjusts `static_asserts` in corresponding serializer methods
  to match const-ness of fields.
* Defines a second serializer specialization for const type in
  `.dist.hh` right next to non-const one.

This seems to be sufficient for raft-related uses for now.
Please note there is no support for the following cases, though:

    const std::vector<raft::log_entry> entries;
    const raft::term_t term;

None of the existing IDLs are affected by the change, so that
we can gradually improve on the feature and write the idl
unit-tests to increase test coverage with time.

5. A basic unit-test that writes a test struct with an
`std::vector<S<const T>>` field and reads it back to verify
that serialization works correctly.

6. Basic documentation for AST classes.
TODO: should also update the docs in `docs/IDL.md`. But it is already
quite outdated, and some changes would even be out of scope for this
patch set.
"

* 'idl-compiler-refactor-v5' of https://github.com/ManManson/scylla:
  idl: add docstrings for AST classes
  idl: add unit-test for `const` specifiers feature
  idl: allow to parse `const` specifiers for template arguments
  idl: fix a few typos in idl-compiler
  idl: switch from `string.Template` to python f-strings and format string in idl-compiler
  idl: Decouple idl-compiler data structures from grammar structure
2020-12-16 14:05:33 +02:00
Gleb Natapov
61520a33d6 mutation_writer: pass exceptions through feed_writer
feed_writer() eats exception and transforms it into an end of stream
instead. Downstream validators hate when this happens.

Fixes #7482
Message-Id: <20201216090038.GB3244976@scylladb.com>
2020-12-16 13:18:19 +02:00
Pavel Solodovnikov
8b8dce15c3 idl: add docstrings for AST classes
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-16 09:03:39 +03:00
Botond Dénes
978ec7a4bb tools: introduce scylla-sstable-index
A tool which lists all partitions contained in an sstable index. As all
partitions in an sstable are indexed, this tool can be used to find out
what partitions are contained in a given sstable.

The printout has the following format:
$pos: $human_readable_value (pk{$raw_hex_value})

Where:
* $pos: the position of the partition in the (decompressed) data file
* $human_readable_value: the human readable partition key
* $raw_hex_value: the raw hexadecimal value of the binary representation
  of the partition key

For now the tool requires the types making up the partition key to be
specified on the command line, using the `--type|-t` command line
argument, using the Cassandra type class name notation for types.
As these are not assumed to be widely known, this patch includes a
document mapping all cql3 types to their Cassandra type class name
equivalent (but not just).

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201208092323.101349-1-bdenes@scylladb.com>
2020-12-15 18:46:47 +02:00
Calle Wilund
71c5dc82df database: Verify iff we actually are writing memtables to disk in truncate
Fixes #7732

When truncating with auto_snapshot on, we try to verify the low rp mark
from the CF against the sstables discarded by the truncation timestamp.
However, in a scenario like:

Fill memtables
Flush
Truncate with snapshot A
Fill memtables some more
Truncate
Move snapshot A to upload + refresh (load old tables)
Truncate

The last op will assert, because while we have sstables loaded, which
will be discarded now, we did not in fact generate any _new_ ones
(since memtables are empty), and the RP we get back from discard is
one from an earlier generation set.

(Any permutation of events that create the situation "empty memtable" +
"non-empty sstables with only old tables" will generate the same error).

Added a check that before flushing checks if we actually have any
data, and if not, does not uphold the RP relation assert.

Closes #7799
2020-12-15 16:24:36 +02:00
Avi Kivity
7636799b18 Merge 'Add waiting for flushes on table drops' from Piotr Sarna
This series makes sure that before the table is dropped, all pending memtable flushes related to its memtables would finish.
Normally, flushes are not problematic in Scylla, because all tables are by default `auto_snapshot=true`, which also implies that a table is flushed before being dropped. However, with `auto_snapshot=false` the flush is not attempted at all. It leads to the following race:
1. Run a node with `auto_snapshot=false`
2. Schedule a memtable flush  (e.g. via nodetool)
3. Get preempted in the middle of the flush
4. Drop the table
5. The flush that already started wakes up and starts operating on freed memory, which causes a segfault

Tests: manual(artificially preempting for a long time in bullet point 2. to ensure that the race occurs; segfaults were 100% reproducible before the series and do not happen anymore after the series is applied)

Fixes #7792

Closes #7798

* github.com:scylladb/scylla:
  database: add flushes to waiting for pending operations
  table: unify waiting for pending operations
  database: add a phaser for flush operations
  database: add waiting for pending streams on table drop
2020-12-15 16:02:47 +02:00
Pavel Solodovnikov
1e6df841a5 idl: add unit-test for const specifiers feature
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 16:03:18 +03:00
Pavel Solodovnikov
facf27dbe4 idl: allow to parse const specifiers for template arguments
This patch introduces very limited support for declaring `const`
template parameters in data members.

It's not covering all the cases, e.g.
`const type member_variable` and `const template_def<T1, T2, ...>`
syntax is not supported at the moment.

Though the changes are enough for raft-related use: this makes it
possible to declare `std::vector<raft::log_entries_ptr>` (aka
`std::vector<lw_shared_ptr<const raft::log_entry>>`) in the IDL.

Existing IDL files are not affected in any way.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 16:03:11 +03:00
Pavel Solodovnikov
f02703fcd7 idl: fix a few typos in idl-compiler
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 16:02:55 +03:00
Pavel Solodovnikov
28b602833f idl: switch from string.Template to python f-strings and format string in idl-compiler
Move to a modern and lightweight syntax of f-strings
introduced in python 3.6. It improves readability and provides
greater flexibility.

A few places are now using format strings instead, though.

In case when multiline substitution variable is used, the template
string should be first re-indented and only after that the
formatting should be applied, or we can end up with screwed
indentation the in generated sources.

This change introduces one invisible whitespace change
in `query.dist.impl.hh`, otherwise all generated code is exactly
the same.

Tests: build(dev) and diff genetated IDL sources by hand

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 16:01:17 +03:00
Pavel Solodovnikov
4ab1f7f55d idl: Decouple idl-compiler data structures from grammar structure
Instead of operating on the raw lists of tokens, transform them into
typed structures representation, which makes the code by many orders of
magnitude simpler to read, understand and extend.

This includes sweeping changes throughout the whole source code of the
tool, because almost every function was tightly coupled to the way
data was passed down from the parser right to the code generation
routines.

Tested manually by checking that old generated sources are precisely
the same as the new generated sources.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 15:59:17 +03:00
Piotr Sarna
b1208d0fcc database: add flushes to waiting for pending operations
In order to prevent races with table drops, the helper function
which waits for all pending operations to finish now also
waits for pending flushes.
2020-12-15 13:11:33 +01:00
Piotr Sarna
cd1e351dc1 table: unify waiting for pending operations
In order to reduce code duplication which already caused a bug,
waiting for pending operations is now unified with a single helper
function.
2020-12-15 13:11:25 +01:00
Piotr Sarna
df3204426d database: add a phaser for flush operations
Pending flushes can participate in races when a table
with auto_snapshot==false is dropped. The race is as follows:
1. A flush of table T is initiated
2. The flush operation is preempted
3. Table T is dropped without flushing, because it has auto_snapshot off
4. The flush operation from (2.) wakes up and continues
   working on table T, which is already dropped
5. Segfault/memory corruption

To prevent such races, a phaser for pending flushes is introduced
2020-12-15 12:59:36 +01:00
Piotr Sarna
57d63ca036 database: add waiting for pending streams on table drop
We already wait for pending reads and writes, so for completeness
we should also wait for all pending stream operations to finish
before dropping the table to avoid inconsistencies.
2020-12-15 12:55:45 +01:00
Takuya ASADA
ebc4076fa5 tools: toolchain: add node_exporter
Download node_exporter in frozen image to prepare adding node_exporter
to relocatable pacakge.

Related #2190

Closes #7765

[avi: updated toolchain, x86_64/aarch64/s390x]
2020-12-14 20:34:17 +02:00
Piotr Sarna
13317f7698 alternator: ensure correct isolation level in tracing tests
Taking advantage of the fact that isolation level can be defined
for a table with a tag, the tracing test that relies on CAS
can now be sure to have a correct isolation level.
Message-Id: <43f005ab9d566c7d3d55ce93c553127b1df9e87f.1607954739.git.sarna@scylladb.com>
2020-12-14 17:37:55 +02:00
Piotr Sarna
7081e361cc test: add isolation level requirement message to tracing tests
Alternator tracing tests require the cluster to have the 'always'
isolation level configured to work properly. If that's not the case,
the tests will fail due to not having CAS-related traces present
in the logs. In order to help the users fix their configuration,
a helper message is printed before the test case is performed.
Automatic tests do not need this, because they are all ran with
matching isolation level, but this message could greatly improve
the user experience for manual tests.
Message-Id: <62bcbf60e674f57a55c9573852b6a28f99cbf408.1607949754.git.sarna@scylladb.com>
2020-12-14 14:53:58 +02:00
Piotr Sarna
4b0303d8ae tests: make alternator tracing tests idempotent
The outcome of alternator tracing tests was that tracing probability
was always set to 0 after the test was finished. That makes sense
for most test runs, but manual tests can work on existing clusters
with tracing probability set to some other value. Due to preserve
previous trace probability, the value is now extracted and stored,
so that it can be restored after the test is done.
Message-Id: <94f829b63f92847b4abb3b16f228bf9870f90c2e.1607949754.git.sarna@scylladb.com>
2020-12-14 14:53:23 +02:00
Avi Kivity
19ff528ef3 Update seastar submodule
* seastar 2de43eb6bf...3b8903d406 (3):
  > coroutines: check preemption flag in co_await
  > memory: consider span freelist objects in small pool diagnostics
  > util: noncopyable_function: avoid gcc uninitialized error in move constructor
2020-12-14 12:50:32 +02:00
Pekka Enberg
8d00c16feb transport/server: Code cleanups
Fix up some coding style issues spotted while reading the code:

- Fix indentation to be 4 spaces

- Remove superfluous semicolons

Closes #7793
2020-12-14 12:48:05 +02:00
Konstantin Osipov
b6c6cc275f commitlog: align input of dma_write() during segment recycle
Normally a file size should be aligned around block size, since
we never write to it any unaligned size. However, we're not
protected against partial writes.

Just to be safe, align up the amount of bytes to zerofill
when recycling a segment.

Message-Id: <20201211142628.608269-4-kostja@scylladb.com>
2020-12-14 12:16:18 +02:00
Konstantin Osipov
ad6817bcde commitlog: fix typo in a comment
Message-Id: <20201211142628.608269-2-kostja@scylladb.com>
2020-12-14 12:16:14 +02:00
Benny Halevy
0e79e0f215 test: mutation_diff: extend section markers
When the different mutations are printed via
BOOST_REQUIRE_EQUAL, we don't get the "expect {} but got {}"
section markers.  Instead, the parts we're interested in
are bracketed like "critical check X == Y has failed [{} != {}]"

Test: with both formats:
- https://github.com/scylladb/scylla/files/3890627/test_concurrent_reads_and_eviction.log
- https://github.com/scylladb/scylla/files/4303117/flat_mutation_reader_test.118.log
- https://github.com/scylladb/scylla/files/5687372/flat_mutation_reader_test.172.log.gz

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201214100521.3814909-1-bhalevy@scylladb.com>
2020-12-14 12:11:34 +02:00
Nadav Har'El
72cb3e9255 alternator test: add missing wait for update_table to finish
Three tests in test_streams.py run update_table() on a table without
waiting for it to complete, and then call update_table() on the same
table or delete it. This always works in Scylla, and usually works in
AWS, but if we reach the second call, it may fail because the previous
update_table() did not take effect yet. We sometimes see these failures
when running the Alternator test suite against AWS.

So in this patch, after an each update_table() we wait for the table
to return from UPDATING to ACTIVE status.

The entire Alternator test suite now passes (or skipped) on AWS,
so: Fixes #7778.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201213164931.2767236-1-nyh@scylladb.com>
2020-12-14 09:18:38 +01:00
Nadav Har'El
43ce0aef3d alternator test: fix test wrongly failing on AWS
The test test_query_filter.py::test_query_filter_paging fails on AWS
and shouldn't fail, so this patch fixes the test. Note that this is
only a test problem - no fix is needed for Alternator itself.

The test reads 20 results with 1-result pages, and assumed that
21 pages are returned. The 21st page may happen because when the
server returns the 20th, it might not yet know there will be no
additional results, so another page is needed - and will be empty.
Still a different implementation might notice that the last page
completed the iteration, and not return an extra empty page. This is
perfectly fine, and this is what AWS DynamoDB does today - and should
not be considered an error.

Refs #7778

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201213143612.2761943-1-nyh@scylladb.com>
2020-12-14 09:18:31 +01:00
Nadav Har'El
4ab98a4c68 alternator: use a more specific error when Authorization header is missing
When request signature checking is enabled in Alternator, each request
should come with the appropriate Authorization header. Most errors in
this preparing this header will result in an InvalidSignatureException
response; But DynamoDB returns a more specific error when this header is
completely missing: MissingAuthenticationTokenException. We should do the
same, but before this patch we return InvalidSignatureException also for
a missing header.

The test test_authorization.py::test_no_authorization_header used to
enshrine our wrong error message, and failed when run against AWS.
After this patch, we fix the error message and the test - which now
passes against both Alternator and AWS.

Refs #7778.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201213133825.2759357-1-nyh@scylladb.com>
2020-12-14 09:18:24 +01:00
Avi Kivity
39afe14ad4 Merge 'Add per query timeout' from Piotr Sarna
This series allows setting per-query timeout via CQL. It's possible via the existing `USING` clause, which is extended to be available for `SELECT` statement as well. This parameter accepts a duration and can also be provided as a marker.
The parameter acts as a regular part of the `USING` clause, which means that it can be used along with `USING TIMESTAMP` and `USING TTL` without issues.
The series comes with a pytest test suite.

Examples:
```cql
        SELECT * FROM t USING TIMEOUT 200ms;
```
```cql
        INSERT INTO t(a,b,c) VALUES (1,2,3) USING TIMESTAMP 42 AND TIMEOUT 50ms;
```

Working with prepared statements works as usual - the timeout parameter can be
explicitly defined or provided as a marker:

```cql
        SELECT * FROM t USING TIMEOUT ?;
```
```cql
        INSERT INTO t(a,b,c) VALUES (?,?,?) USING TIMESTAMP 42 AND TIMEOUT 50ms;
```

Tests: unit(dev)
Fixes #7777

Closes #7781

* github.com:scylladb/scylla:
  test: add prepared statement tests to USING TIMEOUT suite
  docs: add an entry about USING TIMEOUT
  test: add a test suite for USING TIMEOUT
  storage_proxy: start propagating local timeouts as timeouts
  cql3: allow USING clause for SELECT statement
  cql3: add TIMEOUT attribute to the parser
  cql3: add per-query timeout to select statement
  cql3: add per-query timeout to batch statement
  cql3: add per-query timeout to modification statement
  cql3: add timeout to cql attributes
2020-12-14 09:46:46 +02:00
Piotr Sarna
d6e7e36280 test: add prepared statement tests to USING TIMEOUT suite 2020-12-14 07:50:40 +01:00
Piotr Sarna
da77ab832b docs: add an entry about USING TIMEOUT
The paragraph describes how USING TIMEOUT clause can be used
along with some simple examples.
2020-12-14 07:50:40 +01:00
Piotr Sarna
0148b41a02 test: add a test suite for USING TIMEOUT
The test suite is based on cql-pytest and checks if USING TIMEOUT
works as expected.
2020-12-14 07:50:40 +01:00
Piotr Sarna
27fba35832 storage_proxy: start propagating local timeouts as timeouts
A local timeout was previously propagated to the client as WriteFailure,
while there exists a more concrete error type for that: WriteTimeout.
2020-12-14 07:50:40 +01:00
Piotr Sarna
ddd9cb1b2a cql3: allow USING clause for SELECT statement
In order to be able to specify a timeout for SELECT statements,
it's now possible to use the USING clause with it.
2020-12-14 07:50:40 +01:00
Piotr Sarna
d3896a209b cql3: add TIMEOUT attribute to the parser
It's now possible to specify TIMEOUT as part of the USING clause.
2020-12-14 07:50:40 +01:00
Piotr Sarna
157be33b89 cql3: add per-query timeout to select statement
First of all, select statement is extended with an 'attrs' field,
which keeps the per-query attributes. Currently, only TIMEOUT
parameter is legal to use, since TIMESTAMP and TTL bear no meaning
for reads.

Secondly, if TIMEOUT attribute is set, it will be used as the effective
timeout for a particular query.
2020-12-14 07:50:40 +01:00
Piotr Sarna
20dedd0df7 cql3: add per-query timeout to batch statement
If TIMEOUT attribute is set, it will be used as the effective
timeout for a particular query.
2020-12-14 07:50:40 +01:00
Piotr Sarna
3c49b6bd88 cql3: add per-query timeout to modification statement
If TIMEOUT attribute is set, it will be used as the effective
timeout for a particular query.
2020-12-14 07:50:40 +01:00
Piotr Sarna
5bbd0b049b cql3: add timeout to cql attributes
This attribute will be used later to specify per-query timeout.
2020-12-14 07:50:40 +01:00
Benny Halevy
c60da2e90d cdc: remove _token_metadata from db_context
1. It's unused since cbe510d1b8
2. It's unsafe to keep a reference to token_metadata&
potentially across yield points.

The higher-level motivation is to make
storage_service::get_token_metadata() private so we
can control better how it's used.

For cdc, if the token_metadata is going to be needed
to the future, it'd be better get it from
db_context::_proxy.get_token_metadata_ptr().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201213162351.52224-2-bhalevy@scylladb.com>
2020-12-13 18:32:17 +02:00
Avi Kivity
0f967f911d Merge "storage_service: get_token_metadata_ptr to hold on to token_metadata" from Benny
"
This series fixes use-after-free via token_metadata&

We may currently get a token_metadata& via get_token_metadata() and
use it across yield points in a couple of sites:
- do_decommission_removenode_with_repair
- get_new_source_ranges

To fix that, get_token_metadata_ptr and hold on to it
across yielding.

Fixes #7790

Dtest: update_cluster_layout_tests:TestUpdateClusterLayout.simple_removenode_2_test(debug)
Test: unit(dev)
"

* tag 'storage_service-token_metadata_ptr-v2' of github.com:bhalevy/scylla:
  storage_service: get_new_source_ranges: don't hold token_metadata& across yield point
  storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range
  storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner
  storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield
2020-12-13 17:37:24 +02:00
Aleksandr Bykov
e74dc311e7 dist: scylla_util: fix aws_instance.ebs_disks method
aws_instance.ebs_disks() method should return ebs disk
instead of ephemeral

Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com>

Closes #7780
2020-12-13 17:33:37 +02:00