2726 Commits

Author SHA1 Message Date
Nadav Har'El
a1635b553e cql-pytest: fix detection of "raft" experimental feature
In a previous patch we fixed the output of experimental features list
(issue #10047), so we also need to fix the test code which detects the
"raft" experimental feature - to use the string "raft" and not the
silly byte 4 we had there before.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209104331.312999-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
Nadav Har'El
de586ef856 test/cql-pytest: mechanism for tests requiring raft-based schema updates
Issue #8968 no longer exists when Raft-based schema updates are enabled
in Scylla (with --experimental-features=raft). Before we can close this
issue we need a way to re-run its test

        test_keyspace.py::test_concurrent_create_and_drop_keyspace

with Raft and see it pass. But we also want the tests to continue to run
by default the older raft-less schema updates - so that this mode doesn't
regress during the potentially-long duration that it's still the default!

The solution in this patch is:

1. Introduce a "--raft" option to test/cql-pytest/run, which runs the tests
   against a Scylla with the raft experimental feature, while the default is
   still to run without it.

2. Introduce a test fixture "fails_without_raft" which marks a test which
   is expected to fail with the old pre-raft code, but is expected to
   pass in the new code.

3. Mark the test test_concurrent_create_and_drop_keyspace with this new
   "fails_without_raft".

After this patch, running

        test/cql-pytest/run --raft
            test_keyspace.py::test_concurrent_create_and_drop_keyspace

passes, which shows that issue #8968 was fixed (in Raft mode) - so we can say:
Fixes #8968

Running the same test without "--raft" still xfails (an expected failure).
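
The interplay between the "--raft" option and the "fails_without_raft"
fixture can be sketched roughly as follows. This is only an illustration
in plain Python, not the code from the patch: both function names are
hypothetical, and it assumes the server's experimental features are
available as a list of strings such as "raft".

```python
def raft_enabled(experimental_features):
    """Return True if the (assumed) list of feature names includes "raft"."""
    return "raft" in experimental_features


def expected_outcome(marked_fails_without_raft, experimental_features):
    """Outcome a run should report for one test, given the server's features.

    A test marked with fails_without_raft is an expected failure (xfail)
    on a server without Raft and is expected to pass with Raft enabled.
    Unmarked tests are always expected to pass.
    """
    if marked_fails_without_raft and not raft_enabled(experimental_features):
        return "xfail"
    return "pass"
```

In a real pytest fixture the "xfail" branch would typically be expressed
by adding a pytest.mark.xfail marker to the requesting test.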

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220208162732.260888-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
Nadav Har'El
fef7934a2d config: fix some types in system.config virtual table
The system.config virtual table prints each configuration variable of
type T based on the JSON printer specified in the config_type_for<T>
in db/config.cc.

For two variable types - experimental_features and tri_mode_restriction,
the specified converter was wrong: We used value_to_json<string> or
value_to_json<vector<string>> on something which was *not* a string.
Unfortunately, value_to_json silently cast the given objects into
strings, and the result was garbage: For example as noted in #10047,
for experimental_features instead of printing a list of features *names*,
e.g., "raft", we got a bizarre list of one-byte strings with each feature's
number (which isn't documented or even guaranteed to not change) as well
as carriage-return characters (!?).

So the solution is a new printable_to_json<T> which works on a type T that
can be printed with operator<< - as in fact the above two types can -
and the type is converted into a string or vector of strings using this
operator<<, not a cast.

Also added a cql-pytest test for reading system.config and in particular
options of the above two types - checking that they contain sensible
strings and not "garbage" like before this patch.
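
The difference between casting and printing can be illustrated with a
small Python analogue. It is not the C++ code from this patch: the enum
and both helper names are hypothetical, and only the value 4 for "raft"
is taken from the byte mentioned in the cql-pytest fix.

```python
from enum import Enum


class Feature(Enum):
    # Hypothetical stand-in for the experimental-features enum. The value
    # 4 for RAFT matches the byte seen in the garbled output; EXAMPLE is
    # made up for illustration.
    EXAMPLE = 1
    RAFT = 4

    def __str__(self):
        # Plays the role of operator<< in the C++ code: prints the name.
        return self.name.lower()


def cast_to_strings(features):
    """Imitates the old behavior: each value becomes a one-byte string."""
    return [chr(f.value) for f in features]


def printable_to_strings(features):
    """Imitates printable_to_json<T>: use the type's own printer."""
    return [str(f) for f in features]
```

With these definitions, cast_to_strings([Feature.RAFT]) yields the
unreadable one-byte string "\x04", while printable_to_strings() yields
"raft".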

Fixes #10047.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209090421.298849-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
Avi Kivity
5099b1e272 Merge 'Propagate coordinator timeouts for regular writes and batches without throwing' from Piotr Dulikowski
Currently, most of the failures that occur during CQL reads or writes are reported using C++ exceptions. Although the seastar framework avoids most of the cost of unwinding by keeping exceptions in futures as `std::exception_ptr`s, the exceptions need to be inspected at various points for the purposes of accounting metrics or converting them to a CQL error response. Analyzing the value and type of an exception held by a `std::exception_ptr` cannot be done without rethrowing the exception, and that can be very costly even if the exception is immediately caught. Because of that, exceptions are not a good fit for reporting failures which happen frequently during overload, especially if the CPU is the bottleneck.

This PR introduces facilities for reporting exceptions as values using the boost::outcome library. As a first step, the need to use exceptions for reporting timeouts was eliminated for regular and batch writes, and no exceptions are thrown between creation of a `mutation_write_timeout_exception` and its serialization as a CQL response in the `cql_server`.

The types and helpers introduced here can be reused in order to migrate more exceptions and exception paths in a similar fashion.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000`:

    Master (00a9326ae7)
    128789.53 tps ( 82.2 allocs/op,  12.2 tasks/op,   49245 insns/op)

    This PR
    127072.93 tps ( 82.2 allocs/op,  12.2 tasks/op,   49356 insns/op)

The new version seems to be slower by about 100 insns/op, fortunately not by much (about 0.2%).

Tests: unit(dev), unit(result_utils_test, debug)

Closes #10014

* github.com:scylladb/scylla:
  cql_test_env: optimize handling result_message::exception
  transport/server: handle exceptions from coordinator_result without throwing
  transport/server: propagate coordinator_result to the error handling code
  transport/server: unwrap the exception result_message in process_xyz_internal
  query_processor: add exception-returning variants of execute_ methods
  modification_statement: propagate failed result through result_message::exception
  batch_statement: propagate failed result through result_message::exception
  cql_statement: add `execute_without_checking_exception_message`
  result_message: add result_message::exception
  storage_proxy: change mutate_with_triggers to return future<result<>>
  storage_proxy: add mutate_atomically_result
  storage_proxy: return result<> from mutate_result
  storage_proxy: return result<> from mutate_internal
  storage_proxy: properly propagate future from mutate_begin to mutate_end
  storage_proxy: handle exceptions as values in mutate_end
  storage_proxy: let mutate_end take a future<result<>>
  storage_proxy: resultify mutate_begin
  storage_proxy: use result in the _ready future of write handlers
  storage_proxy: introduce helpers for dealing with results
  exceptions: add coordinator_exception_container and coordinator_result
  utils: add result utils
  utils: add exception_container
2022-02-08 14:27:09 +02:00
Piotr Dulikowski
ffd439d908 cql_test_env: optimize handling result_message::exception
The single_node_cql_env uses query_processor::execute_xyz family of
methods to perform operations. Due to previous commits in this series,
they allocate one more task than before - a continuation that converts
result_message::exception into an exceptional future. We can recover
that one task by using variants of those methods which do not perform a
conversion, and turn .finally() invocations into .then()s which perform
conversion manually.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
e4ff22b4ca result_message: add result_message::exception
In order to propagate exceptions as values through the CQL layer with
minimal modifications to the interfaces, a new result_message type is
introduced: result_message::exception. Similarly to
result_message::bounce_to_shard, this is an internal type which is
supposed to be handled before being returned to the client.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
11cb670881 utils: add result utils
Adds a number of utilities for working with boost::outcome::result
combined with exception_container. The utilities are meant to help with
migration of the existing code to use the boost::outcome::result:

- `exception_container_throw_policy` - a NoValuePolicy meant to be used
  as a template parameter for the boost::outcome::result. It protects
  the caller of `result::value()` and `result::error()` methods - if the
  caller wishes to get a value but the result has an error
  (exception_container in our case), the exception in the container will
  be thrown instead. In case it's the other way around,
  boost::outcome::bad_result_access is thrown.
- `result_parallel_for_each` - a version of `parallel_for_each` which is
  aware of results and returns a failed result in case any of the
  parallel invocations return a failed result.
- `result_into_future` - converts a result into a future. If the result
  holds a value, converts it into make_ready_future; if it holds an
  exception, the exception is returned as make_exception_future.
- `then_ok_result` takes a `future<T>` and converts it into
  a `future<result<T>>`.
- `result_wrap` adapts a callable of type `T -> future<result<T>>` and
  returns a callable of type `result<T> -> future<result<T>>`.
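
The shape of these helpers can be sketched in Python, leaving futures
out entirely. This is a simplified analogue, not the C++ utilities
themselves: `Result`, `ok`, `fail` and `result_for_each` are
hypothetical names, and `result_for_each` runs sequentially rather than
in parallel.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Result:
    """Either a value or an error (an exception object held as a value)."""
    value: Any = None
    error: Optional[Exception] = None

    def has_error(self):
        return self.error is not None


def ok(value):
    return Result(value=value)


def fail(error):
    return Result(error=error)


def result_wrap(func: Callable[[Any], Result]) -> Callable[[Result], Result]:
    """Adapt a callable `T -> Result` into `Result -> Result`.

    A failed input is passed through untouched, so the error never has to
    be raised and caught along the way.
    """
    def wrapped(res: Result) -> Result:
        if res.has_error():
            return res
        return func(res.value)
    return wrapped


def result_for_each(items, func):
    """Run func on every item; return a failed result if any call failed."""
    results = [func(item) for item in items]
    for res in results:
        if res.has_error():
            return res
    return ok(None)
```

For example, `result_wrap(lambda x: ok(x * 2))` applied to `ok(21)`
gives a result holding 42, while applied to a failed result it returns
that same failed result unchanged.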
2022-02-08 11:08:42 +01:00
Nadav Har'El
9982a28007 alternator: allow REMOVE of non-existent nested attribute
DynamoDB allows an UpdateItem operation "REMOVE x.y" when a map x
exists in the item, but x.y doesn't - the removal silently does
nothing. Alternator incorrectly generated an error in this case,
and unfortunately we didn't have a test for this case.

So in this patch we add the missing test (which fails on Alternator
before this patch - and passes on DynamoDB) and then fix the behavior.
After this patch, "REMOVE x.y" will remain an error if "x" doesn't
exist (saying "document paths not valid for this item"), but if "x"
exists and is a map, but "x.y" doesn't, the removal will silently
do nothing and will not be an error.
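
The intended behavior can be summarized with a small Python sketch over
a plain dict. It is an illustration only, not Alternator's code:
`ValidationError` and `remove_nested` are hypothetical names, and
treating a non-map "x" as an error is an assumption.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the error returned to the client."""


def remove_nested(item, path):
    """Apply "REMOVE x.y" semantics for a two-level document path.

    `item` is a dict of top-level attributes; `path` is e.g. "x.y".
    """
    top, nested = path.split(".", 1)
    if top not in item or not isinstance(item[top], dict):
        # "x" itself is missing (or, by assumption, not a map): an error.
        raise ValidationError("document paths not valid for this item")
    # "x" exists and is a map: removing a missing "x.y" does nothing.
    item[top].pop(nested, None)
    return item
```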

Fixes #10043.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220207133652.181994-1-nyh@scylladb.com>
2022-02-07 18:40:48 +02:00
Nadav Har'El
203291f7ba cql: reject a map literal with the same key twice
The CQL parser currently accepts a command like:

    ALTER KEYSPACE ksname WITH replication = {
        'class' : 'NetworkTopologyStrategy',
        'dc1' : 2,
        'dc1' : 3 }

But because these options are read into an std::map, one of the
definitions of 'dc1' is silently ignored (counter-intuitively, it is
the first setting which is kept, and the second setting is ignored.)
But this is most likely a user's typo, so a better choice is to report
this as a parse error instead of arbitrarily and silently keeping just
one of the settings.

This is what Cassandra does since version 3.11 (see
https://issues.apache.org/jira/browse/CASSANDRA-13369 and Cassandra
commit 1a83efe2047d0138725d5e102cc40774f3b14641), and this is what we do
in this patch.
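
The difference between the old and new behavior can be sketched in
Python as follows. This is not the parser code: the function names and
the `SyntaxException` class with its message are hypothetical, and
`setdefault()` is used only to mimic the first-definition-wins behavior
described above.

```python
class SyntaxException(Exception):
    """Hypothetical name for the parse error reported to the user."""


def old_map_literal(pairs):
    """Old behavior: the first definition of a key silently wins."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result


def build_map_literal(pairs):
    """New behavior: a repeated key is reported as an error."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise SyntaxException(f"duplicate key {key!r} in map literal")
        result[key] = value
    return result
```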

The unit test cassandra_tests/validation/operations/alter_test.py::
testAlterKeyspaceWithMultipleInstancesOfSameDCThrowsSyntaxException,
translated from Cassandra's unit tests, now passes.

Fixes #10037.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220207113709.78613-1-nyh@scylladb.com>
2022-02-07 18:40:48 +02:00
Nadav Har'El
cc57ac8c1c cql3: add a cql3::util::quote() function
The function cql3::util::maybe_quote() is used throughout Scylla to
convert identifier names (column names, table names, etc.) into strings
that can be embedded in CQL commands. maybe_quote() sometimes needs to
quote these identifier names, but when the identifier name is lowercase,
and not a CQL keyword, it is not quoted.

Not quoting identifier names when not needed is nice and pretty, but has
a forward-compatibility problem: If some CQL command with an unquoted
identifier is saved somewhere, and a new version of Scylla adds this
identifier as a new reserved keyword - the CQL command will break.

So this patch introduces a new function, cql3::util::quote(), which
unconditionally quotes the given identifier.
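
As a rough Python illustration of unconditional quoting (not the C++
implementation), assuming CQL's convention that a double-quote inside a
quoted identifier is written as two double-quotes:

```python
def quote(identifier):
    """Always wrap the identifier in double quotes, doubling embedded ones."""
    return '"' + identifier.replace('"', '""') + '"'
```

Under this sketch, quote("to") gives "to" wrapped in double quotes, and
it would do the same for any lowercase name, keyword or not.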

The new function is not yet used in Scylla, but we add a unit test
(based on the test of maybe_quote()) to confirm it behaves correctly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-2-nyh@scylladb.com>
2022-02-07 11:33:57 +02:00
Nadav Har'El
5d2f694a90 cql3: fix cql3::util::maybe_quote() for keywords
cql3::util::maybe_quote() is a utility function formatting an identifier
name (table name, column name, etc.) that needs to be embedded in a CQL
statement - and might require quoting if it contains non-alphanumeric
characters, uppercase characters, or a CQL keyword.

maybe_quote() made an effort to only quote the identifier name if necessary,
e.g., a lowercase name usually does not need quoting. But lowercase names
that are CQL keywords - e.g., to or where - cannot be used as identifiers
without quoting. This can cause problems for code that wants to generate
CQL statements, such as the materialized-view problem in issue #9450 - where
a user had a column called "to" and wanted to create a materialized view
for it.

So in this patch we fix maybe_quote() to recognize invalid identifiers by
using the CQL parser, and quote them. This will quote reserved keywords,
but not so-called unreserved keywords, which *are* allowed as identifiers
and don't need quoting. This addition slows down maybe_quote(), but
maybe_quote() is anyway only used in heavy operations which need to
generate CQL.
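
The quoting decision can be sketched in Python as below. Note that this
sketch swaps in a hard-coded keyword set and a regular expression where
the patch uses the CQL parser itself; the set is a tiny illustrative
subset, and the pattern for unquoted names is an assumption.

```python
import re

# Tiny illustrative subset; the real reserved keywords are defined by the
# CQL grammar, which the patch consults through the parser.
RESERVED_KEYWORDS = {"to", "where"}

# Assumed shape of a name that can stand unquoted: lowercase letter first,
# then lowercase letters, digits or underscores.
_UNQUOTED_OK = re.compile(r"[a-z][a-z0-9_]*")


def maybe_quote(identifier):
    """Quote only when the identifier cannot be used unquoted."""
    if _UNQUOTED_OK.fullmatch(identifier) and identifier not in RESERVED_KEYWORDS:
        return identifier
    return '"' + identifier.replace('"', '""') + '"'
```

Here "to" comes back quoted, while the unreserved keyword "int" and an
ordinary lowercase name are returned unchanged.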

This patch also adds two tests that reproduce the bug and verify its
fix:

1. Extend the low-level maybe_quote() test (a C++ unit test) to also
   check that maybe_quote() quotes reserved keywords like "to", but
   doesn't quote unreserved keywords like "int".

2. Add a test reproducing issue #9450 - creating a materialized view
   whose key column is a keyword. This new test passes on Cassandra,
   failed on Scylla before this patch, and passes after this patch.

It is worth noting that maybe_quote() now has a "forward compatibility"
problem: If we save CQL statements generated by maybe_quote(), and a
future version introduces a new reserved keyword, the parser of the
future version may not be able to parse the saved CQL statement that
was generated with the old maybe_quote() and didn't quote what is now
a keyword. This problem can be solved in two ways:

1. Try hard not to introduce new reserved keywords. Instead, introduce
   unreserved keywords. We've been doing this even before recognizing
   this maybe_quote() future-compatibility problem.

2. In the next patch we will introduce quote() - which unconditionally
   quotes identifier names, even if lowercase. These quoted names will
   be uglier for lowercase names - but will be safe from future
   introduction of new keywords. So we can consider switching some or
   all uses of maybe_quote() to quote().

Fixes #9450

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220118161217.231811-1-nyh@scylladb.com>
2022-02-07 11:33:56 +02:00
Nadav Har'El
b3cfd4ce07 cql-pytest: translate Cassandra's tests for ALTER operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/AlterTest.java into our cql-pytest framework.

This test file includes 24 tests for various types of ALTER operations
(of keyspaces, tables and types). Two additional tests which required
multiple data centers to test were dropped with a comment explaining why.

All 24 tests pass on Cassandra, with 8 failing on Scylla reproducing
one already known Scylla issue and 5 previously-unknown ones:

  Refs #8948:  Cassandra 3.11.10 uses "class" instead of
               "sstable_compression" for compression settings by default
  Refs #9929:  Cassandra added "USING TIMESTAMP" to "ALTER TABLE",
               we didn't.
  Refs #9930:  Forbid re-adding static columns as regular and vice versa
  Refs #9935:  Scylla stores un-expanded compaction class name in system
               tables.
  Refs #10036: Reject empty options while altering a keyspace
  Refs #10037: If there are multiple values for a key, CQL silently
               chooses last value

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-2-nyh@scylladb.com>
2022-02-07 10:57:43 +02:00
Nadav Har'El
b61876f4ff test/cql-pytest: implement nodetool.compact()
Implement the nodetool.compact() function, requesting a major compaction
of the given table. As usual for the nodetool.* functions, this is
implemented with the REST API if available (i.e., testing Scylla), or
with the external "nodetool" command if not (for testing Cassandra).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-1-nyh@scylladb.com>
2022-02-07 10:57:42 +02:00
Konstantin Osipov
caeaba60f9 cql_repl: use POSIX primitives to reset input/output
Seastar uses POSIX IO for output in addition to C++ iostreams,
e.g. in print_safe(), where it write()s directly to stdout.

Instead of manipulating C++ output streams to reset
stdout/log files, reopen the underlying file descriptors
to output/log files.
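
The descriptor-level technique can be illustrated with a short Python
sketch using os.open() and os.dup2(). It is not the cql_repl code (which
is C++), and `reopen_fd` is a hypothetical helper name.

```python
import os


def reopen_fd(target_fd, path):
    """Make `target_fd` refer to the file at `path`.

    Anything that later write()s straight to `target_fd`, bypassing
    language-level stream objects, ends up in the new file as well.
    """
    new_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Atomically repoint target_fd at the newly opened file.
        os.dup2(new_fd, target_fd)
    finally:
        # The temporary descriptor is no longer needed.
        os.close(new_fd)
```

For instance, reopen_fd(1, "output.log") would redirect everything
written to stdout's descriptor, including direct write() calls.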

Fixes #9962 "cql_repl prints junk into the log"
Message-Id: <20220204205032.1313150-1-kostja@scylladb.com>
2022-02-07 10:53:20 +02:00
Piotr Dulikowski
80f6224959 utils: add exception_container
Adds `exception_container` - a helper type used to hold exceptions as a
value, without involving the std::exception_ptr.

The motivation behind this type is that it allows inspecting exception's
type and value without having to rethrow that exception and catch it,
unlike std::exception_ptr. In our current codebase, some exception
handling paths need to rethrow the exception multiple times in order to
account it into metrics or encode it as an error response to the CQL
client. Some types of exceptions can be thrown very frequently in case
of overload (e.g. timeouts) and inspecting those exceptions with
rethrows can make the overload even worse. For those kinds of exceptions
it is important to handle them as cheaply as possible, and
exception_container used in conjunction with boost::outcome::result
can help achieve that.
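
The idea of inspecting an exception without throwing it can be sketched
in Python as follows. Python has no comparable rethrow cost, so this
only shows the shape of the interface; the class and method names are
illustrative and not necessarily those of the C++ type.

```python
class ExceptionContainer:
    """Holds one exception object as a plain value (never raised here)."""

    def __init__(self, exc):
        self._exc = exc

    def accept(self, handlers, default=None):
        """Dispatch on the stored exception's type without raising it.

        `handlers` maps exception classes to callables taking the exception.
        """
        for exc_type, handler in handlers.items():
            if isinstance(self._exc, exc_type):
                return handler(self._exc)
        return default

    def throw_me(self):
        """Fall back to ordinary exception propagation when needed."""
        raise self._exc
```

A metrics path could then call accept() with a handler for the timeout
type and bump a counter, leaving throw_me() for code that still expects
a real exception.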
2022-02-04 20:18:00 +01:00
Avi Kivity
fe65122ccd Merge 'Distribute select count(*) queries' from Michał Sala
This pull request speeds up execution of `count(*)` queries. It does so by splitting given query into sub-queries and distributing them across some group of nodes for parallel execution.

New level of coordination was added. Node called super-coordinator splits aggregation query into sub-queries and distributes them across some group of coordinators. Super-coordinator is also responsible for merging results.

To develop a mechanism for speeding up `count(*)` queries, there was a need to detect which queries have a `count(*)` selector. Due to this pull request being a proof of concept, detection was realized rather poorly. It only allows catching the simplest cases of `count(*)` queries (with only one selector and no column name specified).

After detecting that a query is a `count(*)` it should be split into sub-queries and sent to other coordinators. Splitting part wasn't that difficult, it has been achieved by limiting original query's partition ranges. Sending modified query to another node was much harder. The easiest scenario would be to send whole `cql3::statements::select_statement`. Unfortunately `cql3::statements::select_statement` can't be [de]serialized, so sending it was out of the question. Even more unfortunately, some non-[de]serializable members of `cql3::statements::select_statement` are required to start the execution process of this statement. Finally, I have decided to send a `query::read_command` paired with required [de]serializable members. Objects that cannot be [de]serialized (such as query's selector) are mocked on the receiving end.

When a super-coordinator receives a `count(*)` query, it splits it into sub-queries. It does so by splitting original query's partition ranges into list of vnodes, grouping them by their owner and creating sub-queries with partition ranges set to successive results of such grouping. After creation, each sub-query is sent to the owner of its partition ranges. Owner dispatches received sub-query to all of its shards. Shards slice partition ranges of the received sub-query, so that they will only query data that is owned by them. Each shard becomes a coordinator and executes the sub-query prepared this way.
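
The grouping and merging steps can be sketched in Python as follows. This is a simplified illustration, not the code from the pull request: both function names are hypothetical, token ranges are plain tuples, and merging `count(*)` results is assumed to be a sum of the partial counts.

```python
from collections import defaultdict


def split_by_owner(vnodes):
    """Group vnode token ranges by owning node.

    `vnodes` is a list of (token_range, owner) pairs. The result maps each
    owner to the ranges of its sub-query: {owner: [token_range, ...]}.
    """
    sub_queries = defaultdict(list)
    for token_range, owner in vnodes:
        sub_queries[owner].append(token_range)
    return dict(sub_queries)


def merge_counts(partial_counts):
    """Merge partial count(*) results from the sub-queries by summing them."""
    return sum(partial_counts)
```

For example, three vnodes owned by nodes `n1`, `n2` and `n1` produce two sub-queries, and the per-shard results 71 and 60 merge to 131.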

3 node cluster set up on powerful desktops located in the office (3x32 cores)
Filled the cluster with ~2 * 10^8 rows using scylla-bench and ran:
```
time cqlsh <ip> <port> --request-timeout=3600 -e "select count(*) from scylla_bench.test using timeout 1h;"
```

* master: 68s
* this branch: 2s

3 node cluster (each node had 2 shards, `murmur3_ignore_msb_bits` was set to 1, `num_tokens` was set to 3)

```
>  cqlsh -e 'tracing on; select count(*) from ks.t;
Now Tracing is enabled

 count
-------
  1000

(1 rows)

Tracing session: e5852020-7fc3-11ec-8600-4c4c210dd657

 activity                                                                                                                                    | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                          Execute CQL3 query | 2022-01-27 22:53:08.770000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                               Parsing a statement [shard 1] | 2022-01-27 22:53:08.770451 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                            Processing a statement [shard 1] | 2022-01-27 22:53:08.770487 | 127.0.0.1 |             36 | 127.0.0.1
                                                                                        Dispatching forward_request to 3 endpoints [shard 1] | 2022-01-27 22:53:08.770509 | 127.0.0.1 |             58 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.1:0 [shard 1] | 2022-01-27 22:53:08.770516 | 127.0.0.1 |             64 | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770519 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770528 | 127.0.0.1 |              9 | 127.0.0.1
                                             Start querying token range ({-4242912715832118944, end}, {-4075408479358018994, end}] [shard 1] | 2022-01-27 22:53:08.770531 | 127.0.0.1 |             12 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770537 | 127.0.0.1 |             18 | 127.0.0.1
                      Scanning cache for range ({-4242912715832118944, end}, {-4075408479358018994, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770541 | 127.0.0.1 |             22 | 127.0.0.1
    Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770589 | 127.0.0.1 |             70 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.2:0 [shard 1] | 2022-01-27 22:53:08.770600 | 127.0.0.1 |            149 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.3:0 [shard 1] | 2022-01-27 22:53:08.770608 | 127.0.0.1 |            157 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770627 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770639 | 127.0.0.1 |             11 | 127.0.0.1
                                               Start querying token range ({2507462623645193091, end}, {3897266736829642805, end}] [shard 0] | 2022-01-27 22:53:08.770643 | 127.0.0.1 |             15 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770646 | 127.0.0.1 |             19 | 127.0.0.1
                        Scanning cache for range ({2507462623645193091, end}, {3897266736829642805, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770649 | 127.0.0.1 |             22 | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770658 | 127.0.0.2 |             -- | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770674 | 127.0.0.3 |              5 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770698 | 127.0.0.2 |             40 | 127.0.0.1
                                             Start querying token range [{4611686018427387904, start}, {5592106830937975806, end}] [shard 1] | 2022-01-27 22:53:08.770704 | 127.0.0.2 |             46 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770710 | 127.0.0.2 |             52 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770712 | 127.0.0.3 |             43 | 127.0.0.1
                      Scanning cache for range [{4611686018427387904, start}, {5592106830937975806, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770714 | 127.0.0.2 |             56 | 127.0.0.1
                                           Start querying token range [{-4611686018427387904, start}, {-4242912715832118944, end}] [shard 1] | 2022-01-27 22:53:08.770718 | 127.0.0.3 |             49 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770739 | 127.0.0.3 |             70 | 127.0.0.1
                    Scanning cache for range [{-4611686018427387904, start}, {-4242912715832118944, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770743 | 127.0.0.3 |             73 | 127.0.0.1
    Page stats: 17 partition(s), 0 static row(s) (0 live, 0 dead), 17 clustering row(s) (17 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770814 | 127.0.0.3 |            145 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770846 | 127.0.0.3 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770862 | 127.0.0.3 |             16 | 127.0.0.1
    Page stats: 71 partition(s), 0 static row(s) (0 live, 0 dead), 71 clustering row(s) (71 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.770865 | 127.0.0.1 |            238 | 127.0.0.1
                                             Start querying token range ({-6683686776653114062, end}, {-6473446911791631266, end}] [shard 0] | 2022-01-27 22:53:08.770867 | 127.0.0.3 |             21 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770874 | 127.0.0.3 |             28 | 127.0.0.1
                      Scanning cache for range ({-6683686776653114062, end}, {-6473446911791631266, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770879 | 127.0.0.3 |             33 | 127.0.0.1
    Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770880 | 127.0.0.2 |            222 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.770888 | 127.0.0.1 |            369 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770909 | 127.0.0.1 |            390 | 127.0.0.1
                                             Start querying token range ({-4075408479358018994, end}, {-3391415989210253693, end}] [shard 1] | 2022-01-27 22:53:08.770911 | 127.0.0.1 |            392 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770914 | 127.0.0.1 |            395 | 127.0.0.1
                      Scanning cache for range ({-4075408479358018994, end}, {-3391415989210253693, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770936 | 127.0.0.1 |            418 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770951 | 127.0.0.2 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770966 | 127.0.0.2 |             15 | 127.0.0.1
    Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.770969 | 127.0.0.3 |            123 | 127.0.0.1
                                                                    Start querying token range (-inf, {-6683686776653114062, end}] [shard 0] | 2022-01-27 22:53:08.770969 | 127.0.0.2 |             18 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770974 | 127.0.0.2 |             23 | 127.0.0.1
                                             Scanning cache for range (-inf, {-6683686776653114062, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770977 | 127.0.0.2 |             26 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.770993 | 127.0.0.3 |            324 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770998 | 127.0.0.3 |            329 | 127.0.0.1
                                                              Start querying token range ({-3391415989210253693, end}, {0, start}) [shard 1] | 2022-01-27 22:53:08.771001 | 127.0.0.3 |            332 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.771004 | 127.0.0.3 |            335 | 127.0.0.1
                                       Scanning cache for range ({-3391415989210253693, end}, {0, start}) and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.771007 | 127.0.0.3 |            338 | 127.0.0.1
    Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.771044 | 127.0.0.1 |            525 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771069 | 127.0.0.1 |            442 | 127.0.0.1
                                                                                                 On shard execution result is [71] [shard 0] | 2022-01-27 22:53:08.771145 | 127.0.0.1 |            518 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771308 | 127.0.0.1 |            789 | 127.0.0.1
                                                                                                 On shard execution result is [60] [shard 1] | 2022-01-27 22:53:08.771351 | 127.0.0.1 |            832 | 127.0.0.1
 Page stats: 127 partition(s), 0 static row(s) (0 live, 0 dead), 127 clustering row(s) (127 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.771379 | 127.0.0.2 |            427 | 127.0.0.1
 Page stats: 183 partition(s), 0 static row(s) (0 live, 0 dead), 183 clustering row(s) (183 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.771385 | 127.0.0.3 |            716 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771402 | 127.0.0.3 |            556 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771403 | 127.0.0.2 |            745 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.771408 | 127.0.0.2 |            750 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771409 | 127.0.0.3 |            563 | 127.0.0.1
                                                                     Start querying token range ({5592106830937975806, end}, +inf) [shard 1] | 2022-01-27 22:53:08.771411 | 127.0.0.2 |            754 | 127.0.0.1
                                           Start querying token range ({-6272011798787969456, end}, {-4611686018427387904, start}) [shard 0] | 2022-01-27 22:53:08.771412 | 127.0.0.3 |            566 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771415 | 127.0.0.3 |            569 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.771415 | 127.0.0.2 |            757 | 127.0.0.1
                                              Scanning cache for range ({5592106830937975806, end}, +inf) and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.771419 | 127.0.0.2 |            761 | 127.0.0.1
                    Scanning cache for range ({-6272011798787969456, end}, {-4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771419 | 127.0.0.3 |            573 | 127.0.0.1
                                                                                    Received forward_result=[131] from 127.0.0.1:0 [shard 1] | 2022-01-27 22:53:08.771454 | 127.0.0.1 |           1003 | 127.0.0.1
    Page stats: 74 partition(s), 0 static row(s) (0 live, 0 dead), 74 clustering row(s) (74 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.771764 | 127.0.0.3 |            918 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771768 | 127.0.0.3 |            922 | 127.0.0.1
                                                               Start querying token range [{0, start}, {2507462623645193091, end}] [shard 0] | 2022-01-27 22:53:08.771771 | 127.0.0.3 |            925 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771775 | 127.0.0.3 |            929 | 127.0.0.1
                                        Scanning cache for range [{0, start}, {2507462623645193091, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771779 | 127.0.0.3 |            933 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771935 | 127.0.0.3 |           1265 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771950 | 127.0.0.2 |            998 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771956 | 127.0.0.2 |           1004 | 127.0.0.1
                                             Start querying token range ({-6473446911791631266, end}, {-6272011798787969456, end}] [shard 0] | 2022-01-27 22:53:08.771959 | 127.0.0.2 |           1008 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771963 | 127.0.0.2 |           1011 | 127.0.0.1
                      Scanning cache for range ({-6473446911791631266, end}, {-6272011798787969456, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771966 | 127.0.0.2 |           1014 | 127.0.0.1
    Page stats: 13 partition(s), 0 static row(s) (0 live, 0 dead), 13 clustering row(s) (13 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772008 | 127.0.0.2 |           1057 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.772012 | 127.0.0.2 |           1061 | 127.0.0.1
                                             Start querying token range ({3897266736829642805, end}, {4611686018427387904, start}) [shard 0] | 2022-01-27 22:53:08.772014 | 127.0.0.2 |           1063 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.772016 | 127.0.0.2 |           1065 | 127.0.0.1
                      Scanning cache for range ({3897266736829642805, end}, {4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.772019 | 127.0.0.2 |           1067 | 127.0.0.1
                                                                                                On shard execution result is [200] [shard 1] | 2022-01-27 22:53:08.772053 | 127.0.0.3 |           1384 | 127.0.0.1
    Page stats: 56 partition(s), 0 static row(s) (0 live, 0 dead), 56 clustering row(s) (56 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772138 | 127.0.0.2 |           1186 | 127.0.0.1
 Page stats: 190 partition(s), 0 static row(s) (0 live, 0 dead), 190 clustering row(s) (190 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.772364 | 127.0.0.2 |           1706 | 127.0.0.1
 Page stats: 149 partition(s), 0 static row(s) (0 live, 0 dead), 149 clustering row(s) (149 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772407 | 127.0.0.3 |           1561 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772417 | 127.0.0.3 |           1571 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.772418 | 127.0.0.2 |           1760 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772426 | 127.0.0.2 |           1475 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772428 | 127.0.0.2 |           1476 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772449 | 127.0.0.3 |           1604 | 127.0.0.1
                                                                                                On shard execution result is [196] [shard 0] | 2022-01-27 22:53:08.772555 | 127.0.0.2 |           1603 | 127.0.0.1
                                                                                                On shard execution result is [238] [shard 1] | 2022-01-27 22:53:08.772674 | 127.0.0.2 |           2016 | 127.0.0.1
                                                                                                On shard execution result is [235] [shard 0] | 2022-01-27 22:53:08.772770 | 127.0.0.3 |           1924 | 127.0.0.1
                                                                                    Received forward_result=[435] from 127.0.0.3:0 [shard 1] | 2022-01-27 22:53:08.772933 | 127.0.0.1 |           2482 | 127.0.0.1
                                                                                    Received forward_result=[434] from 127.0.0.2:0 [shard 1] | 2022-01-27 22:53:08.773110 | 127.0.0.1 |           2658 | 127.0.0.1
                                                                                                           Merged result is [1000] [shard 1] | 2022-01-27 22:53:08.773111 | 127.0.0.1 |           2660 | 127.0.0.1
                                                                                              Done processing - preparing a result [shard 1] | 2022-01-27 22:53:08.773114 | 127.0.0.1 |           2663 | 127.0.0.1
                                                                                                                            Request complete | 2022-01-27 22:53:08.772666 | 127.0.0.1 |           2666 | 127.0.0.1
```
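
The trace above can be read as a two-level sum: each node adds up its per-shard counts, and the coordinator adds up the per-node results. A minimal sketch of that arithmetic, using the numbers from the trace (the function and variable names are illustrative only, not Scylla's code):

```python
# Per-shard counts taken from the "On shard execution result is [...]" trace lines.
shard_results = {
    "127.0.0.1": [71, 60],
    "127.0.0.2": [196, 238],
    "127.0.0.3": [200, 235],
}


def merge_counts(partials):
    """Merge partial count(*) results by summing them."""
    return sum(partials)


# Each node merges its shards; compare with "Received forward_result=[...]".
forward_results = {
    node: merge_counts(counts) for node, counts in shard_results.items()
}

# The coordinator merges the per-node results; compare with "Merged result is [...]".
merged = merge_counts(forward_results.values())
```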

Fixes #1385

Closes #9209

* github.com:scylladb/scylla:
  docs: add parallel aggregations design doc
  db: config: add a flag to disable new parallelized aggregation algorithm
  test: add parallelized select count test
  forward_service: add metrics
  forward_service: parallelize execution across shards
  forward_service: add tracing
  cql3: statements: introduce parallelized_select_statement
  cql3: query_processor: add forward_service reference to query_processor
  gms: add PARALLELIZED_AGGREGATION feature
  service: introduce forward_service
  storage_proxy: extract query_ranges_to_vnodes_generator to a separate file
  messaging_service: add verb for count(*) request forwarding
  cql3: selection: detect if a selection represents count(*)
2022-02-04 12:34:19 +02:00
Nadav Har'El
b54e85088d Merge 'snapshots: Fix snapshot-ctl to include snapshots of dropped tables' from Benny Halevy
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.

This PR is a rebased version of #9539 (and slightly cleaned-up, cosmetically)
and so it replaces the previous PR.

Fixes #3463
Closes #7122

Closes #9884

* github.com:scylladb/scylla:
  snapshots: Fix snapshot-ctl to include snapshots of dropped tables
  table: snapshot: add debug messages
2022-02-04 12:34:19 +02:00
Botond Dénes
d309a86708 Merge 'Add keyspace_offstrategy_compaction api' from Benny Halevy
This series adds methods to perform offstrategy compaction, if needed, returning a future<bool>
so the caller can wait on it until compaction completes.
The returned value is true iff offstrategy compaction was needed.

The added keyspace_offstrategy_compaction calls perform_offstrategy_compaction on the specified keyspace and tables, and returns the number of tables that required offstrategy compaction.

A respective unit test was added to the rest_api pytest.

This PR replaces https://github.com/scylladb/scylla/pull/9095 that suggested adding an option to `keyspace_compaction`,
since the logic for triggering offstrategy compaction is different enough from major compaction to merit a new API.

Test: unit (dev)

Closes #9980

* github.com:scylladb/scylla:
  test: rest_api: add unit tests for keyspace_offstrategy_compaction api
  api: add keyspace_offstrategy_compaction
  compaction_manager: get rid of submit_offstrategy
  table: add perform_offstrategy_compaction
  compaction_manager: perform_offstrategy: print ks.cf in log messages
  compaction_manager: allow waiting on offstrategy compaction
2022-02-02 13:15:31 +02:00
Piotr Wojtczak
0dd7739716 snapshots: Fix snapshot-ctl to include snapshots of dropped tables
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on column family interface to scanning every datadir
that the database knows of in search for "snapshots" folders.

Fixes #3463
Closes #7122

Closes #9884

Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-01 22:31:43 +02:00
Michał Sala
140bab279c test: add parallelized select count test
Added test that checks if a SELECT COUNT(*) query was transformed and
processed in a parallel way. Checking is done by looking at the cql
statistics and comparing subsequent counts of parallelized aggregation
SELECT query executions.
2022-02-01 21:14:41 +01:00
Michał Sala
66a93d3000 cql3: query_processor: add forward_service reference to query_processor 2022-02-01 21:14:41 +01:00
Michał Sala
0fe59082ec storage_proxy: extract query_ranges_to_vnodes_generator to a separate file
Such separation allows using query_ranges_to_vnodes_generator by other
services without needing a storage_proxy dependency.
2022-02-01 21:14:41 +01:00
Tomasz Grabiec
00a9326ae7 Merge "raft: let modify_config finish on a follower that removes itself" from Kamil
When forwarding a reconfiguration request from follower to a leader in
`modify_config`, there is no reason to wait for the follower's commit
index to be updated. The only useful information is that the leader
committed the configuration change - so `modify_config` should return as
soon as we know that.

There is a reason *not* to wait for the follower's commit index to be
updated: if the configuration change removes the follower, the follower
will never learn about it, so a local waiter will never be resolved.

`execute_modify_config` - the part of `modify_config` executed on the
leader - is thus modified to finish when the configuration change is
fully complete (including the dummy entry appended at the end), and
`modify_config` - which does the forwarding - no longer creates a local
waiter, but returns as soon as the RPC call to the leader confirms that
the entry was committed on the leader.

We still return an `entry_id` from `execute_modify_config` but that's
just an artifact of the implementation.

Fixes #9981.

A regression test was also added in randomized_nemesis_test.

* kbr/modify-config-finishes-v1:
  test: raft: randomized_nemesis_test: regression test for #9981
  raft: server: don't create local waiter in `modify_config`
2022-01-31 20:14:50 +01:00
Nadav Har'El
8a745593a2 Merge 'alternator: fill UnprocessedKeys for failed batch reads' from Piotr Sarna
DynamoDB protocol specifies that when getting items in a batch
failed only partially, unprocessed keys can be returned so that
the user can perform a retry.
Alternator used to fail the whole request if any of the reads failed,
but right now it instead produces the list of unprocessed keys
and returns them to the user, as long as at least 1 read was
successful.
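
The behavior described above can be sketched roughly as follows. This is a simplified illustration, not Alternator's implementation: `batch_get` and `read_one` are hypothetical names, and the response is flattened compared to the real DynamoDB format, which groups items and keys per table.

```python
def batch_get(keys, read_one):
    """Read each key; collect failed keys instead of failing the whole batch."""
    responses, unprocessed = [], []
    for key in keys:
        try:
            responses.append(read_one(key))
        except Exception:
            # Remember the key so the caller can retry it later.
            unprocessed.append(key)
    # If nothing succeeded, report an error rather than returning
    # every key as unprocessed.
    if keys and not responses:
        raise RuntimeError("all reads in the batch failed")
    return {"Responses": responses, "UnprocessedKeys": unprocessed}
```

A caller would typically retry with the keys listed in `UnprocessedKeys` until the list is empty.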

This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes which come with error injection compiled in. In release mode, expect to see the following message:
SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode)

Fixes #9984

Closes #9986

* github.com:scylladb/scylla:
  test: add total failure case for GetBatchItem
  test: add error injection case for GetBatchItem
  test: add a context manager for error injection to alternator
  alternator: add error injection to BatchGetItem
  alternator: fill UnprocessedKeys for failed batch reads
2022-01-31 15:28:24 +02:00
Piotr Sarna
c87126198d test: add total failure case for GetBatchItem
The test verifies that if all reads from a batch operation
failed, the result is an error, and not a success response
with UnprocessedKeys parameter set to all keys.
2022-01-31 14:21:55 +01:00
Piotr Sarna
e79c2943fc test: add error injection case for GetBatchItem
The new test case is based on Scylla error injection mechanism
and forces a partial read by failing some requests from the batch.
2022-01-31 14:21:55 +01:00
Piotr Sarna
99c5bec0e2 test: add a context manager for error injection to alternator
With the new context manager it's now easier to request an error
to be injected via REST API. Note that error injection is only
enabled in certain build modes (dev, debug, sanitize)
and the test case will be skipped if it's not possible to use
this mechanism.
2022-01-31 14:21:55 +01:00
Tomasz Grabiec
8297ae531d Merge "Automatically retry CQL DDL statements in presence of concurrent changes" from Kamil
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets a
`group0_concurrent_modification` exception.

Catch the exception in CQL DDL statement execution function and retry.
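
The catch-and-retry idea can be sketched like this. It is only an illustration in Python, not the C++ code of the patch: the exception class is a stand-in for `group0_concurrent_modification`, and the `max_attempts` bound is an assumption that the commit message does not mention.

```python
class Group0ConcurrentModification(Exception):
    """Stand-in for the `group0_concurrent_modification` exception."""


def execute_with_retry(execute_ddl, max_attempts=10):
    """Run a schema change, retrying if a concurrent change got in first."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute_ddl()
        except Group0ConcurrentModification:
            # Another schema change won the race; try again unless
            # the (assumed) retry budget is exhausted.
            if attempt == max_attempts:
                raise
```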

In addition, improve the description of CQL DDL statements
in group 0 history table.

Add a test which checks that group 0 history grows iff a schema change does
not throw `group0_concurrent_modification`. Also check that the retry
mechanism works as expected.

* kbr/ddl-retry-v1:
  test: unit test for group 0 concurrent change protection and CQL DDL retries
  cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes
2022-01-31 14:12:35 +01:00
Tomasz Grabiec
b78bab7286 Merge "raft: fixes and improvements to the library and nemesis test" from Kamil
Raft randomized nemesis test was improved by adding some more
chaos: randomizing the network delay, server configuration,
ticking speed of servers.

This allowed us to catch a serious bug, which is fixed in the first patch.

The patchset also fixes bugs in the test itself and adds quality of life
improvements such as better diagnostics when inconsistency is detected.

* kbr/nemesis-random-v1:
  test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency
  test: raft: randomized_nemesis_test: print details when detecting inconsistency
  test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in `impure_state_machine`
  test: raft: randomized_nemesis_test: keep server id in impure_state_machine
  test: raft: randomized_nemesis_test: frequent snapshotting configuration
  test: raft: randomized_nemesis_test: tick servers at different speeds in generator test
  test: raft: randomized_nemesis_test: simplify ticker
  test: raft: randomized_nemesis_test: randomize network delay
  test: raft: randomized_nemesis_test: fix use-after-free in `environment::crash()`
  test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions
  test: raft: randomized_nemesis_test: rpc: don't propagate `gate_closed_exception` outside
  test: raft: randomized_nemesis_test: fix obsolete comment
  raft: fsm: print configuration entries appearing in the log
  raft: `operator<<(ostream&, ...)` implementation for `server_address` and `configuration`
  raft: server: abort snapshot applications before waiting for rpc abort
  raft: server: logging fix
  raft: fsm: don't advance commit index beyond matched entries
2022-01-31 13:25:27 +01:00
Mikołaj Sielużycki
93d6eb6d51 compacting_reader: Support fast_forward_to position range.
Fast forwarding is delegated to the underlying reader and assumes that
it's supported. The only corner case requiring special handling that has
shown up in the tests is producing partition start mutation in the
forwarding case if there are no other fragments.

compacting state keeps track of uncompacted partition start, but doesn't
emit it by default. If end of stream is reached without producing a
mutation fragment, partition start is not emitted. This is invalid
behaviour in the forwarding case, so I've added a public method to
compacting state to force marking partition as non-empty. I don't like
this solution, as it feels like breaking an abstraction, but I didn't
come across a better idea.

Tests: unit(dev, debug, release)

Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
2022-01-31 13:37:36 +02:00
Nadav Har'El
a25e265373 test/alternator: improve comment on why we need "global_random"
Improve the comment that explains why we needed to use an explicitly
shared random sequence instead of the usual "random". We now understand
that we need this workaround to undo what the pytest-randomly plugin does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>
2022-01-31 10:07:56 +01:00
Nadav Har'El
59fe6a402c test/cql-pytest: use unique keys instead of random keys
Some of the tests in test/cql-pytest share the same table but use
different keys to ensure they don't collide. Before this patch we used a
random key, which was usually fine, but we recently noticed that the
pytest-randomly plugin may cause different tests to run through the *same*
sequence of random numbers and ruin our intent that different tests use
different keys.

So instead of using a *random* key, let's use a *unique* key. We can
achieve this uniqueness trivially - using a counter variable - because
anyway the uniqueness is only needed inside a single temporary table -
which is different in every run.

Another benefit is that it will now be clearer that the tests are
deterministic and not random - the intent of a random_string() key
was never to randomly walk the entire key space (random_string()
anyway had a pretty narrow idea of what a random string looks like) -
it was just to get a unique key.
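
A counter-based unique key can be as small as the sketch below. The names `_key_counter` and `unique_key_string` are illustrative and not necessarily the ones used in test/cql-pytest.

```python
import itertools

# One shared counter per test run. Uniqueness is only needed within
# a single run, so there is no need to persist or seed anything.
_key_counter = itertools.count(1)


def unique_key_string():
    """Return a string key that differs on every call within this run."""
    return f"k{next(_key_counter)}"
```

Unlike a random key, this cannot collide even if a plugin resets the random seed before each test.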

Refs #9988 (fixes it for cql-pytest, but not for test/alternator)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-01-31 09:01:23 +02:00
Benny Halevy
1c25934399 test: rest_api: add unit tests for keyspace_offstrategy_compaction api
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-30 20:40:40 +02:00
Tomasz Grabiec
b734615f51 util: cached_file: Fix corruption after memory reclamation was triggered from population
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.

Fix by running emplace() under allocating section, which disables
memory reclamation.

The bug manifests with assert failures, e.g:

./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.

Fixes #9915

Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
2022-01-30 19:57:35 +02:00
Piotr Sarna
471205bdcf test/alternator: use a global random generator for all test cases
It was observed (perhaps it depends on the Python implementation)
that an identical seed was used for multiple test cases,
which violated the assumption that generated values are in fact
unique. Using a global generator instead makes sure that it is
only seeded once.
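
A minimal sketch of the idea, assuming a dedicated `random.Random` instance is used as the shared generator (the helper `random_string` here is illustrative and may differ from the one in test/alternator):

```python
import random
import string

# A private generator, seeded once when the module is imported.
# Calls to random.seed() on the module-level generator do not touch it.
global_random = random.Random()


def random_string(length=10):
    """Build a string from the shared generator rather than from `random`."""
    chars = string.ascii_letters + string.digits
    return "".join(global_random.choice(chars) for _ in range(length))
```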

Tests: unit(dev) # alternator tests used to fail for me locally
  before this patch was applied
Message-Id: <315d372b4363f449d04b57f7a7d701dcb9a6160a.1643365856.git.sarna@scylladb.com>
2022-01-30 16:40:20 +02:00
Kamil Braun
d10b508380 test: raft: randomized_nemesis_test: regression test for #9981 2022-01-27 17:50:40 +01:00
Kamil Braun
4a52b802ac test: unit test for group 0 concurrent change protection and CQL DDL retries
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
2022-01-27 11:26:15 +01:00
Tomasz Grabiec
ba6c02b38a Merge "Clear old entries from group 0 history when performing schema changes" from Kamil
When performing a change through group 0 (which right now means schema
changes), clear entries from group 0 history table which are older
than one week.

This is done by including an appropriate range tombstone in the group 0
history table mutation.

* kbr/g0-history-gc-v2:
  idl: group0_state_machine: fix license blurb
  test: unit test for clearing old entries in group0 history
  service: migration_manager: clear old entries from group 0 history when announcing
2022-01-26 16:12:40 +01:00
Kamil Braun
95ac8ead4f test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency 2022-01-26 16:09:41 +01:00
Kamil Braun
e249ea5aef test: raft: randomized_nemesis_test: print details when detecting inconsistency
If the returned result is inconsistent with the constructed model, print
the differences in detail instead of just failing an assertion.
2022-01-26 16:09:41 +01:00
Kamil Braun
1170e47af4 test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in impure_state_machine
Useful for debugging.
2022-01-26 16:09:41 +01:00
Kamil Braun
b8158e0b43 test: raft: randomized_nemesis_test: keep server id in impure_state_machine
Will be used for logging.
2022-01-26 16:09:41 +01:00
Kamil Braun
3c01449472 test: raft: randomized_nemesis_test: frequent snapshotting configuration
With probability 1/2, run the test with a configuration that causes
servers to take snapshots frequently.
2022-01-26 16:09:41 +01:00
Kamil Braun
7546a9ebb5 test: raft: randomized_nemesis_test: tick servers at different speeds in generator test
Previously all servers were ticked at the same moment, every 10
network/timer ticks.

Now we tick each server with probability 1/10 on each network/timer
tick. Thus, on average, every server is ticked once per 10 ticks.
But now we're able to obtain more interesting behaviors.
E.g. we can now observe servers which are stalling for as long as 10 ticks
and servers which temporarily speed up to tick once per network tick.
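
The new ticking scheme can be simulated with the short sketch below. It is a Python illustration of the probability argument only, not the C++ test code, and `simulate_ticks` is a hypothetical name.

```python
import random


def simulate_ticks(num_servers, num_network_ticks, rng, p=0.1):
    """Count how many times each server is ticked.

    On every network/timer tick, each server is ticked independently
    with probability p, so on average once per 1/p network ticks.
    """
    ticks = [0] * num_servers
    for _ in range(num_network_ticks):
        for server in range(num_servers):
            if rng.random() < p:
                ticks[server] += 1
    return ticks
```

Over many network ticks each server's count approaches one tenth of the total, while over short windows individual servers can stall or run ahead of the others.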
2022-01-26 16:09:41 +01:00
Kamil Braun
5d986b2682 test: raft: randomized_nemesis_test: simplify ticker
Instead of taking a set of functions with different periods, take a
single function that is called on every tick. The periodicity can be
implemented easily on the user side.
2022-01-26 16:09:41 +01:00
Kamil Braun
173fb2bf36 test: raft: randomized_nemesis_test: randomize network delay
As a side effect, this causes messages to be delivered in a different
order than they were sent, adding even more chaos.
2022-01-26 16:09:41 +01:00
Kamil Braun
00c18adbb0 test: raft: randomized_nemesis_test: fix use-after-free in environment::crash()
The lambda attached to `_crash_fiber` was a coroutine. The coroutine
would use `this` captured by the lambda after the `co_await`, where the
lambda object (hence its captures) was already destroyed.

No idea why it worked before and sanitizers did not complain in debug
mode.
2022-01-26 16:09:41 +01:00
Kamil Braun
4c68e6a04c test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions
Two-way RPC functions such as `send_snapshot` had a guard object which
was captured in a lambda passed to `with_gate`. The guard object, on
destruction, accessed the `rpc` object. Unfortunately, the guard object
could outlive the `rpc` object. That's because the lambda, and hence the
guard object, was destroyed after `with_gate` finished (it lived in the
frame of the caller of `with_gate`, i.e. `send_snapshot` and others),
so it could be destroyed after `rpc` (the gate prevents `rpc` from being
destroyed).

Make sure that the guard object is destroyed before `with_gate` finishes
by creating it inside the lambda body instead of capturing it in the lambda object.
2022-01-26 16:09:41 +01:00
Kamil Braun
871f0d00ce test: raft: randomized_nemesis_test: rpc: don't propagate gate_closed_exception outside
The `raft::rpc` interface functions are called by `raft::server_impl`
and the exceptions may be propagated outside the server, e.g. through
the `add_entry` API.

Translate the internal `gate_closed_exception` to an external
`raft::stopped_error`.
2022-01-26 16:09:41 +01:00
Kamil Braun
9da4ffc1c7 test: raft: randomized_nemesis_test: fix obsolete comment 2022-01-26 16:09:41 +01:00