Commit Graph

30104 Commits

Author SHA1 Message Date
Benny Halevy
19ea228cf8 replica: table: coroutinize move_sstables_from_staging
Test: unit(dev)
DTest: test_drop_mv_during_base_table_writes

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214102911.1314022-1-bhalevy@scylladb.com>
2022-02-14 17:52:27 +02:00
Benny Halevy
8f417b8021 sstable: coroutinize seal_sstable
Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214105214.1337361-1-bhalevy@scylladb.com>
2022-02-14 17:49:52 +02:00
Benny Halevy
c75e63e480 sstable: coroutinize move_to_new_dir
Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214154403.1590022-1-bhalevy@scylladb.com>
2022-02-14 17:47:09 +02:00
Benny Halevy
b131f94fc3 large_data_handler: maybe_delete_large_data_entries: data_size is unused
Since 64a4ffc579

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214115258.1354372-1-bhalevy@scylladb.com>
2022-02-14 13:58:44 +02:00
Benny Halevy
795d4a0bad batchlog_manager: batchlog_replay_loop: ignore broken_semaphore if abort_requested
drain() breaks _sem, causing do_batch_log_replay to throw broken_semaphore.
Ignore this error in batchlog_replay_loop as it's expected on shutdown.

https://jenkins.scylladb.com/job/scylla-master/job/dtest-debug/1073/testReport/junit/thrift_tests/TestCompactStorageThriftAccesses/test_get/
```
E           AssertionError: Unexpected errors found: [('node1', ['ERROR 2022-02-14 06:55:44,263 [shard 0] batchlog_manager - Exception in batch replay: seastar::broken_semaphore (Semaphore broken)'])]
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220214090607.1213740-1-bhalevy@scylladb.com>
2022-02-14 11:34:16 +02:00
Raphael S. Carvalho
a9427f150a Revert "sstables/compaction_manager: rewrite_sstables(): resolve maintenance group FIXME"
This reverts commit 4c05e5f966.

Moving cleanup to maintenance group made its operation time up to
10x slower than previous release. It's a blocker to 4.6 release,
so let's revert it until we figure this all out.

Probably this happens because maintenance group is fixed at a
relatively small constant, and cleanup may be incrementally
generating backlog for regular compaction, where the former is
fighting for resources against the latter.

Fixes #10060.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220213184306.91585-1-raphaelsc@scylladb.com>
2022-02-13 21:48:20 +02:00
Avi Kivity
13cf66d3ef Revert "schema_registry: Increase grace period for schema version cache"
This reverts commit 23da2b5879. It causes
the node to quickly run out of memory when many schema changes are made
within a small time window.

Fixes #10071.
2022-02-13 19:38:24 +02:00
Avi Kivity
7cc43f8aa8 Merge 'utils: add result_try and result_futurize_try' from Piotr Dulikowski
Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic.

For example, you can convert the following try..catch block:

    try {
        return a_function_that_may_throw();
    } catch (const my_exception& ex) {
        return 123;
    } catch (...) {
        throw;
    }

...to this:

    return utils::result_try([&] {
        return a_function_that_may_throw_or_return_a_failed_result();
    },  utils::result_catch<my_exception>([&] (const Ex&) {
        return 123;
    }), utils::result_catch_dots([&] (auto&& handle) {
        return handle.into_result();
    });

Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs.

As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`:

```
127041.61 tps ( 67.2 allocs/op,  14.2 tasks/op,   52422 insns/op)
126958.60 tps ( 67.2 allocs/op,  14.2 tasks/op,   52409 insns/op)
127088.37 tps ( 67.2 allocs/op,  14.2 tasks/op,   52411 insns/op)
127560.84 tps ( 67.2 allocs/op,  14.2 tasks/op,   52424 insns/op)
127826.61 tps ( 67.2 allocs/op,  14.2 tasks/op,   52406 insns/op)

126801.02 tps ( 67.2 allocs/op,  14.2 tasks/op,   52420 insns/op)
125371.51 tps ( 67.2 allocs/op,  14.2 tasks/op,   52425 insns/op)
126498.51 tps ( 67.2 allocs/op,  14.2 tasks/op,   52427 insns/op)
126359.41 tps ( 67.2 allocs/op,  14.2 tasks/op,   52423 insns/op)
126298.27 tps ( 67.2 allocs/op,  14.2 tasks/op,   52410 insns/op)
```

The number of tasks and allocations is unchanged. The number of instructions per operations seems similar, it may have increased slightly (by 10-20) but it's hard to tell for sure because of the noisiness of the results.

Tests: unit(dev)

Closes #10045

* github.com:scylladb/scylla:
  transport: use result_try in process_request_one
  storage_proxy: use result_futurize_try in mutate_end
  storage_proxy: temporarily throw exception from result in mutate_end
  utils: add result_try and result_futurize_try
2022-02-13 19:38:13 +02:00
Avi Kivity
45bdb57b05 Update seastar submodule
* seastar 299c9474d...c18cc5dc6 (5):
  > log: Fix silencer to be shard-local and logger-global
Fixes #9784.
  > Fix alloc-dealloc-mismatch error in DPDK mode
  > Fix stack buffer overflow when using native network inteface
  > test: coroutines: test_scheduling_group: fixup indentation
  > test: coroutines: test_scheduling_group: destroy temporary scheduling_group when done
2022-02-13 16:47:25 +02:00
Avi Kivity
6572b297a2 treewide: clean up stray license blurbs
After the mechanical change in fcb8d040e8
("treewide: use Software Package Data Exchange (SPDX) license identifiers"),
a few stray license blurbs or fragments thereof remain. In two cases
these were extra blurbs in code generators intended for the generated code,
in others they were just missed by the script.

Clean them up, adding an SPDX license identifier where needed.

Closes #10072
2022-02-13 14:16:16 +02:00
Avi Kivity
6b380121e0 Merge 'utils/result: optimize result_parallel_for_each' from Piotr Dulikowski
This PR rewrites the `utils::result_parallel_for_each`'s implementation to resemble the original `seastar::parallel_for_each` more closely instead of using the less efficient `seastar::map_reduce`. It uses less tasks and allocations now, as demonstrated in the results from the `perf_result_query` benchmark, attached at the end of the cover letter.

The main drawback of the new implementation is that it needs to rethrow exceptions propagated as exceptional futures from the parallel sub-invocations. Contrary to the original `seastar::parallel_for_each` which uses a custom task to collect results, the new `utils::result_parallel_for_each` uses a coroutine and there doesn't currently seem to be a way to co_await for a future and inspect its state without either rethrowing or handling it in then_wrapped (which allocates a continuation). Fortunately, rethrowing is not needed for exceptions returned in failed result<>, which are already intended to be a more performant alternative to regular exceptions.

As a bonus, definitions from `utils/result.hh` are now split across three different headers in order to improve (re)compilation times.

Results from `perf_simple_query --smp 1 --operations-per-shard 1000000 --write` (before vs. after):

```
126872.54 tps ( 67.2 allocs/op,  14.2 tasks/op,   52404 insns/op)
126532.13 tps ( 67.2 allocs/op,  14.2 tasks/op,   52408 insns/op)
126864.99 tps ( 67.2 allocs/op,  14.2 tasks/op,   52428 insns/op)
127073.10 tps ( 67.2 allocs/op,  14.2 tasks/op,   52404 insns/op)
126895.85 tps ( 67.2 allocs/op,  14.2 tasks/op,   52411 insns/op)

127894.02 tps ( 66.2 allocs/op,  13.2 tasks/op,   52036 insns/op)
127671.51 tps ( 66.2 allocs/op,  13.2 tasks/op,   52042 insns/op)
127541.42 tps ( 66.2 allocs/op,  13.2 tasks/op,   52044 insns/op)
127409.10 tps ( 66.2 allocs/op,  13.2 tasks/op,   52052 insns/op)
127831.30 tps ( 66.2 allocs/op,  13.2 tasks/op,   52043 insns/op)
```

Test: unit(dev, debug)

Closes #10053

* github.com:scylladb/scylla:
  utils/result: optimize result_parallel_for_each
  utils/result: split into `combinators` and `loop` file
2022-02-13 12:04:40 +02:00
Avi Kivity
52e707f978 Merge 'gms: gossiper: coroutinize code (continued)' from Pavel Solodovnikov
This series continues the effort of https://github.com/scylladb/scylla/pull/9844 to reduce `seastar::async` usage and coroutinize in the gossiper code.

There are mostly trivial conversions from using `.get()` to `co_await`, where appropriate, as well, as elimination of `seastar::async()` wrappers.

A few more functions are not yet converted, though (e.g. `apply_new_states`, `do_apply_state_locally`, `apply_state_locally`, `apply_state_locally_without_listener_notification`, maybe a few others, as well).

The motivation is to be able to call every public API function of `gossiper` class without requiring `seastar::async` context.

Tests: unit(debug, dev), dtest (topology-related tests)

Closes #10032

* github.com:scylladb/scylla:
  gms: gossiper: coroutinize `wait_for_gossip`
  gms: gossiper: coroutinize `advertise_token_removed`
  gms: gossiper: coroutinize `advertise_removing`
  gms: gossiper: don't wrap `convict` calls into `seastar::async`
  gms: gossiper: coroutinize `handle_major_state_change`
  gms: gossiper: coroutinize `handle_shutdown_msg`
  gms: gossiper: coroutinize `mark_as_shutdown` and `convict`
  gms: gossiper: remove comment about requiring thread context in `mark_alive`
  gms: gossiper: don't use `seastar::async` in `mark_alive`
  gms: gossiper: coroutinize `do_on_change_notifications`
  gms: gossiper: coroutinize `do_before_change_notifications`
  gms: gossiper: coroutinize `real_mark_alive`
  gms: gossiper: coroutinize `mark_dead`
2022-02-13 11:51:44 +02:00
Piotr Dulikowski
dd3284ec38 utils/result: optimize result_parallel_for_each
It now resembles the original parallel_for_each more, but uses a
coroutine instead of a custom `task` to collect not-ready futures.
Although the usage of a coroutine saves on allocations, the drawback is
that there is currently no way to co_await on a future and handle its
exception without throwing or without unconditionally allocating a
then_wrapped or handle_exception continuation - so it introduces a
rethrow.

Furthermore, now failed results and exceptions are treated as equals.
Previously, in case one parallel invocation returned failed result and
another returned an exception, the exception would always be returned.
Now, the failed result/exception of the invocation with the lowest index
is always preferred, regardless of the failure type.

The reimplementation manages to save about 350-400 instructions, one
task and one allocation in the perf_simple_query benchmark in write
mode.

Results from `perf_simple_query --smp 1 --operations-per-shard 1000000
--write` (before vs. after):

```
126872.54 tps ( 67.2 allocs/op,  14.2 tasks/op,   52404 insns/op)
126532.13 tps ( 67.2 allocs/op,  14.2 tasks/op,   52408 insns/op)
126864.99 tps ( 67.2 allocs/op,  14.2 tasks/op,   52428 insns/op)
127073.10 tps ( 67.2 allocs/op,  14.2 tasks/op,   52404 insns/op)
126895.85 tps ( 67.2 allocs/op,  14.2 tasks/op,   52411 insns/op)

127894.02 tps ( 66.2 allocs/op,  13.2 tasks/op,   52036 insns/op)
127671.51 tps ( 66.2 allocs/op,  13.2 tasks/op,   52042 insns/op)
127541.42 tps ( 66.2 allocs/op,  13.2 tasks/op,   52044 insns/op)
127409.10 tps ( 66.2 allocs/op,  13.2 tasks/op,   52052 insns/op)
127831.30 tps ( 66.2 allocs/op,  13.2 tasks/op,   52043 insns/op)
```

Test: unit(dev), unit(result_utils_test, debug)
2022-02-10 18:19:08 +01:00
Piotr Dulikowski
6abeec6299 utils/result: split into combinators and loop file
Segregates result utilities into:

- result.hh - basic definitions related to results with exception
  containers,
- result_combinators.hh - combinators for working with results in
  conjunction with futures,
- result_loop.hh - loop-like combinators, currently has only
  result_parallel_for_each.

The motivation for the split is:

1. In headers, usually only result.hh will be needed, so no need to
   force most .cc files to compile definitions from other files,
2. Less files need to be recompiled when a combinator is added to
   result_combinators or result_loop.

As a bonus, `result_with_exception` was moved from `utils::internal` to
just `utils`.
2022-02-10 18:19:05 +01:00
Piotr Dulikowski
049564bd2d transport: use result_try in process_request_one
Adapts the exception handling logic in process_request_one so that it
uses utils::result_try to handle both C++ exceptions and failed results
in a unified way.
2022-02-10 17:35:32 +01:00
Piotr Dulikowski
98bde8d6d2 storage_proxy: use result_futurize_try in mutate_end
Adapts the mutate_end exception handling logic so that it uses the new
utils::result_futurize_try function to handle both exceptional futures
and failed results in an unified way.
2022-02-10 17:35:32 +01:00
Piotr Dulikowski
d5d24a5140 storage_proxy: temporarily throw exception from result in mutate_end
Temporarily removes the logic which handles failed results in a
non-throwing way. Exceptions from failed results are thrown and handled
in try..catch.

The reason for this change is that it makes the following commit, which
migrates the whole try..catch block to utils::result_futurize_try much
nicer. The next commit will also bring back the non-throwing handling of
the failed result.
2022-02-10 17:35:32 +01:00
Piotr Dulikowski
8d52ceca50 utils: add result_try and result_futurize_try
Adds result_try and result_futurize_try - functions which allow to
convert existing try..catch blocks into a version which handles C++
exceptions, failed results with exception containers and, depending on
the function variant, exceptional futures.
2022-02-10 17:35:32 +01:00
Juliusz Stasiewicz
00a6fda7b9 tracing: Trace slow queries on replicas wrt. parent's clock
Secondary tracing sessions used to compute the execution time
from the point of their `begin()`-ning, not the parent session's
`begin()`. As a result, replica reported a slow query if it
exceeded the entire threshold *on that replica* too.

This change augments `trace_info` with the TS of parent's session
starting point, to be used as a reference on replicas.

Fixes #9403

Closes #10005
2022-02-10 12:03:53 +01:00
Pavel Solodovnikov
e892170c86 raft: add raft tables to extra_durable_tables list
`system.raft`, `system.raft_snapshots` and `system.raft_config`
were missing from the `extra_durable_tables` list, so that
`set_wait_for_sync_to_commitlog(true)` was not enabled when
the tables were re-created via `create_table_from_mutations`.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20220210073418.484843-1-pa.solodovnikov@scylladb.com>
2022-02-10 11:47:41 +02:00
Botond Dénes
ef34c10a94 main: run scylla main to when there are no arguments
main() has some logic to select the main function it will delegate to
based on argv[1]. The intent is that when the value of argv[1] suggest
that the user did not specify a specific app to run, we default to
"server" (scylla proper).
This logic currently breaks down when there are no arguments at all: in
this case the following error is printed and scylla refuses to start:

    error: unrecognized first argument: expected it to be "server", a regular command-line argument or a valid tool name (see `scylla --list-tools`), but got

Fix this by checking for empty argv[1] and defaulting to "server" in
that case.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220210092125.293682-1-bdenes@scylladb.com>
2022-02-10 11:47:20 +02:00
Avi Kivity
2e2b54254c Merge 'docs: update theme 1.1' from David Garcia
Related issue https://github.com/scylladb/sphinx-scylladb-theme/issues/310

ScyllaDB Sphinx Theme 1.1 is now released 🥳

We’ve made a number of updates to update all our dependencies to the latest version and introduced new directives you can use to write great docs.

You can read more about all notable changes [here](https://sphinx-theme.scylladb.com/master/upgrade/CHANGELOG.html#february-2022).

Before,  the theme installed [poetry 1.1.x](https://python-poetry.org/) as a dependency to manage Python dependencies. However, ``poetry 1.2.x`` changed the installation method. Therefore, we've decided to [#307 Make poetry a prerequisite](https://github.com/scylladb/sphinx-scylladb-theme/issues/307) so that you can decide to install the poetry version you prefer.

To preview the docs locally, you should uninstall the previous version of poetry. Then, install the latest version:

1. Uninstall Poetry 1.1.x.

```
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | POETRY_UNINSTALL=1 python -
```

2. Install Poetry 1.2.x. For detailed instructions, see [Poetry installation](https://python-poetry.org/docs/master/#installation).

1. Clone this PR. For more information, see [Cloning pull requests locally](https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally).

2. Uninstall poetry 1.1 and install poetry 1.2. For more information, see **Breaking changes** notice above.

3. Enter the docs folder, and run:

```
make preview
````

4. Open http://127.0.0.1:5500/ with your favorite browser. The doc should render without errors, and the version should be Sphinx Theme version (see the footer) must be ``1.1.x``:

![image](https://user-images.githubusercontent.com/9107969/152107446-52b167d8-c607-4431-a7a4-92579153d024.png)

Closes #10054

* github.com:scylladb/scylla:
  Add missing lexer
  docs: update theme 1.1
2022-02-10 11:14:02 +02:00
Botond Dénes
54b27a6dec Update seastar submodule
* seastar d27bf8b5...299c9474 (1):
  > core/app_template: print debug warning to std::cerr
2022-02-10 09:51:41 +02:00
Nadav Har'El
4937270803 test/alternator: add option to run with Raft-based schema changes
This patch adds a "--raft" option to test/alternator/run to enable the
experimental Raft-based schema changes ("--experimental-features=raft")
when running Scylla for the tests. This is the same option we added to
test/cql-pytest/run in a previous patch.

Note that we still don't have any Alternator tests that pass or fail
differently in these two modes - these will probably come later as we
fix issues #9868 and #6391. But in order to work on fixing those issues
we need to be able to run the tests in Raft mode.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209123144.321344-1-nyh@scylladb.com>
2022-02-10 09:43:10 +02:00
Nadav Har'El
8409a42baa merge: Convert table::compact_sstables to coroutines
Patch series by Mikołaj Sielużycki

  compaction: Fix indentation in table::compact_sstables.
  compaction: Convert table::compact_sstables to coroutines.
2022-02-10 09:10:24 +03:00
Nadav Har'El
a1635b553e cql-pytest: fix detection of "raft" experimental feature
In a previous patch we fixed the output of experimental features list
(issue #10047), so we also need to fix the test code which detects the
"raft" experimental feature - to use the string "raft" and not the
silly byte 4 we had there before.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209104331.312999-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
Nadav Har'El
de586ef856 test/cql-pytest: mechanism for tests requiring raft-based schema updates
Issue #8968 no longer exists when Raft-based schema updates are enabled
in Scylla (with --experimental-features=raft). Before we can close this
issue we need a way to re-run its test

        test_keyspace.py::test_concurrent_create_and_drop_keyspace

with Raft and see it pass. But we also want the tests to continue to run
by default the older raft-less schema updates - so that this mode doesn't
regress during the potentially-long duration that it's still the default!

The solution in this patch is:

1. Introduce a "--raft" option to test/cql-pytest/run, which runs the tests
   against a Scylla with the raft experimental feature, while the default is
   still to run without it.

2. Introduce a text fixture "fails_without_raft" which marks a test which
   is expected to fail with the old pre-raft code, but is expected to
   pass in the new code.

3. Mark the test test_concurrent_create_and_drop_keyspace with this new
   "fails_without_raft".

After this patch, running

        test/cql-pytest/run --raft
            test_keyspace.py::test_concurrent_create_and_drop_keyspace

Passes, which shows that issue 8968 was fixed (in Raft mode) - so we can say:
Fixes #8968

Running the same test without "--raft" still xfails (an expected failure).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220208162732.260888-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
Nadav Har'El
fef7934a2d config: fix some types in system.config virtual table
The system.config virtual tables prints each configuration variable of
type T based on the JSON printer specified in the config_type_for<T>
in db/config.cc.

For two variable types - experimental_features and tri_mode_restriction,
the specified converter was wrong: We used value_to_json<string> or
value_to_json<vector<string>> on something which was *not* a string.
Unfortunately, value_to_json silently casted the given objects into
strings, and the result was garbage: For example as noted in #10047,
for experimental_features instead of printing a list of features *names*,
e.g., "raft", we got a bizarre list of one-byte strings with each feature's
number (which isn't documented or even guaranteed to not change) as well
as carriage-return characters (!?).

So solution is a new printable_to_json<T> which works on a type T that
can be printed with operator<< - as in fact the above two types can -
and the type is converted into a string or vector of strings using this
operator<<, not a cast.

Also added a cql-pytest test for reading system.config and in particular
options of the above two types - checking that they contain sensible
strings and not "garbage" like before this patch.

Fixes #10047.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209090421.298849-1-nyh@scylladb.com>
2022-02-10 09:10:24 +03:00
David Garcia
e092bf3bad Add missing lexer 2022-02-09 11:25:10 +00:00
Mikołaj Sielużycki
ee386213c2 compaction: Fix indentation in table::compact_sstables. 2022-02-09 12:19:23 +01:00
Mikołaj Sielużycki
ec91192525 compaction: Convert table::compact_sstables to coroutines. 2022-02-09 12:19:23 +01:00
David Garcia
24b5584941 docs: update theme 1.1 2022-02-09 11:13:38 +00:00
Avi Kivity
7f0dec9227 Update seastar submodule
* seastar 0d250d15a...d27bf8b5a (5):
  > Merge "Clean internal namespace in io_queue.cc" from Pavel E
  > Making par.._for_each and max_conc.._for_each compatible with move-only views (like generators)
  > tests: Perf test for smp::submit_to efficiency
  > Merge "Auto-increase IO latency goal from reactor" from Pavel E
  > reactor: Fix default task-quota-ms to be 0.5ms
2022-02-09 10:17:26 +02:00
Tomasz Grabiec
23da2b5879 schema_registry: Increase grace period for schema version cache
If version is absent in cache, it will be fetched from the
coordinator. This is not expensive, but if the version is not known,
it must be also "synced". It means that the node will do a full schema
pull from the coordinator. This pull is expensive and can take seconds.

If the coordinator we pull from is at an old version, the pull will do
nothing and current node will soon forget the old version, initiating
another pull.

If some nodes stay at an old version for a long time for some reason,
this will make new coordinators initiate pulls frequently.

Increase the expiration period to 15 minutes to reduce the impact in
such scenarios.

Fixes #10042.

Message-Id: <20220207122317.674241-1-tgrabiec@scylladb.com>
2022-02-09 09:27:07 +02:00
Tomasz Grabiec
7ae947b7e1 Merge "raft: bootstrap nodes as non-voter" from Alejo
Make only the first node in group0 to start as voter. Subsequent nodes
start as non-voters and request change to voter once bootstrap is
successful.

Add support for this in raft and a couple of minor fixes.

* alejo/raft-join-non-voting-v6:
  raft: nodes joining as non-voters
  raft: group 0: use cfg.contains() for config check
  raft: modify_config: support voting state change
  raft: minor: fix log format string
2022-02-09 09:27:07 +02:00
Raphael S. Carvalho
d208d33636 Fix quadratic behavior and compaction inefficiency when adding new files
With trigger_compaction() being called after each new sstable is added
to the set, we'll get quadratic behavior because strategies like
tiered will sort all the candidates before iterating on them, so
complexity is ~ ((N - 1) * N * logN).
Additionally, compaction may be inefficient as we're not waiting for
the sstable set to settle, so table may end up missing files that
would allow for more efficient jobs.
The latter isn't a big problem because we have reshape running in an
earlier phase, so data layout should satisfy the strategy almost.
Boot is not affected by these problems because it temporarily
disables auto compaction, so trigger_compaction() is a no-op for it.
So refresh remains as the only one affected.

Fixes #10046.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220208151154.72606-1-raphaelsc@scylladb.com>
2022-02-09 09:27:07 +02:00
Alejo Sanchez
a0c2bc0df2 raft: nodes joining as non-voters
Except for the first node creating the group0, make other nodes join as
non-voters and make them voters after successful bootstrap.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-02-08 09:16:30 -04:00
Avi Kivity
5099b1e272 Merge 'Propagate coordinator timeouts for regular writes and batches without throwing' from Piotr Dulikowski
Currently, most of the failures that occur during CQL reads or writes are reported using C++ exceptions. Although the seastar framework avoids most of the cost of unwinding by keeping exceptions in futures as `std::exception_ptr`s, the exceptions need to be inspected at various points for the purposes of accounting metrics or converting them to a CQL error response. Analyzing the value and type of an exception held by `std::exception_ptr`'s cannot be done without rethrowing the exception, and that can be very costly even if the exception is immediately caught. Because of that, exceptions are not a good fit for reporting failures which happen frequently during overload, especially if the CPU is the bottleneck.

This PR introduces facilities for reporting exceptions as values using the boost::outcome library. As a first step, the need to use exceptions for reporting timeouts was eliminated for regular and batch writes, and no exceptions are thrown between creation of a `mutation_write_timeout_exception` and its serialization as a CQL response in the `cql_server`.

The types and helpers introduced here can be reused in order to migrate more exceptions and exception paths in a similar fashion.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000`:

    Master (00a9326ae7)
    128789.53 tps ( 82.2 allocs/op,  12.2 tasks/op,   49245 insns/op)

    This PR
    127072.93 tps ( 82.2 allocs/op,  12.2 tasks/op,   49356 insns/op)

The new version seems to be slower by about 100 insns/op, fortunately not by much (about 0.2%).

Tests: unit(dev), unit(result_utils_test, debug)

Closes #10014

* github.com:scylladb/scylla:
  cql_test_env: optimize handling result_message::exception
  transport/server: handle exceptions from coordinator_result without throwing
  transport/server: propagate coordinator_result to the error handling code
  transport/server: unwrap the exception result_message in process_xyz_internal
  query_processor: add exception-returning variants of execute_ methods
  modification_statement: propagate failed result through result_message::exception
  batch_statement: propagate failed result through result_message::exception
  cql_statement: add `execute_without_checking_exception_message`
  result_message: add result_message::exception
  storage_proxy: change mutate_with_triggers to return future<result<>>
  storage_proxy: add mutate_atomically_result
  storage_proxy: return result<> from mutate_result
  storage_proxy: return result<> from mutate_internal
  storage_proxy: properly propagate future from mutate_begin to mutate_end
  storage_proxy: handle exceptions as values in mutate_end
  storage_proxy: let mutate_end take a future<result<>>
  storage_proxy: resultify mutate_begin
  storage_proxy: use result in the _ready future of write handlers
  storage_proxy: introduce helpers for dealing with results
  exceptions: add coordinator_exception_container and coordinator_result
  utils: add result utils
  utils: add exception_container
2022-02-08 14:27:09 +02:00
Alejo Sanchez
2d9f40f716 raft: group 0: use cfg.contains() for config check
There will be nodes in non-voting state in configuration, so can_vote()
is not a good check. Use newer cfg.contains().

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-02-08 08:00:07 -04:00
Alejo Sanchez
627275945f raft: modify_config: support voting state change
Handle requests to change voting for servers already present in the
current configuration.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-02-08 08:00:07 -04:00
Alejo Sanchez
a40417df08 raft: minor: fix log format string
Fix format string for log line.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-02-08 08:00:07 -04:00
Piotr Dulikowski
ffd439d908 cql_test_env: optimize handling result_message::exception
The single_node_cql_env uses query_processor::execute_xyz family of
methods to perform operations. Due to previous commits in this series,
they allocate one more task than before - a continuation that converts
result_message::exception into an exceptional future. We can recover
that one task by using variants of those methods which do not perform a
conversion, and turn .finally() invocations into .then()s which perform
conversion manually.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
81968f2c3a transport/server: handle exceptions from coordinator_result without throwing
Instead of throwing the exception contained in failed `result<>`, it is
now inspected with a visitor which avoids the need for throwing.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
4cc5d582e3 transport/server: propagate coordinator_result to the error handling code
Now, the failed `result<>` is throwlessly propagated to the continuation
which converts exceptions to CQL response messages, and is thrown there.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
c750f7895f transport/server: unwrap the exception result_message in process_xyz_internal
At the point where `result_message` is converted to a
`cql_server::response`, now the result message is inspected and returned
as failed `result<>` if it contained an error.

For now, the failed `result<>` is thrown as exception in `process` and
`process_on_shard`, but that will change in the next commit.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
53f3feb103 query_processor: add exception-returning variants of execute_ methods
Adds variants of the execute_prepared, execute_direct and execute_batch
which are allowed to return exceptions as `result_message::exception`.

Because the `result_message::exception` must be explicitly handled by
the receiver, new variants are introduced in order not to accidentally
ignore the exception, which would be very bad.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
2572104dfe modification_statement: propagate failed result through result_message::exception
Modifies the modification_statement code so that is converts failed
`result<>` into a `result_message::exception` without involving the C++
exception runtime.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
f9d1914e1c batch_statement: propagate failed result through result_message::exception
Modifies the batch_statement code so that is converts failed `result<>`
into a `result_message::exception` without involving the C++ exception
runtime.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
e1d762b110 cql_statement: add execute_without_checking_exception_message
Adds a new virtual method to the cql_statement with a wordy name. The
new method is a variant of `execute`, but it is allowed to return errors
via the `result_message::exception` object.

The reason for an additional method is that there are many places in the
code which call `execute` but do not check the result in any way.
Because ignoring an exception unintentionally is a very bad thing, the
new method needs to be explicitly implemented by statements which can
return a `result_message::exception`, and explicitly called in the code
which is prepared to handle a `result_message::exception`.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
e4ff22b4ca result_message: add result_message::exception
In order to propagate exceptions as values through the CQL layer with
minimal modifications to the interfaces, a new result_message type is
introduced: result_message::exception. Similarly to
result_message::bounce_to_shard, this is an internal type which is
supposed to be handled before being returned to the client.
2022-02-08 11:08:42 +01:00