Commit Graph

37954 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
7a7e287d8c compaction: add reshard_sstables_compaction_task_impl
Add task manager's task covering resharding compaction.

A struct and some functions are moved from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
2023-07-19 17:15:40 +02:00
Aleksandra Martyniuk
e486f4eba6 compaction: create resharding_compaction_task_impl
resharding_compaction_task_impl serves as a base class of all
concrete resharding compaction task classes.
2023-07-19 10:41:35 +02:00
Kamil Braun
bfaac5192a gossiper: call on_remove subscriptions in the foreground in remove_endpoint
`gossiper::remove_endpoint` performs `on_remove` callbacks for all
endpoint change subscribers. This was done in the background (with a
discarded future) due the following reason:
```
    // We can not run on_remove callbacks here because on_remove in
    // storage_service might take the gossiper::timer_callback_lock
```
however, `gossiper::timer_callback_lock` no longer exists, it was
removed in 19e8c14.

Today it is safe to perform the `storage_service::on_remove` callback in
the foreground -- it's only taking the token metadata lock, which is
also taken and then released earlier by the same fiber that calls
`remove_endpoint` (i.e.  `storage_service::handle_state_normal`).

Furthermore, we want to perform it in the foreground. First, there
already was a comment saying:
```
     // do subscribers first so anything in the subscriber that depends on gossiper state won't get confused
```
it's not too precise, but it argues that subscriber callbacks should be
serialized with the rest of `remove_endpoint`, not done concurrently
with it.

Second, we now have a concrete reason to do them in the foreground. In
issue #14646 we observed that the subcriber callbacks are racing with
the bootstrap procedure. Depending on scheduling order, if
`storage_service::on_remove` is called too late, a bootstrapping node
may try to wait for a node that was earlier replaced to become UP which
is incorrect. By putting the `on_remove` call into foreground of
`remove_endpoint`, we ensure that a node that was replaced earlier will
not be included in the set of nodes that the bootstrapping node waits
for (because `storage_service::on_remove` will clear it from
`token_metadata` which we use to calculate this set of nodes).

We also get rid of an unnecessary `seastar::async` call.

Fixes #14646

Closes #14741
2023-07-18 21:29:29 +02:00
Pavel Emelyanov
8bc42f54d4 Merge 'feature_service: handle deprecated features correctly in feature check' from Piotr Dulikowski
The feature check in `enable_features_on_startup` loads the list
of features that were enabled previously, goes over every one of them
and checks whether each feature is considered supported and whether
there is a corresponding `gms::feature` object for it (i.e. the feature
is "registered"). The second part of the check is unnecessary
and wrong. A feature can be marked as supported but its `gms::feature`
object not be present anymore: after a feature is supported for long
enough (i.e. we only support upgrades from versions that support the
feature), we can consider such a feature to be deprecated.

When a feature is deprecated, its `gms::feature` object is removed and
the feature is always considered enabled which allows to remove some
legacy code. We still consider this feature to be supported and
advertise it in gossip, for the sake of the old nodes which, even
though they always support the feature, they still check whether other
nodes support it.

The problem with the check as it is now is that it disallows moving
features to the disabled list. If one tries to do it, they will find
out that upgrading the node to the new version does not work:
`enable_features_on_startup` will load the feature, notice that it is
not "registered" (there is no `gms::feature` object for it) and fail
to boot.

This commit fixes the problem by modifying `enable_features_on_startup`
not to look at the registered features list at all. In addition to
this, some other small cleanups are performed:

- "LARGE_COLLECTION_DETECTION" is removed from the deprecated features
  list. For some reason, it was put there when the feature was being
  introduced. It does not break anything because there is
  a `gms::feature` object for it, but it's slightly confusing
  and therefore is removed.
- The comment in `supported_feature_set` that invites developers to add
  features there as they are introduced is removed. It is no longer
  necessary to do so because registered features are put there
  automatically. Deprecated features should still be put there,
  as indicated as another comment.

Fortunately, this issue does not break any upgrades as of now - since
we added enabled cluster feature persisting, no features were
deprecated, and we only add registered features to the persisted feature
list.

An error injection and a regression test is added.

Closes #14701

* github.com:scylladb/scylladb:
  topology_custom: add deprecated features test
  feature_service: add error injection for deprecated cluster feature
  feature_service: move error injection check to helper function
  feature_service: handle deprecated features correctly in feature check
2023-07-18 21:01:48 +03:00
Kamil Braun
6f22ed9145 Merge 'raft: move group0_state_machine::merger to its own header and add unit test for it' from Mikołaj Grzebieluch
Move `merger` to its own header file. Leave the logic of applying
commands to `group0_state_machine`. Remove `group0_state_machine`
dependencies from `merger` to make it an independent module.

Add a test that checks if `group0_state_machine_merger` preserves
timeuuid monotonicity. `last_id()` should be equal to the largest
timeuuid, based on its timestamps.

This test combines two commands in the reverse order of their timeuuids.
The timeuuids yield different results when compared in both timeuuid
order and uuid order. Consequently, the resulting command should have a
more recent timeuuid.

Fixes #14568

Closes #14682

* github.com:scylladb/scylladb:
  raft: group0_state_machine_merger: add test for timeuuid ordering
  raft: group0_state_machine: extract merger to its own header
2023-07-18 17:43:50 +02:00
Kamil Braun
56c91473f2 Merge 'storage_proxy: silence abort_requested_exception on reads and writes' from Patryk Jędrzejczak
Fixes #10447

This issue is an expected behavior. However, `abort_requested_exception` is not handled properly.

-- Why this issue appeared

1. The node is drained.
2. `migration_manager::drain` is called and executes `_as.request_abort();`.
3. The coordinator sends read RPCs to the drained replica. On the replica side, `storage_proxy::handle_read` calls `migration_manager::get_schema_for_read`, which is defined like this:
```cpp
future<schema_ptr> migration_manager::get_schema_for_write(/* ... */) {
    if (_as.abort_requested()) {
        co_return coroutine::exception(std::make_exception_ptr(abort_requested_exception()));
    }
    /* ... */
```
So, `abort_requested_exception` is thrown.
4. RPC doesn't preserve information about its type, and it is converted to a string containing its error message.
5. It is rethrown as `std::runtime_error` on the coordinator side, and `abstract_resolve_reader::error()` logs information about it. However, we don't want to report `abort_requested_exception` there. This exception should be catched and ignored:
```cpp
void error(/* ... */) {
    /* ... */
   else if (try_catch<abort_requested_exception>(eptr)) {
        // do not report aborts, they are trigerred by shutdown or timeouts
    }
    /* ... */
```

-- Proposed solution

To fix this issue, we can add `abort_requested_exception` to `replica::exception_variant` and make sure that if it is thrown by `migration_manager::get_schema_for_write`, `storage_proxy::handle_read` correctly  encodes it. Thanks to this change, `abstract_read_resolver::error` can correctly handle `abort_requested_exception` thrown on the replica side by not reporting it.

-- Side effect of the proposed solution

If the replica supports it, the coordinator doesn't, and all nodes support `feature_service::typed_errors_in_read_rpc`, the coordinator will fail to decode `abort_requested_exception` and it will be decoded to `unknown_exception`. It will still be rethrown as `std::runtime_error`, however the message will change from *abort requested* to *unknown exception*.

-- Another issue

Moreover, `handle_write` reports abort requests for the same reason. This also floods the logs (this time on the replica side) for the same reason. I don't think it is intended, so I've changed it too. This change is in the last commit.

Closes #14681

* github.com:scylladb/scylladb:
  service: storage_proxy: do not report abort requests in handle_write
  service: storage_proxy: encode abort_requested_exception in handle_read
  service: storage_proxy: refactor encode_replica_exception_for_rpc
  replica: add abort_requested_exception to exception_variant
2023-07-18 17:04:05 +02:00
Nadav Har'El
4ce46a998a cql-pytest: translate Cassandra's tests for BATCH operations
This is a translation of Cassandra's CQL unit test source file
BatchTest.java into our cql-pytest framework.

This test file an old (2014) and small test file, with only a few minimal
testing of mostly error paths in batch statements. All test tests pass in
both Cassandra and Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14733
2023-07-18 17:01:18 +03:00
Raphael S. Carvalho
da18a9badf Fix test.py with compaction groups
test.py with --x-log2-compaction-groups option rotted a little bit.
Some boost tests added later didn't use the correct header which
parses the option or they didn't adjust suite.yaml.
Perhaps it's time to set up a weekly (or bi-weekly) job to verify
there are no regressions with it. It's important as it stresses
the data plane for tablets reusing the existing tests available.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14732
2023-07-18 16:57:11 +03:00
Botond Dénes
7d5cca1958 Merge 'Regular compaction task' from Aleksandra Martyniuk
Task manager's tasks covering regular compaction.

Uses multiple inheritance on already existing
regular_compaction_task_executor to keep track of
the operation with task manager.

Closes #14377

* github.com:scylladb/scylladb:
  test: add regular compaction task test
  compaction: turn regular_compaction_task_executor into regular_compaction_task_impl
  compaction: add compaction_manager::perform_compaction method
  test: modify sstable_compaction_test.cc
  compaction: add regular_compaction_task_impl
  compaction: switch state after compaction is done
2023-07-18 16:52:53 +03:00
Kefu Chai
4661671220 s3/test: do not keep the tempdir forever
by default, up to 3 temporary directories are kept by pytest.
but we run only a single time for each of the $TMPDIR. per
our recent observation, it takes a lot more time for jenkins
to scan the tempdir if we use it for scylla's rundir.

so, to alleviate this symptom, we just keep up to one failed
session in the tempdir. if the test passes, the tempdir
created by pytest will be nuked. normally it is located at
scylladb/testlog/${mode}/pytest-of-$(whoami).

see also
https://docs.pytest.org/en/7.3.x/reference/reference.html#confval-tmp_path_retention_policy

Refs #14690
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14735

[xemul: Withdrawing from PR's comments

    object_store is the only test which

        is using tmpdir fixture
        starts / stops scylla by itself
        and put the rundir of scylla in its own tmpdir

    we don't register the step of cleaning up [the temp dir] using the utilities provided by
    cql-pytest. we rely on pytest to perform the cleanup. while cql-pytest performs the
    cleanup using a global registry.
]
2023-07-18 16:49:25 +03:00
Kamil Braun
69e22de54d Merge 'minor test/pylib type fixes' from Alecco
Some minor fixes reported by `mypy`.

Closes #14693

* github.com:scylladb/scylladb:
  test/pylib: fix function attribute
  test/pylib: check cmd is defined before using it
  test/pylib: fix return type hint
  test/pylib: remove redundant method
2023-07-18 15:17:51 +02:00
Avi Kivity
a51fdadfed Merge 'treewide: remove #includes not use directly' from Kefu Chai
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.

in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.

if a forward declaration suffice, the declaration is added instead.

see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning

Closes #14740

* github.com:scylladb/scylladb:
  treewide: remove #includes not use directly
  size_tiered_backlog_tracker: do not include remove header
2023-07-18 14:45:33 +03:00
Alejo Sanchez
8fceb7b7a0 test/pylib: fix function attribute
Instead of globally hardcoding an attribute, set it in the function
itself.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-07-18 13:33:46 +02:00
Alejo Sanchez
f7ee4ee7f6 test/pylib: check cmd is defined before using it
Add an assert to check cmd is defined. Helps the type checker.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-07-18 13:33:46 +02:00
Alejo Sanchez
ff564583a4 test/pylib: fix return type hint
Fix type hint of return when using @asynccontextmanager.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-07-18 13:33:46 +02:00
Alejo Sanchez
2194d8864b test/pylib: remove redundant method
The ManagerClient.get_cql method is defined twice. Remove one and fix
the assert.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-07-18 13:33:46 +02:00
Kamil Braun
eb6202ef9c Merge 'db: hints: add checksum to sync_point encoding' from Patryk Jędrzejczak
Fixes #9405

`sync_point` API provided with incorrect sync point id might allocate
crazy amount of memory and fail with `std::bad_alloc`.

To fix this, we can check if the encoded sync point has been modified
before decoding. We can achieve this by calculating a checksum before
encoding, appending it to the encoded sync point, and compering it with
a checksum calculated in `db::hints::decode` before decoding.

Closes #14534

* github.com:scylladb/scylladb:
  db: hints: add checksum to sync point encoding
  db: hints: add the version_size constant
2023-07-18 13:05:10 +02:00
Kefu Chai
bab16eb30e treewide: remove #includes not use directly
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.

in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.

if a forward declaration suffice, the declaration is added instead.

see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 17:36:31 +08:00
Kefu Chai
58302ab145 size_tiered_backlog_tracker: do not include remove header
according to cppreference,

> <ctgmath> is deprecated in C++17 and removed in C++20

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 17:36:31 +08:00
Michał Jadwiszczak
62ced66702 schema: add scylla specific options to schema description
Add `paxos_grace_seconds`, `tombstone_gc`, `cdc` and `synchronous_updates`
options to schema description.

Fixes: #12389
Fixes: scylladb/scylla-enterprise#2979

Closes #14275
2023-07-18 11:16:19 +03:00
Botond Dénes
21ff6efd74 test/boost/view_build_test: improve test_view_update_generator_register_semaphore_unit_leak
By making it independent of the number of units the view update
generator's registration semaphore is created with. We want to increase
this number significantly and that would destabilize this test
significantly. To prevent this, detach the test from the number of units
completely, while stil preserving the original intent behind it, as best
as it could be determined.

Closes #14727
2023-07-18 09:18:28 +03:00
Alejo Sanchez
13e31eaeca test.py: show mode and suite name when listing tests
For --list, show also mode and suite name.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #14729
2023-07-18 09:06:47 +03:00
Botond Dénes
b3cb611be7 Merge 'treewide: enable -Wsign-compare and address the warnings from this option' from Kefu Chai
in order to identify the problems caused by integer type promotion when comparing unsigned and signed integers, in this series, we

- address the warnings raised by `-Wsign-compare` compiler option
- add `-Wsign-compare` compiler option to the building systems

Closes #14652

* github.com:scylladb/scylladb:
  treewide: use unsigned variable to compare with unsigned
  treewide: compare signed and unsigned using std::cmp_*()
2023-07-18 09:05:30 +03:00
Botond Dénes
6961fbcec7 Merge 'Add the metrics config api' from Amnon Heiman
This series is based on top of the seastar relabel config API.

The series adds a REST API for the configuration, it allows to get and set it.

The API is registered under the V2 prefix and uses the swagger 2.0 definition.

After this series to get the current relabel-config configuration:

```
    curl -X GET --header 'Accept: application/json' 'http://localhost:10000/v2/metrics-config/'
```

A set config example:
```
    curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '[ \
       { \
         "source_labels": [ \
           "__name__" \
         ], \
         "action": "replace", \
         "target_label": "level", \
         "replacement": "1", \
         "regex": "io_que.*" \
       } \
     ]' 'http://localhost:10000/v2/metrics-config/'
```

This is how it looks like in the UI
![image](https://user-images.githubusercontent.com/2118079/230763730-bafcaf8b-ea6d-4a6c-a778-6271fa3b6f82.png)

Closes #12670

* github.com:scylladb/scylladb:
  api: Add the metrics API
  api/config: make it optional if the config API is the first to register
  api: Add the metrics.json Swagger file
  Preparing for V2 API from files
2023-07-18 07:10:31 +03:00
Botond Dénes
f03efd7ea9 Merge 'build: cmake: fix the build of some tests' from Kefu Chai
this series addresses the FTBFS of tests with CMake, and also checks for the unknown parameters in `add_scylla_test()`

Closes #14650

* github.com:scylladb/scylladb:
  build: cmake: build SEASTAR tests as SEASTAR tests
  build: cmake: error out if found unknown keywords
  build: cmake: link tests against necessary libraries
2023-07-18 06:51:40 +03:00
Kefu Chai
4c1a26c99f compaction_manager: sort sstables when compaction is enabled
before this change, we sort sstables with compaction disabled, when we
are about to perform the compaction. but the idea of of guarding the
getting and registering as a transaction is to prevent other compaction
to mutate the sstables' state and cause the inconsistency.

but since the state is tracked on per-sstable basis, and is not related
to the order in which they are processed by a certain compaction task.
we don't need to guard the "sort()" with this mutual exclusive lock.

for better readability, and probably better performance, let's move the
sort out of the lock. and take this opportunity to use
`std::ranges::sort()` for more concise code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14699
2023-07-18 06:40:43 +03:00
Kefu Chai
fa3129fa29 treewide: use unsigned variable to compare with unsigned
some times we initialize a loop variable like

auto i = 0;

or

int i = 0;

but since the type of `0` is `int`, what we get is a variable of
`int` type, but later we compare it with an unsigned number, if we
compile the source code with `-Werror=sign-compare` option, the
compiler would warn at seeing this. in general, this is a false
alarm, as we are not likely to have a wrong comparison result
here. but in order to prevent issues due to the integer promotion
for comparison in other places. and to prepare for enabling
`-Werror=sign-compare`. let's use unsigned to silence this warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 10:27:18 +08:00
Kefu Chai
3129ae3c8c treewide: compare signed and unsigned using std::cmp_*()
when comparing signed and unsigned numbers, the compiler promotes
the signed number to coomon type -- in this case, the unsigned type,
so they can be compared. but sometimes, it matters. and after the
promotion, the comparison yields the wrong result. this can be
manifested using a short sample like:

```
int main(int argc, char **argv) {
    int x = -1;
    unsigned y = 2;
    fmt::print("{}\n", x < y);
    return 0;
}
```

this error can be identified by `-Werror=sign-compare`, but before
enabling this compiling option. let's use `std::cmp_*()` to compare
them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-18 10:27:18 +08:00
Amnon Heiman
123dd44c21 api: Add the metrics API
This patch adds a metrics API implementation.
The API supports get and set the metric relabel config.

Seastar supports metrics relabeling in runtime, following Prometheus
relabel_config.

Based on metrics and label name, a user can add or remove labels,
disable a metric and set the skip_when_empty flag.

The metrics-config API support such configuration to be done using the
RestFull API.

As it's a new API it is placed under the V2 path.

After this patch the following API will be available
'http://localhost:10000/v2/metrics-config/' GET/POST.

For example:
To get the current config:
```
curl -X GET --header 'Accept: application/json' 'http://localhost:10000/v2/metrics-config/'
```

To set a config:
```
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '[ \
   { \
     "source_labels": [ \
       "__name__" \
     ], \
     "action": "replace", \
     "target_label": "level", \
     "replacement": "1", \
     "regex": "io_que.*" \
   } \
 ]' 'http://localhost:10000/v2/metrics-config/'
```
2023-07-17 17:09:36 +03:00
Amnon Heiman
eeac846ea7 api/config: make it optional if the config API is the first to register
Until now, only the configuration API was part of the V2 API.

Now, when other APIs are added, it is possible that another API would be
the first to register. The first to register API is different in the
sense that it does not have a leading ',' to it.

This patch adds an option to mark the config API if it's the first.
2023-07-17 17:09:35 +03:00
Amnon Heiman
d694a42745 api: Add the metrics.json Swagger file
This patch adds the swagger definition for the metrics API.

Currently, the API defines a get and set of the metric_relabel_config.
2023-07-17 17:09:35 +03:00
Amnon Heiman
9e0ec3afba Preparing for V2 API from files
This patch changes the base path of the V2 of the API to be '/'.  That
means that the v2 prefix will be part of the path definition.
Currently, it only affect the config API that is created from code.

The motivation for the change is for Swagger definitions that are read
from a file.  Currently, when using the swagger-ui with a doc path set
to http://localhost:10000/v2 and reading the Swagger from a file swagger
ui will concatenate the path and look for
http://localhost:10000/v2/v2/{path}

Instead, the base path is now '/' and the /v2 prefix will be added by
each endpoint definition.

From the user perspective, there is no change in current functionality.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2023-07-17 17:09:35 +03:00
Patryk Jędrzejczak
02618831ef db: hints: add checksum to sync point encoding
sync point API provided with incorrect sync point id might allocate
crazy amount of memory and fail with std::bad_alloc.

To fix this, we can check if the encoded sync point has been modified
before decoding. We can achieve this by calculating a checksum before
encoding, appending it to the encoded sync point, and compering
it with a checksum calculated in db::hints::decode before decoding.
2023-07-17 16:05:07 +02:00
Patryk Jędrzejczak
0a424e1760 db: hints: add the version_size constant
The next commit changes the format of encoding sync points to V2. The
new format appends the checksum to the encoded sync points and its
implementation uses the checksum_size constant - the number of bytes
required to store the checksum. To increase consistency and readability,
we can additionally add and use the version_size constant.

Definitions of sync_point::decode and sync_point::encode are slightly
changed so that they don't depend on the version_size value and make
implementation of the V2 format easier.
2023-07-17 16:02:18 +02:00
Aleksandra Martyniuk
7dbe624dee test: add regular compaction task test 2023-07-17 15:54:33 +02:00
Aleksandra Martyniuk
2e87ba1879 compaction: turn regular_compaction_task_executor into regular_compaction_task_impl
regular_compaction_task_executor inherits both from compaction_task_executor
and regular_compaction_task_impl.
2023-07-17 15:54:33 +02:00
Aleksandra Martyniuk
e3b068be4d compaction: add compaction_manager::perform_compaction method 2023-07-17 15:54:33 +02:00
Aleksandra Martyniuk
ab4ae6b84a test: modify sstable_compaction_test.cc
Modify sstable_compaction_test.cc so that it does not depend on
how quick compaction manager stats are updated after compaction
is triggered.

It is required since in the following changes the context may
switch before the stats are updated.
2023-07-17 15:54:33 +02:00
Aleksandra Martyniuk
9fdd130943 compaction: add regular_compaction_task_impl
regular_compaction_task_impl serves as a base class of all
concrete regular compaction task classes.
2023-07-17 15:54:33 +02:00
Aleksandra Martyniuk
33cb156ee3 compaction: switch state after compaction is done
Compaction task executors which inherit from compaction_task_impl
may stay in memory after the compaction is finished.
Thus, state switch cannot happen in destructor.

Switch state to none in perform_task defer.
2023-07-17 15:54:33 +02:00
Mikołaj Grzebieluch
bdf3959ae6 raft: group0_state_machine_merger: add test for timeuuid ordering
This test checks if `group0_state_machine_merger` preserves timeuuid monotonicity.
`last_id()` should be equal to the largest timeuuid, based on its timestamps.

This test combines two commands in the reverse order of their timeuuids.
The timeuuids yield different results when compared in both timeuuid order and
uuid order. Consequently, the resulting command should have a more recent timeuuid.

Closes #14568
2023-07-17 15:51:20 +02:00
Mikołaj Grzebieluch
96c6e0d0f7 raft: group0_state_machine: extract merger to its own header
Move `merger` to its own header file. Leave the logic of applying commands to
`group0_state_machine`. Remove `group0_state_machine` dependencies from `merger`
to make it an independent module. Add `static` and `const` keywords to its
methods signature. Change it to `class`. Add documentation.

With this patch, it is easier to write unit tests for the merger.
2023-07-17 15:45:49 +02:00
Anna Stuchlik
2aa3672e5f doc: fix the 5.2-to-5.3 upgrade guide
Fixes https://github.com/scylladb/scylladb/issues/13993

This commit applies feedback from @mykaul added in
https://github.com/scylladb/scylladb/pull/13960 after
it was merged.

In addition, I've removed the information about
the Ubuntu version the images are based - the info
doesn't belong here, and, it addition, it causes
maintenance issues.

Closes #14703
2023-07-17 15:26:33 +02:00
Patryk Jędrzejczak
7ae7be0911 locator: remove this_host_id from topology::config
The `locator::topology::config::this_host_id` field is redundant
in all places that use `locator::topology::config`, so we can
safely remove it.

Closes #14638

Closes #14723
2023-07-17 14:57:36 +02:00
Patryk Jędrzejczak
56bd9b5db3 service: storage_proxy: do not report abort requests in handle_write
We don't want to report aborts in storage_proxy::handle_write, because it
can be only triggered by shutdowns and timeouts.

Before this change, such reports flooded logs when a drained node still
received the write RPCs.
2023-07-17 12:27:36 +02:00
Patryk Jędrzejczak
f9db9f5943 service: storage_proxy: encode abort_requested_exception in handle_read
storage_proxy::handle_read now makes sure that abort_requested_exception
is encoded in a way that preserves its type information. This allows
the coordinator to properly deserialize and handle it.

Before this change, if a drained replica was still receiving the read
RPCs, it would flood the coordinator's logs with std::runtime_error
reports.
2023-07-17 12:27:36 +02:00
Patryk Jędrzejczak
68bd0424c2 service: storage_proxy: refactor encode_replica_exception_for_rpc
To properly handle abort_requested_exception thrown from
migration_manager::get_schema_for_read in storage_proxy::handle_read (we
do in the next commit) we have to somehow encode and return it. The
encode_replica_exception_for_rpc function is not suitable for that because
it requires the SourceTuple type (of a value returned by do_query()) which
we don't know when calling get_schema_for_read.

We move the part of encode_replica_exception_for_rpc responsible for
handling exceptions to a new function and rewrite it in a way that doesn't
require the SourceTuple type. As this function fits the name
encode_replica_exception_for_rpc better, we name it this way and rename
the previous encode_replica_exception_for_rpc.
2023-07-17 12:27:33 +02:00
Patryk Jędrzejczak
7f83dbd9e7 test: disable raft-topology in test_remove_garbage_group0_members
With Raft-topology enabled, test_remove_garbage_group0_members has been
flaky when it should always fail. This has been discussed in #14614.

Disabling Raft-topology in the topology suite is problematic because
the initial cluster size is non-zero, so we have nodes that already use
Raft-topology at the beginning of the test. Therefore, we move
test_topology_remove_garbage_group0.py to the topology_custom suite.
Apart from disabling Raft-topology, we have to start 4 servers instead
of 1 because of the different initial cluster sizes.

Closes #14692
2023-07-17 11:42:57 +02:00
Anna Stuchlik
c53bbbf1b9 doc: document nodetool checkAndRepairCdcStreams
Fixes https://github.com/scylladb/scylladb/issues/13783

This commit documents the nodetool checkAndRepairCdcStreams
operation, which was missing from the docs.

The description is added in a new file and referenced from
the nodetool operations index.

Closes #14700
2023-07-17 11:41:54 +02:00
Avi Kivity
bfaac3a239 Merge 'Make replace sstables implementations exception safe' from Benny Halevy
This is the first phase of providing strong exception safety guarantees by the generic `compaction_backlog_tracker::replace_sstables`.

Once all compaction strategies backlog trackers' replace_sstables provide strong exception safety guarantees (i.e. they may throw an exception but must revert on error any intermediate changes they made to restore the tracker to the pre-update state).

Once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.

Closes #14104

* github.com:scylladb/scylladb:
  leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
  size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
  size_tiered_backlog_tracker: make log4 helper static
  size_tiered_backlog_tracker: define struct sstables_backlog_contribution
  size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
  compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
  compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
2023-07-17 12:32:27 +03:00