Commit Graph

42153 Commits

Author SHA1 Message Date
Kamil Braun
a0b331b310 gossiper: lock local endpoint when updating heart_beat
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.

For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.

I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.

After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.

With additional logging and additional head-banging, I determined
the root cause.

The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
  state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
  updates the copy:
```
            auto local_state = *ep_state_before;
            for (auto& p : states) {
                auto& state = p.first;
                auto& value = p.second;
                value = versioned_value::clone_with_higher_version(value);
                local_state.add_application_state(state, value);
            }
```
  `clone_with_higher_version` bumps `version` inside
  gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in first
  phase it copies the updated `local_state` to all shards into a
  separate map. In second phase the values from separate map are used to
  overwrite the endpoint_state map used for gossiping.

  Due to the cross-shard calls of the 1 phase, there is a yield before
  the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
  This uses the monotonic version_generator, so it uses a higher version
  then the ones we used for states added above. Let's call this new version
  X. Note that X is larger than the versions used by application_states
  added above.
- now node B handles a SYN or ACK message from node A, creating
  an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above,
      because `replicate` is still sleeping before phase 2),
    - but bumped heart_beat == X from `gossiper::run()` loop,
  and sends the message.
- node A receives the message and remembers that the max
  version across all states (including heart_beat) of node B is X.
  This means that it will no longer request or apply states from node B
  with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
  endpoint_state with the ones it saved in phase 1. In particular it
  reverts heart_beat back to smaller value, but the larger problem is that it
  saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
  message to node A, node A will ignore them, because their versions are
  smaller than X. Or node B will never send them, because whenever node
  A requests states from node B, it only requests states with versions >
  X. Either way, node A will fail to observe new states of node B.

If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.

With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.

The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.

Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
2024-04-04 18:46:56 +02:00
Botond Dénes
d43670046b test/lib: random_schema: disallow boolean_type in keys
They result in poor distribution and poor cardinality, interfering with
tests which want to generate N partitions or rows.

Fixes: #17821

Closes scylladb/scylladb#17856
2024-04-03 09:52:36 +03:00
Botond Dénes
2cb5dcabf7 docs/dev/maintainer.md: document another exceptions to rule no.0
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code, opening a PR to get a CI run for a
backport doesn't change this.

Closes scylladb/scylladb#17727
2024-04-03 09:51:19 +03:00
Piotr Dulikowski
3ba7a4ead2 Merge 'api: upgrade_to_raft topology: add logging' from Benny Halevy
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Closes scylladb/scylladb#18143

* github.com:scylladb/scylladb:
  api: storage_service: upgrade_to_raft_topology: fixup indentation
  api: storage_service: upgrade_to_raft_topology: add logging
2024-04-03 07:00:10 +02:00
Pavel Emelyanov
8550a38a8b cql: Reserve vector of column definitions in advance
The vector in question is populted from the content of another map, so
its size is known in advance

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18155
2024-04-02 22:35:10 +03:00
Benny Halevy
1272d736c0 api: storage_service: upgrade_to_raft_topology: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:51 +03:00
Benny Halevy
31026ae27f api: storage_service: upgrade_to_raft_topology: add logging
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Indentation will be fixed in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:49 +03:00
Kefu Chai
15d59db98b cql3: select_statement: include <ranges>
we should include used header, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
        for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
                                                          ~~~~~~~~~~~~~~~~~~~~^
1 error generated.`
```
if some of the included header drops its own `#include <optional>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18145
2024-04-02 18:47:54 +03:00
Botond Dénes
2179bfc40d Merge 'Relax initialization of virtual tables' from Pavel Emelyanov
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over virtual tables map several times to do its work. This PR squashes it into a straightforward code which is shorter and, hopefully, easier to read.

Closes scylladb/scylladb#18133

* github.com:scylladb/scylladb:
  virtual_tables: Open-code install_virtual_readers_and_writers()
  virtual_tables: Move readers setup loop into add_table()
  virtual_tables: Move tables creation loop into add_table()
  virtual_tables: Make add_tablet() a coroutine
  virtual_tables: Open-code register_virtual_tables()
2024-04-02 13:39:26 +03:00
Botond Dénes
469ff4f290 Merge 'repair: Load repair history in background' from Asias He
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.

Fixes #17993

Closes scylladb/scylladb#17994

* github.com:scylladb/scylladb:
  repair: Load repair history in background
  repair: Abort load_history process in shutdown
2024-04-02 10:53:10 +03:00
Botond Dénes
fd12052c89 Update tools/java/ submodule
* tools/java/ d61296dc...b810e8b0 (1):
  > do not include {dclocal_,}read_repair_chance if not enabled
2024-04-02 10:47:57 +03:00
Yaron Kaikov
fcdb80773e github: sync-labels: run only in scylladb oss repo
We currently support the sync-label only in OSS. Since Scylla-enterprise
get all the commits from OSS repo, the sync-label is running and failing
during checkout (since it's a private repo and should have different
configuration)

For now, let's limit the workflows for oss repo

Closes scylladb/scylladb#18142
2024-04-02 10:45:17 +03:00
Botond Dénes
ffdd47c2b1 Merge 'Track and limit memory used by bloom filters' from Lakshmi Narayanan Sreethar
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.

The feature can be manually verified by doing the following :
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB

The default value of the `components_memory_reclaim_threshold` config variable which controls the reclamation process is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature.

Fixes #17747

Closes scylladb/scylladb#17771

* github.com:scylladb/scylladb:
  test_bloom_filter.py: disable reclaiming memory from components
  sstable_datafile_test: add tests to verify auto reclamation of components
  test/lib: allow overriding available memory via test_env_config
  sstables_manager: support reclaiming memory from components
  sstables_manager: store available memory size
  sstables_manager: add variable to track component memory usage
  db/config: add a new variable to limit memory used by table components
  sstable_datafile_test: add testcase to verify reclamation from sstables
  sstables: support reclaiming memory from components
2024-04-02 10:40:52 +03:00
Amnon Heiman
803d414896 get_description.py: Make the Script a library
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output of the script, currently supported pipee
   and yml.

You can still call the from the command line, like before, but you can
also calls it from another python script.

For example the folowing python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```

import get_description

metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18136
2024-04-02 10:07:11 +03:00
Botond Dénes
ea8478a3e7 scripts/open-coredump.sh: introduce --ci
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.

Closes scylladb/scylladb#18023
2024-04-02 09:27:52 +03:00
Kefu Chai
55d0ea48bd test: randomized_nemesis_test: remove fmt::formatter for seastar::timed_out_error
This reverts commit 97b203b1af.

since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.

Refs #13245

Closes scylladb/scylladb#18114
2024-04-02 09:25:51 +03:00
Benny Halevy
d5ac0c06b3 test_sstable_reversing_reader_random_schema: drop workaround for #9352
Issue #9352 was fixed about a year and a half ago
so this workaround should not be needed anymore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18121
2024-04-02 09:25:06 +03:00
Raphael S. Carvalho
29f9f7594f replica: Kill table::storage_group_id_for_token()
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18134
2024-04-02 09:23:23 +03:00
Asias He
99b7ccfa8b repair: Load repair history in background
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.

Fixes #17993
2024-04-02 09:24:35 +08:00
Asias He
523895145d repair: Abort load_history process in shutdown
If the node is shutting down, there is no point to continue to load the
repair history.

Refs #17993
2024-04-02 09:24:35 +08:00
Lakshmi Narayanan Sreethar
d86505e399 test_bloom_filter.py: disable reclaiming memory from components
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
d261f0fbea sstable_datafile_test: add tests to verify auto reclamation of components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
169629dd40 test/lib: allow overriding available memory via test_env_config
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
a36965c474 sstables_manager: support reclaiming memory from components
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.

Fixes #17747

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
2ca4b0a7a2 sstables_manager: store available memory size
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
f05bb4ba36 sstables_manager: add variable to track component memory usage
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
e8026197d2 db/config: add a new variable to limit memory used by table components
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
e0b6186d16 sstable_datafile_test: add testcase to verify reclamation from sstables
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
4f0aee62d1 sstables: support reclaiming memory from components
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Pavel Emelyanov
627c5fdf04 virtual_tables: Open-code install_virtual_readers_and_writers()
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). Neither it installs writers any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:02:40 +03:00
Pavel Emelyanov
1d79cfc6cf virtual_tables: Move readers setup loop into add_table()
Similarly to previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at the time of registering, no need in dedicated
loop for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:01:50 +03:00
Pavel Emelyanov
891e792717 virtual_tables: Move tables creation loop into add_table()
Once virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created, no need in dedicated loop
for it nowadays.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:38 +03:00
Pavel Emelyanov
420ce3634f virtual_tables: Make add_tablet() a coroutine
Next patches will populate it with sleeping calls, this patch prepares
for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:15 +03:00
Pavel Emelyanov
ddc6f9279f virtual_tables: Open-code register_virtual_tables()
It's naturally a "part" of initialize_virtual_tables(). Further patching
gets possible with it being open-coded.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 18:59:18 +03:00
Kefu Chai
c5601a749e github: sync_labels: do not error out if PR's cover letter is empty
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get

```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18125
2024-04-01 18:13:22 +03:00
Avi Kivity
88fb686d67 test: generate core dumps on crashes in debug clusters
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.

Fix by preparing the environment correctly.

Fixes scylladb/scylladb#17510

Closes scylladb/scylladb#17511
2024-04-01 18:11:41 +03:00
Kefu Chai
07c40f5600 github: sync_labels: use ${{}} expression syntax in "if" condition
to ensure that the expression is evaluated properly.
see https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsstepsif

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18127
2024-04-01 17:17:43 +03:00
Kefu Chai
1494499f90 github: sync_labels: checkout a single file not the whole repo
what we need is but a script, so instead of checkout the whole repo,
with all history for all tags and branches, let's just checkout
a single file. faster this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18126
2024-04-01 17:15:50 +03:00
Yaron Kaikov
b8c705bc54 .github: sync-labels: fix pull request permissions
when adding a label to a PR request we keep getting the following error
message:
```
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
    target.add_to_labels(label)
  File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
    headers, data = self._requester.requestJsonAndCheck(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```

Based on
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token.
The maximum access for pull requests from public forked repositories is
set to `read`

Switching to `pull_request_target` to solve it

Fixes: https://github.com/scylladb/scylladb/issues/18102

Closes scylladb/scylladb#18052
2024-04-01 17:11:35 +03:00
Pavel Emelyanov
46bbfc0c53 expression: Shorten making raw_value from FragmetedView
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt which is std::optional<manager_bytes>.
Finally, there's std::optional<T>::optional(std::optional<U>&&)
move constructor (and its copy-constructor peer).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18128
2024-04-01 16:52:18 +03:00
Benny Halevy
01fc1a9f66 schema_tables: std::move mutation into the mutation vector
To save a copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18120
2024-04-01 14:16:30 +03:00
Pavel Emelyanov
5427967f45 schema: Introduce build() && overload
The schema_builder::build() method creates a copy of raw schema
internaly in a hope that builder will be updated and be asked to build
the resulting schema again (e.g. alternator uses this).

However, there are places that build schema using temporary object once
in a `return schema_builder().with_...().build()` manner. For those
invocations copying raw schema is just waste of cycles.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18094
2024-04-01 14:00:42 +03:00
Nadav Har'El
b6854cbb21 Merge 'test/cql-pytest: match error message formated using {fmt} ' from Kefu Chai
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a

compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.

Closes scylladb/scylladb#18058

* github.com:scylladb/scylladb:
  test/cql-pytest: match error message formated using {fmt}
  test/cql-pytest: extract scylla_error() for not allowed options test
2024-04-01 11:23:24 +03:00
Kefu Chai
fcf7ca5675 utils/logalloc: do not allocate memory in reclaim_timer::report()
before this change, `reclaim_timer::report()` calls

```c++
fmt::format(", at {}", current_backtrace())
```

which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.

in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.

Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18100
2024-04-01 11:01:52 +03:00
Botond Dénes
885cb2af07 utils/rjson: include tasklocal backtrace in rapidjson assert error message
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:

    rjson::error (JSON error: condition not met: false)

This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message however is completely inadequate for
determinig what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:

    rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)

Closes scylladb/scylladb#18101
2024-03-29 18:41:54 +01:00
Pavel Emelyanov
41a1b1c0d0 move_tablets: Emplace mutations into vector, not push
It's more applicable in this case.

Also, built tablets mutations are casted to canonical_mutations, but
when emplaced compiler can pick-up canonical_mutation(const mutation&)
constructor and the cast is not required.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18090
2024-03-29 15:21:49 +02:00
Kamil Braun
f5603ad9ca Merge 'test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero' from Mikołaj Grzebieluch
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.

In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.

Because of `ring_delay_ms = 0', the generation could have been committed when it should have already been in use.

This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```

Creating writes during such a generation can result in assigning them a wrong generation or a failure. Failure may occur if it hits short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but`_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.

Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```

Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.

Fixes scylladb/scylladb#17977

Reapply b4144d14c6.

Closes scylladb/scylladb#17998

* github.com:scylladb/scylladb:
  test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
  Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
2024-03-29 12:52:31 +01:00
Tzach Livyatan
4930095d39 Docs: Fix link fro scylla-sstable.rst to /architecture/sstable/
Fix https://github.com/scylladb/scylladb/issues/18096

Closes scylladb/scylladb#18097
2024-03-29 10:48:24 +02:00
Piotr Dulikowski
57719ece4f Merge 'main: reload service levels data accessor after join_cluster' from Marcin Maliszkiewicz
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.

Closes scylladb/scylladb#18040

* github.com:scylladb/scylladb:
  main: reload service levels data accessor after join_cluster
  service: qos: create separate function for reloading data accessor
2024-03-29 09:39:11 +01:00
Kefu Chai
1632fbbef9 test/cql-pytest: match error message formated using {fmt}
currently, our homebrew formatter formats `std::map` like

{{k1, v1}, {k2, v2}}

while {fmt} formats a map like:

{k1: v1, k2: v2}

and if the type of key/value is string, {fmt} quotes it, so a

compaction strategy option is formatted like

{"max_threshold": "1"}

before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-29 08:07:59 +08:00