Commit Graph

42525 Commits

Author SHA1 Message Date
Lakshmi Narayanan Sreethar
24064064e9 sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
The testcase uses an sstable whose mutation key and the generation are
owned by different shards. Due to this, when process_sstable_dir is
called, the sstable gets loaded into a different shard than the one that
was intended. This also means that the sstable and the sstable manager
end up in different shards.

The following patch will introduce a condition variable in sstables
manager which will be signalled from the sstables. If the sstable and
the sstable manager are in different shards, the signalling will cause
the testcase to fail in debug mode with this error : "Promise task was
set on shard x but made ready on shard y". So, fix it by supplying
appropriate generation number owned by the same shard which owns the
mutation key as well.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
69b2a127b0 sstable_datafile_test: add test to verify reclaimed components reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
54bb03cff8 sstables: support reloading reclaimed components
Added support to reload components from which memory was previously
reclaimed as the total memory of reclaimable components crossed a
threshold. The implementation is kept simple as only the bloom filters
are considered reclaimable for now.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
2340ab63c6 sstables_manager: add new intrusive set to track the reclaimed sstables
The new set holds the sstables from where the memory has been reclaimed
and is sorted in ascending order of the total memory reclaimed.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
140d8871e1 sstable: add link and comparator class to support new instrusive set
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
3ef2f79d14 sstable: renamed intrusive list link type
Renamed the intrusive list link type to differentiate it from the set
link type that will be added in an upcoming patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
02d272fdb3 sstable: track memory reclaimed from components per sstable
Added a member variable _total_memory_reclaimed to the sstable class
that tracks the total memory reclaimed from a sstable.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
a53af1f878 sstable: rename local variable in sstable::total_reclaimable_memory_size
Renamed local variable in sstable::total_reclaimable_memory_size in
preparation for the next patch which adds a new member variable
_total_memory_reclaimed to the sstable class.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Kefu Chai
906700d523 test/nodetool: accept -1 returncode also when --help is invoked
in newer seastar, 0 is returned as the returncode of the application
when handling `--help`. to prepare for this behavior, let's
accept it before updating the seastar submodule.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18574
2024-05-09 08:26:44 +03:00
Kefu Chai
6047b3b6aa build: cmake: build async_utils.cc
async_utils.cc was introduced in e1411f39, so let's
update the cmake building system to build it. without
which, we'd run into link failure like:

```
ld.lld: error: undefined symbol: to_mutation_gently(canonical_mutation const&, seastar::lw_shared_ptr<schema const>)
>>> referenced by storage_service.cc
>>>               storage_service.cc.o:(service::storage_service::merge_topology_snapshot(service::raft_snapshot)) in archive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>>               group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>)) inarchive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>>               group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>) (.resume)) in archive service/Dev/libservice.a
>>> referenced 1 more times
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18524
2024-05-09 08:26:44 +03:00
Kefu Chai
c336904722 build: cmake: mark abseil include SYSTEM
this change is a followup of 0b0e661a. it helps to ensure that the header files in
abseil submodule have higher priority when the compiler includes abseil headers
when building with CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18523
2024-05-09 08:26:44 +03:00
Kefu Chai
2a9a874e19 db,service: fix typos in comments
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18567
2024-05-09 08:26:44 +03:00
Anna Stuchlik
65c8b81051 doc: add OS support in version 6.0
This commit adds OS support in version 6.0.
In addition, it removes the information about version 5.2, as this version is no longer supported, according to our policy.

Closes scylladb/scylladb#18562
2024-05-09 08:26:44 +03:00
Anna Stuchlik
74fb9808ed doc: update Consistent Topology with Raft
This PR:
- Removes the `.. only:: opensource` directive from Consistent Topology with Raft.
  This feature is no longer an Open Source-only experimental feature.
- Removes redundant version-specific information.
- Moves the necessary version-specific information to a separate file.

This is a follow-up to 55b011902e.

Refs https://github.com/scylladb/scylladb/pull/18285/

Closes scylladb/scylladb#18553
2024-05-09 08:26:44 +03:00
Calle Wilund
79d56ccaad commitlog: Fix request_controller semaphore accounting.
Fixes #18488

Due to the discrepancy between bytes added to CL and bytes written to disk
(due to CRC sector overhead), we fail to account for the proper byte count
when issuing account_memory_usage in allocate (using bytes added) and in
cycle:s notify_memory_written (disk bytes written).

This leads us to slowly, but surely, add to the semaphore all the time.
Eventually rendering it useless.

Also, terminate call would _not_ take any of this into account,
and the chunk overhead there would cause a (smaller) discrepancy
as well.

Fix by simply ensuring that buffer alloc handles its byte usage,
then accounting based on buffer position, not input byte size.

Closes scylladb/scylladb#18489
2024-05-09 08:26:44 +03:00
Botond Dénes
155332ebf8 Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov
Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before stopping messaging service to "after" it, and view update write handlers in proxy hanged for hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail.

This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.

Closes scylladb/scylladb#18408

* github.com:scylladb/scylladb:
  Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
  view: Abort pending view updates when draining
2024-05-09 08:26:44 +03:00
Aleksandra Martyniuk
67bbaad62e tasks: use default task_ttl in scylla.yaml
Currently default task_ttl_in_seconds is 0, but scylla.yaml changes
the value to 10.

Change task_ttl_in_seconds in scylla.yaml to 0, so that there are
consistent defaults. Comment it out.

Fixes: #16714.

Closes scylladb/scylladb#18495
2024-05-09 08:26:44 +03:00
Botond Dénes
0438febdc9 Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.

The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.

Fixes #5883.

Closes scylladb/scylladb#18286

* github.com:scylladb/scylladb:
  test/alternator: test addressing LSI using REST API
  REST API: stop using deprecated, buggy, path parameter
2024-05-09 08:26:43 +03:00
Yaniv Michael Kaul
124064844f docs/dev/object_stroage.md: convert example AWS keys to be more innocent
Someone thought that they actually represent real keys (the 'EXAMPLE' in their name was not enough).
Converted them to be as clear as can be, example data.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#18565
2024-05-09 08:26:43 +03:00
Asias He
46269a99d8 repair: Add ranges_parallelism option support for tablet
The ranges_parallelism option is introduced in commit 9b3fd9407b.
Currently, this option works for vnode table repair only.

This patch enables it for tablet repair, since it is useful for
tablet repair too.

Fixes #18383

Closes scylladb/scylladb#18385
2024-05-09 08:26:43 +03:00
Benny Halevy
0156e97560 storage_proxy: cas: reject for tablets-enabled tables
Currently, LWT is not supported with tablets.
In particular the interaction between paxos and tablet
migration is not handled yet.

Therefore, it is better to outright reject LWT queries
for tablets-enabled tables rather than support them
in a flaky way.

This commit also marks tests that depend on LWT
as expeced to fail.

Fixes scylladb/scylladb#18066

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18103
2024-05-09 08:26:43 +03:00
Patryk Jędrzejczak
053a2893cf raft topology: join_token_ring: prevent shutdown hangs
Shutdown of a bootstrapping node could hang on
`_topology_state_machine.event.when()` in
`wait_for_topology_request_completion`. It caused
scylladb/scylladb#17246 and scylladb/scylladb#17608.

On a normal node, `wait_for_group0_stop` would prevent it, but this
function won't be called before we join group 0. Solve it by adding
a new subscriber to `_abort_source`.

Additionally, trigger `_group0_as` to prevent other hang scenarios.

Note that if both the new subscriber and `wait_for_group0_stop` are
called, nothing will break. `abort_source::request_abort` and
`conditional_variable::broken` can be called multiple times.

The raft-based topology is moved out of experimental in 6.0, no need
to backport the patch.

Fixes scylladb/scylladb#17246
Fixes scylladb/scylladb#17608

Closes scylladb/scylladb#18549
2024-05-09 08:26:43 +03:00
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
David Garcia
d63d418ae3 docs: change "create an issue" github label to "type/documentation"
Closes scylladb/scylladb#18550
2024-05-09 08:26:43 +03:00
Kefu Chai
02be1e9309 .github: add clang-tidy workflow
clang-tidy is a tool provided by Clang to perform static analysis on
C++ source files. here, we are mostly intersted in using its
https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html
check to reveal the potential issues.

this workflow is added to run clang-tidy when building the tree, so
that the warnings from clang-tidy can be noticed by developers.

a dedicated action is added so other github workflow can reuse it to
setup the building environment in an ubuntu:jammy runner.

clang-tidy-matcher.json is added to annotate the change, so that the
warnings are more visible with github webpage.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18342
2024-05-09 08:26:43 +03:00
David Garcia
4a1b109641 docs: add swagger ui extension
Renders the API Reference from api/api-doc using Swagger UI 2.2.10.

address comments

Closes scylladb/scylladb#18253
2024-05-09 08:26:43 +03:00
Kamil Braun
4dcae66380 Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369

Closes scylladb/scylladb#18379

* github.com:scylladb/scylladb:
  test: get rid of server-side server_restart
  test: util: get rid of the `restart` helper
  test: {auth,topology}: use manager.rolling_restart
2024-05-08 09:45:08 +02:00
Piotr Dulikowski
180cb7a2b9 storage_service: notify lifecycle subs only after token metadata update
Currently, in raft mode, when raft topology is reloaded from disk or a
notification is received from gossip about an endpoint change, token
metadata is updated accordingly. While updating token metadata we detect
whether some nodes are joining or are leaving and we notify endpoint
lifecycle subscribers if such an event occurs. These notifications are
fired _before_ we finish updating token metadata and before the updated
version is globally available.

This behavior, for "node leaving" notifications specifically, was not
present in legacy topology mode. Hinted handoff depends on token
metadata being updated before it is notified about a leaving node (we
had a similar issue before: scylladb/scylladb#5087, and we fixed it by
enforcing this property). Because this is not true right now for raft
mode, this causes the hint draining logic not to work properly - when a
node leaves the cluster, there should be an attempt to send out hints
for that node, but instead hints are not sent out and are kept on disk.

In order to fix the issue with hints, postpone notifying endpoint
lifecycle subscribers about joined and left nodes only after the final
token metadata is computed and replicated to all shards.

Fixes: scylladb/scylladb#17023

Closes scylladb/scylladb#18377
2024-05-08 09:40:44 +02:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which is happens with certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
Anna Stuchlik
98367cb6a1 doc: Snitch switch is not supported with tablets
This commit adds the tablets-related limitation:
if you use tablets, then changing snitch is not supported

Refs:https://github.com/scylladb/scylladb/issues/17513
See: https://github.com/scylladb/scylladb/issues/17513#issuecomment-2022552677

Closes scylladb/scylladb#18548
2024-05-07 17:26:05 +02:00
Pavel Emelyanov
677e80a4d5 table: Coroutinize table::delete_sstables_atomically()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18499
2024-05-07 17:10:28 +02:00
Kamil Braun
53443f566a Merge 'Coroutinize generic_server's listen() method' from Pavel Emelyanov
It needs some local naming cleanup, but otherwise it's pretty simple

Closes scylladb/scylladb#18510

* github.com:scylladb/scylladb:
  generic_server: Fix indentation after previous patch
  generic_server: Coroutinize listen() method
  generic_server: Rename creds argument to builder
2024-05-07 17:08:59 +02:00
Ferenc Szili
60bf846f68 sstable: added test for counting dead rows 2024-05-07 15:44:33 +02:00
Ferenc Szili
8e9771d010 sstable: added docs for system.large_partitions.dead_rows 2024-05-07 15:44:33 +02:00
Avi Kivity
9b8dfb2b19 compaction: compaction_strategy validation: don't rely on optional<> formatting
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.

Here, the optional formatted is known to be engaged, so just print it.

Closes scylladb/scylladb#18533
2024-05-07 12:02:33 +03:00
Kefu Chai
7e578ae964 message: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18527
2024-05-07 11:59:36 +03:00
Raphael S. Carvalho
570e3f8df0 compaction: exclude expired sstables from calculation of base timestamps
base timestamps are feeded into the sstable writer for calculating
delta, used by varints. given that expired ssts are bypassed, we
don't have to account them. so if we compacting fully expired and
new sstable together, we can save a bit by having a base ts closer
to the data actually written into output. also I wanted to move
the calculation into the loop in setup(), to avoid two iterations
over input set that can have even more than 1k elements.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18504
2024-05-07 08:43:50 +03:00
Raphael S. Carvalho
2d9142250e Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown
Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this
will be handled by explicitly closing the connection, so if scylla fails
to update gossiper state due to premature abort on shutdown, then we
won't be stuck in an endless reconnection attempt (later through
heartbeats (30s interval)), causing the test to timeout.

Manifests in scylla logs like this:
gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted)
gossip - failure_detector_loop: Finished main loop
migration_manager - stopping migration service
storage_service - Shutting down native transport server
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
cql_server_controller - CQL server stopped
...
gossip - My status = NORMAL
gossip - Announcing shutdown
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924
gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924
gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested)

Refs #14746.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18484
2024-05-07 02:31:02 +02:00
Piotr Dulikowski
5459cfed6a Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
  it errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, it does support case when
  there is no expected tables)

Fixes https://github.com/scylladb/scylladb/issues/17737

Closes scylladb/scylladb#17939

* github.com:scylladb/scylladb:
  auth: don't run legacy migrations on auth-v2 startup
  auth: fix indent in password_authenticator::start
  auth: remove unused service::has_existing_legacy_users func
2024-05-06 19:53:35 +02:00
Wojciech Mitros
8472c46c8a service_level_controller: coroutinize notify_service_level_removed
To avoid conflicts arising from the discrepancy between different
versions of the repository, use coroutines instead of continuations
in service_level_controller::notify_service_level_removed().

Closes scylladb/scylladb#18525
2024-05-06 14:20:49 +03:00
Piotr Dulikowski
92e5018ddb test: get rid of server-side server_restart
Restarting a node amounts to just shutting it down and then starting
again. There is no good reason to have a dedicated endpoint in the
ScyllaClusterManager for restarting when it can be implemented by
calling two endpoints in a sequence: stop and start - it's just code
duplication.

Remove the server_restart endpoint in ScyllaClusterManager and
reimplement it as two endpoint calls in the ManagerClient.
2024-05-06 12:54:53 +02:00
Piotr Dulikowski
8de2bda7ae test: util: get rid of the restart helper
We already have `ManagerClient.server_restart`, which can be used in its
place.
2024-05-06 12:24:40 +02:00
Piotr Dulikowski
897e603bf0 test: {auth,topology}: use manager.rolling_restart
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369
2024-05-06 12:24:40 +02:00
Kamil Braun
ccbb9f5343 Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. https://github.com/scylladb/scylladb/issues/17545).

The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)

Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.

Additionally, improve `test_raft_snapshot_request` that would
become flaky after the change so it's not sensitive to changes
anymore.

The raft-based topology was experimental before 6.0, no need
to backport.

Ref: scylladb/scylladb#17545

Closes scylladb/scylladb#18497

* github.com:scylladb/scylladb:
  topology_coordinator: clear obsolete generations earlier
  test: test_raft_snapshot_request: improve the last assertion
  test: test_raft_snapshot_request: find raft leader after restart
  test: test_raft_shanpshot_request: simplify appended_command
2024-05-06 12:03:33 +02:00
Kamil Braun
1a50a524e7 Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski
During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints().

Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case).

The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there is only 1 or two nodes in the cluster.

In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size.

Fixes: scylladb/scylladb#18198

Closes scylladb/scylladb#18261

* github.com:scylladb/scylladb:
  test: topology: test that upgrade succeeds after recent removal
  topology_coordinator: compute cluster size correctly during upgrade
2024-05-06 11:06:09 +02:00
Piotr Dulikowski
64ba620dc2 Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.

The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.

Refs scylladb/scylladb#6403
Fixes scylladb/scylladb#12278

Closes scylladb/scylladb#15567

* github.com:scylladb/scylladb:
  docs: Update Hinted Handoff documentation
  db/hints: Add endpoint_downtime_not_bigger_than()
  db/hints: Migrate hinted handoff when cluster feature is enabled
  db/hints: Handle arbitrary directories in resource manager
  db/hints: Start using hint_directory_manager
  db/hints: Enforce providing IP in get_ep_manager()
  db/hints: Introduce hint_directory_manager
  db/hints/resource_manager: Update function description
  db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
  db/hints: Expose update lock of space watchdog
  db/hints: Add function for migrating hint directories to host ID
  db/hints: Take both IP and host ID when storing hints
  db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
  db/hints: Migrate to locator::host_id
  db/hints: Remove noexcept in do_send_one_mutation()
  service: Add locator::host_id to on_leave_cluster
  service: Fix indentation
  db/hints: Fix indentation
2024-05-06 09:58:18 +02:00
Patryk Jędrzejczak
628d7e709e cdc: generation: fix retrieve_generation_data_v2
`system_keyspace::read_cdc_generation_opt` queries
`system.cdc_generations_v3`, which stores ids of CDC generations
as timeuuids. This function shouldn't be called with a normal uuid
(used by `system.cdc_generations_v2` to store generation ids).
Such a call would end with a marshaling error.

Before this patch,`retrieve_generation_data_v2` could call
`system_keyspace::read_cdc_generation_opt` with a normal uuid if
the generation wasn't present in `system.cdc_generations_v2`.
This logic caused a marshaling error while handling the
`check_and_repair_cdc_streams` request in the
`cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest.

This patch fixes the code being added in 6.0, no need to backport it.

Fixes scylladb/scylladb#18473

Closes scylladb/scylladb#18483
2024-05-06 09:12:47 +02:00
Kamil Braun
16846bf5ce Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb
With topology over raft all operation are already serialized by the
coordinator anyway, so no need to synchronize removenode using api lock.
All others are still synchronized since there cannot be executed in
parallel for the same node anyway.

* 'gleb/17681-fix' of github.com:scylladb/scylla-dev:
  storage_service: do not take API lock for removenode operation if topology coordinator is enabled
  test: return file mark from wait_for that points after the found string
2024-05-06 09:03:03 +02:00
Benny Halevy
ebff5f5d70 everywhere: include seastar headers using angle brackets
seastar is an external library therefore it should
use the system-include syntax.

Closes scylladb/scylladb#18513
2024-05-06 10:00:31 +03:00
Kefu Chai
5ca9a46a91 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18515
2024-05-05 23:31:48 +03:00