Commit Graph

44545 Commits

Author SHA1 Message Date
Pavel Emelyanov
f500ee690b test: Threadify generate_clustered() callers
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:34:54 +03:00
Pavel Emelyanov
08186c048d test: Threadify test_no_clustered test
And update its callers.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:26:25 +03:00
Pavel Emelyanov
5f0a40f959 test: Threadify nonexistent_key test
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:26:13 +03:00
Pavel Emelyanov
a150a63259 test: Squash two open_sstables() helpers together
One accepts integer generations, another one accepts "generic" ones. The
latter is only called by the former, so no sense in keeping it around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 09:08:40 +03:00
Pavel Emelyanov
4184c688ea test: Coroutinize open_sstables() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 09:08:12 +03:00
Piotr Dulikowski
ecd53db3b0 service/qos: remove the marked_for_deletion parameter
It is always set to false and it doesn't seem to serve any function now.
2024-09-04 21:52:34 +02:00
Piotr Dulikowski
bae6076541 service/qos: add constructors to service_level
Add a default constructor and a constructor which explicitly
initializes all fields of the service_level structure.

This is done in order to make sure that removal of the
marked_for_deletion field can be done safely - otherwise, for example,
service_level could be aggregate-initialized with an incomplete list of
values for the fields, and removing marked_for_deletion which is in the
middle of the struct would cause the is_static field to be initialized
with the value that was designated for marked_for_deletion.

As a bonus, make sure that marked_for_deletion and is_static bool fields
are initialized in the default constructor to false in order to avoid
potential undefined behavior.
2024-09-04 21:52:13 +02:00
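The aggregate-initialization hazard described in bae6076541 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ScyllaDB code: the struct names, the helper functions and the `shares` field are hypothetical; only `marked_for_deletion` and `is_static` come from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for the struct before the change: a plain
// aggregate with marked_for_deletion in the middle.
struct level_before {
    int shares = 0;                    // invented field, for illustration only
    bool marked_for_deletion = false;
    bool is_static = false;
};

// The same struct with marked_for_deletion naively removed.
// It is still an aggregate, so old initializer lists keep compiling.
struct level_naive {
    int shares = 0;
    bool is_static = false;
};

// With explicit constructors the type is no longer an aggregate:
// a call site must either use the default constructor or pass every field.
struct level_ctor {
    int shares = 0;
    bool is_static = false;
    level_ctor() = default;
    level_ctor(int shares_, bool is_static_)
        : shares(shares_), is_static(is_static_) {}
};

// A call site written against the old layout: the second value was
// meant for marked_for_deletion, and is_static is left at its default.
inline level_before old_call_site() { return level_before{100, true}; }

// The same initializer list after the naive removal: the value that was
// meant for marked_for_deletion now silently lands in is_static.
inline level_naive naive_call_site() { return level_naive{100, true}; }
```

With `level_ctor`, an incomplete list such as `level_ctor{100}` fails to compile because no matching constructor exists, so a stale call site is caught at build time instead of changing meaning.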
Avi Kivity
ec8590ae6c Merge 'Always pass abort_source& to raft_group0_client::hold_read_apply_mutex' from Kamil Braun
There are two versions of `raft_group0_client::hold_read_apply_mutex`, one takes `abort_source&`, the other doesn't. Modify all call sites that used the non-abort-source version to pass an `abort_source&`, allowing us to remove the other overload.

If there is no explicit reason not to pass an `abort_source&`, then one should be passed by default -- it often prevents hangs during shutdown.

---

No backport needed -- no known issues affected by this change.

Closes scylladb/scylladb#19996

* github.com:scylladb/scylladb:
  raft_group0_client: remove `hold_read_apply_mutex` overload without `abort_source&`
  storage_service: pass `_abort_source` to `hold_read_apply_mutex`
  group0_state_machine: pass `_abort_source` to `hold_read_apply_mutex`
  api: move `reload_raft_topology_state` implementation inside `storage_service`
2024-09-04 21:35:27 +03:00
Kefu Chai
fe0e961856 docs: do not install scylla/ppa repo when performing upgrade
for following reasons:

1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing.
2. the ppa in question does not provide the packages used in production. it does provide the packages for *building* scylla
3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages.

so, in this change, we remove all references to enabling the Scylla/PPA repository.

Fixes scylladb/scylladb#20449

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20450
2024-09-04 20:30:40 +03:00
Avi Kivity
20b79816f1 repair: row_level: coroutinize repair_service::remove_repair_meta() (non-selective overload)
2024-09-04 18:43:19 +03:00
Avi Kivity
3b9ac51b6b repair: row_level: coroutinize repair_service::remove_repair_meta() (by-address overload)
2024-09-04 18:39:21 +03:00
Avi Kivity
704e3f5432 repair: row_level: coroutinize repair_service::remove_repair_meta() (by-id overload)
2024-09-04 18:37:48 +03:00
Avi Kivity
9612c4d790 repair: row_level: row_level_repair::run()
The function itself is threaded, but the inner lambdas are coroutinized
(except one which is expected to run in a thread, and so is threaded).
2024-09-04 18:34:45 +03:00
Avi Kivity
2b94ee981b repair: row_level: row_level_repair::send_missing_rows_to_follower_nodes()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:28:27 +03:00
Avi Kivity
c768448339 repair: row_level: row_level_repair::get_missing_rows_from_follower_nodes()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:28:12 +03:00
Avi Kivity
d2f1b44487 repair: row_level: row_level_repair::negotiate_sync_boundary()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:21:39 +03:00
Kefu Chai
0756520f82 sstable: coroutinize sstable::seal_sstable()
for better readability.

presumably, `sstable::seal_sstable()` is not on the critical path,
and we don't need to worry about the overhead of using C++20 coroutines.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20410
2024-09-04 18:14:33 +03:00
Kefu Chai
88c5c3001a compaction: refactor compaction_manager::can_proceed()
instead of chaining the conditions with '&&', break them down.
for two reasons:

* for better readability: to group the conditions with the same
  purpose together
* so we don't look up the table twice. it's an anti-pattern of
  using STL, and it could be confusing at first glance.

this change is a cleanup, so it does not change the behavior.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20369
2024-09-04 18:12:29 +03:00
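The "look up the table twice" anti-pattern mentioned in 88c5c3001a can be sketched as below. This is a generic illustration, not the real `compaction_manager::can_proceed()`: the `task_state` struct, the `std::unordered_map` container, the string keys and both function names are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical per-table state; the real compaction state is richer.
struct task_state {
    bool disabled = false;
};

using state_map = std::unordered_map<std::string, task_state>;

// Chained form: count() and at() each perform a lookup for the same key.
inline bool can_proceed_chained(const state_map& m, const std::string& table, bool enabled) {
    return enabled && m.count(table) != 0 && !m.at(table).disabled;
}

// Broken-down form: conditions are grouped by purpose and the map is
// searched exactly once via find().
inline bool can_proceed_split(const state_map& m, const std::string& table, bool enabled) {
    if (!enabled) {
        return false;               // global condition
    }
    auto it = m.find(table);        // single lookup
    if (it == m.end()) {
        return false;               // table is not known
    }
    return !it->second.disabled;    // per-table condition
}
```

Both functions return the same result for every input, which matches the commit's statement that the refactoring does not change behavior.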
Avi Kivity
645e39e746 repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_process_op()
Both the outer function and the inner lambda are coroutinized.
2024-09-04 18:10:43 +03:00
Avi Kivity
4c05d0b965 repair: row_level: coroutinize repair_meta::get_sync_boundary_handler()
2024-09-04 15:33:40 +03:00
Avi Kivity
eea011fad5 repair: row_level: coroutinize repair_meta::get_sync_boundary()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:31:57 +03:00
Avi Kivity
91b88df956 repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions_handler()
2024-09-04 15:20:53 +03:00
Avi Kivity
b73194c9bf repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:18:33 +03:00
Avi Kivity
a69fb626bd repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions_handler()
2024-09-04 15:17:42 +03:00
Avi Kivity
5cd8207ac7 repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:16:32 +03:00
Avi Kivity
e108f867a9 repair: row_level: coroutinize repair_meta::repair_row_level_stop_handler()
2024-09-04 15:15:42 +03:00
Avi Kivity
ffbb973063 repair: row_level: coroutinize repair_meta::repair_row_level_stop()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:14:08 +03:00
Avi Kivity
587b6fe400 repair: row_level: coroutinize repair_meta::repair_row_level_start_handler()
2024-09-04 15:12:49 +03:00
Avi Kivity
db7b1014ff repair: row_level: coroutinize repair_meta::repair_row_level_start()
2024-09-04 15:10:45 +03:00
Avi Kivity
17b82265ae repair: row_level: coroutinize repair_meta::get_combined_row_hash_handler()
2024-09-04 15:08:58 +03:00
Avi Kivity
bacbdde791 repair: row_level: coroutinize repair_meta::get_combined_row_hash()
2024-09-04 15:07:27 +03:00
Avi Kivity
8b8dc5092f repair: row_level: coroutinize repair_meta::get_full_row_hashes_handler()
2024-09-04 15:05:28 +03:00
Avi Kivity
21e01990ff repair: row_level: coroutinize repair_meta::get_full_row_hashes_with_rpc_stream()
The when_all_succeed() call is changed to the safer coroutine::when_all(),
which avoids the temporary futures.
2024-09-04 15:03:00 +03:00
Avi Kivity
572fbfde09 repair: row_level: coroutinize repair_meta::request_row_hashes()
2024-09-04 14:07:59 +03:00
Nadav Har'El
15f8046fcb alternator ttl: fix use-after-free
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.

The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.

Fixes #19988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20418
2024-09-03 22:15:18 +03:00
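The shape of the fix in 15f8046fcb (own the referenced object instead of binding a reference to a temporary) can be sketched as follows. The types here are simplified, hypothetical stand-ins that only borrow the names from the commit message; the real `service::client_state`, `service::query_state` and `scan_ranges_context` have different members and constructors.

```cpp
#include <cassert>

// Simplified stand-in: the real client_state carries much more.
struct client_state {
    int id;
};

// Simplified stand-in: holds a reference, so it never owns the client_state.
struct query_state {
    const client_state& cs;
    explicit query_state(const client_state& c) : cs(c) {}
};

// Sketch of the fixed context: the client_state is a member, declared
// before query_state so that it is constructed first and destroyed last.
struct scan_context {
    client_state owned;
    query_state qs;

    explicit scan_context(int id) : owned{id}, qs(owned) {}

    // Copying or moving would leave qs referring to the source object's
    // client_state, so both are disabled in this sketch.
    scan_context(const scan_context&) = delete;
    scan_context& operator=(const scan_context&) = delete;
};
```

The buggy variant would have been a constructor along the lines of `qs(client_state{id})`: the temporary dies at the end of the constructor's full expression while `qs.cs` still refers to it. It is left out of the code above because executing it is undefined behavior.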
Pavel Emelyanov
c03b1e2827 test: Remove unused database argument from make_sstable_for_all_shards() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20427
2024-09-03 21:36:28 +03:00
Calle Wilund
2695fefa81 commitlog/database: Make some commitlog options updatable + add feature listener
Makes some commitlog options updatable at runtime, most importantly the
use of fragmented entries. Also adds a subscription in database on said
feature, so that it can be enabled once the cluster enables it.
2024-09-03 16:38:28 +00:00
Calle Wilund
238a0236e5 features/config: Add feature for fragmented commitlog entries
Hides the functionality behind a cluster feature, i.e. postpones
using it until an upgrade is complete etc. This is to allow rolling back
even with dirty nodes, at least until a cluster is committed.

Feature can also be disabled by scylla option, just in case. This will
lock it out of whole cluster, but this is probably good, because depending
on off or on, certain schema/raft ops might fail or succeed (due to large
mutations), and this should probably be equivalent across nodes.
2024-09-03 16:38:28 +00:00
Calle Wilund
9bf452c7a0 docs: Add entry on commitlog file format v4
2024-09-03 16:38:28 +00:00
Calle Wilund
ad595e4d6a commitlog_test: Add more oversized cases
Also adds some randomization to the tests.
2024-09-03 16:38:28 +00:00
Calle Wilund
1d5e509136 commitlog_replayer: Replay segments in order created
Minimizes potential buffer usage for fragmented entries.
2024-09-03 16:38:28 +00:00
Calle Wilund
61ff9486fb commitlog_replayer: Use replay state to support fragmented entries
2024-09-03 16:38:27 +00:00
Calle Wilund
7c16683184 commitlog_replayer: coroutinize partly
2024-09-03 16:38:27 +00:00
Calle Wilund
05bf2ae5d7 commitlog: Handle oversized entries
Refs #18161

Yet another approach to dealing with large commitlog submissions.

We handle an oversized single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.

Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These are released once we clear the
relevant cf_id count in the base.

This approach has the downside that due to how serialization etc
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.

On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of currently processing
frag chains. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.

Note that a replay caller will need to create and provide such
a state object. Old signature replay function remains for tests
and such.

This approach bumps the file format (docs to come).

To ensure "atomicity" we both force synchronization, and should
the whole op fail, we restore segment state (rewinding), thus
discarding all data we wrote.

v2:
* Improve some bookkeeping, ensure we keep track of segments and flush
  properly, to get counter correct
2024-09-03 16:38:27 +00:00
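The split-and-reassemble idea from 05bf2ae5d7 can be sketched in a few lines. This is a toy model under loud assumptions, not the commitlog implementation or its on-disk format: `fragment`, `split_entry` and `replay_state` are hypothetical names, fragments are assumed to arrive in order (compare 1d5e509136, which replays segments in creation order), and segments, checksums and rewinding on failure are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One piece of an oversized entry plus the metadata needed to correlate
// it with its siblings. Field layout is invented for this sketch.
struct fragment {
    std::uint64_t id;        // identifies the fragment chain
    std::size_t index;       // position within the chain
    std::size_t total_size;  // size of the full serialized entry
    std::string data;        // this fragment's share of the bytes
};

// Cut the fully serialized entry into pieces of at most max_fragment bytes.
inline std::vector<fragment> split_entry(std::uint64_t id, const std::string& payload,
                                         std::size_t max_fragment) {
    if (max_fragment == 0) {
        throw std::invalid_argument("max_fragment must be positive");
    }
    std::vector<fragment> out;
    std::size_t index = 0;
    for (std::size_t off = 0; off < payload.size(); off += max_fragment) {
        out.push_back({id, index++, payload.size(), payload.substr(off, max_fragment)});
    }
    return out;
}

// Replay-side state: a mapping of fragment chains that are still incomplete.
struct replay_state {
    std::map<std::uint64_t, std::string> pending;

    // Adds one fragment. Returns true and fills `full` once the chain is
    // complete; returns false while fragments are still missing.
    bool add(const fragment& f, std::string& full) {
        std::string& buf = pending[f.id];
        buf += f.data;
        if (buf.size() < f.total_size) {
            return false;
        }
        full = std::move(buf);
        pending.erase(f.id);
        return true;
    }
};
```

A caller would feed every fragment it encounters into one `replay_state` and invoke its usual replay callback only when `add` returns true, which mirrors the "state" object that the commit says a replay caller has to create and provide.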
Anna Stuchlik
35796306a7 doc: comment out redirections for pages under Features
This commit temporarily disables redirections for all pages under Features
that were moved with this PR: https://github.com/scylladb/scylladb/pull/20401

Redirections work for all versions. This means that pages in 6.1 are redirected
to URLs that are not available yet (because 6.2 has not been released yet).

The redirections are correct and should be enabled when 6.2 is released:
I've created an issue to do it: https://github.com/scylladb/scylladb/issues/20428

Closes scylladb/scylladb#20429
2024-09-03 17:16:51 +02:00
Avi Kivity
6ddcf80d89 Merge 'Reuse sstable::test_env::reusable_sst() helper for pre-existing sstables' from Pavel Emelyanov
Tests that try to access sstables from test/resource/ typically sstable::load() them after object creation. There's a reusable_sst() helper for that. This PR fixes one more caller that still takes the longer route by creating the sstable and loading it on its own.

Closes scylladb/scylladb#20420

* github.com:scylladb/scylladb:
  test: Call reusable sst from ka_sst() helper
  test: Move sstable_open_config to reusable_sst()'s argument
2024-09-03 17:40:34 +03:00
Kamil Braun
504bf68ebb raft_group0_client: remove hold_read_apply_mutex overload without abort_source&
Ensure that every caller passes `abort_source&`.
2024-09-03 15:52:05 +02:00
Kamil Braun
79983723c8 storage_service: pass _abort_source to hold_read_apply_mutex
There's no point waiting for this lock if `storage_service` is being
aborted. In theory the lock, if held, should be eventually released by
whatever is holding it during shutdown -- but if there is some cyclic
reference between the services, and e.g. whatever holds the lock is
stuck because of ongoing shutdown and would only be unstuck by
`storage_service` getting stopped (which it can't because it's waiting
on the lock), that would cause a shutdown deadlock. Better to be safe
than sorry.
2024-09-03 15:52:05 +02:00
Kamil Braun
a7097fb985 group0_state_machine: pass _abort_source to hold_read_apply_mutex
`transfer_snapshot` was already passing `_abort_source` when trying to
take the lock but other member functions didn't.
2024-09-03 15:52:05 +02:00
Kamil Braun
a4d1065628 api: move reload_raft_topology_state implementation inside storage_service
In a later commit we'll want to access more `storage_service` internals
in the API's implementation (namely, `_abort_source`).

Also moving the implementation there allows making
`service::topology_transition()` private again (it was made public in
992f1327d3 only for this API implementation).
2024-09-03 15:52:03 +02:00