Commit Graph

3054 Commits

Author SHA1 Message Date
Kefu Chai
fb4f48b4ed schema: add fmt::formatter for schema
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr

their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17768
2024-03-13 09:29:00 +02:00
Pavel Emelyanov
d90db016bf treewide: Use partition_slice::is_reversed()
Continuation of cc56a971e8, more noisy places detected

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17763
2024-03-13 08:52:46 +02:00
Avi Kivity
f410038296 Merge 'Use do_with_cql_env_thread() helper in storage proxy test' from Pavel Emelyanov
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread

Closes scylladb/scylladb#17758

* github.com:scylladb/scylladb:
  test/storage_proxy: Restore indentation after previous patch
  test/storage_proxy: Use do_with_cql_env_thread()
2024-03-12 20:23:40 +02:00
Pavel Emelyanov
34477ad98e test/storage_proxy: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:44 +03:00
Pavel Emelyanov
fd112446c2 test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.

Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:33 +03:00
Pavel Emelyanov
a755914265 test/cql_query_test: Use string_view by value
The test carries const std::string_view& around, but the type is
lightweight class that can be copied around at the same cost as its
reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17735
2024-03-12 13:44:04 +02:00
Kefu Chai
007d7f1355 utils: add fmt::formatter for std::strong_ordering and friends
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* std::strong_ordering
* std::weak_ordering
* std::partial_ordering

and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Kefu Chai
1ab30fc306 clustering_bounds_comparator: add fmt::formtter for bound_{kind,view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17706
2024-03-11 11:37:48 +02:00
Kefu Chai
38ae52d5cd add fmt::formatter for reader_permit::state and reader_resources
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* reader_permit::state
* reader_resources

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17707
2024-03-11 09:55:51 +02:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Pavel Emelyanov
52a1b2c413 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer

Refs #13245

Closes scylladb/scylladb#17521

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for position_range
  mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
  mutation: add fmt::formatter for mutation_fragment_v2::printer
2024-03-07 08:56:21 +03:00
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Kefu Chai
fc774361e8 cql3: add fmt::formatter for raw_value{,_view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raw_value
* raw_value_view

`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.

`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Marcin Maliszkiewicz
1badd09d45 test: adjust tests for auth-v2 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
7f204a6e80 auth: add raft_group0_client as dependency
Most auth classes need this to be able to announce
raft commands.

Usage added in subsequent commit.
2024-03-01 16:25:14 +01:00
Botond Dénes
8213e66815 replica/database: use include page-size in max-result-size
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
  which now has a much simpler common return path and has to drop the
  page size on one rare return path.

This is a purely mechanical change, no behaviour is changed.
2024-02-27 02:27:55 -05:00
Kefu Chai
1fe7a467e7 mutation: add fmt::formatter for mutation_fragment_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_fragment_v2::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 17:47:05 +08:00
Kefu Chai
a70318e722 test: add printer for type for BOOST_REQUIRE_EQUAL
after dropping the operator<< for vector, we would not able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
less defined the printer for Boost.test

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
63396f780d test: add fmt::formatters
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>. and the formatter uses
operator<< for printing the elements in map.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Avi Kivity
67f8dc5a7c Merge 'mutation: add fmt::formatter for clustering_row, row_tombstone and friends' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17461

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for clustering_row and friends
  mutation: add fmt::formatter for row_tombstone and friends
2024-02-22 16:16:26 +02:00
Kefu Chai
37c6073fd5 mutation: add fmt::formatter for clustering_row and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 17:53:34 +08:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and was supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00
Avi Kivity
e338f0e009 interval: rename interval_test to wrapping_interval_test
As preparation for reclaiming the name `interval` for nonwrapping_interval,
rename interval_test to wrapping_interval_test.
2024-02-21 19:38:53 +02:00
Botond Dénes
7bdd0c2cae locator: introduce tablet_range_spliter
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets who have replicas on the local node.
2024-02-21 02:08:48 -05:00
Botond Dénes
239484f259 interval: add before() overload which takes another interval
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
2024-02-21 02:08:48 -05:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Petr Gusev
4ef5d92f50 gossiping_property_file_snitch_test: modernize + fix potential race
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.

We add the check for specific type of exceptions that
can be thrown (bad_property_file_error).

We also fix the potential race - the test may write
to res from multiple cores with no locks.

Closes scylladb/scylladb#17371
2024-02-18 19:21:53 +02:00
Botond Dénes
120442231f Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.

Closes scylladb/scylladb#17208

* github.com:scylladb/scylladb:
  row_cache_test: test cache consistency during memtable-to-cache merge
  row_cache: use preemption_source in update()
  utils: preempt: add preemption_source
2024-02-13 17:37:06 +02:00
Botond Dénes
3f2d7e8b25 tree: remove unnecessary yields around for_each_tablet()
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads, to the one which has a future<>
returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary, the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.

This patch is a follow-up to #17118

Closes scylladb/scylladb#17284
2024-02-12 17:10:25 +01:00
Michał Chojnowski
f5e3a728e4 row_cache_test: test cache consistency during memtable-to-cache merge
A rather minimal reproducer for #16759. Not extensive.
2024-02-07 18:31:36 +01:00
Michał Chojnowski
bed20a2e37 row_cache: use preemption_source in update()
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.

In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode they should
add an extra branch. (However, the `preemption_source&` argument has
to be shoveled in any mode).
2024-02-07 18:31:36 +01:00
Botond Dénes
35da9551fb Merge 'storage_service: Add describe_ring support for tablet table' from Asias He
The table query param is added to get the describe_ring result for a
given table.

Both vnode table and tablet table can use this table param, so it is
easier for users to user.

If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509

Closes scylladb/scylladb#17118

* github.com:scylladb/scylladb:
  tablets: Convert to use the new version of for_each_tablet
  storage_service: Add describe_ring support for tablet table
  storage_service: Mark host2ip as const
  tablets: Add for_each_tablet_gently
2024-02-07 10:41:36 +02:00
Raphael S. Carvalho
41a5c9eaec test: Reduce mem footprint of test_token_group_based_splitting_mutation_writer
Reduces footprint from hundreds of MB to a very few MB.

Issue could be reproduced with:
./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131

Fixes #17076.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17187
2024-02-07 09:21:24 +02:00
Tomasz Grabiec
032c1a3d04 Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov
When creating a keyspace, scylla allows setting RF value smaller than there are nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted thus catching up with RF. With tablets, it's not the case as replica set remains unchanged.

With tablets it's good chance not to mimic the vnodes behavior and require as many nodes to be up and running as the requested RF is. This patch implementes this in a lazy manned -- when creating a keyspace RF can be any, but when a new table is created the topology should meet RF requirements. If not met, user can bootstrap new nodes or ALTER KEYSPACE.

closes: #16529

Closes scylladb/scylladb#17079

* github.com:scylladb/scylladb:
  tablets: Make sure topology has enough endpoints for RF
  cql-pytest: Disable tablets when RF > nodes-in-DC
  test: Remove test that configures RF larger than the number of nodes
  keyspace_metadata: Include tablets property in DESCRIBE
2024-02-06 22:38:11 +01:00
Kefu Chai
97587a2ea4 test/boost: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17139
2024-02-06 13:22:16 +02:00
Botond Dénes
ce3233112e Merge 'configure.py: add -Wextra to cflags' from Kefu Chai
also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are geniune issues. but before that, we just disable them.

this goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warning enabeld by `-Wextra`, for instance, `-Wsign-compare` is enable now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.

Closes scylladb/scylladb#17131

* github.com:scylladb/scylladb:
  configure.py: add -Wextra to cflags
  test/tablets: do not compare signed and unsigned
2024-02-06 12:57:32 +02:00
Asias He
904bafd069 tablets: Convert to use the new version of for_each_tablet
It is more gently than the old one.
2024-02-05 18:45:40 +08:00
Pavel Emelyanov
3b9ca29411 test: Remove test that configures RF larger than the number of nodes
This is going to be disabled soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:03 +03:00
Avi Kivity
784c2f8ad2 Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai
get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.

Closes scylladb/scylladb#17130

* github.com:scylladb/scylladb:
  treewide: replace seastar::future::get0() with seastar::future::get()
  sstable: capture return value of get0() using auto
  utils: result_loop: define result_type with decayed type

[avi: add another one that snuck in while this was cooking]
2024-02-04 15:23:33 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
aea6cd0b2d test/tablets: do not compare signed and unsigned
this change should silence following warning:

```
 test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
19:47:04          for (int i = 0; i < smp::count * 20; i++) {
19:47:04                          ~ ^ ~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 20:49:21 +08:00
Asias He
2888c3086c utils: Add uuid_xor_to_uint32 helper
Convert the uuid to a uint32_t using xor.
It is useful to get a uint32_t number from the uuid.

Refs: #16927

Closes scylladb/scylladb#17049
2024-02-01 10:27:55 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
2024-01-31 15:08:14 +03:00
Avi Kivity
c8397f0287 Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. Too large tablet makes migration inefficient, therefore slowing down the balancer.

If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint.

Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays foundation for it to be impĺemented later on.

A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split.

Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be either of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).

A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ).
When a table is done splitting its storage, it loads (mirror) the resize_seq_number from tablet metadata into its local state (in another words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. Helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emited later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready.

When load balancer emits split decision, replicas will listen to need to split with a "split monitor" that is awakened once a table has replication metadata updated and detects the need for split (i.e. resize_type field is "split").
The split monitor will start splitting of compaction groups (using mechanism introduced here: 081f30d149) for the table. And once splitting work is completed, the table updates its local state as having completed split.

When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step).

Fixes #16536.

Closes scylladb/scylladb#16580

* github.com:scylladb/scylladb:
  test/topology_experimental_raft: Add tablet split test
  replica: Bypass reshape on boot with tablets temporarily
  replica: Fix table::compaction_group_for_sstable() for tablet streaming
  test/topology_experimental_raft: Disable load balancer in test fencing
  replica: Remap compaction groups when tablet split is finalized
  service: Split tablet map when split request is finalized
  replica: Update table split status if completed split compaction work
  storage_service: Implement split monitor
  topology_cordinator: Generate updates for resize decisions made by balancer
  load_balancer: Introduce metrics for resize decisions
  db: Make target tablet size a live-updateable config option
  load_balancer: Implement resize decisions
  service: Wire table_resize_plan into migration_plan
  service: Introduce table_resize_plan
  tablet_mutation_builder: Add set_resize_decision()
  topology_coordinator: Wire load stats into load balancer
  storage_service: Allow tablet split and migration to happen concurrently
  topology_coordinator: Periodically retrieve table_load_stats
  locator: Introduce topology::get_datacenter_nodes()
  storage_service: Implement table_load_stats RPC
  replica: Expose table_load_stats in table
  replica: Introduce storage_group::live_disk_space_used()
  locator: Introduce table_load_stats
  tablets: Add resize decision metadata to tablet metadata
  locator: Introduce resize_decision
2024-01-31 13:59:56 +02:00
Pavel Emelyanov
370fbd346c Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel
`db::config` is a class, that is used in many places across the code base. When it is changed, its clients' code need to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact.

This PR:
 - extends the public interface of utils::directories class to provide required directory paths to the users
 - removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
 - replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs

Fixes: scylladb#5626

Closes scylladb/scylladb#16787

* github.com:scylladb/scylladb:
  utils/directories: make utils::directories::set an internal type
  db::config: keep dir paths unchanged
  cql_transport/controler: use utils::directories to get paths of dirs
  service/storage_proxy: use utils::directories to get paths of dirs
  api/storage_service.cc: use utils::directories to get paths of dirs
  tools/scylla-sstable.cc: use utils::directories to get paths
  db/commitlog: do not use db::config to get dirs
  Use utils::directories to get dirs paths in replica::database
  Allow utils::directories to provide paths to dirs
  Clean-up of utils::directories
2024-01-29 18:01:15 +03:00
Botond Dénes
d202d32f81 Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and a HTTP endpoint that uses it.

In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)

Closes scylladb/scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call
2024-01-29 15:06:04 +02:00
Patryk Wrobel
1cd676e438 Allow utils::directories to provide paths to dirs
This change extends utils::directories class in
the following way:
 - adds new member variables that correspond to
   fields from db::config that describe paths
   of directories
 - introduces a public interface to retrieve the
   values of the new members
 - allows construction of utils::directories
   object based on db::config to setup internal
   member variables related to paths to dirs

The new members of utils::directories are overriden
when the provided values are empty. The way of setting
paths is taken from db::config.

To ensure that the new logic works correctly
`utils_directories_test` has been created.

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Kefu Chai
8f38bd5376 commitlog: add formatter for db::replay_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::replay_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17014
2024-01-29 09:59:30 +02:00
Raphael S. Carvalho
7ed5b44d52 load_balancer: Implement resize decisions
This implements the ability in load balancer to emit split or merge
requests, cancel ongoing ones if they're no longer needed, and
also finalize those that are ready for the topology changes.

That's all based on average tablet size, collected by coordinator
from all nodes, and split and merge thresholds.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
ed2138a35a tablet_mutation_builder: Add set_resize_decision()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00