Commit Graph

3679 Commits

Author SHA1 Message Date
Calle Wilund
82d97da3e0 commitlog: Remove (benign) use-after-move
Fixes #18329

named_file::assign call uses old object "known_size" after a move
of the object. While this is wholly ok, since the attribute accessed
will not be modified/destroyed by the move, it causes warnings in
"tidy" runs, and might confuse or cause real errors should impl. change.

Closes scylladb/scylladb#18337
2024-04-22 17:20:19 +03:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but operator<< based formatter does not
  provide these method, we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add "#include "utils/to_string.hh" where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Kamil Braun
9c2a836607 Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 298a7fcbf2, reversing
changes made to 5cf53e670d.

The change made CI flaky.

Fixes: scylladb/scylladb#18278
2024-04-18 11:50:41 +02:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`.  If there are more DOWN nodes that are not
explicitly ignored replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Botond Dénes
298a7fcbf2 Merge 'Drain view_builder in generic drain' from ScyllaDB
For view builder draining there's dedicated deferred action in main while all other services that need to be drained do it via storage_service. The latter is to unify shutdown for services and to make `nodetool drain` drain everything, not just some part of those. This PR makes view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.

refs: #2737
refs: #2795

Closes scylladb/scylladb#16558

* github.com:scylladb/scylladb:
  storage_service: Drain view builder on drain too
  view_builder: Generalize mark_as_built(view_ptr) method
  view_builder: Move mark_existing_views_as_built from storage service
  storage_service: Add view_builder& reference
  main,cql_test_env: Move view_builder start up (and make unconditional)
2024-04-16 07:21:42 +03:00
Pavel Emelyanov
f17c594d21 large_data_handler: If-less statistics increment
The partitions_bigger_than_threshold is incremented only if the previous
check detects that the partition exceeds a threshold by its size. It's
done with an extra if, but it can be done without (explicit) condition
as bool type is guaranteed by the standard to convert into integers as
true = 1 and false = 0

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18217
2024-04-16 07:16:05 +03:00
Benny Halevy
1061455442 gossiper: add load_endpoint_state
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:06:56 +03:00
Pavel Emelyanov
90593f4e82 view_builder: Generalize mark_as_built(view_ptr) method
Marking is performed in two places and they can be generalized

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
3c3f2cd337 view_builder: Move mark_existing_views_as_built from storage service
Now it's in the correct component

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:11 +03:00
Ferenc Szili
f1cc6252fd logging: Don't log PK/CK in large partition/row/cell warning
Currently, Scylla logs a warning when it writes a cell, row or partition which are larger than certain configured sizes. These warnings contain the partition key and in case of rows and cells also the cluster key which allow the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into tables system.large_partitions, system.large_rows and system.large_cells respectivelly.

This change removes the partition and cluster keys from the log messages, but still inserts them into the system tables.

The logged data will look like this:

Large cells:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db

Large rows:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db

Large partitions:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db

Fixes #18041

Closes scylladb/scylladb#18166
2024-04-04 12:06:31 +03:00
Piotr Dulikowski
baae811142 Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.

Fixes https://github.com/scylladb/scylladb/issues/17736

Closes scylladb/scylladb#18039

* github.com:scylladb/scylladb:
  auth: keep auth version in scylla_local
  auth: coroutinize service::start
2024-04-03 12:25:56 +02:00
Marcin Maliszkiewicz
562caaf6c6 auth: keep auth version in scylla_local
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.
2024-04-02 19:04:21 +02:00
Botond Dénes
2179bfc40d Merge 'Relax initialization of virtual tables' from Pavel Emelyanov
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over virtual tables map several times to do its work. This PR squashes it into a straightforward code which is shorter and, hopefully, easier to read.

Closes scylladb/scylladb#18133

* github.com:scylladb/scylladb:
  virtual_tables: Open-code install_virtual_readers_and_writers()
  virtual_tables: Move readers setup loop into add_table()
  virtual_tables: Move tables creation loop into add_table()
  virtual_tables: Make add_tablet() a coroutine
  virtual_tables: Open-code register_virtual_tables()
2024-04-02 13:39:26 +03:00
Lakshmi Narayanan Sreethar
e8026197d2 db/config: add a new variable to limit memory used by table components
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Pavel Emelyanov
627c5fdf04 virtual_tables: Open-code install_virtual_readers_and_writers()
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). Neither it installs writers any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:02:40 +03:00
Pavel Emelyanov
1d79cfc6cf virtual_tables: Move readers setup loop into add_table()
Similarly to previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at the time of registering, no need in dedicated
loop for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:01:50 +03:00
Pavel Emelyanov
891e792717 virtual_tables: Move tables creation loop into add_table()
Once virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created, no need in dedicated loop
for it nowadays.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:38 +03:00
Pavel Emelyanov
420ce3634f virtual_tables: Make add_tablet() a coroutine
Next patches will populate it with sleeping calls, this patch prepares
for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:15 +03:00
Pavel Emelyanov
ddc6f9279f virtual_tables: Open-code register_virtual_tables()
It's naturally a "part" of initialize_virtual_tables(). Further patching
gets possible with it being open-coded.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 18:59:18 +03:00
Benny Halevy
01fc1a9f66 schema_tables: std::move mutation into the mutation vector
To save a copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18120
2024-04-01 14:16:30 +03:00
Kamil Braun
d274f63d89 Merge 'Add support for "initial-token" parameter in raft mode' from Gleb
Fixes scylladb/scylladb#17893

* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
  dht: drop unused parameter from get_random_bootstrap_tokens() function
  test: add test for initial_token parameter
  topology coordinator: use provided initial_token parameter to choose bootstrap tokens
  topology cooordinator: propagate initial_token option to the coordinator
2024-03-27 11:41:06 +01:00
Gleb Natapov
6ab78e13c6 topology cooordinator: propagate initial_token option to the coordinator
The patch propagates initial_token option to the topology coordinator
where it is added to join request parameter.
2024-03-26 18:43:16 +02:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Pavel Emelyanov
8bf9098663 system_keyspace: Consolidate node-state vs tokens checks
When loading topology state, nodes are checked to have or not to have
"tokens" field set. The check is done based on node state and it's
spread across the loading method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17957
2024-03-26 14:55:46 +02:00
Kefu Chai
f3532cbaa0 db: commitlog: use fmt::streamed() to print segment
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change:

* add `format_as()` for `segment` so we can use it as a fallback
  after upgrading to {fmt} v10
* use fmt::streamed() when formatting `segment`, this will be used
  the intermediate solution before {fmt} v10 after dropping
  `FMT_DEPRECATED_OSTREAM` macro

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18019
2024-03-26 12:13:29 +02:00
Wojciech Mitros
9789a3dc7c mv: keep semaphore units alive until the end of a remote view update
When a view update has both a local and remote target endpoint,
it extends the lifetime of its memory tracking semaphore units
only until the end of the local update, while the resources are
actually used until the remote update finishes.
This patch changes the semaphore transferring so that in case
of both local and remote endpoints, both view updates share the
units, causing them to be released only after the update that
takes longer finishes.

Fixes #17890

Closes scylladb/scylladb#17891
2024-03-25 19:43:58 +02:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Kamil Braun
4359a1b460 Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
add exposes its interface with additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in `the
raft_server_with_timeouts` class constructor.

The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.

The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.

Fixes scylladb/scylladb#16604

Closes scylladb/scylladb#17590

* github.com:scylladb/scylladb:
  migration_manager: use raft_timeout{}
  storage_service::join_node_response_handler: use raft_timeout{}
  storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
  storage_service::set_tablet_balancing_enabled: use raft_timeout{}
  storage_service::move_tablet: use raft_timeout{}
  raft_check_and_repair_cdc_streams: use raft_timeout{}
  raft_timeout: test that node operations fail properly
  raft_rebuild: use raft_timeout{}
  do_cluster_cleanup: use raft_timeout{}
  raft_initialize_discovery_leader: use raft_timeout{}
  update_topology_with_local_metadata: use with_timeout{}
  raft_decommission: use raft_timeout{}
  raft_removenode: use raft_timeout{}
  join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
  raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
  raft_group0: make_raft_config_nonvoter: add abort_source parameter
  manager_client: server_add with start=false shouldn't call driver_connect
  scylla_cluster: add seeds parameter to the add_server and servers_add
  raft_server_with_timeouts: report the lost quorum
  join_node_request_handler: add raft_timeout{} for start_operation
  skip_mode: add platform_key
  auth: use raft_timeout{}
  raft_group0_client: add raft_timeout parameter
  raft_group_registry: add group0_with_timeouts
  utils: add composite_abort_source.hh
  error_injection: move api registration to set_server_init
  error_injection: add inject_parameter method
  error_injection: move injection_name string into injection_shared_data
  error_injection: pass injection parameters at startup
2024-03-22 10:45:33 +01:00
Michał Jadwiszczak
dab909b1d1 service:qos: store information about sl data migration
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
8e242f5acd db:system_keyspace: add SERVICE_LEVELS_V2 table
The table has the same schema as `system_distributed.service_levels`.
However it's created entirely at once (unlike old table which creates
base table first and then it adds other columns) because `system` tables
are local to the node.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
990c5e7dd0 service:qos: extract common service levels' table functions
Getting a service level(s) will be done the same way in raft-based
service levels as it's done in standard service levels, so those
funtions are extracted to reused it.
2024-03-21 23:14:57 +01:00
Pavel Emelyanov
21a5911e60 Merge 'db/virtual_tables: make token_ring_table tablet aware' from Botond Dénes
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).

Fixes: #16850

Closes scylladb/scylladb#17351

* github.com:scylladb/scylladb:
  test/cql-pytest: test_virtual_tables: add test for token_ring table
  db/virtual_tables: token_ring_table: add tablet support
  db/virtual_tables: token_ring_table: add table_name column
  db/virtual_tables: token_ring_table: extract ring emit
  service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
2024-03-20 14:05:49 +03:00
Petr Gusev
49a4220fea error_injection: pass injection parameters at startup
Injection parameters can be used in the lambda passed to
inject_with_handler method to take some values from
the test. However, there was no way to set values to these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup, it could trigger before we call the api
from the test.

In this commit with solve this problem by allowing these
parameters to be assigned through scylla.yaml config.

The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
defer function is used in error_injection.hh.
2024-03-19 20:17:02 +04:00
Avi Kivity
e48eb76f61 sstables_manager: decouple from system_keyspace
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).

One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.

Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.

Closes scylladb/scylladb#17868
2024-03-18 20:38:07 +03:00
Avi Kivity
72bbe75d5b Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes https://github.com/scylladb/scylladb/issues/17025

Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacanter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.

Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.

Closes scylladb/scylladb#17388

* github.com:scylladb/scylladb:
  test: py: Add test for view replica pairing after replace
  raft, api: Add RESTful API to query current leader of a raft group
  test: test_tablets_removenode: Verify replacing when there is no spare node
  doc: topology-on-raft: Document replace behavior with tablets
  tablets, raft topology: Rebuild tablets after replacing node is normal
  tablets: load_balancer: Access node attributes via node struct
  tablets: load_balancer: Extract ensure_node()
  mv: Switch to using host_id-based replica set
  effective_replication_map: Introduce host_id-based get_replicas()
  raft topology: Keep nodes in the left state to topology
  tablets: Introduce read_required_hosts()
2024-03-18 16:16:08 +02:00
Wojciech Mitros
efcb718e0a mv: adjust memory tracking of single view updates within a batch
Currently, when dividing memory tracked for a batch of updates
we do not take into account the overhead that we have for processing
every update. This patch adds the overhead for single updates
and joins the memory calculation path for batches and their parts
so that both use the same overhead.

Fixes #17854

Closes scylladb/scylladb#17855
2024-03-18 14:31:54 +02:00
Avi Kivity
731b5c5120 schema_tables: unfreeze frozen_mutation:s gently
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.
2024-03-17 17:46:02 +02:00
Kefu Chai
8bab51733f db: add fmt::formatter for db::functions::function
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::functions::function`.
please note, because we use `std::ostream` as the parameter of
the polymorphism implementation of `function::print()`.
without an intrusive change, we have to use `fmt::ostream_formatter`
or at least use similar technique to format the `function` instance
into an instance of `ostream` first. so instead of implementing
a "native" `fmt::formatter`, in this change, we just use
`fmt::ostream_formatter`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17832
2024-03-16 17:36:49 +02:00
Tomasz Grabiec
9b656ec2aa mv: Switch to using host_id-based replica set
This is necessary to not break replica pairing between base and
view. After replacing a node, tablet replica set contains for a while
the replaced node which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.

The host_id-based replica set always has stable indexes for replicas.
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
61b3453552 raft topology: Keep nodes in the left state to topology
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on
 their location. For example materialized views code which pairs base
 replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order
 to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacanter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.

We load topology infromation only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
2024-03-15 11:05:29 +01:00
Botond Dénes
279e496133 db/virtual_tables: token_ring_table: add tablet support
For keyspaces which use tablets, we describe each table separately.
2024-03-15 04:23:20 -04:00
Botond Dénes
61b6ac7ffe db/virtual_tables: token_ring_table: add table_name column
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
2024-03-15 04:23:20 -04:00
Botond Dénes
fdef62c232 db/virtual_tables: token_ring_table: extract ring emit
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accomodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever appropriate.
2024-03-15 04:23:20 -04:00
Tomasz Grabiec
8c5d088928 Merge 'Drop tablets of dropped views and indices' from Benny Halevy
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.

Additional unit tests were added for these cases.

Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.

This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.

Fixes #17627

Closes scylladb/scylladb#17773

* github.com:scylladb/scylladb:
  migration_manager: notify before_drop_column_family when dropping indices
  schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
  migration_manager: notify before_drop_column_family before dropping views
  cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
  tablet_allocator: on_before_drop_column_family: remove unused result variable
2024-03-14 22:52:29 +01:00
Benny Halevy
5bfca73b30 migration_manager: notify before_drop_column_family when dropping indices
Fixes #17627

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:12 +02:00
Benny Halevy
9cf6a2e510 schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.

Instead, use `find_schema` to lookup the current schema
for the dropped index.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:11 +02:00
Piotr Smaron
ad2d039e3d db: move all group 0 tables to schema commitlog
This is to have durability for the group0 tables.
But also because I need it specifially to make
`system.topology` & `system_schema.scylla_keyspaces`
mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723

Fixes: #15596

Closes scylladb/scylladb#17783
2024-03-14 13:33:30 +01:00
Kefu Chai
926fe29ebd db: commitlog: add fmt::formatter for commitlog types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>

please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separated commit.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17792
2024-03-14 09:28:12 +02:00