There are few of them that don't need the storage service for anything
but get token metadata from. Move them to own .cc/.hh units.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_proxy/get_schema_version is served by storage_service, so it
should be in storage_service.cc instead
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some tests may want to modify system.topology table directly. Add a REST
API to reload the state into memory. An alternative would be restarting
the server, but that's slower and may have other side effects undesired
in the test.
The API can also be called outside tests, it should not have any
observable effects unless the user modifies `system.topology` table
directly (which they should never do, outside perhaps some disaster
recovery scenarios).
Currently the storage-service API handlers are set up in "random" place.
It can happen earlier -- as soon as the storage service itself is ready.
Also, despite storage service is stopped on shutdown, API handlers
continue reference it leading to potential use-after-frees or "local is
not initialized" assertions.
Fix both. Unsetting is pretty bulky, scylladb/seastar#1620 is to help.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some handlers in set_storage_service() have implicit dependency on
gossiper. It's not API that should track it, but storage service itself,
so get the gossiper from service, not from the external argument (it
will be removed soon)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The API handlers that live in set_storage_service() should be
self-contained and operate on storage-service only. Said that, they
should get the token metadata, when needed, from storage service, not
from somewhere else.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.
Some tests want specific host_id, we add it to cql_test_config
and use in cql_test_env.
We can't pass host_id to sstables_manager by value since it's
initialized in database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
Currently, the API call recalculates only per-node schema version. To
workaround issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and has impact on availability.
Use like this:
curl -X POST http://127.0.0.1:10000/storage_service/relocal_schemaFixes#15380Closes#15381
There's a storage_service/cdc_streams_check_and_repair endpoint that
needs to provide cdc gen. service to call storage_service method on. Now
the latter has its own reference to the former and API can stop taking
care of that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Three places handle comma-separated lists similarly:
- ss::remove_node.set(...) in api::set_storage_service,
- storage_service::parse_node_list,
- storage_service::is_repair_based_node_ops_enabled.
In the next commit, the fourth place that needs the same logic
appears -- storage_service::raft_replace. It needs to load
and parse the --ignore-dead-nodes-for-replace param from config.
Moreover, the code in is_repair_based_node_ops_enabled is
different and doesn't seem right. We swap '\"' and '\'' with ' '
but don't do anything with it afterward.
To avoid code duplication and fix is_repair_based_node_ops_enabled,
we introduce the new function utils::split_comma_separated_list.
This change has a small side effect on logging. For example,
ignore_nodes_strs in storage_service::parse_node_list might be
printed in a slightly different form.
This patch adds the ranges_parallelism option to repair restful API.
Users can use this option to optionally specify the number of ranges to repair in parallel per repair job to a smaller number than the Scylla core calculated default max_repair_ranges_in_parallel.
Scylla manager can also use this option to provide more ranges (>N) in a single repair job but only repairing N ranges_parallelism in parallel, instead of providing N ranges in a repair job.
To make it safer, unlike the PR #4848, this patch does not allow user to exceed the max_repair_ranges_in_parallel.
Fixes#4847Closes#14886
* github.com:scylladb/scylladb:
repair: Add ranges_parallelism option
repair: Change to use coroutine in do_repair_ranges
This patch adds the ranges_parallelism option to repair restful API.
Users can use this option to optionally specify the number of ranges
to repair in parallel per repair job to a smaller number than the Scylla
core calculated default max_repair_ranges_in_parallel.
Scylla manager can also use this option to provide more ranges (>N) in
a single repair job but only repairing N ranges_parallelism in parallel,
instead of providing N ranges in a repair job.
To make it safer, unlike the PR #4848, this patch does not allow user to
exceed the max_repair_ranges_in_parallel.
Fixes#4847
As a preparation for ensuring access safety for column families
related maps, add tables_metadata, access to members of which
would be protected by rwlock.
Task manager compaction tasks need table names for logs.
Thus, compaction tasks store table infos instead of table ids.
get_table_ids function is deleted as it isn't used anywhere.
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `api::table_info` without the help of `operator<<`.
but the corresponding `operator<<()` is preserved in this change, as we
still have lots of callers relying on this << operator instorage_service.cc
where std::vector<table_info> is formatted using operator<<(ostream&, const Range&)
defined in to_string.hh. we could have used fmt/ranges.h to print the
std::vector<table_info>. but the combination of operator<<(ostream&, const Range&)
and FMT_DEPRECATED_OSTREAM renders this impossible. because
unlike the builtin range formatter specializations, the fallback formatter
synthesized from the operator<< does not have brackets defined for
the range printer. the brackets are used as the left and right marks
of the range, for instance, the array-alike containers are printed
like [1,2,3], while the tuple-alike containers are printed like
(1,2,3). once we are allowed to remove FMT_DEPRECATED_OSTREAM, we
should be able to use the builtin range formatter, and remove the
operator<< for api::table_info by then.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13975
`check_and_repair_cdc_streams` is an existing API which you can use when the
current CDC generation is suboptimal, e.g. after you decommissioned a node the
current generation has more stream IDs than you need. In that case you can do
`nodetool checkAndRepairCdcStreams` to create a new generation with fewer
streams.
It also works when you change number of shards on some node. We don't
automatically introduce a new generation in that case but you can use
`checkAndRepairCdcStreams` to create a new generation with restored
shard-colocation.
This PR implements the API on top of raft topology, it was originally
implemented using gossiper. It uses the `commit_cdc_generation` topology
transition state and a new `publish_cdc_generation` state to create new CDC
generations in a cluster without any nodes changing their `node_state`s in the
process.
Closes#13683
* github.com:scylladb/scylladb:
docs: update topology-over-raft.md
test: topology_experimental_raft: test `check_and_repair_cdc` API
raft topology: implement `check_and_repair_cdc_streams` API
raft topology: implement global request handling
raft topology: introduce `prepare_new_cdc_generation_data`
raft_topology: `get_node_to_work_on_opt`: return guard if no node found
raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition
raft topology: separate `publish_cdc_generation` state
raft topology: non-node-specific `exec_global_command`
raft topology: introduce `start_operation()`
raft topology: non-node-specific `topology_mutation_builder`
topology_state_machine: introduce `global_topology_request`
topology_state_machine: use `uint16_t` for `enum_class`es
raft topology: make `new_cdc_generation_data_uuid` topology-global
in this change, the type of the "generation" field of "sstable" in the
return value of RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".
this change depends on the corresponding change on tools/jmx submodule,
so we have to include the submodule change in this very commit.
this API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" mBean
operation, which returns the retrieved SSTable info as a list of
CompositeData. and "generation" is a field of an element in the
CompositeData. in general, the scylla JMX exporter is consumed
by the nodetool, which prints out returned SSTable info list with
a pretty formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
the nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, neither does it enforce the type -- it just
tries it best to pretty print them as a tabular.
But the fields in CompositeData is typed, when the scylla JMX exporter
translates the returned SSTables from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So, we should be consistent on the type of "generation" field on both
the JMX and the RESTful API sides. because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
the dependencies on exactly same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13834
Task manager's tasks covering scrub compaction on top,
shard and table level.
For this levels we have common scrub tasks for each scrub
mode since they share code. Scrub modes will be differentiated
on compaction group level.
Closes#13694
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test scrub compaction
compaction: add table_scrub_sstables_compaction_task_impl
compaction: add shard_scrub_sstables_compaction_task_impl
compaction: add scrub_sstables_compaction_task_impl
api: get rid of unnecessary std::optional in scrub
compaction: rename rewrite_sstables_compaction_task_impl
The original API is gossiper-based. Since we're moving CDC generations
handling to Raft-based topology, we need to implement this API as well.
For now the API creates a new generation unconditionally, in a follow-up
I'll introduce a check to skip the creation if the current generation is
optimal.
* replace generation_type::value() with generation_type::as_int()
* drop generation_value()
because we will switch over to UUID based generation identifier, the member
function or the free function generation_value() cannot fulfill the needs
anymore. so, in this change, they are consolidated and are replaced by
"as_int()", whose name is more specific, and will also work and won't be
misleading even after switching to UUID based generation identifier. as
`value()` would be confusing by then: it could be an integer or a UUID.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.
Things achieved in this PR:
- You can start a cluster and create a keyspace whose tables will use
tablet-based replication. This is done by setting `initial_tablets`
option:
```
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
'replication_factor': 3,
'initial_tablets': 8};
```
All tables created in such a keyspace will be tablet-based.
Tablet-based replication is a trait, not a separate replication
strategy. Tablets don't change the spirit of replication strategy, it
just alters the way in which data ownership is managed. In theory, we
could use it for other strategies as well like
EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
is augmented to support tablets.
- You can create and drop tablet-based tables (no DDL language changes)
- DML / DQL work with tablet-based tables
Replicas for tablet-based tables are chosen from tablet metadata
instead of token metadata
Things which are not yet implemented:
- handling of views, indexes, CDC created on tablet-based tables
- sharding is done using the old method, it ignores the shard allocated in tablet metadata
- node operations (topology changes, repair, rebuild) are not handling tablet-based tables
- not integrated with compaction groups
- tablet allocator piggy-backs on tokens to choose replicas.
Eventually we want to allocate based on current load, not statically
Closes#13387
* github.com:scylladb/scylladb:
test: topology: Introduce test_tablets.py
raft: Introduce 'raft_server_force_snapshot' error injection
locator: network_topology_strategy: Support tablet replication
service: Introduce tablet_allocator
locator: Introduce tablet_aware_replication_strategy
locator: Extract maybe_remove_node_being_replaced()
dht: token_metadata: Introduce get_my_id()
migration_manager: Send tablet metadata as part of schema pull
storage_service: Load tablet metadata when reloading topology state
storage_service: Load tablet metadata on boot and from group0 changes
db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
migration_notifier: Introduce before_drop_keyspace()
migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
test: perf: Introduce perf-tablets
test: Introduce tablets_test
test: lib: Do not override table id in create_table()
utils, tablets: Introduce external_memory_usage()
db: tablets: Add printers
db: tablets: Add persistence layer
dht: Use last_token_of_compaction_group() in split_token_range_msb()
locator: Introduce tablet_metadata
dht: Introduce first_token()
dht: Introduce next_token()
storage_proxy: Improve trace-level logging
locator: token_metadata: Fix confusing comment on ring_range()
dht, storage_proxy: Abstract token space splitting
Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
db: Introduce get_non_local_vnode_based_strategy_keyspaces()
service: storage_proxy: Avoid copying keyspace name in write handler
locator: Introduce per-table replication strategy
treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
locator: Introduce effective_replication_map
locator: Rename effective_replication_map to vnode_effective_replication_map
locator: effective_replication_map: Abstract get_pending_endpoints()
db: Propagate feature_service to abstract_replication_strategy::validate_options()
db: config: Introduce experimental "TABLETS" feature
db: Log replication strategy for debugging purposes
db: Log full exception on error in do_parse_schema_tables()
db: keyspace: Remove non-const replication strategy getter
config: Reformat
It's meant to be used in places where currently
get_non_local_strategy_keyspaces() is used, but work only with
keyspaces which use vnode-based replication strategy.
this change replaces all occurrences of `boost::lexical_cast<std::string>`
in the source tree with `fmt::to_string()`. for couple reasons:
* `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`,
so the latter is easier to parse and read.
* `boost::lexical_cast<std::string>` creates a stringstream under the
hood, so it can use the `operator<<` to stringify the given object.
but stringstream is known to be less performant than fmtlib.
* we are migrating to fmtlib based formatting, see #13245. so
using `fmt::to_string()` helps us to remove yet another dependency
on `operator<<`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13611
Similarly to previous patch, but from another routes group. The storage
service API calls mainly use storage service, but one place needs proxy
to call recalculate_schema_version() with
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `bytes` and `gms::inet_address` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const bytes &)` and `operator<<(ostream, const gms::inet_address&)`.
`gms::inet_address` related changes are included here in hope to demonstrate the usage of delimiter specifier of `fmt_hex` 's formatter.
Refs #13245Closes#13275
* github.com:scylladb/scylladb:
gms/inet_address: implement operator<< using fmt::formatter
treewide: use fmtlib to format gms::inet_address
gms/inet_address: specialize fmt::formatter<gms::inet_address>
bytes: implement formatting helpers using formatter
bytes: specialize fmt::formatter<bytes>
bytes: specialize fmt::formatter<fmt_hex>
bytes: mark fmt_hex::v `const`
the goal of this change is to reduce the dependency on
`operator<<(ostream&, const gms::inet_address&)`.
this is not an exhaustive search-and-replace change, as in some
caller sites we have other dependencies to yet-converted ostream
printer, we cannot fix them all, this change only updates some
caller of `operator<<(ostream&, const gms::inet_address&)`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Task manager task implementations of classes that cover
cleanup keyspace compaction which can be started through
/storage_service/keyspace_compaction/ api.
Top level task covers the whole compaction and creates child
tasks on each shard.
Closes#12712
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test cleanup compaction
compaction: create task manager's task for cleanup keyspace compaction on one shard
compaction: create task manager's task for cleanup keyspace compaction
api: add get_table_ids to get table ids from table infos
compaction: create cleanup_compaction_task_impl
this change tries to reduce the number of callers using operator<<()
for printing UUID. they are found by compiling the tree after commenting
out `operator<<(std::ostream& out, const UUID& uuid)`. but this change
alone is not enough to drop all callers, as some callers are using
`operator<<(ostream&, const unordered_map&)` and other overloads to
print ranges whose elements contain UUID. so in order to limit the
scope of the change, we are not changing them here.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
it turns out we have `using namespace httpd;` in seastar's
`request_parser.rl`, and we should not rely on this statement to
expose the symbols in `seatar::httpd` to `seastar` namespace.
in this change,
* api/*.hh: all httpd symbols are referenced by `httpd::*`
instead of being referenced as if they are in `seastar`.
* api/*.cc: add `using namespace seastar::httpd`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
seastar::httpd::request was deprecated in favor of `seastar::http::request`
since bdd5d929891d2cb821eca25896e25ed4ff658b7a.
so let's use the latter. this change also silences the warning of:
```
/home/kefu/dev/scylladb/api/authorization_cache.cc: In function ‘void api::set_authorization_cache(http_context&, seastar::httpd::routes&, seastar::sharded<auth::service>&)’:
/home/kefu/dev/scylladb/api/authorization_cache.cc:19:104: error: ‘using seastar::httpd::request = struct seastar::http::request’ is deprecated: Use http::request instead [-Werror=deprecated-declarations]
19 | httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> {
| ^~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>