Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.
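A minimal sketch of the pattern, with hypothetical names standing in for the real handler types and the data obtained from the database:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the result type and the rows read from the
// database; the point is only the reserve-before-fill pattern.
std::vector<std::string> collect_names(const std::vector<std::string>& rows) {
    std::vector<std::string> out;
    out.reserve(rows.size());   // the size is known up front, so allocate once
    for (const auto& row : rows) {
        out.push_back(row);
    }
    return out;
}
```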
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
Fixes: #16509
Closes scylladb/scylladb#17319
This API endpoint was failing when tablets were enabled
because it used get_vnode_effective_replication_map().
Moreover, the error message it produced was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When the user does not provide a table name and tablets are enabled
for the keyspace, BAD_REQUEST is returned with a
meaningful error message, as sketched below.
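A simplified sketch of that rule; the function and parameter names below are invented for illustration, the real handler works on the keyspace metadata and the HTTP query parameters:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// ks_uses_tablets and table stand in for the real keyspace metadata lookup
// and the optional 'table' query parameter.
void validate_range_to_endpoint_map_request(bool ks_uses_tablets,
                                            const std::optional<std::string>& table) {
    if (ks_uses_tablets && !table) {
        // Tablets keep an effective replication map per table, so the request
        // cannot be answered for a whole keyspace; reject it up front.
        throw std::invalid_argument(
            "tablets are enabled for this keyspace; please specify the table name");
    }
    // Otherwise the request can be served for the keyspace or for a table.
}
```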
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17372
When we only need read access to `http_context`, there
is no need to use a non-const reference, so let's add the `const` specifier
to make this explicit. This should help with readability and
maintainability.
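A minimal illustration of the intent; the struct below is a stand-in for the real `http_context`, not its actual definition:

```cpp
#include <string>

// Stand-in for the real http_context; the handler below only reads it.
struct http_context {
    std::string api_dir;
};

// Taking a const reference documents that only read access is needed.
std::string api_doc_path(const http_context& ctx) {
    return ctx.api_dir + "/api-doc.json";
}
```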
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17219
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert an `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).
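A reduced, self-contained example of the pitfall; the types below are simplified stand-ins for `gms::inet_address` and `locator::topology`, not the real code:

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-in: any string implicitly converts to an "address".
struct inet_address {
    inet_address(const std::string& addr) {
        if (addr.find('.') == std::string::npos) {
            throw std::invalid_argument("not an address: " + addr);
        }
    }
};

std::string get_datacenter(inet_address) { return "dc1"; }

int main() {
    std::string dc_name = "my_dc";
    // Compiles, but throws at runtime: the datacenter name is silently
    // converted to an inet_address. Marking the constructor `explicit`
    // turns this mistake into a compile-time error.
    get_datacenter(dc_name);
}
```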
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17260
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically; it must be triggered manually by the administrator after
making sure that no topology operations are currently running.
Upgrade is triggered by sending a `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start, which then drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, the topology
coordinator runs normally.
Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.
The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes and
restarting them so that they automatically rebuild group 0. Raft
topology fits into the existing procedure by falling back to legacy
topology operations after disabling raft. After rebuilding group 0,
upgrade needs to be triggered again.
Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. To make this possible, nodes
now ask the cluster, via a new `JOIN_NODE_QUERY` RPC, which mode they
should use to join before proceeding.
The procedure is explained in more detail in `topology-over-raft.md`.
Fixes: https://github.com/scylladb/scylladb/issues/15008
Closes scylladb/scylladb#17077
* github.com:scylladb/scylladb:
test/topology_custom: upgrade/recovery tests for topology on raft
cdc/generation_service: in legacy mode, fall back to raft tables
system_keyspace: add read_cdc_generation_opt
cdc/generation_service: turn off gossip notifications in raft topo mode
cql_test_env: move raft_topology_change_enabled var earlier
group0_state_machine: pull snapshot after raft topology feature enabled
storage_service: disable persistent feature enabler on upgrade
storage_service: replicate raft features to system.peers
storage_service: gossip tokens and cdc generation in raft topology mode
API: add api for triggering and monitoring topology-on-raft upgrade
storage_service: infer which topology operations to use on startup
storage_service: set the topology kind value based on group 0 state
raft_group0: expose link to the upgrade doc in the header
feature_service: fall back to checking legacy features on startup
storage_service: add fiber for tracking the topology upgrade progress
gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
topology_coordinator: implement core upgrade logic
topology_coordinator: extract top-level error handling logic
storage_service: initialize discovery leader's state earlier
topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
topology_state_machine: introduce upgrade_state
storage_service: disallow topology ops when upgrade is in progress
raft_group0_client: add in_recovery method
storage_service: introduce join_node_query verb
raft_group0: make discover_group0 public
raft_group0: filter current node's IP in discover_group0
raft_group0: remove my_id arg from discover_group0
storage_service: make _raft_topology_change_enabled more advanced
docs: document raft topology upgrade and recovery
Per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. In practice, it returns the ranges
of the first keyspace that uses a non-local, vnode-based strategy. This API
is not used by nodetool, nor is it exercised in dtest.
scylla-manager has a wrapper for this API, but that wrapper
is not used anywhere.
In this change, this API is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17197
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET which reports the status of the upgrade.
The table query param is added to get the describe_ring result for a
given table.
Both vnode-based and tablet-based tables can use this table param, so it is
easier for users to use.
If the table param is not provided by the user and the keyspace contains a
tablet-based table, the request will be rejected.
E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"
Refs #16509
Closes scylladb/scylladb#17118
* github.com:scylladb/scylladb:
tablets: Convert to use the new version of for_each_tablet
storage_service: Add describe_ring support for tablet table
storage_service: Mark host2ip as const
tablets: Add for_each_tablet_gently
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading
Add a flag to force a tablet move even if the dc/rack constraints aren't fulfilled (see the sketch below).
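A hedged sketch of the validation flow; the helper names and parameters below are hypothetical, only the three constraints and the force flag come from this change:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified description of the source and target replica.
struct replica_location {
    std::string host;
    std::string dc;
    std::string rack;
};

void validate_tablet_move(const replica_location& src, const replica_location& dst,
                          bool target_already_has_replica, bool rack_overloaded,
                          bool force) {
    if (target_already_has_replica) {
        // Two replicas of the same tablet must never share a node.
        throw std::invalid_argument("replica already exists on target node");
    }
    if (src.dc != dst.dc && !force) {
        // Moving across DCs would violate the per-DC replication factor.
        throw std::invalid_argument("move would change RF in DC " + src.dc);
    }
    if (rack_overloaded && !force) {
        // Piling replicas onto one rack reduces availability.
        throw std::invalid_argument("move would overload rack " + dst.rack);
    }
}
```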
Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.
Fixes: #16379.
Closes scylladb/scylladb#16648
* github.com:scylladb/scylladb:
api: service: add force param to move_tablet api
service: validate replication strategy constraints
Since `t.parallel_foreach_table_state` may yield,
we must not refer to the `type` captured by the calling
lambda when calling `stop_compaction`: that capture is
lost once the lambda returns, which happens if
`parallel_foreach_table_state` returns an unavailable
future.
Instead, change all captures to `[&]` so we access
the `type` variable held by the coroutine frame.
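A Seastar-free analogue of why the `[&]` capture is safe here: `type` lives in a frame that also waits for all the deferred work before returning (in the real code that frame is the coroutine frame, which stays alive across `co_await`). The names below are purely illustrative:

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

void stop_all(const std::string& type, const std::vector<std::string>& tables) {
    std::vector<std::function<void()>> pending;
    for (const auto& t : tables) {
        // Capturing 'type' by reference is safe: it outlives every callback,
        // because this frame invokes them all before returning.
        pending.push_back([&type, t] {
            std::cout << "stop " << type << " on " << t << "\n";
        });
    }
    for (auto& cb : pending) {
        cb();
    }
}

int main() {
    stop_all("CLEANUP", {"t1", "t2"});
}
```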
Fixes #16975
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17143
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
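A sketch of the mechanical replacement, assuming the call already runs inside a seastar thread; `get_keyspaces()` below is a hypothetical future-returning function used only to show the accessor change:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include <vector>

// Hypothetical future-returning call.
seastar::future<std::vector<seastar::sstring>> get_keyspaces();

std::vector<seastar::sstring> get_keyspaces_sync() {
    // Previously: return get_keyspaces().get0();
    // get() now returns the single value carried by the future directly.
    return get_keyspaces().get();
}
```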
According to the documentation of "nodetool cleanup":
> Triggers removal of data that the node no longer owns
Currently, scylla performs cleanup by rewriting the sstables, but
commitlog segments may still contain mutations for the tables whose
sstables are rewritten. When the scylla server restarts, these dirty
mutations are replayed into the memtable; if any of them touch the
tables that were cleaned up, the stale data is reapplied, which leads
to data resurrection.
So, in this change we follow the same model as major compaction,
where we
1. force a new active segment,
2. flush the tables being cleaned up,
3. perform cleanup using compaction.
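A sketch of that sequence with hypothetical helper names (the real calls live in the commitlog and compaction code):

```cpp
// Hypothetical helpers standing in for the real commitlog/compaction calls;
// only the ordering matters for preventing resurrection.
void force_new_commitlog_segment();            // 1. stop writing to the old segments
void flush_tables_under_cleanup();             // 2. dirty mutations reach sstables
void rewrite_sstables_dropping_unowned_data(); // 3. cleanup compaction proper

void cleanup_keyspace() {
    force_new_commitlog_segment();
    flush_tables_under_cleanup();
    rewrite_sstables_dropping_unowned_data();
}
```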
Fixes #4734
Closes scylladb/scylladb#16757
* github.com:scylladb/scylladb:
storage_service: fall back to local cleanup in cleanup_all
compaction: format flush_mode without the helper
compaction_manager: flush all tables before cleanup
replica: table: pass do_flush to table::perform_cleanup_compaction()
api, compaction: promote flush_mode
Before this change, if no keyspaces were specified,
scylla-nodetool just enumerated all non-local keyspaces and
called "/storage_service/keyspace_cleanup" on them one after another.
This is not very efficient, as each such RESTful API call
forces a new active commitlog segment and flushes all tables.
So, if the target node of this command has N non-local keyspaces,
it would repeat those steps N times, which is unnecessary.
Moreover, after a topology change we would like to run a global
"nodetool cleanup" without specifying a keyspace, so this
is a typical use case which we do care about.
To address this performance issue, this change improves
an existing RESTful API call, "/storage_service/cleanup_all":
if the topology coordinator is not enabled, we fall back to
a local cleanup of all non-local keyspaces.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
According to the documentation of "nodetool cleanup":
> Triggers removal of data that the node no longer owns
Currently, scylla performs cleanup by rewriting the sstables, but
commitlog segments may still contain mutations for the tables whose
sstables are rewritten. When the scylla server restarts, these dirty
mutations are replayed into the memtable; if any of them touch the
tables that were cleaned up, the stale data is reapplied, which leads
to data resurrection.
So, in this change we follow the same model as major compaction:
1. force a new active segment,
2. flush all tables,
3. perform cleanup using compaction, which rewrites the sstables
of the specified tables.
Because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again in `table::perform_cleanup_compaction()`. The
`flush()` call is therefore dropped from this function, and the tests
using this function are updated to call `flush()` manually to preserve
the existing behavior.
There are two callers of `cleanup_keyspace_compaction_task_impl`:
* one is `storage_service::sstable_cleanup_fiber()`, which listens
for the events fired by the topology_state_machine, which is in turn
driven by, for instance, the "/storage_service/cleanup_all" API, and
cleans up all keyspaces one after another;
* another is "/storage_service/keyspace_cleanup", which cleans up
the specified keyspace.
In the first use case, we can force a new active segment just once,
so another parameter is introduced to the ctor of
`cleanup_keyspace_compaction_task_impl` to specify whether
the `db.flush_all_tables()` call should be skipped.
Please note, there are two possible optimizations:
1. force a new active segment only if the mutations in it touch the
tables being cleaned up,
2. after forcing a new active segment, only flush the (mem)tables
mutated by the non-active segments,
but let's leave them for follow-up changes. This change is a
minimal fix for the data resurrection issue.
Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
The reverted change makes scylla-manager misinterpret the data_file_directories
somehow, see issue #17078.
`db::config` is a class that is used in many places across the code base. When it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases the `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact.
This PR:
- extends the public interface of utils::directories class to provide required directory paths to the users
- removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
- replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs
Fixes: scylladb#5626
Closes scylladb/scylladb#16787
* github.com:scylladb/scylladb:
utils/directories: make utils::directories::set an internal type
db::config: keep dir paths unchanged
cql_transport/controler: use utils::directories to get paths of dirs
service/storage_proxy: use utils::directories to get paths of dirs
api/storage_service.cc: use utils::directories to get paths of dirs
tools/scylla-sstable.cc: use utils::directories to get paths
db/commitlog: do not use db::config to get dirs
Use utils::directories to get dirs paths in replica::database
Allow utils::directories to provide paths to dirs
Clean-up of utils::directories
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44),
but a problem remains with existing deployments coming from 5.2:
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
The PR adds the API to `raft::server` and an HTTP endpoint that uses it.
In a follow-up PR, we plan to modify the group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade).
Closes scylladb/scylladb#16816
* github.com:scylladb/scylladb:
raft: remove `empty()` from `fsm_output`
test: add test for manual triggering of Raft snapshots
api: add HTTP endpoint to trigger Raft snapshots
raft: server: add `trigger_snapshot` API
raft: server: track last persisted snapshot descriptor index
raft: server: framework for handling server requests
raft: server: inline `poll_fsm_output`
raft: server: fix indentation
raft: server: move `io_fiber`'s processing of `batch` to a separate function
raft: move `poll_output()` from `fsm` to `server`
raft: move `_sm_events` from `fsm` to `server`
raft: fsm: remove constructor used only in tests
raft: fsm: move trace message from `poll_output` to `has_output`
raft: fsm: extract `has_output()`
raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
raft: server: pass `*_aborted` to `set_exception` call
This change replaces usage of db::config with usage
of utils::directories in api/storage_service.cc in
order to get the paths of directories.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
* all group0 operations are disabled (e.g. schema changes),
* all cluster-wide operations are disabled for this node (e.g. repair),
* other nodes see this node as dead,
* cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.
The only way to make CQL queries is to use the maintenance socket. The node serves only local data.
To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.
REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager
This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`
Fixes #5489.
Closes scylladb/scylladb#15346
* github.com:scylladb/scylladb:
test.py: add test for maintenance mode
test.py: generalize usage of cluster_con
test.py: when connecting to node in maintenance mode use maintenance socket
docs: add maintenance mode documentation
main: add maintenance mode
main: move some REST routes initialization before joining group0
message_service: add sanity check that rpc connections are not created in the maintenance mode
raft_group0_client: disable group0 operations in the maintenance mode
service/storage_service: add start_maintenance_mode() method
storage_service: add MAINTENANCE option to mode enum
service/maintenance_mode: add maintenance_mode_enabled bool class
service/maintenance_mode: move maintenance_socket_enabled definition to seperate file
db/config: add maintenance mode flag
docs: add cqlsh usage to maintenance socket documentation
docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.
start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.
set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
i.e. start_maintenance_mode is called after join_cluster (or it is called during
the drain, but that also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
i.e. join_cluster is called after start_maintenance_mode.
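A simplified model of those rules; the enum values come from the text above, everything else is illustrative:

```cpp
#include <stdexcept>

enum class operation_mode { NONE, STARTING, MAINTENANCE /* ... */ };

void check_mode_transition(operation_mode current, operation_mode next) {
    if (next == operation_mode::MAINTENANCE && current != operation_mode::NONE) {
        // start_maintenance_mode called after join_cluster (or during drain).
        throw std::runtime_error("cannot enter maintenance mode after the node started joining");
    }
    if (next == operation_mode::STARTING && current == operation_mode::MAINTENANCE) {
        // join_cluster called on a node running in maintenance mode.
        throw std::runtime_error("cannot join the cluster in maintenance mode");
    }
}
```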
This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop.
It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand.
When looping over a container that stores non-trivial types, using a const reference is advised to avoid redundant copies.
Closes scylladb/scylladb#16978
* github.com:scylladb/scylladb:
api/api.hh: use const reference when looping over container
api/api.hh: use std::vector::reserve() when the total size is known
When a reference is not used in the range-for loop,
each element of the container is copied. Such copying
is not a problem for scalar types. However, in the case
of non-trivial types it may cause unneeded overhead.
This change replaces copying with const references
to avoid copying types like seastar::sstring etc.
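A minimal example of the pattern:

```cpp
#include <iostream>
#include <string>
#include <vector>

void print_all(const std::vector<std::string>& names) {
    // 'const auto&' avoids copying each string on every iteration.
    for (const auto& name : names) {
        std::cout << name << '\n';
    }
}
```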
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
When growing via push_back(), std::vector may need to reallocate
its internal block of memory when it runs out of capacity. It is advised
to allocate the required space before appending elements if the
size is known beforehand.
This change introduces usage of std::vector::reserve() in api.hh
to ensure that push_back() does not cause reallocations.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This uses the `trigger_snapshot()` API added in previous commit on a
server running for the given Raft group.
It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, whose
replication strategy is per-table, do not support
cleanup.
In both cases, just skip their cleanup via the api.
Fixes #16738
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16785
Introduce a new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster-wide cleanup on all dirty nodes. This is done by introducing a new
global command, "global_topology_request::cleanup".
This change is about the documentation of the RESTful API of
storage_service. We define the API using the Swagger 2.0 format and
generate the API document from the definitions, so it would be great
if the document matched the API.
In this change, since the keyspace is not queried but mutated, the
description is changed to a more accurate one.
From the code perspective, this is purely cosmetic, as we don't read the
description fields or verify them in our tests.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16637
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which triggers
__llvm_dump_profile registered in the exit hooks).
2. For ScyllaDB, commit d7b524cf10 introduced a manual call to
__llvm_dump_profile upon receiving a SIGTERM signal.
This commit adds a third option: a REST API to dump the profile.
In addition, the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
Except for the logging, if the executable is not instrumented, this API call
becomes a no-op, so keeping it in our releases bears minimal risk.
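For reference, a sketch of the general mechanism with the compiler-rt profile runtime; this is an illustration of the technique, not the exact Scylla handler. The weak declarations keep it a no-op in uninstrumented builds:

```cpp
// Provided by compiler-rt when building with -fprofile-instr-generate.
extern "C" int __llvm_profile_write_file(void) __attribute__((weak));
extern "C" void __llvm_profile_reset_counters(void) __attribute__((weak));

// Returns true if a profile was written; in an uninstrumented binary the weak
// symbols resolve to null and the call degenerates to a no-op.
bool dump_and_reset_profile() {
    if (__llvm_profile_write_file == nullptr) {
        return false;
    }
    int rc = __llvm_profile_write_file();  // write counters to the profile file
    __llvm_profile_reset_counters();       // start a fresh increment
    return rc == 0;
}
```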
Specifically for code coverage, the gain is that we will not be
required to change the entire test run to shut down clusters gracefully,
and this has minimal effect on the actual test behavior.
The change was tested by manually triggering the API with and
without instrumentation, as well as re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
This series [depends](1745a1551a) on the RPC sender `host_id` being present in the RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to `5.4` (or the corresponding Enterprise version) and then to the version with these changes (`5.5` or `6.0`), should be fine. If for some reason they upgrade from a version without `host_id` in the RPC `client_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can first finish the upgrade to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started in the middle of an upgrade from any node.
Closes scylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
If std::vector is resized, its iterators and references may
get invalidated. While iterators into task_manager::task::impl::_children
are avoided throughout the code, references to its
elements are being used.
Since the children container does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, whose
iterators and references aren't invalidated on element insertion.
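A small, self-contained demonstration of the property this change relies on:

```cpp
#include <cassert>
#include <list>
#include <string>

int main() {
    std::list<std::string> children;
    children.push_back("child-0");
    const std::string& first = children.front();
    for (int i = 1; i < 1000; ++i) {
        children.push_back("child-" + std::to_string(i));
    }
    // Unlike std::vector, std::list never relocates its elements, so the
    // reference taken before the insertions is still valid here.
    assert(first == "child-0");
}
```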
Fixes: #16380.
Closes scylladb/scylladb#16381
On top of the capabilities of the java-nodetool command, the following additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or validation_errors return code
The command comes with tests and all tests pass with both the new and the current nodetool implementations.
Refs: #15588
Refs: #16208
Closes scylladb/scylladb#16391
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the scrub command
test/nodetool: rest_api_mock.py: add missing "f" to error message f string
api: extract scrub_status into its own header
For all compaction types which can be started with api, add an asynchronous version of api, which returns task_id of the corresponding task manager task. With the task_id a user can check task status, abort, or wait for it, using task manager api.
Closes scylladb/scylladb#15092
* github.com:scylladb/scylladb:
test: use async api in test_not_created_compaction_task_abort
test: test compaction task started asynchronously
api: tasks: api for starting async compaction
api: compaction: pass pointer to top level compaction tasks
If an option is not supported, reject the request instead of silently
ignoring the unsupported options.
This prevents the user from thinking the option is supported while it is
actually ignored by the scylla core.
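An illustrative sketch of the idea; the parameter containers and names are hypothetical, not the actual handler code:

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Reject any query parameter the handler does not recognize instead of
// silently dropping it.
void check_options(const std::map<std::string, std::string>& params,
                   const std::set<std::string>& supported) {
    for (const auto& p : params) {
        if (!supported.contains(p.first)) {
            throw std::invalid_argument("unsupported option: " + p.first);
        }
    }
}
```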
Fixes #16299
Closes scylladb/scylladb#16300
For all compaction types which can be started with api, add an asynchronous
version of api, which returns task_id of the corresponding task manager
task. With the task_id a user can check task status, abort, or wait for it,
using task manager api.
As preparation for the asynchronous compaction api, from which we
cannot take values by reference, top-level compaction tasks get
pointers, which need to be set to nullptr when they are not needed
(as in the async api).
NodeId is used in all internal token_metadata data structures, that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.
A generic_token_metadata::update_topology overload with a host_id
parameter is added to make update_topology_change_info work;
it now uses NodeId as a parameter type.
topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.
pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.
generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.
Templates are explicitly instantiated inside token_metadata.cc, since the
implementation part is also a template and is not exposed in the header.
There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
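A minimal illustration of the pattern with stand-in types (not the real token_metadata): the class is a template over the node key, the definitions stay in the .cc, and only the key types actually used are instantiated explicitly:

```cpp
// --- header (sketch) ---
template <typename NodeId>
class generic_token_metadata {
public:
    void remove_endpoint(NodeId node);
    // ...
};

// --- token_metadata.cc (sketch) ---
template <typename NodeId>
void generic_token_metadata<NodeId>::remove_endpoint(NodeId) {
    // the implementation is also a template and is not exposed in the header
}

// Stand-ins for gms::inet_address and locator::host_id.
struct inet_address_stub {};
struct host_id_stub {};

// Explicit instantiation for the only two key types in use.
template class generic_token_metadata<inet_address_stub>;
template class generic_token_metadata<host_id_stub>;
```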