We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. The IP of a given host_id may change
after raft has called this function, so we need
to subscribe to gossiper notifications and call it again later
with the new id<->ip mapping.
In this preparatory commit we move the existing address_map
update logic into storage_service so that later commits
can enhance it with a topology_state_load call.
The supervisor::notify() function expects a single string - not a
format string plus parameters. Calls we have in main.cc like
supervisor::notify("starting {}", what);
end up printing the silly message "starting {}". The second parameter,
"what", is converted to a bool, with the unintended consequence of
telling notify we're "ready".
This patch fixes the callers to call fmt::format, as intended.
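A minimal sketch of the fix's intent; the notify() signature below is assumed from the description, not copied from the actual supervisor code:
```cpp
#include <fmt/core.h>
#include <iostream>
#include <string>

namespace supervisor {
// Assumed shape: a single pre-formatted message plus a "ready" flag.
void notify(const std::string& msg, bool ready = false) {
    std::cout << msg << (ready ? " (ready)" : "") << "\n";
}
}

int main() {
    const char* what = "gossiper";
    // Buggy call: prints the literal "starting {}" and the pointer `what`
    // silently converts to the bool `ready` parameter.
    supervisor::notify("starting {}", what);
    // Fixed call: format the message first, as this patch does.
    supervisor::notify(fmt::format("starting {}", what));
}
```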
Fixes #16728
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16729
seastar::logger uses compile-time format checking by default when
compiled against {fmt} 8.0 and up, which requires the format string to be
a consteval string; otherwise we have to use `fmt::runtime()` explicitly.
To adapt to this change, let's use consteval strings when formatting
logging messages.
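An illustrative snippet with plain {fmt} (not the actual seastar::logger code) showing the distinction the change relies on:
```cpp
#include <fmt/core.h>
#include <string>

int main() {
    // Fine with {fmt} >= 8: the format string is a compile-time constant,
    // so the argument types are checked at compile time.
    fmt::print("loaded {} sstables\n", 3);

    // A format string built at runtime must be wrapped explicitly,
    // otherwise compilation fails with compile-time checking enabled.
    std::string dynamic = "loaded {} sstables\n";
    fmt::print(fmt::runtime(dynamic), 3);
}
```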
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16612
Scylla skips exit hooks, so we have to manually trigger the data dump to disk
from the LLVM profiling instrumentation runtime, which we need in order
to support code coverage.
We use a weak symbol to get the address of the profile dump function. This
is legal: the function is a public interface of the instrumentation runtime.
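A rough sketch of the weak-symbol technique; `__llvm_profile_write_file` is the runtime's public entry point, while the surrounding hook is an illustrative assumption rather than the actual Scylla code:
```cpp
// Resolves to a null address when the binary is not built with
// -fprofile-instr-generate, so the call below becomes a no-op.
extern "C" int __llvm_profile_write_file(void) __attribute__((weak));

void dump_coverage_profile() {
    if (__llvm_profile_write_file) {
        // Writes the accumulated counters to the .profraw file, which is
        // normally done by an exit hook that Scylla skips.
        __llvm_profile_write_file();
    }
}
```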
Closes scylladb/scylladb#16430
It enables interaction with the node through the CQL protocol without authentication, giving full-permission access.
The maintenance socket is exposed as a Unix domain socket with file permissions `755`, so it is not accessible from outside the node or from other POSIX groups on the node.
It is created before the node joins the cluster.
To set up the maintenance socket, use the `maintenance-socket` option when starting the node.
* If set to `ignore`, the maintenance socket will not be created.
* If set to `workdir`, the maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise, the maintenance socket will be created at the specified path.
The default value is `ignore`.
* With the Python driver
```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy

socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # The driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```
Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.
Closes scylladb/scylladb#16172
* github.com:scylladb/scylladb:
test.py: add maintenance socket test
test.py: enable maintenance socket in tests by default
docs: add maintenance socket documentation
main: add maintenance socket
main: refactor initialization of cql controller and auth service
auth/service: don't create system_auth keyspace when used by maintenance socket
cql_controller: maintenance socket: fix indentation
cql_controller: add option to start maintenance socket
db/config: add maintenance_socket_enabled bool class
auth: add maintenance_socket_role_manager
db/config: add maintenance_socket variable
Add initialization of maintenance_auth_service and cql_maintenance_server_ctl.
Create the maintenance socket, which enables interaction with the node through
the CQL protocol without authentication. The maintenance socket is exposed as
a Unix domain socket and gives full-permission access.
It is created before the node joins the cluster.
Move initialization of the cql controller and auth service into functions.
This will make it easier to create a new cql controller with a separate auth service,
for example for the maintenance socket.
Make it possible to initialize new services before joining group0.
The maintenance socket is created before joining the cluster. When the maintenance auth service
is started it creates the system_auth keyspace if it's missing. This keyspace is not synchronized
with other nodes, because this node hasn't joined group0 yet. Thus the node has
a mismatched schema and is unable to join the cluster.
The maintenance socket doesn't use role management, so the problem is solved
by not creating the system_auth keyspace when the maintenance auth service is created.
The logic of the regular CQL port's auth service is unchanged. A new, separate
auth service will be created for the maintenance socket.
Add an option to listen on the maintenance socket. It is set up on a Unix domain socket
and its metrics are disabled.
This enables having an independent authentication mechanism for this socket.
To start the maintenance socket, a new cql_controller has to be created
with the `db::maintenance_socket_enabled::yes` argument.
Creating the maintenance socket will raise an exception if
* the path is longer than 107 chars (due to Linux limits; see the sketch below),
* a file or a directory already exists at the path.
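An illustrative version of those checks; the real validation lives in the CQL server setup and may differ in details:
```cpp
#include <sys/un.h>
#include <filesystem>
#include <stdexcept>
#include <string>

void validate_maintenance_socket_path(const std::string& path) {
    // sun_path holds 108 bytes including the terminating NUL,
    // hence the 107-character limit mentioned above.
    if (path.size() > sizeof(sockaddr_un::sun_path) - 1) {
        throw std::invalid_argument("maintenance socket path is too long: " + path);
    }
    if (std::filesystem::exists(path)) {
        throw std::invalid_argument("maintenance socket path already exists: " + path);
    }
}
```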
The indentation is fixed in the next commit.
We make `consistent_cluster_management` mandatory in 5.5. The
option is now unused and always assumed to be true.
Additionally, we deprecate `override_decommission`, as this option
has only been supported with `consistent_cluster_management=false`.
Making `consistent_cluster_management` mandatory also simplifies
the code: branches that execute only with
`consistent_cluster_management` disabled are removed.
We also update documentation by removing information irrelevant in 5.5.
Fixes scylladb/scylladb#15854
Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.
Closes scylladb/scylladb#16334
* github.com:scylladb/scylladb:
docs: update after making consistent_cluster_management mandatory
system_keyspace, main, cql_test_env: fix indendations
db: config: make consistent_cluster_management mandatory
test: boost: schema_change_test: replace disable_raft_schema_config
db: config: make override_decommission deprecated
db: config: make force_schema_commit_log deprecated
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. The main motivation for these changes is to make the raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be relied upon. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804): `storage_service::topology_state_load` can't resolve a host_id to an IP when we are applying old raft log entries containing host_id-s of long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version (see the sketch after this list).
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
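A highly simplified sketch of the intermediate stages above; all types and members here are illustrative stand-ins, not the real token_metadata:
```cpp
#include <compare>
#include <cstdint>
#include <map>

struct inet_address { uint32_t addr; };
struct host_id { uint64_t id; };
struct token {
    int64_t value;
    auto operator<=>(const token&) const = default;
};

// Stage 1: the internals are parameterized on the node key type.
template <typename NodeId>
class basic_token_metadata {
    std::map<token, NodeId> _token_to_endpoint;
public:
    void update_normal_token(token t, NodeId node) {
        _token_to_endpoint[t] = node;
    }
};

// The inet_address-based instance exposes the host_id-based one via
// get_new(), so stage 2 can mirror every write into both versions and
// stage 3 can move readers over one call site at a time.
class token_metadata : public basic_token_metadata<inet_address> {
    basic_token_metadata<host_id> _new;
public:
    basic_token_metadata<host_id>& get_new() { return _new; }
};
```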
This series [depends](1745a1551a) on the RPC sender's `host_id` being present in the RPC `client_info` for the `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to `5.4` (or the corresponding Enterprise version) and then to the version with these changes (`5.5` or `6.0`), should be fine. If for some reason they upgrade from a version without `host_id` in the RPC `client_info` to the version with these changes, and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with the error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can first finish the upgrade to version 5.4 or later, or start bootstrap/replace from an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started in the middle of an upgrade from any node.
Closes scylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
Make host_id parameter non-optional and
move it to the beginning of the arguments list.
Delete unused overloads of add_or_update_endpoint.
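An illustrative before/after of the signature change described above; parameter lists are simplified and the real method takes more arguments:
```cpp
#include <optional>

struct inet_address {};
struct host_id {};
struct endpoint_dc_rack {};

struct topology_before {
    // host_id used to be optional and trailed the endpoint.
    void add_or_update_endpoint(inet_address ep,
                                std::optional<host_id> id = std::nullopt,
                                std::optional<endpoint_dc_rack> dr = std::nullopt);
};

struct topology_after {
    // host_id is now mandatory and leads the argument list.
    void add_or_update_endpoint(host_id id,
                                inet_address ep,
                                std::optional<endpoint_dc_rack> dr = std::nullopt);
};
```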
Delete unused overload of token_metadata::update_topology
with inet_address argument.
For all compaction types which can be started via the API, add an asynchronous
version of the API which returns the task_id of the corresponding task manager
task. With the task_id, a user can check the task's status, abort it, or wait for it
using the task manager API.
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.
In this commit we assign the host_id of this node, both in main.cc
and in boost tests.
Fixes some more typos found by a codespell run on the code. This commit covers the more user-visible errors.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closes scylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
Storage service uses group0 internally, but group0 is created long after
storage service is initialized, and is passed to it using the ss::set_group0()
function. This means that during shutdown group0 is destroyed
before ss::stop() is called, leaving storage service with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need a separate abort source for group0 operations, which the patch
series also introduces.
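A sketch of the shutdown pattern this introduces, using generic seastar primitives; member and function names here are illustrative, not the actual storage_service interface:
```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>

class service_with_group0_users {
    seastar::abort_source _group0_as; // aborts only group0-related operations
    seastar::gate _group0_gate;       // tracks background fibers that use group0
public:
    // Called before group0 is destroyed, while this service is still alive,
    // so no background fiber is left holding a dangling group0 reference.
    seastar::future<> stop_group0_usage() {
        _group0_as.request_abort();
        return _group0_gate.close();
    }
};
```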
* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
storage_service: topology coordinator: ignore abort_requested_exception in background fibers
storage_service: fix de-initialization order between storage service and group0_service
Pass the broadcast_address from main to get_seeds_from_db_config
rather than getting it from fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
and set broadcast_address / broadcast_rpc_address in main
to remove this dependency of snitch on fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Using consistent cluster management without schema commitlog
results in a bad-configuration exception being thrown during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closes scylladb/scylladb#16254
Storage service uses group0 internally, but group0 is created long after
storage service is initialized, and is passed to it using the ss::set_group0()
function. This means that during shutdown group0 is destroyed
before ss::stop() is called, leaving storage service with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need a separate abort source for group0 operations, which the patch also
introduces.
The current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for the existing
`db::config/boost::po` API controlling the lifetime of config options. Hence
it is being replaced in this PR by adding yet another `value_status`
enumerator, `Deprecated`, so that deprecation of config options is
controlled in one place, in `config.cc`, i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing
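A schematic sketch of the idea; only the `Deprecated` enumerator comes from this PR, and the surrounding handling code is illustrative:
```cpp
#include <fmt/core.h>
#include <string>

enum class value_status {
    Used,
    Unused,
    Invalid,
    Deprecated, // option is still parsed, but it is ignored and only logged
};

void warn_if_deprecated(const std::string& name, value_status status) {
    if (status == value_status::Deprecated) {
        // Logged at WARN level instead of being printed to stdout.
        fmt::print("WARN init - Option is deprecated : {}\n", name);
    }
}
```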
With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN 2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN 2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928
Closes scylladb/scylladb#16184
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation, and handling of errors
after streaming, where an operation can no longer be aborted. If the
error happens when rollback is no longer possible, we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation of what the consequences of the patch are.
* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
test: add test to check errro handling during tablet draining
test: fix test_topology_streaming_failure test to not grep the whole file
storage_service: add error injection into the tablet migration code
storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
storage_service: topology coordinator: add rollback_to_normal node state
storage_service: topology coordinator: put fence version into the raft state
storage_service: topology coordinator: do fencing even if draining failed
Nowadays, if a memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage-collecting the non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done at runtime: scylla re-reads this config upon a HUP signal.
Memtable flushing already restarts when it sees ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.
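A conceptual sketch of the retry decision: ENOSPC/EDQUOT come from the description above, while the `storage_io_error_like` type and the specific code used for the S3 case (EACCES) are placeholder assumptions:
```cpp
#include <cerrno>

// Hypothetical stand-in for the real storage_io_error carrying an errno value.
struct storage_io_error_like {
    int error;
};

bool should_retry_memtable_flush(const storage_io_error_like& e) {
    switch (e.error) {
    case ENOSPC:
    case EDQUOT:
        // Already retried for on-disk sstables: wait for space and try again.
        return true;
    case EACCES:
        // Illustrative only: an HTTP failure from a misconfigured S3 endpoint,
        // mapped into a storage_io_error, now also triggers a retry instead of
        // aborting the process; the operator can fix the endpoint and SIGHUP.
        return true;
    default:
        return false;
    }
}
```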
Fixes: #13745
Closes scylladb/scylladb#15635
* github.com:scylladb/scylladb:
test: Add object_store test to validate config reloading works
test: Add config update facility to test cluster
test: Make S3_Server export config file as pathlib.Path
config: Make object storage config updateable_value_source
memtable: Extend list of checking codes
sstables/storage/s3: Fix missing TOC status check
s3/client: Map http exceptions into storage_io_error
exceptions: Extend storage_io_error construction options
This change addresses a regression introduced by
f4626f6b8e, which stopped notifying
systemd that scylla is READY. Without the
notification, systemd would wait in vain for scylla to become ready.
Refs f4626f6b8e
Fixes #16159
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16166
Currently it's a plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. For the
observers to operate there must be a value source; updating it
updates the attached updateable values _and_ notifies the observers.
For the config to be the u.v._source, config entries need to be
comparable to each other, hence the <=> operator for them.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies), since now the storage service needs the
sys. dist. ks not only at cluster-join time. The next patch will make more use
of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now set via a dedicated call that happens after the query processor is
started. Now the query processor is started before the storage service, and the
latter can get the q.p. local reference via its constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service is a top-level service which depends on many other
services. Recently (see d42685d0cb, storage_service: Load tablet
metadata on boot and from group0 changes) it also got an implicit
dependency on the query processor, but it still starts too early to take an
explicit reference on the q.p.
This patch moves the storage service start to a later point. This is possible
because the storage service is not explicitly needed by any other component
start/init between its old and new start places. Also, cql_test_env
already starts the storage service "that late".
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, when the coordinator decides to move the fence, it issues an
RPC to each node and each node locally advances its fence version. This is
fine if there are no failures, or if failures are handled by retrying the
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. Nodes that missed the fence RPC can then easily catch up
to the latest fence version by simply executing a raft barrier.
Currently, during group0 snapshot transfer, the node pulling
the snapshot will send the `raft_pull_topology_snapshot` verb even if
the cluster is not in topology-on-raft mode. The RPC handler returns an
empty snapshot in that case. However, using the verb outside topology on
raft causes problems:
- It can cause issues during rolling upgrade as the snapshot transfer
will keep failing on the upgraded nodes until the leader node is
upgraded,
- Topology changes on raft are still experimental, and using the RPC
outside experimental mode will prevent us from doing breaking changes
to it.
Solve the issue by passing the "topology changes on raft enabled" flag
to group0_state_machine and send the RPC only in topology on raft mode.
When a base write triggers an MV write that needs to be sent to another
shard, the same smp service group was used, and we could end up with a
deadlock.
This fix also affects alternator's secondary indexes.
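A sketch of the idea behind the fix: give materialized-view updates their own smp service group so they cannot exhaust the quota the base writes are blocked on. Group names and configuration are illustrative, not the actual Scylla code:
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/smp.hh>

struct write_smp_groups {
    seastar::smp_service_group base_writes;
    seastar::smp_service_group view_updates;
};

seastar::future<write_smp_groups> make_write_smp_groups() {
    seastar::smp_service_group_config cfg;
    // Two independent groups: cross-shard view-update requests no longer
    // compete for the same per-group semaphore as the base-table writes
    // that generated them, removing the circular wait.
    auto base = co_await seastar::create_smp_service_group(cfg);
    auto views = co_await seastar::create_smp_service_group(cfg);
    co_return write_smp_groups{base, views};
}
```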
Testing was done using a not-yet-committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), scylla hangs after a couple of seconds:
cpu usage drops to zero and no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think that hits a different limit, as there was no depleted
smp service group semaphore and it was also happening on non-MV loads.
Fixes https://github.com/scylladb/scylladb/issues/15844
Closes scylladb/scylladb#15845
There are a few of them that don't need the storage service for anything
other than getting token metadata from it. Move them to their own .cc/.hh units.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the continuation of 8c03eeb85d.
Registering API handlers for services needs to
* get the service that handles requests via an argument, not from the http context (the http context, in turn, is going to not depend on anything)
* unset the handlers on stop, so that the service is not used after it's stopped (and before the API server is stopped)
This makes the task manager handlers work this way.
Closes scylladb/scylladb#15764
* github.com:scylladb/scylladb:
api: Unset task_manager test API handlers
api: Unset task_manager API handlers
api: Remove ctx->task_manager dependency
api: Use task_manager& argument in test API handlers
api: Push sharded<task_manager>& down the test API set calls
api: Use task_manager& argument in API handlers
api: Push sharded<task_manager>& down the API set calls