Commit Graph

819 Commits

Author SHA1 Message Date
Pavel Emelyanov
0981661f8b api: Unset task_manager test API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:24 +03:00
Pavel Emelyanov
2d543af78e api: Unset task_manager API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:01 +03:00
Pavel Emelyanov
0632ad50f3 api: Remove ctx->task_manager dependency
Now the task manager's API (and test API) use the argument and this
explicit dependency is no longer required

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:27 +03:00
Pavel Emelyanov
572c880d97 api: Use task_manager& argument in test API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:13 +03:00
Pavel Emelyanov
0396ce7977 api: Push sharded<task_manager>& down the test API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:53 +03:00
Pavel Emelyanov
ef1d2b2c86 api: Use task_manager& argument in API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:24 +03:00
Pavel Emelyanov
14e10e7db4 api: Push sharded<task_manager>& down the API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:52:46 +03:00
Pavel Emelyanov
162642ac18 api: Remove proxy reference from http context
Now it's unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 16:16:52 +03:00
Pavel Emelyanov
e76f23994c api,hints: Use proxy instead of ctx
Now hints endpoints use ctx.sp reference, but it has the direct proxy
reference at hand and should prefer it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 16:16:01 +03:00
Pavel Emelyanov
6ce7ec4a5e api,hints: Pass sharded<proxy>& instead of gossiper&
Proxy is the target service to handle hints API endpoints. Need to pass
it as argument to handlers

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 16:15:28 +03:00
Pavel Emelyanov
5f521116a2 api,hints: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 16:15:20 +03:00
Pavel Emelyanov
53891dd9cc api,hints: Move gossiper access to proxy
API handlers should try to avoid using any service other than the "main"
one. For hints API this service is going to be proxy, so no gossiper
access in the handler itself.

(indentation is left broken)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-05 16:14:26 +03:00
Kamil Braun
b68d6ad5e9 api: storage_service: unset reload_raft_topology_state
Every endpoint needs to be unset. Oversight in
992f1327d3.

Closes scylladb/scylladb#15591
2023-10-03 09:12:12 +03:00
Pavel Emelyanov
2603605cd5 api: Make storage_proxy handlers use proxy argument
And stop using proxy reference from http context. After a while the
proxy dependency will be removed from http context

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:10:09 +03:00
Pavel Emelyanov
fc4335387a api: Change some static helpers to use proxy instead of ctx
There are some helpers in storage_proxy.cc that get proxy reference from
passed http context argument. Next patch will stop using ctx for that
purpose, so prepare in advance by making the helpers use proxy reference
argument directly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:10:09 +03:00
Pavel Emelyanov
4910b4d5b7 api: Pass sharded<storage_proxy> reference to storage_proxy handlers
The goals is to make handlers use proxy argument instead of keeping
proxt as dependency on http context (other handlers are mostly such
already)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:10:09 +03:00
Pavel Emelyanov
7ef7b05397 api: Remove storage_service argument from storage_proxy setup
The code setting up storage_proxy/ endpoints no longer needs
storage_service and related decoration

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:10:09 +03:00
Pavel Emelyanov
0eea513663 api: Move storage_proxy/ endpoint using storage_service
The storage_proxy/get_schema_version is served by storage_service, so it
should be in storage_service.cc instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:10:08 +03:00
Pavel Emelyanov
b5eb474d95 api: Remove storage_proxy.hh from storage_service.cc
Proxy is not used in storage service handlers

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-29 14:09:01 +03:00
Kamil Braun
992f1327d3 api: storage_service: add REST API to reload topology state
Some tests may want to modify system.topology table directly. Add a REST
API to reload the state into memory. An alternative would be restarting
the server, but that's slower and may have other side effects undesired
in the test.

The API can also be called outside tests, it should not have any
observable effects unless the user modifies `system.topology` table
directly (which they should never do, outside perhaps some disaster
recovery scenarios).
2023-09-28 11:59:16 +02:00
Pavel Emelyanov
e022a76350 main, api: Set/Unset storage_service API in proper place
Currently the storage-service API handlers are set up in "random" place.
It can happen earlier -- as soon as the storage service itself is ready.

Also, despite storage service is stopped on shutdown, API handlers
continue reference it leading to potential use-after-frees or "local is
not initialized" assertions.

Fix both. Unsetting is pretty bulky, scylladb/seastar#1620 is to help.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 12:22:13 +03:00
Pavel Emelyanov
78a22c5ae3 api/storage_service: Remove gossiper arg from API
Now all handlers work purely on storage_service and gossiper argument is
no longer needed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 12:21:46 +03:00
Pavel Emelyanov
8dc6e74138 api/storage_service: Remove system keyspace arg from API
It's not used nowadays

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 12:21:25 +03:00
Pavel Emelyanov
27eaff9d44 api/storage_service: Get gossiper from storage service
Some handlers in set_storage_service() have implicit dependency on
gossiper. It's not API that should track it, but storage service itself,
so get the gossiper from service, not from the external argument (it
will be removed soon)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 12:20:27 +03:00
Pavel Emelyanov
4008ebb1b0 api/storage_service: Get token_metadata from storage service
The API handlers that live in set_storage_service() should be
self-contained and operate on storage-service only. Said that, they
should get the token metadata, when needed, from storage service, not
from somewhere else.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-26 12:19:24 +03:00
Petr Gusev
b90011294d config.cc: drop db::config::host_id
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.

Some tests want specific host_id, we add it to cql_test_config
and use in cql_test_env.

We can't pass host_id to sstables_manager by value since it's
initialized in database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
2023-09-13 23:00:15 +04:00
Tomasz Grabiec
c27d212f4b api, storage_service: Recalculate table digests on relocal_schema api call
Currently, the API call recalculates only per-node schema version. To
workaround issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and has impact on availability.

Use like this:

  curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema

Fixes #15380

Closes #15381
2023-09-13 18:27:57 +03:00
Nadav Har'El
548386a0bb treewide: reduce include of cql_statement.hh
ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the
most expensive header files in the project - being included (mostly
indirectly) in 129 source files, and costing a total of 844 CPU seconds
of compilation.

This patch is an attempt, only *partially* successful, to reduce the
number of times that cql_statement.hh is included. It succeeds in
lowering the number 129 to 99, but not less :-( One of the biggest
difficulties in reducing it further is that query_processor.hh includes
a lot of templated code, which needs stuff from cql_statement.hh.
The solution should be to un-template the functions in
query_processor.hh and move them from the header to a source file, but
this is beyond the scope of this patch and query_processor.hh appears
problematic in other respects as well.

Unfortunately the compilation speedup by this patch is negligible
(the `du -bc build/dev/**/*.o` metric shows less than 0.01% reduction).
Beyond the fact that this patch only removes 30% of the inclusions of
this header, it appears that most of the source files that no longer
include cql_statement.hh after this patch, included anyway many of the
other headers that cql_statement.hh included, so the saving is minimal.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15212
2023-09-08 13:23:50 +03:00
Aleksandra Martyniuk
cf37ab96f4 api: task_manager: fix indentation
Closes #15173
2023-09-02 08:18:59 +03:00
Benny Halevy
d00e49a1bb gossiper: keep and serve shared endpoint_state_ptr in map
This commit changes the interface to
using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>
so that users can get a snapshot of the endpoint_state
that they must not modify in-place anyhow.
While internally, gossiper still has the legacy helpers
to manage the endpoint_state.

Fixes scylladb/scylladb#14799

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:34:36 +03:00
Benny Halevy
f1a88c01a2 api/failure_detector: get_all_endpoint_states: reduce allocations
reserve the result vector based on the known
number of endpoints and then move-construct each entry
rather than copying it.
Also, use refrences to traverse the application_state_map
rather than copying each of them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:34:42 +03:00
Benny Halevy
4f5ffc7719 gossiper: add for_each_endpoint_state helpers
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).

With that, gossiper::get_endpoint_states() can be made private.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:32:31 +03:00
Kamil Braun
ebc9056237 Merge 'Restore storage_service -> cdc_generation_service dependency' from Pavel Emelyanov
The main goal of this PR is to stop cdc_generation_service from calling
system_keyspace::bootstrap_complete(). The reason why it's there is that
gen. service doesn't want to handle generation before node joined the
ring or after it was decommissioned. The cleanup is done with the help
of storage_service->cdc_generation_service explicit dependency brought
back and this, in turn, suddenly freed the raft and API code from the
need to carry cdc gen. service reference around.

Closes #15047

* github.com:scylladb/scylladb:
  cdc: Remove bootstrap state assertion from after_join()
  cdc: Rework gen. service check for bootstrap state
  api: Don't carry cdc gen. service over
  storage_service: Use local cdc gen. service in join_cluster()
  storage_service: Remove cdc gen. service from raft_state_monitor_fiber()
  raft: Do not carry cdc gen. service over
  storage_service: Use local cdc gen. service in topo calls
  storage_service: Bring cdc_generation_service dependency back
2023-08-29 14:10:06 +02:00
Aleksandra Martyniuk
5e31ca7d20 tasks: api: show tasks' scopes
To make manual analysis of task manager tasks easier, task_status
and task_stats contain operation scope (e.g. shard, table).

Closes #15172
2023-08-29 11:32:16 +03:00
Pavel Emelyanov
c3a6e31368 api: Don't carry cdc gen. service over
There's a storage_service/cdc_streams_check_and_repair endpoint that
needs to provide cdc gen. service to call storage_service method on. Now
the latter has its own reference to the former and API can stop taking
care of that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Benny Halevy
8825828817 gossiper: add get_unreachable_members_synchronized
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.

Fixes scylladb/scylladb#15088

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-24 11:36:55 +03:00
Kamil Braun
93be4c0cb0 Merge 'Base node liveliness consistently on gossiper::is_alive' from Benny Halevy
Currently he gossiper marks endpoint_state objects as alive/dead.
I some cases the endpoint_state::is_alive function is checked but in many other cases
gossiper::is_alive(endpoint) is used to determine if the endpoint is alive.

This series removed the endpoint_state::is_alive state and moves all the logic to gossiper::is_alive
that bases its decision on the endpoint having an endpoint_state and being in the _live_endpoints set.

For that, the _live_endpoints is made sure to be replicated to all shards when changed
and the endpoint_state changes are serialized under lock_endpoint, and also making sure that the
endpoint_state in the _endpoint_states_map is never updated in place, but rather a temporary copy is changed
and then safely replicated using gossiper::replicate

Refs https://github.com/scylladb/scylladb/issues/14794

Closes #14801

* github.com:scylladb/scylladb:
  gossiper: mark_alive: remove local_state param
  endpoint_state: get rid of _is_alive member and methods
  gossiper: is_alive: use _live_endpoints
  gossiper: evict_from_membership: erase endpoint from _live_endpoints
  gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change
  gossiper: run: no need to replicate live_endpoints
  gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change
  gossiper: add mutate_live_and_unreachable_endpoints
  gossiper: reset_endpoint_state_map: clear also shadow endpoint sets
  gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards
  gossiper: functions that change _live_endpoints must be called on shard 0
  gossiper: add lock_endpoint_update_semaphore
  gossiper: make _live_endpoints an unordered_set
  endpoint_state: use gossiper::is_alive externally
2023-08-23 17:18:05 +02:00
Botond Dénes
e7af2a7de8 Merge 'token_metadata::get_endpoint_to_host_id_map_for_reading: restrict to token owners' from Benny Halevy
And verify the they returned host_id isn't null.
Call on_internal_error_noexcept in that case
since all token owners are expected to have their
host_id set. Aborting in testing would help fix
issues in this area.

Fixes scylladb/scylladb#14843
Refs scylladb/scylladb#14793

Closes #14844

* github.com:scylladb/scylladb:
  api: storage_service: improve description of /storage_service/host_id
  token_metadata: get_endpoint_to_host_id_map_for_reading: restrict to token owners
2023-08-23 13:55:14 +03:00
Patryk Jędrzejczak
0beabdc6ba utils: introduce split_comma_separated_list
Three places handle comma-separated lists similarly:
- ss::remove_node.set(...) in api::set_storage_service,
- storage_service::parse_node_list,
- storage_service::is_repair_based_node_ops_enabled.
In the next commit, the fourth place that needs the same logic
appears -- storage_service::raft_replace. It needs to load
and parse the --ignore-dead-nodes-for-replace param from config.

Moreover, the code in is_repair_based_node_ops_enabled is
different and doesn't seem right. We swap '\"' and '\'' with ' '
but don't do anything with it afterward.

To avoid code duplication and fix is_repair_based_node_ops_enabled,
we introduce the new function utils::split_comma_separated_list.

This change has a small side effect on logging. For example,
ignore_nodes_strs in storage_service::parse_node_list might be
printed in a slightly different form.
2023-08-22 10:30:36 +02:00
Benny Halevy
97061cc3b8 endpoint_state: use gossiper::is_alive externally
Before we remove endpoint_state:_is_alive to rely
solely on gossipper::_live_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-22 09:06:09 +03:00
Benny Halevy
6e416b8ff2 api: storage_service: improve description of /storage_service/host_id
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-21 09:20:39 +03:00
Aleksandra Martyniuk
ae67f5d47e api: ignore future in task_manager_json::wait_task
Before returning task status, wait_task waits for it to finish with
done() method and calls get() on a resulting future.

If requested task fails, an exception will be thrown and user will
get internal server error instead of failed task status.

Result of done() method is ignored.

Fixes: #14914.

Closes #14915
2023-08-11 08:18:51 +03:00
Botond Dénes
946c6487ee Merge 'repair: Add ranges_parallelism option' from Asias He
This patch adds the ranges_parallelism option to repair restful API.

Users can use this option to optionally specify the number of ranges to repair in parallel per repair job to a smaller number than the Scylla core calculated default max_repair_ranges_in_parallel.

Scylla manager can also use this option to provide more ranges (>N) in a single repair job but only repairing N ranges_parallelism in parallel, instead of providing N ranges in a repair job.

To make it safer, unlike the PR #4848, this patch does not allow user to exceed the max_repair_ranges_in_parallel.

Fixes #4847

Closes #14886

* github.com:scylladb/scylladb:
  repair: Add ranges_parallelism option
  repair: Change to use coroutine in do_repair_ranges
2023-08-03 11:34:05 +03:00
Kamil Braun
39ca07c49b Merge 'Gossiper endpoint locking' from Benny Halevy
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.

We make sure that all notifications (expect for `before_change`, that
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.

An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.

Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471

Closes #14845

* github.com:scylladb/scylladb:
  gossiper: replicate: ensure non-null permit
  gossiper: add_saved_endpoint: lock_endpoint
  gossiper: mark_as_shutdown: lock_endpoint
  gossiper: real_mark_alive: lock_endpoint
  gossiper: advertise_token_removed: lock_endpoint
  gossiper: do_status_check: lock_endpoint
  gossiper: remove_endpoint: lock_endpoint if needed
  gossiper: force_remove_endpoint: lock_endpoint if needed
  storage_service: lock_endpoint when removing node
  gossiper: use permit_id to serialize state changes while preventing deadlocks
  gossiper: lock_endpoint: add debug messages
  utils: UUID: make default tagged_uuid ctor constexpr
  gossiper: lock_endpoint must be called on shard 0
  gossiper: replicate: simplify interface
  gossiper: mark_as_shutdown: make private
  gossiper: convict: make private
  gossiper: mark_as_shutdown: do not call convict
2023-08-02 13:50:08 +02:00
Benny Halevy
f74d154fe3 gossiper: use permit_id to serialize state changes while preventing deadlocks
Pass permit_id to subscribers when we acquire one
via lock_endpoint.  The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-01 17:41:57 +03:00
Asias He
9b3fd9407b repair: Add ranges_parallelism option
This patch adds the ranges_parallelism option to repair restful API.

Users can use this option to optionally specify the number of ranges
to repair in parallel per repair job to a smaller number than the Scylla
core calculated default max_repair_ranges_in_parallel.

Scylla manager can also use this option to provide more ranges (>N) in
a single repair job but only repairing N ranges_parallelism in parallel,
instead of providing N ranges in a repair job.

To make it safer, unlike the PR #4848, this patch does not allow user to
exceed the max_repair_ranges_in_parallel.

Fixes #4847
2023-08-01 10:58:14 +08:00
Aleksandra Martyniuk
e072a2341d replica: api: return table_id instead of const table_id&
Return table_id instead of const table_id& from database::find_uuid
as copying table_id does not cause much overhead and simplifies
methods signature.
2023-07-25 17:13:24 +02:00
Aleksandra Martyniuk
cdbfa0b2f5 replica: iterate safely over tables related maps
Loops over _column_families and _ks_cf_to_uuid which may preempt
are protected by reader mode of rwlock so that iterators won't
get invalid.
2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk
52afd9d42d replica: wrap column families related maps into tables_metadata
As a preparation for ensuring access safety for column families
related maps, add tables_metadata, access to members of which
would be protected by rwlock.
2023-07-25 16:13:00 +02:00
Kefu Chai
6ce0d3a202 build: cmake: build api/api-doc/metrics.json
metrics.json was added in d694a42745,
`configure.py` was updated accordingly. this change mirrors this
change in CMake building system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14753
2023-07-19 11:39:21 +03:00