In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.
Some tests want a specific host_id, so we add it to cql_test_config
and use it in cql_test_env.
We can't pass host_id to sstables_manager by value since the manager is
initialized in the database constructor, where the host_id is not loaded yet.
We also prefer not to introduce a dependency on shared_token_metadata,
since in that case we would have to create an artificial
shared_token_metadata in the many tools and tests where sstables_manager
is used. So we pass a function that returns the host_id to the
sstables_manager constructor.
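A minimal sketch of the provider-function pattern described above, using hypothetical stand-ins for the real Scylla types (`host_id`, `sstables_manager`):

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the real Scylla host_id type.
struct host_id {
    std::string uuid;
};

using host_id_provider = std::function<host_id()>;

class sstables_manager_sketch {
    host_id_provider _local_host_id;
public:
    // The manager is constructed before the node's host_id is loaded,
    // so it takes a callable instead of a value.
    explicit sstables_manager_sketch(host_id_provider p)
        : _local_host_id(std::move(p)) {}

    host_id local_host_id() const {
        // Resolved lazily, at the time of use.
        return _local_host_id();
    }
};
```

The callable defers resolution until the host_id is actually needed, by which time it has been loaded.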
Currently, the API call recalculates only the per-node schema version. To
work around issues like #4485 we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and impacts availability.
Use it like this:
curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema
Fixes #15380
Closes #15381
ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the
most expensive header files in the project - being included (mostly
indirectly) in 129 source files, and costing a total of 844 CPU seconds
of compilation.
This patch is an attempt, only *partially* successful, to reduce the
number of times that cql_statement.hh is included. It succeeds in
lowering the number from 129 to 99, but not less :-( One of the biggest
difficulties in reducing it further is that query_processor.hh includes
a lot of templated code, which needs stuff from cql_statement.hh.
The solution should be to un-template the functions in
query_processor.hh and move them from the header to a source file, but
this is beyond the scope of this patch and query_processor.hh appears
problematic in other respects as well.
Unfortunately the compilation speedup by this patch is negligible
(the `du -bc build/dev/**/*.o` metric shows less than 0.01% reduction).
Beyond the fact that this patch only removes 30% of the inclusions of
this header, it appears that most of the source files that no longer
include cql_statement.hh after this patch included many of the other
headers that cql_statement.hh includes anyway, so the saving is minimal.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #15212
This commit changes the interface to
using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>
so that users can get a snapshot of the endpoint_state
that they must not modify in place.
Internally, the gossiper still has the legacy helpers
to manage the endpoint_state.
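A sketch of the snapshot idea, using std::shared_ptr in place of seastar::lw_shared_ptr and a hypothetical simplified endpoint_state:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified endpoint_state.
struct endpoint_state {
    std::string status;
};

// Analogous to: using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>;
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;

// Users receive an immutable snapshot; updates replace the pointer
// rather than mutating the pointed-to state in place.
endpoint_state_ptr make_snapshot(endpoint_state s) {
    return std::make_shared<const endpoint_state>(std::move(s));
}
```

The `const` in the pointee type makes the "must not modify in place" rule a compile-time guarantee for users of the snapshot.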
Fixes scylladb/scylladb#14799
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reserve the result vector based on the known
number of endpoints and then move-construct each entry
rather than copying it.
Also, use references to traverse the application_state_map
rather than copying each entry.
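The reserve-then-move pattern can be sketched as follows, with a hypothetical entry type standing in for the real application-state values:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type standing in for the real versioned values.
struct versioned_value {
    std::string value;
};

std::vector<versioned_value> collect(std::vector<versioned_value>& src) {
    std::vector<versioned_value> out;
    out.reserve(src.size());           // single allocation: size is known
    for (auto& v : src) {
        out.push_back(std::move(v));   // move-construct, no string copy
    }
    return out;
}
```

reserve() avoids repeated reallocation as the vector grows, and the moves avoid copying each string payload.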
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).
With that, gossiper::get_endpoint_states() can be made private.
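The synchronous traversal helper might look roughly like this, with hypothetical simplified types; the const references avoid the copy, at the cost of requiring that the callback not yield:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical simplified types.
struct endpoint_state {
    std::string status;
};
using inet_address = std::string;

class gossiper_sketch {
    std::unordered_map<inet_address, endpoint_state> _endpoint_state_map;
public:
    void add(inet_address ep, endpoint_state st) {
        _endpoint_state_map.emplace(std::move(ep), std::move(st));
    }
    // The callback gets const references, so no copies are made.  This
    // is only safe because the callback must not yield mid-iteration.
    void for_each_endpoint_state(
            const std::function<void(const inet_address&, const endpoint_state&)>& f) const {
        for (const auto& [ep, st] : _endpoint_state_map) {
            f(ep, st);
        }
    }
};
```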
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The main goal of this PR is to stop cdc_generation_service from calling
system_keyspace::bootstrap_complete(). The reason it's there is that
the gen. service doesn't want to handle generations before the node has
joined the ring or after it was decommissioned. The cleanup is done by
bringing back the explicit storage_service->cdc_generation_service
dependency, which in turn freed the raft and API code from the need to
carry a cdc gen. service reference around.
Closes #15047
* github.com:scylladb/scylladb:
cdc: Remove bootstrap state assertion from after_join()
cdc: Rework gen. service check for bootstrap state
api: Don't carry cdc gen. service over
storage_service: Use local cdc gen. service in join_cluster()
storage_service: Remove cdc gen. service from raft_state_monitor_fiber()
raft: Do not carry cdc gen. service over
storage_service: Use local cdc gen. service in topo calls
storage_service: Bring cdc_generation_service dependency back
There's a storage_service/cdc_streams_check_and_repair endpoint that
needs a cdc gen. service to call a storage_service method on. Now
the latter has its own reference to the former, and the API can stop
taking care of that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.
Fixes scylladb/scylladb#15088
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently the gossiper marks endpoint_state objects as alive/dead.
In some cases the endpoint_state::is_alive function is checked, but in
many other cases gossiper::is_alive(endpoint) is used to determine if
the endpoint is alive.
This series removes the endpoint_state::is_alive state and moves all the
logic to gossiper::is_alive, which bases its decision on the endpoint
having an endpoint_state and being in the _live_endpoints set.
For that, _live_endpoints is replicated to all shards whenever it
changes, endpoint_state changes are serialized under lock_endpoint, and
the endpoint_state in the _endpoint_states_map is never updated in
place; instead, a temporary copy is changed and then safely replicated
using gossiper::replicate.
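The new liveness rule can be sketched like this (hypothetical simplified types, single-shard view):

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical simplified sketch of the new liveness rule.
struct endpoint_state {};
using inet_address = std::string;

struct gossiper_sketch {
    std::unordered_map<inet_address, endpoint_state> endpoint_states;
    std::unordered_set<inet_address> live_endpoints;

    // An endpoint is alive iff it has an endpoint_state and is present
    // in the live set; no per-state _is_alive flag is consulted.
    bool is_alive(const inet_address& ep) const {
        return endpoint_states.count(ep) && live_endpoints.count(ep);
    }
};
```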
Refs https://github.com/scylladb/scylladb/issues/14794
Closes #14801
* github.com:scylladb/scylladb:
gossiper: mark_alive: remove local_state param
endpoint_state: get rid of _is_alive member and methods
gossiper: is_alive: use _live_endpoints
gossiper: evict_from_membership: erase endpoint from _live_endpoints
gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change
gossiper: run: no need to replicate live_endpoints
gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change
gossiper: add mutate_live_and_unreachable_endpoints
gossiper: reset_endpoint_state_map: clear also shadow endpoint sets
gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards
gossiper: functions that change _live_endpoints must be called on shard 0
gossiper: add lock_endpoint_update_semaphore
gossiper: make _live_endpoints an unordered_set
endpoint_state: use gossiper::is_alive externally
And verify that the returned host_id isn't null.
Call on_internal_error_noexcept in that case
since all token owners are expected to have their
host_id set. Aborting in testing would help fix
issues in this area.
Fixes scylladb/scylladb#14843
Refs scylladb/scylladb#14793
Closes #14844
* github.com:scylladb/scylladb:
api: storage_service: improve description of /storage_service/host_id
token_metadata: get_endpoint_to_host_id_map_for_reading: restrict to token owners
Three places handle comma-separated lists similarly:
- ss::remove_node.set(...) in api::set_storage_service,
- storage_service::parse_node_list,
- storage_service::is_repair_based_node_ops_enabled.
In the next commit, the fourth place that needs the same logic
appears -- storage_service::raft_replace. It needs to load
and parse the --ignore-dead-nodes-for-replace param from config.
Moreover, the code in is_repair_based_node_ops_enabled is
different and doesn't seem right. We replace '\"' and '\'' with ' '
but never use the result afterward.
To avoid code duplication and fix is_repair_based_node_ops_enabled,
we introduce the new function utils::split_comma_separated_list.
This change has a small side effect on logging. For example,
ignore_nodes_strs in storage_service::parse_node_list might be
printed in a slightly different form.
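A hedged sketch of what a utils::split_comma_separated_list helper might do: split on ',', trim whitespace and quote characters, and drop empty tokens. The real function's behavior may differ in detail.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Sketch only; the actual utils::split_comma_separated_list may differ.
std::vector<std::string> split_comma_separated_list(std::string_view in) {
    std::vector<std::string> out;
    std::string cur;
    auto flush = [&] {
        // Trim whitespace and quote characters from both ends;
        // drop tokens that are empty after trimming.
        size_t b = cur.find_first_not_of(" \t'\"");
        size_t e = cur.find_last_not_of(" \t'\"");
        if (b != std::string::npos) {
            out.push_back(cur.substr(b, e - b + 1));
        }
        cur.clear();
    };
    for (char c : in) {
        if (c == ',') {
            flush();
        } else {
            cur.push_back(c);
        }
    }
    flush();
    return out;
}
```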
Before returning the task status, wait_task waits for the task to
finish with the done() method and calls get() on the resulting future.
If the requested task fails, an exception is thrown and the user gets
an internal server error instead of a failed task status.
Now the result of the done() method is ignored instead.
Fixes: #14914.
Closes #14915
This patch adds the ranges_parallelism option to the repair restful API.
Users can use this option to lower the number of ranges repaired in parallel per repair job below the default max_repair_ranges_in_parallel calculated by the Scylla core.
Scylla manager can also use this option to provide more ranges (>N) in a single repair job while repairing only N ranges in parallel, instead of providing N ranges per repair job.
To make it safer, unlike PR #4848, this patch does not allow the user to exceed max_repair_ranges_in_parallel.
Fixes #4847
Closes #14886
* github.com:scylladb/scylladb:
repair: Add ranges_parallelism option
repair: Change to use coroutine in do_repair_ranges
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.
We make sure that all notifications (except for `before_change`, which
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.
An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.
Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471
Closes #14845
* github.com:scylladb/scylladb:
gossiper: replicate: ensure non-null permit
gossiper: add_saved_endpoint: lock_endpoint
gossiper: mark_as_shutdown: lock_endpoint
gossiper: real_mark_alive: lock_endpoint
gossiper: advertise_token_removed: lock_endpoint
gossiper: do_status_check: lock_endpoint
gossiper: remove_endpoint: lock_endpoint if needed
gossiper: force_remove_endpoint: lock_endpoint if needed
storage_service: lock_endpoint when removing node
gossiper: use permit_id to serialize state changes while preventing deadlocks
gossiper: lock_endpoint: add debug messages
utils: UUID: make default tagged_uuid ctor constexpr
gossiper: lock_endpoint must be called on shard 0
gossiper: replicate: simplify interface
gossiper: mark_as_shutdown: make private
gossiper: convict: make private
gossiper: mark_as_shutdown: do not call convict
Pass permit_id to subscribers when we acquire one
via lock_endpoint. The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.
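The permit mechanism can be sketched as a single-shard analogue (all names here are hypothetical; the real gossiper code is asynchronous and per-endpoint):

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

using permit_id = std::uint64_t;

// true in `owning` means this call actually acquired the lock.
struct permit {
    permit_id id;
    bool owning;
};

// Sketch of permit-based re-entry detection: a caller that already
// holds the endpoint lock passes its permit back, and the lock is
// skipped instead of deadlocking.
class endpoint_lock_sketch {
    std::mutex _lock;
    std::optional<permit_id> _holder;
    permit_id _next = 1;
public:
    permit lock(std::optional<permit_id> pid) {
        if (pid && _holder == pid) {
            return {*pid, false};  // nested call: already locked by this permit
        }
        _lock.lock();
        _holder = _next++;
        return {*_holder, true};
    }
    void unlock(const permit& p) {
        if (p.owning && _holder == p.id) {
            _holder.reset();
            _lock.unlock();
        }
    }
};
```

Only the outermost, owning permit releases the lock; nested acquisitions with the same permit are no-ops in both directions.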
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch adds the ranges_parallelism option to the repair restful API.
Users can use this option to lower the number of ranges repaired in
parallel per repair job below the default max_repair_ranges_in_parallel
calculated by the Scylla core.
Scylla manager can also use this option to provide more ranges (>N) in
a single repair job while repairing only N ranges in parallel,
instead of providing N ranges per repair job.
To make it safer, unlike PR #4848, this patch does not allow the user to
exceed max_repair_ranges_in_parallel.
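The clamping rule amounts to this (hypothetical helper name; the real plumbing lives in the repair code):

```cpp
#include <algorithm>
#include <optional>

// A user-supplied ranges_parallelism may lower, but never raise, the
// computed maximum.
int effective_ranges_in_parallel(int max_repair_ranges_in_parallel,
                                 std::optional<int> ranges_parallelism) {
    if (!ranges_parallelism) {
        return max_repair_ranges_in_parallel;
    }
    return std::min(*ranges_parallelism, max_repair_ranges_in_parallel);
}
```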
Fixes#4847
As preparation for ensuring access safety for the column-family-related
maps, add tables_metadata, whose members are protected by a rwlock.
metrics.json was added in d694a42745, and `configure.py` was updated
accordingly. this change mirrors that update in the CMake build system.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14753
This patch adds a metrics API implementation.
The API supports getting and setting the metrics relabel config.
Seastar supports metrics relabeling at runtime, following the Prometheus
relabel_config.
Based on metric and label names, a user can add or remove labels,
disable a metric, and set the skip_when_empty flag.
The metrics-config API allows such configuration to be done via the
RESTful API.
As it's a new API, it is placed under the v2 path.
After this patch the following API will be available:
'http://localhost:10000/v2/metrics-config/' GET/POST.
For example:
To get the current config:
```
curl -X GET --header 'Accept: application/json' 'http://localhost:10000/v2/metrics-config/'
```
To set a config:
```
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '[
  {
    "source_labels": [
      "__name__"
    ],
    "action": "replace",
    "target_label": "level",
    "replacement": "1",
    "regex": "io_que.*"
  }
]' 'http://localhost:10000/v2/metrics-config/'
```
Until now, only the configuration API was part of the V2 API.
Now, when other APIs are added, it is possible that another API will be
the first to register. The first API to register is different in the
sense that it is not preceded by a ','.
This patch adds an option to mark whether the config API is the first.
This patch changes the base path of the V2 of the API to be '/'. That
means that the v2 prefix will be part of the path definition.
Currently, it only affects the config API, which is created from code.
The motivation for the change is for Swagger definitions that are read
from a file. Currently, when using swagger-ui with a doc path set
to http://localhost:10000/v2 and reading the Swagger from a file,
swagger-ui will concatenate the path and look for
http://localhost:10000/v2/v2/{path}.
Instead, the base path is now '/' and the /v2 prefix will be added by
each endpoint definition.
From the user perspective, there is no change in current functionality.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
In get_sstables_for_key in api/column_family.cc, a set of lw_shared_ptrs
to sstables is passed to the reducer of map_reduce0. The reducer then
accesses these shared pointers. As the reducer is invoked on the same
shard that map_reduce0 is called on, we have an illegal access to a
shared pointer on a non-owner cpu.
Now the set of shared pointers to sstables is transformed in the map
function, which is guaranteed to be invoked on a shard associated with
the service.
Fixes: #14515.
Closes #14532
Return 400 Bad Request instead of 500 Internal Server Error
when user requests task or module which does not exist through
task manager and task manager test apis.
Closes #14166
* github.com:scylladb/scylladb:
test: add test checking response status when requested module does not exist
api: fix indentation
api: throw bad_param_exception when requested task/module does not exists
The call is generic enough not to drop the sstable itself on return so
that callers can do whatever they need with it. The only caller today
is the API, which converts sstables to filenames on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the task manager and task manager test REST APIs, when a task or
module which does not exist is requested, we get Internal Server Error.
In such cases, wrap the thrown exceptions in bad_param_exception
to respond with a Bad Request code.
Modify the test accordingly.
Task manager compaction tasks need table names for logs.
Thus, compaction tasks store table infos instead of table ids.
The get_table_ids function is deleted, as it isn't used anywhere.
Some assorted cleanups here: consolidation of schema agreement waiting
into a single place and removing unused code from the gossiper.
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/
Reviewed-by: Konstantin Osipov <kostja@scylladb.com>
* gleb/gossiper-cleanups of github.com:scylladb/scylla-dev:
storage_service: avoid unneeded copies in on_change
storage_service: remove check that is always true
storage_service: rename handle_state_removing to handle_state_removed
storage_service: avoid string copy
storage_service: delete code that handled REMOVING_TOKENS state
gossiper: remove code related to advertising REMOVING_TOKEN state
migration_manager: add wait_for_schema_agreement() function
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `api::table_info` without the help of `operator<<`.
but the corresponding `operator<<()` is preserved in this change, as we
still have lots of callers relying on this << operator in
storage_service.cc, where std::vector<table_info> is formatted using
operator<<(ostream&, const Range&) defined in to_string.hh. we could
have used fmt/ranges.h to print the std::vector<table_info>, but the
combination of operator<<(ostream&, const Range&)
and FMT_DEPRECATED_OSTREAM renders this impossible, because
unlike the builtin range formatter specializations, the fallback formatter
synthesized from the operator<< does not have brackets defined for
the range printer. the brackets are used as the left and right marks
of the range, for instance, the array-alike containers are printed
like [1,2,3], while the tuple-alike containers are printed like
(1,2,3). once we are allowed to remove FMT_DEPRECATED_OSTREAM, we
should be able to use the builtin range formatter, and remove the
operator<< for api::table_info by then.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13975
`check_and_repair_cdc_streams` is an existing API which you can use when the
current CDC generation is suboptimal, e.g. after you decommissioned a node the
current generation has more stream IDs than you need. In that case you can do
`nodetool checkAndRepairCdcStreams` to create a new generation with fewer
streams.
It also works when you change the number of shards on some node. We don't
automatically introduce a new generation in that case but you can use
`checkAndRepairCdcStreams` to create a new generation with restored
shard-colocation.
This PR implements the API on top of raft topology, it was originally
implemented using gossiper. It uses the `commit_cdc_generation` topology
transition state and a new `publish_cdc_generation` state to create new CDC
generations in a cluster without any nodes changing their `node_state`s in the
process.
Closes #13683
* github.com:scylladb/scylladb:
docs: update topology-over-raft.md
test: topology_experimental_raft: test `check_and_repair_cdc` API
raft topology: implement `check_and_repair_cdc_streams` API
raft topology: implement global request handling
raft topology: introduce `prepare_new_cdc_generation_data`
raft_topology: `get_node_to_work_on_opt`: return guard if no node found
raft topology: remove `node_to_work_on` from `commit_cdc_generation` transition
raft topology: separate `publish_cdc_generation` state
raft topology: non-node-specific `exec_global_command`
raft topology: introduce `start_operation()`
raft topology: non-node-specific `topology_mutation_builder`
topology_state_machine: introduce `global_topology_request`
topology_state_machine: use `uint16_t` for `enum_class`es
raft topology: make `new_cdc_generation_data_uuid` topology-global
in this change, the type of the "generation" field of "sstable" in the
return value of RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".
this change depends on the corresponding change in the tools/jmx
submodule, so we have to include the submodule change in this very commit.
this API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" mBean
operation, which returns the retrieved SSTable info as a list of
CompositeData. and "generation" is a field of an element in the
CompositeData. in general, the scylla JMX exporter is consumed
by the nodetool, which prints out returned SSTable info list with
a pretty formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
the nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, nor does it enforce the type -- it just
tries its best to pretty-print them as a table.
But the fields in CompositeData are typed: when the scylla JMX exporter
translates the SSTables returned from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So, we should be consistent on the type of "generation" field on both
the JMX and the RESTful API sides. because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
the dependencies on exactly same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13834
Add task manager tasks covering scrub compaction at the top,
shard, and table levels.
For these levels we have a common scrub task for each scrub
mode, since they share code. Scrub modes will be differentiated
at the compaction group level.
Closes #13694
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test scrub compaction
compaction: add table_scrub_sstables_compaction_task_impl
compaction: add shard_scrub_sstables_compaction_task_impl
compaction: add scrub_sstables_compaction_task_impl
api: get rid of unnecessary std::optional in scrub
compaction: rename rewrite_sstables_compaction_task_impl