Tenant names starting with `$` are reserved for internal ones.
Forbid creating a new service level whose name starts with `$`,
and log a warning for existing service levels with a `$` prefix.
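A minimal sketch of the implied check (the helper name and exception type are assumptions, not the actual Scylla code):

```cpp
#include <stdexcept>
#include <string_view>

// Hypothetical validation helper; the real check likely lives in the CQL
// statement that creates service levels.
void validate_service_level_name(std::string_view name) {
    if (!name.empty() && name.front() == '$') {
        // '$'-prefixed names are reserved for internal service levels
        throw std::invalid_argument("service level names starting with '$' are reserved");
    }
}
```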
(cherry picked from commit d729d1b272)
Closes scylladb/scylladb#20198
Most callers of the raft group0 client interface pass a real
abort source instance, so we can take an abort source reference in the
client interface. This change makes the code simpler and more consistent.
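A hedged before/after sketch of the signature change (names are illustrative):

```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>

struct group0_command; // stand-in for the real command type

// Before: a pointer invites nullptr checks at every use site.
// seastar::future<> add_entry(group0_command cmd, seastar::abort_source* as);

// After: a reference states that callers always have a real abort source.
seastar::future<> add_entry(group0_command cmd, seastar::abort_source& as);
```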
(cherry picked from commit 2dbe9ef2f2)
The on_leave_cluster() callback needs to check if the leaving node is
the local one. It currently compares the endpoint with the my_address()
obtained via a pretty long dependency chain:
auth_service->query_processor->storage_proxy->database->token_metadata.
This patch makes the whole thing _much_ shorter.
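Illustratively (both chains here are a sketch, not the real accessors):

```cpp
// Before: reach my_address() through the long ownership chain:
// auto my_addr = _auth_service.qp().proxy().get_db().local()
//                    .get_token_metadata_ptr()->get_topology().my_address();

// After: compare against an address the subscriber holds directly:
// bool leaving_is_me = (endpoint == _my_address);
```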
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To avoid conflicts arising from the discrepancy between different
versions of the repository, use coroutines instead of continuations
in service_level_controller::notify_service_level_removed().
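A sketch of the style change (the subscriber type and callback are assumptions):

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <vector>

struct subscriber {
    seastar::future<> on_removed(); // hypothetical callback
};

// The coroutine version reads linearly where the continuation version had
// to chain do_for_each()/then() lambdas:
seastar::future<> notify_service_level_removed(std::vector<subscriber*>& subs) {
    for (auto* sub : subs) {
        co_await sub->on_removed();
    }
}
```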
Closes scylladb/scylladb#18525
This pull request introduces host IDs in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.
The changes take into consideration that there might still be old hints on disk, still identified by IPs – at start-up, we map them to host IDs if possible so that they're not lost.
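A rough sketch of that start-up mapping (all names are hypothetical):

```cpp
// For each per-endpoint hint directory named after an IP, try to resolve
// the IP to a host ID; directories we can't resolve are left as-is so the
// hints are not lost.
seastar::future<> migrate_hint_directories(hint_directory_manager& mgr,
                                           const locator::token_metadata& tm) {
    for (auto& [ip, dir] : mgr.ip_named_directories()) {   // hypothetical accessor
        if (auto host_id = tm.get_host_id_if_known(ip)) {  // hypothetical lookup
            co_await mgr.rename_to_host_id(dir, *host_id); // hypothetical rename
        }
    }
}
```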
Refs scylladb/scylladb#6403
Fixes scylladb/scylladb#12278
Closes scylladb/scylladb#15567
* github.com:scylladb/scylladb:
docs: Update Hinted Handoff documentation
db/hints: Add endpoint_downtime_not_bigger_than()
db/hints: Migrate hinted handoff when cluster feature is enabled
db/hints: Handle arbitrary directories in resource manager
db/hints: Start using hint_directory_manager
db/hints: Enforce providing IP in get_ep_manager()
db/hints: Introduce hint_directory_manager
db/hints/resource_manager: Update function description
db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
db/hints: Expose update lock of space watchdog
db/hints: Add function for migrating hint directories to host ID
db/hints: Take both IP and host ID when storing hints
db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
db/hints: Migrate to locator::host_id
db/hints: Remove noexcept in do_send_one_mutation()
service: Add locator::host_id to on_leave_cluster
service: Fix indentation
db/hints: Fix indentation
We extend the function
endpoint_lifecycle_subscriber::on_leave_cluster
with another argument -- locator::host_id.
It's more convenient to have a consistent
pair of IP and host ID.
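Sketched, modulo the surrounding class details:

```cpp
class endpoint_lifecycle_subscriber {
public:
    // Before: only the IP was passed.
    // virtual void on_leave_cluster(const gms::inet_address& endpoint) = 0;

    // After: the matching host ID travels alongside the IP.
    virtual void on_leave_cluster(const gms::inet_address& endpoint,
                                  const locator::host_id& hid) = 0;
};
```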
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to the v2 table.
This supports raft recovery mode, as service levels will be read from the
v2 table in that mode.
Save information about whether the service levels data was migrated to the
v2 table. The information is stored in the `system.scylla_local` table. It's
written with a raft command and included in the raft snapshot.
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during the raft topology upgrade.
The migration process reads data from the old table with CL=ALL
and inserts it into the new table via raft.
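A hedged sketch of that flow (the query helper and command builder are assumptions):

```cpp
seastar::future<> migrate_service_levels(cql3::query_processor& qp,
                                         service::raft_group0_client& g0) {
    // Read every row from the old table at CL=ALL so no replica's data is missed.
    auto rows = co_await qp.execute_internal(
        "SELECT * FROM system_distributed.service_levels",
        db::consistency_level::ALL, {}); // hypothetical overload
    // Re-insert each row into the v2 table through a group0 (raft) command,
    // so the write is linearized with other group0 operations.
    for (const auto& row : *rows) {
        co_await g0.add_entry(make_service_levels_v2_mutation(row)); // hypothetical
    }
}
```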
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.
Using this mechanism also automatically provides retries in case of
concurrent group0 operations.
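The guard-and-retry pattern this enables, simplified (exact signatures are assumptions):

```cpp
seastar::future<> alter_with_retries(service::raft_group0_client& g0,
                                     qos::service_level_controller& slc,
                                     sstring name, qos::service_level_options opts,
                                     seastar::abort_source& as) {
    while (true) {
        auto guard = co_await g0.start_operation(as); // take a group0 guard
        try {
            co_await slc.alter_distributed_service_level(name, opts, std::move(guard));
            co_return;
        } catch (const service::group0_concurrent_modification&) {
            // another group0 operation won the race -- take a fresh guard and retry
        }
    }
}
```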
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace it with seastar::future::get(), which does the same thing.
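The replacement is mechanical:

```cpp
#include <seastar/core/future.hh>

int consume(seastar::future<int> f) {
    // Before: return f.get0();  // tuple-era accessor for the first element
    return f.get();              // today's futures carry one value; get() suffices
}
```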
The service level controller updates itself at an interval. However, the interval is hardcoded in main to 10 seconds, which leads to long sleeps in some of the tests.
This patch moves the value to a `service_levels_interval_ms` command line option and sets it to 0.5s in cql-pytest.
Closes scylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
So far the service levels interval, responsible for updating the SL
configuration, was hardcoded in main.
Now it's extracted to the `service_levels_interval_ms` option.
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy's replica::database
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes some typos found by a codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of those.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level a specific option value comes from.
In that level, no io_priority_class-es exist. Instead, all the IO happens
in the context of the current sched-group. The file API no longer accepts a
prio class argument (and makes the io_intent arg mandatory for impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to the updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical (sketched below):
- replace the io_priority_class& argument with an io_intent* one
- pass the intent to the lower file (if applicable)
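For a single hypothetical override (exact overload sets vary per impl):

```cpp
class my_file_impl : public seastar::file_impl {
    // Before:
    // future<size_t> write_dma(uint64_t pos, const void* buf, size_t len,
    //                          const io_priority_class& pc) override;

    // After: the priority class argument is replaced by an io_intent pointer:
    seastar::future<size_t> write_dma(uint64_t pos, const void* buf, size_t len,
                                      seastar::io_intent* intent) override;
    // ... remaining overrides elided
};
```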
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairy -- it needs to use the task queues
list for IO class names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13963
These warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
messaging_service.hh is a switchboard - it includes many things,
and many things include it. Therefore, changes in the things it
includes affect many translation units.
Reduce the dependencies by forward-declaring as much as possible.
This isn't pretty, but it reduces compile time and recompilations.
Other headers are adjusted as needed so everything (including
`ninja dev-headers`) still compiles.
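The technique in toy form (the member and type are illustrative):

```cpp
// Before: pulling in the full definition makes every includer of
// messaging_service.hh recompile whenever the included header changes:
// #include "gms/inet_address.hh"

// After: a forward declaration is enough while the header mentions the
// type only by reference, pointer, or in function signatures:
namespace gms { class inet_address; }

class messaging_service_sketch {
    void connect(const gms::inet_address& addr); // no definition needed here
};
```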
Closes #10755
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
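So a typical dual-licensed file header shrinks to:

```cpp
/*
 * SPDX-License-Identifier: (AGPL-3.0-or-later and Apache-2.0)
 */
```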
Closes #9937
To emphasize that the function requires `seastar::thread`
context to function properly.
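For illustration (the helper itself is hypothetical):

```cpp
#include <seastar/core/sleep.hh>
#include <seastar/core/thread.hh>
#include <cassert>
#include <chrono>

// The _in_thread suffix signals that this function calls future::get(),
// which blocks, and is therefore only legal inside seastar::async().
void wait_a_bit_in_thread() {
    assert(seastar::thread::running_in_thread());
    seastar::sleep(std::chrono::seconds(1)).get();
}
```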
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
This mini series contains two fixes that are bundled together since the
second one assumes that the first one exists (or it will not fix
anything, really...). The two problems were:
1. When certain operations are called on a service level controller
   which doesn't have its data accessor set, it can lead to a crash
   since some operations will still try to dereference the accessor
   pointer.
2. The cql environment test initialized the accessor with a
   sharded<system_distributed_data>&, however this sharded instance
   itself is not initialized (sharded::start wasn't called), so the
   same calls that were unsafe against a null dereference will now
   crash for trying to access an uninitialized sharded instance.
Closes #9468
* github.com:scylladb/scylla:
CQL test environment: Fix bad initialization order
Service Level Controller: Fix possible dereference of a null pointer
If the service level controller doesn't have its data accessor set,
calls for getting distributed information might dereference this
unset accessor pointer. Here we add code that will return
a result as if there is no data available to the accessor (a behaviour
which is roughly equivalent to a null data accessor).
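In sketch form (types and member names are placeholders):

```cpp
seastar::future<qos::service_levels_info> get_distributed_service_levels() {
    if (!_sl_data_accessor) {
        // No accessor yet: behave as if no data is available instead of
        // dereferencing a null pointer.
        co_return qos::service_levels_info{};
    }
    co_return co_await _sl_data_accessor->get_service_levels();
}
```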
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
In order to ease future extensions to the information being sent
by the service level configuration change API, we pack the additional
parameters (other than the service level options) to the interface in a
structure. This will allow easy expansion in the future if more
parameters need to be sent to the observer.
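Something along these lines (the field set is hypothetical):

```cpp
// Packing the auxiliary parameters into one struct means adding a field
// later doesn't ripple through every subscriber's signature.
struct service_level_info {
    sstring name; // assumed: the service level the notification refers to
};

class configuration_change_subscriber {
public:
    virtual seastar::future<> on_update(const service_level_options& opts,
                                        service_level_info info) = 0;
    virtual ~configuration_change_subscriber() = default;
};
```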
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Before this commit, the service_level_controller would notify
the subscribers on stale deletes, meaning deletes of locally
non-existent service_levels.
The code flow shouldn't ever get to such a state, but as long
as this condition is checked instead of being asserted, it is
worthwhile to change the code to be safe.
Closes #9253
This change adds an API for registering a listener for service_level
configuration changes. It notifies about removal, addition, and change
of service levels.
The hidden assumption is that some listeners are going to create and/or
manage service level specific resources, and this is what guided the
timing of the calls to the subscribers.
Addition and change of a service level are notified before the actual
change takes place; this guarantees that resource creation can take
place before the service level or the new config starts to be used.
The deletion notification is called only after the deletion took place,
which guarantees that the service level can't be active and the
resources created for it can be safely destroyed.
Some .cc files across the code include the storage service
header for no real need. Drop the header and include (in some)
what's really needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This change subscribes service_level_controller to node lifecycle
notifications and uses the notification of the current node leaving
the cluster to stop the configuration polling loop. If the loop
continues to run, its queries will fail consistently since the nodes
will not answer them. It is worth mentioning that the queries
failing in the current state of the code are harmless but noisy: after
90 seconds, if the scylla process is not shut down, the failures will
start to generate failure logs every 90 seconds, which is confusing for
users.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Before this patch, every failure to pull the configuration was
reported as a warning. However, this is confusing for users for two
reasons:
1. It pollutes the logs when the configuration is polled, which is
   Scylla's mode of operation. Such a line is logged on every failed
   iteration.
2. It confuses users because even though this level is warning, it logs
   out an exception and the log message contains the word failed.
We see it a lot during QA runs and in customer questions from the field.
Point 2 is only solvable by reducing the verbosity of the logged
information, which would make debugging harder.
Point 1 is addressed here in the following manner: first, the
one-shot configuration pull function no longer handles the exception
itself. This is OK because it is harmless to fail once or twice in a
row in configuration pulling, like in every other query; the caller is
the one responsible for handling the exception and logging the
information. Second, the polling loop captures the exceptions
thrown from the configuration pulling function and only reports an error
with the latest exception if the polling has failed in consecutive
iterations over the last 90 seconds. This value was chosen because it
is about the empirical worst-case time it takes a node to notice that
one of the other nodes in the cluster is down (and hence stop querying
it).
It is not important for the user or for us to be notified of temporary
glitches in availability (through this error, at least), and since we
are eventually consistent, it is OK for some nodes to catch up with the
configuration later than others.
We also set a threshold at which, if the configuration still couldn't
be retrieved, the logging level is bumped to ERROR.
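The loop's error policy, roughly (members, helpers, and the logger are illustrative):

```cpp
static constexpr auto error_window = std::chrono::seconds(90);

seastar::future<> config_poll_loop() {
    std::optional<std::chrono::steady_clock::time_point> failing_since;
    while (!_as.abort_requested()) {
        try {
            co_await pull_configuration(); // the one-shot pull now propagates errors
            failing_since.reset();
        } catch (...) {
            auto now = std::chrono::steady_clock::now();
            failing_since = failing_since.value_or(now);
            if (now - *failing_since >= error_window) {
                // only complain after 90s of consecutive failures
                logger.error("failed to pull service level configuration: {}",
                             std::current_exception());
            }
        }
        co_await seastar::sleep_abortable(_interval, _as);
    }
}
```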
Closes #8574
Originally, the API for finding a service level returned
its name, which also implied that only a single service level
may be active for a user and provide its options.
After adding timeout parameters, it makes more sense to return a result
which combines multiple service level parameters -- e.g. a user
can be attached to one level for read timeouts and a separate one
for write timeouts.
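Conceptually the lookup becomes (the merge helper is an assumption):

```cpp
service_level_options find_effective_options(const sstring& role) {
    service_level_options effective{};
    for (const auto& sl : service_levels_for_role(role)) { // hypothetical
        // e.g. the read timeout may come from one level and the write
        // timeout from another
        effective = effective.merge_with(sl.options);      // hypothetical merge
    }
    return effective;
}
```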
An out-of-block log print resulted in repeated prints about removal of
the default service level. The period of this print is every time the
configuration is scanned for changes. It happens when the default
service level is one of the last in the map (sorted as in the map).
Fixes #8567
Closes #8576
The configuration change detection is based on a loop that
advances two iterators and compares the two collections
to deduce the configuration change. In order to
correctly deduce the changes, the iteration has to be
in key (service level name) order for both
of the collections. If that doesn't happen, the results are
undefined and in some cases can lead to a crash of the
system. The bug is that the _service_level_db field was
implemented using an unordered_map, which obviously doesn't
guarantee the configuration change detection assumption.
The fix is simply to change the field type to a map
instead of an unordered_map.
Another problem is that when a static service level (i.e.
the default) is at the end of the keys list, it is repeatedly
being deleted -- which doesn't really do anything, since deleting
a static service level just retains its default values,
but it is still wrong.
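Why ordering matters for the diff, in miniature:

```cpp
#include <map>
#include <string>

// The change-detection merge walks both snapshots in key order. With
// std::map each name is classified exactly once; with an unordered_map the
// keys arrive in arbitrary order and the classification below breaks down.
void diff(const std::map<std::string, int>& old_db,
          const std::map<std::string, int>& new_db) {
    auto o = old_db.begin();
    auto n = new_db.begin();
    while (o != old_db.end() || n != new_db.end()) {
        if (n == new_db.end() || (o != old_db.end() && o->first < n->first)) {
            /* key only in old_db: removed */ ++o;
        } else if (o == old_db.end() || n->first < o->first) {
            /* key only in new_db: added */ ++n;
        } else {
            /* same key: possibly changed */ ++o; ++n;
        }
    }
}
```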
Exceptions around the polling loop were not handled properly.
This is an issue because if an unhandled exception
slips out to the configuration polling loop itself, it will break
it. When the configuration polling loop is broken, any further
change to the configuration will not be acted upon on the nodes
where the loop is broken until the node is restarted. The chances
for exceptions are now greater than before, since in one of the
previous commits we started querying the workload prioritization
configuration table with a sensible, shorter timeout.
This change also adds a logger for the workload prioritization
module and some logging, mainly around the configuration polling loop.
Most logs are added at the info level since they are not expected to
happen frequently, but when they do we would like to have some
information by default about what broke the loop.
The service level controller spawns an updating thread,
which wasn't properly waited for during shutdown.
This behavior is now fixed.
In order to make the shutdown order more standardized,
the operation is split into two phases - draining and stopping.
Tests: manual
Fixes #8468
The distributed data updater used to spawn a future without waiting
for it. It was quite safe, since the future had its own abort source,
but it's better to remember it and wait for it during stop() anyway.
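A sketch of the pattern (the loop itself is a placeholder):

```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>

seastar::future<> run_update_loop(seastar::abort_source&); // hypothetical loop

class distributed_data_updater {
    seastar::future<> _update_fut = seastar::make_ready_future<>();
    seastar::abort_source _as;
public:
    void start() {
        _update_fut = run_update_loop(_as); // remember instead of discarding
    }
    seastar::future<> stop() {
        _as.request_abort();
        // joining here guarantees the loop has finished before teardown
        return std::move(_update_fut);
    }
};
```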