scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Botond Dénes	49d6bf8947	Merge 'main: safely check stop_signal in-between starting services' from Benny Halevy To simplify aborting scylla while starting the services, add a _ready state to stop_signal, so that until main is ready to be stopped by the abort_source, just register that the signal is caught, and let a check() method poll that and request abort and throw respective exception only then, in controlled points that are in-between starting of services after the service started successfully and a deferred stop action was installed. This patch prevents gate_closed_exception to escape handling when start-up is aborted early with the stop signal, causing https://github.com/scylladb/scylladb/issues/23153 The regression is apparently due to `a25c3eaa1c` Fixes https://github.com/scylladb/scylladb/issues/23153 * Requires backport to 2025.1 due to `a25c3eaa1c` Closes scylladb/scylladb#23103 * github.com:scylladb/scylladb: main: add checkpoints main: safely check stop_signal in-between starting services main: move prometheus start message main: move per-shard database start message	2025-03-06 08:28:29 +02:00
Benny Halevy	b6705ad48b	main: add checkpoints Before starting significant services that didn't have a corresponding call to supervisor::notify before them. Fixes #23153 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-05 07:29:34 +02:00
Benny Halevy	feef7d3fa1	main: safely check stop_signal in-between starting services To simplify aborting scylla while starting the services, Add a _ready state to stop_signal, so that until main is ready to be stopped by the abort_source, just register that the signal is caught, and let a check() method poll that and request abort and throw respective exception only then, in controlled points that are in-between starting of services after the service started successfully and a deferred stop action was installed. Refs #23153 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-05 07:15:17 +02:00
Benny Halevy	282ff344db	main: move prometheus start message The `prometheus_server` is started only conditionally but the notification message is sent and logged unconditionally. Move it inside the conditional code block. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-05 07:09:09 +02:00
Benny Halevy	23433f593c	main: move per-shard database start message It is now logged out of place, so move it to right before calling `start` on every database shard. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-05 07:09:09 +02:00
Amnon Heiman	fd5d1f1f6a	main.cc: label metrics with basic_level The following metrics will be marked with basic_level label: scylla_scylladb_current_version scylla_reactor_utilization Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-03-03 16:58:39 +02:00
Botond Dénes	5d63ef4d15	Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them. Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points. Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line. Closes scylladb/scylladb#22327 * github.com:scylladb/scylladb: tools: Add standard extensions and propagate to schema load cql_test_env: Use add all extensions instead of inidividually main: Move extensions adding to function tomstone_gc: Make validate work for tools	2025-02-26 13:52:47 +02:00
Tomasz Grabiec	3d01ce3707	config: Make tablets_initial_scale_factor live-updateable	2025-02-19 16:29:08 +01:00
Tomasz Grabiec	7e4a61953d	tablets: load_balancer: Pick initial_scale_factor from config So that it can be live-updated.	2025-02-19 16:29:08 +01:00
Tomasz Grabiec	f1bda8d4c1	tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal The limit is enforced by controlling average per-shard tablet replica count in a given DC, which is controlled by per-table tablet count. This is effective in respecting the limit on individual shards as long as tablet replicas are distributed evenly between shards. There is no attempt to move tablets around in order to enforce limits on individual shards in case of imbalance between shards. If the average per-shard tablet count exceeds the limit, all tables which contribute to it (have replicas in the DC) are scaled down by the same factor. Due to rounding up to the nearest power of 2, we may overshoot the per-shard goal by at most a factor of 2. If different DCs want different scale factors of a given table, the lowest scale factor is chosen for a given table. The limit is configurable. It's a global per-cluster config which controls how many tablet replicas per shard in total we consider to be still ok. It controls tablet allocator behavior, when choosing initial tablet count. Even though it's a per-node config, we don't support different limits per node. All nodes must have the same value of that config. It's similar in that regard to other scheduler config items like tablets_initial_scale_factor and target_tablet_size_in_bytes.	2025-02-19 16:29:07 +01:00
Pavel Emelyanov	5d1f74b86a	main: Start sharded<view_builder> earlier The view_builder service is needed by repair service, but is started after it. It's OK in a sense that repair service holds a sharded reference on it and checks whether local_is_initialized() before using it, which is not nice. Fortunately, starting sharded view builder can be done early enough, because most of its dependencies would be already started by that time. Two exceptions are view_update_generator and system_distributed_keyspace. Both can be moved up too with the same justification. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-02-14 20:26:55 +03:00
Gleb Natapov	d288d79d78	api: initialize token metadata API after starting the gossiper Token metadata API now depends on gossiper to do ip to host id mappings, so initialize it after the gossiper is initialized and de-initialize it before gossiper is stopped. Fixes: scylladb/scylladb#22743 Closes scylladb/scylladb#22760	2025-02-13 14:39:05 +01:00
Botond Dénes	4a7a75dfcb	Merge 'tasks: use host_id in task manager' from Aleksandra Martyniuk Use host_id in a children list of a task in task manager to indicate a node on which the child was created. Move TASKS_CHILDREN_REQUEST to IDL. Send it by host_id. Fixes: https://github.com/scylladb/scylladb/issues/22284. Ip to host_id transition; backport isn't needed. Closes scylladb/scylladb#22487 * github.com:scylladb/scylladb: tasks: drop task_manager::config::broadcast_address as it's unused tasks: replace ip with host_id in task_identity api: task_manager: pass gossiper to api::set_task_manager tasks: keep host_id in task_manager tasks: move tasks_get_children to IDL	2025-02-11 11:32:27 +02:00
Ernest Zaslavsky	dee4fc7150	aws creds: add STS and Instance Metadata service credentials providers This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.	2025-02-05 14:57:19 +02:00
Ernest Zaslavsky	d534051bea	aws creds: add env. and file credentials providers This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.	2025-02-05 14:57:19 +02:00
Aleksandra Martyniuk	fe02555c46	tasks: drop task_manager::config::broadcast_address as it's unused	2025-02-05 10:11:54 +01:00
Aleksandra Martyniuk	0c868870b4	api: task_manager: pass gossiper to api::set_task_manager Pass gossiper to api::set_task_manager. It will be used later for host_id to ip transition.	2025-02-05 10:10:29 +01:00
Aleksandra Martyniuk	4470c2f6d3	tasks: keep host_id in task_manager Keep host_id of a node in task manager. If host_id wasn't resolved yet, task manager will keep an empty id. It's a preparation for the following changes.	2025-02-05 10:10:29 +01:00
Ernest Zaslavsky	c911fc4f34	s3 creds: move credentials out of endpoint config This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.	2025-02-04 16:45:23 +02:00
Botond Dénes	98fdf05b0e	Merge 'Fix repair vs storage services initialization order' from Pavel Emelyanov Repair service is started after storage service, while storage service needs to reference repair one for its needs. Recently it was noticed, that this reverse order may cause troubles and was fixed with the help of an extra gate. That's not nice and makes the start-stop mess even worse. The correct fix is to fix the order both services start/stop in. Closes scylladb/scylladb#22368 * github.com:scylladb/scylladb: Revert "repair: add repair_service gate" main: Start repair before storage service repair: Check for sharded<view-builder> when constructing row_level_repair	2025-01-30 11:26:24 +02:00
Avi Kivity	60cdf62fae	Merge 'Remove sharded<system_distributed_keyspace>& argument from storage_service::join_cluster()' from Pavel Emelyanov There's such a reference on storage_service itself, it can use this->_sys_dist_ks instead thus making its API (both internal and external) a bit simpler. Closes scylladb/scylladb#22483 * github.com:scylladb/scylladb: storage_service: Drop sys_dist_ks argument from track_upgrade_progress_to_topology_coordinator() storage_service: Drop sys_dist_ks argument from raft_state_monitor_fiber() storage_service: Drop sys_dist_ks argument from join_topology() storage_service: Drop sys_dist_ks argument from join_cluster()	2025-01-26 15:56:37 +02:00
Kefu Chai	769162de91	tree: correct misspellings these misspellings were identified by codespell. let's fix them. one of them is a part of a user visible string. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22443	2025-01-26 15:54:06 +02:00
Pavel Emelyanov	ca9b59f3b2	storage_service: Drop sys_dist_ks argument from join_cluster() Storage service has _sys_dist_ks onboard and can just use it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-01-24 12:26:32 +03:00
Pavel Emelyanov	fff5b8adbc	main: Start repair before storage service The latter service uses repair, but not the vice-versa, so the correct (de)initialization order should be the same. refs: #2737 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-01-22 19:21:40 +03:00
Avi Kivity	0092bb5831	Merge 'main: rename `cql_sg_stats` metrics on scheduling group rename' from Piotr Dulikowski This PR contains the missing part of a fix for scylladb/scylla-enterprise#4912 which was omitted during migration of workload prioritization to the source available repository. Even though the regression test for it was ported, it was silently made ineffective by a different fix (scylladb/scylla-enterprise#4764), so this PR also improves the test. Fixes: scylladb/scylladb#22404 No need to backport - service levels are not yet a part of any source-available release. Closes scylladb/scylladb#22416 * github.com:scylladb/scylladb: test/auth_cluster: make test_service_level_metric_name_change useful main: rename `cql_sg_stats` metrics on scheduling group rename	2025-01-22 14:22:09 +02:00
Piotr Dulikowski	de153a2ba7	main: rename `cql_sg_stats` metrics on scheduling group rename This commit contains the part of a fix for scylladb/scylla-enterprise#4912 that was accidentally omitted when workload prioritization was ported from enterprise to scylladb.git repo. Without it, the metrics created by `cql_sg_stats` would not be updated, leading to wrong scheduling group names being used in metrics' names, and could lead to "double metric registration errors" in some unlucky circumstances where a scheduling group would be created, destroyed and then created again. Fixes: scylladb/scylladb#22404	2025-01-20 18:16:46 +01:00
Botond Dénes	1f20f7810e	Merge 'main, encryption: correct misspellings' from Kefu Chai in this changeset, some misspellings identified by codespell were corrected. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#22301 * github.com:scylladb/scylladb: ent/encryption: rename "sie" to "get_opt" ent,main: fix misspellings	2025-01-20 16:43:21 +02:00
Piotr Dulikowski	6aa962f5f4	Merge 'Add audit subsystem for database operations' from Paweł Zakrzewski Introduces a comprehensive audit system to track database operations for security and compliance purposes. This change includes: Core Components: - New audit subsystem for logging database operations - Service level integration for proper resource management - CQL statement tracking with operation categories - Login process integration for tenant management Key Features: - Configurable audit logging (syslog/table) - Operation categorization (QUERY/DML/DDL/DCL/AUTH/ADMIN) - Selective auditing by keyspace/table - Password sanitization in audit logs - Service level shares support (1-1000) for workload prioritization - Proper lifecycle management and cleanup I ran the dtests for audit (manually enabled) and they pass. The in-repo tests pass. Notably, there should be no non-whitespace changes between this and scylla-enterprise Fixes scylladb/scylla-enterprise#4999 Closes scylladb/scylladb#22147 * github.com:scylladb/scylladb: audit: Add shares support to service level management audit: Add service level support to CQL login process audit: Add support to CQL statements audit: Integrate audit subsystem into Scylla main process audit: Add documentation for the audit subsystem audit: Add the audit subsystem	2025-01-17 13:14:55 +01:00
Kamil Braun	89ee2a6834	Merge 'drop ip addresses from token metadata' from Gleb Now that all topology related code uses host ids there is no point to maintain ip to id (and back) mappings in the token metadata. After the patch the mapping will be maintained in the gossiper only. The rest of the system will use host ids and in rare cases where translation is needed (mostly for UX compatibility reasons) the translation will be done using gossiper. Fixes: scylladb/scylla#21777 * 'gleb/drop-ip-from-tm-v3' of github.com:scylladb/scylla-dev: (57 commits) hint manager: do not translate ip to id in case hint manager is stopped already locator: token_metadata: drop update_host_id() function that does nothing now locator: topology: drop indexing by ips repair: drop unneeded code storage_service: use host_id to look for a node in on_alive handler storage_proxy: translate ips to ids in forward array using gossiper locator: topology: remove unused functions storage_service: check for outdated ip in on_change notification in the peers table storage_proxy: translate id to ip using address map in tablets's describe_ring code instead of taking one from the topology topology coordinator: change connection dropping code to work on host ids cql3: report host id instead of ip in error during SELECT FROM MUTATION_FRAGMENTS query locator: drop unused function from tablet_effective_replication_map api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead locator: token_metadata: remove unused ip based functions locator: network_topology_strategy: use host_id based function to check number of endpoints in dcs gossiper: drop get_unreachable_token_owners functions storage_service: use gossiper to map ip to id in node_ops operations storage_service: fix indentation after the last patch storage_service: drop loops from node ops replace_prepare handling since there can be only one replacing node token_metadata: drop no longer used functions ...	2025-01-17 11:00:52 +01:00
Gleb Natapov	50fb22c8f9	locator: topology: drop indexing by ips Do not track id to ip mapping in the topology class any longer. There are no remaining users.	2025-01-16 16:37:08 +02:00
Gleb Natapov	122d58b4ad	api: view_build_statuses: do not use IP from the topology, but translate id to ip using address map instead	2025-01-16 16:37:07 +02:00
Gleb Natapov	1b6e1456e5	messaging_service: drop the usage of ip based token_metadata APIs We want to drop ips from token_metadata so move to use host id based counterparts. Messaging service gets a function that maps from ips to id when it starts listening.	2025-01-16 16:37:06 +02:00
Gleb Natapov	4d7c05ad82	hints: move create_hint_sync_point function to host ids One of its callers is in the RESTful API which gets ips from the user, so we convert ips to ids inside the API handler using gossiper before calling the function. We need to deprecate ip based API and move to host id based.	2025-01-15 16:30:28 +02:00
Gleb Natapov	755ee9a2c5	api: do not use token_metadata to retrieve ip to id mapping in token_metadata RESTful endpoints We want to drop ip knowledge from the token_metadata, so use gossiper to retrieve the mapping instead.	2025-01-15 16:30:28 +02:00
Calle Wilund	4aaf3df45e	main: Move extensions adding to function Easily called from elsewhere. The extensions we should always include (oxymoron?)	2025-01-15 12:07:39 +00:00
Paweł Zakrzewski	1810e2e424	audit: Integrate audit subsystem into Scylla main process Adds core integration of the audit subsystem into Scylla's main process flow. Changes include: - Import audit subsystem header - Initialize audit system during server startup using configuration and token metadata - Start audit system after API server initialization with query processor and memory manager - Add proper shutdown sequence for audit system using RAII pattern - Add error handling for audit system initialization failures The audit system is now properly integrated into Scylla's lifecycle, ensuring: - Correct initialization order relative to other subsystems - Proper resource cleanup during shutdown - Graceful error handling for initialization failures	2025-01-15 11:10:36 +01:00
Piotr Dulikowski	72f28ce81e	Merge 'main, view: Pair view builder drain with its start' from Dawid Mędrek In this PR, we pair draining the view builder with its start. To better understand what was done and why, let's first look at the situation before this commit and the context of it: (a) The following things happened in order: 1. The view builder would be constructed. 2. Right after that, a deferred lambda would be created to stop the view builder during shutdown. 3. group0_service would be started. 4. A deferred lambda stopping group0_service would be created right after that. 5. The view builder would be started. (b) Because the view builder depends on group0_client, it couldn't be started before starting group0_service. On the other hand, other services depend on the view builder, e.g. the stream manager. That makes changing the order of initialization a difficult problem, so we want to avoid doing that unless we're sure it's the right choice. (c) Since the view builder uses group0_client, there was a possibility of running into a segmentation fault issue in the following scenario: 1. A call to `view_builder::mark_view_build_success()` is issued. 2. We stop group0_service. 3. `view_builder::mark_view_build_success()` calls `announce_with_raft()`, which leads to a use-after-free because group0_service has already been destroyed. This very scenario took place in scylladb/scylladb#20772. Initially, we decided to solve the issue by initializing group0_service a bit earlier (scylladb/scylladb@7bad8378c7). Unfortunately, it led to other issues described in scylladb/scylladb#21534, so we revert that patch. These changes are the second attempt to the problem where we want to solve it in a safer manner. The solution we came up with is to pair the start of the view builder with a deferred lambda that deinitializes it by calling `view_builder::drain()`. No other component of the system should be able to use the view builder anymore, so it's safe to do that. 
Furthermore, that pairing makes the analysis of initialization/deinitialization order much easier. We also solve the aforementioned use-after-free issue because the view builder itself will no longer attempt to use group0_client. Note that we still pair a deferred lambda calling `view_builder::stop()` with the construction of the view builder; that function will also call `view_builder::drain()`. Another notable thing is `view_builder::drain()` may be called earlier by `storage_service::do_drain()`. In other words, these changes cover the situation when Scylla runs into a problem when starting up. Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too. Fixes scylladb/scylladb#20772 Fixes scylladb/scylladb#21534 Closes scylladb/scylladb#21909 * github.com:scylladb/scylladb: test/topology_custom: Add test for Scylla with disabled view building main, view: Pair view builder drain with its start Revert "main,cql_test_env: start group0_service before view_builder"	2025-01-15 09:50:26 +01:00
Takuya ASADA	f2a53d6a2c	dist: make p11-kit-trust.so able to work in relocatable package Currently, our relocatable package doesn't contain p11-kit-trust.so since it is dynamically loaded and does not show up in "ldd" results (the relocatable packaging script finds dependent libraries with "ldd"). So we need to add it in create-relocatable-package.py. Also, we have two more problems: 1. The p11 module load path is defined as "/usr/lib64/pkcs11", which does not reference /opt/scylladb/libreloc (and RedHat variants use a different path than Debian variants). 2. The ca-trust-source path is configured at build time (on Fedora); it is compatible with RedHat variants but not with Debian variants. To solve these problems, we need to override the default p11-kit configuration. To do so, we need to add a configuration file at /opt/scylladb/share/pkcs11/modules/p11-kit-trust.module. Also, because p11-kit doesn't reference /opt/scylladb by default, we need to override the load path with p11_kit_override_system_files(). In the configuration file, we can specify the module load path with "modules: <path>", and we can also specify the ca-trust-source path with "x-init-reserved: paths=<path>". Fixes scylladb/scylladb#13904 Closes scylladb/scylladb#22302	2025-01-15 10:09:17 +02:00
Kefu Chai	92c6c8a32f	ent,main: fix misspellings these misspellings are identified by codespell. they are either in comment or logging messages. let's fix them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-14 21:08:17 +08:00
Dawid Mędrek	06ce976370	main, view: Pair view builder drain with its start In these changes, we pair draining the view builder with its start. To better understand what was done and why, let's first look at the situation before this commit and the context of it: (a) The following things happened in order: 1. The view builder would be constructed. 2. Right after that, a deferred lambda would be created to stop the view builder during shutdown. 3. group0_service would be started. 4. A deferred lambda stopping group0_service would be created right after that. 5. The view builder would be started. (b) Because the view builder depends on group0_client, it couldn't be started before starting group0_service. On the other hand, other services depend on the view builder, e.g. the stream manager. That makes changing the order of initialization a difficult problem, so we want to avoid doing that unless we're sure it's the right choice. (c) Since the view builder uses group0_client, there was a possibility of running into a segmentation fault issue in the following scenario: 1. A call to `view_builder::mark_view_build_success()` is issued. 2. We stop group0_service. 3. `view_builder::mark_view_build_success()` calls `announce_with_raft()`, which leads to a use-after-free because group0_service has already been destroyed. This very scenario took place in scylladb/scylladb#20772. Initially, we decided to solve the issue by initializing group0_service a bit earlier (scylladb/scylladb@7bad8378c7). Unfortunately, it led to other issues described in scylladb/scylladb#21534. We reverted that change in the previous commit. These changes are the second attempt to the problem where we want to solve it in a safer manner. The solution we came up with is to pair the start of the view builder with a deferred lambda that deinitializes it by calling `view_builder::drain()`. No other component of the system should be able to use the view builder anymore, so it's safe to do that. 
Furthermore, that pairing makes the analysis of initialization/deinitialization order much easier. We also solve the aforementioned use-after-free issue because the view builder itself will no longer attempt to use group0_client. Note that we still pair a deferred lambda calling `view_builder::stop()` with the construction of the view builder; that function will also call `view_builder::drain()`. Another notable thing is `view_builder::drain()` may be called earlier by `storage_service::do_drain()`. In other words, these changes cover the situation when Scylla runs into a problem when starting up. Fixes scylladb/scylladb#20772	2025-01-13 00:41:22 +01:00
Dawid Mędrek	a5715086a4	Revert "main,cql_test_env: start group0_service before view_builder" The patch solved a problem related to an initialization order (scylladb/scylladb#20772), but we ran into another one: scylladb/scylladb#21534. After moving the initialization of group0_service, it ended up being destroyed AFTER the CDC generation service would. Since CDC generations are accessed in `storage_service::topology_state_load()`: ``` for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) { rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id); co_await _cdc_gens.local().handle_cdc_generation(gen_id); ``` we started getting the following failure: ``` Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed. ``` We're reverting the patch to go back to a more stable version of Scylla and in the following commit, we'll solve the original issue in a more systematic way. This reverts commit `7bad8378c7`.	2025-01-12 18:13:56 +01:00
Avi Kivity	814942505f	Merge 'Introduce Encryption-at-Rest (EAR) for sstables and commitlog' from Calle Wilund Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631 EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data. Introduces OpenSSL based file level encrypted storage, managed via a set of providers ranging from local files to cloud KMS providers. For a more comprehensive explanation, see the included docs (or if possible, original source tree). Manual bulk merge of EAR feature from enterprise repo to main scylla repo. Breaks some features apart, but main EAR is still a humongous commit, because to separate this I would have to mess with code incrementally, adding time and risk. This PR includes the local file gen tool, tests and also p11 validation. Note: CI will not execute the full tests unless master CI is set to provide the same environment as the enterprise one. Not sure about the status of this ATM. Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality will be enabled, otherwise not. Closes scylladb/scylladb#22233 * github.com:scylladb/scylladb: docs: Add EAR docs main/build: Add p11-kit and initialize tools: Add local-file-key-generator tool tests: Add EAR tests tmpdir: shorten test tempdir path EAR: port the ear feature from enterprise cql_test_env: Add optional query timeout schema/migration_manager: Add schema validate sstables: add get_shared_components accessor config/config_file: Add exports and definitions of config_type_for<>	2025-01-12 16:10:46 +02:00
Benny Halevy	8d2ff8a915	utils: add disk_space_monitor Instantiated only on shard 0. Currently, only subscribe from unit test Manual unit test using loop mount was added. Note that the test requires sudo access and root access to /dev/loop, so it cannot run in rootless podman instance, and it'd fail with Permission denied. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21523	2025-01-12 14:51:15 +02:00
Calle Wilund	083f735366	main/build: Add p11-kit and initialize For p11 certification/validation	2025-01-09 10:40:47 +00:00
Calle Wilund	f901beec87	tools: Add local-file-key-generator tool For generating key files for local provider	2025-01-09 10:40:47 +00:00
Kefu Chai	e4463b11af	treewide: replace boost::algorithm::join() with fmt::join() Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve performance and reduce dependency on Boost. `fmt::join()` allows direct formatting of ranges and tuples with custom separators without creating intermediate strings. When formatting comma-separated values into another string, fmt::join() avoids the overhead of temporary string creation that `boost::algorithm::join()` requires. This change also helps streamline our dependencies by leveraging the existing fmt library instead of Boost.Algorithm. To avoid the ambiguity, some caller sites were updated to call `seastar::format()` explicitly. See also - boost::algorithm::join(): https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp - fmt::join(): https://fmt.dev/11.0/api/#ranges-api Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22082	2025-01-07 12:45:05 +02:00
Piotr Dulikowski	49f5fc0e70	api: introduce service levels specific API Introduces two endpoints with operations specific to service levels: - switch_tenants: updates the scheduling group of all connections to be aligned with the service level specific to the logged in user. This is mostly legacy API, as with service levels on raft this is done automatically. - count_connections: for each user and for each scheduling group, counts how many connections are assigned to that user and scheduling group. This API is used in tests.	2025-01-02 07:13:34 +01:00
Piotr Dulikowski	f1b9737e07	messaging_service: use separate set of connections per service levels In order to make sure that the scheduling group carries over RPC, and also to prevent priority inversion issues between different service levels, modify the messaging service to use separate RPC connections for each service level in order to serve user traffic. The above is achieved by reusing the existing concept of "tenants" in messaging service: when a new service level (or, more accurately, service-level specific scheduling group) is first used in an RPC, a new tenant is created. In addition, extend the service level controller to be able to quickly look up the service level name of the currently active scheduling group in order to speed up the logic for choosing the tenant.	2025-01-02 07:13:34 +01:00
Piotr Dulikowski	7383013f43	replica/database: add reader concurrency semaphore groups Replace the reader concurrency semaphores for user reads and view updates with the newly introduced reader concurrency semaphore group, which assigns a semaphore for each service level. Each group is statically assigned to some pool of memory on startup and dynamically distributes this memory between the semaphores, relative to the number of shares of the corresponding scheduling group. The intent of having a separate reader concurrency semaphore for each scheduling group is to prevent priority inversion issues due to reads with different priorities waiting on the same semaphore, as well as make memory allocation more fair between service levels due to the adjusted number of shares.	2025-01-02 07:13:34 +01:00
Piotr Dulikowski	4cfd26efaf	qos: manage and assign scheduling groups to service levels Introduce the core logic of workload prioritization, responsible for assigning scheduling groups to service levels. The service level controller maintains a pool of scheduling groups for the currently present service levels, as well as a pool of unused scheduling groups which were previously used by some service level that was deleted during node's lifetime. When a new service level is created, the SL controller either assigns a scheduling group from the unused SG pool, or creates a new one if the pool is empty. The scheduling group is renamed to "sl:<scheduling group name>". When updating shares of a service level (and also when creating a new service level), the shares of the corresponding scheduling group are synchronized with those of the service level. When a service level is deleted, its group is released to the aforementioned pool of unused scheduling groups and the prefix of its name is changed from "sl:" to "sl_deleted:". For now, these scheduling groups are not used by any user operations. This will be changed in subsequent commits.	2025-01-02 07:13:34 +01:00
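
The scheduling group pooling described in commit 4cfd26efaf above can be sketched as follows. This is a minimal, standalone illustration and not the Scylla implementation: `sched_group` is a plain struct standing in for a Seastar scheduling group, `sl_controller_sketch` and all of its members are hypothetical names, and using the service level name after the "sl:" and "sl_deleted:" prefixes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for a scheduling group: only a name and a share count.
struct sched_group {
    std::string name;
    int shares = 0;
};

// Hypothetical controller: keeps active groups and a pool of released ones.
class sl_controller_sketch {
    std::vector<sched_group> _groups;             // every group ever created
    std::map<std::string, std::size_t> _active;   // service level -> group index
    std::vector<std::size_t> _unused;             // groups of deleted service levels
public:
    // Create a service level (reusing a pooled group if possible) or update
    // its shares; the group's shares always follow the service level's.
    void create_or_update(const std::string& sl, int shares) {
        assert(shares > 0);
        auto it = _active.find(sl);
        if (it == _active.end()) {
            std::size_t idx;
            if (!_unused.empty()) {
                idx = _unused.back();             // reuse a released group
                _unused.pop_back();
            } else {
                idx = _groups.size();             // pool is empty: create a new one
                _groups.push_back({});
            }
            _groups[idx].name = "sl:" + sl;
            it = _active.emplace(sl, idx).first;
        }
        _groups[it->second].shares = shares;
    }

    // Delete a service level: rename its group and return it to the pool.
    void remove(const std::string& sl) {
        auto it = _active.find(sl);
        if (it == _active.end()) {
            return;
        }
        _groups[it->second].name = "sl_deleted:" + sl;
        _unused.push_back(it->second);
        _active.erase(it);
    }

    const sched_group& group_of(const std::string& sl) const {
        return _groups.at(_active.at(sl));
    }
    std::size_t groups_created() const { return _groups.size(); }
    std::size_t unused_groups() const { return _unused.size(); }
};
```

Creating two service levels, deleting one, and creating a third leaves only two groups in existence, because the third service level picks up the released group instead of creating a new one.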

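
The per-shard tablet count goal from commit f1bda8d4c1 above can be worked through numerically. The sketch below is an assumption-laden illustration for a single DC, not the load balancer's code: `table_load`, `ceil_pow2`, `avg_replicas_per_shard`, and `scaled_tablet_counts` are hypothetical names, and the choice of the lowest factor across several DCs is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-table input for one DC.
struct table_load {
    std::uint64_t tablet_count;   // tablet count the table would otherwise have
    unsigned replicas_in_dc;      // replicas of each tablet kept in this DC
};

// Smallest power of two that is >= n.
inline std::uint64_t ceil_pow2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Average number of tablet replicas per shard in the DC.
inline double avg_replicas_per_shard(const std::vector<table_load>& tables,
                                     unsigned shards_in_dc) {
    assert(shards_in_dc > 0);
    double total = 0;
    for (const auto& t : tables) {
        total += double(t.tablet_count) * t.replicas_in_dc;
    }
    return total / shards_in_dc;
}

// If the average exceeds the goal, scale every table by the same factor,
// then round each result up to a power of two.
inline std::vector<std::uint64_t> scaled_tablet_counts(
        const std::vector<table_load>& tables,
        unsigned shards_in_dc, double per_shard_goal) {
    const double avg = avg_replicas_per_shard(tables, shards_in_dc);
    const double factor = avg > per_shard_goal ? per_shard_goal / avg : 1.0;
    std::vector<std::uint64_t> out;
    for (const auto& t : tables) {
        auto scaled = std::uint64_t(std::ceil(double(t.tablet_count) * factor));
        out.push_back(ceil_pow2(std::max<std::uint64_t>(scaled, 1)));
    }
    return out;
}
```

With two tables of 256 and 64 tablets, 3 replicas each, and 8 shards, the average is 120 replicas per shard. A goal of 60 halves both tables to 128 and 32. A goal of 100 gives a factor of about 0.83, but rounding up to a power of two restores 256 and 64, so the average stays at 120: an overshoot that is still within the factor of 2 the commit message mentions.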

1422 Commits
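
The `stop_signal` behaviour described in commits 49d6bf8947 and feef7d3fa1 (record the signal until main is ready, then abort only at explicit checkpoints) can be sketched as below. This is a single-threaded illustration under stated assumptions: a plain boolean stands in for the `abort_source`, and `stop_signal_sketch`, `stop_requested`, `on_signal()`, and the accessor names are hypothetical; only the `_ready` state and the `check()` method are named in the commit messages.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception thrown at a checkpoint once a stop signal was caught.
struct stop_requested : std::runtime_error {
    stop_requested() : std::runtime_error("stop signal caught during startup") {}
};

class stop_signal_sketch {
    bool _caught = false;           // a stop signal has arrived
    bool _ready = false;            // main can be stopped through the abort source
    bool _abort_requested = false;  // stand-in for requesting abort on an abort_source
public:
    // Called when the signal arrives. Before ready(), only record it.
    void on_signal() {
        _caught = true;
        if (_ready) {
            _abort_requested = true;
        }
    }

    // Checkpoint, meant to be called in between starting services, after a
    // service started and its deferred stop action was installed.
    void check() {
        if (_caught) {
            _abort_requested = true;
            throw stop_requested();
        }
    }

    // From here on, a signal requests abort immediately.
    void ready() {
        _ready = true;
        check();
    }

    bool caught() const { return _caught; }
    bool abort_requested() const { return _abort_requested; }
};
```

Because the exception is thrown only from `check()`, it unwinds through already-installed deferred stop actions in a controlled order, which is the property the merge description relies on to keep `gate_closed_exception` from escaping during an early abort.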