scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Avi Kivity	0a5d9532f9	Merge 'Sanitize batchlog manager start/stop' from Pavel Emelyanov This code is now spread over main and differs in cql_test_env. The PR unifies both places and makes the manager start-stop look standard refs: #2795 Closes #15375 * github.com:scylladb/scylladb: batchlog_manager: Remove start() method batchlog_manager: Start replay loop in constructor main, cql_test_env: Start-stop batchlog manager in one "block" batchlog_manager: Move shard-0 check into batchlog_replay_loop() batchlog_manager: Fix drain() reentrability	2023-09-13 18:20:56 +03:00
Kamil Braun	a184b07cbb	Merge 'raft topology: make CDC_GENERATIONS_V3 single-partition, timeuuid-sorted' from Patryk Jędrzejczak We make the `CDC_GENERATIONS_V3` table single-partition and change the clustering key from `range_end` to `(id, range_end)`. We also change the type of `id` to `timeuuid` and ensure that a new generation always has the highest `id`. These changes allow efficient clearing of obsolete CDC generation data, which we need to prevent Raft-topology snapshots from endlessly growing as we introduce new generations over time. All this code is protected by an experimental feature flag. It includes the definition of `CDC_GENERATIONS_V3`. The table is not created unless the feature flag is enabled. Fixes #15163 Closes #15319 * github.com:scylladb/scylladb: system_keyspace: rename cdc_generation_id_v2 system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3 cdc: generation: remove topology_description_generator cdc: do not create uuid in make_new_generation_data system_kayspace: make CDC_GENERATIONS_V3 single-partition cdc: generation: introduce get_common_cdc_generation_mutations cdc: generation: rename get_cdc_generation_mutations	2023-09-13 12:54:49 +02:00
Pavel Emelyanov	d48aff5789	batchlog_manager: Remove start() method It's now a no-op, can be dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:37:52 +03:00
Pavel Emelyanov	3966a50ed4	batchlog_manager: Start replay loop in constructor ... and sanitize the future used on stop. The loop in question is now started in .start(), but all callers now construct the manager late enough, so the loop spawning can be moved. This also calls for renaming the future member of the class and allows to make it regular, not shared, future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:35:53 +03:00
Pavel Emelyanov	9f45778467	batchlog_manager: Move shard-0 check into batchlog_replay_loop() Currently the only caller of it is the batchlog manager itself. It checks for the shard-id to be zero, calls the method, then the method asserts that it's run on shard-0. Moving the check into the method removes the need for assertion and makes further patching simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:32:12 +03:00
Pavel Emelyanov	38d0ea0916	batchlog_manager: Fix drain() reentrability Currently drain() is called twise -- first time from storage_service::drain() (on shutdown), second via batchlog_manager::stop(). The routine is unintentinally re-entrable, because: - explicit check for not aborting the abort source twise - breaking semaphore can be done multiple times - co-await-ing of the _started future works because the future is shared That's not extremely elegant, better to make the drain() bail out early if it was already called. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-12 16:30:07 +03:00
Patryk Jędrzejczak	92209996b5	system_keyspace: rename cdc_generation_id_v2 Changing the second value of cdc_generation_id_v2 from uuid_type to timeuuid_type made the name of cdc_generation_id_v2 unsuitable because it does not match cdc::generation_id_v2 anymore.	2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak	1c58c6336a	system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3 We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to give them a time-based order. We also change how we initialize them so that the new CDC generation always has the highest ID. This is the last step to enabling the efficient clearing of obsolete CDC generation data. Additionally, we change the types of current_cdc_generation_uuid, new_cdc_generation_data_uuid and the second values of the elements in unpublished_cdc_generations to timeuuid, so that they match id in CDC_GENERATIONS_V3.	2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak	2cd430ac80	system_kayspace: make CDC_GENERATIONS_V3 single-partition We make CDC_GENERATIONS_V3 single-partition by adding the key column and changing the clustering key from range_end to (id, range_end). This is the first step to enabling the efficient clearing of obsolete CDC generation data, which we need to prevent Raft-topology snapshots from endlessly growing as we introduce new generations over time. The next step is to change the type of the id column to timeuuid. We do it in the following commits. After making CDC_GENERATIONS_V3 single-partition, there is no easy way of preserving the num_ranges column. As it is used only for sanity checking, we remove it to simplify the implementation.	2023-09-12 09:51:45 +02:00
Patryk Jędrzejczak	ed1c1369d9	cdc: generation: rename get_cdc_generation_mutations In the following commits, we modify the CDC_GENERATIONS_V3 schema to enable efficient clearing of obsolete CDC generation data. These modifications make the current get_cdc_generation_mutations work only for the CDC_GENERATIONS_V2 schema, and we need a new function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.	2023-09-11 12:30:21 +02:00
Botond Dénes	b062b245ad	Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three. Closes #15280 * github.com:scylladb/scylladb: system_keyspace: Don't require snitch argument on start system_keyspace: Don't cache local dc:rack pair system_keyspace: Save local info with explicit location storage_service: Get endpoint location from snitch, not system keyspace snitch: Introduce and use get_location() method repair: Local location variables instead of system keyspace's one repair: Use full endpoint location instead of datacenter part	2023-09-11 10:26:26 +03:00
Patryk Jędrzejczak	d404443b54	system_keyspace: load unpublished_cdc_generations to topology We extend service::topology with the list of unpublished CDC generations and load its contents from system.topology. This step is the last one in making unpublished CDC generations accessible to the topology coordinator. Note that when we load unpublished_cdc_generations, we don't perform any sanity checks contrary to current_cdc_generation_uuid. Every unpublished CDC generation was a current generation once, and we checked it at that moment.	2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak	5ed9d4db6d	raft topology: add unpublished_cdc_generations to system.topology In the following commits, we replace the topology::transition_state::publish_cdc_generation state with a background fiber that continually publishes committed CDC generations. To make these generations accessible to the topology coordinator, we store them in the new column of system.topology -- unpublished_cdc_generations.	2023-09-08 09:05:01 +02:00
Aleksandra Martyniuk	8a65477202	tasks: db: change default task_ttl value If a test isn't going to use task manager or isn't interested in statuses of finished tasks, then keeping them in the memory for some time (currently 10s by default) after they are finished is a memory waste. Set default task_ttl value to zero. It can be changed by setting --task-ttl-in-seconds or through rest api (/task_manager/ttl). In conf/scylla.yaml set task-ttl-in-seconds to 10. Closes #15239	2023-09-07 12:42:29 +03:00
Dawid Medrek	c7fe5d7f94	utils/lister: Limit the API of scan_dir() to fs::path Right now, the function allows for passing the path to a file as a seastar::sstring, which is then converted to std::filesystem::path -- implicitly to the caller. However, the function performs I/O, and there is no reason to accept any other type than std::filesystem::path, especially because the conversion is straightforward. Callers can perform it on their own. This commit introduces the more constrained API. Closes #15266	2023-09-05 20:50:42 +03:00
Pavel Emelyanov	5d52a35e05	system_keyspace: Don't require snitch argument on start Now system keyspace is finally independent from snitch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-05 12:57:09 +03:00
Pavel Emelyanov	1daa8fa3bb	system_keyspace: Don't cache local dc:rack pair Now no code needs it from system keyspace Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-05 12:56:45 +03:00
Pavel Emelyanov	9926917bf5	system_keyspace: Save local info with explicit location On boot system keyspace is kicked to insert local info into system.local table. Among other things there's dc:rack pair which sys.ks. gets from its cache which, in turn, should have been previously initialized from snitch on sys.ks. start. This patch makes the local info updating method get the dc:rack from caller via argument. Callers, in turn, call snitch directly, because these are main and cql_test_env startup routines. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-05 12:54:46 +03:00
Kefu Chai	f6cca741ea	config: remove "experimental" option "experimental" option was marked "Unused" in `64bc8d2f7d`. but we chose to keep it in hope that the upgrade test does not fail. despite that the upgrade tests per-se survived the "upgrade", after the upgrade, the tests exercising the experimental features are still failing hard. they have not been updated to set the "experimental-features" option, and are still relying on "experimental" to enable all the experimental features under test. so, in this change, let's just drop the option so that scylla can fail early at seeing this "experimental" option. this should help us to identify the tests relying on it quicker. as the "experimental" features should only be used in development environment, this change should have no impact to production. Refs #15214 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15233	2023-09-05 10:09:04 +03:00
Piotr Smaroń	eb46f1bd17	guardrails: restrict replication factor (RF) Replacing `minimum_keyspace_rf` config option with 4 config options: `{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which allow us to impose soft limits (issue a warning) and hard limits (not execute CQL) on RF when creating/altering a keyspace. The reason to rather replace than extend `minimum_keyspace_rf` config option is to be aligned with Cassandra, which did the same, and has the same parameters' names. Only min soft limit is enabled by default and it is set to 3, which means that we'll generate a CQL warning whenever RF is set to either 1 or 2. RF's value of 0 is always allowed and means that there will not be any replicas on a given DC. This was agreed with PM. Because we don't allow to change guardrails' values when scylla is running (per PM), there're no tests provided with this PR, and dtests will be provided separately. Exceeding guardrails' thresholds will be tracked by metrics. Resolves #8619 Refs #8892 (the RF part, not the replication-strategy part) Closes #14262	2023-09-04 19:22:17 +03:00
Avi Kivity	78fc3b5f56	config: rename stream_plan_ranges_percentage to *_fraction The value is specified as a fraction between 0 and 1, so don't mislead users into specifying a value between 0 and 100. Closes #15261	2023-09-03 23:24:29 +03:00
Avi Kivity	9a3d57256a	Merge 'config: add index_cache_fraction' from Michał Chojnowski Index caching was disabled by default because it caused performance regressions for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202. However, it also means that there are workloads which could benefit from the index cache, but (by default) don't. As a compromise, we can set a default limit on the memory usage of index cache, which should be small enough to avoid catastrophic regressions in small-partition workloads, but big enough to accommodate workloads where index cache is obviously beneficial. This series adds such a configurable limit, sets it to to 0.2 of total cache memory by default, and re-enables index caching by default. Fixes #15118 Closes #14994 * github.com:scylladb/scylladb: test: boost/cache_algorithm_test: add cache_algorithm_test sstables: partition_index_cache: deglobalize stats utils: cached_file: deglobalize cached_file metrics db: config: enable index caching by default config: add index_cache_fraction utils: lru: add move semantics to list links	2023-09-03 19:39:31 +03:00
Michał Chojnowski	f00bed9429	sstables: partition_index_cache: deglobalize stats Move partition_index_cache stats from a thread_local variable to cache_tracker. After the change, partition_index_cache receives a reference to the stats via constructor, instead of referencing a global. This is needed so that cache_tracker can know the memory usage of index caches (for cache eviction purposes) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	c7d9d35030	utils: cached_file: deglobalize cached_file metrics Move cached_file metrics from a thread_local variable to cache_tracker. This is needed so that cache_tracker can know the memory usage of index caches (for purposes of cache eviction) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	023accf246	db: config: enable index caching by default Index caching was disabled by default because it caused performance regressions for some small-partition workloads. See #11202. However, it also means that there are workloads which could benefit from the index cache, but (by default) don't. As a compromise, we can set a default limit on the memory usage of index cache, which should be small enough to avoid catastrophical regressions in small-partition workloads, but big enough to accomodate workloads where index cache is obviously beneficial. This patch sets such a limit to 0.2 of total cache memory, and re-enables index caching by default.	2023-09-01 22:34:23 +02:00
Michał Chojnowski	50b429f255	config: add index_cache_fraction Adds a configurable upper limit to memory usage by index caches. See the source code comments added in this patch for more details. This patch shouldn't change visible behaviour, because the limit is set to 1.0 by default, so it is never triggerred. We will change the default in a future patch.	2023-09-01 22:34:23 +02:00
Tomasz Grabiec	7b65d4d947	Merge 'Gossiper: provide strong exception safety for endpoint state changes' from Benny Halevy This series ensures that endpoint state changes (for each single endpoint) are applied to the gossiper endpoint_state_map as a whole and on all shards. Any failure in the process will keep the existing endpoint state intact. Note that verbs that modify the endpoint states of multiple endpoints may still succeed to modify some of them before hitting an error and those changes are committed to the endpoint_state_map, so we don't ensure atomicity when updating multiple endpoints' states. Fixes scylladb/scylladb#14794 Fixes scylladb/scylladb#14799 Closes #15073 * github.com:scylladb/scylladb: gossiper: move endpoint_state by value to apply it gossiper: replicate: make exception safe gms: pass endpoint_state_ptr to endpoint_state change subscribers gossiper: modify endpoint state only via replicate gossiper: keep and serve shared endpoint_state_ptr in map gossiper: get_max_endpoint_state_version: get state by reference api/failure_detector: get_all_endpoint_states: reduce allocations cdc/generation: get_generation_id_for: get endpoint_state& gossiper: add for_each_endpoint_state helpers gossiper: add num_endpoints gossiper: add my_endpoint_state	2023-09-01 12:23:19 +02:00
Benny Halevy	4f5ffc7719	gossiper: add for_each_endpoint_state helpers Before changing _endpoint_state_map to hold a lw_shared_ptr<endpoint_state>, provide synchronous helpers for users to traverse all endpoint_states with no need to copy them (as long as the called func does not yield). With that, gossiper::get_endpoint_states() can be made private. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 08:32:31 +03:00
Dawid Medrek	9afaf39acb	Get rid of UB in commitlog.hh Identifiers starting with an underscore followed by a capital letter are reserved. They should not be used. Closes #15227	2023-08-31 00:03:04 +03:00
Benny Halevy	2c54d7a35a	view, storage_proxy: carry effective_replication_map along with endpoints When sending mutation to remote endpoint, the selected endpoints must be in sync with the current effective_replication_map. Currently, the endpoints are sent down the storage_proxy stack, and later on an effective_replication_map is retrieved again, and it might not match the target or pending endpoints, similar to the case seen in https://github.com/scylladb/scylladb/issues/15138 The correct way is to carry the same effective replication map used to select said endpoints and pass it down the stack. See also https://github.com/scylladb/scylladb/pull/15141 Fixes scylladb/scylladb#15144 Fixes scylladb/scylladb#14730 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15142	2023-08-29 09:08:42 +03:00
Kamil Braun	93be4c0cb0	Merge 'Base node liveliness consistently on gossiper::is_alive' from Benny Halevy Currently he gossiper marks endpoint_state objects as alive/dead. I some cases the endpoint_state::is_alive function is checked but in many other cases gossiper::is_alive(endpoint) is used to determine if the endpoint is alive. This series removed the endpoint_state::is_alive state and moves all the logic to gossiper::is_alive that bases its decision on the endpoint having an endpoint_state and being in the _live_endpoints set. For that, the _live_endpoints is made sure to be replicated to all shards when changed and the endpoint_state changes are serialized under lock_endpoint, and also making sure that the endpoint_state in the _endpoint_states_map is never updated in place, but rather a temporary copy is changed and then safely replicated using gossiper::replicate Refs https://github.com/scylladb/scylladb/issues/14794 Closes #14801 * github.com:scylladb/scylladb: gossiper: mark_alive: remove local_state param endpoint_state: get rid of _is_alive member and methods gossiper: is_alive: use _live_endpoints gossiper: evict_from_membership: erase endpoint from _live_endpoints gossiper: replicate_live_endpoints_on_change: use _live_endpoints_version to detect change gossiper: run: no need to replicate live_endpoints gossiper: fold update_live_endpoints_version into replicate_live_endpoints_on_change gossiper: add mutate_live_and_unreachable_endpoints gossiper: reset_endpoint_state_map: clear also shadow endpoint sets gossiper: reset_endpoint_state_map: clear live/unreachable endpoints on all shards gossiper: functions that change _live_endpoints must be called on shard 0 gossiper: add lock_endpoint_update_semaphore gossiper: make _live_endpoints an unordered_set endpoint_state: use gossiper::is_alive externally	2023-08-23 17:18:05 +02:00
Patryk Jędrzejczak	ef2eac9941	raft topology: make every type in request_param a named struct We make every alternative type in the request_param variant a named struct to make the code more readable. Additionally, this change will make extending request parameters easier if we decide to do so in the future. Closes #15132	2023-08-23 16:56:00 +02:00
Kamil Braun	169d19e5b0	Merge 'raft topology: support --ignore-dead-nodes in removenode and replace' from Patryk Jędrzejczak We add support for `--ignore-dead-nodes` in `raft_removenode` and `--ignore-dead-nodes-for-replace` in `raft_replace`. For now, we allow passing only host ids of the ignored nodes. Supporting IPs is currently impossible because `raft_address_map` doesn't provide a mapping from IP to a host id. The main steps of the implementation are as follows: - add the `ignore_nodes` column to `system.topology`, - set the `ignore_nodes` value of the topology mutation in `raft_removenode` and `raft_replace`, - extend `service::request_param` with alternative types that allow storing a set of ids of the ignored nodes, - load `ignore_nodes` from `system.topology` into `request_param` in `system_keyspace::load_topology_state`, - add `ignore_nodes` to `exclude_nodes` in `topology_coordinator::exec_global_command`, - pass `ignore_nodes` to `replace_with_repair` and `remove_with_repair` in `storage_service::raft_topology_cmd_handler`. Additionally, we add `test_raft_ignore_nodes.py` with two tests that verify the added changes. Fixes #15025 Closes #15113 * github.com:scylladb/scylladb: test: add test_raft_ignore_nodes test: ManagerClient.remove_node: allow List[HostId] for ignore_dead raft topology: pass ignore_nodes to {replace, remove}_with_repair raft topology: exec_global_command: add ignore_nodes to exclude_nodes raft topology: exec_global_command: change type of exclude_nodes topology_state_machine: extend request_param with a set of raft ids raft topology: set ignore_nodes in raft_removenode and raft_replace utils: introduce split_comma_separated_list raft topology: add the ignore_nodes column to system.topology	2023-08-22 18:04:59 +02:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00
Patryk Jędrzejczak	1f57d80ba1	topology_state_machine: extend request_param with a set of raft ids We add two new alternative types to service::request_param: removenode_param and replace_param. They allow storing the list of ignored nodes loaded from the ignore_nodes column of system.topology. We also remove the raft::server_id type because it has been only used by the replace operation.	2023-08-22 14:17:37 +02:00
Petr Gusev	439c91851f	hints: add debug log for dropped hints Dropping data is rather important event, let's log it at least at the debug level. It'll help in debugging tests.	2023-08-22 15:48:40 +04:00
Petr Gusev	9fd3df13a2	hints: send_one_hint: extend the scope of file_send_gate holder The problem was that the holder in with_gate call was released too early. This happened before the possible call to on_hint_send_failure in then_wrapped. As a result, the effects of on_hint_send_failure (segment_replay_failed flag) were not visible in send_one_file after ctx_ptr->file_send_gate.close(), so we could decide that the segment was sent in full and delete it even if sending of some hints led to errors. Fixes #15110	2023-08-22 15:48:40 +04:00
Petr Gusev	1b7603af23	hints manager: add send_errors counter There was no indication of problems in the hints manager metrics before. We need this counter for fencing tests in the later commit, but it seems to be useful on its own.	2023-08-22 14:31:04 +04:00
Patryk Jędrzejczak	0beabdc6ba	utils: introduce split_comma_separated_list Three places handle comma-separated lists similarly: - ss::remove_node.set(...) in api::set_storage_service, - storage_service::parse_node_list, - storage_service::is_repair_based_node_ops_enabled. In the next commit, the fourth place that needs the same logic appears -- storage_service::raft_replace. It needs to load and parse the --ignore-dead-nodes-for-replace param from config. Moreover, the code in is_repair_based_node_ops_enabled is different and doesn't seem right. We swap '\"' and '\'' with ' ' but don't do anything with it afterward. To avoid code duplication and fix is_repair_based_node_ops_enabled, we introduce the new function utils::split_comma_separated_list. This change has a small side effect on logging. For example, ignore_nodes_strs in storage_service::parse_node_list might be printed in a slightly different form.	2023-08-22 10:30:36 +02:00
Patryk Jędrzejczak	16f5db8af2	raft topology: add the ignore_nodes column to system.topology In the following commits, we add support for --ignore-dead-nodes in raft_removenode and --ignore-dead-nodes-for-replace in raft_replace. To make these request parameters accessible for the topology coordinator, we store them in the new ignore_nodes column of system.topology.	2023-08-22 10:30:12 +02:00
Benny Halevy	97061cc3b8	endpoint_state: use gossiper::is_alive externally Before we remove endpoint_state:_is_alive to rely solely on gossipper::_live_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-22 09:06:09 +03:00
Avi Kivity	ce43effc21	Merge "fix rebuild with consistent topology management" From Gleb Natapov " The series fixes bogus asserting during topology state load and add a test that runs rebuild to make sure the code will not regress again. Fixes #14958 " * 'gleb/rebuilding_fix_v1' of github.com:scylladb/scylla-dev: test: add rebuild test system_keyspace: fix assertion for missing transition_state	2023-08-21 16:00:42 +03:00
Pavel Emelyanov	6bc30f1944	system_keyspace: De-bloat .setup() from messing with system.local On boot several manipulations with system.local are performed. 1. The host_id value is selected from it with key = local If not found, system_keyspace generates a new host_id, inserts the new value into the table and returns back 2. The cluster_name is selected from it with key = local Then it's system_keyspace that either checks that the name matches the one from db::config, or inserts the db::config value into the table 3. The row with key = local is updated with various info like versions, listen, rpc and bcast addresses, dc, rack, etc. Unconditionally All three steps are scattered over main, p.1 is called directly, p.2 and p.3 are executed via system_keyspace::setup() that happens rather late. Also there's some touch of this table from the cql_test_env startup code. The proposal is to collect this setup into one place and execute it early -- as soon as the system.local table is populated. This frees the system_keyspace code from the logic of selecting host id and cluster name leaving it to main and keeps it with only select/insert work. refs: #2795 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #15082	2023-08-20 21:24:31 +03:00
Kefu Chai	12d6ec5a18	config: respect --log-with-color 1 scylladb overrides some of seastar logging related options with its own options by applying them with `logging::apply_settings()`. but we fail to inherit `with_color` from Seastar as we are using the designated initializer, so the unspecified members are zero initialized. that's why we always have logging message in black and white even if scylla is running in a tty and `--log-with-color 1` is specified. so, make the debugging life more colorful, let's inherit the option from Seastar, and apply it when setting logging related options. see also `29e09a3292` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15076	2023-08-20 13:47:43 +03:00
Tomasz Grabiec	bd8bb5d4b1	Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho Compaction group is the data plane for tablets, so this integration allows each tablet to have its own storage (memtable + sstables). A crucial step for dynamic tablets, where each tablet can be worked on independently. There are still some inefficiencies to be worked on, but as it is, it already unlocks further development. ``` INFO 2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata INFO 2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf ``` Closes #14863 * github.com:scylladb/scylladb: Kill scylla option to configure number of compaction groups replica: Wire tablet into compaction group token_metadata: Add this_host_id to topology config replica: Switch to chunked_vector for storing compaction groups replica: Generate group_id for compaction_group on demand	2023-08-18 15:17:17 +02:00
Avi Kivity	1901475598	Merge 'config: mark "experimental" option unused and cleanups' from Kefu Chai in this series, the "experimental" option is marked `Unused` as it has been marked deprecated for almost 2 years since scylla 4.6. and use `experimental_features` to specify the used experimental features explicitly. Closes #14948 * github.com:scylladb/scylladb: config: remove unused namespace alias config: use std::ranges when appropriate config: drop "experimental" option test: disable 'enable_user_defined_functions' if experimental_features does not include udf test: pylib: specify experimental_features explicitly	2023-08-17 20:42:02 +03:00
Kefu Chai	6788903fd6	db: config: mark config class final in `34c3688017`, we added a virtual function to `config_file`, and we new and delete pointer pointing to a `db::config` instance with `unique_ptr<>`. this makes the compiler nervous, as deleting a pointer pointing to an instance of non-final class with virtual function could lead to leak, if this pointer actually points to a derived class of this non-final class. so, in order to silence the warning and to prevent potential problem in future, let's mark `db::config` final. the warning from Clang 16 looks like: ``` In file included from /home/kefu/dev/scylladb/test/lib/test_services.cc:10: In file included from /home/kefu/dev/scylladb/test/lib/test_services.hh:25: In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/memory:78: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: error: delete called on non-final 'db::config' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<db::config>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/test/lib/test_services.cc:189:16: note: in instantiation of member function 'std::unique_ptr<db::config>::~unique_ptr' requested here auto cfg = std::make_unique<db::config>(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15071	2023-08-17 13:43:16 +03:00
Raphael S. Carvalho	b578d6643f	Kill scylla option to configure number of compaction groups The option was introduced to bootstrap the project. It's still useful for testing, but that translates into maintaining an additional option and code that will not be really used outside of testing. A possible option is to later map the option in boost tests to initial_tablets, which may yield the same effect for testing. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:53 -03:00
Piotr Smaroń	34c3688017	db: config: add live_updatable_config_params_changeable_via_cql option If `live_updatable_config_params_changeable_via_cql` is set to true, configuration parameters defined with `liveness::LiveUpdate` option can be updated in the runtime with CQL, i.e. by updating `system.config` virtual table. If we don't want any configuration parameter to be changed in the runtime by updating `system.config` virtual table, this option should be set to false. This option should be set to false for e.g. cloud users, who can only perform CQL queries, and should not be able to change scylla's configuration on the fly. Current implemenatation is generic, but has a small drawback - messages returned to the user can be not fully accurate, consider: ``` cqlsh> UPDATE system.config SET value='2' WHERE name='task_ttl_in_seconds'; WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="option is not live-updateable" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} ``` where `task_ttl_in_seconds` has been defined with `liveness::LiveUpdate`, but because `live_updatable_config_params_changeable_via_cql` is set to `false` in `scylla.yaml,` `task_ttl_in_seconds` cannot be modified in the runtime by updating `system.config` virtual table. Fixes #14355 Closes #14382	2023-08-16 17:56:27 +03:00
Benny Halevy	8fbcf1ab9f	view: start: ignore also abort_requested_exception We see the abort_requested_exception error from time to time, instead of sleep_aborted that was expected and quietly ignored (in debug log level). Treat abort_requested_exception the same way since the error is expected on shutdown and to reduce test flakiness, as seen for example, in https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3033/artifact/logs-full.release.010/1691896356104_repair_additional_test.py%3A%3ATestRepairAdditional%3A%3Atest_repair_schema/node2.log ``` INFO 2023-08-13 03:12:29,151 [shard 0] compaction_manager - Asked to stop WARN 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={}: seastar::sleep_aborted (Sleep is aborted) INFO 2023-08-13 03:12:29,152 [shard 0] gossip - failure_detector_loop: Finished main loop WARN 2023-08-13 03:12:29,152 [shard 0] cdc - Aborted update CDC description table with generation (2023/08/13 03:12:17, d74aad4b-6d30-4f22-947b-282a6e7c9892) INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Asked to stop INFO 2023-08-13 03:12:29,152 [shard 1] compaction_manager - Stopped INFO 2023-08-13 03:12:29,153 [shard 0] init - Signal received; shutting down INFO 2023-08-13 03:12:29,153 [shard 0] init - Shutting down view builder ops INFO 2023-08-13 03:12:29,153 [shard 0] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 1] view - Draining view builder INFO 2023-08-13 03:12:29,153 [shard 0] compaction_manager - Stopped ERROR 2023-08-13 03:12:29,153 [shard 0] view - start failed: seastar::abort_requested_exception (abort requested) ERROR 2023-08-13 03:12:29,153 [shard 1] view - start failed: seastar::abort_requested_exception (abort requested) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15029	2023-08-13 18:39:09 +03:00

1 2 3 4 5 ...

3282 Commits