scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 04:06:59 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	83d491af02	config: Remove experimental TABLETS feature ... and replace it with boolean enable_tablets option. All the places in the code are patched to check the latter option instead of the former feature. The option is OFF by default, but the default scylla.yaml file sets this to true, so that newly installed clusters turn tablets ON. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18898	2024-05-30 18:03:51 +03:00
Kefu Chai	35e1fcde1f	mutation,db: drop operator<< for mutation and seed_provider_type& since we've migrated away from the generic homebrew formatters for range-alike containers, there is no need to keep their operator<< around -- they were preserved in order to work with the container formatters which expect operator<< of the elements. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	617e532859	db: config: drop operator<<() for error_injection_at_startup it is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18701	2024-05-16 15:10:57 +03:00
Asias He	952dfc6157	repair: Introduce repair_partition_count_estimation_ratio config option In commit `642f9a1966` (repair: Improve estimated_partitions to reduce memory usage), a 10% hard coded estimation ratio is used. This patch introduces a new config option to specify the estimation ratio of partitions written by repair out of the total partitions. It is set to 0.1 by default. Fixes #18615 Closes scylladb/scylladb#18634	2024-05-13 15:16:55 +03:00
Kamil Braun	03818c4aa9	direct_failure_detector: increase ping timeout and make it tunable The direct failure detector design is simplistic. It sends pings sequentially and times out listeners that reached the threshold (i.e. didn't hear from a given endpoint for too long) in-between pings. Given the sequential nature, the previous ping must finish so the next ping can start. We time out pings that take too long. The timeout was hardcoded and set to 300ms. This is too low for wide-area setups -- latencies across the Earth can indeed go up to 300ms. 3 subsequent timed out pings to a given node were sufficient for the Raft listener to "mark server as down" (the listener used a threshold of 1s). Increase the ping timeout to 600ms which should be enough even for pinging the opposite side of Earth, and make it tunable. Increase the Raft listener threshold from 1s to 2s. Without the increased threshold, one timed out ping would be enough to mark the server as down. Increasing it to 2s requires 3 timed out pings which makes it more robust in presence of transient network hiccups. In the future we'll most likely want to decrease the Raft listener threshold again, if we use Raft for data path -- so leader elections start quickly after leader failures. (Faster than 2s). To do that we'll have to improve the design of the direct failure detector. Ref: scylladb/scylladb#16410 Fixes: scylladb/scylladb#16607 --- I tested the change manually using `tc qdisc ... netem delay`, setting network delay on local setup to ~300ms with jitter. Without the change, the result is as observed in scylladb/scylladb#16410: interleaving ``` raft_group_registry - marking Raft server ... as dead for Raft groups raft_group_registry - marking Raft server ... as alive for Raft groups ``` happening once every few seconds. The "marking as dead" happens whenever we get 3 subsequent failed pings, which happens with certain (high) probability depending on the latency jitter. 
Then as soon as we get a successful ping, we mark server back as alive. With the change, the phenomenon no longer appears. Closes scylladb/scylladb#18443	2024-05-07 23:40:23 +02:00
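The ping counts quoted in the message above (3 pings at 300ms/1s, 1 ping at 600ms/1s, 3 pings at 600ms/2s) can be reproduced with a deliberately crude model. The model is an assumption made for illustration and is not taken from the direct_failure_detector code: it treats the number of consecutive timed-out pings needed to mark a server as down as the listener threshold divided by the ping timeout, rounded down. The function name is hypothetical.

```python
def pings_to_mark_down(threshold_ms: int, ping_timeout_ms: int) -> int:
    """Hypothetical model: consecutive timed-out pings before the listener
    threshold is considered reached. Not the real detector logic."""
    # Integer division, with a floor of one ping: a single timed-out ping
    # is always the minimum needed to notice anything.
    return max(1, threshold_ms // ping_timeout_ms)


# Old settings: 300ms ping timeout, 1s Raft listener threshold.
old = pings_to_mark_down(1000, 300)
# New timeout with the old threshold: 600ms, 1s.
timeout_only = pings_to_mark_down(1000, 600)
# New settings: 600ms ping timeout, 2s threshold.
new = pings_to_mark_down(2000, 600)

print(old, timeout_only, new)
```

Under this model, raising only the timeout would make the listener more fragile, which matches the reason given above for raising the threshold to 2s at the same time.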
Kamil Braun	d8313dda43	Merge 'db: config: move consistent-topology-changes out of experimental and make it the default for new clusters' from Patryk Jędrzejczak We move consistent cluster management out of experimental and make it the default for new clusters in 6.0. In code, we make the `consistent-topology-changes` flag unused and assumed to be true. In 6.0, the topology upgrade procedure will be manual and voluntary, so some clusters will still be using the gossip-based topology even though they support the raft-based topology. Therefore, we need to continue testing the gossip-based topology. This is possible by using the `force-gossip-topology-changes` flag introduced in scylladb/scylladb#18284. Ref scylladb/scylladb#17802 Closes scylladb/scylladb#18285 * github.com:scylladb/scylladb: docs: raft.rst: update after removing consistent-topology-changes treewide: fix indentation after the previous patch db: config: make consistent-topology-changes unused test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls test: test_read_required_hosts: run with force-gossip-topology-changes storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes storage_service: join_token_ring: fix finish_setup_after_join calls	2024-04-26 14:45:29 +02:00
Patryk Jędrzejczak	3a34bb18cd	db: config: make consistent-topology-changes unused We make the `consistent-topology-changes` experimental feature unused and assumed to be true in 6.0. We remove code branches that executed if `consistent-topology-changes` was disabled.	2024-04-25 14:33:21 +02:00
Aleksandra Martyniuk	58f72f9019	cql3: statements: change default tombstone_gc mode for tablets Currently, if tombstone_gc mode isn't specified for a table, then "timeout" is used by default. With tablets, running "nodetool repair -pr" may miss a tablet if it migrated across the nodes. Then, if we expire tombstones for ranges that weren't repaired, we may get data resurrection. Set default tombstone_gc mode value for DDLs that don't specify it. It's set to "repair" for tables which use tablets unless they use local replication strategy or rf = 1. Otherwise it's set to "timeout".	2024-04-24 10:42:10 +02:00
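The default selection rule described in the message above can be sketched as a small function. This is only an illustration of the stated rule; the function and parameter names are hypothetical and do not come from the cql3 code.

```python
def default_tombstone_gc_mode(uses_tablets: bool,
                              local_replication_strategy: bool,
                              rf: int) -> str:
    """Return the tombstone_gc mode used when a DDL statement does not
    specify one, following the rule in the commit message."""
    # "repair" applies only to tablet-based tables that are actually
    # replicated: not a local strategy and not rf = 1.
    if uses_tablets and not local_replication_strategy and rf != 1:
        return "repair"
    # Everything else keeps the previous default.
    return "timeout"


print(default_tombstone_gc_mode(True, False, 3))   # tablets, rf = 3
print(default_tombstone_gc_mode(True, False, 1))   # tablets, rf = 1
print(default_tombstone_gc_mode(False, False, 3))  # no tablets
```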
Patryk Jędrzejczak	14911051ee	db: config: introduce force-gossip-topology-changes We are going to make the `consistent-topology-changes` experimental feature unused in 6.0. However, the topology upgrade procedure will be manual and voluntary, so some 6.0 clusters will be using the gossip-based topology. Therefore, we need to continue testing the gossip-based topology. The solution is introducing a new flag, `force-gossip-topology-changes`, that will enforce the gossip-based topology in a fresh cluster. In this patch, we only introduce the parameter without any effect. Here is the explanation. Making `consistent-topology-changes` unused and introducing `force-gossip-topology-changes` requires adjustments in scylla-dtest. We want to merge changes to scylladb and scylla-dtest in a way that ensures all tests are run correctly during the whole process. If we merged all changes to scylladb first, before merging the scylla-dtest changes, all tests would run with the raft-based topology and the ones excluded in the raft-based topology would fail. We also can't merge all changes to scylla-dtest first. However, we can follow this plan: 1. scylladb: merge this patch 2. scylla-dtest: start using `force-gossip-topology-changes` in jobs that run without the raft-based topology 3. scylladb: merge the rest of the changes 4. scylla-dtest: merge the rest of the changes Ref scylladb/scylladb#17802 Closes scylladb/scylladb#18284	2024-04-23 09:42:46 +02:00
Lakshmi Narayanan Sreethar	e8026197d2	db/config: add a new variable to limit memory used by table components A new configuration variable, components_memory_reclaim_threshold, has been added to configure the maximum allowed percentage of available memory for all SSTable components in a shard. If the total memory usage exceeds this threshold, it will be reclaimed from the components to bring it back under the limit. Currently, only the memory used by the bloom filters will be restricted. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-04-02 01:37:47 +05:30
Petr Gusev	49a4220fea	error_injection: pass injection parameters at startup Injection parameters can be used in the lambda passed to inject_with_handler method to take some values from the test. However, there was no way to set values for these parameters on node startup, only through the error injection REST API. Therefore, we couldn't rely on this when inject_with_handler is used during node startup, it could trigger before we call the API from the test. In this commit we solve this problem by allowing these parameters to be assigned through scylla.yaml config. The defer.hh header was added to error_injection.hh to fix compilation after adding error_injection.hh to config.hh, defer function is used in error_injection.hh.	2024-03-19 20:17:02 +04:00
Botond Dénes	5e37c1465f	db/config: introduce query_page_size_in_bytes Regulates the page size in bytes via config, instead of the currently used hard-coded constant. Allows tests to configure lower limits so they can work with smaller data-sets when testing paging related functionality. Not wired yet.	2024-02-27 02:14:45 -05:00
Tomasz Grabiec	ef9e5e64a3	locator: token_metadata: Introduce topology barrier stall detector When topology barrier is blocked for longer than the configured threshold (2s), stale versions are marked as stalled and when they get released they report backtrace to the logs. This should help to identify what was holding the token metadata pointer for too long. Example log: token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ... Closes scylladb/scylladb#17427	2024-02-21 15:05:34 +02:00
Avi Kivity	93af3dd69b	Merge 'Maintenance socket: set filesystem permissions to 660' from Mikołaj Grzebieluch Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow a scyllaadm's group to connect. Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket. Fixes https://github.com/scylladb/scylladb/issues/16487. Closes scylladb/scylladb#17113 * github.com:scylladb/scylladb: maintenance_socket: add option to set owning group transport/controller: get rid of magic number for socket path's maximal length transport/controller: set unix_domain_socket_permissions for maintenance_socket transport/controller: pass unix_domain_socket_permissions to generic_server::listen transport/controller: split configuring sockets into separate functions	2024-02-20 15:09:54 +02:00
Mikołaj Grzebieluch	182cfebe40	maintenance_socket: add option to set owning group Option `maintenance-socket-group` sets the owning group of the maintenance socket. If not set, the group will be the same as the user running the scylla node.	2024-02-19 10:21:00 +01:00
Kefu Chai	3dfb0f86f1	db: add formatter for error_injection_at_startup before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `error_injection_at_startup`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17211	2024-02-08 19:40:48 +02:00
Pavel Emelyanov	7c5c89ba8d	Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel" This reverts commit `370fbd346c`, reversing changes made to `0912d2a2c6`. This makes scylla-manager mis-interpret the data_file_directories somehow, issue #17078	2024-01-31 15:08:14 +03:00
Avi Kivity	c8397f0287	Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. A tablet that is too large makes migration inefficient, therefore slowing down the balancer. If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint. Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays foundation for it to be implemented later on. A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split. Tablet metadata gains 2 new fields for managing this: resize_type: resize decision type, can be either of "merge", "split", or "none". resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator). A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ). When a table is done splitting its storage, it loads (mirror) the resize_seq_number from tablet metadata into its local state (in other words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. 
Helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready. When load balancer emits split decision, replicas will listen for the need to split with a "split monitor" that is awakened once a table has replication metadata updated and detects the need for split (i.e. resize_type field is "split"). The split monitor will start splitting of compaction groups (using mechanism introduced here: `081f30d149`) for the table. And once splitting work is completed, the table updates its local state as having completed split. When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can appropriately update their set of compaction groups (that were previously split in the preparation step). Fixes #16536. 
Closes scylladb/scylladb#16580 * github.com:scylladb/scylladb: test/topology_experimental_raft: Add tablet split test replica: Bypass reshape on boot with tablets temporarily replica: Fix table::compaction_group_for_sstable() for tablet streaming test/topology_experimental_raft: Disable load balancer in test fencing replica: Remap compaction groups when tablet split is finalized service: Split tablet map when split request is finalized replica: Update table split status if completed split compaction work storage_service: Implement split monitor topology_cordinator: Generate updates for resize decisions made by balancer load_balancer: Introduce metrics for resize decisions db: Make target tablet size a live-updateable config option load_balancer: Implement resize decisions service: Wire table_resize_plan into migration_plan service: Introduce table_resize_plan tablet_mutation_builder: Add set_resize_decision() topology_coordinator: Wire load stats into load balancer storage_service: Allow tablet split and migration to happen concurrently topology_coordinator: Periodically retrieve table_load_stats locator: Introduce topology::get_datacenter_nodes() storage_service: Implement table_load_stats RPC replica: Expose table_load_stats in table replica: Introduce storage_group::live_disk_space_used() locator: Introduce table_load_stats tablets: Add resize decision metadata to tablet metadata locator: Introduce resize_decision	2024-01-31 13:59:56 +02:00
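The resize rules described in the merge message above can be sketched as follows. This is a simplified illustration, not the load balancer's code: the function names are hypothetical, sizes are plain numbers, the per-DC RF weighting of the average is left out, and the "merge" outcome is shown only because the message describes it (the series itself does not implement merge).

```python
from typing import Iterable


def next_resize_type(avg_size: float, split_threshold: float,
                     current: str = "none") -> str:
    """Pick "split", "merge" or "none" from the average tablet size."""
    target = split_threshold / 2   # target size is 50% of the split threshold
    merge_threshold = target       # ...and 100% of the merge threshold
    if current == "split":
        # An ongoing split is revoked once the average drops below the
        # target size, since splitting would then be followed by a merge.
        return "none" if avg_size < target else "split"
    if avg_size > split_threshold:
        return "split"
    if avg_size < merge_threshold:
        return "merge"
    return "none"


def split_ready(replica_seq_numbers: Iterable[int],
                metadata_seq_number: int) -> bool:
    """A table is ready for finalization only when every replica has
    mirrored the resize_seq_number stored in tablet metadata; taking the
    minimum over a non-empty set of replicas expresses exactly that."""
    return min(replica_seq_numbers) == metadata_seq_number


print(next_resize_type(12.0, 10.0))            # above the split threshold
print(next_resize_type(4.0, 10.0, "split"))    # split no longer needed
print(split_ready([7, 7, 6], 7))               # one replica still behind
```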
Patryk Wrobel	dc8d5ffaf6	db::config: keep dir paths unchanged This change is intended to ensure, that db::config fields related to directories are not changed. To achieve that a member function called setup_directories() is removed. The responsibility for directories paths has been moved to utils::directories, which may generate default paths if the configuration does not provide a specific value. Fixes: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Raphael S. Carvalho	638e6e30cb	db: Make target tablet size a live-updateable config option Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-01-25 18:36:08 -03:00
Mikołaj Grzebieluch	8b2f0e38d9	service/maintenance_mode: move maintenance_socket_enabled definition to separate file	2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch	e6a83b9819	db/config: add maintenance mode flag	2024-01-25 15:27:53 +01:00
Nadav Har'El	df6c9828ef	Merge 'Add protobuf and Native histogram support' from Amnon Heiman Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. Native histograms hold the benefits of high resolution at a lower resource cost. This series allows sending histograms in a native histogram format over protobuf. By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf. Fixes #12931 Closes scylladb/scylladb#16737 * github.com:scylladb/scylladb: main.cc: Add prometheus_allow_protobuf command line histogram_metrics_helper: support native histogram config: Add prometheus_allow_protobuf flag	2024-01-24 21:24:50 +02:00
Amnon Heiman	fc9bd2de03	config: Add prometheus_allow_protobuf flag Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. The prometheus_allow_protobuf flag allows the user to enable protobuf protocol. When this flag is set to true, and the Prometheus server sends in the request that it accepts protobuf, the result will be in protobuf protocol. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-01-23 13:12:07 +02:00
Pavel Emelyanov	d1d4620af8	config: Add --tablets-initial-scale-factor Previous patch taught tablets allocator to multiply the initial tablets count by some value. This patch makes this factor configurable Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-22 19:18:18 +03:00
Michał Jadwiszczak	f6a464ad81	configure service levels interval So far the service levels interval, responsible for updating SL configuration, was hardcoded in main. Now it's extracted to `service_levels_interval_ms` option.	2024-01-12 10:28:24 +01:00
Nadav Har'El	7c5092cb8f	test: add missing "tags" schema extension to cql_test_env One of the unfortunate anti-features of cql_test_env (the framework used in our CQL tests that are written in C++) is that it needs to repeat various bizarre initialization steps done in main.cc, otherwise various requests work incorrectly. One of these steps that main.cc performs is to initialize various "schema extensions" which some of the Scylla features need to work correctly. We remembered to initialize some schema extensions in cql_test_env, but forgot others. The one I will need in the following patch is the "tags" extension, which we need to mark materialized views used by local secondary indexes as "synchronous_updates" - without this patch the LSI tests in secondary_index_test.cc will crash. In addition to adding the missing extension, this patch also replaces the segmentation-fault crash when it's missing (caused by a dynamic cast failure) by a clearer on_internal_error() - so if we ever have this bug again, it will be easier to debug. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-21 11:44:50 +02:00
Mikołaj Grzebieluch	cf43787295	db/config: add maintenance_socket_enabled bool class	2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch	e682e362a3	db/config: add maintenance_socket variable If set to "ignore", maintenance socket will be disabled. If set to "workdir", maintenance socket will be opened on <scylla's workdir>/cql.m. Otherwise it will be opened on path provided by maintenance_socket variable. It is set by default to 'ignore'.	2023-12-18 11:42:05 +01:00
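The three cases for the maintenance_socket variable described above can be sketched like this. The function name and the example workdir are hypothetical; only the "ignore" and "workdir" values and the cql.m file name come from the commit message.

```python
import posixpath
from typing import Optional


def maintenance_socket_path(maintenance_socket: str,
                            workdir: str) -> Optional[str]:
    """Resolve where the maintenance socket is opened, or None if disabled."""
    if maintenance_socket == "ignore":
        # Default value: the maintenance socket is disabled.
        return None
    if maintenance_socket == "workdir":
        # Opened as cql.m inside the node's working directory.
        return posixpath.join(workdir, "cql.m")
    # Any other value is taken as the socket path itself.
    return maintenance_socket


print(maintenance_socket_path("ignore", "/var/lib/scylla"))
print(maintenance_socket_path("workdir", "/var/lib/scylla"))
print(maintenance_socket_path("/run/scylla/maint.sock", "/var/lib/scylla"))
```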
Benny Halevy	66ba983fe0	compaction_manager: flush_all_tables before major compaction Major compaction already flushes each table to make sure it considers any mutations that are present in the memtable for the purpose of tombstone purging. See `64ec1c6ec6` However, tombstone purging may be inhibited by data in commitlog segments based on `gc_time_min` in the `tombstone_gc_state` (See `f42eb4d1ce`). Flushing all sstables in the database releases all references to commitlog segments and thereby maximizes the potential for tombstone purging, which is typically the reason for running major compaction. However, flushing all tables too frequently might result in tiny sstables. Since when flushing all keyspaces using `nodetool flush` the `force_keyspace_compaction` api is invoked for each keyspace successively, we need a mechanism to prevent too frequent flushes by major compaction. Hence a `compaction_flush_all_tables_before_major_seconds` interval configuration option is added (defaults to 24 hours). In the case that not all tables are flushed prior to major compaction, we revert to the old behavior of flushing each table in the keyspace before major-compacting it. Fixes scylladb/scylladb#15777 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-28 16:37:42 +02:00
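The rate limiting described above can be sketched as a simple interval check. This is an assumption about the shape of the decision, not the compaction_manager implementation: the function name, the use of seconds since an arbitrary epoch, and the treatment of "never flushed" are all hypothetical. Only the 24-hour default comes from the commit message.

```python
from typing import Optional

DEFAULT_INTERVAL_S = 24 * 3600  # 24 hours, the default named in the message


def should_flush_all_tables(now_s: float,
                            last_flush_all_s: Optional[float],
                            interval_s: float = DEFAULT_INTERVAL_S) -> bool:
    """Decide whether major compaction flushes every table first.

    When this returns False, the caller falls back to the old behavior of
    flushing each table in the keyspace before major-compacting it."""
    if last_flush_all_s is None:
        # Assumption: with no previous flush-all on record, flush everything.
        return True
    return now_s - last_flush_all_s >= interval_s


print(should_flush_all_tables(100_000.0, None))
print(should_flush_all_tables(100_000.0, 99_000.0))  # flushed 1000s ago
print(should_flush_all_tables(100_000.0, 10_000.0))  # flushed 25h ago
```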
Pavel Emelyanov	210b01a5ce	config: Make object storage config updateable_value_source Now it's a plain updateable_value, but without the ..._source object the updateable_value is just a no-op value holder. In order for the observers to operate there must be the value source, updating it would update the attached updateable values _and_ notify the observers. In order for the config to be the u.v._source, config entries should be comparable to each other, thus the <=> operator for it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-21 16:47:50 +03:00
Piotr Smaroń	8c464b2ddb	guardrails: restrict replication strategy (RS) Replacing `restrict_replication_simplestrategy` config option with 2 config options: `replication_strategy_{warn,fail}_list`, which allow us to impose soft limits (issue a warning) and hard limits (not execute CQL) on replication strategy when creating/altering a keyspace. The reason to rather replace than extend `restrict_replication_simplestrategy` config option is that it was not used and we wanted to generalize it. Only soft guardrail is enabled by default and it is set to SimpleStrategy, which means that we'll generate a CQL warning whenever replication strategy is set to SimpleStrategy. For new cloud deployments we'll move SimpleStrategy from warn to the fail list. Guardrails violations will be tracked by metrics. Resolves #5224 Refs #8892 (the replication strategy part, not the RF part) Closes scylladb/scylladb#15399	2023-10-31 18:34:41 +03:00
Petr Gusev	b90011294d	config.cc: drop db::config::host_id In this refactoring commit we remove the db::config::host_id field, as it's hacky and duplicates token_metadata::get_my_id. Some tests want specific host_id, we add it to cql_test_config and use in cql_test_env. We can't pass host_id to sstables_manager by value since it's initialized in database constructor and host_id is not loaded yet. We also prefer not to make a dependency on shared_token_metadata since in this case we would have to create artificial shared_token_metadata in many tools and tests where sstables_manager is used. So we pass a function that returns host_id to sstables_manager constructor.	2023-09-13 23:00:15 +04:00
Kefu Chai	f6cca741ea	config: remove "experimental" option "experimental" option was marked "Unused" in `64bc8d2f7d`. but we chose to keep it in hope that the upgrade test does not fail. despite that the upgrade tests per-se survived the "upgrade", after the upgrade, the tests exercising the experimental features are still failing hard. they have not been updated to set the "experimental-features" option, and are still relying on "experimental" to enable all the experimental features under test. so, in this change, let's just drop the option so that scylla can fail early at seeing this "experimental" option. this should help us to identify the tests relying on it quicker. as the "experimental" features should only be used in development environment, this change should have no impact to production. Refs #15214 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15233	2023-09-05 10:09:04 +03:00
Piotr Smaroń	eb46f1bd17	guardrails: restrict replication factor (RF) Replacing `minimum_keyspace_rf` config option with 4 config options: `{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which allow us to impose soft limits (issue a warning) and hard limits (not execute CQL) on RF when creating/altering a keyspace. The reason to rather replace than extend `minimum_keyspace_rf` config option is to be aligned with Cassandra, which did the same, and has the same parameters' names. Only min soft limit is enabled by default and it is set to 3, which means that we'll generate a CQL warning whenever RF is set to either 1 or 2. RF's value of 0 is always allowed and means that there will not be any replicas on a given DC. This was agreed with PM. Because we don't allow to change guardrails' values when scylla is running (per PM), there're no tests provided with this PR, and dtests will be provided separately. Exceeding guardrails' thresholds will be tracked by metrics. Resolves #8619 Refs #8892 (the RF part, not the replication-strategy part) Closes #14262	2023-09-04 19:22:17 +03:00
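The soft and hard limits described above can be sketched as a single check. This is an illustration under stated assumptions: the function name is hypothetical, a disabled threshold is represented as None, and the maximum thresholds are assumed to trigger when RF is strictly greater than the limit. The defaults (only the minimum warn threshold enabled, set to 3) and the special case for RF 0 come from the commit message.

```python
from typing import Optional


def check_replication_factor(rf: int,
                             minimum_warn: Optional[int] = 3,
                             minimum_fail: Optional[int] = None,
                             maximum_warn: Optional[int] = None,
                             maximum_fail: Optional[int] = None) -> str:
    """Return "ok", "warn" (CQL warning) or "fail" (statement rejected)."""
    if rf == 0:
        # RF 0 is always allowed: no replicas in the given DC.
        return "ok"
    # Hard limits take precedence over soft limits.
    if minimum_fail is not None and rf < minimum_fail:
        return "fail"
    if maximum_fail is not None and rf > maximum_fail:
        return "fail"
    if minimum_warn is not None and rf < minimum_warn:
        return "warn"
    if maximum_warn is not None and rf > maximum_warn:
        return "warn"
    return "ok"


for rf in (0, 1, 2, 3):
    print(rf, check_replication_factor(rf))
```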
Avi Kivity	78fc3b5f56	config: rename stream_plan_ranges_percentage to *_fraction The value is specified as a fraction between 0 and 1, so don't mislead users into specifying a value between 0 and 100. Closes #15261	2023-09-03 23:24:29 +03:00
Michał Chojnowski	50b429f255	config: add index_cache_fraction Adds a configurable upper limit to memory usage by index caches. See the source code comments added in this patch for more details. This patch shouldn't change visible behaviour, because the limit is set to 1.0 by default, so it is never triggered. We will change the default in a future patch.	2023-09-01 22:34:23 +02:00
Patryk Jędrzejczak	0beabdc6ba	utils: introduce split_comma_separated_list Three places handle comma-separated lists similarly: - ss::remove_node.set(...) in api::set_storage_service, - storage_service::parse_node_list, - storage_service::is_repair_based_node_ops_enabled. In the next commit, the fourth place that needs the same logic appears -- storage_service::raft_replace. It needs to load and parse the --ignore-dead-nodes-for-replace param from config. Moreover, the code in is_repair_based_node_ops_enabled is different and doesn't seem right. We swap '\"' and '\'' with ' ' but don't do anything with it afterward. To avoid code duplication and fix is_repair_based_node_ops_enabled, we introduce the new function utils::split_comma_separated_list. This change has a small side effect on logging. For example, ignore_nodes_strs in storage_service::parse_node_list might be printed in a slightly different form.	2023-08-22 10:30:36 +02:00
Tomasz Grabiec	bd8bb5d4b1	Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho Compaction group is the data plane for tablets, so this integration allows each tablet to have its own storage (memtable + sstables). A crucial step for dynamic tablets, where each tablet can be worked on independently. There are still some inefficiencies to be worked on, but as it is, it already unlocks further development. ``` INFO 2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata INFO 2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf ``` Closes #14863 * github.com:scylladb/scylladb: Kill scylla option to configure number of compaction groups replica: Wire tablet into compaction group token_metadata: Add this_host_id to topology config replica: Switch to chunked_vector for storing compaction groups replica: Generate group_id for compaction_group on demand	2023-08-18 15:17:17 +02:00
Avi Kivity	1901475598	Merge 'config: mark "experimental" option unused and cleanups' from Kefu Chai in this series, the "experimental" option is marked `Unused` as it has been marked deprecated for almost 2 years since scylla 4.6. and use `experimental_features` to specify the used experimental features explicitly. Closes #14948 * github.com:scylladb/scylladb: config: remove unused namespace alias config: use std::ranges when appropriate config: drop "experimental" option test: disable 'enable_user_defined_functions' if experimental_features does not include udf test: pylib: specify experimental_features explicitly	2023-08-17 20:42:02 +03:00
Kefu Chai	6788903fd6	db: config: mark config class final in `34c3688017`, we added a virtual function to `config_file`, and we new and delete pointer pointing to a `db::config` instance with `unique_ptr<>`. this makes the compiler nervous, as deleting a pointer pointing to an instance of non-final class with virtual function could lead to leak, if this pointer actually points to a derived class of this non-final class. so, in order to silence the warning and to prevent potential problem in future, let's mark `db::config` final. the warning from Clang 16 looks like: ``` In file included from /home/kefu/dev/scylladb/test/lib/test_services.cc:10: In file included from /home/kefu/dev/scylladb/test/lib/test_services.hh:25: In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/memory:78: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:99:2: error: delete called on non-final 'db::config' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/unique_ptr.h:404:4: note: in instantiation of member function 'std::default_delete<db::config>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/test/lib/test_services.cc:189:16: note: in instantiation of member function 'std::unique_ptr<db::config>::~unique_ptr' requested here auto cfg = std::make_unique<db::config>(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15071	2023-08-17 13:43:16 +03:00
Raphael S. Carvalho	b578d6643f	Kill scylla option to configure number of compaction groups The option was introduced to bootstrap the project. It's still useful for testing, but that translates into maintaining an additional option and code that will not be really used outside of testing. A possible option is to later map the option in boost tests to initial_tablets, which may yield the same effect for testing. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:53 -03:00
Piotr Smaroń	34c3688017	db: config: add live_updatable_config_params_changeable_via_cql option If `live_updatable_config_params_changeable_via_cql` is set to true, configuration parameters defined with the `liveness::LiveUpdate` option can be updated at runtime with CQL, i.e. by updating the `system.config` virtual table. If we don't want any configuration parameter to be changed at runtime by updating the `system.config` virtual table, this option should be set to false. This option should be set to false for e.g. cloud users, who can only perform CQL queries, and should not be able to change scylla's configuration on the fly. The current implementation is generic, but has a small drawback - messages returned to the user may not be fully accurate, consider: ``` cqlsh> UPDATE system.config SET value='2' WHERE name='task_ttl_in_seconds'; WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="option is not live-updateable" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} ``` where `task_ttl_in_seconds` has been defined with `liveness::LiveUpdate`, but because `live_updatable_config_params_changeable_via_cql` is set to `false` in `scylla.yaml`, `task_ttl_in_seconds` cannot be modified at runtime by updating the `system.config` virtual table. Fixes #14355 Closes #14382	2023-08-16 17:56:27 +03:00
Kefu Chai	64bc8d2f7d	config: drop "experimental" option "experimental" was marked deprecated in `8b917f7c`. this change was included since Scylla 4.6. now that 5.3 has been branched, this change will be included in 5.4. this should be long enough for the user's turn around if this option is ever used. the dtests using this option have been audited and updated accordingly. and the unit test for this option is removed as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 10:17:34 +08:00
Amnon Heiman	d10a3dd19a	config: add enable_node_table_metrics flag By default, per-table-per-shard metrics reporting is turned off, and the aggregated version of the metrics (per-table-per-node) will be turned on. There could be a situation where a user with an excessive number of tables would suffer from performance issues, both from the network and the metrics collection server. This patch adds a config option, enable_node_table_metrics, which allows users to turn off per-table metrics reporting altogether. For example, when running Scylla with the command line argument '--enable-node-aggregated-table_metrics 0' per-table metrics will not be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2023-08-02 10:20:18 +03:00
Tomasz Grabiec	4e9d95d78c	Merge 'Compact data before streaming' from Botond Dénes Currently, streaming and repair process and send data as-is. This is wasteful: streaming might be sending data which is expired or covered by tombstones, taking up valuable bandwidth and processing time. Repair additionally could be exposed to artificial differences, due to different nodes being in different states of compactness. This PR adds opt-in compaction to `make_streaming_reader()`, then opts in all users. The main difference being in how these choose the current compaction time to use: * Load'n'stream and streaming use the current time on the local node. * Repair uses a centrally chosen compaction time, generated on the repair master and propagated to all repair followers. This is to ensure all repair participants work with the exact state of compactness. Importantly, this compaction does not purge tombstones (tombstone GC is disabled completely). Fixes: https://github.com/scylladb/scylladb/issues/3561 Closes #14756 * github.com:scylladb/scylladb: replica: make_[multishard_]streaming_reader(): make compaction_time mandatory repair/row_level: opt in to compacting the stream streaming: opt-in to compacting the stream sstables_loader: opt-in for compacting the stream replica/table: add optional compacting to make_multishard_streaming_reader() replica/table: add optional compacting to make_streaming_reader() db/config: add config item for enabling compaction for streaming and repair repair: log the error which caused the repair to fail readers: compacting_reader: use compact_mutation_state::abandon_current_partition() mutation/mutation_compactor: allow user to abandon current partition	2023-07-28 16:42:13 +02:00
Avi Kivity	cf81eef370	Merge 'schema_mutations, migration_manager: Ignore empty partitions in per-table digest' from Tomasz Grabiec Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. Tombstones expire 7 days after the schema change which introduces them. If one of the nodes is restarted after that, it will compute a different table schema digest on boot. This may cause performance problems. When sending a request from coordinator to replica, the replica needs the schema_ptr of the exact schema version requested by the coordinator. If it doesn't know that version, it will request it from the coordinator and perform a full schema merge. This adds latency to every such request. Schema versions which are not referenced are currently kept in cache for only 1 second, so if request flow has low-enough rate, this situation results in perpetual schema pulls. After `ae8d2a550d` (5.2.0), it is more likely to run into this situation, because table creation generates tombstones for all schema tables relevant to the table, even the ones which will be otherwise empty for the new table (e.g. computed_columns). This change introduces a cluster feature which when enabled will change digest calculation to be insensitive to expiry by ignoring empty partitions in digest calculation. When the feature is enabled, schema_ptrs are reloaded so that the window of discrepancy during transition is short and no rolling restart is required. A similar problem was fixed for per-node digest calculation in c2ba94dc39e4add9db213751295fb17b95e6b962. 
Per-table digest calculation was not fixed at that time because we didn't persist enabled features and they were not enabled early-enough on boot for us to depend on them in digest calculation. Now they are enabled before non-system tables are loaded so digest calculation can rely on cluster features. Fixes #4485. Manually tested using ccm on cluster upgrade scenarios and node restarts. Closes #14441 * github.com:scylladb/scylladb: test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled schema_mutations, migration_manager: Ignore empty partitions in per-table digest migration_manager, schema_tables: Implement migration_manager::reload_schema() schema_tables: Avoid crashing when table selector has only one kind of tables	2023-07-28 00:01:33 +03:00
Botond Dénes	9e3987fc96	db/config: add config item for enabling compaction for streaming and repair Compacting can greatly reduce the amount of data to be processed by streaming and repair, but with certain data shapes, its effectiveness can be reduced and its CPU overhead might outweigh the benefits. This should very rarely be the case, but leave an off switch in case this becomes a problem in a deployment. Not wired yet.	2023-07-27 03:22:11 -04:00
Patryk Jędrzejczak	5b167a4ad7	config: add schema_commitlog_segment_size_in_mb variable In #14668, we have decided to introduce a new scylla.yaml variable for the schema commitlog segment size. The segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. Therefore, increasing the schema commitlog segment size is sometimes necessary.	2023-07-19 14:16:41 +02:00
Asias He	dad5caf141	streaming: Add stream_plan_ranges_percentage This option allows the user to change the number of ranges to stream in batch per stream plan. Currently, each stream plan streams 10% of the total ranges. Putting more ranges in each stream plan reduces the waiting time between two stream plans. For example, stream_plan1: shard0 (t0), shard1 (t1) stream_plan2: shard0 (t2), shard1 (t3) We start stream_plan2 after all shards finish streaming in stream_plan1. If shard0 and shard1 in stream_plan1 finish at different times, one of the shards will be idle. If we stream more ranges in a single stream plan, the waiting time will be reduced. Previously, we retried the stream plan if one of the stream plans failed. That's one of the reasons we wanted more stream plans. With RBNO and `1f8b529e08` (range_streamer: Disable restream logic), the restream factor is not important anymore. Also, more ranges in a single stream plan will create bigger but fewer sstables on the receiver side. The default value is the same as before: 10% of total ranges. Fixes #14191 Closes #14402	2023-07-14 09:03:01 +03:00
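The batching arithmetic described in the commit above can be sketched as follows. Both helper names and the rounding rules (integer division, at least one range per plan) are assumptions for illustration, not taken from ScyllaDB's streaming code. Only the idea that each plan carries a fixed percentage of the total ranges comes from the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of ranges placed in one stream plan when each
// plan carries `percentage` percent of all ranges (assumed rounding: floor,
// but never fewer than one range per plan).
std::size_t ranges_per_stream_plan(std::size_t total_ranges, unsigned percentage) {
    assert(percentage >= 1 && percentage <= 100);
    if (total_ranges == 0) {
        return 0;
    }
    return std::max<std::size_t>(1, total_ranges * percentage / 100);
}

// Hypothetical helper: number of sequential stream plans needed to cover
// all ranges, i.e. the ceiling of total / per-plan batch size.
std::size_t stream_plan_count(std::size_t total_ranges, unsigned percentage) {
    const std::size_t per_plan = ranges_per_stream_plan(total_ranges, percentage);
    return per_plan == 0 ? 0 : (total_ranges + per_plan - 1) / per_plan;
}
```

Under these assumptions, raising the percentage from 10 to 50 for 256 ranges cuts the number of sequential plans from 11 to 2, which leaves fewer points at which a faster shard waits for a slower one.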


334 Commits