scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 22:13:19 +00:00

Author	SHA1	Message	Date
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Kefu Chai	7a8e8c2ced	db: add formatter for db::write_type before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::write_type`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17093	2024-02-01 10:22:45 +02:00
Pavel Emelyanov	7c5c89ba8d	Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel" This reverts commit `370fbd346c`, reversing changes made to `0912d2a2c6`. This makes scylla-manager mis-interpret the data_file_directories somehow, issue #17078	2024-01-31 15:08:14 +03:00
Avi Kivity	c8397f0287	Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. A tablet that is too large makes migration inefficient, therefore slowing down the balancer. If the avg size grows beyond the upper bound (split threshold), then balancer decides to split. Split spans all tablets of a table, due to power-of-two constraint. Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on. A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split. Tablet metadata gains 2 new fields for managing this: resize_type: resize decision type, can be either of "merge", "split", or "none". resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator). A new RPC was implemented to pull stats from each table replica, such that load balancer can calculate the avg tablet size and know the "split status", for a given table. Avg size is aggregated carefully while taking RF of each DC into account (which might differ). When a table is done splitting its storage, it loads (mirrors) the resize_seq_number from tablet metadata into its local state (in other words, my split status is ready). If a table is split ready, coordinator will see that table's seq number is the same as the one in tablet metadata. 
Helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so coordinator will only update topology when all replicas are ready. When load balancer emits split decision, replicas will listen for the need to split with a "split monitor" that is awakened once a table has replication metadata updated and detects the need for split (i.e. resize_type field is "split"). The split monitor will start splitting of compaction groups (using mechanism introduced here: `081f30d149`) for the table. And once splitting work is completed, the table updates its local state as having completed split. When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step). Fixes #16536. 
Closes scylladb/scylladb#16580 * github.com:scylladb/scylladb: test/topology_experimental_raft: Add tablet split test replica: Bypass reshape on boot with tablets temporarily replica: Fix table::compaction_group_for_sstable() for tablet streaming test/topology_experimental_raft: Disable load balancer in test fencing replica: Remap compaction groups when tablet split is finalized service: Split tablet map when split request is finalized replica: Update table split status if completed split compaction work storage_service: Implement split monitor topology_cordinator: Generate updates for resize decisions made by balancer load_balancer: Introduce metrics for resize decisions db: Make target tablet size a live-updateable config option load_balancer: Implement resize decisions service: Wire table_resize_plan into migration_plan service: Introduce table_resize_plan tablet_mutation_builder: Add set_resize_decision() topology_coordinator: Wire load stats into load balancer storage_service: Allow tablet split and migration to happen concurrently topology_coordinator: Periodically retrieve table_load_stats locator: Introduce topology::get_datacenter_nodes() storage_service: Implement table_load_stats RPC replica: Expose table_load_stats in table replica: Introduce storage_group::live_disk_space_used() locator: Introduce table_load_stats tablets: Add resize decision metadata to tablet metadata locator: Introduce resize_decision	2024-01-31 13:59:56 +02:00
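The resize rules described in the tablet splitting merge above can be sketched roughly as follows. This is an illustrative Python model, not the Scylla C++ code: the names `ResizeDecision`, `desired_resize_type`, `next_decision` and `split_ready` are hypothetical, and two points are assumptions rather than facts from the commit message: that an ongoing split is kept until the average size falls below the target size, and that revoking a decision also counts as a new decision that bumps the sequence number.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ResizeDecision:
    resize_type: str = "none"    # "split", "merge", or "none"
    resize_seq_number: int = 0   # increased by 1 on every new decision


def desired_resize_type(avg_size: int, split_threshold: int, ongoing: str) -> str:
    # Per the commit message, the target size is 50% of the split threshold
    # and 100% of the merge threshold.
    target_size = split_threshold // 2
    merge_threshold = target_size
    if ongoing == "split":
        # Assumption: keep an ongoing split unless the average size dropped
        # below the target, in which case splitting would only trigger a merge.
        return "split" if avg_size >= target_size else "none"
    if avg_size > split_threshold:
        return "split"
    if avg_size < merge_threshold:
        return "merge"
    return "none"


def next_decision(current: ResizeDecision, avg_size: int,
                  split_threshold: int) -> ResizeDecision:
    wanted = desired_resize_type(avg_size, split_threshold, current.resize_type)
    if wanted == current.resize_type:
        return current  # nothing new to emit
    # Assumption: any change of decision (including a revoke) gets a new number.
    return ResizeDecision(wanted, current.resize_seq_number + 1)


def split_ready(replica_seq_numbers: Iterable[int], metadata_seq_number: int) -> bool:
    # Taking the minimum means one lagging replica keeps the table "not ready".
    return min(replica_seq_numbers) == metadata_seq_number
```

In this model a table whose replicas report sequence numbers `[1, 1, 0]` against a metadata value of `1` is not yet ready to be finalized, because the minimum still refers to a stale decision.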
Kefu Chai	b931d93668	treewide: fix misspellings in code comments these misspellings are identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17004	2024-01-31 09:16:10 +02:00
Piotr Smaroń	35ba037724	config: fix a typo in --role-manager's description Closes scylladb/scylladb#17063	2024-01-30 16:13:33 +02:00
Pavel Emelyanov	370fbd346c	Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel `db::config` is a class, that is used in many places across the code base. When it is changed, its clients' code need to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact. This PR: - extends the public interface of utils::directories class to provide required directory paths to the users - removes 'db::config::setup_directories()' to avoid altering the fields of configuration object - replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs Fixes: scylladb#5626 Closes scylladb/scylladb#16787 * github.com:scylladb/scylladb: utils/directories: make utils::directories::set an internal type db::config: keep dir paths unchanged cql_transport/controler: use utils::directories to get paths of dirs service/storage_proxy: use utils::directories to get paths of dirs api/storage_service.cc: use utils::directories to get paths of dirs tools/scylla-sstable.cc: use utils::directories to get paths db/commitlog: do not use db::config to get dirs Use utils::directories to get dirs paths in replica::database Allow utils::directories to provide paths to dirs Clean-up of utils::directories	2024-01-29 18:01:15 +03:00
Kamil Braun	0912d2a2c6	Merge 'raft topology: make left_token_ring a transition state' from Patryk Jędrzejczak When a node is in the `left_token_ring` state, we don't know how it has ended up in this state. We cannot distinguish a node that has finished decommissioning from a node that has failed bootstrap. The main problem it causes is that we incorrectly send the `barrier_and_drain` command to a node that has failed bootstrapping or replacing. We must do it for a node that has finished decommissioning because it could still coordinate requests. However, since we cannot distinguish nodes in the `left_token_ring` state, we must send the command to all of them. This issue appeared in scylladb/scylladb#16797 and this PR is a follow-up that fixes it. The solution is changing `left_token_ring` from a node state to a transition state. Fixes scylladb/scylladb#16944 Closes scylladb/scylladb#17009 * github.com:scylladb/scylladb: docs: dev: topology-over-raft: document the left_token_ring state topology_coordinator: adjust reason string in left_token_ring handler raft topology: make left_token_ring a transition state topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes	2024-01-29 15:29:01 +01:00
Kefu Chai	43094d2023	db: add formatter for db::read_repair_decision before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::read_repair_decision`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17033	2024-01-29 15:43:51 +02:00
Patryk Wrobel	781a6a5071	utils/directories: make utils::directories::set an internal type Previously, utils::directories::set could have been used by clients of utils::directories class to provide dirs for creation. Due to moving the responsibility for providing paths of dirs from db::config to utils::directories, such usage is no longer the case. This change: - defines utils::directories::set in utils/directories.cc to disallow its usage by the clients of utils::directories - makes utils::directories::create_and_verify() member function private; now it is used only by the internals of the class - introduces a new member function to utils::directories called create_and_verify_sharded_directory() to limit the functionality provided to clients Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	dc8d5ffaf6	db::config: keep dir paths unchanged This change is intended to ensure, that db::config fields related to directories are not changed. To achieve that a member function called setup_directories() is removed. The responsibility for directories paths has been moved to utils::directories, which may generate default paths if the configuration does not provide a specific value. Fixes: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	804afffb11	db/commitlog: do not use db::config to get dirs This change removes usage of db::config to get path of commitlog_directory. Instead, it introduces a new parameter to directly pass the path to db::commitlog::config::from_db_config(). Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Jędrzejczak	b0eef50b2e	raft topology: make left_token_ring a transition state A node can be in the `left_token_ring` state after: - a finished decommission, - a failed bootstrap, - a failed replace. When a node is in the `left_token_ring` state, we don't know how it has ended up in this state. We cannot distinguish a node that has finished decommissioning from a node that has failed bootstrap. The main problem it causes is that we incorrectly send the `barrier_and_drain` command to a node that has failed bootstrapping or replacing. We must do it for a node that has finished decommissioning because it could still coordinate requests. However, since we cannot distinguish nodes in the `left_token_ring` state, we must send the command to all of them. This issue appeared in scylladb/scylladb#16797 and this patch is a follow-up that fixes it. The solution is changing `left_token_ring` from a node state to a transition state. Regarding implementation, most of the changes are simple refactoring. The less obvious are: - Before this patch, in `system_keyspace::left_topology_state`, we had to keep the ignored nodes' IDs for replace to ensure that the replacing node will have access to it after moving to the `left_token_ring` state, which happens when replace fails. We don't need this workaround anymore. When we enter the new `left_token_ring` transition state, the new node will still be in the `decommissioning` state, so it won't lose its request param. - Before this patch, a decommissioning node lost its tokens while moving to the `left_token_ring` state. After the patch, it loses tokens while still being in the `decommissioning` state. We ensure that all `decommissioning` handlers correctly handle a node that lost its tokens. Moving the `left_token_ring` handler from `handle_node_transition` to `handle_topology_transition` created a large diff. 
There are only three changes: - adding `auto node = get_node_to_work_on(std::move(guard));`, - adding `builder.del_transition_state()`, - changing error logged when `global_token_metadata_barrier` fails.	2024-01-29 10:39:07 +01:00
Kefu Chai	8f38bd5376	commitlog: add formatter for db::replay_position before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::replay_position`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17014	2024-01-29 09:59:30 +02:00
Dawid Medrek	b92fb3537a	main: Postpone start-up of hint manager In this commit, we postpone the start-up of the hint manager until we obtain information about other nodes in the cluster. When we start the hint managers, one of the things that happen is creating endpoint managers -- structures managed by db::hints::manager. Whether we create an instance of endpoint manager depends on the value returned by host_filter::can_hint_for, which, in turn, may depend on the current state of locator::topology. If locator::topology is incomplete, some endpoint managers may not be started even though they should (because the target node IS part of the cluster and we SHOULD send hints to it if there are some). The situation like that can happen because we start the hint managers too early. This commit aims to solve that problem. We only start the hint managers when we've gathered information about the other nodes in the cluster and created the locator::topology using it. Hinted Handoff is not negatively affected by these changes since in between the previous point of starting the hint managers and the current one, all of the mutations performed by service::storage_proxy target the local node, so no hints would need to be generated anyway. Fixes scylladb/scylladb#11870 Closes scylladb/scylladb#16511	2024-01-26 12:49:40 +01:00
Kamil Braun	4f736894e1	Merge 'Add maintenance mode' from Mikołaj Grzebieluch In this mode, the node is not reachable from the outside, i.e. * it refuses all incoming RPC connections, * it does not join the cluster, thus * all group0 operations are disabled (e.g. schema changes), * all cluster-wide operations are disabled for this node (e.g. repair), * other nodes see this node as dead, * cannot read or write data from/to other nodes, * it does not open Alternator and Redis transport ports and the TCP CQL port. The only way to make CQL queries is to use the maintenance socket. The node serves only local data. To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file. REST API works as usual, but some routes are disabled: * authorization_cache * failure_detector * hinted_hand_off_manager This PR also updates the maintenance socket documentation: * add cqlsh usage to the documentation * update the documentation to use `WhiteListRoundRobinPolicy` Fixes #5489. 
Closes scylladb/scylladb#15346 * github.com:scylladb/scylladb: test.py: add test for maintenance mode test.py: generalize usage of cluster_con test.py: when connecting to node in maintenance mode use maintenance socket docs: add maintenance mode documentation main: add maintenance mode main: move some REST routes initialization before joining group0 message_service: add sanity check that rpc connections are not created in the maintenance mode raft_group0_client: disable group0 operations in the maintenance mode service/storage_service: add start_maintenance_mode() method storage_service: add MAINTENANCE option to mode enum service/maintenance_mode: add maintenance_mode_enabled bool class service/maintenance_mode: move maintenance_socket_enabled definition to seperate file db/config: add maintenance mode flag docs: add cqlsh usage to maintenance socket documentation docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy	2024-01-26 11:02:34 +01:00
Raphael S. Carvalho	638e6e30cb	db: Make target tablet size a live-updateable config option Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-01-25 18:36:08 -03:00
Avi Kivity	03313d359e	Merge ' db: commitlog_replayer: ignore mutations affected by (tablet) cleanups ' from Michał Chojnowski To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay. This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations. Fixes #16752 Closes scylladb/scylladb#16888 * github.com:scylladb/scylladb: test: test_tablets: add a test for cleanup after migration test: pylib: add ScyllaCluster.wipe_sstables test: boost: add commitlog_cleanup_test db: commitlog_replayer: ignore mutations affected by (tablet) cleanups replica: table: garbage-collect irrelevant system.commitlog_cleanups records db: commitlog: add min_position() replica: table: populate system.commitlog_cleanups on tablet cleanup db: system_keyspace: add system.commitlog_cleanups replica: table: refresh compound sstable set after tablet cleanup	2024-01-25 20:51:03 +02:00
Mikołaj Grzebieluch	8b2f0e38d9	service/maintenance_mode: move maintenance_socket_enabled definition to seperate file	2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch	e6a83b9819	db/config: add maintenance mode flag	2024-01-25 15:27:53 +01:00
Patryk Jędrzejczak	378cbd0b70	raft topology: ensure at most one transitioning node We add a sanity check to ensure at most one transitioning node at a time. If there is more, something must have gone wrong. In the future, we might implement concurrent topology operations. Then, we will remove this sanity check. We also extend the comment describing `transition_nodes` so that it better explains why we use a map and how it should be handled.	2024-01-25 13:42:46 +01:00
Kefu Chai	0fbfc96619	db: add formatter for schema_tables::table_kind before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for db::schema_tables::table_kind, and its operator<<() is still used by the homebrew generic formatter for std::map<>, so it is preserved. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16972	2024-01-25 11:33:13 +03:00
Nadav Har'El	df6c9828ef	Merge 'Add protobuf and Native histogram support' from Amnon Heiman Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. Native histograms hold the benefits of high resolution at a lower resource cost. This series allows sending histograms in a native histogram format over protobuf. By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf. Fixes #12931 Closes scylladb/scylladb#16737 * github.com:scylladb/scylladb: main.cc: Add prometheus_allow_protobuf command line histogram_metrics_helper: support native histogram config: Add prometheus_allow_protobuf flag	2024-01-24 21:24:50 +02:00
Kefu Chai	c978d1b3f8	config: s/re-use/reuse/ this misspelling is identified by codespell. per m-w, reuse is a word per-se, and we don't need the hyphen for addressing the ambiguity in the use cases, like, recover and re-cover. see also https://www.merriam-webster.com/dictionary/reuse Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16962	2024-01-24 15:19:03 +02:00
Michał Chojnowski	a246bb39ef	db: commitlog_replayer: ignore mutations affected by (tablet) cleanups To avoid data resurrection, mutations deleted by cleanup operations have to be skipped during commitlog replay. This patch implements this, based on the metadata recorded on cleanup operations into system.commitlog_cleanups.	2024-01-24 10:37:39 +01:00
Michał Chojnowski	05ff32ebf9	db: commitlog: add min_position() Add a helper function which returns the minimum replay position across all existing or future commitlog segments. Only positions greater or equal to it can be replayed on the next reboot. We will use this helper in a future patch to garbage collect some cleanup metadata which refers to replay positions.	2024-01-24 10:37:38 +01:00
Michał Chojnowski	7c5a8894be	db: system_keyspace: add system.commitlog_cleanups Add a system table which will hold records of cleanup operations, for the purpose of filtering commitlog replays to avoid data resurrections.	2024-01-24 10:37:38 +01:00
Botond Dénes	26d814d8be	Merge 'Configure initial tablets count scaling' from Pavel Emelyanov There are currently two options how to "request" the number of initial tablets for a table 1. specify it explicitly when creating a keyspace 2. let scylla calculate it on its own Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly. Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances the option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla, with the option it is going to be more convenient. Closes scylladb/scylladb#16919 * github.com:scylladb/scylladb: config: Add --tablets-initial-scale-factor tablet_allocator: Add initial tablets scale to config tablet_allocator: Add config	2024-01-23 13:25:12 +02:00
Amnon Heiman	fc9bd2de03	config: Add prometheus_allow_protobuf flag Native histograms (also known as sparse histograms) are an experimental Prometheus feature. They use protobuf as the reporting layer. The prometheus_allow_protobuf flag allows the user to enable protobuf protocol. When this flag is set to true, and the Prometheus server sends in the request that it accepts protobuf, the result will be in protobuf protocol. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-01-23 13:12:07 +02:00
Pavel Emelyanov	d1d4620af8	config: Add --tablets-initial-scale-factor Previous patch taught tablets allocator to multiply the initial tablets count by some value. This patch makes this factor configurable Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-01-22 19:18:18 +03:00
David Garcia	f3eeba8cc6	docs: parse config.cc properties as rst text This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST). By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more. Closes scylladb/scylladb#16311	2024-01-22 16:40:18 +02:00
Kefu Chai	5c0484cb02	db: add formatter for db::operation_type before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for db::operation_type, and remove their operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16832	2024-01-19 10:16:41 +02:00
Kefu Chai	0ae81446ef	./: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16766	2024-01-17 16:30:14 +02:00
Botond Dénes	f22fc88a64	Merge 'Configure service levels interval' from Michał Jadwiszczak Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests. This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest. Closes scylladb/scylladb#16394 * github.com:scylladb/scylladb: test:cql-pytest: change service levels intervals in tests configure service levels interval	2024-01-17 12:24:49 +02:00
Calle Wilund	af0772d605	commitlog: Add wait_for_pending_deletes Refs #16757 Allows waiting for all previous and pending segment deletes to finish. Useful if a caller of `discard_completed_segments` (i.e. a memtable flush target) not only wants to ensure segments are clean and released, but thoroughly deleted/recycled, and hence no threat of resurrecting data on crash+restart. Test included. Closes scylladb/scylladb#16801	2024-01-17 09:30:55 +02:00
Tomasz Grabiec	3d76aefb98	Merge "Enhance topology request status tracking" from Gleb Currently to figure out if a topology request is complete a submitter checks the topology state and tries to figure out from that the status of the request. This is not exact. Let's look at rebuild handling for instance. To figure out if request is completed the code waits for request object to disappear from the topology, but if another rebuild starts between the end of the previous one and the code noticing that it completed the code will continue waiting for the next rebuild. Another problem is that in case of operation failure there is no way to pass an error back to the initiator. This series solves those problems by assigning an id for each request and tracking the status of each request in a separate table. The initiator can query the request status from the table and see if the request was completed successfully or if it failed with an error, which is also available from the table. The schema for the table is: CREATE TABLE system.topology_requests ( id timeuuid PRIMARY KEY, initiating_host uuid, start_time timestamp, done boolean, error text, end_time timestamp, ); and all entries have TTL of one month.	2024-01-17 00:37:19 +01:00
Gleb Natapov	84197ff735	storage_service: topology coordinator: check topology operation completion using status in topology_requests table Instead of trying to guess if a request completed by looking into the topology state (which can sometimes be error-prone) look at the request status in the new topology_requests table. If the request failed, report a reason for the failure from the table.	2024-01-16 17:02:54 +02:00
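The per-request tracking idea from the merge above can be illustrated with a small in-memory model. This is a hedged Python sketch, not Scylla's implementation: the class `TopologyRequests` and its methods are hypothetical stand-ins for the `system.topology_requests` table, only the `initiating_host`, `done` and `error` columns are modelled, and `uuid.uuid1()` is used merely because it produces a time-based UUID comparable to a CQL `timeuuid`.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class RequestStatus:
    initiating_host: uuid.UUID
    done: bool = False
    error: str = ""   # non-empty only when the request finished with an error


class TopologyRequests:
    """Hypothetical in-memory stand-in for system.topology_requests."""

    def __init__(self) -> None:
        self._rows: Dict[uuid.UUID, RequestStatus] = {}

    def start(self, initiating_host: uuid.UUID) -> uuid.UUID:
        request_id = uuid.uuid1()  # time-based id, one per request
        self._rows[request_id] = RequestStatus(initiating_host)
        return request_id

    def finish(self, request_id: uuid.UUID, error: str = "") -> None:
        row = self._rows[request_id]
        row.done = True
        row.error = error

    def is_complete(self, request_id: uuid.UUID) -> bool:
        """Return True once done; raise if the request failed."""
        row = self._rows[request_id]
        if row.done and row.error:
            raise RuntimeError(row.error)
        return row.done
```

Because the initiator polls by its own request id, a second rebuild that starts right after the first one finishes has a different id and cannot be mistaken for the first, and a failure reason travels back through the `error` column.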
Avi Kivity	a9844ed69a	Merge 'view: revert cleanup filter that doesn't work with tablets' from Nadav Har'El The goal of this PR is fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass. The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place. Fixes #16598 Closes scylladb/scylladb#16670 * github.com:scylladb/scylladb: view: revert cleanup filter that doesn't work with tablets mv: sleep a bit before view-update-generator restart	2024-01-16 16:42:20 +02:00
Gleb Natapov	584551f849	topology coordinator: add request_id to the topology state machine Provide a unique ID for each topology request and store it in the topology state machine. It will be used to index new topology requests table in order to retrieve request status.	2024-01-16 13:57:27 +02:00
Gleb Natapov	ecb8778950	system keyspace: introduce local table to store topology requests status The table has the following schema and will be managed by raft: CREATE TABLE system.topology_requests ( id timeuuid PRIMARY KEY, initiating_host uuid, start_time timestamp, done boolean, error text, end_time timestamp, ); In case of a request completing with an error, the "error" field will be non-empty when "done" is set to true.	2024-01-16 13:57:16 +02:00
Gleb Natapov	a4ac64a652	system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request Next patch will need ignore nodes list while processing removenode request. Load it.	2024-01-14 14:44:07 +02:00
Gleb Natapov	cc54796e23	raft topology: add cleanup state to the topology state machine The patch adds cleanup state to the persistent and in memory state and handles the loading. The state can be "clean" which means no cleanup needed, "needed" which means the node is dirty and needs to run cleanup at some point, "running" which means that cleanup is being run by the node right now, and when it completes the state will be reset to "clean".	2024-01-14 13:30:54 +02:00
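The three cleanup states named in the commit above form a small cycle, which can be modelled as follows. This is only an illustrative Python sketch: the enum `CleanupState`, the `advance` helper and the exact set of allowed transitions are assumptions drawn from the description ("clean" to "needed" to "running" and back to "clean"), not taken from the Scylla source.

```python
from enum import Enum


class CleanupState(Enum):
    CLEAN = "clean"      # no cleanup needed
    NEEDED = "needed"    # node is dirty and must run cleanup at some point
    RUNNING = "running"  # cleanup is in progress on the node


# Assumed transition table; the real state machine may allow more edges.
_ALLOWED = {
    CleanupState.CLEAN: {CleanupState.NEEDED},
    CleanupState.NEEDED: {CleanupState.RUNNING},
    CleanupState.RUNNING: {CleanupState.CLEAN},
}


def advance(current: CleanupState, new: CleanupState) -> CleanupState:
    """Return the new state, rejecting transitions outside the assumed cycle."""
    if new not in _ALLOWED[current]:
        raise ValueError(f"invalid transition {current.value} -> {new.value}")
    return new
```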
Nadav Har'El	1bcaeb89c7	view: revert cleanup filter that doesn't work with tablets This patch reverts commit `10f8f13b90` from November 2022. That commit added to the "view update generator", the code which builds view updates for staging sstables, a filter that ignores ranges that do not belong to this node. However, 1. I believe this filter was never necessary, because the view update code already silently ignores base updates which do not belong to this replica (see get_view_natural_endpoint()). After all, the view update needs to know that this replica is the Nth owner of the base update to send its update to the Nth view replica, but if no such N exists, no view update is sent. 2. The code introduced for that filter used a per-keyspace replication map, which was ok for vnodes but no longer works for tablets, and causes the operation using it to fail. 3. The filter was used every time the "view update generator" was used, regardless of whether any cleanup is necessary or not, so every such operation would fail with tablets. So for example the dtest test_mvs_populating_from_existing_data fails with tablets: * This test has view building in parallel with automatic tablet movement. * Tablet movement is streaming. * When streaming happens before view building has finished, the streamed sstables get "view update generator" run on them. This causes the problematic code to be called. Before this patch, the dtest test_mvs_populating_from_existing_data fails when tablets are enabled. After this patch, it passes. Fixes #16598 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:24:44 +02:00
Nadav Har'El	0fe40f729e	mv: sleep a bit before view-update-generator restart The "view update generator" is responsible for generating view updates for staging sstables (such as coming from repair). If the processing fails, the code retries - immediately. If there is some persistent bug, such as issue #16598, we will have a tight loop of error messages, potentially a gigabyte of identical messages every second. In this patch we simply add a sleep of one second after view update generation fails before retrying. We can still get many identical error messages if there is some bug, but not more than one per second. Refs #16598. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-01-14 13:13:52 +02:00
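The retry throttling described in the commit above amounts to sleeping between failed attempts. A minimal Python sketch of that pattern follows; the function `generate_with_retry` and its parameters are hypothetical, and the `sleep` and `log` hooks are injectable only so the behaviour can be exercised without waiting. The actual change lives in Scylla's C++ view update generator.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_retry(generate: Callable[[], T],
                        retry_delay_s: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep,
                        log: Callable[[str], None] = print) -> T:
    """Call generate() until it succeeds, pausing after every failure."""
    while True:
        try:
            return generate()
        except Exception as exc:
            # At most one error message per retry_delay_s instead of a tight loop.
            log(f"view update generation failed: {exc}; retrying in {retry_delay_s}s")
            sleep(retry_delay_s)
```

With a persistent failure this still logs indefinitely, but the rate is bounded by the delay, which matches the stated goal of no more than one message per second.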
Michał Jadwiszczak	f6a464ad81	configure service levels interval So far the service levels interval, responsible for updating SL configuration, was hardcoded in main. Now it's extracted to `service_levels_interval_ms` option.	2024-01-12 10:28:24 +01:00
Kefu Chai	344ea25ed8	db: add fmt::format for db::consistency_level before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we * define a formatter for `db::consistency_level` * drop its `operator<<`, as it is not used anymore Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16755	2024-01-12 10:49:00 +02:00
Kefu Chai	54d49c04e0	db, sstable: bump up default sstable format to "md" before this change, we default to using "mc" sstable format, and switch to "md" if the cluster agrees on using it, and to "me" if the cluster agrees on using that. the cluster feature is used to get the consensus across the members in the cluster, if any of the existing nodes in the cluster has its `sstable_format` configured to, for instance, "mc", then the cluster is stuck with "mc". but we disabled "mc" sstable format back in `3d345609`, the first LTS release including that change was scylla v5.2.0. which means, the cluster of the last major version Scylla should be using "md" or "me". per our document on upgrade, see docs/upgrade/index.rst, > You should perform the upgrades consecutively - to each > successive X.Y version, without skipping any major or minor version. > > Before you upgrade to the next version, the whole cluster (each > node) must be upgraded to the previous version. we can assume that, a 6.x node will only join a cluster with 5.x or 6.x nodes. (joining a 7.x cluster should work, but this is not relevant to this change). in both cases, since 5.x and up scylla can only be configured with "md" `sstable_format`, there is no need to switch from "mc" to "md" anymore. so we can ditch the code supporting it. Refs #16551 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-11 22:43:05 +08:00
Kefu Chai	be364d30fd	db: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16664	2024-01-09 11:44:19 +02:00
Kefu Chai	34259a03d0	treewide: use consteval string as format string when formatting log message seastar::logger is using the compile-time format checking by default if compiled using {fmt} 8.0 and up. and it requires the format string to be consteval string, otherwise we have to use `fmt::runtime()` explicitly. so adapt the change, let's use the consteval string when formatting logging messages. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16612	2024-01-02 19:08:47 +02:00
Benny Halevy	c520fc23f0	system_keyspace: update_peer_info: drop single-column overloads They are no longer used. Instead, all callers now pass peer_info. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:37:34 +02:00

3586 Commits