scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Kamil Braun	bb22e06a9e	Merge 'abort failed rebuild instead of retrying it forever' from Gleb Add error handling to rebuild instead of retrying it until succeeds. * 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev: test: add test for rebuild failure test: add expected_error to rebuild_node operation topology_coordinator: Propagate rebuild failure to the initiator	2024-01-31 10:07:28 +01:00
Patryk Wrobel	1b6ab65c51	reader_concurrency_semaphore.cc: move stringstream content instead of copying it C++20 introduced a new overload of std::stringstream::str() that is selected when the mentioned member function is called on r-value. The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed. This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17064	2024-01-31 09:31:50 +02:00
Botond Dénes	f8d3070559	Merge 'Fix flakiness in test_raft_snapshot_request' from Kamil Braun Add workaround for scylladb/python-driver#295. Also an assert made at the end of the test was false, it is fixed with appropriate comment added. Closes scylladb/scylladb#17071 * github.com:scylladb/scylladb: test_raft_snapshot_request: fix flakiness test: topology/util: update comment for `reconnect_driver`	2024-01-31 09:30:27 +02:00
Pavel Emelyanov	84ddc37130	utils: Coroutinize disk_sanity() It's pretty hairy in its future-promises form, with coroutines it's much easier to read Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17052	2024-01-31 09:20:21 +02:00
Kefu Chai	8a9f13c187	redis: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17057	2024-01-31 09:17:18 +02:00
Kefu Chai	b931d93668	treewide: fix misspellings in code comments these misspellings are identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17004	2024-01-31 09:16:10 +02:00
Kamil Braun	74bf60a8ca	test_raft_snapshot_request: fix flakiness Add workaround for scylladb/python-driver#295. Also an assert made at the end of the test was false, it is fixed with appropriate comment added.	2024-01-30 16:21:24 +01:00
Kamil Braun	39339b9f70	test: topology/util: update comment for `reconnect_driver` The issues mentioned in the comment before are already fixed. Unfortunately, there is another, opposite issue which this function can be used for. The previous issue was about the existing driver session not reconnecting. The current issue is about the existing driver session reconnecting too much... (and in the middle of queries.)	2024-01-30 15:36:48 +01:00
Piotr Smaroń	35ba037724	config: fix a typo in --role-manager's description Closes scylladb/scylladb#17063	2024-01-30 16:13:33 +02:00
Kamil Braun	cf3f26dc94	test_maintenance_mode: fix flakiness Wait until CQL is available and nodes see each other before trying to perform a query. Closes scylladb/scylladb#17059	2024-01-30 14:11:14 +02:00
Gleb Natapov	8b50613465	test: add test for rebuild failure	2024-01-30 11:04:19 +02:00
Gleb Natapov	d62204e758	test: add expected_error to rebuild_node operation	2024-01-30 11:04:19 +02:00
Gleb Natapov	51c40034f5	topology_coordinator: Propagate rebuild failure to the initiator Do not retry rebuild endlessly, but report the error instead.	2024-01-30 11:04:19 +02:00
Kefu Chai	90c0e83f9a	thrift: remove unused namespace definition thrift_transport is never used, so drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17050	2024-01-30 09:16:47 +02:00
Michał Chojnowski	904bb25987	test: test_tablet_cleanup: wait for servers to see each other before multi-node queries Waiting for CQL connections is not enough. For the queries to succeed, nodes must see each other. We have to wait for this, otherwise the test will be flaky. Fixes #17029 Closes scylladb/scylladb#17040	2024-01-30 08:56:01 +02:00
Tomasz Grabiec	36f218c83b	Merge 'main: refuse startup when tablet resharding is required' from Botond Dénes We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes. Avoid this by refusing to start if the need to resharding is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard. Startup will fail as: ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.) Refs: #16739 Fixes: #16843 Closes scylladb/scylladb#17008 * github.com:scylladb/scylladb: test/topolgy_experimental_raft: test_tablets.py: add test for resharding test/pylib: manager[_client]: add update_cmdline() main: refuse startup when tablet resharding is required locator: tablets: add check_tablet_replica_shards()	2024-01-29 23:39:41 +01:00
Pavel Emelyanov	370fbd346c	Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel `db::config` is a class, that is used in many places across the code base. When it is changed, its clients' code need to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact. This PR: - extends the public interface of utils::directories class to provide required directory paths to the users - removes 'db::config::setup_directories()' to avoid altering the fields of configuration object - replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs Fixes: scylladb#5626 Closes scylladb/scylladb#16787 * github.com:scylladb/scylladb: utils/directories: make utils::directories::set an internal type db::config: keep dir paths unchanged cql_transport/controler: use utils::directories to get paths of dirs service/storage_proxy: use utils::directories to get paths of dirs api/storage_service.cc: use utils::directories to get paths of dirs tools/scylla-sstable.cc: use utils::directories to get paths db/commitlog: do not use db::config to get dirs Use utils::directories to get dirs paths in replica::database Allow utils::directories to provide paths to dirs Clean-up of utils::directories	2024-01-29 18:01:15 +03:00
Kamil Braun	0912d2a2c6	Merge 'raft topology: make left_token_ring a transition state' from Patryk Jędrzejczak When a node is in the `left_token_ring` state, we don't know how it has ended up in this state. We cannot distinguish a node that has finished decommissioning from a node that has failed bootstrap. The main problem it causes is that we incorrectly send the `barrier_and_drain` command to a node that has failed bootstrapping or replacing. We must do it for a node that has finished decommissioning because it could still coordinate requests. However, since we cannot distinguish nodes in the `left_token_ring` state, we must send the command to all of them. This issue appeared in scylladb/scylladb#16797 and this PR is a follow-up that fixes it. The solution is changing `left_token_ring` from a node state to a transition state. Fixes scylladb/scylladb#16944 Closes scylladb/scylladb#17009 * github.com:scylladb/scylladb: docs: dev: topology-over-raft: document the left_token_ring state topology_coordinator: adjust reason string in left_token_ring handler raft topology: make left_token_ring a transition state topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes	2024-01-29 15:29:01 +01:00
Kefu Chai	819fc95a67	reader: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17036	2024-01-29 16:21:42 +02:00
Kefu Chai	43094d2023	db: add formatter for db::read_repair_decision before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::read_repair_decision`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17033	2024-01-29 15:43:51 +02:00
Botond Dénes	d202d32f81	Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun This allows the user of `raft::server` to cause it to create a snapshot and truncate the Raft log (leaving no trailing entries; in the future we may extend the API to specify number of trailing entries left if needed). In a later commit we'll add a REST endpoint to Scylla to trigger group 0 snapshots. One use case for this API is to create group 0 snapshots in Scylla deployments which upgraded to Raft in version 5.2 and started with an empty Raft log with no snapshot at the beginning. This causes problems, e.g. when a new node bootstraps to the cluster, it will not receive a snapshot that would contain both schema and group 0 history, which would then lead to inconsistent schema state and trigger assertion failures as observed in scylladb/scylladb#16683. In 5.4 the logic of initial group 0 setup was changed to start the Raft log with a snapshot at index 1 (`ff386e7a44`) but a problem remains with these existing deployments coming from 5.2, we need a way to trigger a snapshot in them (other than performing 1000 arbitrary schema changes). Another potential use case in the future would be to trigger snapshots based on external memory pressure in tablet Raft groups (for strongly consistent tables). The PR adds the API to `raft::server` and a HTTP endpoint that uses it. In a follow-up PR, we plan to modify group 0 server startup logic to automatically call this API if it sees that no snapshot is present yet (to automatically fix the aforementioned 5.2 deployments once they upgrade.) Closes scylladb/scylladb#16816 * github.com:scylladb/scylladb: raft: remove `empty()` from `fsm_output` test: add test for manual triggering of Raft snapshots api: add HTTP endpoint to trigger Raft snapshots raft: server: add `trigger_snapshot` API raft: server: track last persisted snapshot descriptor index raft: server: framework for handling server requests raft: server: inline `poll_fsm_output` raft: server: fix indentation raft: server: move `io_fiber`'s processing of `batch` to a separate function raft: move `poll_output()` from `fsm` to `server` raft: move `_sm_events` from `fsm` to `server` raft: fsm: remove constructor used only in tests raft: fsm: move trace message from `poll_output` to `has_output` raft: fsm: extract `has_output()` raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor` raft: server: pass `*_aborted` to `set_exception` call	2024-01-29 15:06:04 +02:00
Beni Peled	8009170d3a	docs: update the installation instructions with the new gpg 2024 key Closes scylladb/scylladb#17019	2024-01-29 14:37:25 +02:00
Kefu Chai	6f55d68dd9	.git: add more skip words these words are either * shortened words: strategy => strat, read_from_primary => fro * or acronyms: node_or_data => nd before we rename them with better names, let's just add them to the ignore word list. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17002	2024-01-29 14:37:03 +02:00
Patryk Wrobel	781a6a5071	utils/directories: make utils::directories::set an internal type Previously, utils::directories::set could have been used by clients of utils::directories class to provide dirs for creation. Due to moving the responsibility for providing paths of dirs from db::config to utils::directories, such usage is no longer the case. This change: - defines utils::directories::set in utils/directories.cc to disallow its usage by the clients of utils::directories - makes utils::directories::create_and_verify() member function private; now it is used only by the internals of the class - introduces a new member function to utils::directories called create_and_verify_sharded_directory() to limit the functionality provided to clients Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	dc8d5ffaf6	db::config: keep dir paths unchanged This change is intended to ensure, that db::config fields related to directories are not changed. To achieve that a member function called setup_directories() is removed. The responsibility for directories paths has been moved to utils::directories, which may generate default paths if the configuration does not provide a specific value. Fixes: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:41 +01:00
Patryk Wrobel	0f3b00f9ad	cql_transport/controler: use utils::directories to get paths of dirs This change replaces usage of db::config with usage of utils::directories to get paths of directories in cql_transport/controler. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:20:38 +01:00
Patryk Wrobel	f08768e767	service/storage_proxy: use utils::directories to get paths of dirs This change replaces usage of db::config with usage of utils::directories to get paths of directories in service/storage_proxy. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	5ac3d0f135	api/storage_service.cc: use utils::directories to get paths of dirs This change replaces usage of db::config with usage of utils::directories in api/storage_service.cc in order to get the paths of directories. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	51fa108df7	tools/scylla-sstable.cc: use utils::directories to get paths This change replaces usage of db::config with usage of utils::directories to get paths of directories in tools/scylla-sstable.cc. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	804afffb11	db/commitlog: do not use db::config to get dirs This change removes usage of db::config to get path of commitlog_directory. Instead, it introduces a new parameter to directly pass the path to db::commitlog::config::from_db_config(). Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	9483d149af	Use utils::directories to get dirs paths in replica::database This change replaces the usage of db::config with usage of utils::directories to get dirs paths in replica::database class. Moreover, it adjusts tests that require construction of replica::database - its constructor has been changed to accept utils::directories object. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	1cd676e438	Allow utils::directories to provide paths to dirs This change extends utils::directories class in the following way: - adds new member variables that correspond to fields from db::config that describe paths of directories - introduces a public interface to retrieve the values of the new members - allows construction of utils::directories object based on db::config to setup internal member variables related to paths to dirs The new members of utils::directories are overriden when the provided values are empty. The way of setting paths is taken from db::config. To ensure that the new logic works correctly `utils_directories_test` has been created. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Patryk Wrobel	1b0ccaf4f2	Clean-up of utils::directories This change is intended to clean-up files in which utils::directories class is defined to ease further extensions. The preparation consists of: - removal of `using namespace` from directories.hh to avoid namespace pollution in files, that include this header - explicit inclusion of headers, that were missing or were implicitly included to ensure that directories.hh is self-sufficient - defining directories::set class outside of its parent to improve readability Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Botond Dénes	fd66ce1591	test/topolgy_experimental_raft: test_tablets.py: add test for resharding Check that scylla refuses to start when the shard count is reduced.	2024-01-29 07:04:33 -05:00
Botond Dénes	a7a5aada2a	test/pylib: manager[_client]: add update_cmdline() Similar to the existing update_config(). Updates the command-line arguments of the specified nodes, merging the new options into the existing ones. Needs a restart to take effect.	2024-01-29 07:04:33 -05:00
Botond Dénes	8a439fc2a8	main: refuse startup when tablet resharding is required We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes. Avoid this by refusing to start if the need to resharding is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard. Startup will fail as: ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)	2024-01-29 07:04:33 -05:00
Botond Dénes	95b6aeebae	locator: tablets: add check_tablet_replica_shards() Checks that all tablets with a replica on the this node, have a valid replica shard (< smp::count). Will be used to check whether the node can start-up with the current shard-count.	2024-01-29 07:04:33 -05:00
Patryk Jędrzejczak	7c10cae6c4	docs: dev: topology-over-raft: document the left_token_ring state In one of the previous patches, we changed the `left_token_ring` state from a node state to a transition state. We document it in this patch. The node state wasn't documented, so there is nothing to remove.	2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak	9b2d1a20a3	topology_coordinator: adjust reason string in left_token_ring handler We were using the "finished decommission node" reason string for a failed bootstrap and replace.	2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak	b0eef50b2e	raft topology: make left_token_ring a transition state A node can be in the `left_token_ring` state after: - a finished decommission, - a failed bootstrap, - a failed replace. When a node is in the `left_token_ring` state, we don't know how it has ended up in this state. We cannot distinguish a node that has finished decommissioning from a node that has failed bootstrap. The main problem it causes is that we incorrectly send the `barrier_and_drain` command to a node that has failed bootstrapping or replacing. We must do it for a node that has finished decommissioning because it could still coordinate requests. However, since we cannot distinguish nodes in the `left_token_ring` state, we must send the command to all of them. This issue appeared in scylladb/scylladb#16797 and this patch is a follow-up that fixes it. The solution is changing `left_token_ring` from a node state to a transition state. Regarding implementation, most of the changes are simple refactoring. The less obvious are: - Before this patch, in `system_keyspace::left_topology_state`, we had to keep the ignored nodes' IDs for replace to ensure that the replacing node will have access to it after moving to the `left_token_ring` state, which happens when replace fails. We don't need this workaround anymore. When we enter the new `left_token_ring` transition state, the new node will still be in the `decommissioning` state, so it won't lose its request param. - Before this patch, a decommissioning node lost its tokens while moving to the `left_token_ring` state. After the patch, it loses tokens while still being in the `decommissioning` state. We ensure that all `decommissioning` handlers correctly handle a node that lost its tokens. Moving the `left_token_ring` handler from `handle_node_transition` to `handle_topology_transition` created a large diff. There are only three changes: - adding `auto node = get_node_to_work_on(std::move(guard));`, - adding `builder.del_transition_state()`, - changing error logged when `global_token_metadata_barrier` fails.	2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak	12eb0738cf	topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes The `exclude_nodes` variable was unused, but it wasn't a bug. The `left_token_ring` and `rollback_to_normal` handlers correctly compute excluded nodes on their own.	2024-01-29 10:39:06 +01:00
Kefu Chai	0cbf8f75f0	db: add formatter for dht::decorated_key and repair_sync_boundary before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for dht::decorated_key and repair_sync_boundary. please note, before this change, repair_sync_boundary was using the operator<< based formatter of `dht::decorated_key`, so we are updating both of them in a single commit. because we still use the homebrew generic formatter of vector<> in to format vector<repair_sync_boundary> and vector<dht::decorated_key>, so their operator<< are preserved. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16994	2024-01-29 11:11:41 +02:00
Tzach Livyatan	06a9a925a5	Update link to sizing / pricing calc Closes scylladb/scylladb#17015	2024-01-29 11:07:20 +02:00
Kefu Chai	b5ff098f28	thrift: add formatter for cassandra::ConsistencyLevel::type before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for cassandra::ConsistencyLevel::type. please note, the operator<< for `cassandra::ConsistencyLevel::type` is generated using `thrift` command line tool, which does not emit specialization for fmt::formatter yet, so we need to use `fmt::ostream_formatter` to implement the formatter for this type. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17013	2024-01-29 10:10:35 +02:00
Pavel Emelyanov	3abdb3c7ee	tablets: Remove tablet_aware_replication_strategy::parse_initial_tablets It's now unused, string with initial tablets its parsed elsewhere Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17010	2024-01-29 10:03:38 +02:00
Kefu Chai	912c588975	thrift: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17012	2024-01-29 10:02:30 +02:00
Kefu Chai	abb12979f8	raft: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17011	2024-01-29 10:00:56 +02:00
Kefu Chai	8f38bd5376	commitlog: add formatter for db::replay_position before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `db::replay_position`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17014	2024-01-29 09:59:30 +02:00
Botond Dénes	d3c1be9107	Merge 'alternator: enable tablets by default if experimental feature is enabled' from Nadav Har'El This series does a similar change to Alternator as was done recently to CQL: 1. If the "tablets" experimental feature in enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to. 3. In particular, all test/alternator tests will also automatically run with tablets enabled 4. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets. 5. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later). Fixes #16355 Closes scylladb/scylladb#16900 * github.com:scylladb/scylladb: test/alternator: add "--vnodes" option to run script alternator: use tablets by default, if available test/alternator: run some tests without tablets	2024-01-29 09:22:13 +02:00
Kefu Chai	cb5453d534	.git: only allow codespell to run on master branch so that non-master branches are not read by 3rd-party tools unless they are audited. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16999	2024-01-29 09:04:20 +02:00

1 2 3 4 5 ...

40964 Commits