This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very large positive timestamps, but very negative timestamps could still cause bugs.
The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp(), which takes a date parameter), as well as reproducers for this bug - one directly, without the totimestamp() function, and one with it.
Finally, this PR also replaces the assert(), which turned this molehill of a bug into a mountain, with a throw.
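A minimal sketch of the shape of that change, assuming Seastar's on_internal_error() (the series adds an overload with a common logger); the helper below is hypothetical, not the actual UUID_gen.hh code:

```cpp
#include <cstdint>
#include <fmt/format.h>
#include <seastar/util/log.hh>

// Hypothetical helper, for illustration only.
// before: assert(ts_in_range) - a bad user-supplied timestamp aborted Scylla.
// after: an out-of-range timestamp logs and throws (or aborts, if so
// configured) instead of always aborting:
void check_timeuuid_range(int64_t ts_ms, int64_t min_ms, int64_t max_ms,
                          seastar::logger& log) {
    if (ts_ms < min_ms || ts_ms > max_ms) {
        seastar::on_internal_error(log,
            fmt::format("timestamp {} out of range for timeuuid", ts_ms));
    }
}
```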
Fixes #17035
Closes scylladb/scylladb#17073
* github.com:scylladb/scylladb:
utils: replace assert() by on_internal_error()
utils: add on_internal_error with common logger
utils: add a timeuuid minimum, like we had maximum
test/cql-pytest: tests for "date" type
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
This makes scylla-manager misinterpret the data_file_directories somehow; see issue #17078.
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, so that load rebalancing can remain efficient. Tablets that are too large make migration inefficient, slowing down the balancer.
If the avg size grows beyond the upper bound (the split threshold), the balancer decides to split. A split spans all tablets of a table, due to the power-of-two constraint.
Likewise, if the avg size decreases below the lower bound (the merge threshold), a merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on.
A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, say a table is being split and the avg size drops below the target size (which is 50% of the split threshold and 100% of the merge threshold). That means that after the split, the avg size would drop below the merge threshold, causing a merge right after the split, which is wasteful, so it's better to just cancel the split.
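A minimal sketch of that decision rule; names and the exact threshold relations are illustrative, not the actual load-balancer code:

```cpp
#include <cstdint>

enum class resize_type { none, split, merge };

// Per the description above, the target size is 50% of the split threshold
// and 100% of the merge threshold.
resize_type decide(uint64_t avg_tablet_size, uint64_t target_size) {
    const uint64_t split_threshold = 2 * target_size;
    const uint64_t merge_threshold = target_size;
    if (avg_tablet_size > split_threshold) {
        return resize_type::split;  // halving tablets brings the avg back near target
    }
    if (avg_tablet_size < merge_threshold) {
        return resize_type::merge;  // doubling tablet size grows the avg back near target
    }
    return resize_type::none;
}

// Revocation: while a split is pending, if the avg drops below the target,
// completing the split would halve it below the merge threshold and trigger
// a wasteful merge right after, so the pending split is cancelled instead.
bool should_revoke_pending_split(uint64_t avg_tablet_size, uint64_t target_size) {
    return avg_tablet_size < target_size;
}
```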
Tablet metadata gains two new fields for managing this:
resize_type: the resize decision type; one of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).
A new RPC was implemented to pull stats from each table replica, so that the load balancer can calculate the avg tablet size and know the "split status" for a given table. The avg size is aggregated carefully, taking the RF of each DC (which might differ) into account.
When a table is done splitting its storage, it mirrors the resize_seq_number from the tablet metadata into its local state (in other words: "my split status is ready"). If a table is split-ready, the coordinator will see that the table's seq number is the same as the one in the tablet metadata. This helps distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). The status is also aggregated carefully, by taking the minimum among all replicas, so the coordinator will only update the topology when all replicas are ready - see the sketch below.
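A minimal sketch of that aggregation rule, with illustrative names (not the actual coordinator code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Each replica reports the resize_seq_number it mirrored into its local
// state after finishing its split work. Taking the minimum means the table
// is only considered split-ready once *all* replicas caught up with the
// latest decision; any smaller number is a stale (possibly revoked and
// re-emitted) decision.
bool split_ready(const std::vector<int64_t>& replica_seq_numbers,
                 int64_t current_decision_seq_number) {
    if (replica_seq_numbers.empty()) {
        return false;
    }
    auto min_seq = *std::min_element(replica_seq_numbers.begin(),
                                     replica_seq_numbers.end());
    return min_seq == current_decision_seq_number;
}
```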
When the load balancer emits a split decision, replicas listen for the need to split with a "split monitor", which is awakened once a table's replication metadata is updated and the need for a split is detected (i.e. the resize_type field is "split").
The split monitor will start the splitting of the table's compaction groups (using the mechanism introduced in 081f30d149). Once the splitting work is completed, the table updates its local state as having completed the split.
When the coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which means updating the tablet metadata to split each tablet in two. Once the table replicas have their replication metadata updated with the new tablet count, they can appropriately update their set of compaction groups (which were previously split in the preparation step).
Fixes #16536.
Closes scylladb/scylladb#16580
* github.com:scylladb/scylladb:
test/topology_experimental_raft: Add tablet split test
replica: Bypass reshape on boot with tablets temporarily
replica: Fix table::compaction_group_for_sstable() for tablet streaming
test/topology_experimental_raft: Disable load balancer in test fencing
replica: Remap compaction groups when tablet split is finalized
service: Split tablet map when split request is finalized
replica: Update table split status if completed split compaction work
storage_service: Implement split monitor
topology_coordinator: Generate updates for resize decisions made by balancer
load_balancer: Introduce metrics for resize decisions
db: Make target tablet size a live-updateable config option
load_balancer: Implement resize decisions
service: Wire table_resize_plan into migration_plan
service: Introduce table_resize_plan
tablet_mutation_builder: Add set_resize_decision()
topology_coordinator: Wire load stats into load balancer
storage_service: Allow tablet split and migration to happen concurrently
topology_coordinator: Periodically retrieve table_load_stats
locator: Introduce topology::get_datacenter_nodes()
storage_service: Implement table_load_stats RPC
replica: Expose table_load_stats in table
replica: Introduce storage_group::live_disk_space_used()
locator: Introduce table_load_stats
tablets: Add resize decision metadata to tablet metadata
locator: Introduce resize_decision
The persisted snapshot index may be 0 if the snapshot was created in an older version of Scylla, which means a snapshot transfer won't be triggered to a bootstrapping node. Commands present in the log may not cover all schema changes --- group 0 might have been created through the upgrade procedure, on a cluster with existing schema. So a deployment with an index=0 snapshot is broken and we need to fix it. We can use the new `raft::server::trigger_snapshot` API for that.
Also add a test.
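A minimal sketch of the new logic, assuming a `trigger_snapshot(abort_source*)` signature on `raft::server` (the exact signature and return type are assumptions); this is not the actual raft_group0 code:

```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include "raft/server.hh"

seastar::future<> maybe_fix_legacy_snapshot(raft::server& server,
                                            uint64_t persisted_snapshot_index,
                                            seastar::abort_source& as) {
    if (persisted_snapshot_index == 0) {
        // An index-0 snapshot predates the 5.4 "snapshot at index 1" logic;
        // force a fresh snapshot so bootstrapping nodes get a snapshot
        // transfer with the full schema.
        co_await server.trigger_snapshot(&as);
    }
}
```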
Fixes scylladb/scylladb#16683
Closes scylladb/scylladb#17072
* github.com:scylladb/scylladb:
test: add test for fixing a broken group 0 snapshot
raft_group0: trigger snapshot if existing snapshot index is 0
Our time-handling code in UUID_gen.hh is very fragile for very large timestamps, because the different types - such as the Cassandra "timestamp" type and timeuuid - use very different resolutions and ranges.
In issue #17035 we discovered a situation where a certain CQL "timestamp"-type value could cause an assertion failure and a crash in the create_time() function that creates a timeuuid - because that timestamp didn't fit the space we have in a timeuuid.
We already added a limit in the past, UUID_UNIXTIME_MAX, beyond which we refuse timestamps, to avoid these assertion failures. However, we missed the possibility of *negative* timestamps (which are allowed in CQL), and indeed a negative timestamp (or a timestamp which was "wrapped" to a negative value) is what caused issue #17035.
So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the most negative timestamp that we support to well below the area which causes problems - and adds tests that reproduce #17035 and verify that we didn't break anything else (e.g., negative timestamps are still allowed - just not extremely negative ones).
Fixes #17035.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add error handling to rebuild instead of retrying it until it succeeds.
* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
test: add test for rebuild failure
test: add expected_error to rebuild_node operation
topology_coordinator: Propagate rebuild failure to the initiator
This patch adds a few simple tests for the values of the "date" column type: how it can be initialized from strings or integers, and what those values mean.
Two of the tests reproduce issue #17066, where validation is missing
for values that don't fit in a 32-bit unsigned integer.
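A minimal sketch of the missing validation, assuming the standard CQL "date" encoding (an unsigned 32-bit day count where 2^31 represents 1970-01-01); names are illustrative, not Scylla's serialization code:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Only day counts in [0, 2^32 - 1] are representable by the "date" type;
// anything outside that range should be rejected at validation time.
uint32_t validate_cql_date(int64_t days) {
    if (days < 0 || days > std::numeric_limits<uint32_t>::max()) {
        throw std::out_of_range("value out of range for the CQL date type");
    }
    return static_cast<uint32_t>(days);
}
```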
Refs #17066
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In a cluster whose group 0 has a snapshot at index 0 (such a group 0 might be established in a 5.2 cluster, then preserved once it upgrades to 5.4 or later), no snapshot transfer will be triggered when a node is bootstrapped. This way the new node might not obtain the full schema, or might obtain an incorrect schema, as in scylladb/scylladb#16683.
Simulate this scenario in a test case using the RECOVERY mode and error injections. Check that the newly added logic, which creates a new snapshot when such a situation is detected, helps in this case.
The issues mentioned in the comment above are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
Waiting for CQL connections is not enough. For the queries to succeed,
nodes must see each other. We have to wait for this, otherwise the test
will be flaky.
Fixes #17029
Closes scylladb/scylladb#17040
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need to reshard is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.
Startup will fail like this:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
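A minimal sketch of such a check, with illustrative types (the real check_tablet_replica_shards() works on the actual tablet metadata types):

```cpp
#include <stdexcept>
#include <vector>

struct tablet_replica { int host_index; unsigned shard; };  // stand-in type

void check_tablet_replica_shards(const std::vector<std::vector<tablet_replica>>& tablets,
                                 int this_host, unsigned this_shard_count) {
    for (const auto& replicas : tablets) {
        for (const auto& r : replicas) {
            // Only tablets with a replica on this node are checked; startup
            // is refused only if at least one replica shard is out of range.
            if (r.host_index == this_host && r.shard >= this_shard_count) {
                throw std::runtime_error(
                    "Detected a tablet with invalid replica shard, reducing shard count "
                    "with tablet-enabled tables is not yet supported. Replace the node instead.");
            }
        }
    }
}
```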
Refs: #16739
Fixes: #16843
Closes scylladb/scylladb#17008
* github.com:scylladb/scylladb:
test/topology_experimental_raft: test_tablets.py: add test for resharding
test/pylib: manager[_client]: add update_cmdline()
main: refuse startup when tablet resharding is required
locator: tablets: add check_tablet_replica_shards()
`db::config` is a class that is used in many places across the code base; when it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases the `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is undesirable - it is better to keep `db::config` intact.
This PR:
- extends the public interface of the utils::directories class to provide the required directory paths to users
- removes 'db::config::setup_directories()' to avoid altering the fields of the configuration object
- replaces usages of the db::config object with the utils::directories object in places that need to obtain directory paths
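A minimal sketch of the resulting interface, with illustrative members (the real class covers more directories and derives its defaults from db::config at construction time):

```cpp
#include <filesystem>
#include <vector>

namespace utils {
class directories {
    std::vector<std::filesystem::path> _data_file_dirs;
    std::filesystem::path _commitlog_dir;
public:
    // Paths are resolved once at construction, so db::config stays untouched
    // and clients no longer need to include or depend on it for paths.
    const std::vector<std::filesystem::path>& get_data_file_dirs() const {
        return _data_file_dirs;
    }
    const std::filesystem::path& get_commitlog_dir() const {
        return _commitlog_dir;
    }
};
}
```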
Fixes: scylladb#5626
Closes scylladb/scylladb#16787
* github.com:scylladb/scylladb:
utils/directories: make utils::directories::set an internal type
db::config: keep dir paths unchanged
cql_transport/controller: use utils::directories to get paths of dirs
service/storage_proxy: use utils::directories to get paths of dirs
api/storage_service.cc: use utils::directories to get paths of dirs
tools/scylla-sstable.cc: use utils::directories to get paths
db/commitlog: do not use db::config to get dirs
Use utils::directories to get dirs paths in replica::database
Allow utils::directories to provide paths to dirs
Clean-up of utils::directories
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this PR is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Fixes scylladb/scylladb#16944
Closes scylladb/scylladb#17009
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: document the left_token_ring state
topology_coordinator: adjust reason string in left_token_ring handler
raft topology: make left_token_ring a transition state
topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
This allows the user of `raft::server` to cause it to create a snapshot and truncate the Raft log (leaving no trailing entries; in the future we may extend the API to specify the number of trailing entries to leave, if needed). In a later commit we'll add a REST endpoint to Scylla to trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft log with a snapshot at index 1 (ff386e7a44), but a problem remains with the existing deployments coming from 5.2: we need a way to trigger a snapshot in them (other than performing 1000 arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
The PR adds the API to `raft::server` and an HTTP endpoint that uses it. In a follow-up PR, we plan to modify the group 0 server startup logic to automatically call this API if it sees that no snapshot is present yet (to automatically fix the aforementioned 5.2 deployments once they upgrade).
Closes scylladb/scylladb#16816
* github.com:scylladb/scylladb:
raft: remove `empty()` from `fsm_output`
test: add test for manual triggering of Raft snapshots
api: add HTTP endpoint to trigger Raft snapshots
raft: server: add `trigger_snapshot` API
raft: server: track last persisted snapshot descriptor index
raft: server: framework for handling server requests
raft: server: inline `poll_fsm_output`
raft: server: fix indentation
raft: server: move `io_fiber`'s processing of `batch` to a separate function
raft: move `poll_output()` from `fsm` to `server`
raft: move `_sm_events` from `fsm` to `server`
raft: fsm: remove constructor used only in tests
raft: fsm: move trace message from `poll_output` to `has_output`
raft: fsm: extract `has_output()`
raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
raft: server: pass `*_aborted` to `set_exception` call
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in service/storage_proxy.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces the usage of db::config with usage of utils::directories to get directory paths in the replica::database class.
Moreover, it adjusts tests that require construction of replica::database - its constructor has been changed to accept a utils::directories object.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change extends utils::directories class in
the following way:
- adds new member variables that correspond to
fields from db::config that describe paths
of directories
- introduces a public interface to retrieve the
values of the new members
- allows construction of utils::directories
object based on db::config to setup internal
member variables related to paths to dirs
The new members of utils::directories are overridden when the provided values are empty. The way of setting the paths is taken from db::config.
To ensure that the new logic works correctly
`utils_directories_test` has been created.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Similar to the existing update_config(). Updates the command-line
arguments of the specified nodes, merging the new options into the
existing ones. Needs a restart to take effect.
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Regarding implementation, most of the changes are simple refactoring. The less obvious ones are:
- Before this patch, in `system_keyspace::left_topology_state`, we had to keep the ignored nodes' IDs for replace to ensure that the replacing node would have access to them after moving to the `left_token_ring` state, which happens when replace fails. We don't need this workaround anymore. When we enter the new `left_token_ring` transition state, the new node will still be in the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.
Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
before this change, we relied on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for `db::replay_position`,
and drop its operator<<.
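A sketch of what such a fmt v10 formatter looks like; the member names of db::replay_position below are stand-ins, not the real fields:

```cpp
#include <cstdint>
#include <string_view>
#include <fmt/format.h>

namespace db { struct replay_position { uint64_t id; uint32_t pos; }; }  // stand-in

template <>
struct fmt::formatter<db::replay_position> : fmt::formatter<std::string_view> {
    // parse() is inherited from the string_view formatter; only format()
    // is custom. Prints e.g. "{rp 42:1000}"; the real format may differ.
    auto format(const db::replay_position& rp, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{{rp {}:{}}}", rp.id, rp.pos);
    }
};
```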
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17014
This series makes a similar change to Alternator as was done recently to CQL:
1. If the "tablets" experimental feature is enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
2. In particular, all test/alternator tests will also automatically run with tablets enabled.
3. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to run without tablets.
4. An option is added to the test/alternator/run script to allow developers to manually run tests without tablets, if they wish to (this option will be useful in the short term, and can be removed later).
Fixes #16355
Closes scylladb/scylladb#16900
* github.com:scylladb/scylladb:
test/alternator: add "--vnodes" option to run script
alternator: use tablets by default, if available
test/alternator: run some tests without tablets
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.
When we start the hint managers, one of the things that happens is the creation of endpoint managers -- structures managed by db::hints::manager. Whether we create an instance of an endpoint manager depends on the value returned by host_filter::can_hint_for, which, in turn, may depend on the current state of locator::topology.
If locator::topology is incomplete, some endpoint managers may not be started even though they should be (because the target node IS part of the cluster and we SHOULD send hints to it if there are some).
A situation like that can happen because we start the hint managers too early. This commit aims to solve that problem: we only start the hint managers once we've gathered information about the other nodes in the cluster and created the locator::topology using it.
Hinted handoff is not negatively affected by these changes since, between the previous point of starting the hint managers and the current one, all of the mutations performed by service::storage_proxy target the local node, so no hints would need to be generated anyway.
Fixes scylladb/scylladb#11870
Closes scylladb/scylladb#16511
this RESTful API is a scylla-specific extension and is only used by scylla-nodetool. currently, the java-based nodetool does not use it at all, so mark it with "scylla_only".
one can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17001
we should allow users to run nodetool tests without `test.py`. but there is a good chance that the host could be reused by multiple tests or by multiple users, who could all be using port 12345. by randomizing the IP and port, they have a better chance of completing the test without running into a used-port problem.
Closes scylladb/scylladb#16996
* github.com:scylladb/scylladb:
test/nodetool: return a randomized address if not running with unshare
test/nodetool: return an address from loopback_network fixture
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
* all group0 operations are disabled (e.g. schema changes),
* all cluster-wide operations are disabled for this node (e.g. repair),
* other nodes see this node as dead,
* it cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.
The only way to make CQL queries is to use the maintenance socket. The node serves only local data.
To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.
The REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager
This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`
Fixes #5489.
Closes scylladb/scylladb#15346
* github.com:scylladb/scylladb:
test.py: add test for maintenance mode
test.py: generalize usage of cluster_con
test.py: when connecting to node in maintenance mode use maintenance socket
docs: add maintenance mode documentation
main: add maintenance mode
main: move some REST routes initialization before joining group0
message_service: add sanity check that rpc connections are not created in the maintenance mode
raft_group0_client: disable group0 operations in the maintenance mode
service/storage_service: add start_maintenance_mode() method
storage_service: add MAINTENANCE option to mode enum
service/maintenance_mode: add maintenance_mode_enabled bool class
service/maintenance_mode: move maintenance_socket_enabled definition to separate file
db/config: add maintenance mode flag
docs: add cqlsh usage to maintenance socket documentation
docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
we should allow users to run nodetool tests without `test.py`. but there is a good chance that the host could be reused by multiple tests or by multiple users, who could all be using port 12345. by randomizing the IP and port, they have a better chance of completing the test without running into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture
this change prepares for bringing back the randomized IP and port. in case users run this test without test.py, randomizing the IP and port gives them a better chance of completing the test without running into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is easier to reproduce after the changes in the load balancer to emit resize decisions, which in turn result in the topology version being incremented, and that might race with fencing tests that manipulate the topology version manually.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the ability of the load balancer to emit split or merge requests, cancel ongoing ones if they're no longer needed, and also finalize those that are ready for the topology changes.
That's all based on the average tablet size, collected by the coordinator from all nodes, and the split and merge thresholds.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The new metadata describes the ongoing resize operation (which can be merge, split, or none) that spans the tablets of a given table.
It's managed by group0, so nodes that are down will be able to see the decision when they come back up and see the changes to the metadata.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.
This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.
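A minimal sketch of the replay-time filter, with illustrative types (the real system.commitlog_cleanups records are keyed per table and token range):

```cpp
#include <compare>
#include <cstdint>
#include <map>

// Stand-in for db::replay_position: ordered by segment id, then offset.
struct replay_position {
    uint64_t id;
    uint32_t pos;
    auto operator<=>(const replay_position&) const = default;
};

// Latest cleanup position per (table, token-range) key, loaded from
// system.commitlog_cleanups before replay starts.
using cleanup_index = std::map<uint64_t, replay_position>;

// Drop a replayed mutation if a cleanup ran at or after the position it was
// written at; replaying it would resurrect data the cleanup already deleted.
bool skip_on_replay(const cleanup_index& cleanups, uint64_t key,
                    replay_position written_at) {
    auto it = cleanups.find(key);
    return it != cleanups.end() && written_at <= it->second;
}
```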
Fixes #16752
Closes scylladb/scylladb#16888
* github.com:scylladb/scylladb:
test: test_tablets: add a test for cleanup after migration
test: pylib: add ScyllaCluster.wipe_sstables
test: boost: add commitlog_cleanup_test
db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
replica: table: garbage-collect irrelevant system.commitlog_cleanups records
db: commitlog: add min_position()
replica: table: populate system.commitlog_cleanups on tablet cleanup
db: system_keyspace: add system.commitlog_cleanups
replica: table: refresh compound sstable set after tablet cleanup
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.
This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.
We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.
Fixes scylladb/scylladb#15804
Fixes scylladb/scylladb#16579
Fixes scylladb/scylladb#16642
Closes scylladb/scylladb#16797
* github.com:scylladb/scylladb:
test: test_topology_ops: remove failed mutations filter
raft topology: send barrier_and_drain to a decommissioning node
raft topology: ensure at most one transitioning node
before this change, we used a random address when launching the rest_api_mock server, but there was a chance that the randomly picked address conflicted with an already-used address on the host. the subprocess fails right away with a returncode of 1 upon this failure, but we just continued on and checked the readiness of the already-dead server. we have actually seen test failures caused by the EADDRINUSE failure: when we checked the readiness of the rest_api_mock by sending an HTTP request and reading the response, what we got was not a JSON-encoded response but a webpage, which was likely one returned by a minio server.
in this change, we
* set the "launcher" option of the nodetool test suite to "unshare", so that all its tests are launched in separate namespaces.
* do not use a random address for the mock server, as the network namespaces are separated.
Fixes #16542
Closes scylladb/scylladb#16773
* github.com:scylladb/scylladb:
test/nodetool: run nodetool tests using "unshare"
test.py: add "launcher" option support
The test checks that in maintenance mode, server A is not available to other nodes or to clients. It is possible to connect to server A through the maintenance socket and perform local CQL operations.
A node in maintenance mode doesn't have the regular CQL port open; to connect to the node, the scylla cluster needs to use the node's maintenance socket.
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't start or apply group0 operations. Users can still try to start one, e.g. a schema change, but the node won't allow it.
Init _upgrade_state with recovery in maintenance mode.
Throw an error if a group0 operation is started in maintenance mode.
before this change, we always cast the wait duration to milliseconds, even if it could be using a higher resolution. `std::chrono::steady_clock` actually uses nanoseconds for its duration, so if we inject a deadline using `steady_clock`, we could be woken earlier due to the narrowing of the duration type caused by the duration_cast.
in this change, we just use the duration as it is. this should allow the caller to use the resolution provided by Seastar without losing precision. the tests are updated to print the time duration instead of the count, to provide information with a higher resolution.
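a tiny standalone illustration of the truncation (not Scylla code):

```cpp
#include <chrono>
#include <fmt/chrono.h>
#include <fmt/core.h>

using namespace std::chrono;
using namespace std::chrono_literals;

int main() {
    // steady_clock durations are nanosecond-based; casting to milliseconds
    // truncates, so a sleeper could be woken earlier than requested.
    auto requested = 1500us;
    auto truncated = duration_cast<milliseconds>(requested);
    fmt::print("requested {} but slept {}\n", requested, truncated);  // 1500µs vs 1ms
}
```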
Fixes #15902
Closes scylladb/scylladb#16264
* github.com:scylladb/scylladb:
tests: utils: error injection: print time duration instead of count
error_injection: do not cast to milliseconds when injecting timeout
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
The tablet scheduler is responsible for scheduling the tablet rebuild transition, which changes the replica set. The infrastructure for handling decommission in the tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load limits. Rebuild transitions are recognized by the load calculation as affecting all tablet replicas.
A new kind of tablet transition called "rebuild" is introduced, which adds a new tablet replica and rebuilds it from the existing replicas. Other than that, the transition goes through the same stages as a regular migration, to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes https://github.com/scylladb/scylladb/issues/16690.
Closes scylladb/scylladb#16894
* github.com:scylladb/scylladb:
tests: tablets: Add tests for removenode and replace
tablets: Add support for removenode and replace handling
topology_coordinator: tablets: Do not fail in a tight loop
topology_coordinator: tablets: Avoid warnings about ignored failed future
storage_service, topology: Track excluded state in locator::topology
raft topology: Introduce param-less topology::get_excluded_nodes()
raft topology: Move get_excluded_nodes() to topology
tablets: load_balancer: Generalize load tracking
tablets: Introduce get_migration_streaming_info() which works on migration request
tablets: Move migration_to_transition_info() to tablets.hh
tablets: Extract get_new_replicas() which works on migration request
tablets: Move tablet_migration_info to tablets.hh
tablets: Store transition kind per tablet
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.
This bug has been fixed in the previous commit, so we remove the
filter.