scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Nadav Har'El	341af86167	test/cql-pytest: reproducer for GROUP BY regression This patch adds a simple reproducer for a regression in Scylla 5.4 caused by commit `432cb02`, breaking LIMIT support in GROUP BY. Refs #17237 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17275	2024-02-12 13:09:52 +02:00
Kamil Braun	7d73c40125	Merge 'test.py: tablets: Fix flakiness of test_tablet_missing_data_repair' from Tomasz Grabiec Reimplements stop/start sequence using rolling_restart() which is safe with regards to UP status propagation and not prone to sudden connection drop which may cause later CQL queries to time out. It also ensures that CQL is up on all the remaining nodes when the with_down callback is executed. The test was observed to fail in CI like this: ``` cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.157.135.26:9042 datacenter1>: ConnectionException('Pool for 127.157.135.26:9042 is shutdown')}) ... @pytest.mark.repair @pytest.mark.asyncio async def test_tablet_missing_data_repair(manager: ManagerClient): ... for idx in range(0,3): s = servers[idx].server_id await manager.server_stop_gracefully(s, timeout=120) > await check() ``` Hopefully: Fixes #17107 Closes scylladb/scylladb#17252 * github.com:scylladb/scylladb: test: py: tablets: Fix flakiness of test_tablet_missing_data_repair test: pylib: manager_client: Wait for driver to catch up in rolling_restart() test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down	2024-02-12 11:52:09 +01:00
Botond Dénes	f068d1a6fa	query: do not kill unpaged queries when they reach the tombstone-limit The reason we introduced the tombstone-limit (query_tombstone_page_limit), was to allow paged queries to return incomplete/empty pages in the face of large tombstone spans. This works by cutting the page after the tombstone-limit amount of tombstones were processed. If the read is unpaged, it is killed instead. This was a mistake. First, it doesn't really make sense, the reason we introduced the tombstone limit, was to allow paged queries to process large tombstone-spans without timing out. It does not help unpaged queries. Furthermore, the tombstone-limit can kill internal queries done on behalf of user queries, because all our internal queries are unpaged. This can cause denial of service. So in this patch we disable the tombstone-limit for unpaged queries altogether, they are allowed to continue even after having processed the configured limit of tombstones. Fixes: #17241 Closes scylladb/scylladb#17242	2024-02-12 12:34:04 +02:00
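The rule this commit describes (cut a paged read at the tombstone limit, let an unpaged read continue) can be sketched as a toy model. This is illustrative Python, not ScyllaDB's C++ code: `read_page`, the `("row", ...)` / `("tombstone", ...)` fragment tuples and the return shape are hypothetical, and only the idea of a tombstone limit comes from the commit message.

```python
def read_page(fragments, tombstone_limit, paged):
    """Consume fragments until the input ends or a paged read hits the limit.

    Returns (rows, cut_short). Hypothetical model, not ScyllaDB code.
    """
    rows = []
    tombstones_seen = 0
    for kind, value in fragments:
        if kind == "tombstone":
            tombstones_seen += 1
            # Only paged reads are cut at the tombstone limit; the page may
            # come back incomplete or even empty.
            if paged and tombstones_seen >= tombstone_limit:
                return rows, True
        else:
            rows.append(value)
    # Unpaged reads always run to the end of the input.
    return rows, False


fragments = [("tombstone", None)] * 3 + [("row", "a"), ("row", "b")]
paged_result = read_page(fragments, tombstone_limit=2, paged=True)
unpaged_result = read_page(fragments, tombstone_limit=2, paged=False)
```

In this model the paged read returns an empty, cut-short page, while the unpaged read processes all three tombstones and still returns both rows.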
Patryk Wrobel	9fccd968d3	test_tablets.py: implement test_tablet_count_metric_per_shard This change introduces a new test that verifies the functionality related to tablet_count metric. It checks if tablet_count metric is correctly reported and updated when new tables are created, when tables are dropped and when `move_tablet` is executed. Refs: scylladb#16131 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17165	2024-02-12 11:49:38 +02:00
Kefu Chai	54995fcac0	test/manual: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17255	2024-02-12 11:49:38 +02:00
Patryk Jędrzejczak	e64162e8f6	test: add test_writes_to_previous_cdc_generations In one of the previous patches, we allowed writing to the previous CDC generations for `generation_leeway`. Now, we add tests for this change.	2024-02-12 10:14:00 +01:00
Nadav Har'El	13e16475fa	cql-pytest: fix skipping of tests on Cassandra or old Scylla Recently we added a trick to allow running cql-pytests either with or without tablets. A single fixture test_keyspace uses two separate fixtures test_keyspace_tablets or test_keyspace_vnodes, as requested. The problem is that even if test_keyspace doesn't use its test_keyspace_tablets fixture (it doesn't, if the test isn't parameterized to ask for tablets explicitly), it's still a fixture, and it causes the test to be skipped. This causes every test to be skipped when running on Cassandra or old Scylla which doesn't support tablets. The fix is simple - the internal fixture test_keyspace_tablets should yield None instead of skipping. It is the caller, test_keyspace, which now skips the test if tablets are requested but test_keyspace_tablets is None. Fixes #17266 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17267	2024-02-11 21:03:25 +02:00
Kefu Chai	f990ea9678	tools/scylla-nodetool: implement describecluster Refs #15588 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17240	2024-02-11 20:21:07 +02:00
Benny Halevy	2ed29e31db	gms: inet_address: make constructors explicit In particular, `inet_address(const sstring& addr)` is dangerous, since a function like `topology::get_datacenter(inet_address ep)` might accidentally convert a `sstring` argument into an `inet_address` (which would most likely throw an obscure std::invalid_argument if the datacenter name does not look like an inet_address). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#17260	2024-02-11 15:44:13 +02:00
Tomasz Grabiec	1eedc85990	test: py: tablets: Fix flakiness of test_tablet_missing_data_repair Reimplement stop/start sequence using rolling_restart() which is safe with regards to UP status propagation and not prone to sudden connection drop which may cause later CQL queries to time out. It also ensures that CQL is up on all the remaining nodes when the with_down callback is executed. Hopefully: Fixes #17107	2024-02-09 20:37:06 +01:00
Tomasz Grabiec	27ed2d94fc	test: pylib: manager_client: Wait for driver to catch up in rolling_restart() For sanity of the developers who want to execute CQL queries after rolling restarts.	2024-02-09 20:35:41 +01:00
Tomasz Grabiec	3ce4ec796a	test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down	2024-02-09 20:35:41 +01:00
Petr Gusev	4554653ad9	storage_proxy: add a test for stop_remote This patch adds a reproducer test for an issue #16382. See scylladb/seastar#2044 for details of the problem. The test is enabled only in dev mode since it requires error injection mechanism. The patch adds a new injection into storage_proxy::handle_read to simulate the problem scenario - the node is shutting down and there are some unfinished pending replica requests. Closes scylladb/scylladb#16776	2024-02-09 17:23:13 +01:00
Raphael S. Carvalho	daa82f406c	test_tablets: Enable table debug log in split test If the test fails, it's helpful to see how split completion was handled. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17236	2024-02-09 14:38:24 +02:00
Kamil Braun	e9e24f47ec	Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski This PR implements a procedure that upgrades existing clusters to use raft-based topology operations. The procedure does not start automatically, it must be triggered manually by the administrator after making sure that no topology operations are currently running. Upgrade is triggered by sending `POST /storage_service/raft_topology/upgrade` request. This causes the topology coordinator to start who drives the rest of the process: it builds the `system.topology` state based on information observed in gossip and tells all nodes to switch to raft mode. Then, topology coordinator runs normally. Upgrade progress is tracked in a new static column `upgrade_state` in `system.topology`. The procedure also serves as an extension to the current recovery procedure on raft. The current recovery procedure requires restarting nodes in a special mode which disables raft, perform `nodetool removenode` on the dead nodes, clean up some state on the nodes and restart them so that they automatically rebuild the group 0. Raft topology fits into existing procedure by falling back to legacy topology operations after disabling raft. After rebuilding the group 0, upgrade needs to be triggered again. Because upgrade is manual and it might not be convenient for administrators to run it right after upgrading the cluster, we allow the cluster to operate in legacy topology operations mode until upgrade, which includes allowing new nodes to join. In order to allow it, nodes now ask the cluster about the mode they should use to join before proceeding by using a new `JOIN_NODE_QUERY` RPC. The procedure is explained in more detail in `topology-over-raft.md`. 
Fixes: https://github.com/scylladb/scylladb/issues/15008 Closes scylladb/scylladb#17077 * github.com:scylladb/scylladb: test/topology_custom: upgrade/recovery tests for topology on raft cdc/generation_service: in legacy mode, fall back to raft tables system_keyspace: add read_cdc_generation_opt cdc/generation_service: turn off gossip notifications in raft topo mode cql_test_env: move raft_topology_change_enabled var earlier group0_state_machine: pull snapshot after raft topology feature enabled storage_service: disable persistent feature enabler on upgrade storage_service: replicate raft features to system.peers storage_service: gossip tokens and cdc generation in raft topology mode API: add api for triggering and monitoring topology-on-raft upgrade storage_service: infer which topology operations to use on startup storage_service: set the topology kind value based on group 0 state raft_group0: expose link to the upgrade doc in the header feature_service: fall back to checking legacy features on startup storage_service: add fiber for tracking the topology upgrade progress gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES topology_coordinator: implement core upgrade logic topology_coordinator: extract top-level error handling logic storage_service: initialize discovery leader's state earlier topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data topology_state_machine: introduce upgrade_state storage_service: disallow topology ops when upgrade is in progress raft_group0_client: add in_recovery method storage_service: introduce join_node_query verb raft_group0: make discover_group0 public raft_group0: filter current node's IP in discover_group0 raft_group0: remove my_id arg from discover_group0 storage_service: make _raft_topology_change_enabled more advanced docs: document raft topology upgrade and recovery	2024-02-09 11:54:53 +01:00
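The manual trigger mentioned in this merge, a `POST /storage_service/raft_topology/upgrade` request, could be prepared from Python roughly as below. This is a sketch under assumptions: the base address `http://127.0.0.1:10000` and the helper name `build_upgrade_request` are not from the merge description, and the request is only built, not sent.

```python
import urllib.request


def build_upgrade_request(base_url="http://127.0.0.1:10000"):
    """Build (but do not send) the POST that triggers the upgrade.

    base_url is an assumed REST API address; adjust it for a real node.
    """
    return urllib.request.Request(
        base_url + "/storage_service/raft_topology/upgrade",
        data=b"",       # empty body; the endpoint path carries the intent
        method="POST",
    )


request = build_upgrade_request()
# urllib.request.urlopen(request) would actually send it; that call is left
# out so the sketch has no side effects.
```

As the description stresses, the request should only be sent after making sure that no topology operations are running.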
Kefu Chai	c1c96bbc16	api/storage_service: drop /storage_service/describe_ring/ API per its description, "`/storage_service/describe_ring/`" returns the token ranges of an arbitrary keyspace. actually, it returns the first keyspace which is of non-local-vnode-based-strategy. this API is not used by nodetool, neither is it exercised in dtest. scylla-manager has a wrapper for this API though, but that wrapper is not used anywhere. in this change, this API is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17197	2024-02-09 12:49:21 +02:00
Piotr Dulikowski	4d4976feb0	test/topology_custom: upgrade/recovery tests for topology on raft Adds three tests for the new upgrade procedure: - test_topology_upgrade - upgrades a cluster operating in legacy mode to use raft topology operations, - test_topology_recovery_basic - performs recovery on a three-node cluster, no node removal is done, - test_topology_majority_loss - simulates a majority loss scenario, i.e. removed two nodes out of three, performs recovery to rebuild the raft topology state and re-add two nodes back.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	d04b3338ce	cdc/generation_service: in legacy mode, fall back to raft tables When a node enters recovery after being in raft topology mode, topology operations switch back to legacy mode. We want CDC to keep working when that happens, so we need for the legacy code to be able to access generations created back in raft mode - so that the node can still properly serve writes to CDC log tables. In order to make this possible, modify the legacy logic to also look for a cdc generation in raft tables, if it is not found in legacy tables.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	77a8f5e3d6	cdc/generation_service: turn off gossip notifications in raft topo mode In raft topology mode CDC information is propagated through group 0. Prevent the generation service from reacting to gossiper notifications after we made the switch to raft mode.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	29e286ee03	cql_test_env: move raft_topology_change_enabled var earlier We will need to pass it to cdc::generation_service::config in the next commit, so move it a bit earlier.	2024-02-08 19:12:28 +01:00
Botond Dénes	8fcb4ed707	tools/scylla-nodetool: implement describering Also implementing tablet support, which basically just means that a new table parameter is also accepted and forwarded to the API, in addition to the existing keyspace one.	2024-02-08 09:20:25 -05:00
Botond Dénes	2df2733ed1	tools/scylla-nodetool.cc: handle API request failures gracefully Currently, error handling is done via catching http::unexpected_status_error and re-throwing an std::runtime_error. Turns out this no longer works, because this error will only be thrown by the http client, if the request had an expected reply code set. The scylla_rest_client doesn't set an expected reply code, so this exception was never thrown for some time now. Furthermore, even when the above worked, it was not too user-friendly as the error message only included the reply-code, but not the reply itself. So in this patch this is fixed: * The handling of http::unexpected_status_error is removed, we don't want to use this mechanism, because it yields very terse error messages. * Instead, the status code of the request is checked explicitly and all cases where it is not 200 are handled. * A new api_request_failed exception is added, which is thrown for all non-200 statuses with the extracted error message from the server (if any). * This exception is caught by main, the error message is printed and scylla-nodetool returns with a new distinct error-code: 4. With this, all cases where the request fails on ScyllaDB are handled and we shouldn't hit cases where a nodetool command fails with some obscure JSON parsing error, because the error reply has different JSON schema than the expected happy-path reply.	2024-02-08 09:20:25 -05:00
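The control flow listed in this commit (check the status explicitly, raise a dedicated exception for any non-200 reply, catch it in `main` and exit with code 4) is sketched below in Python. The real change is in C++; `ApiRequestFailed`, `check_reply`, and the assumption that the error body is JSON with a `message` field are hypothetical stand-ins.

```python
import json
import sys


class ApiRequestFailed(Exception):
    """Raised for any non-200 reply; stands in for api_request_failed."""


def check_reply(status, body):
    """Return the parsed reply on 200, otherwise raise with the server text."""
    if status == 200:
        return json.loads(body)
    try:
        # Assumption: error replies are JSON objects with a "message" field.
        message = json.loads(body).get("message", body)
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: fall back to the raw body.
        message = body
    raise ApiRequestFailed(f"status {status}: {message}")


def main(status, body):
    try:
        check_reply(status, body)
    except ApiRequestFailed as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 4  # the distinct exit code named in the commit message
    return 0
```

Because the failure is detected from the status code first, the happy-path JSON parser never sees an error reply with a different schema.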
Botond Dénes	d4f7f23b98	test/nodetool: util.py: add check_nodetool_fails_with_all() Similar to the existing check_nodetool_fails_with() but checks that all error messages from expected_errors are contained in stderr. While at it, use list as the typing hint, instead of typing.List.	2024-02-08 09:20:25 -05:00
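The check this commit describes, that every message in `expected_errors` is contained in stderr, amounts to the following. This is a minimal sketch and not the helper from `util.py`: the real `check_nodetool_fails_with_all()` signature is not shown in the log, so `missing_errors` and `fails_with_all` are hypothetical names operating on an already captured stderr string.

```python
def missing_errors(stderr: str, expected_errors: list[str]) -> list[str]:
    """Return the expected messages that do not appear in stderr."""
    return [err for err in expected_errors if err not in stderr]


def fails_with_all(stderr: str, expected_errors: list[str]) -> bool:
    """True only when every expected message is present in stderr."""
    return not missing_errors(stderr, expected_errors)


captured = "error: keyspace ks1 not found\nerror: request failed with status 404"
all_present = fails_with_all(captured, ["ks1 not found", "status 404"])
```

Returning the list of missing messages separately makes a failing assertion easier to read than a bare boolean.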
Kurashkin Nikita	7ce9a3e9e5	cql: add limits for integer values when creating date type Added a simple check that prevents entering int values that lead to overflow when creating a date type. Fixes #17066 Closes scylladb/scylladb#17102	2024-02-08 00:08:01 +02:00
Michał Chojnowski	f5e3a728e4	row_cache_test: test cache consistency during memtable-to-cache merge A rather minimal reproducer for #16759. Not extensive.	2024-02-07 18:31:36 +01:00
Michał Chojnowski	bed20a2e37	row_cache: use preemption_source in update() To facilitate testing the state of cache after the update is preempted at various points, pass a preemption_source& to update() instead of calling the reactor directly. In release builds, the calls to preemption_source methods should compile to the same direct reactor calls as today. Only in dev mode they should add an extra branch. (However, the `preemption_source&` argument has to be shoveled in any mode).	2024-02-07 18:31:36 +01:00
Botond Dénes	35da9551fb	Merge 'storage_service: Add describe_ring support for tablet table' from Asias He The table query param is added to get the describe_ring result for a given table. Both vnode table and tablet table can use this table param, so it is easier for users to use. If the table param is not provided by the user and the keyspace contains a tablet table, the request will be rejected. E.g., curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles" curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1" Refs #16509 Closes scylladb/scylladb#17118 * github.com:scylladb/scylladb: tablets: Convert to use the new version of for_each_tablet storage_service: Add describe_ring support for tablet table storage_service: Mark host2ip as const tablets: Add for_each_tablet_gently	2024-02-07 10:41:36 +02:00
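The two curl examples in this merge follow one URL pattern, which a small helper can reproduce. This is a sketch, assuming the same `127.0.0.1:10000` address used in the examples; `describe_ring_url` is a hypothetical name and the helper only builds the URL.

```python
from urllib.parse import quote, urlencode


def describe_ring_url(keyspace, table=None, base_url="http://127.0.0.1:10000"):
    """Build the describe_ring URL, adding ?table=... only when requested."""
    url = f"{base_url}/storage_service/describe_ring/{quote(keyspace, safe='')}"
    if table is not None:
        # Per the merge description, a keyspace containing a tablet table
        # needs this parameter or the request is rejected.
        url += "?" + urlencode({"table": table})
    return url


roles_url = describe_ring_url("system_auth", table="roles")
standard1_url = describe_ring_url("ks1", table="standard1")
```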
Raphael S. Carvalho	41a5c9eaec	test: Reduce mem footprint of test_token_group_based_splitting_mutation_writer Reduces footprint from hundreds of MB to a very few MB. Issue could be reproduced with: ./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131 Fixes #17076. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17187	2024-02-07 09:21:24 +02:00
Tomasz Grabiec	032c1a3d04	Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov When creating a keyspace, scylla allows setting an RF value larger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted thus catching up with RF. With tablets, it's not the case as the replica set remains unchanged. With tablets, this is a good opportunity not to mimic the vnodes behavior and instead to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace RF can be any, but when a new table is created the topology should meet RF requirements. If not met, the user can bootstrap new nodes or ALTER KEYSPACE. closes: #16529 Closes scylladb/scylladb#17079 * github.com:scylladb/scylladb: tablets: Make sure topology has enough endpoints for RF cql-pytest: Disable tablets when RF > nodes-in-DC test: Remove test that configures RF larger than the number of nodes keyspace_metadata: Include tablets property in DESCRIBE	2024-02-06 22:38:11 +01:00
Kamil Braun	c0c291b985	Merge 'raft topology: harden IP related tests' from Petr Gusev In this PR we add the tests for two scenarios, related to the use of IPs in raft topology. * When the replaced node transitions to the `LEFT` state we used to remove the IP of such node from gossiper. If we replace with same IP, this caused the IP of the new node to be removed from gossiper. This problem was fixed by #16820, this PR adds a regression test for it. * When a node is restarted after decommissioning some other node, the restarting node tries to apply the raft log, this log contains a record about the decommissioned node, and we got stuck trying to resolve its IP. This was fixed by #16639 - we excluded IPs from the RAFt log application code and moved it entirely to host_id-s. This PR adds a regression test for this case. Closes scylladb/scylladb#15967 Closes scylladb/scylladb#14803 Closes scylladb/scylladb#17180 * github.com:scylladb/scylladb: test_topology_ops: check node restart after decommission test_replace_reuse_ip: check other servers see the IP	2024-02-06 14:28:06 +01:00
Nadav Har'El	14315fcbc3	mv: fix missing view deletions in some cases of range tombstones For efficiency, if a base-table update generates many view updates that go to the same partition, they are collected as one mutation. If this mutation grows too big it can lead to memory exhaustion, so since commit `7d214800d0` we split the output mutation to mutations no longer than 100 rows (max_rows_for_view_updates) each. This patch fixes a bug where this split was done incorrectly when the update involved range tombstones, a bug which was discovered by a user in a real use case (#17117). Range tombstones are read in two parts, a beginning and an end, and the code could split the processing between these two parts, with the result that some of the range tombstones in the update could be missed - and the view could miss some deletions that happened in the base table. This patch fixes the code in two places to avoid breaking up the processing between range tombstones: 1. The counter "_op_count" that decides where to break the output mutation should only be incremented when adding rows to this output mutation. The existing code strangely incremented it on every read (!?) which resulted in the counter being incremented on every input fragment, and in particular could reach the limit 100 between two range tombstone pieces. 2. Moreover, the length of output was checked in the wrong place... The existing code could get to 100 rows, not check at that point, read the next input - half a range tombstone - and only then check that we reached 100 rows and stop. The fix is to calculate the number of rows in the right place - exactly when it's needed, not before the step. The first change needs more justification: The old code, that incremented _op_count on every input fragment and not just output fragments did not fit the stated goal of its introduction - to avoid large allocations. In one test it resulted in breaking up the output mutation to chunks of 25 rows instead of the intended 100 rows. 
But, maybe there was another goal, to stop the iteration after 100 input rows and avoid the possibility of stalls if there are no output rows? It turns out the answer is no - we don't need this _op_count increment to avoid stalls: The function build_some() uses `co_await on_results()` to run one step of processing one input fragment - and `co_await` always checks for preemption. I verified that indeed no stalls happen by using the existing test test_long_skipped_view_update_delete_with_timestamp. It generates a very long base update where all the view updates go to the same partition, but all but the last few updates don't generate any view updates. I confirmed that the fixed code loops over all these input rows without increasing _op_count and without generating any view update yet, but it does NOT stall. This patch also includes two tests reproducing this bug and confirming it's fixed, and also two additional tests for breaking up long deletions that I wanted to make sure don't fail after this patch (they don't). By the way, this fix would have also fixed issue #12297 - which we fixed a year ago in a different way. That issue happened when the code went through 100 input rows without generating any output rows, and incorrectly concluded that there's no view update to send. With this fix, the code no longer stops generating the view update just because it saw 100 input rows - it would have waited until it generated 100 output rows in the view update (or the input is really done). Fixes #17117 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17164	2024-02-06 14:57:33 +02:00
Kefu Chai	97587a2ea4	test/boost: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17139	2024-02-06 13:22:16 +02:00
Botond Dénes	ce3233112e	Merge 'configure.py: add -Wextra to cflags' from Kefu Chai also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are genuine issues. but before that, we just disable them. the goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warnings enabled by `-Wextra`, for instance, `-Wsign-compare` is enabled now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra. Closes scylladb/scylladb#17131 * github.com:scylladb/scylladb: configure.py: add -Wextra to cflags test/tablets: do not compare signed and unsigned	2024-02-06 12:57:32 +02:00
Petr Gusev	646ca9515e	test_topology_ops: check node restart after decommission There used to be a problem with restarting a node after decommissioning some other node - the restarting node tries to apply the raft log, this log contains a record about the decommissioned node, and we got stuck trying to resolve its IP. This was fixed in #16639 - we excluded IPs from the RAFt log application code and moved it entirely to host_id-s. In this commit we add a regression test for this case. We move the decommission_node call before server_stop/server_start. We need to add one more server to retain majority when the node is decommissioned, otherwise the topology coordinator won't migrate from the stopped node before replacing it, and we'll get an error. closes #14803	2024-02-06 13:29:42 +04:00
Petr Gusev	aeed5c5fe3	test_replace_reuse_ip: check other servers see the IP The replaced node transitions to LEFT state, and we used to remove the IPs of such nodes from gossiper. If we replace with same IP, this caused the IP of the new node to be removed from gossiper. This problem was fixed by #16820, this commit adds a regression test for it. closes #15967	2024-02-06 13:28:04 +04:00
Kamil Braun	968d1e3e78	Merge 'raft topology: make rollback_to_normal a transition state' from Patryk Jędrzejczak After changing `left_token_ring` from a node state to a transition state in scylladb/scylladb#17009, we do the same for `rollback_to_normal`. `rollback_to_normal` was created as a node state because `left_token_ring` was a node state. This change will allow us to distinguish a failed removenode from a failed decommission in the `rollback_to_normal` handler. Currently, we use the same logic for both of them, so it's not required. However, this might change, as it has happened with the decommission and the failed bootstrap/replace in the `left_token_ring` state (scylladb/scylladb#16797). We are making this change now because it would be much harder after branching. Fixes scylladb/scylladb#17032 Closes scylladb/scylladb#17136 * github.com:scylladb/scylladb: docs: dev: topology-over-raft: align indentation docs: dev: topology-over-raft: document the rollback_to_normal state topology_coordinator: improve logs in rollback_to_normal handler raft topology: make rollback_to_normal a transition state	2024-02-05 16:30:20 +01:00
Nadav Har'El	7888b23e9e	Merge 'test/cql-pytest: re-enable disabled tests' from Botond Dénes In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which are failing with tablets enabled, we used a new fixture, `xfail_tablets` to mark these as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no-one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests, by parametrizing each one of them to run with both vnodes and tablets, and marking only the tablet variant as xfail. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will run them with both vnodes and tablets, because these tests apparently do care which replication method is used. Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled tests are re-enabled and no coverage is lost. Closes scylladb/scylladb#16945 * github.com:scylladb/scylladb: test/cql-pytest: conftest.py: remove xfail_tablets fixture test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests test/cql-pytest: test_describe.py: re-enable disabled tests test/cql-pytest: test_cdc.py: re-enable disabled tests test/cql-pytest: add parameter support to test_keyspace	2024-02-05 14:12:57 +02:00
Asias He	904bafd069	tablets: Convert to use the new version of for_each_tablet It is gentler than the old one.	2024-02-05 18:45:40 +08:00
Pavel Emelyanov	45dbe38658	tablets: Make sure topology has enough endpoints for RF When creating a keyspace, scylla allows setting an RF value larger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted thus catching up with RF. With tablets, it's not the case as the replica set remains unchanged. With tablets, this is a good opportunity not to mimic the vnodes behavior and instead to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace RF can be any, but when a new table is created the topology should meet RF requirements. If not met, the user can bootstrap new nodes or ALTER KEYSPACE. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-05 12:50:04 +03:00
Pavel Emelyanov	8471d88576	cql-pytest: Disable tablets when RF > nodes-in-DC All the cql-pytest-s run against a single scylla node, but the new_random_keyspace() helper may request RF in the range of 1 through 6, so tablets need to be explicitly disabled when the RF is too large. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-05 12:50:04 +03:00
Pavel Emelyanov	3b9ca29411	test: Remove test that configures RF larger than the number of nodes This is going to be disabled soon Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-05 12:50:03 +03:00
Pavel Emelyanov	8910d37994	keyspace_metadata: Include tablets property in DESCRIBE When tablets are enabled and a keyspace being described has them explicitly disabled or non-automatic initial value of zero, include this into the returned describe statement too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-05 12:49:20 +03:00
Avi Kivity	784c2f8ad2	Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing. Closes scylladb/scylladb#17130 * github.com:scylladb/scylladb: treewide: replace seastar::future::get0() with seastar::future::get() sstable: capture return value of get0() using auto utils: result_loop: define result_type with decayed type [avi: add another one that snuck in while this was cooking]	2024-02-04 15:23:33 +02:00
Patryk Jędrzejczak	25b90f5554	raft topology: make rollback_to_normal a transition state After changing `left_token_ring` from a node state to a transition state in scylladb/scylladb#17009, we do the same for `rollback_to_normal`. `rollback_to_normal` was created as a node state because `left_token_ring` was a node state. This change will allow us to distinguish a failed removenode from a failed decommission in the `rollback_to_normal` handler. Currently, we use the same logic for both of them, so it's not required. However, this might change, as it has happened with the decommission and the failed bootstrap/replace in the `left_token_ring` state (scylladb/scylladb#16797). We are making this change now because it would be much harder after branching. The change also simplifies the code in `topology_coordinator:rollback_current_topology_op`. Moving the `rollback_to_normal` handler from `handle_node_transition` to `handle_topology_transition` created a large diff. There is only one change - adding `auto node = get_node_to_work_on(std::move(guard));`.	2024-02-02 16:55:20 +01:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00
Kefu Chai	aea6cd0b2d	test/tablets: do not compare signed and unsigned this change should silence following warning: ``` test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare] 19:47:04 for (int i = 0; i < smp::count * 20; i++) { 19:47:04 ~ ^ ~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-02 20:49:21 +08:00
Botond Dénes	dc8e13baed	Merge 'Move some tablets tests from topology_custom to cql-pytest' from Pavel Emelyanov The latter suite is now tablets-aware and tablets cases from the former one can happily work with single shared scylla instance Closes scylladb/scylladb#17101 * github.com:scylladb/scylladb: test/topology_custom: Remove test_tablets.py test/topology: Move test_tablet_change_initial_tablets test/topology: Move test_tablet_explicit_disabling test/topology: Move test_tablet_default_initialization test/topology: Move test_tablet_change_replication_strategy test/topology: Move test_tablet_change_replication_vnode_to_tablets cql-pytest: Add skip_without_tablets fixture	2024-02-01 16:28:43 +02:00
Kamil Braun	c911bf1a33	test_raft_snapshot_request: fix flakiness (again) At the end of the test, we wait until a restarted node receives a snapshot from the leader, and then verify that the log has been truncated. To check the snapshot, the test used the `system.raft_snapshots` table, while the log is stored in `system.raft`. Unfortunately, the two tables are not updated atomically when Raft persists a snapshot (scylladb/scylladb#9603). We first update `system.raft_snapshots`, then `system.raft` (see `raft_sys_table_storage::store_snapshot_descriptor`). So after the wait finishes, there's no guarantee the log has been truncated yet -- there's a race between the test's last check and Scylla doing that last delete. But we can check the snapshot using `system.raft` instead of `system.raft_snapshots`, as `system.raft` has the latest ID. And since `1640f83fdc`, storing that ID and truncating the log in `system.raft` happens atomically. Closes scylladb/scylladb#17106	2024-02-01 16:06:12 +02:00
Patryk Wrobel	25324bbe50	cql_test_env.cc: remove dead code This change removes empty anonymous namespace that is a dead code. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17099	2024-02-01 13:17:48 +02:00
Pavel Emelyanov	64cb3a6496	test/topology_custom: Remove test_tablets.py It's now empty, all test cases had been moved to cql-pytest Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-01 13:59:51 +03:00


11801 Commits