scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	0693091aff	test: test_restart_leaving_replica_during_cleanup: reconnect driver after restart The test can currently fail like this: ``` > await cql.run_async(f"ALTER TABLE {ks}.test WITH tablets = {{'min_tablet_count': 1}}") E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.158.27.9:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Failed to apply group 0 change due to concurrent modification">}) ``` The following happens: - node A is restarted and becomes the group0 leader, - the driver sends the ALTER TABLE request to node B, - the request hits group 0 concurrent modification error 10 times and fails because node A performs tablet migrations at the same time. What is unexpected is that even though the driver session uses the default retry policy, the driver doesn't retry the request on node A. The request is guaranteed to succeed on node A because it's the only node adding group0 entries. The driver doesn't retry the request on node A because of a missing `wait_for_cql_and_get_hosts` call. We add it in this commit. We also reconnect the driver just in case to prevent hitting scylladb/python-driver#295. Moreover, we can revert the workaround from `4c9efc08d8`, as the fix from this commit also prevents DROP KEYSPACE failures. The commit has been tested in byo with `_concurrent_ddl_retries{0}` to verify that node A really can't hit group 0 concurrent modification error and always receives the ALTER TABLE request from the driver. All 300 runs in each build mode passed. Fixes #25938 Closes scylladb/scylladb#28632	2026-02-16 12:56:18 +01:00
Marcin Maliszkiewicz	6a4aef28ae	Merge 'test: explicitly set compression algorithm in test_autoretrain_dict' from Andrzej Jackowski When `test_autoretrain_dict` was originally written, the default `sstable_compression_user_table_options` was `LZ4Compressor`. The test assumed (correctly) that initially the compression doesn't use a trained dictionary, and later in the test scenario, it changed the algorithm to one with a dictionary. However, the default `sstable_compression_user_table_options` is now `LZ4WithDictsCompressor`, so the old assumption is no longer correct. As a result, the assertion that data is initially not compressed well may or may not fail depending on dictionary training timing. To fix this, this commit explicitly sets `ZstdCompressor` as the initial `sstable_compression_user_table_options`, ensuring that the assumption that initial compression is without a dictionary is always met. Note: `ZstdCompressor` differs from the former default `LZ4Compressor`. However, it's a better choice — the test aims to show the benefit of using a dictionary, not the benefit of Zstd over LZ4 (and the test uses ZstdWithDictsCompressor as the algorithm with the dictionary). Fixes: https://github.com/scylladb/scylladb/issues/28204 Backport: 2025.4, as test already failed there (and also backport to 2026.1 to make everything consistent). Closes scylladb/scylladb#28625 * github.com:scylladb/scylladb: test: explicitly set compression algorithm in test_autoretrain_dict test: remove unneeded semicolons from python test	2026-02-16 11:38:24 +01:00
Ernest Zaslavsky	034c6fbd87	s3_client: limit multipart upload concurrency Prevent launching hundreds or thousands of fibers during multipart uploads by capping concurrent part submissions to 16. Closes scylladb/scylladb#28554	2026-02-16 13:32:58 +03:00
Botond Dénes	9f57d6285b	Merge 'test: improve error reporting and retries in get_scylla_2025_1_executable' from Marcin Maliszkiewicz Harden get_scylla_2025_1_executable() by improving error reporting when subprocesses fail, increasing curl's retry count for more resilient downloads, and enabling --retry-all-errors to retry on all failures. Fixes https://github.com/scylladb/scylladb/issues/27745 Backport: no, it's not a bug fix Closes scylladb/scylladb#28628 * github.com:scylladb/scylladb: test: pylib: retry on all errors in get_scylla_2025_1_executable curl's call test: pylib: increase curl's number of retries when downloading scylla test: pylib: improve error reporting in get_scylla_2025_1_executable	2026-02-16 10:09:17 +02:00
Andrei Chekun	8c5c1096c2	test: ensure that the table used in cqlpy/test_tools has at least 3 pk One of the tests checks that the number of PKs is more than 2, but the method that creates the table can return a table with fewer keys. This leads to flakiness; to avoid it, this PR ensures that the table will have at least 3 PKs. Closes scylladb/scylladb#28636	2026-02-16 09:50:58 +02:00
Anna Mikhlin	33cf97d688	.github/workflows: ignore quoted comments for trigger CI prevent CI from being triggered when trigger-ci command appears inside quoted (>) comment text Fixes: https://scylladb.atlassian.net/browse/RELENG-271 Closes scylladb/scylladb#28604	2026-02-16 09:33:16 +02:00
Andrei Chekun	e144d5b0bb	test.py: fix JUnit double test case records Move the hook for overwriting the XML reporter to be the first, to avoid double records. Closes scylladb/scylladb#28627	2026-02-15 19:02:24 +02:00
Jenkins Promoter	69249671a7	Update pgo profiles - aarch64	2026-02-15 05:22:17 +02:00
Jenkins Promoter	27aaafb8aa	Update pgo profiles - x86_64	2026-02-15 04:26:36 +02:00
Piotr Dulikowski	9c1e310b0d	Merge 'vector_search: Fix flaky vector_store_client_https_rewrite_ca_cert' from Karol Nowacki Most likely, the root cause of the flaky test was that the TLS handshake hung for an extended period (60s). This caused the test case to fail because the ANN request duration exceeded the test case timeout. The PR introduces two changes: * Mitigation of the hanging TLS handshake: This issue likely occurred because the test performed certificate rewrites simultaneously with ANN requests that utilize those certificates. * Production code fix: This addresses a bug where the TLS handshake itself was not covered by the connection timeout. Since tls::connect does not perform the handshake immediately, the handshake only occurs during the first write operation, potentially bypassing connect timeout. Fixes: #28012 Backport to 2026.01 and 2025.04 is needed, as these branches are also affected and may experience CI flakiness due to this test. Closes scylladb/scylladb#28617 * github.com:scylladb/scylladb: vector_search: Fix missing timeout on TLS handshake vector_search: test: Fix flaky cert rewrite test	2026-02-13 19:03:50 +01:00
Marcin Maliszkiewicz	1b0a68d1de	test: pylib: retry on all errors in get_scylla_2025_1_executable curl's call It's difficult to say if our download backend would always return transient error correctly so that the curl could retry. Instead it's more robust to always retry on error.	2026-02-12 16:18:52 +01:00
Marcin Maliszkiewicz	8ca834d4a4	test: pylib: increase curl's number of retries when downloading scylla By default curl does exponential backoff, and we want to keep that but there is time cap of 10 minutes, so with 40 retries we'd wait long time, instead we set the cap to 60 seconds. Total waiting time (excluding receiving request time): before - 17m after - 35m	2026-02-12 16:18:52 +01:00
Marcin Maliszkiewicz	70366168aa	test: pylib: improve error reporting in get_scylla_2025_1_executable Curl or other tools this function calls will now log error in the place they fail instead of doing plain assert.	2026-02-12 16:18:52 +01:00
Andrzej Jackowski	9ffa62a986	test: explicitly set compression algorithm in test_autoretrain_dict When `test_autoretrain_dict` was originally written, the default `sstable_compression_user_table_options` was `LZ4Compressor`. The test assumed (correctly) that initially the compression doesn't use a trained dictionary, and later in the test scenario, it changed the algorithm to one with a dictionary. However, the default `sstable_compression_user_table_options` is now `LZ4WithDictsCompressor`, so the old assumption is no longer correct. As a result, the assertion that data is initially not compressed well may or may not fail depending on dictionary training timing. To fix this, this commit explicitly sets `ZstdCompressor` as the initial `sstable_compression_user_table_options`, ensuring that the assumption that initial compression is without a dictionary is always met. Note: `ZstdCompressor` differs from the former default `LZ4Compressor`. However, it's a better choice — the test aims to show the benefit of using a dictionary, not the benefit of Zstd over LZ4 (and the test uses ZstdWithDictsCompressor as the algorithm with the dictionary). Fixes: scylladb/scylladb#28204	2026-02-12 14:58:39 +01:00
Andrzej Jackowski	e63cfc38b3	test: remove unneeded semicolons from python test	2026-02-12 14:49:17 +01:00
Aleksandra Martyniuk	f955a90309	test: fix test_remove_node_violating_rf_rack_with_rack_list test_remove_node_violating_rf_rack_with_rack_list creates a cluster with four nodes. One of the nodes is excluded, then another one is stopped, excluded, and removed. If the two stopped nodes were both voters, the majority is lost and the cluster loses its raft leader. As a result, the node cannot be removed and the operation times out. Add the 5th node to the cluster. This way the majority is always up. Fixes: https://github.com/scylladb/scylladb/issues/28596. Closes scylladb/scylladb#28610	2026-02-12 12:58:48 +02:00
Ferenc Szili	4ca40929ef	test: add read barrier to test_balance_empty_tablets The test creates a single node cluster, then creates 3 tables which remain empty. Then it adds another node with half the disk capacity of the first one, and then it waits for the balancer to migrate tablets to the newly added node by calling the quiesce topology API. The number of tablets on the smaller node should be exactly half the number of tablets on the larger node. After waiting for quiesce topology, we could have a situation where we query the number of tablets from the node which still hasn't processed the last tablet migrations and updated system.tablets. This patch adds a read barrier so that both nodes see the same tablets metadata before we query the number of tablets. Fixes: SCYLLADB-603 Closes scylladb/scylladb#28598	2026-02-12 11:16:34 +02:00
Karol Nowacki	079fe17e8b	vector_search: Fix missing timeout on TLS handshake Currently the TLS handshake in the vector search client does not have a timeout. This is because tls::connect does not perform handshake itself; the handshake is deferred until the first read/write operation is performed. This can lead to long hangs on ANN requests. This commit calls tls::check_session_is_resumed() after tls::connect to force the handshake to happen immediately and to run under with_timeout.	2026-02-12 10:08:37 +01:00
Karol Nowacki	aef5ff7491	vector_search: test: Fix flaky cert rewrite test The test is flaky most likely because when TLS certificate rewrite happens simultaneously with an ANN request, the handshake can hang for a long time (~60s). This leads to a timeout in the test case. This change introduces a checkpoint in the test so that it will wait for the certificate rewrite to happen before sending an ANN request, which should prevent the handshake from hanging and make the test more reliable. Fixes: #28012	2026-02-12 09:58:54 +01:00
Piotr Dulikowski	38c4a14a5b	Merge 'test: cluster: Fix test_sync_point' from Dawid Mędrek The test `test_sync_point` had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could potentially lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but creating a sync point would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. --- Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. As a bonus, we rewrite the auxiliary code responsible for fetching metrics and manipulating sync points. Now it's asynchronous and uses the existing standard mechanisms available to developers. Furthermore, we reduce the time needed for executing `test_sync_point` by 27 seconds. 
--- The total difference in time needed to execute the whole test file (on my local machine, in dev mode): Before: CPU utilization: 0.9% real 2m7.811s user 0m25.446s sys 0m16.733s After: CPU utilization: 1.1% real 1m40.288s user 0m25.218s sys 0m16.566s --- Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203 Backport: This improves the stability of our CI, so let's backport it to all supported versions. Closes scylladb/scylladb#28602 * github.com:scylladb/scylladb: test: cluster: Reduce wait time in test_sync_point test: cluster: Fix test_sync_point test: cluster: Await sync points asynchronously test: cluster: Create sync points asynchronously test: cluster: Fetch hint metrics asynchronously	2026-02-12 09:34:09 +01:00
Dawid Mędrek	f83f911bae	test: cluster: Reduce wait time in test_sync_point If everything is OK, the sync point will not resolve with node 3 dead. As a result, the waiting will use all of the time we allocate for it, i.e. 30 seconds. That's a lot of time. There's no easy way to verify that the sync point will NOT resolve, but let's at least reduce the waiting to 3 seconds. If there's a bug, it should be enough to trigger it at some point, while reducing the average time needed for CI.	2026-02-10 17:05:02 +01:00
Dawid Mędrek	a256ba7de0	test: cluster: Fix test_sync_point The test had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could potentially lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but creating a sync point would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203	2026-02-10 17:05:02 +01:00
Dawid Mędrek	c5239edf2a	test: cluster: Await sync points asynchronously There's a dedicated HTTP API for communicating with the cluster, so let's use it instead of yet another custom solution.	2026-02-10 17:05:02 +01:00
Dawid Mędrek	ac4af5f461	test: cluster: Create sync points asynchronously There's a dedicated HTTP API for communicating with the nodes, so let's use it instead of yet another custom solution.	2026-02-10 17:05:01 +01:00
Dawid Mędrek	628e74f157	test: cluster: Fetch hint metrics asynchronously There's a dedicated API for fetching metrics now. Let's use it instead of developing yet another solution that's also worse.	2026-02-10 17:04:59 +01:00
Pawel Pery	81d11a23ce	Revert "Merge 'vector_search: add validator tests' from Pawel Pery" This reverts commit `bcd1758911`, reversing changes made to `b2c2a99741`. There is a design decision to not introduce additional test orchestration tool for scylladb.git (see comments for #27499). One commit has already been reverted in `55c7bc7`. Last CI runs made validator test flaky, so it is a time to remove all remaining validator tests. It needs a backport to 2026.1 to remove remaining validator tests from there. Fixes: VECTOR-497 Closes scylladb/scylladb#28568	2026-02-08 16:29:58 +02:00
Avi Kivity	bb99bfe815	test: scylla_gdb: tighten check for Error output from gdb When running a gdb command, we check that the string 'Error' does not appear within the output. However, if the command output includes the string 'Error' as part of its normal operation, this generates a false positive. In fact the task_histogram can include the string 'error::Error' from the Rust core::error module. Allow for that and only match 'Error' that isn't 'error::Error'. Fixes #28516. Closes scylladb/scylladb#28574	2026-02-08 09:48:23 +02:00
Anna Stuchlik	dc8f7c9d62	doc: replace the OS Support page with a link to the new location We've moved that page to another place; see https://github.com/scylladb/scylladb/issues/28561. This commit replaces the page with the link to the new location and adds a redirection. Fixes https://github.com/scylladb/scylladb/issues/28561 Closes scylladb/scylladb#28562	2026-02-06 11:38:21 +02:00
Avi Kivity	7a3ce5f91e	test: minio: disable web console minio starts a web console on a random port. This was seen to interfere with the nodetool tests when the web console port clashed with the mock API port. Fix by disabling the web console. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-496 Closes scylladb/scylladb#28492	2026-02-05 20:11:32 +02:00
Nikos Dragazis	5d1e6243af	test/cluster: Remove short_tablet_stats_refresh_interval injection The test `test_size_based_load_balancing.py::test_balance_empty_tablets` waits for tablet load stats to be refreshed and uses the `short_tablet_stats_refresh_interval` injection to speed up the refresh interval. This injection has no effect; it was replaced by the `tablet_load_stats_refresh_interval_in_seconds` config option (patch: `1d6808aec4`), so the test currently waits for 60 seconds (default refresh interval). Use the config option. This reduces the execution time to ~8 seconds. Fixes SCYLLADB-556. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28536	2026-02-05 20:11:32 +02:00
Pavel Emelyanov	10c278fff7	database: Remove _flush_sg member from replica::database This field is only used to initialize the following _memtable_controller one. It's simpler just to do the initialization with whatever value the field itself is initialized and drop the field itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28539	2026-02-05 13:02:35 +02:00
Petr Hála	a04dbac369	open-coredump: Change to use new backtrace * This is a breaking change, which removes compatibility with the old backtrace - See https://staging.backtrace.scylladb.com/api/docs#/default/search_by_build_id_search_build_id_post for the APIDoc * Add timestamp field to log * Tested locally Closes scylladb/scylladb#28325	2026-02-05 11:50:47 +02:00
Marcin Maliszkiewicz	0753d9fae5	Merge 'test: remove xfail marker from a few passing tests' from Nadav Har'El This patch fixes the few remaining cases of XPASS in test/cqlpy and test/alternator. These are tests which, when written, reproduced a bug and therefore were marked "xfail", but some time later the bug was fixed and we either did not notice it was ever fixed, or just forgot to remove the xfail marker. Removing the no-longer-needed xfail markers is good for test hygiene, but more importantly is needed to avoid regressions in those already-fixed areas (if a test is already marked xfail, it can start to fail in a new way and we wouldn't notice). Backport not needed, xpass doesn't bother anyone. Closes scylladb/scylladb#28441 * github.com:scylladb/scylladb: test/cqlpy: remove xfail from tests for fixed issue 7972 test/cqlpy: remove xfail from tests for fixed issue 10358 test/cqlpy: remove xfail from passing test testInvalidNonFrozenUDTRelation test/alternator: remove xfail from passing test_update_item_increases_metrics_for_new_item_size_only	2026-02-05 10:10:43 +01:00
Marcin Maliszkiewicz	6eca74b7bb	Merge 'More Alternator tests for BatchWriteItem' from Nadav Har'El The goal of this small pull request is to reproduce issue #28439, which found a bug in the Alternator Streams output when BatchWriteItem is called to write multiple items in the same partition, and always_use_lwt write isolation mode is used. * The first patch reproduces this specific bug in Alternator Streams. * The second patch adds missing (Fixes #28171) tests for BatchWriteItem in different write modes, and shows that BatchWriteItem itself works correctly - the bug is just in Alternator Streams' reporting of this write. Closes scylladb/scylladb#28528 * github.com:scylladb/scylladb: test/alternator: add test for BatchWriteItem with different write isolations test/alternator: reproducer for Alternator Streams bug	2026-02-05 10:07:29 +01:00
Yaron Kaikov	b30ecb72d5	ci: fix PR number extraction for unlabeled events When the workflow is triggered by removing the 'conflicts' label (pull_request_target unlabeled event), github.event.issue.number is not available. Use github.event.pull_request.number as fallback. Fixes: https://scylladb.atlassian.net/browse/RELENG-245 Closes scylladb/scylladb#28543	2026-02-05 08:41:43 +02:00
Michał Hudobski	6b9fcc6ca3	auth: add CDC streams and timestamps to vector search permissions It turns out that the cdc driver requires permissions to two additional system tables. This patch adds them to VECTOR_SEARCH_INDEXING and modifies the unit tests. The integration with vector store was tested manually, integration tests will be added in vector-store repository in a follow up PR. Fixes: SCYLLADB-522 Closes scylladb/scylladb#28519	2026-02-04 09:10:08 +01:00
Nadav Har'El	47e827262f	test/alternator: add test for BatchWriteItem with different write isolations Alternator's various write operations have different code paths for the different write isolation modes. Because most of the test suite runs in only a single write mode (currently - only_rmw_uses_lwt), we already introduced a test file test/alternator/test_write_isolation.py for checking the different write operations in all four write isolation modes. But we missed testing one write operation - BatchWriteItem. This operation isn't very "interesting" because it doesn't support any read-modify-option option (it doesn't support UpdateExpression, ConditionExpression or ReturnValues), but even without those, the pure write code still has different code paths with and without LWT, and should be tested. So we add the missing test here - and it passes. In issue #28439 we discovered a bug that can be seen in Alternator Streams in the case of BatchWriteItem with multiple writes to the same partition and always_use_lwt mode. The fact that the test added here passes shows that the bug is NOT in BatchWriteItem itself, which works correctly in this case - but only in the Alternator Streams layer. Fixes #28171 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-04 09:24:29 +02:00
Nadav Har'El	c63f43975f	test/alternator: reproducer for Alternator Streams bug This patch adds a reproducer for an Alternator Streams bug described in issue #28439, where the stream returns the wrong events (and fewer of them) in the specific combination of the following circumstances: 1. A BatchWriteItem operation writing multiple items to the same partition. 2. The "always_use_lwt" write isolation mode is used. (the bug doesn't occur in other write isolation modes). We didn't catch this bug earlier because the Alternator Streams test we had for BatchWriteItem had multiple items in multiple partitions, and we missed the multiple-items-in-one-partition case. Moreover, today we run all the tests in only_rmw_uses_lwt mode (in the past, we did use always_use_lwt, but changed recently in commit `e7257b1393` following commit `76a766c` that changed test.py). As issue #28439 explains, the underlying cause of the bug is that the always_use_lwt causes the multiple items to be written with the same timestamp, which confused the Alternator Streams code reading the CDC log. The bug is not in BatchWriteItem itself, or in ScyllaDB CDC, but just in the Alternator Streams layer. The test in this patch is parameterized to run on each of the four write isolation modes, and currently fails (and so is marked xfail) just for the one mode 'always_use_lwt'. The test is scylla_only, as its purpose is to check the different write isolation modes - which don't exist in AWS DynamoDB. Refs #28439 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-04 09:17:48 +02:00
Radosław Cybulski	03ff091bee	alternator: improve events output when test failed Improve events printing, when test in test_streams.py failed. New code will print both expected and received events (keys, previous image, new image and type). New code will explicitly mark, at which output event comparison failed. Fixes #28455 Closes scylladb/scylladb#28476	2026-02-03 21:55:07 +02:00
Anna Stuchlik	a427ad3bf9	doc: remove the link to the Open Source blog post Fixes https://github.com/scylladb/scylladb/issues/28486 Closes scylladb/scylladb#28518	2026-02-03 14:15:16 +01:00
Botond Dénes	3adf8b58c4	Merge 'test: pylib: scylla_cluster: set shutdown_announce_in_ms to 0' from Patryk Jędrzejczak The usual Scylla shutdown in a cluster test takes ~2.1s. 2s come from ``` co_await sleep(std::chrono::milliseconds(_gcfg.shutdown_announce_ms)); ``` as the default value of `shutdown_announce_in_ms` is 2000. This sleep makes every `server_stop_gracefully` call 2s slower. There are ~300 such calls in cluster tests (note that some come from `rolling_restart`). So, it looks like this sleep makes cluster tests 300 * 2s = 10min slower. Indeed, `./test.py --mode=dev cluster` takes 61min instead of 71min on the potwor machine (the one in the Warsaw office) without it. We set `shutdown_announce_in_ms` to 0 for all cluster tests to make them faster. The sleep is completely unnecessary in tests. Removing it could introduce flakiness, but if that's the case, then the test for which it happens is incorrect in the first place. Tests shouldn't assume that all nodes receive and handle the shutdown message in 2s. They should use functions like `server_not_sees_other_server` instead, which are faster and more reliable. Improvement of the tests running time, so no backport. The fix of `test_tablets_parallel_decommission` may have to be backported to 2026.1, but it can be done manually. Closes scylladb/scylladb#28464 * github.com:scylladb/scylladb: test: pylib: scylla_cluster: set shutdown_announce_in_ms to 0 test: test_tablets_parallel_decommission: prevent group0 majority loss test: delete test_service_levels_work_during_recovery	2026-02-03 08:19:05 +02:00
Pavel Emelyanov	19ea05692c	view_build_worker: Do not switch scheduling groups inside work_on_view_building_tasks The handler appeared back in `c9e710dca3`. In this commit it performed the "core" part of the task -- the do_build_range() method -- inside the streaming sched group. The setup code was seemingly copied from the view_builder::do_build_step() method and got the explicit switch of the scheduling group. The switch looks both justified and not. On one hand, it makes it explicit that the activity runs in the streaming scheduling group. On the other hand, the verb already uses RPC index on 1, which is negotiated to be run in streaming group anyway. On the "third hand", even though it is explicit, the switch happens too late, as there are a lot of other activities performed by the handler that seem to also belong to the same scheduling group, but which are not switched into it explicitly. By and large, it seems better to avoid the explicit switch and rely on the RPC-level negotiation-based sched group switching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28397	2026-02-03 07:00:32 +02:00
Anna Stuchlik	77480c9d8f	doc: fix the links on the repair-related pages This is a follow-up to https://github.com/scylladb/scylladb/pull/28199. This commit fixes the syntax of the internal links. Fixes https://github.com/scylladb/scylladb/issues/28486 Closes scylladb/scylladb#28487	2026-02-03 06:54:08 +02:00
Botond Dénes	64b38a2d0a	Merge 'Use gossiper scheduling group where needed' from Pavel Emelyanov This is the continuation of #28363 , this time about getting gossiper scheduling group via database. Several places that do it already have gossiper at hand and should better get the group from it. Eventually, this will allow to get rid of database::get_gossip_scheduling_group(). Refining inter-components API, not backporting Closes scylladb/scylladb#28412 * github.com:scylladb/scylladb: gossiper: Export its scheduling group for those who need it migration_manager: Reorder members	2026-02-03 06:51:31 +02:00
Nadav Har'El	48b01e72fa	test/alternator: add test verifying that keys only allow S/B/N type Recently we had a question whether key columns can have any supported type. I knew that actually - they can't, that key columns can have only the types S(tring), B(inary) or N(umber), and that is all. But it turns out we never had a test that confirms this understanding is true. We did have a test for it for GSI key types already, test_gsi.py::test_gsi_invalid_key_types, but we didn't have one for the base table. So in this patch we add this missing test, and confirm that, indeed, both DynamoDB and Alternator refuse a key attribute with any type other than S, B or N. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28479	2026-02-03 06:49:02 +02:00
Andrei Chekun	ed9a96fdb7	test.py: modify logic for adding function_path in JUnit The current logic only checks for failures during the test phase, so it misses failures that happen in another phase. This PR eliminates this, so every phase will have a modified node reporter to enrich the JUnit XML report with the custom attribute function_path. Closes scylladb/scylladb#28462	2026-02-03 06:42:18 +02:00
Andrei Chekun	3a422e82b4	test.py: fix the file name in test summary The current logic always assumes that the error happened in the test file, but that is not always true. This PR will show the error from the boost logger where the error actually happened. Closes scylladb/scylladb#28429	2026-02-03 06:38:21 +02:00
Benny Halevy	84caa94340	gossiper: add_expire_time_for_endpoint: replace fmt::localtime with gmtime in log printout 1. fmt::localtime is deprecated. 2. We should really print times in UTC, especially on the cloud. 3. The current log message does not print the timezone so it'd be unclear to anyone reading the log message if the expiration time is in the local timezone or in GMT/UTC. Fixes the following warning: ``` gms/gossiper.cc:2428:28: warning: 'localtime' is deprecated [-Wdeprecated-declarations] 2428 \| endpoint, fmt::localtime(clk::to_time_t(expire_time)), expire_time.time_since_epoch().count(), \| ^ /usr/include/fmt/chrono.h:538:1: note: 'localtime' has been explicitly marked deprecated here 538 \| FMT_DEPRECATED inline auto localtime(std::time_t time) -> std::tm { \| ^ /usr/include/fmt/base.h:207:28: note: expanded from macro 'FMT_DEPRECATED' 207 \| # define FMT_DEPRECATED [[deprecated]] \| ^ ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#28434	2026-02-03 06:36:53 +02:00
Pavel Emelyanov	8c42704c72	storage_service: Check raft rpc scheduling group from debug namespace Some storage_service rpc verbs may check that a handler is executed inside gossiper scheduling group. For that, the expected group is grabbed from database. This patch puts the gossiper sched group into debug namespace and makes this check use it from there. It removes one more place that uses database as config provider. Refs #28410 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28427	2026-02-03 06:34:03 +02:00
Asias He	b5c3587588	repair: Add request type in the tablet repair log So we can know if the repair is an auto repair or a user repair. Fixes SCYLLADB-395 Closes scylladb/scylladb#28425	2026-02-03 06:26:58 +02:00
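
Commit `a256ba7de0` in the list above describes a `wait_for` helper that retries its callback only while the callback returns `None`, so a callback that returns a bool stops after the first call. The following is a minimal sketch of that pitfall. The `wait_for` here is a hypothetical stand-in written for illustration, not the helper from the ScyllaDB test library, and the `written` counter is simulated rather than read from real metrics.

```python
import asyncio
import time


async def wait_for(pred, deadline, period=0.01):
    """Hypothetical stand-in: retry `pred` until it returns something other than None."""
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


def make_counter():
    """Simulate a `written` metric that grows by one on every poll."""
    state = {"written": 0}

    async def bump():
        state["written"] += 1
        return state["written"]

    return bump


async def demo():
    # Buggy callback: returns a bool. False is not None, so wait_for
    # treats the first poll as final and never retries.
    bump = make_counter()

    async def hints_written_bool():
        return (await bump()) >= 3

    buggy = await wait_for(hints_written_bool, time.monotonic() + 1)

    # Fixed callback: returns None until the condition holds.
    bump = make_counter()

    async def hints_written_or_none():
        written = await bump()
        return written if written >= 3 else None

    fixed = await wait_for(hints_written_or_none, time.monotonic() + 1)
    return buggy, fixed


buggy_result, fixed_result = asyncio.run(demo())
```

With the bool-returning callback the wait ends on the first poll even though the condition is not met. With the `None`-returning callback it keeps polling until the simulated counter reaches the threshold.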

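Commit `bb99bfe815` in the list above tightens the gdb output check so that `Error` is flagged only when it is not part of `error::Error`. One way to express that rule is a regular expression with a negative lookbehind. This is a sketch under that assumption: the name `has_gdb_error` is hypothetical, and the actual check in the scylla_gdb tests may be written differently.

```python
import re

# Match 'Error' only when it is not immediately preceded by 'error::'.
# (Assumed form of the check; the real test code may differ.)
_ERROR_RE = re.compile(r"(?<!error::)Error")


def has_gdb_error(output: str) -> bool:
    """Return True if the gdb output contains a genuine 'Error' marker."""
    return _ERROR_RE.search(output) is not None
```

Under this rule, output that only mentions `core::error::Error` is accepted, while any other occurrence of `Error` still fails the check.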

51919 Commits