scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Nikos Dragazis	bafe2bbbbc	db/config: Deprecate sstable_compression_dictionaries_allow_in_ddl The option is a knob that allows rejecting dictionary-aware compressors in the validation stage of CREATE/ALTER statements, and in the validation of `sstable_compression_user_table_options`. It was introduced in `7d26d3c7cb` to allow the admins of Scylla Cloud to selectively enable it in certain clusters. For more details, check: https://github.com/scylladb/scylla-enterprise/issues/5435 As of this series, we want to start offering dictionary compression as the default option in all clusters, i.e., treat it as a generally available feature. This makes the knob redundant. Additionally, making dictionary compression the default choice in `sstable_compression_user_table_options` creates an awkward dependency with the knob (disabling the knob should cause `sstable_compression_user_table_options` to fall back to a non-dict compressor as default). That may not be very clear to the end user. For these reasons, mark the option as "Deprecated", remove all relevant tests, and adjust the business logic as if dictionary compression is always available. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `96e727d7b9`)	2025-11-04 15:40:46 +02:00
Nikos Dragazis	260c9972b0	boost/cql_query_test: Get expected compressor from config Since `5b6570be52`, the default SSTable compression algorithm for user tables is no longer hardcoded; it can be configured via the `sstable_compression_user_table_options.sstable_compression` option in scylla.yaml. Modify the `test_table_compression` test to get the expected value from the configuration. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `d95ebe7058`)	2025-10-31 23:50:20 +00:00
Michael Litvak	59f97d0b71	test: cdc: extend cdc with tablets tests extend and improve the tests of virtual tables for cdc with tablets. split the existing virtual tables test to one test that validates the virtual tables against the internal cdc tables, and triggering some tablet splits in order to create entries in the cdc_streams_history table, and add another test with basic validation of the virtual tables when there are multiple cdc tables. (cherry picked from commit `4cc0a80b79`)	2025-10-30 02:44:47 +00:00
Pavel Emelyanov	080c55a115	lister: Fix race between readdir and stat Sometimes file::list_directory() returns entries without type set. In that case lister calls file_type() on the entry name to get it. In case the call returns disengaged type, the code assumes that some error occurred and resolves into exception. That's not correct. The file_type() method returns disengaged type only if the file being inspected is missing (i.e. on ENOENT errno). But this can validly happen if a file is removed between readdir and stat. In that case it's not "some error happened", but an entry that should just be skipped. If "some error happened", then file_type() would resolve into exceptional future on its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26595 (cherry picked from commit `d9bfbeda9a`) Closes scylladb/scylladb#26767	2025-10-29 11:29:57 +02:00
Patryk Jędrzejczak	680bfa9ab7	test: test_raft_recovery_stuck: reconnect driver after rolling restarts It turns out that #21477 wasn't sufficient to fix the issue. The driver may still decide to reconnect the connection after `rolling_restart` returns. One possible explanation is that the driver sometimes handles the DOWN notification after all nodes consider each other UP. Reconnecting the driver after restarting nodes seems to be a reliable workaround that many tests use. We also use it here. Fixes #19959 Closes scylladb/scylladb#26638 (cherry picked from commit `5321720853`) Closes scylladb/scylladb#26763	2025-10-29 11:27:49 +02:00
Botond Dénes	aac49601c6	Merge '[Backport 2025.4] cdc: garbage collect CDC streams for tablets' from Scylladb[bot] introduce helper functions that can be used for garbage collecting old cdc streams for tablets-based keyspaces. add a background fiber to the topology coordinator that runs periodically and checks for old CDC streams for tablets keyspaces that can be garbage collected. the garbage collection works by finding the newest cdc timestamp that has been closed for more than the configured cdc TTL, and removing all information from the cdc internal tables about cdc timestamps and streams up to this timestamp. in general it should be safe to remove information about these streams because they are closed for more than TTL, therefore all rows that were written to these streams with the configured TTL should be dead. the exception is if the TTL is altered to a smaller value, and then we may remove information about streams that still have live rows that were written with the longer ttl. Fixes https://github.com/scylladb/scylladb/issues/26669 - (cherry picked from commit `440caeabcb`) - (cherry picked from commit `6109cb66be`) Parent PR: #26410 Closes scylladb/scylladb#26728 * github.com:scylladb/scylladb: cdc: garbage collect CDC streams periodically cdc: helpers for garbage collecting old streams for tablets	2025-10-29 11:25:31 +02:00
Botond Dénes	087f739bf9	Merge '[Backport 2025.4] alternator/executor: instantly mark view as built when creating it with base table' from Scylladb[bot] When a `CreateTable` request creates GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply not create view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615 This fix should be backported to 2025.4. - (cherry picked from commit `8fbf122277`) - (cherry picked from commit `bdab455cbb`) - (cherry picked from commit `34503f43a1`) Parent PR: #26657 Closes scylladb/scylladb#26670 * github.com:scylladb/scylladb: test/alternator/test_tablets: add test for GSI backfill with tablets test/alternator/test_tablets: add reproducer for GSI with tablets alternator/executor: instantly mark view as built when creating it with base table	2025-10-29 11:21:27 +02:00
Michael Litvak	5319759bdb	cdc: garbage collect CDC streams periodically add a background fiber to the topology coordinator that runs periodically and checks for old CDC streams for tablets keyspaces that can be garbage collected. (cherry picked from commit `6109cb66be`)	2025-10-27 19:53:04 +00:00
Michael Litvak	55d9d5e7c2	cdc: helpers for garbage collecting old streams for tablets introduce helper functions that can be used for garbage collecting old cdc streams for tablets-based keyspaces. - get_new_base_for_gc: finds a new base timestamp given a TTL, such that all older timestamps and streams can be removed. - get_cdc_stream_gc_mutations: given new base timestamp and streams, builds mutations that update the internal cdc tables and remove the older streams. - garbage_collect_cdc_streams_for_table: combines the two functions above to find a new base and build mutations to update it for a specific table - garbage_collect_cdc_streams: builds gc mutations for all cdc tables (cherry picked from commit `440caeabcb`)	2025-10-27 19:53:04 +00:00
Patryk Jędrzejczak	c406e1dd17	Merge '[Backport 2025.4] raft topology: fix group0 tombstone GC in the Raft-based recovery procedure' from Scylladb[bot] Group0 tombstone GC considers only the current group 0 members while computing the group 0 tombstone GC time. It's not enough because in the Raft-based recovery procedure, there can be nodes that haven't joined the current group 0 yet, but they have belonged to a different group 0 and thus have a non-empty group 0 state ID. The current code can cause a data resurrection in group 0 tables. We fix this issue in this PR and add a regression test. This issue was uncovered by `test_raft_recovery_entry_loss`, which became flaky recently. We skipped this test for now. We will unskip it in a following PR because it's skipped only on master, while we want to backport this PR. Fixes #26534 This PR contains an important bugfix, so we should backport it to all branches with the Raft-based recovery procedure (2025.2 and newer). - (cherry picked from commit `1d09b9c8d0`) - (cherry picked from commit `6b2e003994`) - (cherry picked from commit `c57f097630`) Parent PR: #26612 Closes scylladb/scylladb#26682 * https://github.com/scylladb/scylladb: test: test group0 tombstone GC in the Raft-based recovery procedure group0_state_id_handler: remove unused group0_server_accessor group0_state_id_handler: consider state IDs of all non-ignored topology members	2025-10-27 10:15:49 +01:00
Petr Gusev	41f8f6b571	test_tablets_lwt: add test_tablets_merge_waits_for_lwt (cherry picked from commit `03d6829783`)	2025-10-24 12:22:20 +02:00
Petr Gusev	31e4bb1bc3	test.py: add universalasync_typed_wrap The universalasync.wrap function doesn't preserve the type information, which confuses the VS Code Pylance plugin and makes code navigation hard. In this commit we fix the problem by adding a typed wrapped around universalasync.wrap. Fixes: scylladb/scylladb#26639 (cherry picked from commit `33e9ea4a0f`)	2025-10-24 12:21:21 +02:00
Petr Gusev	a09c1b355e	topology_coordinator: fix log message (cherry picked from commit `e1667afa50`)	2025-10-24 12:21:21 +02:00
Pawel Pery	67e0c8e4b0	vector_search: fix flaky dns_refresh_aborted test The test proceeds as follows: - run a long dns refresh process - request to resolve the hostname with a short abort_source timer - the result should be an empty list, because the request was aborted The test sometimes finishes the long dns refresh before abort_source fires, so the result list is not empty. There are two issues. First, as.reset() changes the abort_source timeout. The patch adds a get() method to the abort_source_timeout class, so there is no change in the abort_source timeout. Second, a sleep may not be reliable. The patch changes the long sleep inside a dns refresh lambda into condition_variable handling, to properly signal the end of the dns refresh process. Fixes: #26561 Fixes: VECTOR-268 It needs to be backported to 2025.4 Closes scylladb/scylladb#26566 (cherry picked from commit `10208c83ca`) Closes scylladb/scylladb#26598	2025-10-23 11:24:32 +02:00
Piotr Dulikowski	03d57bae80	Merge '[Backport 2025.4] storage_proxy: wait for write handlers destruction' from Scylladb[bot] `shared_ptr<abstract_write_response_handler>` instances are captured in the `lmutate` and `rmutate` lambdas of `send_to_live_endpoints()`. As a result, an `abstract_write_response_handler` object may outlive its removal from the `storage_proxy::_response_handlers` map -> `cancel_all_write_response_handlers()` doesn't actually wait for requests completion -> `sp::drain_on_shutdown()` doesn't guarantee all requests are drained -> `sp::stop_remote()` completes too early and `paxos_store` is destroyed while LWT local writes might still be in progress. In this PR we introduce a `write_handler_destroy_promise` to wait for such pending instances in `cancel_write_handlers()` and `cancel_all_write_response_handlers()` to prevent the `use-after-free`. A better long-term solution might be to replace `shared_ptr` with `unique_ptr` for `abstract_write_response_handler` and use a separate gate to track the `lmutate/rmutate` lambdas. We do not actually need to wait for these lambdas to finish before sending a timeout or error response to the client, as we currently do in `~abstract_write_response_handler`. Fixes scylladb/scylladb#26355 backport: need to be backported to 2025.4 since #26355 is reproduced on LWT over tablets - (cherry picked from commit `bf2ac7ee8b`) - (cherry picked from commit `b269f78fa6`) - (cherry picked from commit `bbcf3f6eff`) - (cherry picked from commit `8925f31596`) Parent PR: #26408 Closes scylladb/scylladb#26658 * github.com:scylladb/scylladb: test_tablets_lwt: add test_lwt_shutdown storage_proxy: wait for write handler destruction storage_proxy: coroutinize cancel_write_handlers storage_proxy: cancel_write_handlers: don't hold a strong pointer to handler	2025-10-23 10:49:52 +02:00
Patryk Jędrzejczak	76560ca095	test: test group0 tombstone GC in the Raft-based recovery procedure We add a regression test for the bug fixed in the previous commits. (cherry picked from commit `c57f097630`)	2025-10-22 17:13:34 +00:00
Andrei Chekun	d1274f01aa	test.py: rewrite the wait_for_first_completed Rewrite wait_for_first_completed to return only the first completed task, with a guarantee of awaiting (disappearing) all cancelled and finished tasks. Use wait_for_first_completed to avoid false pass tests in the future and issues like #26148. Use gather_safely to await tasks and remove the warning that a coroutine was not awaited. Closes scylladb/scylladb#26435 (cherry picked from commit `24d17c3ce5`) Closes scylladb/scylladb#26663	2025-10-22 18:12:52 +02:00
Michael Litvak	aa2065fe2e	storage_service: improve colocated repair error to show table names When requesting repair for tablets of a colocated table, the request fails with an error. Improve the error message to show the table names instead of table IDs, because the table names are more useful for users. Fixes scylladb/scylladb#26567 Closes scylladb/scylladb#26568 (cherry picked from commit `b808d84d63`) Closes scylladb/scylladb#26624	2025-10-22 15:25:15 +02:00
Tomasz Grabiec	0621a8aee5	Merge '[Backport 2025.4] Synchronize tablet split and load-and-stream' from Scylladb[bot] Load-and-stream is broken when running concurrently to the finalization step of tablet split. Consider this: 1) split starts 2) split finalization executes barrier and succeed 3) load-and-stream runs now, starts writing sstable (pre-split) 4) split finalization publishes changes to tablet metadata 5) load-and-stream finishes writing sstable 6) sstable cannot be loaded since it spans two tablets two possible fixes (maybe both): 1) load-and-stream awaits for topology to quiesce 2) perform split compaction on sstable that spans both sibling tablets This patch implements # 1. By awaiting for topology to quiesce, we guarantee that load-and-stream only starts when there's no chance coordinator is handling some topology operation like split finalization. Fixes https://github.com/scylladb/scylladb/issues/26455. - (cherry picked from commit `3abc66da5a`) - (cherry picked from commit `4654cdc6fd`) Parent PR: #26456 Closes scylladb/scylladb#26651 * github.com:scylladb/scylladb: test: Add reproducer for l-a-s and split synchronization issue sstables_loader: Synchronize tablet split and load-and-stream	2025-10-22 14:23:04 +02:00
Michał Jadwiszczak	f6dde0aa4b	test/alternator/test_tablets: add test for GSI backfill with tablets The test should pass without the fix for scylladb/scylladb#26615, because the `executor::updata_table()` uses `service::prepare_new_view_announcement()`, which creates view building tasks for the view. But it's better to add this test. (cherry picked from commit `34503f43a1`)	2025-10-22 10:51:55 +00:00
Michał Jadwiszczak	207c273b29	test/alternator/test_tablets: add reproducer for GSI with tablets (cherry picked from commit `bdab455cbb`)	2025-10-22 10:51:54 +00:00
Pavel Emelyanov	45341ca246	Merge '[Backport 2025.4] s3_client: handle failures which require http::request updating' from Scylladb[bot] Apply two main changes to the s3_client error handling 1. Add a loop to s3_client's `make_request` for the case when the retry strategy will not help since the request itself has to be updated. For example, authentication token expiration or timestamp on the request header 2. Refine the way we handle exceptions in the `chunked_download_source` background fiber, now we carry the original `exception_ptr` and also we wrap EVERY exception in `filler_exception` to prevent retry strategy trying to retry the request altogether Fixes: https://github.com/scylladb/scylladb/issues/26483 Should be ported back to 2025.3 and 2025.4 to prevent deadlocks and failures in these versions - (cherry picked from commit `55fb2223b6`) - (cherry picked from commit `db1ca8d011`) - (cherry picked from commit `185d5cd0c6`) - (cherry picked from commit `116823a6bc`) - (cherry picked from commit `43acc0d9b9`) - (cherry picked from commit `58a1cff3db`) - (cherry picked from commit `1d34657b14`) - (cherry picked from commit `4497325cd6`) - (cherry picked from commit `fdd0d66f6e`) Parent PR: #26527 Closes scylladb/scylladb#26650 * github.com:scylladb/scylladb: s3_client: tune logging level s3_client: add logging s3_client: improve exception handling for chunked downloads s3_client: fix indentation s3_client: add max for client level retries s3_client: remove `s3_retry_strategy` s3_client: support high-level request retries s3_client: just reformat `make_request` s3_client: unify `make_request` implementation	2025-10-22 11:33:53 +03:00
Petr Gusev	01658f9fcb	test_tablets_lwt: add test_lwt_shutdown (cherry picked from commit `8925f31596`)	2025-10-22 00:10:59 +00:00
Raphael S. Carvalho	92a603699e	test: Add reproducer for l-a-s and split synchronization issue Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `4654cdc6fd`)	2025-10-21 12:26:55 +00:00
Ernest Zaslavsky	94d49da8ec	s3_client: improve exception handling for chunked downloads Refactor the wrapping exception used in `chunked_download_source` to prevent the retry strategy from reattempting failed requests. The new implementation preserves the original `exception_ptr`, making the root cause clearer and easier to diagnose. (cherry picked from commit `1d34657b14`)	2025-10-21 12:26:50 +00:00
Lakshmi Narayanan Sreethar	45b9675d28	compaction: fix use after free when strategy is altered during compaction The `compaction_strategy_state` class holds strategy specific state via a `std::variant` containing different state types. When a compaction strategy performs compaction, it retrieves a reference to its state from the `compaction_strategy_state` object. If the table's compaction strategy is ALTERed while a compaction is in progress, the `compaction_strategy_state` object gets replaced, destroying the old state. This leaves the ongoing compaction holding a dangling reference, resulting in a use after free. Fix this by using `seastar::shared_ptr` for the state variant alternatives(`leveled_compaction_strategy_state_ptr` and `time_window_compaction_strategy_state_ptr`). The compaction strategies now hold a copy of the shared_ptr, ensuring the state remains valid for the duration of the compaction even if the strategy is altered. The `compaction_strategy_state` itself is still passed by reference and only the variant alternatives use shared_ptrs. This allows ongoing compactions to retain ownership of the state independently of the wrapper's lifetime. Fixes #25913 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `18c071c94b`)	2025-10-21 00:59:33 +00:00
Botond Dénes	99f2dd02bf	Merge '[Backport 2025.4] raft topology: disable schema pulls in the Raft-based recovery procedure' from Scylladb[bot] Schema pulls should always be disabled when group 0 is used. However, `migration_manager::disable_schema_pulls()` is never called during a restart with `recovery_leader` set in the Raft-based recovery procedure, which causes schema pulls to be re-enabled on all live nodes (excluding the nodes replacing the dead nodes). Moreover, schema pulls remain enabled on each node until the node is restarted, which could be a very long time. We fix this issue and add a regression test in this PR. Fixes #26569 This is an important bug fix, so it should be backported to all branches with the Raft-based recovery procedure (2025.2 and newer branches). - (cherry picked from commit `ec3a35303d`) - (cherry picked from commit `da8748e2b1`) - (cherry picked from commit `71de01cd41`) Parent PR: #26572 Closes scylladb/scylladb#26599 * github.com:scylladb/scylladb: test: test_raft_recovery_entry_loss: fix the typo in the test case name test: verify that schema pulls are disabled in the Raft-based recovery procedure raft topology: disable schema pulls in the Raft-based recovery procedure	2025-10-20 10:39:52 +03:00
Botond Dénes	76a6a059c8	Merge '[Backport 2025.4] Fix vector store client flaky test' from Scylladb[bot] This series of patches improves test vector_store_client_test stability. The primary issue with flaky connections was discovered while working on PR #26308. Key Changes: - Fixes premature connection closures in the mock server: The mock HTTP server was not consuming request payloads, causing it to close connections immediately after a response. Subsequent tests attempting to reuse these closed connections would fail intermittently, leading to flakiness. The server has been updated to handle payloads correctly. - Removes a retry workaround: With the underlying connection issue resolved, the retry logic in the vector_store_client_test_ann_request test is no longer needed and has been removed. - Mocks the DNS resolver in tests: The vector_store_client_uri_update_to_invalid test has been corrected to mock DNS lookups, preventing it from making real network requests. - Corrects request timeout handling: A bug has been fixed where the request timeout was not being reset between consecutive requests. - Unifies test timeouts: Timeouts have been standardized across the test suite for consistency. Fixes: #26468 It is recommended to backport this series to the 2025.4 branch. Since these changes only affect test code and do not alter any production logic, the backport is safe. Addressing this test flakiness will improve the stability of the CI pipeline and prevent it from blocking unrelated patches. 
- (cherry picked from commit `ac5e9c34b6`) - (cherry picked from commit `2eb752e582`) - (cherry picked from commit `d99a4c3bad`) - (cherry picked from commit `0de1fb8706`) - (cherry picked from commit `62deea62a4`) Parent PR: #26374 Closes scylladb/scylladb#26551 * github.com:scylladb/scylladb: vector_search: Unify test timeouts vector_search: Fix missing timeout reset vector_search: Refactor ANN request test vector_search: Fix flaky connection in tests vector_search: Fix flaky test by mocking DNS queries	2025-10-20 10:35:45 +03:00
Michał Chojnowski	6ff4910d96	test/cluster/test_bti_index.py: avoid a race with CQL tracing The test uses CQL tracing to check which files were read by a query. This is flaky if the coordinator and the replica are different shards, because the Python driver only waits for the coordinator, and not for replicas, to finish writing their traces. (So it might happen that the Python driver returns a result with only coordinator events and no replica events). Let's just dodge the issue by using --smp=1. Fixes scylladb/scylladb#26432 Closes scylladb/scylladb#26434 (cherry picked from commit `c35b82b860`) Closes scylladb/scylladb#26492	2025-10-20 10:32:58 +03:00
Michał Jadwiszczak	f5e76d0fcb	test/cluster/test_view_building_coordinator: skip reproducer instead of xfail The reproducer for issue scylladb/scylladb#26244 takes some time and since the test is failing, there is no point in wasting resources on it. We can change the xfail mark to skip. Refs scylladb/scylladb#26244 Closes scylladb/scylladb#26350 (cherry picked from commit `d92628e3bd`) Closes scylladb/scylladb#26365	2025-10-20 10:30:34 +03:00
Aleksandra Martyniuk	2819b8b755	test: wait for cql in test_two_tablets_concurrent_repair_and_migration_repair_writer_level In test_two_tablets_concurrent_repair_and_migration_repair_writer_level safe_rolling_restart returns ready cql. However, get_all_tablet_replicas uses the cql reference from manager that isn't ready. Wait for cql. Fixes: #26328 Closes scylladb/scylladb#26349 (cherry picked from commit `0e73ce202e`) Closes scylladb/scylladb#26362	2025-10-20 10:29:56 +03:00
Patryk Jędrzejczak	323a7b8c55	test: test_raft_recovery_entry_loss: fix the typo in the test case name (cherry picked from commit `71de01cd41`)	2025-10-17 10:27:33 +00:00
Patryk Jędrzejczak	cd0bb11eef	test: verify that schema pulls are disabled in the Raft-based recovery procedure We do this at the end of `test_raft_recovery_entry_loss`. It's not worth adding a separate regression test, as tests of the recovery procedure are complicated and have a long running time. Also, we choose `test_raft_recovery_entry_loss` out of all tests of the recovery procedure because it does some schema changes. (cherry picked from commit `da8748e2b1`)	2025-10-17 10:27:32 +00:00
Artsiom Mishuta	de5a13db28	test.py: reintroducing sudo in resource_gather.py conditionally reintroducing sudo for resource gathering when running under docker related: https://github.com/scylladb/scylladb/pull/26294#issuecomment-3346968097 fixes: https://github.com/scylladb/scylladb/issues/26312 Closes scylladb/scylladb#26401 (cherry picked from commit `99455833bd`) Closes scylladb/scylladb#26473	2025-10-17 09:27:13 +03:00
Michał Chojnowski	de8c2a8196	test/boost/sstable_compressor_factory_test: fix thread-unsafe usage of Boost.Test It turns out that Boost assertions are thread-unsafe, (and can't be used from multiple threads concurrently). This causes the test to fail with cryptic log corruptions sometimes. Fix that by switching to thread-safe checks. Fixes scylladb/scylladb#24982 Closes scylladb/scylladb#26472 (cherry picked from commit `7c6e84e2ec`) Closes scylladb/scylladb#26554	2025-10-15 12:08:54 +03:00
Karol Nowacki	da8bd30a5b	vector_search: Unify test timeouts The test previously used separate timeouts for requests (5s) and the overall test case (10s). This change unifies both timeouts to 10 seconds. (cherry picked from commit `62deea62a4`)	2025-10-14 22:49:42 +00:00
Karol Nowacki	4e9a42f343	vector_search: Fix missing timeout reset The `vector_store_client_test` could be flaky because the request timeout was not consistently reset in all code paths. This could lead to a timeout from a previous operation firing prematurely and failing the test. The fix ensures `abort_source_timeout` is reset before each request. The implementation is also simplified by changing `abort_source_timeout::reset` so that it combines the reset and arm operations into a single invocation. (cherry picked from commit `0de1fb8706`)	2025-10-14 22:49:42 +00:00
Karol Nowacki	6db7481c7a	vector_search: Refactor ANN request test Refactor the `vector_store_client_test_ann_request` test to use the `vs_mock_server` class, unifying the structure of the test cases. This change also removes retry logic that waited for the server to be ready. This is no longer necessary because the handler now exists for all index names and consumes the entire request payload, preventing connection closures. Previously, the server did not handle requests for unconfigured indexes, which caused the connection to close. This could lead to a race condition where the client would attempt to reuse a closed connection. (cherry picked from commit `d99a4c3bad`)	2025-10-14 22:49:42 +00:00
Karol Nowacki	62a5d4f932	vector_search: Fix flaky connection in tests The vector store mock server was not reading the ANN request body, which could cause it to prematurely close the connection. This could lead to a race condition where the client attempts to reuse a closed connection from its pool, resulting in a flaky test. The fix is to always read the request body in the mock server. (cherry picked from commit `2eb752e582`)	2025-10-14 22:49:42 +00:00
Karol Nowacki	f5319b06ae	vector_search: Fix flaky test by mocking DNS queries The `vector_store_client_uri_update_to_invalid` test was flaky because it performed real DNS lookups, making it dependent on the network environment. This commit replaces the live DNS queries with a mock to make the test hermetic and prevent intermittent failures. `vector_search_metrics_test` test did not call configure{vs}, as a consequence the test did real DNS queries, which made the test flaky. The refreshes counter increment has been moved before the call to the resolver. In tests, the resolver is mocked leading to lack of increments in production code. Without this change, there is no way to test DNS counter increments. The change also simplifies the test making it more readable. (cherry picked from commit `ac5e9c34b6`)	2025-10-14 22:49:42 +00:00
Piotr Wieczorek	c191c31682	alternator: Correct RCU undercount in BatchGetItem The `describe_multi_item` function treated the last reference-captured argument as the number of used RCU half units. The caller `batch_get_item`, however, expected this parameter to hold an item size. This RCU value was then passed to `rcu_consumed_capacity_counter::get_half_units`, treating the already-calculated RCU integer as if it were a size in bytes. This caused a second conversion that undercounted the true RCU. During conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH` (=4KB), so the double conversion divided the number of bytes by 16 MB. The fix removes the second conversion in `describe_multi_item` and changes the API of `describe_multi_item`. Fixes: https://github.com/scylladb/scylladb/pull/25847 Closes scylladb/scylladb#25842 (cherry picked from commit `a55c5e9ec7`) Closes scylladb/scylladb#26539	2025-10-14 11:53:09 +03:00
Pavel Emelyanov	e18072d4b8	Merge '[Backport 2025.4] service/qos: set long timeout for auth queries on SL cache update' from Scylladb[bot] pass an appropriate query state for auth queries called from service level cache reload. we use the function qos_query_state to select a query_state based on caller context - for internal queries, we set a very long timeout. the service level cache reload is called from group0 reload. we want it to have a long timeout instead of the default 5 seconds for auth queries, because we don't have strict latency requirement on the one hand, and on the other hand a timeout exception is undesired in the group0 reload logic and can break group0 on the node. Fixes https://github.com/scylladb/scylladb/issues/25290 backport possible to improve stability - (cherry picked from commit `a1161c156f`) - (cherry picked from commit `3c3dd4cf9d`) - (cherry picked from commit `ad1a5b7e42`) Parent PR: #26180 Closes scylladb/scylladb#26479 * github.com:scylladb/scylladb: service/qos: set long timeout for auth queries on SL cache update auth: add query_state parameter to query functions auth: refactor query_all_directly_granted	2025-10-13 15:26:21 +03:00
Patryk Jędrzejczak	b5c3e2465f	test: test_raft_no_quorum: test_can_restart: deflake the read barrier call Expecting the group 0 read barrier to succeed with a timeout of 1s, just after restarting 3 out of 5 voters, turned out to be flaky. In some unlikely scenarios, such as multiple vote splits, the Raft leader election could finish after the read barrier times out. To deflake the test, we increase the timeout of Raft operations back to 300s for read barriers we expect to succeed. Fixes #26457 Closes scylladb/scylladb#26489 (cherry picked from commit `5f68b9dc6b`) Closes scylladb/scylladb#26522	2025-10-12 21:02:02 +03:00
Asias He	3cae4a21ab	repair: Rename incremental mode name Using the name regular as the incremental mode could be confusing, since regular might be interpreted as the non-incremental repair. It is better to use incremental directly. Before: - regular (standard incremental repair) - full (full incremental repair) - disabled (incremental repair disabled) After: - incremental (standard incremental repair) - full (full incremental repair) - disabled (incremental repair disabled) Fixes #26503 Closes scylladb/scylladb#26504 (cherry picked from commit `13dd88b010`) Closes scylladb/scylladb#26521	2025-10-12 21:01:05 +03:00
Piotr Dulikowski	1f73e18eaf	Merge '[Backport 2025.4] db/view: Require rf_rack_valid_keyspaces when creating materialized view' from Scylladb[bot] Materialized views are currently in the experimental phase and using them in tablet-based keyspaces requires starting Scylla with an experimental feature, `views-with-tablets`. Any attempts to create a materialized view or secondary index when it's not enabled will fail with an appropriate error. After considerable effort, we're drawing close to bringing views out of the experimental phase, and the experimental feature will no longer be needed. However, materialized views in tablet-based keyspaces will still be restricted, and creating them will only be possible after enabling the configuration option `rf_rack_valid_keyspaces`. That's what we do in this PR. In this patch, we adjust existing tests in the tree to work with the new restriction. That shouldn't have been necessary because we've already seemingly adjusted all of them to work with the configuration option, but some tests hid well. We fix that mistake now. After that, we introduce the new restriction. What's more, when starting Scylla, we verify that there is no materialized view that would violate the contract. If there are some that do, we list them, notify the user, and refuse to start. High-level implementation strategy: 1. Name the restrictions in form of a function. 2. Adjust existing tests. 3. Restrict materialized views by both the experimental feature and the configuration option. Add validation test. 4. Drop the requirement for the experimental feature. Adjust the added test and add a new one. 5. Update the user documentation. Fixes scylladb/scylladb#23030 Backport: 2025.4, as we are aiming to support materialized views for tablets from that version. 
- (cherry picked from commit `a1254fb6f3`) - (cherry picked from commit `d6fcd18540`) - (cherry picked from commit `994f09530f`) - (cherry picked from commit `6322b5996d`) - (cherry picked from commit `71606ffdda`) - (cherry picked from commit `00222070cd`) - (cherry picked from commit `288be6c82d`) - (cherry picked from commit `b409e85c20`) Parent PR: #25802 Closes scylladb/scylladb#26416 * github.com:scylladb/scylladb: view: Stop requiring experimental feature db/view: Verify valid configuration for tablet-based views db/view: Require rf_rack_valid_keyspaces when creating view test/cluster/random_failures: Skip creating secondary indexes test/cluster/mv: Mark test_mv_rf_change as skipped test/cluster: Adjust MV tests to RF-rack-validity test/boost/schema_loader_test.cc: Explicitly enable rf_rack_valid_keyspaces db/view: Name requirement for views with tablets	2025-10-12 08:20:20 +02:00
Michael Litvak	3a9eb9b65f	auth: add query_state parameter to query functions add a query_state parameter to several auth functions that execute internal queries. currently the queries use the internal_distributed_query_state() query state, and we maintain this as default, but we want also to be able to pass a query state from the caller. in particular, the auth queries currently use a timeout of 5 seconds, and we will want to set a different timeout when executed in some different context. (cherry picked from commit `3c3dd4cf9d`)	2025-10-09 12:48:45 +00:00
Michał Chojnowski	22d3ee5670	sstables/trie: actually apply BYPASS CACHE to index reads BYPASS CACHE is implemented for `bti_index_reader` by giving it its own private `cached_file` wrappers over Partitions.db and Rows.db, instead of passing it the shared `cached_file` owned by the sstable. But due to an oversight, the private `cached_file`s aren't constructed on top of the raw Partitions.db and Rows.db files, but on top of `cached_file_impl` wrappers around those files. Which means that BYPASS CACHE doesn't actually do its job. Tests based on `scylla_index_page_cache_*` metrics and on CQL tracing still see the reads from the private files as "cache misses", but those misses are served from the shared cached files anyway, so the tests don't see the problem. In this commit we extend `test_bti_index.py` with a check that looks at reactor's `io_queue` metrics instead, and catches the problem. Fixes scylladb/scylladb#26372 Closes scylladb/scylladb#26373 (cherry picked from commit `dbddba0794`) Closes scylladb/scylladb#26424	2025-10-09 13:17:29 +03:00
Dawid Mędrek	2bdf792f8e	view: Stop requiring experimental feature We modify the requirements for using materialized views in tablet-based keyspaces. Before, it was necessary to enable the configuration option `rf_rack_valid_keyspaces`, have the cluster feature `VIEWS_WITH_TABLETS` enabled, and use the experimental feature `views-with-tablets`. We drop the last requirement. We adjust code to that change and provide a new validation test. We also update the user documentation to reflect the changes. Fixes scylladb/scylladb#23030 (cherry picked from commit `b409e85c20`)	2025-10-06 13:19:54 +00:00
Dawid Mędrek	2e2d1f17bb	db/view: Verify valid configuration for tablet-based views Creating a materialized view or a secondary index in a tablet-based keyspace requires that the user has enabled two options: * experimental feature `views-with-tablets`, * configuration option `rf_rack_valid_keyspaces`. Because the latter has only become a necessity recently (in this series), it's possible that there are already existing materialized views that violate it. We add a new check at start-up that iterates over existing views and makes sure that that is not the case. Otherwise, Scylla notifies the user of the problem. (cherry picked from commit `288be6c82d`)	2025-10-06 13:19:54 +00:00
Dawid Mędrek	e9aba62cc5	db/view: Require rf_rack_valid_keyspaces when creating view We extend the requirements for being able to create materialized views and secondary indexes in tablet-based keyspaces. It's now necessary to enable the configuration option `rf_rack_valid_keyspaces`. This is a stepping stone towards bringing materialized views and secondary indexes with tablets out of the experimental phase. We add a validation test to verify the changes. Refs scylladb/scylladb#23030 (cherry picked from commit `00222070cd`)	2025-10-06 13:19:54 +00:00

9793 Commits