scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Michael Litvak	bd66edee5c	logstor: truncate table implement freeing all segments of a table for table truncate. first do barrier to flush all active and mixed segments and put all the table's data in compaction groups, then stop compaction for the table, then free the table's segments and remove the live entries from the index.	2026-03-18 19:24:27 +01:00
Michael Litvak	37c485e3d1	test: logstor: add separator and compaction tests	2026-03-18 19:24:27 +01:00
Michael Litvak	5a16980845	logstor: recovery: initial initial and basic recovery implementation. * find all files, read their segments and populate the index with the newest record for each key. * find which segments are used and build the usage histogram	2026-03-18 19:24:26 +01:00
Michael Litvak	a521bcbcee	test: add test_logstor.py add basic tests for key-value tables with logstor storage	2026-03-18 19:24:26 +01:00
Dawid Mędrek	a8dd13731f	Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes This test was observed to fail in CI recently but there is not enough information in the logs to figure out what went wrong. This PR makes a few improvements to make the next investigation easier, should it be needed: * storage-service: add table name to mutation write failure error messages. * database: the `database_apply` error injection used to cause trouble, catching writes to bystander tables, making tests flaky. To eliminate this, it gained a filter to apply only to non-system keyspaces. Unfortunately, this still allows it to catch writes to the trace tables. While this should not fail the test, it reduces observability, as some traces disappear. Improve this error injection to only apply to a selected table. Also merge it with the `database_apply_wait` error injection, to streamline the code a bit. * test/test_data_resurrection_in_memtable.py: dump data from the table, before the checks for expected data, so if checks fail, the data in the table is known. Refs: SCYLLADB-812 Refs: SCYLLADB-870 Fixes: SCYLLADB-1050 (by restricting `database_apply` error injection, so it doesn't affect writes to system traces) Backport: test related improvement, no backport Closes scylladb/scylladb#28899 * github.com:scylladb/scylladb: test/cluster/test_data_resurrection_in_memtable.py: dump rows before check replica/database: consolidate the two database_apply error injections service/storage_proxy: add name of table to error message for write errors	2026-03-17 13:35:19 +01:00
Calle Wilund	a5df2e79a7	storage_service: Wait for snapshot/backup before decommission Fixes: SCYLLADB-244 Disables snapshot control such that any active ops finish/fail before proceeding with decommission. Note: snapshot control provided as argument, not member ref due to storage_service being used from both main and cql_test_env. (The latter has no snapshot_ctl to provide). Could do the snapshot lockout on API level, but want to do pre-checks before this. Note: this just disables backup/snapshot fully. Could re-enable after decommission, but this seems somewhat pointless. v2: * Add log message to snapshot shutdown * Make test use log waiting instead of timeouts Closes scylladb/scylladb#28980	2026-03-16 17:12:57 +02:00
bitpathfinder	85d5073234	test: Fix non-awaited coroutine in test_gossiper_empty_self_id_on_shadow_round The line with the error was not actually needed and has therefore been removed. Fixes: SCYLLADB-906 Closes scylladb/scylladb#28884	2026-03-16 17:07:36 +02:00
Botond Dénes	3e4e0c57b8	Merge 'Relax rf-rack-valid-keyspace option in backup/restore tests' from Pavel Emelyanov Some tests, when creating a cluster, configure nodes with the rf-rack-valid option, because sometimes they want to have it OFF. For that the option is explicitly carried around, but the cluster creating helper can guess this option itself -- out of the provided topology and replication factor. Removing this option simplifies the code and (which is a nicer outcome) the test "signature" that's used e.g. in command-line to run a specific test. Improving tests, not backporting Closes scylladb/scylladb#28860 * github.com:scylladb/scylladb: test: Relax topology_rf_validity parameter for some tests test: Auto detect rf-rack-valid option in create_cluster()	2026-03-16 17:06:46 +02:00
Patryk Jędrzejczak	526e5986fe	test: test_raft_no_quorum: decrease group0_raft_op_timeout_in_ms after quorum loss `test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev mode. The bootstrap of the first node can fail due to `add_entry()` timing out (with the 1s timeout set by the test case). Other test cases in this test file could fail in the same way as well, so we need a general fix. We don't want to increase the timeout in dev mode, as it would slow down the test. The solution is to keep the timeout unchanged, but set it only after quorum is lost. This prevents unexpected timeouts of group0 operations with almost no impact on the test running time. A note about the new `update_group0_raft_op_timeout` function: waiting for the log seems to be necessary only for `test_quorum_lost_during_node_join_response_handler`, but let's do it for all test cases just in case (including `test_can_restart` that shouldn't be flaky currently). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913 Closes scylladb/scylladb#28998	2026-03-16 16:58:15 +02:00
Artsiom Mishuta	755d528135	test.py: fix warnings changes in this commit: 1) rename class from 'TestContext' to 'Context' so pytest will not consider this class as a test 2) extend pytest filterwarnings list to ignore warnings from external libs 3) use datetime.datetime.now(datetime.UTC) instead of datetime.datetime.utcnow() 4) use ResultSet.one() instead of ResultSet[0] Fixes SCYLLADB-904 Fixes SCYLLADB-908 Related SCYLLADB-902 Closes scylladb/scylladb#28956	2026-03-15 12:00:10 +02:00
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator was the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's cql request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is most prepared for handling these types and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`) For sending the RPC, the CQL server needs to obtain the information about who should it forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward to. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. 
We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader. This feature also allows passing through exception messages which happened on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) [SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into 
sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
Pavel Emelyanov	d544d8602d	test: Relax topology_rf_validity parameter for some tests Tests that call create_cluster() helper no longer need to carry the rf-validity parameter. This simplifies the code and test signature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Pavel Emelyanov	313985fed7	test: Auto detect rf-rack-valid option in create_cluster() The helper accepts it as a boolean argument, but it can easily infer it from the provided topology. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Avi Kivity	ae8a418744	Merge 'Await async calls in test tablets migration' from Benny Halevy Fix several test cases that did not await async tasks: - test_restart_leaving_replica_during_cleanup - test_restart_in_cleanup_stage_after_cleanup - test_tablet_back_and_forth_migration - test_staging_backlog_is_preserved_with_file_based_streaming Fixes SCYLLADB-910 * Minor fixes, no backport needed Closes scylladb/scylladb#28908 * github.com:scylladb/scylladb: test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task test_tablets_migration: drop unused imports from cassandra.query	2026-03-13 00:20:29 +02:00
Avi Kivity	b228eb26e6	Merge 'dbuild: Use slirp4netns network in dbuild nested containers' from Calle Wilund Fixes #25084 Add slirp4netns and use for nested containers. This will allow nested container port aliasing, helping CI stability. Note: this contains an updated Dockerfile for dbuild image, but since chicken and eggs, right now will force install slirp4netns before anything in dbuild script. Updates the mock server handling to use ephemeral ports and query from container, ensuring we don't get port collisions. (boost as well as pytest). Includes a timeout up, and a tweak to our scylla_cluster handling, ensuring we don't deadlock when pipe size is less than required for our sys notify messages. Closes scylladb/scylladb#28727 * github.com:scylladb/scylladb: gcs_fixture: Change to use docker helper aws_kms_fixture: Modify to use docker helper test/lib/proc_util: Add docker helper pytest: use ephemeral port publish for docker mock servers dbuild: Use container network in dbuild nested containers scylla_cluster: Read notify sock in background to prevent deadlock	2026-03-12 23:49:25 +02:00
Nadav Har'El	ad832c263e	test/cluster: mark test_alternator_concurrent_rmw_same_partition_different_server not strictly xfail A few days ago, in commit `7b30a39` we added to pytest.ini the option xfail_strict. This option causes every time a test XPASSes, i.e., an xfail test actually passes - to be considered an error and fail the test. But some tests demonstrate a timing-related bug and do not reproduce the bug every single time. An example we noticed in one CI run is: test/cluster/test_alternator.py::test_alternator_concurrent_rmw_same_partition_different_server This test reproduces a timing-related bug (if you do an LWT write to one partition on to two different coordinators "at the same time", you can get a failure), but only most of the time, not 100% of the time. The solution is to add "strict=False" for the xfail marker on this specific test. This undoes the xfail_strict for this specific test, accepting that this specific test can either pass or fail. Note that this does NOT make this test worthless - we still see this test failing most of the time, and when a developer finally fixes this issue, the test will begin to pass all the time. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-941 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29016	2026-03-12 23:46:23 +02:00
Wojciech Mitros	32974770b0	test: add tests for CQL forwarding Add basic cluster tests for CQL forwarding. The test cases include: - basic reads and writes - prepared statements with binds - forwarding from a non-replica - exception passthrough during forwarding (using an injection) - re-preparing a statement on the target node, even if the user query is also an EXECUTE request on a prepared statement - verification metric updates The existing test_basic_write_read was modified so that a few extra cases could be validated on the same cluster.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	916a9995c1	transport: enable CQL forwarding for strong consistency statements We enable CQL forwarding by starting to return the bounce_to_node result message in redirect_statement() instead of throwing. The forwarding code introduced in the preceding patches reacts to these messages, allowing the requests to be forwarded. With the update, some tests assuming that requests can't be forwarded need to be adjusted, so we do that as well.	2026-03-12 19:43:35 +01:00
Alex	7fd39ba586	test/cluster: strengthen raft voters multi-DC test and tune debug runtime The test_raft_voters_multidc_kill_dc scenario had become weaker after group0 voter count was made always odd. In particular, the old num_nodes == 1 case (dc1=2, dc2=1, dc3=1) could pass even without the intended balancing logic, because with 3 voters total we naturally get one voter per DC. This change restores coverage of the original intent: - Replace num_nodes parametrization with explicit DC triples. - Use (3, 1, 1) to force a meaningful asymmetric topology where voter placement logic is required. - Keep a larger topology case (6, 3, 3) for broader coverage. - Mark (6, 3, 3) as skip_mode(debug) with reason: larger topology case is too slow in debug on minipcs. Also updated comments/docstring to match the new setup. Fixes: SCYLLADB-794 backport: None, it is done to deflake minipcs that will start working only on master Closes scylladb/scylladb#29000	2026-03-12 17:07:45 +01:00
Benny Halevy	b3fec20960	test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather Currently the test iterates on all servers and calls manager.api.disable_injection but it doesn't await those calls. Use asyncio.gather to await all calls in parallel. Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	61d5a2df02	test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	b8655748a2	test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	10dccc2c4e	test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	c9d653fb1e	test_tablets_migration: drop unused imports from cassandra.query Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Calle Wilund	3e8a9a0beb	pytest: use ephemeral port publish for docker mock servers Changes dockerized_service to use ephemeral port publish, and query the published port from podman/docker. Modifies client code to use slightly changed usage syntax.	2026-03-11 12:32:01 +01:00
Piotr Dulikowski	d9a277453e	Merge 'cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race' from Alex Dathskovsky query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically. This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops. Test coverage was extended in test/cluster/test_prepare_race.py: - reproduces the invalidation-during-prepare window with injection, - verifies prepare completes successfully, - then invalidates again and executes the same stale client prepared object, - confirms the driver transparently re-requests/re-prepares and execution succeeds. This change introduces: - no behavior change for normal prepare flow besides stronger lifetime guarantees, - no new protocol semantics, - preserves existing cache invalidation logic, - adds explicit cluster-level regression coverage for both the race and driver reprepare path. - pushes the re-prepare operation towards the driver, the server will return unprepared error for the first time and the driver will have to re-prepare during execution stage Fixes: https://github.com/scylladb/scylladb/issues/27657 Backport to active branches recommended: No node crash, but user-visible PREPARE failures under rare schema-invalidation race; low-risk timeout-bounded retry improves robustness. 
Closes scylladb/scylladb#28952 * github.com:scylladb/scylladb: transport/messages: hold pinned prepared entry in PREPARE result cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race	2026-03-11 12:09:23 +01:00
Patryk Jędrzejczak	37aeba9c8c	Merge 'raft: add global read barrier to group0_batch::commit and switch auth and service levels' from Marcin Maliszkiewicz This series adds a global read barrier to raft_group0_client, ensuring that Raft group0 mutations are applied on all live nodes before returning to the caller. Currently, after a group0_batch::commit, the mutations are only guaranteed to be applied on the leader. Other nodes may still be catching up, leading to stale reads. This patch introduces a broadcast read barrier mechanism. Calling send_group0_read_barrier_to_live_members after committing will cause the coordinator to send a read barrier RPC to all live nodes (discovered via gossiper) and waits for them to complete. This is best effort attempt to get cluster-wide visibility of the committed state before the response is returned to the user. Auth and service levels write paths are switched to use this new mechanism. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-650 Backport: no, new feature Closes scylladb/scylladb#28731 * https://github.com/scylladb/scylladb: test: add tests for global group0_batch barrier feature qos: switch service levels write paths to use global group0_batch barrier auth: switch write paths to use global group0_batch barrier raft: add function to broadcast read barrier request raft: add gossiper dependency to raft_group0_client raft: add read barrier RPC	2026-03-11 10:37:19 +01:00
Botond Dénes	475220b9c9	Merge 'Remove the rest of pre raft topology code' from Gleb Natapov Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is still not upgraded to raft topology. Both of those are not supported any more. No need to backport since we remove functionality here. Closes scylladb/scylladb#28841 * github.com:scylladb/scylladb: service level: remove version 1 service level code features: move GROUP0_SCHEMA_VERSIONING to deprecated features list migration_manager: remove unused forward definitions test: remove unused code auth: drop auth_migration_listener since it does nothing now schema: drop schema_registry_entry::maybe_sync() function schema: drop make_table_deleting_mutations since it should not be needed with raft schema: remove calculate_schema_digest function schema: drop recalculate_schema_version function and its uses migration_manager: drop check for group0_schema_versioning feature cdc: drop usage of cdc_local table and v1 generation definition storage_service: no need to add yourself to the topology during reboot since raft state loading already did it storage_service: remove unused functions group0: drop with_raft() function from group0_guard since it always returns true now gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more gossiper: drop tokens from loaded_endpoint_state gossiper: remove unused functions storage_service: do not pass loaded_peer_features to join_topology() storage_service: remove unused fields from replacement_info gossiper: drop is_safe_for_restart() function and its use storage_service: remove unused variables from join_topology gossiper: remove the code that was only used in gossiper topology storage_service: drop the check for raft mode from recovery code cdc: remove legacy code test: remove unused injection points auth: remove legacy auth mode and upgrade code treewide: remove schema pull code since we never pull schema any more raft topology: drop upgrade_state and 
its type from the topology state machine since it is not used any longer group0: hoist the checks for an illegal upgrade into main.cc api: drop get_topology_upgrade_state and always report upgrade status as done service_level_controller: drop service level upgrade code test: drop run_with_raft_recovery parameter to cql_test_env group0: get rid of group0_upgrade_state storage_service: drop topology_change_kind as it is no longer needed storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more service_storage: remove unused functions storage_service: remove non raft rebuild code storage_service: set topology change kind only once group0: drop in_recovery function and its uses group0: rename use_raft to maintenance_mode and make it sync	2026-03-11 10:24:20 +02:00
Calle Wilund	6d8ac23731	test_encryption: Use maximum replication in _smoke_test Refs: SCYLLADB-557 We should use full replication in KS/CF creation and population, for at least two reasons: 1.) Ensure we wait fully for and write to all nodes 2.) Make test more "real", behaving like a proper cluster Closes scylladb/scylladb#28959	2026-03-11 09:54:57 +02:00
Botond Dénes	99fa912f1b	Merge 'Generalize streaming scopes tests' from Pavel Emelyanov To restore how streaming scopes work there are two tests that greatly duplicate each other -- test_restore_with_streaming_scopes from cluster/object_store suite and test_refresh_with_streaming_scopes from cluster suite. This patch generalizes both into a do_test_streaming_scopes() non-test function Closes scylladb/scylladb#28874 * github.com:scylladb/scylladb: test: Re-sort comments around do_test_streaming_scopes() test: Split do_load_sstables() test: Drop load_fn argument from do_load_sstables() test: Re-use do_test_streaming_scopes() in refresh test test: Introduce SSTablesOnLocalStorage test: Introduce SSTablesOnObjectStorage test: Move test_restore_with_streaming_scopes() into do_test_streaming_scopes()	2026-03-11 09:35:21 +02:00
Botond Dénes	3fed6f9eff	Merge 'service: tasks: scan all tablets in tablet_virtual_task::wait' from Aleksandra Martyniuk Currently, for repair tasks tablet_virtual_task::wait gathers the ids of tablets that are to be repaired. The gathered set is later used to check if the repair is still ongoing. However, if the tablets are resized (split or merged), the gathered set becomes irrelevant. Thus, we may end up with an invalid tablet id error being thrown. Wait until repair is done for all tablets in the table. Fixes: https://github.com/scylladb/scylladb/issues/28202 Backport to 2026.1 needed as it contains the change introducing the issue `d51b1fea94` Closes scylladb/scylladb#28323 * github.com:scylladb/scylladb: service: fix indentation test: add test_tablet_repair_wait service: remove status_helper::tablets service: tasks: scan all tablets in tablet_virtual_task::wait	2026-03-11 09:24:07 +02:00
Dawid Mędrek	167feabe1a	cql3: Reject user-provided timestamps for strongly consistent tables Similarly to LWTs, we reject queries with user-provided timestamps when they target strongly consistent tables. Such statements could force us to rewrite history, and that contradicts the philosophy of linearizability we aim for. Fixes SCYLLADB-879 Closes scylladb/scylladb#28867	2026-03-10 22:11:39 +02:00
Botond Dénes	81e214237f	Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk This pull request adds support for calculating and storing CRC32 digests for all SSTable components. This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream. Several test cases were introduced to verify expected behaviour. Additionally, this PR adds a new rewrite component mechanism for safe sstable component rewriting. Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup. However, with component digests stored in scylla_metadata (#20100), replacing a component like Statistics requires atomically updating both the component and scylla_metadata with the new digest - impossible with POSIX rename. The new mechanism creates a clone sstable with a fresh generation: - Hard-links all components from the source except the component being rewritten and scylla_metadata - Copies original sstable components pointer and recognized components from the source - Invokes a modifier callback to adjust the new sstable before rewriting - Writes the modified component along with updated scylla_metadata containing the new digest - Seals the new sstable with a temporary TOC - Replaces the old sstable atomically, the same way as it is done in compaction This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair). In case of any failure during the whole process, the sstable will be automatically deleted on the node startup due to temporary toc persistence. 
Backport is not required, it is a new feature Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453 Closes scylladb/scylladb#28338 * github.com:scylladb/scylladb: docs: document components_digests subcomponent and trailing digest in Scylla.db sstable_compaction_test: Add tests for perform_component_rewrite sstable_test: add verification testcases of SSTable components digests persistance sstables: store digest of all sstable components in scylla metadata sstables: replace rewrite_statistics with new rewrite component mechanism sstables: add new rewrite component mechanism for safe sstable component rewriting compaction: add compaction_group_view method to specify sstable version sstables: add null_data_sink and serialized_checksum for checksum-only calculation sstables: extract default write open flags into a constant sstables: Add write_simple_with_digest for component checksumming sstables: Extract file writer closing logic into separate methods sstables: Implement CRC32 digest-only writer	2026-03-10 16:02:53 +02:00
Aleksandra Martyniuk	02257d1429	test: add test_tablet_repair_wait Add a test that checks if tablet_virtual_task::wait won't fail if tablets are merged.	2026-03-10 14:42:27 +01:00
Andrei Chekun	c36df5ecf4	test.py: eliminate driver exception There is a race condition in the driver that raises a RuntimeException. This pollutes the output, so this PR is just silencing this exception. Fixes: SCYLLADB-900 Closes scylladb/scylladb#28957	2026-03-10 14:31:36 +02:00
Alex	27051d9a7c	cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically. This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops. Test coverage was extended in test/cluster/test_prepare_race.py: - reproduces the invalidation-during-prepare window with injection, - verifies prepare completes successfully, - then invalidates again and executes the same stale client prepared object, - confirms the driver transparently re-requests/re-prepares and execution succeeds. This change introduces: - no behavior change for normal prepare flow besides stronger lifetime guarantees, - no new protocol semantics, - preserves existing cache invalidation logic, - adds explicit cluster-level regression coverage for both the race and driver reprepare path. - pushes the re-prepare operation towards the driver, the server will return unprepared error for the first time and the driver will have to re-prepare during execution stage	2026-03-10 14:17:57 +02:00
Piotr Dulikowski	37f8cdf485	Merge 'test.py: fix unawaited ScyllaLogFile.grep() coroutines' from Andrei Chekun Fixed several places where ScyllaLogFile.grep() was called without await, resulting in checking coroutine objects for truthiness instead of actual log matches. Fixes: SCYLLADB-903 No backport, framework fix and one test fix. Closes scylladb/scylladb#28909 * github.com:scylladb/scylladb: test.py: fix unawaited ScyllaLogFile.grep() coroutines tests: fix test_group0_recovers_after_partial_command_application	2026-03-10 12:29:23 +01:00
Gleb Natapov	aa9eb0ef8c	test: remove unused code	2026-03-10 10:46:48 +02:00
Gleb Natapov	0b508c5f96	test: remove unused injection points Also remove test_auth_raft_command_split test which is irrelevant since `5ba7d1b116` because it does not use the function that injects max sized command after the commit.	2026-03-10 10:09:39 +02:00
Pavel Emelyanov	61af7c8300	test: Re-sort comments around do_test_streaming_scopes() The description of the refresh test is very elaborate, and it's worth having it as the description of the streaming scopes test itself. Callers of the helper can use shorter descriptions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 10:00:09 +03:00
Pavel Emelyanov	5ce3597c25	test: Split do_load_sstables() This helper does two things -- sorts sstables per server according to scope in use and calls sstables_storage.restore(). The code looks better if the sorting of sstables stays in a helper and the call for .restore() is moved to the caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 10:00:09 +03:00
Pavel Emelyanov	8c1fb2b39a	test: Drop load_fn argument from do_load_sstables() Now all callers provide the sstables_storage argument and the load_fn is effectively unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:59:08 +03:00
Pavel Emelyanov	59051ccc28	test: Re-use do_test_streaming_scopes() in refresh test Now it's possible to replace the whole body of the test_refresh_with_streaming_scopes() test by calling the corresponding helper function from backup/restore test module. This helper does exactly the same, and the SSTablesOnLocalStorage class provides the necessary save/restore implementations. One more thing to mention -- the refreshing test for some reason only wants to run with restored min-tablet-count equal to the original one. The do_test_streaming_scopes() needs to account for that, as it runs the tests for more options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:59:07 +03:00
Pavel Emelyanov	f6f1cb0391	test: Introduce SSTablesOnLocalStorage This class implements some of the sstables manipulations performed by test_refresh_with_streaming_scopes(). It's here to facilitate next patch that will use it to call do_test_streaming_scopes() helper. This patch moves two blocks of code out of the test into this new class. The shutil.rmtree(tmpbackup) is seemingly lost, but it really isn't -- the tmpbackup variable holds a name of a _subdir_ inside servers' workdirs. This path doesn't really exist on disk on its own, so removing it is a no-op. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:58:40 +03:00
Pavel Emelyanov	dae4da1810	test: Introduce SSTablesOnObjectStorage The class in question performs two operations for do_test_streaming_scopes(): saves sstables and restores them. Current caller of the helper is the test_restore_with_streaming_scopes() test that needs to back up sstables on object storage and restore them from there with the restoration API. The SSTablesOnObjectStorage class does exactly that. The change in do_load_sstables() that checks for sstables_storage to be non-None is needed to keep test_refresh_with_streaming_scopes() working -- that test doesn't provide sstables_storage (yet) and the function in question will call the load_fn callback. Next patch will eliminate it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:58:39 +03:00
Pavel Emelyanov	5a033dea47	test: Move test_restore_with_streaming_scopes() into do_test_streaming_scopes() The body of this test is duplicated by test_refresh_with_streaming_scopes() test from other module. Keeping it in a non-test top-level function will help generalizing these two tests. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:57:53 +03:00
Andrei Chekun	8acba40c84	test.py: fix unawaited ScyllaLogFile.grep() coroutines Fixed several places where ScyllaLogFile.grep() was called without await, resulting in checking coroutine objects for truthiness instead of actual log matches. Fixes: SCYLLADB-903	2026-03-09 19:41:07 +01:00
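The bug class fixed in the commit above can be shown with a minimal sketch. `FakeLogFile` is a hypothetical stand-in, not the real `ScyllaLogFile`; the only assumption is that `grep()` is a coroutine returning a list of matching lines.

```python
import asyncio


class FakeLogFile:
    """Hypothetical stand-in for a log file with an async grep()."""

    def __init__(self, lines):
        self.lines = lines

    async def grep(self, needle):
        # Return the lines that contain the needle.
        return [line for line in self.lines if needle in line]


async def check(log):
    # Buggy: without await, grep() returns a coroutine object,
    # and a coroutine object is always truthy.
    unawaited = log.grep("no such message")
    buggy_result = bool(unawaited)
    unawaited.close()  # suppress the "never awaited" warning in this demo

    # Fixed: await yields the real (here empty) list of matches.
    fixed_result = bool(await log.grep("no such message"))
    return buggy_result, fixed_result


buggy, fixed = asyncio.run(check(FakeLogFile(["node started", "raft: leader elected"])))
```

Here the unawaited check reports a match even though no line contains the message, while the awaited check correctly reports none.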
Andrei Chekun	224a11be65	tests: fix test_group0_recovers_after_partial_command_application Because the log grep was not awaited, this issue was masked. Once the await was added, the test started to fail. This PR fixes the test.	2026-03-09 19:41:07 +01:00
Marcin Maliszkiewicz	96a2b0e634	test: add tests for global group0_batch barrier feature Runtime: 16s in dev mode	2026-03-09 15:15:59 +01:00
Botond Dénes	1e41db5948	Merge 'service: tasks: return successful status if a table was dropped' from Aleksandra Martyniuk tablet_virtual_task::wait throws if a table on which a tablet operation was working is dropped. Treat the tablet operation as successful if a table is dropped. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-494 Needs backport to all live releases Closes scylladb/scylladb#28933 * github.com:scylladb/scylladb: test: add test_tablet_repair_wait_with_table_drop service: tasks: return successful status if a table was dropped	2026-03-09 16:04:44 +02:00

1041 Commits