scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	8dc20e6aaf	test: cluster: Add reproducer for missed notification in topology coordinator	2026-01-13 00:40:23 +01:00
Tomasz Grabiec	7a04dd2d22	topology_coordinator: Wake up the state machine after stats refresh Otherwise, coordinator may not react to changing stats after explicit calls to trigger_load_stats_refresh() done on node replace or table creation, if stats take longer to refresh than it takes the coordinator to go idle. The periodic refresh does wake up the topology coordinator, so the issue is not dramatic in production, but it's annoying in tests, which take longer because of that. Fixes #25163	2026-01-13 00:40:23 +01:00
Tomasz Grabiec	d910e6ea63	topology_coordinator: Move tablet_load_stats_refresh_before_rebalancing injection earlier Refreshing stats will signal _topo_sm.event, so do it before waiting for the event, to avoid busy looping in the coordinator. Busy looping produces lots of logs in test cases which enable debug-level logging in the raft logger. Refs #28086	2026-01-13 00:40:16 +01:00
Tomasz Grabiec	e5dee2aab8	topology_coordinator: Fix potential missed notification Checking for work is not atomic, so there is room for a missed notification, especially since notifications are not always triggered from fibers which take the group0 guard. Fix by subscribing to the event before checking for work. Fixes #27958	2026-01-13 00:39:01 +01:00
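The race described in e5dee2aab8 can be illustrated with a small asyncio sketch. This is not ScyllaDB code: `Notifier`, `check_for_work`, `wait_racy` and `wait_fixed` are hypothetical names, and the only assumption carried over from the commit message is that the check for work can yield, so a notification may fire between the check and the wait.

```python
import asyncio


class Notifier:
    """Hypothetical stand-in for the coordinator's event; wakes current subscribers only."""

    def __init__(self):
        self._waiters = []

    def subscribe(self):
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        return fut

    def notify(self):
        waiters, self._waiters = self._waiters, []
        for fut in waiters:
            if not fut.done():
                fut.set_result(None)


async def check_for_work(state):
    snapshot = state["pending"]
    await asyncio.sleep(0)  # the check yields, so it is not atomic
    return snapshot > 0


async def wait_racy(state, notifier, timeout):
    # Check first, subscribe second: a notify() in between is lost.
    if await check_for_work(state):
        return True
    try:
        await asyncio.wait_for(notifier.subscribe(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def wait_fixed(state, notifier, timeout):
    # Subscribe first, then check: a notify() during the check completes `sub`.
    sub = notifier.subscribe()
    if await check_for_work(state):
        sub.cancel()
        return True
    try:
        await asyncio.wait_for(sub, timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo(waiter):
    state = {"pending": 0}
    notifier = Notifier()
    task = asyncio.create_task(waiter(state, notifier, 0.05))
    await asyncio.sleep(0)  # let the waiter reach the yield inside the check
    state["pending"] += 1   # new work arrives while the check is in flight
    notifier.notify()
    return await task


if __name__ == "__main__":
    print("racy waiter woke up:", asyncio.run(demo(wait_racy)))
    print("fixed waiter woke up:", asyncio.run(demo(wait_fixed)))
```

In this sketch the racy waiter only returns because of its timeout, while the fixed waiter sees the notification that arrived during the check.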
Tomasz Grabiec	2b7aa3211d	topology_coordinator: Refresh load stats after table is created or altered We switched to the size-based load balancing, which now has more strict requirements for load stats. We no longer need only per-node stats (capacity), but also per-tablet stats. Bootstrapping a node triggers stats refresh, but allocating tablets on table creation didn't. So after creating a table, load balancer couldn't make progress for up to 60s (stats refresh period). This makes tests take longer, and can even cause failures if tests are using a low-enough timeout. Fixes #27921	2026-01-13 00:38:59 +01:00
Tomasz Grabiec	663831ebd7	tablets: Do a group0 read barrier on tablet load stats refresh Stats refresh will be triggered on topology coordinator by events like allocating new tablets on table creation. For refresh to be effective, all replicas must see the new tablets, otherwise stats will be incomplete.	2026-01-13 00:38:00 +01:00
Tomasz Grabiec	c4c5ed5aba	topology_coordinator: Ensure stats are refreshed in the gossip scheduling group Refresh can be triggered from different places, but it should run in the gossip scheduling group, like group0 operations.	2026-01-13 00:38:00 +01:00
Tomasz Grabiec	5e6935f276	test: Use ManagerClient.{disable,enable}_tablet_balancing()	2026-01-13 00:38:00 +01:00
Tomasz Grabiec	6936704677	test: Add missing calls to disable_tablet_balancing() in tests which use move_tablet() API If a test tries to move a tablet, it assumes the tablets are stable. This fixes flakiness exposed by size-based load-balancing and a later change to refresh stats sooner.	2026-01-13 00:38:00 +01:00
Tomasz Grabiec	c8098e07c9	test: pylib: Introduce ManagerClient.{disable,enable}_tablet_balancing() It's a global operation, so we can use any server. This is not only a convenience: calling it via api.disable_tablet_balancing() misleads people into thinking that it's a per-server operation, which leads to a proliferation of code that needlessly does it on all servers.	2026-01-13 00:38:00 +01:00
Botond Dénes	6bcc18e5c6	Merge 'test.py: integrate python tests to be executed with pytest runner' from Andrei Chekun This will move responsibility for running tests with pytest in the same manner as it was done with boost tests. From this commit, test.py is not responsible anymore for running python tests and relies completely on pytest. This is another step for unification of test execution. Convert skip_mode function to `pytest.mark` to be able to use it to annotate the whole module instead of each test explicitly. NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories test/alternator test/broadcast_tables test/cql test/cqlpy test/rest_api Changes only in framework, so no backport. This PR will increase the number of tests by 30, because of how test.py and pytest discover tests. test.py counts a file as a test, and when skip is used in suite.yaml it excludes the tests from discovery completely. 
pytest, in contrast, counts a test function as a test and uses the skip_mode mark, so it discovers the tests but skips them during execution, hence the difference. test.py output before PR: ```bash > ./test.py --mode=release rest_api/test_compaction_task rest_api/test_task_manager --list --no-gather-metrics ``` test.py output in this PR: ```bash > ./test.py --mode=release test/rest_api/test_compaction_task.py test/rest_api/test_task_manager.py --list rest_api/test_compaction_task.py::test_global_major_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_major_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_reshaping_compaction_task.release.1 rest_api/test_compaction_task.py::test_resharding_compaction_task.release.1 rest_api/test_compaction_task.py::test_regular_compaction_task.release.1 rest_api/test_compaction_task.py::test_compaction_task_abort.release.1 rest_api/test_compaction_task.py::test_major_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_compaction_progress[major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_compaction_task.py::test_compaction_progress[shard_major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_compaction_task.py::test_compaction_progress[table_major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_task_manager.py::test_task_manager_modules.release.1 
rest_api/test_task_manager.py::test_task_manager_tasks.release.1 rest_api/test_task_manager.py::test_task_manager_status_running.release.1 rest_api/test_task_manager.py::test_task_manager_status_done.release.1 rest_api/test_task_manager.py::test_task_manager_status_failed.release.1 rest_api/test_task_manager.py::test_task_manager_not_abortable.release.1 rest_api/test_task_manager.py::test_task_manager_wait.release.1 rest_api/test_task_manager.py::test_task_manager_ttl.release.1 rest_api/test_task_manager.py::test_task_manager_user_ttl.release.1 rest_api/test_task_manager.py::test_task_manager_sequence_number.release.1 rest_api/test_task_manager.py::test_task_manager_recursive_status.release.1 rest_api/test_task_manager.py::test_module_not_exists.release.1 rest_api/test_task_manager.py::test_task_folding.release.1 rest_api/test_task_manager.py::test_abort_on_unregistered_task.release.1 ``` Fixes: https://github.com/scylladb/scylladb/issues/27716 Closes scylladb/scylladb#26395 * github.com:scylladb/scylladb: test.py: fix test_vector_similarity.py docs: add directories excluded from test.py test.py: prevent file descriptors leaking test.py: capture print inside the test test.py: do not print header for collection with test.py test.py: remove not supported functionality test.py: switch of execution of several test directories by test.py runner test.py: integrate python tests to be executed with pytest runner test.py: fix test/vector_search_validator to be able to run with pytest test.py: prepare base class for migration test.py: move environment preparation to one method test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT	2026-01-12 14:17:19 +02:00
Botond Dénes	04b8f72946	Merge 'repair: Implement auto repair for tablet repair' from Asias He repair: Implement auto repair for tablet repair This patch implements the basic auto repair support for tablet repair. It was decided to add no per table configuration for the initial implementation, so two scylla yaml config options are introduced to set the default auto repair configs for all the tablet tables. - auto_repair_enabled_default Set true to enable auto repair for tablet tables by default. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. - auto_repair_threshold_default_in_seconds Set the default time in seconds for the auto repair threshold for tablet tables. If the time since last repair is bigger than the configured time, the tablet is eligible for auto repair. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. The following metrics are added: - auto_repair_needs_repair_nr The number of tablets with auto repair enabled that need repair - auto_repair_enabled_nr The number of tablets with auto repair enabled The metrics are useful to tell if auto repair is falling behind. In the future, more auto repair scheduling will be added, e.g., scheduling based on the repaired and unrepaired sstable set size, tombstone ratio and so on, in addition to the time based scheduling. Fixes SCYLLADB-99 New feature. No backport. Closes scylladb/scylladb#27534 * github.com:scylladb/scylladb: topology_coordinator: Add metrics for tablet repair repair: Implement auto repair for tablet repair	2026-01-12 14:16:01 +02:00
Yaniv Kaul	f1c9eda49e	Potential fix for code scanning alert no. 144: Workflow does not contain permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27809	2026-01-12 12:21:35 +02:00
Marcin Maliszkiewicz	3c9f52e709	Merge 'doc: update the Web Installer instructions' from Anna Stuchlik This PR: - Replaces a fixed version name with the variable for the current version in the instructions for installing a non-default version with Web Installer. This will make using the installer more user-friendly. - Removes the instruction for Open Source from the Web Installer docs. Fixes https://github.com/scylladb/scylladb/issues/28005 Fixes https://github.com/scylladb/scylladb/issues/28079 Closes scylladb/scylladb#28046 * github.com:scylladb/scylladb: doc: remove the instruction for Open Source from the Web Installer docs doc: add the version variable to the Web Installer instructions	2026-01-12 11:10:04 +01:00
Petr Gusev	889d7782ed	treewide: use coroutine::maybe_yield in coroutines It's more efficient since coroutine::maybe_yield returns a lightweight struct (awaitable), not the future. Closes scylladb/scylladb#28101	2026-01-12 10:38:47 +01:00
Marcin Maliszkiewicz	09af3828ab	auth: remove confusing deprecation msg from hash_with_salt Closes scylladb/scylladb#27705	2026-01-12 10:12:54 +01:00
Asias He	7980890029	topology_coordinator: Add metrics for tablet repair - scylla_tablet_ops_failed Number of failed tablet {auto, user} repair - scylla_tablet_ops_succeeded Number of succeeded tablet {auto, user} repair Currently auto_repair and user_repair tablet task are added. We can add more tablet tasks later, e.g., rebuild, migration.	2026-01-12 15:26:05 +08:00
Alex	e430065c92	db: views: serialize create/drop view operations via shard 0 Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_build` A problematic sequence looks like this: * `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created * `on_drop_view()` runs on shard 0 → entry for shard 0 is removed * `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again * `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI. This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating. new process: - view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background. - dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed. - dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That: - writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...). - After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...). - Each shard runs handle_create_view_local(...), which: - waits for pending base writes/streams, flushes the base, - resets the reader to the current token and adds the new view, - handles errors and triggers _build_step to continue processing. Drop view - view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) 
in the background. - dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed. - It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...). - Each shard runs handle_drop_view_local(...), which: - removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps, - ignores missing keyspace cases. - After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which: - removes global build progress, built‑view state, and view build status in system tables, Shutdown - drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted. In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness. Fixes: https://github.com/scylladb/scylladb/issues/27898 Backport: not required as the problem happens on master Closes scylladb/scylladb#27929	2026-01-12 09:23:22 +02:00
Michał Hudobski	92c988514c	vector_search: allow all where clauses in vector search queries To prepare for implementation of filtering we skip validation of where clauses in vector search queries. All queries that would be blocked by the lack of ALLOW FILTERING now will pass through. Fixes: VECTOR-410 Closes scylladb/scylladb#27758	2026-01-11 12:56:44 +02:00
Marcin Maliszkiewicz	03e0dd0841	Merge 'test/alternator: fix most tests to run on DynamoDB' from Nadav Har'El We can run Alternator's tests against DynamoDB with `test/alternator/run --aws`, and our intention is that all except a few specially marked should pass on DynamoDB - indicating that the test itself is correct and checks compatibility with DynamoDB and not with some misunderstood spec. Before this patch series, almost two dozen Alternator's tests failed on DynamoDB. This series fixes most of them. Refs #26079 (it fixes almost all the problems but probably not all of them so let's keep the issue open for a while longer) Closes scylladb/scylladb#27995 * github.com:scylladb/scylladb: test/alternator: fix some expected error messages to fit DynamoDB test/alternator: fix compressed request test on non-us-east1 test/alternator: fix test's expected error message on DynamoDB test/alternator: mark Alternator-only test scylla_only test/alternator: fix test on DynamoDB test/alternator: increase wait_for_gsi() timeout test/alternator: fix test passing a spurious parameter	2026-01-09 18:05:20 +01:00
Botond Dénes	7e1c8776b7	docs: remove sstabledump and sstablemetadata These tools are deprecated and no longer shipped by ScyllaDB packages. They no longer support the latest SSTable versions and ScyllaDB-only features, like encryption and dictionary based compression. Remove them from the documentation. Closes scylladb/scylladb#27608	2026-01-09 17:31:54 +01:00
Dawid Mędrek	2385afa1c7	scripts/pull_github_pr.sh: Update instructions for creating token The interface of Jenkins has changed, and the instructions for creating a token are out-of-date. This commit updates them. Closes scylladb/scylladb#28054	2026-01-09 17:45:00 +02:00
Ferenc Szili	0ede8d154b	docs: add docs for size based load balancing This patch updates the documentation for size based load balancing. Closes scylladb/scylladb#27616	2026-01-09 16:25:25 +02:00
Andrei Chekun	1f60208aa0	test.py: fix test_vector_similarity.py There is a known limitation of xdist: it runs discovery in each thread and then compares the result with the master thread, so the discovered lists of tests must be the same. Sets have no guaranteed order, so they should not be used for parametrized testing, because discovery of the tests using xdist will fail. This PR just converts set to dist, to eliminate the issue mentioned above.	2026-01-09 15:08:40 +01:00
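The ordering problem behind 1f60208aa0 can be sketched as follows. This is an illustration, not the code from the commit: `SIMILARITY_FUNCTIONS` and `stable_params` are hypothetical names, and sorting is just one way to get a deterministic order (the commit itself converts the set to another container). The iteration order of a set of strings can differ between interpreter processes because of hash randomization, so every xdist worker may collect the parameters in a different order.

```python
# Hypothetical parameter values; the real test may use different names.
SIMILARITY_FUNCTIONS = {"cosine", "euclidean", "dot_product"}


def stable_params(values):
    """Return the values in an order that is the same in every process."""
    return sorted(values)


# A set iterates in hash order, which may vary between worker processes.
# A sorted list iterates in the same order everywhere.
PARAMS = stable_params(SIMILARITY_FUNCTIONS)

# With pytest this list would be passed to the parametrize mark, for example:
#   @pytest.mark.parametrize("similarity_function", PARAMS)
#   def test_similarity(similarity_function): ...

if __name__ == "__main__":
    print(PARAMS)
```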
Yaniv Michael Kaul	af8eaa9ea5	scripts: fixes flagged by CodeQL/PyLens Unused imports, unused variables and such. Initially, there were no functional changes, just to get rid of some standard CodeQL warnings. I've then broken the CI, as apparently there's an install-time(!?) Python script creation for the sole purpose of product naming. I changed it - we have it in etcdir, as SCYLLA-PRODUCT-FILE. So added (copied from a different script) a get_product() helper function in scylla_util.py and used it instead. While at it, also fixed the too broad import from scylla_util, which 'forced' me to also fix other specific imports (such as shutil). Improvement - no need to backport. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#27883	2026-01-09 15:13:12 +02:00
Anna Stuchlik	396093ff60	doc: remove the instruction for Open Source from the Web Installer docs Fixes https://github.com/scylladb/scylladb/issues/28079	2026-01-09 14:07:32 +01:00
Botond Dénes	af6cb0d0a4	Merge 'raft topology: preserve IP -> ID mapping of a replacing node on restart' from Patryk Jędrzejczak We currently do it only for a bootstrapping node, which is a bug. The missing IP can cause an internal error, for example, in the following scenario: - replace fails during streaming, - all live nodes are shut down before the rollback of replace completes, - all live nodes are restarted, - live nodes start hitting internal error in all operations that require IP of the replacing node (like client requests or REST API requests coming from nodetool). We fix the bug here, but we do it separately for replace with different IP and replace with the same IP. For replace with different IP, we persist the IP -> host ID mapping in `system.peers` just like for bootstrap. That's necessary, since there is no other way to determine IP of the replacing node on restart. For replace with the same IP, we can't do the same. This would require deleting the row corresponding to the node being replaced from `system.peers`. That's fine in theory, as that node is permanently banned, so its IP shouldn't be needed. Unfortunately, we have many places in the code where we assume that IP of a topology member is always present in the address map or that a topology member is always present in the gossiper endpoint set. Examples of such places: - nodetool operations, - REST API endpoints, - `db::hints::manager::store_hint`, - `group0_voter_handler::update_nodes`. We could fix all those places and verify that drivers work properly when they see a node in the token metadata, but not in `system.peers`. However, that would be too risky to backport. We take a different approach. We recover IP of the replacing node on restart based on the state of the topology state machine and `system.peers` just after loading `system.peers`. We rely on the fact that group 0 is set up at this point. 
The only case where this assumption is incorrect is a restart in the Raft-based recovery procedure. However, hitting this problem then seems improbable, and even if it happens, we can restart the node again after ensuring that no client and REST API requests come before replace is rolled back on the new topology coordinator. Hence, it's not worth to complicate the fix (by e.g. looking at the persistent topology state instead of the in-memory state machine). Fixes #28057 Backport this PR to all branches as it fixes a problematic bug. Closes scylladb/scylladb#27435 * github.com:scylladb/scylladb: gossiper: add_saved_endpoint: make generations of excluded nodes negative test: introduce test_full_shutdown_during_replace utils: error_injection: allow aborting wait_for_message raft topology: preserve IP -> ID mapping of a replacing node on restart	2026-01-09 14:56:16 +02:00
Calle Wilund	a7cdb602e1	db::commitlog: Fix sanity check error on race between segment flushing and oversized alloc Fixes #27992 When doing a commit log oversized allocation, we lock out all other writers by grabbing the _request_controller semaphore fully (max capacity). We thereafter assert that the semaphore is in fact zero. However, due to how things work with the bookkeep here, the semaphore can in fact become negative (some paths will not actually wait for the semaphore, because this could deadlock). Thus, if, after we grab the semaphore and execution actually returns to us (task schedule), new_buffer via segment::allocate is called (due to a non-fully-full segment), we might in fact grab the segment overhead from zero, resulting in a negative semaphore. The same problem applies later when we try to sanity check the return of our permits. Fix is trivial, just accept less-than-zero values, and take same possible ltz-value into account in exit check (returning units) Added whitebox (special callback interface for sync) unit test that provokes/creates the race condition explicitly (and reliably). Closes scylladb/scylladb#27998	2026-01-09 14:06:58 +02:00
Łukasz Paszkowski	7bf26ece4d	test_user_writes_rejection: Fix test flakiness caused by typo and non-local CL=ONE reads The current code: ``` try: cql.execute(f"INSERT INTO {cf} (pk, t) VALUES (-1, 'x')", host=host[0], execution_profile=cl_one_profile).result() except Exception: pass ``` contains a typo: `host=host[0]` which throws an exception because a Host object is not subscriptable. The test does not fail because the except block is too broad and suppresses all exceptions. Fixing the typo alone is insufficient. The write still succeeds because the remaining nodes are UP and the query uses CL=ONE, so no failure should be expected. Another source of flakiness is data verification: ``` SELECT * FROM {cf} WHERE pk = 0; ``` Even when a coordinator is explicitly provided, using CL=ONE does not guarantee a local read. The coordinator may forward the read request to another replica, causing the verification to fail nondeterministically. This patch rewrites the tests to address these issues: - Fix the typo: `host[0]` to `hosts[0]` - Verify data using `MUTATION_FRAGMENTS({cf})` which guarantees a local read on the coordinator node - Reconnect the driver after node restart Fixes https://github.com/scylladb/scylladb/issues/27933 Closes scylladb/scylladb#27934	2026-01-09 13:42:05 +02:00
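The way an overly broad `except` hid the typo in 7bf26ece4d can be reproduced without a cluster. In this sketch `Host`, `WriteRejected` and `execute` are hypothetical stand-ins, not the Python driver's API; only the `host[0]` versus `hosts[0]` mix-up and the `except Exception: pass` pattern come from the commit message.

```python
class Host:
    """Hypothetical stand-in for a driver host object (not subscriptable)."""

    def __init__(self, address):
        self.address = address


class WriteRejected(Exception):
    """Hypothetical error a test would expect from a rejected write."""


def execute(query, host):
    # Pretend the node rejects every write.
    raise WriteRejected(f"{host.address} rejected: {query}")


def write_with_broad_except(host):
    try:
        execute("INSERT ...", host=host[0])  # typo: Host is not subscriptable
    except Exception:
        # Swallows the TypeError from the typo together with real failures.
        return "swallowed"
    return "accepted"


def write_with_narrow_except(hosts):
    try:
        execute("INSERT ...", host=hosts[0])
    except WriteRejected:
        # Only the expected failure is handled; a TypeError would propagate.
        return "rejected"
    return "accepted"


if __name__ == "__main__":
    node = Host("127.0.0.1")
    print(write_with_broad_except(node))     # the typo goes unnoticed
    print(write_with_narrow_except([node]))  # the expected rejection is seen
```

Catching only the expected exception type makes the typo surface as a `TypeError` instead of silently passing.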
Andrei Chekun	82e81a8664	docs: add directories excluded from test.py Add new directories that are excluded from the test.py executor and will be fully managed by pytest	2026-01-09 11:59:25 +01:00
Andrei Chekun	353bae7d66	test.py: prevent file descriptors leaking With the migration to pytest, file descriptors are held open for the whole life of the process. Previously this was not an issue, because test.py executed only one file with Popen, so the descriptors were freed when that process finished. With the new approach they stay blocked. This change eliminates that. Fix the case where getting a cluster failed and we then tried to mark it dirty while it was None. Put the cluster back into the pool only if it was created	2026-01-09 11:59:25 +01:00
Andrei Chekun	67c5267053	test.py: capture print inside the test Capture the printing inside the test case to output it after the test and not directly during the testing process.	2026-01-09 11:59:25 +01:00
Andrei Chekun	594aedd6a5	test.py: do not print header for collection with test.py Skip printing the default pytest headers when printing list of the tests. Before: ``` $ ./test.py --mode=dev test/boost/sstable_conforms_to_mutation_source_test.cc --list Test session starts (platform: linux, Python 3.13.9, pytest 8.3.4, pytest-sugar 1.1.1) rootdir: /home/xtrey/projects/scylladb/test configfile: pytest.ini plugins: xdist-3.8.0, allure-pytest-2.15.0, sugar-1.1.1, anyio-4.8.0, asyncio-0.24.0, timeout-2.3.1 asyncio: mode=Mode.AUTO, default_loop_scope=session timeout: 24000.0s timeout method: signal timeout func_only: False session timeout: 24000.0s boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_tiny.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_tiny.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_tiny.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_reversing_reader_random_schema.dev.1 Results (0.06s): ``` After: ``` $ ./test.py --mode=dev test/boost/sstable_conforms_to_mutation_source_test.cc --list boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_tiny.dev.1 
boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_mc_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_tiny.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_md_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_tiny.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_medium.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_conforms_to_mutation_source_ms_large.dev.1 boost/sstable_conforms_to_mutation_source_test.cc::test_sstable_reversing_reader_random_schema.dev.1 Results (0.06s): ```	2026-01-09 11:59:25 +01:00
Andrei Chekun	21a1ff3d5c	test.py: remove not supported functionality In the current state pytest do not support the order of execution, so this parameter is removed. There is no big need in this due to the differences what pytest and test.py counted test. pytest run test functions in the threads, while test.py executed test files in the threads. That's why pytest's way is more granular and allows to fill threads better. Remove skip node, since it already added as a pytest mark for each test in the file. Remove pool_size, since this is not used by pytest at all. Pytest uses xdist to set the amount of threads instead of pool_size used by test.py	2026-01-09 11:59:25 +01:00
Andrei Chekun	e8c50a5ad4	test.py: switch of execution of several test directories by test.py runner With this commit test.py will lose ability to run tests by itself always bypassing execution to the pytest. NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories test/alternator test/broadcast_tables test/cql test/cqlpy test/rest_api	2026-01-09 11:59:25 +01:00
Andrei Chekun	61d49525ad	test.py: integrate python tests to be executed with pytest runner With this commit test.py will be bypassing the tests execution to the pytest. However, it will still be able to run test by itself. With providing test name like `broadcast_tables/test_broadcast_tables` it will execute test with test.py runner, but if the path to the file will be provided like `test/broadcast_tables/test_broadcast_tables.py` it will bypass execution to the pytest. `--test-py-init` tells to run pytest session in test.py-compatible mode Update the help text for the name parameter for test.py about changes how it works and which directory is served by pytest	2026-01-09 11:59:25 +01:00
Andrei Chekun	808b29885f	test.py: fix test/vector_search_validator to be able to run with pytest The build_mode fixture has a dynamic scope that depends on how pytest is executed. When it is executed through test.py, the scope is session, and since that is broader than package, everything works fine. With pure pytest it fails, because build_mode has module scope. This fix allows running the tests with pure pytest, which is needed to migrate the tests to the pytest runner instead of test.py.	2026-01-09 11:59:25 +01:00
Andrei Chekun	8252de7b55	test.py: prepare base class for migration Since all tests share the same base class and some of the tests executed by test.py and some with pytest, we need to handle two cases where configuration is located: suite.yaml and test_config.yaml After full migration suite.yaml case will be removed	2026-01-09 11:59:25 +01:00
Andrei Chekun	48ff74b6b2	test.py: move environment preparation to one method Since anyway these two methods should be called one by one in two different cases: when test.py executes test and pytest executes test, merging them into one. Additionally, set environment variable to show the underneath pytest process that environment was already prepared and there is no need to clean directories or start additional services.	2026-01-09 11:59:25 +01:00
Andrei Chekun	e074e21490	test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT Introduce a new environment variable that will be used to signal to the pytest runner that the environment was already prepared by test.py. This is needed to be able to run the tests both with pytest and with test.py (which actually runs pytest underneath).	2026-01-09 11:59:25 +01:00
Asias He	7ba7b25bdd	repair: Implement auto repair for tablet repair This patch implements the basic auto repair support for tablet repair. It was decided to add no per table configuration for the initial implementation, so two scylla yaml config options are introduced to set the default auto repair configs for all the tablet tables. - auto_repair_enabled_default Set true to enable auto repair for tablet tables by default. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. - auto_repair_threshold_default_in_seconds Set the default time in seconds for the auto repair threshold for tablet tables. If the time since last repair is bigger than the configured time, the tablet is eligible for auto repair. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. The following metrics are added: - auto_repair_needs_repair_nr The number of tablets with auto repair enabled that need repair - auto_repair_enabled_nr The number of tablets with auto repair enabled The metrics are useful to tell if auto repair is falling behind. In the future, more auto repair scheduling will be added, e.g., scheduling based on the repaired and unrepaired sstable set size, tombstone ratio and so on, in addition to the time based scheduling. Fixes SCYLLADB-99	2026-01-09 16:11:39 +08:00
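The time-based eligibility rule and the two counters from 7ba7b25bdd can be sketched in a few lines. This is only a model of the rule as stated in the commit message, not the ScyllaDB implementation: the `Tablet` class, its fields and both function names are hypothetical, and only the option and metric names are taken from the message.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    """Hypothetical per-tablet view; field names are illustrative only."""
    last_repair_time: float    # seconds since some epoch
    auto_repair_enabled: bool  # models auto_repair_enabled_default


def needs_auto_repair(tablet, now, threshold_seconds):
    """A tablet is eligible when the time since its last repair exceeds the threshold.

    threshold_seconds models auto_repair_threshold_default_in_seconds.
    """
    if not tablet.auto_repair_enabled:
        return False
    return (now - tablet.last_repair_time) > threshold_seconds


def auto_repair_metrics(tablets, now, threshold_seconds):
    """Compute the two counters named in the commit message."""
    enabled = sum(1 for t in tablets if t.auto_repair_enabled)
    needs = sum(1 for t in tablets if needs_auto_repair(t, now, threshold_seconds))
    return {
        "auto_repair_enabled_nr": enabled,
        "auto_repair_needs_repair_nr": needs,
    }


if __name__ == "__main__":
    tablets = [Tablet(0.0, True), Tablet(90.0, True), Tablet(0.0, False)]
    print(auto_repair_metrics(tablets, now=100.0, threshold_seconds=50.0))
```

If `auto_repair_needs_repair_nr` stays close to `auto_repair_enabled_nr` over time, auto repair is likely falling behind, which is the use the commit message gives for these metrics.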
Botond Dénes	60570d7114	Merge 'topology coordinator: restrict node join/remove to preserve RF-rack validity' from Michael Litvak Allow creating materialized views and secondary indexes in a tablets keyspace only if it's RF-rack-valid, and enforce RF-rack-validity while the keyspace has views by restricting some operations: * Altering a keyspace's RF if it would make the keyspace RF-rack-invalid * Adding a node in a new rack * Removing / Decommissioning the last node in a rack Previously the config option `rf_rack_valid_keyspaces` was required for creating views. We now remove this restriction - it's not needed because we always maintain RF-rack-validity for keyspaces with views. The restrictions are relevant only for keyspaces with numerical RF. Keyspaces with rack-list-based RF are always RF-rack-valid. Fixes scylladb/scylladb#23345 Fixes https://github.com/scylladb/scylladb/issues/26820 backport to relevant versions for materialized views with tablets since it depends on rf-rack validity Closes scylladb/scylladb#26354 * github.com:scylladb/scylladb: docs: update RF-rack restrictions cql3: don't apply RF-rack restrictions on vector indexes cql3: add warning when creating mv/index with tablets about rf-rack service/tablet_allocator: always allow tablet merge of tables with views locator: extend rf-rack validation for rack lists test: test rf-rack validity when creating keyspace during node ops locator: fix rf-rack validation during node join/remove test: test topology restrictions for views with tablets test: add test_topology_ops_with_rf_rack_valid topology coordinator: restrict node join/remove to preserve RF-rack validity topology coordinator: add validation to node remove locator: extend rf-rack validation functions view: change validate_view_keyspace to allow MVs if RF=Racks db: enforce rf-rack-validity for keyspaces with views replica/db: add enforce_rf_rack_validity_for_keyspace helper db: remove enforce parameter from check_rf_rack_validity test: adjust test to not break rf-rack validity	2026-01-09 10:01:23 +02:00
Patryk Jędrzejczak	eee2b6c7af	Merge 'tablets: Make balancing disabling RPC preempt tablet transitions' from Tomasz Grabiec Disabling of balancing waits for the topology state machine to become idle, to guarantee that no migrations are happening or will happen after the call returns. But it doesn't interrupt the scheduler, which means the call can take an arbitrary amount of time. It may wait for tablet repair to finish, which can take many hours. We should do it via a topology request, which will interrupt the tablet scheduler. Enabling of balancing can be immediate. Fixes https://github.com/scylladb/scylladb/issues/27647 Fixes #27210 Closes scylladb/scylladb#27736 * https://github.com/scylladb/scylladb: test: Verify that repair doesn't block disabling of tablet load balancing tablets: Make balancing disabling call preempt tablet transitions	2026-01-08 21:55:19 +02:00
Piotr Dulikowski	8e3e39a64a	Merge 'service/storage_service: update service levels cache after upgrade to v2' from Michał Jadwiszczak Service levels cache is empty after upgrade to consistent topology if no mutations are committed to `system.service_levels_v2` or rolling restart is not done. To fix the bug, this patch adds service levels cache reloading after upgrading the SL data accessor to v2 in `storage_service::topology_state_load()`. Fixes [SCYLLADB-90](https://scylladb.atlassian.net/browse/SCYLLADB-90) This fix should be backported to all versions containing service levels on Raft. [SCYLLADB-90]: https://scylladb.atlassian.net/browse/SCYLLADB-90?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27585 * github.com:scylladb/scylladb: service/storage_service: update service levels cache after upgrade to v2 service/storage_service: check if service levels were already upgraded before doing migration to raft	2026-01-08 21:55:19 +02:00
Michał Hudobski	e2e479f20d	auth: fix cdc vector search indexing permission bug VECTOR_SEARCH_INDEXING permission didn't work on cdc tables as we mistakenly checked for vector indexes on the cdc table instead of the base table. This patch fixes that and adds a test that validates this behavior. Fixes: VECTOR-476 Closes scylladb/scylladb#28050	2026-01-08 21:55:19 +02:00
Ernest Zaslavsky	19fe630c0e	Update seastar submodule seastar 4dcd4df..dd46b6fe ``` dd46b6fe net: expose DNS TTL via net::hostent b94f81b0 test: Extend statat() test to check ENOENT exception reporting ``` Closes scylladb/scylladb#28006	2026-01-08 21:55:19 +02:00
Michael Litvak	8f15c7a874	db/view/view_update_generator: move discover_staging_sstables to start Call discover_staging_sstables in view_update_generator::start() instead of in the constructor, because the constructor is called during initialization before sstables are loaded. The initialization order was changed in `5d1f74b86a` and caused this regression. It means the view update generator won't discover staging sstables on startup and view updates won't be generated for them. It also causes issues in sstable cleanup. view_update_generator::start() is called in a later stage of the initialization, after sstable loading, so do the discovery of staging sstables there. Fixes scylladb/scylladb#27956 Closes scylladb/scylladb#27970	2026-01-08 21:55:19 +02:00
Botond Dénes	8c72dcc1ec	Merge 'database: truncate_table_on_all_shards: consider can_flush on all shards' from Benny Halevy Currently, database::truncate_table_on_all_shards calls the table::can_flush only on the coordinator shard and therefore it may miss shards with dirty data if the coordinator shard happens to have empty memtables, leading to clearing the memtables with dirty data rather than flushing them. This change fixes that by making flush safe to be called, even if the memtable list is empty, and calling it on every shard that can flush (i.e. seal_immediate_fn is engaged). Also, change database_test::do_with_some_data to use random keys instead of hard-coded key names, to reproduce this issue with `snapshot_list_contains_dropped_tables`. Fixes #27639 * The issue has always existed and might cause data loss due to wrongly clearing the memtable, so it needs backport to all live versions Closes scylladb/scylladb#27643 * github.com:scylladb/scylladb: test: database_test: do_with_some_data: randomize keys database: truncate_table_on_all_shards: drop outdated TODO comment database: truncate_table_on_all_shards: consider can_flush on all shards memtable_list: unify can_flush and may_flush test: database_test: add test_flush_empty_table_waits_on_outstanding_flush replica: table, storage_group, compaction_group: add needs_flush test: database_test: do_with_some_data_in_thread: accept void callback function	2026-01-08 21:55:19 +02:00
Avi Kivity	633e6e0037	build: update toolchain generation procedure for optimized clang Explain where to pick up existing clang archives, and how to upload new ones. Closes scylladb/scylladb#27690	2026-01-08 21:55:18 +02:00
Evgeniy Naydanov	a9da14be19	test: dtest: reproducer for parallel rebuild failure 2-DC cluster parallel non-RBNO rebuild failure when expanding RF in DC2. Steps to reproduce: 1. Provision a cluster with 2 datacenters and at least 2 nodes in the second datacenter. 2. Let’s assume datacenter names are "dc1" and "dc2". 3. Create a keyspace ("keyspace1") with RF=0 in dc2. 4. Populate some data into dc1. 5. Change keyspace1 replication in dc2 to 2. 6. On 2 nodes in dc2 run the following command in parallel: nodetool rebuild --source-dc dc1 Parallel execution of rebuilds is not possible with RBNO enabled. This test is the repro for #27804 Closes scylladb/scylladb#27747	2026-01-08 21:55:18 +02:00
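
Step 6 of the reproducer in a9da14be19 (running the rebuild on two dc2 nodes at the same time) could be sketched roughly as follows. This is only an illustration, not the dtest itself: the helper names `rebuild_command` and `run_parallel_rebuild`, the host addresses, and the injectable `runner` parameter are assumptions made for the example. Only `nodetool rebuild --source-dc dc1` is taken from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def rebuild_command(host, source_dc="dc1"):
    # Hypothetical helper: builds the argv for one node.
    # "rebuild --source-dc dc1" is the command quoted in the commit message;
    # "-h" tells nodetool which node to contact.
    return ["nodetool", "-h", host, "rebuild", "--source-dc", source_dc]


def run_parallel_rebuild(hosts, source_dc="dc1", runner=subprocess.run):
    """Start one rebuild per host concurrently and return the results in host order."""
    commands = [rebuild_command(host, source_dc) for host in hosts]
    # One worker per command, so all rebuilds are submitted without waiting
    # for an earlier one to finish.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Hypothetical dc2 node addresses; replace with real ones.
    for result in run_parallel_rebuild(["10.0.2.1", "10.0.2.2"]):
        print(result.returncode)
```

Passing a fake `runner` makes it possible to check the generated commands without a running cluster.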

51390 Commits