scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Gleb Natapov	b59b3d4f8a	service level: remove version 1 service level code	2026-03-10 10:46:48 +02:00
Gleb Natapov	b633ec1779	features: move GROUP0_SCHEMA_VERSIONING to deprecated features list	2026-03-10 10:46:48 +02:00
Gleb Natapov	40ec0d4942	migration_manager: remove unused forward definitions	2026-03-10 10:46:48 +02:00
Gleb Natapov	aa9eb0ef8c	test: remove unused code	2026-03-10 10:46:48 +02:00
Gleb Natapov	4660f908f9	auth: drop auth_migration_listener since it does nothing now	2026-03-10 10:46:48 +02:00
Gleb Natapov	74b5a8d43d	schema: drop schema_registry_entry::maybe_sync() function Schema is synced through group0 now. Drop all the tests of the function as well.	2026-03-10 10:46:47 +02:00
Gleb Natapov	b9f3281af6	schema: drop make_table_deleting_mutations since it should not be needed with raft Also remove the test since it is no longer relevant	2026-03-10 10:46:47 +02:00
Gleb Natapov	f76199e5c2	schema: remove calculate_schema_digest function It is used by the test only, so remove the test and its data as well.	2026-03-10 10:46:47 +02:00
Gleb Natapov	08e33ad7f7	schema: drop recalculate_schema_version function and its uses There is no need to recalculate schema version any more since it is set by group0.	2026-03-10 10:46:39 +02:00
Gleb Natapov	7bb334a5dd	migration_manager: drop check for group0_schema_versioning feature We do not allow upgrading from a version that does not have it any longer.	2026-03-10 10:39:59 +02:00
Gleb Natapov	4402b030ae	cdc: drop usage of cdc_local table and v1 generation definition	2026-03-10 10:39:59 +02:00
Gleb Natapov	6769615ff1	storage_service: no need to add yourself to the topology during reboot since raft state loading already did it	2026-03-10 10:39:59 +02:00
Gleb Natapov	33fbda9f3b	storage_service: remove unused functions	2026-03-10 10:39:58 +02:00
Gleb Natapov	0e3e7be335	group0: drop with_raft() function from group0_guard since it always returns true now Also drop the code that assumed that the function can return false.	2026-03-10 10:39:58 +02:00
Gleb Natapov	4e56ca3c76	gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more They were used by legacy topology and cdc code only.	2026-03-10 10:39:58 +02:00
Gleb Natapov	77f8f952b2	gossiper: drop tokens from loaded_endpoint_state	2026-03-10 10:39:58 +02:00
Gleb Natapov	706754dc24	gossiper: remove unused functions	2026-03-10 10:39:58 +02:00
Gleb Natapov	8ee4cdd4b7	storage_service: do not pass loaded_peer_features to join_topology() They are not used there any longer.	2026-03-10 10:39:58 +02:00
Gleb Natapov	24c01f2289	storage_service: remove unused fields from replacement_info	2026-03-10 10:39:58 +02:00
Gleb Natapov	2d8722d204	gossiper: drop is_safe_for_restart() function and its use The function checks that the node's state is not left or removed in gossiper during restart, but with raft topology a removed node will not be able to contact the cluster to get this information since it will be banned.	2026-03-10 10:39:58 +02:00
Gleb Natapov	6f739a8ee4	storage_service: remove unused variables from join_topology	2026-03-10 10:39:58 +02:00
Gleb Natapov	d35b83bec8	gossiper: remove the code that was only used in gossiper topology The topology state machine is always present now and can be passed to the gossiper during creation.	2026-03-10 10:39:58 +02:00
Gleb Natapov	390eb46c1a	storage_service: drop the check for raft mode from recovery code In non-raft mode the node will not boot at all, so the check is redundant now.	2026-03-10 10:39:58 +02:00
Gleb Natapov	6a7e850161	cdc: remove legacy code The patch removes test/boost/cdc_generation_test.cc since it unit tests the cdc::limit_number_of_streams_if_needed function, which is removed here.	2026-03-10 10:38:57 +02:00
Gleb Natapov	0b508c5f96	test: remove unused injection points Also remove the test_auth_raft_command_split test, which has been irrelevant since `5ba7d1b116` because, after that commit, it no longer uses the function that injects a max-sized command.	2026-03-10 10:09:39 +02:00
Gleb Natapov	1d188f0394	auth: remove legacy auth mode and upgrade code A system needs to be upgraded to use v2 auth before moving to this ScyllaDB version; otherwise, the boot will fail.	2026-03-10 10:09:39 +02:00
Gleb Natapov	02fc4ad0a9	treewide: remove schema pull code since we never pull schema any more Schema pull was used by legacy schema code, which has not been supported for a long time, and during legacy recovery, which is no longer supported either. It can be dropped now.	2026-03-10 10:09:39 +02:00
Gleb Natapov	0cf726c81f	raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer	2026-03-10 10:09:39 +02:00
Gleb Natapov	60a861c518	group0: hoist the checks for an illegal upgrade into main.cc The checks are currently spread around, but having them in one place and doing them as early as possible simplifies the logic.	2026-03-10 10:09:39 +02:00
Gleb Natapov	1ff98c89e3	api: drop get_topology_upgrade_state and always report upgrade status as done A non-upgraded version will no longer boot.	2026-03-10 10:09:38 +02:00
Gleb Natapov	be153a4eb7	service_level_controller: drop service level upgrade code We do not allow upgrade from a version that is not updated yet, so the code is not used any longer.	2026-03-10 10:09:38 +02:00
Gleb Natapov	61cc091364	test: drop run_with_raft_recovery parameter to cql_test_env It is unused.	2026-03-10 10:09:38 +02:00
Gleb Natapov	00083b42a7	group0: get rid of group0_upgrade_state Simplify code by getting rid of group0_upgrade_state: upgrade is no longer supported, so there is no need to track its state. A non-upgraded node will simply not boot, and to detect that, the patch checks the state directly from the system table.	2026-03-10 10:09:38 +02:00
Gleb Natapov	d4b55de214	storage_service: drop topology_change_kind as it is no longer needed The mode is always raft, so no need to keep a variable that tracks that.	2026-03-10 10:09:38 +02:00
Gleb Natapov	68ea6aa0a6	storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more	2026-03-10 10:09:38 +02:00
Gleb Natapov	06652948f3	service_storage: remove unused functions raft_topology_change_enabled and upgrade_state_to_topology_op_kind are not used any more. Remove the code.	2026-03-10 10:09:38 +02:00
Gleb Natapov	e8c72b7ba0	storage_service: remove non raft rebuild code Only raft is supported now.	2026-03-10 10:09:38 +02:00
Gleb Natapov	49ebab971d	storage_service: set topology change kind only once The only supported mode is topology_change_kind::raft, so always set it in storage_service::join_cluster during join or regular boot. Drop the check for legacy mode from raft_group0::setup_group0_if_exist since the mode will not be set at this point any longer. The wrong upgrade will still be detected in storage_service::join_cluster where topology.upgrade_state is checked directly.	2026-03-10 10:09:38 +02:00
Gleb Natapov	4e072977d4	group0: drop in_recovery function and its uses Legacy recovery procedure is no longer supported and the code can be dropped.	2026-03-10 10:09:38 +02:00
Gleb Natapov	770762edd8	group0: rename use_raft to maintenance_mode and make it sync group0_upgrade_state::recovery is now used only in maintenance mode, so rename the function to reflect that. Also, there is no preemption point in the function any more, so it can be a regular function rather than a coroutine.	2026-03-10 10:09:33 +02:00
Patryk Jędrzejczak	46b7170347	Merge 'test/pylib: centralize timeout scaling and propagate build_mode in LWT helpers' from Alex Dathskovsky This series improves timeout handling consistency across the test framework and makes build-mode effects explicit in LWT tests (starting with the LWT test that became flaky). 1. Centralize timeout scaling: introduce a scale_timeout(timeout) fixture in runner.py to provide a single, consistent mechanism for scaling test timeouts based on build mode. Previously, timeout adjustments were made ad hoc across different helpers and tests. Centralizing the logic ensures consistent behavior across the test suite, simplifies maintenance and reasoning about timeout behavior, and reduces duplication and per-test scaling logic. This becomes increasingly important as tests run on heterogeneous hardware configurations, where different build modes (especially debug) can significantly impact execution time. 2. Make scale_timeout explicit in LWT helpers: propagate scale_timeout explicitly through BaseLWTTester and Worker, validating it at construction time instead of relying on implicit pytest fixture injection inside helper classes. Additionally, update wait_for_phase_ops() and wait_for_tablet_count() to use scale_timeout_by_mode() for consistent polling behavior across modes; update all LWT test call sites to pass build_mode explicitly; and increase default timeout values, as the previous defaults were too short and prone to flakiness, particularly under slower configurations such as debug builds. Overall, this series improves determinism, reduces flakiness, and makes the interaction between build mode and test timing explicit and maintainable.
backport: not required, just an enhancement for the test.py infra Closes scylladb/scylladb#28840 * https://github.com/scylladb/scylladb: test/auth_cluster: align service-level timeout expectations with scaled config test/lwt: propagate scale_timeout through LWT helpers; scale resize waits Pass scale_timeout explicitly through BaseLWTTester and Worker, validating it at construction time instead of relying on implicit pytest fixture injection inside helper classes. Update wait_for_phase_ops() and wait_for_tablet_count() to use scale_timeout_by_mode() so polling behavior remains consistent across build modes. Adjust LWT test call sites to pass scale_timeout explicitly. Increase default timeout values, as the previous defaults were too short and prone to flakiness under slower configurations (notably debug/dev builds). test/pylib: introduce scale_timeout fixture helper	2026-03-09 10:28:19 +01:00
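The centralized timeout scaling described in this merge can be sketched roughly as follows. This is a minimal, hypothetical model: `make_scale_timeout`, `LWTTesterSketch`, and the per-mode factors are assumptions made for illustration, not the actual `runner.py` fixture, `scale_timeout_by_mode()`, or `BaseLWTTester` code.

```python
from typing import Callable, Dict

# Hypothetical per-build-mode multipliers; the real factors used by test/pylib are not stated here.
TIMEOUT_FACTORS: Dict[str, float] = {"release": 1.0, "dev": 2.0, "debug": 3.0}


def make_scale_timeout(build_mode: str) -> Callable[[float], float]:
    """Return a scale_timeout(timeout) callable bound to one build mode."""
    if build_mode not in TIMEOUT_FACTORS:
        raise ValueError(f"unknown build mode: {build_mode!r}")
    factor = TIMEOUT_FACTORS[build_mode]

    def scale_timeout(timeout: float) -> float:
        # Every helper scales its base timeout the same way, in one place.
        return timeout * factor

    return scale_timeout


class LWTTesterSketch:
    """Hypothetical stand-in for a helper that receives scale_timeout explicitly."""

    def __init__(self, scale_timeout: Callable[[float], float], base_timeout: float = 60.0):
        # Validate at construction time instead of relying on implicit fixture injection.
        if not callable(scale_timeout):
            raise TypeError("scale_timeout must be callable")
        self.scale_timeout = scale_timeout
        self.timeout = scale_timeout(base_timeout)
```

Under the assumed factors, `LWTTesterSketch(make_scale_timeout("debug"), base_timeout=60).timeout` is `180.0`. Passing the callable in explicitly makes a missing or wrong argument fail immediately at construction rather than deep inside a polling loop.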
Patryk Jędrzejczak	4c8dba15f1	Merge 'strong_consistency/state_machine: ensure and upgrade mutations schema' from Michał Jadwiszczak This patch fixes 2 issues within the strong consistency state machine: - apply might be called before the schema is delivered to the node; - conversely, apply may be called after the schema was changed and purged from the schema registry. The first problem is fixed by doing `group0.read_barrier()` before applying the mutations. The second one is solved by upgrading the mutations using column mappings in case the version of the mutations' schema is older. Fixes SCYLLADB-428 Strong consistency is in the experimental phase, so there is no need to backport. Closes scylladb/scylladb#28546 * https://github.com/scylladb/scylladb: test/cluster/test_strong_consistency: add reproducer for old schema during apply test/cluster/test_strong_consistency: add reproducer for missing schema during apply test/cluster/test_strong_consistency: extract common function raft_group_registry: allow to drop append entries requests for specific raft group strong_consistency/state_machine: find and hold schemas of applying mutations strong_consistency/state_machine: pull necessary dependencies db/schema_tables: add `get_column_mapping_if_exists()`	2026-03-09 09:49:22 +01:00
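The second fix, upgrading mutations written under an older schema version, can be illustrated with a deliberately simplified model. The sketch assumes a row is a flat list of values whose positions are described by the old version's column names; `upgrade_row` is hypothetical, ignores column types entirely, and is not ScyllaDB's `column_mapping` or mutation-upgrade code.

```python
from typing import Any, Dict, Sequence


def upgrade_row(old_columns: Sequence[str], old_values: Sequence[Any],
                new_columns: Sequence[str]) -> Dict[str, Any]:
    """Re-key a row written under an older schema version to the current schema.

    old_columns plays the role of the old version's column mapping (position -> name).
    Columns dropped in the new schema are discarded; columns added since then stay unset.
    """
    if len(old_columns) != len(old_values):
        raise ValueError("column mapping and row width differ")
    by_name = dict(zip(old_columns, old_values))
    return {name: by_name[name] for name in new_columns if name in by_name}
```

For example, a row `[1, "x", "y"]` written with columns `pk, a, b` and read under a schema with columns `pk, b, c` becomes `{"pk": 1, "b": "y"}`: the dropped column `a` disappears and the new column `c` is simply absent.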
Marcin Maliszkiewicz	4150c62f29	Merge 'test_proxy_protocol: fix flaky system.clients visibility checks' from Piotr Smaron `test_proxy_protocol_port_preserved_in_system_clients` failed because it didn't see the just-created connection in system.clients immediately. The last lines of the stacktrace are:
```
# Complete CQL handshake
await do_cql_handshake(reader, writer)
# Now query system.clients using the driver to see our connection
cql = manager.get_cql()
rows = list(cql.execute(
    f"SELECT address, port FROM system.clients WHERE address = '{fake_src_addr}' ALLOW FILTERING"
))
# We should find our connection with the fake source address and port
> assert len(rows) > 0, f"Expected to find connection from {fake_src_addr} in system.clients"
E AssertionError: Expected to find connection from 203.0.113.200 in system.clients
E assert 0 > 0
E + where 0 = len([])
```
Explanation: we first wait for the hand-made connection to be completed; then, via another connection, we query system.clients and don't get the hand-made connection in the result set. The solution is to replace the bare cql.execute() calls with await wait_for_results(), a helper that polls via cql.run_async() until the expected row count is reached (30 s timeout, 100 ms period). Fixes: SCYLLADB-819 The flaky test is present on master and in the previous release, so the fix is backported only there. Closes scylladb/scylladb#28849 * github.com:scylladb/scylladb: test_proxy_protocol: introduce extra logging to aid debugging test_proxy_protocol: fix flaky system.clients visibility checks	2026-03-09 08:37:57 +01:00
Yaron Kaikov	977bdd6260	.github/workflows/trigger-scylla-ci: fix heredoc injection in trigger-scylla-ci workflow Move all ${{ }} expression interpolations into env: blocks so they are passed as environment variables instead of being expanded directly into shell scripts. This prevents an attacker from escaping the heredoc in the Validate Comment Trigger step and executing arbitrary commands on the runner. The Verify Org Membership step is hardened in the same way for defense-in-depth. Refs: GHSA-9pmq-v59g-8fxp Fixes: SCYLLADB-954 Closes scylladb/scylladb#28935	2026-03-08 21:34:51 +02:00
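The core of this fix is to hand untrusted text (such as a PR comment) to the shell as data in an environment variable, never by splicing it into the script source before the shell parses it, which is effectively what `${{ }}` expansion inside a `run:` script does. The Python sketch below shows the same principle with `subprocess`; `echo_untrusted_via_env` is a hypothetical helper, and it assumes a POSIX `sh` is available.

```python
import os
import subprocess


def echo_untrusted_via_env(untrusted: str) -> str:
    # The script text is a fixed constant; the untrusted value only reaches the
    # shell as the *value* of $COMMENT, so quotes, $(...), backticks, or a fake
    # heredoc terminator inside it are never parsed as shell syntax.
    script = 'printf "%s" "$COMMENT"'
    env = {**os.environ, "COMMENT": untrusted}
    result = subprocess.run(["sh", "-c", script], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

By contrast, building the script with an f-string such as `f"printf '%s' '{untrusted}'"` would let a crafted value close the quote and run its own commands, which is the class of bug the workflow change removes.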
Artsiom Mishuta	fda68811e8	test.py: fix strict-config argument. The ini-level strict_config option does not exist as a config key in pytest 8; there it is only a command-line flag (it returns as an ini option in pytest 9). In pytest 8.3.5, the equivalent is the --strict-config CLI flag, not an ini option. Fixes SCYLLADB-955 Closes scylladb/scylladb#28939	2026-03-08 16:09:29 +02:00
Dawid Mędrek	5feed00caa	Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak When the local entry with `read_idx` belongs to the current term, it's safe to update the local `commit_idx` to `read_idx`. The motivation for this change is to speed up read barriers. `wait_for_apply` executed at the end of `read_barrier` is delayed until the follower learns that the entry with `read_idx` is committed. It usually happens quickly in the `read_quorum` message. However, non-voters don't receive this message, so they have to wait for `append_entries`. If no new entries are being added, `append_entries` can come only from `fsm::tick_leader()`. For group0, this happens once every 100ms. The issue above significantly slows down cluster setups in tests. Nodes join group0 as non-voters, and then they are met with several read barriers just after a write to group0. One example is `global_token_metadata_barrier` in `write_both_read_new` performed just after `update_topology_state` in `write_both_read_old`. I tested the performance impact of this change with the following test:
```python
for _ in range(10):
    await manager.servers_add(3)
```
It consistently takes 44-45s with the change and 50-51s without the change in dev mode. No backport: - non-critical performance improvement mostly relevant in tests, - the change requires some soak time in master. Closes scylladb/scylladb#28891 * github.com:scylladb/scylladb: raft: server: fix the repeating typo raft: clarify the comment about read_barrier_reply raft: read_barrier: update local commit_idx to read_idx when it's safe raft: log: clarify the specification of term_for	2026-03-06 18:50:08 +01:00
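The safety condition in this change can be modeled in a few lines. This is a simplified sketch, not ScyllaDB's `raft::fsm` code: `FollowerLogSketch`, its fields, and `on_read_barrier_reply` are hypothetical. The reasoning it encodes is standard Raft: only the leader of the current term creates entries in that term, so if the local entry at `read_idx` carries the current term, it is the leader's own entry, and by the Log Matching property the whole local prefix up to `read_idx` matches the leader's committed prefix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FollowerLogSketch:
    current_term: int
    entry_terms: List[int]  # entry_terms[i] is the term of the entry at index i + 1
    commit_idx: int = 0

    def term_for(self, idx: int) -> Optional[int]:
        """Term of the local entry at idx, or None if it is not in the local log."""
        if 1 <= idx <= len(self.entry_terms):
            return self.entry_terms[idx - 1]
        return None

    def on_read_barrier_reply(self, read_idx: int) -> int:
        # Advance commit_idx to read_idx only when the local entry at read_idx
        # belongs to the current term; otherwise keep waiting for append_entries.
        if read_idx > self.commit_idx and self.term_for(read_idx) == self.current_term:
            self.commit_idx = read_idx
        return self.commit_idx
```

A non-voter whose log ends with current-term entries can therefore advance `commit_idx` as soon as the read barrier reply arrives, instead of waiting up to one `tick_leader()` period for the next `append_entries`.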
Piotr Smaron	f12e4ea42b	test_proxy_protocol: introduce extra logging to aid debugging In case of an error, we want to see the contents of the system.clients table to have a better understanding of what happened - whether the row(s) are really missing or maybe they are there, but 1 digit doesn't match or the row is half-written. We'll therefore query for the whole table on the CQL side, and then filter out the rows we want to later proceed with on the python side. This way we can dump the contents of the whole system.clients table if something goes south.	2026-03-06 14:50:12 +01:00
Piotr Smaron	d8cf2c5f23	test_proxy_protocol: fix flaky system.clients visibility checks `test_proxy_protocol_port_preserved_in_system_clients` failed because it didn't see the just-created connection in system.clients immediately. The last lines of the stacktrace are:
```
# Complete CQL handshake
await do_cql_handshake(reader, writer)
# Now query system.clients using the driver to see our connection
cql = manager.get_cql()
rows = list(cql.execute(
    f"SELECT address, port FROM system.clients WHERE address = '{fake_src_addr}' ALLOW FILTERING"
))
# We should find our connection with the fake source address and port
> assert len(rows) > 0, f"Expected to find connection from {fake_src_addr} in system.clients"
E AssertionError: Expected to find connection from 203.0.113.200 in system.clients
E assert 0 > 0
E + where 0 = len([])
```
Explanation: we first wait for the hand-made connection to be completed; then, via another connection, we query system.clients and don't get the hand-made connection in the result set. The solution is to replace the bare cql.execute() calls with await wait_for_results(), a helper that polls via cql.run_async() until the expected row count is reached (30 s timeout, 100 ms period). Fixes: SCYLLADB-819	2026-03-06 14:49:59 +01:00
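The polling approach behind `wait_for_results()` can be sketched as below. `poll_until_rows` is a hypothetical stand-in, not the actual test helper: it accepts any async `fetch` callable (in the real test this would wrap `cql.run_async()`) and uses the 30 s timeout and 100 ms period quoted in the commit message as defaults.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, List


async def poll_until_rows(fetch: Callable[[], Awaitable[List[Any]]], expected_count: int,
                          timeout: float = 30.0, period: float = 0.1) -> List[Any]:
    """Re-run fetch() until it returns at least expected_count rows or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        rows = await fetch()
        if len(rows) >= expected_count:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"expected {expected_count} row(s), last saw {len(rows)} after {timeout}s")
        # Yield to the event loop instead of busy-waiting between attempts.
        await asyncio.sleep(period)
```

Because the query is retried rather than asserted once, a row that becomes visible in system.clients a few milliseconds after the handshake no longer fails the test, while a row that never appears still fails with a clear timeout.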
Botond Dénes	4fdc0a5316	Merge 'Relax test's check_mutation_replicas() argument list' from Pavel Emelyanov The helper accepts a long list of arguments, some of which are not really needed. Also, some callers can stop explicitly passing default values for arguments that already have them. Improving tests, not backporting Closes scylladb/scylladb#28861 * github.com:scylladb/scylladb: test: Remove passing default "expected_replicas" to check_mutation_replicas() test: Remove scope and primary-replica-only arguments from check_mutation_replicas() helper	2026-03-06 11:25:00 +02:00
Szymon Malewski	d817e56e87	vector_similarity_fcts.cc: fix strict aliasing violation in extract_float_vector The previous code performed endian conversion by bulk-copying raw bytes into a std::vector<float> and then iterating over it through a uint32_t pointer obtained with reinterpret_cast. Accessing float storage through a uint32_t lvalue violates C++ strict aliasing rules; this is undefined behavior and gives the compiler freedom to reorder or elide the stores. Replace the two-pass approach with a single-pass loop using seastar::consume_be<uint32_t>() and std::bit_cast<float>(), which is both well-defined and auto-vectorizable. Follow-up #28754 Closes scylladb/scylladb#28912	2026-03-06 09:15:45 +01:00
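Python has no strict-aliasing rule, but the shape of the single-pass decode can still be mirrored: read each 4-byte element as a big-endian 32-bit unsigned integer (the role of `seastar::consume_be<uint32_t>()`), then reinterpret those bits as an IEEE 754 single-precision float (the role of `std::bit_cast<float>()`). `decode_be_float_vector` is a hypothetical illustration, not the code in `vector_similarity_fcts.cc`.

```python
import struct
from typing import List


def decode_be_float_vector(buf: bytes, dimension: int) -> List[float]:
    """Decode `dimension` big-endian float32 values from buf in a single pass."""
    if len(buf) != 4 * dimension:
        raise ValueError(f"expected {4 * dimension} bytes, got {len(buf)}")
    out: List[float] = []
    for i in range(dimension):
        # Step 1: read the element as a big-endian uint32 (like consume_be<uint32_t>).
        bits = int.from_bytes(buf[4 * i:4 * i + 4], "big")
        # Step 2: reinterpret those 32 bits as a float (like std::bit_cast<float>):
        # pack and unpack with the same byte order, so only the type changes.
        (value,) = struct.unpack("=f", struct.pack("=I", bits))
        out.append(value)
    return out
```

In everyday Python, `struct.unpack(f">{dimension}f", buf)` does the same in one call; the two-step form is kept here only to mirror the structure of the C++ fix.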


52412 Commits