scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Emil Maskovsky	5dac4b38fb	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 Backport: The test is primarily for an issue found in 2025.1, so it needs to be backported to all the 2025.x branches. Closes scylladb/scylladb#25685	2025-09-01 13:59:47 +02:00
Petr Gusev	2e757d6de4	cas: pass timeout_if_partially_accepted := write to accept_proposal() Write requests cannot be safely retried if some replicas respond with accepts and others with rejects. In this case, the coordinator is uncertain about the outcome of the LWT: a subsequent LWT may either complete the Paxos round (if a quorum observed the accept) or overwrite it (if a quorum did not). If the original LWT was actually completed by later rounds and the coordinator retried it, the write could be applied twice, potentially overwriting effects of other LWTs that slipped in between. Read requests do not have this problem, so they can be safely retried. Before this commit, handler->accept_proposal was called with timeout_if_partially_accepted := true. This caused both read and write requests to throw an "uncertainty" timeout to the user in the case of the contention described above. After this commit, we throw an "uncertainty" timeout only for write requests, while read requests are instead retried in the loop in sp::cas. Closes scylladb/scylladb#25602	2025-09-01 14:31:04 +03:00
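The retry rule described in this commit can be sketched roughly as follows. This is an illustrative Python model, not ScyllaDB code: `AcceptResult`, `UncertainTimeout`, and `handle_accept` are hypothetical names, and treating a fully rejected proposal as retryable is an assumption of the sketch. The real logic lives in the C++ loop in `sp::cas`.

```python
from enum import Enum, auto


class AcceptResult(Enum):
    ACCEPTED = auto()            # a quorum accepted the proposal
    REJECTED = auto()            # no replica accepted it
    PARTIALLY_ACCEPTED = auto()  # some accepts, some rejects: outcome unknown


class UncertainTimeout(Exception):
    """The coordinator cannot tell whether the write was applied."""


def handle_accept(result: AcceptResult, is_write: bool) -> str:
    """Return 'done' or 'retry', or raise for an uncertain write (hypothetical helper)."""
    if result is AcceptResult.ACCEPTED:
        return "done"
    if result is AcceptResult.PARTIALLY_ACCEPTED and is_write:
        # Models timeout_if_partially_accepted being true only for writes:
        # retrying could apply the write twice, so surface the uncertainty.
        raise UncertainTimeout("write outcome unknown, not retrying")
    # Reads have no such hazard, so they go around the loop again.
    return "retry"
```

Under these assumptions a read that hits contention is retried transparently, while a write in the same situation reports the uncertainty to the caller.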
Dawid Mędrek	fc50e9d0a4	test/perf: Require smp=1 in perf_cache_eviction Trying to run the test with more than one shard results in a failure when generating sharding metadata: ``` ERROR 2025-08-27 16:00:17,551 [shard 0:main] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /tmp/scylla-c9fa42fe/ks/cf-2938a030834e11f0a561ffa33feb022d/me-3gt6_12wh_1gifk2ijgeu1ovc1m5-big-Data.db). Aborting ``` Let's require that the test be run with a single shard. Closes scylladb/scylladb#25703	2025-09-01 08:59:35 +03:00
Andrei Chekun	e55c8a9936	test.py: modify run to use different junit output filenames Currently, run executes pytest twice without changing the path of the JUnit XML report, so the second execution overwrites the report written by the first. This PR fixes the issue so that both reports are stored. Closes scylladb/scylladb#25726	2025-09-01 08:56:48 +03:00
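One way to keep both reports is to derive the report path from a per-run label. The sketch below is only an illustration: `junit_path`, `pytest_args`, and the file naming scheme are hypothetical and not taken from test.py; the only real interface used is pytest's `--junitxml` option.

```python
from pathlib import Path


def junit_path(report_dir: Path, run_label: str) -> Path:
    """Build a report path that is unique per run label (hypothetical naming scheme)."""
    return report_dir / f"pytest_{run_label}.xml"


def pytest_args(report_dir: Path, run_label: str, extra: list[str]) -> list[str]:
    """Assemble pytest arguments with a run-specific JUnit XML path."""
    return [*extra, f"--junitxml={junit_path(report_dir, run_label)}"]


first = pytest_args(Path("reports"), "first", ["-q"])
second = pytest_args(Path("reports"), "second", ["-q"])
```

Because the two argument lists name different files, the second pytest invocation no longer overwrites the first report.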
Avi Kivity	dfc7957a73	Merge 'test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints' from Benny Halevy Following up on `6129411a5e`, improve test_vnode_keyspace_describe_ring by verifying that the endpoints listed by describe_ring match those returned by the `natural_endpoints` api (for random tokens). The latter are calculated using an independent code path directly from the effective_replication_map. * test exists currently only on master, no backport required Closes scylladb/scylladb#25610 * github.com:scylladb/scylladb: test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints test/pylib/rest_client: add natural_endpoints function	2025-08-31 20:36:15 +03:00
Avi Kivity	bae66cc0d8	Merge 'types: add byte-comparable format support for collections' from Lakshmi Narayanan Sreethar This PR builds on the byte comparable support introduced in #23541 to add byte comparable support for all the collection types. This implementation adheres to the byte-comparable format specification in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md Refs https://github.com/scylladb/scylladb/issues/19407 New feature - backport not required. Closes scylladb/scylladb#25603 * github.com:scylladb/scylladb: types/comparable_bytes: add compatibility testcases for collection types types/comparable_bytes: update compatibility testcase to support collection types types/comparable_bytes: support empty type types/comparable_bytes: support reversed types types/comparable_bytes: support vector cql3 type types/comparable_bytes: support tuple and UDT cql3 type types/comparable_bytes: support map cql3 type types/comparable_bytes: support set and list cql3 types types/comparable_bytes: introduce encode/decode_component types/comparable_bytes: introduce to_comparable_bytes/from_comparable_bytes	2025-08-31 15:53:27 +03:00
Nadav Har'El	ff91027eac	utils, alternator: fix detection of invalid base-64 This patch fixes an error-path bug in the base-64 decoding code in utils/base64.cc, which among other things is used in Alternator to decode blobs in JSON requests. The base-64 decoding code has a lookup table, which was wrongly sized 255 bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF) was included in an invalid base-64 string, instead of detecting that this is an invalid byte (since the only valid bytes in a base-64 string are A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a nonsense 6-bit part, or even crash on an out-of-bounds read. Besides the trivial fix, this patch also includes a reproducing test, which tries to write a blob as a supposedly base-64 encoded string with a 0xFF byte in it. The test fails before this patch (the write succeeds, unexpectedly), and passes after this patch (the write fails as expected). The test also passes on DynamoDB. Fixes #25701 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25705	2025-08-31 15:38:01 +03:00
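The off-by-one described in this commit can be modelled with a small lookup table indexed by byte value. This is a Python sketch, not the code in utils/base64.cc: `build_table` and `is_valid_base64_byte` are hypothetical helpers. Python raises `IndexError` where the C++ code would read out of bounds.

```python
import string

INVALID = -1
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def build_table(size: int) -> list[int]:
    """Map each byte value to its 6-bit base-64 value, or INVALID."""
    table = [INVALID] * size
    for value, ch in enumerate(ALPHABET):
        table[ord(ch)] = value
    return table


def is_valid_base64_byte(table: list[int], byte: int) -> bool:
    """A byte is valid if it is the padding character or has a table entry."""
    return byte == ord("=") or table[byte] != INVALID


correct_table = build_table(256)   # one slot for every possible byte value
too_short_table = build_table(255)  # has no slot for byte 0xFF
```

With the 256-entry table, `is_valid_base64_byte(correct_table, 0xFF)` is simply `False`; with the 255-entry table the same lookup indexes past the end.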
Piotr Wieczorek	5add43e15c	alternator: streams: Address minor incompatibilities with DynamoDB in GetRecords response. This commit adds missing fields to GetRecords responses: `awsRegion` and `eventVersion`. We also considered changing `eventSource` from `scylladb:alternator` to `aws:dynamodb` and setting `SizeBytes` subfield inside the `dynamodb` field. We set `awsRegion` to the datacenter's name of the node that received the request. This is in line with the AWS documentation, except that Scylla has no direct equivalent of a region, so we use the datacenter's name, which is analogous to DynamoDB's concept of region. The field `eventVersion` determines the structure of a Record. It is updated whenever the structure changes. We think that adding a field `userIdentity` bumped the version from `1.0` to `1.1`. Currently, Scylla doesn't support this field (#11523), hence we use the older 1.0 version. We have decided to leave `eventSource` as is, since it's easy to modify it in case of problems to `aws:dynamodb` used by DynamoDB. Not setting `SizeBytes` subfield inside the `dynamodb` field was dictated by the lack of apparent use cases. The documentation is unclear about how `SizeBytes` is calculated and after experimenting a little bit, I haven't found an obvious pattern. Fixes: #6931 Closes scylladb/scylladb#24903	2025-08-31 14:55:47 +03:00
Avi Kivity	bc5773f777	Merge 'Add out of space prevention mechanisms' from Łukasz Paszkowski When scaling out is delayed or fails, it is crucial to ensure that clusters remain operational and recoverable even under extreme conditions. To achieve this, the following proactive measures are implemented: - reject writes - includes: inserts, updates, deletes, counter updates, hints, read+repair and lwt writes - applicable to: user tables, views, CDC log, audit, cql tracing - stop running compactions/repairs and prevent new ones from starting - reject incoming tablet migrations The aforementioned mechanisms are automatically enabled when a node's disk utilization reaches the critical level (default: 98%) and disabled when the utilization drops below the threshold. Apart from that, the series adds tests that require mounted volumes to simulate out of space. The paths to the volumes can be provided using a pytest argument, i.e. `--space-limited-dirs`. When not provided, tests are skipped. Test scenarios: 1. Start a cluster and write data until one of the nodes reaches 90% of the disk utilization 2. Perform an operation that would take the nodes over 100% 3. The nodes should not exceed the critical disk utilization (98% by default) 4. Scale out the cluster by adding one node per rack 5. Retry or wait for the operation from step 2 The operation is: writing data, running compactions, building materialized views, running repair, migrating tablets (caused by RF change, decommission). The test is successful if no nodes run out of space, the operation from step 2 is aborted/paused/timed out and the operation from step 5 is successful. 
`perf-simple-query --smp 1 -m 1G` results obtained for fixed 400MHz frequency: Read path (before) ``` instructions_per_op: mean= 39661.51 standard-deviation=34.53 median= 39655.39 median-absolute-deviation=23.33 maximum=39708.71 minimum=39622.61 ``` Read path (after) ``` instructions_per_op: mean= 39691.68 standard-deviation=34.54 median= 39683.14 median-absolute-deviation=11.94 maximum=39749.32 minimum=39656.63 ``` Write path (before): ``` instructions_per_op: mean= 50942.86 standard-deviation=97.69 median= 50974.11 median-absolute-deviation=34.25 maximum=51019.23 minimum=50771.60 ``` Write path (after): ``` instructions_per_op: mean= 51000.15 standard-deviation=115.04 median= 51043.93 median-absolute-deviation=52.19 maximum=51065.81 minimum=50795.00 ``` Fixes: https://github.com/scylladb/scylladb/issues/14067 Refs: https://github.com/scylladb/scylladb/issues/2871 No backport, as it is a new feature. Closes scylladb/scylladb#23917 * github.com:scylladb/scylladb: tests/cluster: Add new storage tests test/scylla_cluster: Override workdir when passed via cmdline streaming: Reject incoming migrations storage_service: extend locator::load_stats to collect per-node critical disk utilization flag repair_service: Add a facility to disable the service compaction_manager: Subscribe to out of space controller compaction_manager: Replace enabled/disabled states with running state database: Add critical_disk_utilization mode database can be moved to disk_space_monitor: add subscription API for threshold-based disk space monitoring docs: Add feature documentation config: Add critical_disk_utilization_level option replica/exceptions: Add a new custom replica exception	2025-08-30 18:47:57 +03:00
Calle Wilund	cc9eb321a1	commitlog: Ensure segment deletion is re-entrant Fixes #25709 If we have large allocations, spanning more than one segment, and the internal segment references from lead to secondary are the only thing keeping a segment alive, the implicit drop in discard_unused_segments and orphan_all can cause a recursive call to discard_unused_segments, which in turn can lead to vector corruption/crash, or even double free of segment (iterator confusion). Need to separate the modification of the vector (_segments) from actual releasing of objects. Using temporaries is the easiest solution. To further reduce recursion, we can also do an early clear of segment dependencies in callbacks from segment release (cf release). Closes scylladb/scylladb#25719	2025-08-30 08:24:57 +02:00
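The fix pattern named here, separating the container update from the release of objects by using temporaries, can be illustrated with a small model. This is a Python sketch and not the commitlog code: `Segment`, `SegmentManager`, and the `child` link are hypothetical stand-ins for segments and their lead-to-secondary references.

```python
class Segment:
    def __init__(self, name, manager, in_use=False):
        self.name = name
        self.manager = manager
        self.in_use = in_use
        self.child = None  # secondary segment kept alive only by this one

    def release(self):
        # Dropping the lead segment drops its reference to the secondary
        # segment, which re-enters the manager (the recursive call).
        if self.child is not None:
            self.child.in_use = False
            self.child = None
            self.manager.discard_unused_segments()


class SegmentManager:
    def __init__(self):
        self.segments = []
        self.released = []

    def discard_unused_segments(self):
        # Step 1: update the container, moving victims into a temporary.
        doomed = [s for s in self.segments if not s.in_use]
        self.segments = [s for s in self.segments if s.in_use]
        # Step 2: release objects only once the container is consistent, so a
        # recursive call sees a valid list and never revisits a segment.
        for seg in doomed:
            self.released.append(seg.name)
            seg.release()


mgr = SegmentManager()
lead = Segment("lead", mgr)
secondary = Segment("secondary", mgr, in_use=True)
lead.child = secondary
mgr.segments = [lead, secondary]
mgr.discard_unused_segments()
```

In this model the nested call triggered by releasing `lead` removes `secondary` exactly once, and the outer loop iterates over its own temporary list rather than the container being modified.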
Piotr Dulikowski	7ccb50514d	Merge 'Introduce view building coordinator' from Michał Jadwiszczak This patch introduces `view_building_coordinator`, a single entity within the whole cluster responsible for building tablet-based views. The view building coordinator takes a slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per tablet replica of the base table. The coordinator builds one base table at a time and it can choose another when all views of the base table currently being processed are built. The tasks are started by setting the `STARTED` state and they are executed by the node-local view building worker. The tasks are scheduled so that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views, but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends the `work_on_view_building_tasks` RPC to start the tasks and receive their results. This RPC is resilient to RPC failure or raft leader change, meaning that if one RPC call started a batch of tasks but then failed (for instance, the raft leader was changed and the caller aborted waiting for the response), the next RPC call will attach itself to the already started batch. The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts the necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation. If the operation fails at any moment, aborted tasks are rolled back. The view building coordinator can also handle staging sstables using process_staging view building tasks. 
We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149). For detailed description check: `docs/dev/view-building-coordinator.md` Fixes https://github.com/scylladb/scylladb/issues/22288 Fixes https://github.com/scylladb/scylladb/issues/19149 Fixes https://github.com/scylladb/scylladb/issues/21564 Fixes https://github.com/scylladb/scylladb/issues/17603 Fixes https://github.com/scylladb/scylladb/issues/22586 Fixes https://github.com/scylladb/scylladb/issues/18826 Fixes https://github.com/scylladb/scylladb/issues/23930 --- This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942 Closes scylladb/scylladb#23760 * github.com:scylladb/scylladb: test/cluster: add view build status tests test/cluster: add view building coordinator tests utils/error_injection: allow to abort `injection_handler::wait_for_message()` test: adjust existing tests utils/error_injection: add injection with `sleep_abortable()` db/view/view_builder: ignore `no_such_keyspace` exception docs/dev: add view building coordinator documentation db/view/view_building_worker: work on `process_staging` tasks db/view/view_building_worker: register staging sstable to view building coordinator when needed db/view/view_building_worker: discover staging sstables db/view/view_building_worker: add method to register staging sstable db/view/view_update_generator: add method to process staging sstables instantly db/view/view_update_generator: extract generating updates from staging sstables to a method db/view/view_update_generator: ignore tablet-based sstables db/view/view_building_coordinator: update view build status on node join/left db/view/view_building_coordinator: handle tablet operations db/view: add view building task mutation builder service/topology_coordinator: run view building coordinator db/view: introduce `view_building_coordinator` 
db/view/view_building_worker: update built views locally db/view: introduce `view_building_worker` db/view: extract common view building functionalities db/view: prepare to create abstract `view_consumer` message/messaging_service: add `work_on_view_building_tasks` RPC service/topology_coordinator: make `term_changed_error` public db/schema_tables: create/cleanup tasks when an index is created/dropped service/migration_manager: cleanup view building state on drop keyspace service/migration_manager: cleanup view building state on drop view service/migration_manager: create view building tasks on create view test/boost: enable proxy remote in some tests service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` service/migration_manager: coroutinize `prepare_new_view_announcement()` service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine` service: reload `view_building_state_machine` on group0 apply() service/vb_coordinator: add currently processing base db/system_keyspace: move `get_scylla_local_mutation()` up db/system_keyspace: add `view_building_tasks` table db/view: add view_building_state and views_state db/system_keyspace: add method to get view build status map db/view: extract `system.view_build_status_v2` cql statements to system_keyspace db/system_keyspace: move `internal_system_query_state()` function earlier db/view: ignore tablet-based views in `view_builder` gms/feature_service: add VIEW_BUILDING_COORDINATOR feature	2025-08-29 17:28:44 +02:00
Łukasz Paszkowski	e34deea50e	tests/cluster: Add new storage tests The storage submodule contains tests that require mounted volumes to be executed. The volumes are created automatically with the `volumes_factory` fixture. The tests in this suite are executed with the custom launcher `unshare -mr pytest` Test scenarios (when one node reaches critical disk utilization): 1. Reject user table writes 2. Disable/Enable compaction 3. Reject split compactions 4. New split compactions not triggered 5. Abort tablet repair 6. Disable/Enable incoming tablet migrations 7. Restart a node while a tablet split is triggered	2025-08-29 14:56:13 +02:00
Łukasz Paszkowski	4bb5696a5d	test/scylla_cluster: Override workdir when passed via cmdline Currently, workdir is set in the ScyllaCluster constructor and it does not take into account that the value could be overridden via cmdline arguments. When this happens, some data (logs, configs) is stored under one path and other data is stored under a different one. The patch allows overriding the value when passed via cmdline arguments, so that all files are stored under the same path.	2025-08-29 14:56:13 +02:00
Łukasz Paszkowski	9539e80e54	compaction_manager: Subscribe to out of space controller	2025-08-29 14:56:07 +02:00
Łukasz Paszkowski	3d03b88719	database: Add critical_disk_utilization mode database can be moved to When the database operates in the critical disk utilization mode, all mutation writes (including inserts, updates, deletes, counter updates, hints, read+repair, and lwt writes) to user tables and their associated tables, such as views, CDC log, and audit, are rejected with a clear error returned. The mode is meant to be used with the disk space monitor in order to prevent any user writes when a node's disk utilization is too high.	2025-08-29 13:46:45 +02:00
Lakshmi Narayanan Sreethar	ce0c29e024	types/comparable_bytes: add compatibility testcases for collection types This patch adds compatibility testcases for the following cql3 types : set, list, map, tuple, vector and reversed types. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	4547f6f188	types/comparable_bytes: update compatibility testcase to support collection types The `abstract_type::from_string()` method used to parse the input data doesn't support collections yet. So the collection testdata will be passed as JSON strings to the testcase. This patch updates the testcase to adapt to this workaround. Also, extended the testcase to verify that Scylla's implementation can successfully decode the byte comparable output encoded by Cassandra. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	0997b3533c	types/comparable_bytes: support empty type Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	b799101a09	types/comparable_bytes: support reversed types A reversed type is first encoded using the underlying type and then all the bits are flipped to ensure that the lexicographical sort order is reversed. During decode, the bytes are flipped first and then decoded using the underlying type. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
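The bit-flipping idea can be demonstrated with a fixed-width underlying encoding. The sketch below is Python, not the Scylla implementation: it assumes a 4-byte big-endian unsigned integer as the underlying type, and all function names are hypothetical.

```python
def encode_u32(value: int) -> bytes:
    """Fixed-width big-endian: byte order matches numeric order."""
    return value.to_bytes(4, "big")


def flip(data: bytes) -> bytes:
    """Invert every bit, which inverts the lexicographic order of equal-length inputs."""
    return bytes(b ^ 0xFF for b in data)


def encode_reversed_u32(value: int) -> bytes:
    # Encode with the underlying type first, then flip all bits.
    return flip(encode_u32(value))


def decode_reversed_u32(data: bytes) -> int:
    # Flip the bytes back first, then decode with the underlying type.
    return int.from_bytes(flip(data), "big")


values = [3, 70000, 1, 256]
by_reversed_bytes = sorted(values, key=encode_reversed_u32)
```

Sorting by the flipped bytes yields the values in descending numeric order, and decoding restores each original value.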
Lakshmi Narayanan Sreethar	6c2a3e2c51	types/comparable_bytes: support vector cql3 type The CQL vector type encoding is similar to the lists, where each element is transformed into a byte-comparable format and prefixed with a component marker. The sequence is terminated with a terminator marker to indicate the end of the collection. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	1ccfe522f1	types/comparable_bytes: support tuple and UDT cql3 type The CQL tuple and UDT types share the same internal implementation and therefore use the same byte comparable encoding. The encoding is similar to lists, where each element is transformed into a byte-comparable format and prefixed with a component marker. The sequence is terminated with a terminator marker to indicate the end of the collection. TODO: Add duplicate test items to maps, lists and sets For maps, add more entries that share keys ex map1 : key1 : value1, key2 : value2 map2 : key1 : value4 map3 : key2 : value5 etc Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	ca38c15a97	types/comparable_bytes: support map cql3 type The CQL map type is encoded as a sequence of key-value pairs. Each key and each value is individually prefixed with a component marker, and the sequence is terminated with a terminator marker to indicate the end of the collection. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	4d5e5f0c84	types/comparable_bytes: support set and list cql3 types The CQL set and list types are encoded as a sequence of elements, where each element is transformed into a byte-comparable format and prefixed with a component marker. The sequence is terminated with a terminator marker to indicate the end of the collection. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar	8e46e8be01	types/comparable_bytes: introduce encode/decode_component The components of a collection, such as an element from a list, set, or vector; a key or value from a map; or a field from a tuple, share the same encode and decode logic. During encode, the component is transformed into the byte comparable format and is prefixed with the `NEXT_COMPONENT` marker. During decode, the component is transformed back into its serialized form and is prefixed with the serialized size. A null component is encoded as a single `NEXT_COMPONENT_NULL` marker and during decode, a `-1` is written to the serialized output. This commit introduces a few helper methods that implement the encode and decode logic described above. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-08-29 12:26:21 +05:30
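The component scheme above can be sketched in a few lines. This is a simplified Python model, not the Scylla code: the marker byte values are assumptions made for the sketch (check them against the ByteComparable specification), every non-null component is assumed to be a 4-byte big-endian unsigned integer, and the size prefix is assumed to be a 4-byte signed integer.

```python
import struct

NEXT_COMPONENT = 0x40       # assumed marker value
NEXT_COMPONENT_NULL = 0x3E  # assumed marker value
TERMINATOR = 0x38           # assumed marker value


def encode_components(items):
    """Prefix each component with a marker and end the sequence with a terminator."""
    out = bytearray()
    for item in items:
        if item is None:
            out.append(NEXT_COMPONENT_NULL)  # a null is just the marker
        else:
            out.append(NEXT_COMPONENT)
            out += item.to_bytes(4, "big")
    out.append(TERMINATOR)
    return bytes(out)


def decode_components(data):
    """Rebuild size-prefixed serialized components; a null is written as size -1."""
    out = bytearray()
    pos = 0
    while data[pos] != TERMINATOR:
        marker = data[pos]
        pos += 1
        if marker == NEXT_COMPONENT_NULL:
            out += struct.pack(">i", -1)
        elif marker == NEXT_COMPONENT:
            out += struct.pack(">i", 4) + data[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unexpected marker 0x{marker:02x}")
    return bytes(out)
```

With these assumed values the terminator is smaller than the component marker, so a sequence sorts before any longer sequence that extends it.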
Łukasz Paszkowski	3e740d25b5	disk_space_monitor: add subscription API for threshold-based disk space monitoring Introduce the `subscribe` method to disk_space_monitor, allowing clients to register callbacks triggered when disk utilization crosses a configurable threshold. The API supports flexible trigger options, including notifications on threshold crossing and direction (above/below). This enables more granular and efficient disk space monitoring for consumers.	2025-08-28 18:06:37 +02:00
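A threshold subscription of this kind can be modelled as follows. This is a Python sketch under stated assumptions, not the `disk_space_monitor` API: the class, the `subscribe` signature, and the `update` method are hypothetical, and the callback here fires only when utilization crosses the threshold in either direction.

```python
from typing import Callable


class DiskSpaceMonitor:
    """Toy monitor that notifies subscribers on threshold crossings."""

    def __init__(self) -> None:
        # Each entry is [threshold, callback, was_above].
        self._subscriptions: list[list] = []

    def subscribe(self, threshold: float, callback: Callable[[bool], None]) -> None:
        """Register a callback that receives True when above, False when below."""
        self._subscriptions.append([threshold, callback, False])

    def update(self, utilization: float) -> None:
        """Feed a new utilization sample (0.0 to 1.0) and fire crossing callbacks."""
        for sub in self._subscriptions:
            threshold, callback, was_above = sub
            is_above = utilization >= threshold
            if is_above != was_above:
                sub[2] = is_above
                callback(is_above)


events: list[bool] = []
monitor = DiskSpaceMonitor()
monitor.subscribe(0.98, events.append)
for sample in (0.50, 0.99, 0.995, 0.90):
    monitor.update(sample)
```

Only the samples that cross 0.98 produce a notification, so a consumer such as a compaction manager would be told once to stop and once to resume.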
Avi Kivity	46193f5e79	Merge 'service/qos: Modularize service level controller to avoid invalid access to auth::service' from Dawid Mędrek Move management over effective service levels from `service_level_controller` to a new dedicated type -- `auth_integration`. Before these changes, it was possible for the service level controller to try to access `auth::service` after it was deinitialized. For instance, it could happen when reloading the cache. That HAS happened as described in the following issue: scylladb/scylladb#24792. Although the problem might have been mitigated or even resolved in scylladb/scylladb@10214e13bd, it's not clear how the service will be used in the future. It's best to prevent similar bugs than trying to fix them later on. The logic responsible for preventing to access an uninitialized `auth::service` was also either non-existent, complex, or non-sufficient. To prevent accessing `auth::service` by the service level controller, we extract the relevant portion of the code to a separate entity -- `auth_integration`. It's an internal helper type whose sole purpose is to manage effective service levels. Thanks to that, we were able to nest the lifetime of `auth_integration` within the lifetime of `auth::service`. It's now impossible to attempt to dereference it while it's uninitialized. If a bug related to an invalid access is spotted again, though, it might also be easier to debug it now. There should be no visible change to the users of the interface of the service level controller. We strived to make the patch minimal, and the only affected part of the logic should be related to how `auth::service` is accessed. The relevant portion of the initialization and deinitialization flow: (a) Before the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. 2. Initialize other services. 3. Initialize and start `auth::service`. 4. (work) 5. Stop and deinitialize `auth::service`. 6. 
Deinitialize other services. 7. Deinitialize `service_level_controller`. (b) After the changes: 1. Initialize `service_level_controller`. Pass a reference to an uninitialized `auth::service` to it. (*) 2. Initialize other services. 3. Initialize and start `auth::service`. 4. Initialize `auth_integration`. Register it in `service_level_controller`. 5. (work) 6. Unregister `auth_integration` in `service_level_controller` and deinitialize it. 7. Stop and deinitialize `auth::service`. 8. Deinitialize other services. 9. Deinitialize `service_level_controller`. (*): The reference to `auth::service` in `service_level_controller` is still necessary. We need to access the service when dropping a distributed service level. Although it would be best to cut that link between the service level controller and `auth::service` too, effectively separating the entities, it would require more work, so we leave it as-is for now. It shouldn't prove problematic as far as accessing an uninitialized service goes. Trying to drop a service level at the point when we're de-initializing auth should be impossible. For more context, see the function `drop_distributed_service_level` in `service_level_controller`. A trivial test has been included in the PR. Although its value is questionable as we only try to reload the service level cache at a specific moment, it's probably the best we can deliver to provide a reproducer of the issue this patch is resolving. Fixes scylladb/scylladb#24792 Backport: The impact of the bug was minimal as it only affected the shutdown. However, since CI is failing because of it, let's backport the change to all supported versions. Closes scylladb/scylladb#25478 * github.com:scylladb/scylladb: service/qos: Move effective SL cache to auth_integration service/qos: Add auth::service to auth_integration service/qos: Reload effective SL cache conditionally service/qos: Add gate to auth_integration service/qos: Introduce auth_integration	2025-08-28 13:42:55 +03:00
Michał Jadwiszczak	39db90a535	test/cluster: add view build status tests	2025-08-27 10:23:04 +02:00
Michał Jadwiszczak	f7ebc7b054	test/cluster: add view building coordinator tests	2025-08-27 10:23:04 +02:00
Michał Jadwiszczak	cf138da853	test: adjust existing tests - Disable tablets in `test_migration_on_existing_raft_topology`. Because views on tablets are experimental now, we can safely assume that view building coordinator will start with view build status on raft. - Add error injection to pause view building on worker. Used to pause the view building process; there is an analogous error injection in view_builder. - Do a read barrier in `test_view_in_system_tables` Increases test stability by making sure that the node sees up-to-date group0 state and `system.built_views` is synced. - Wait for the view to be built in some tests Increases test stability by making sure that the view is built. - Remove xfail marker from `test_tablet_streaming_with_unbuilt_view` This series fixes https://github.com/scylladb/scylladb/issues/21564 and this test should work now.	2025-08-27 10:23:04 +02:00
Michał Jadwiszczak	233f4dcee3	db/view/view_building_worker: register staging sstable to view building coordinator when needed Change the return type of `check_needs_view_update_path()`. Previously it returned a bool telling whether to use the staging directory (and register to `view_update_generator`) or the normal directory. Now the function returns an enum with the possible values: - `normal_directory` - use normal directory for the sstable - `staging_directly_to_generator` - use staging directory and register to `view_update_generator` - `staging_managed_by_vbc` - use staging directory but don't register it to `view_update_generator` but create view building tasks for later The third option is new; it's used when the table has any view which is currently in the building process. In this case, registering it to `view_update_generator` prematurely may lead to base-view inconsistency (for example when a replica is in a pending state).	2025-08-27 10:23:03 +02:00
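The three-way result can be pictured with a small enum. This is a Python illustration only: the enum member names follow the commit message, but the function signature, its two boolean inputs, and the order of the checks are assumptions rather than the actual C++ logic.

```python
from enum import Enum


class SstableDestination(Enum):
    NORMAL_DIRECTORY = "normal_directory"
    STAGING_DIRECTLY_TO_GENERATOR = "staging_directly_to_generator"
    STAGING_MANAGED_BY_VBC = "staging_managed_by_vbc"


def check_needs_view_update_path(needs_staging: bool, any_view_building: bool) -> SstableDestination:
    """Pick a destination for a new sstable (hypothetical inputs)."""
    if not needs_staging:
        return SstableDestination.NORMAL_DIRECTORY
    if any_view_building:
        # Defer to the view building coordinator instead of registering
        # with view_update_generator prematurely.
        return SstableDestination.STAGING_MANAGED_BY_VBC
    return SstableDestination.STAGING_DIRECTLY_TO_GENERATOR
```

Compared with a bool, the enum lets the caller distinguish the two staging cases without a second query.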
Michał Jadwiszczak	6e3e287a39	db/schema_tables: create/cleanup tasks when an index is created/dropped Similarly as in previous commits, create view building tasks when an index is created and cleanup view building status when it's dropped.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	19651b4978	test/boost: enable proxy remote in some tests After the next few patches, creating/dropping a view in a tablet keyspace will require a remote proxy to obtain references to system keyspace and view building state. Because of this, remote proxy needs to be explicitly enabled in boost tests which create views.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	204f61ffe1	service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` The reference is needed to get `view_building_state_machine`.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	d2e1b6d44a	service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine` Those references are needed to manage view building tasks while a view is created/dropped.	2025-08-27 08:55:47 +02:00
Michał Jadwiszczak	f2e7051a84	service: reload `view_building_state_machine` on group0 apply() The state may be also reloaded on `topology_change` or `mixed_change` because topology coordinator may change view building tasks during tablet operations.	2025-08-27 08:55:47 +02:00
Nadav Har'El	e2c99436cf	Merge 'cdc, vector_search: enable CDC when the index is created' from Dawid Pawlik When a vector index is created in Scylla, it is initially built using a full scan of the database. After that, it stays up to date by tracking changes through CDC, which should be automatically enabled when the vector index is created. When a user attempts to enable Vector Search (VS), the system checks whether Change Data Capture (CDC) is enabled and properly configured: 1. CDC is not enabled - CDC is automatically enabled with the minimum required TTL (Time-to-Live) for VS (24 hours) and the delta mode set to 'full' or post-image is enabled. - If the user later tries to reduce the CDC TTL below 24 hours or set delta mode to 'keys' with post-image disabled, the action fails. - Error message: Clearly states that CDC TTL must be at least 24 hours and delta mode must be set to 'full' or post-image must be enabled for VS to function. 2. CDC is already enabled - If CDC TTL is ≥ 24 hours and delta mode is set to 'full' or post-image is enabled: VS is enabled successfully. - If CDC TTL is < 24 hours or delta mode is set to 'keys' with post-image disabled: The VS enabling process fails. - Error message: Informs the user that CDC TTL must be at least 24 hours, delta mode must be set to 'full' or post-image must be enabled, and provides a link to documentation on how to update the TTL, delta mode, and post-image. When a user attempts to disable CDC when VS is enabled, the action will fail and the user will be informed by error message that clearly states that VS needs to be disabled (vector indexes have to be dropped) first. Full setup requirements and steps will be detailed in the documentation of Vector Search. 
Co-authored-by: @smoczy123 Fixes: VECTOR-27 Fixes: VECTOR-25 Closes scylladb/scylladb#25179 * github.com:scylladb/scylladb: test/cqlpy: ensure Vector Search CDC options test/boost: adjust CDC boost tests for Vector Search test/cql: add Vector Search CDC enable/disable test cdc, vector_index: provide minimal option setup for Vector Search test/cqlpy: adjust describe table tests with CDC for Vector Search describe, cdc: adjust describe for cdc log tables cdc: enable CDC log when vector index is created test/cqlpy: run vector_index tests only on vnodes vector_index: check if vector index exists in schema	2025-08-26 23:01:32 +03:00
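The CDC requirements described in the merge message above can be sketched as a single predicate. This is only an illustration of the stated rules (TTL of at least 24 hours, and delta mode 'full' or post-image enabled); the `CdcOptions` type, its field names, and the function name are hypothetical and not taken from the Scylla code base, and any special TTL values are ignored here.

```python
from dataclasses import dataclass

# Minimum CDC TTL required for Vector Search, per the commit message (24 hours).
MIN_VS_CDC_TTL_SECONDS = 24 * 60 * 60


@dataclass
class CdcOptions:
    """Hypothetical stand-in for a table's CDC settings."""
    enabled: bool
    ttl_seconds: int
    delta_mode: str  # "full" or "keys"
    postimage: bool


def cdc_options_valid_for_vector_search(opts: CdcOptions) -> bool:
    """Return True if the CDC settings satisfy the rules stated in the commit message."""
    if not opts.enabled:
        return False
    ttl_ok = opts.ttl_seconds >= MIN_VS_CDC_TTL_SECONDS
    # Either full deltas or post-images must be available.
    change_data_ok = opts.delta_mode == "full" or opts.postimage
    return ttl_ok and change_data_ok
```

Under these assumptions, `delta_mode="keys"` is accepted only when `postimage` is true, and any TTL below 86400 seconds is rejected regardless of the other settings.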
Dawid Mędrek	dd5a35dc67	service/qos: Add auth::service to auth_integration The new service, `auth_integration`, has taken over the responsibility for managing effective service levels from `service_level_controller`. However, before these changes, it still accessed `auth::service` via the service level controller. Let's change that. Note that we also remove a check that `auth::service` has been initialized. It's not necessary anymore because the lifetime of `auth_integration` is strictly nested within the lifetime of `auth::service`. In actuality, `service_level_controller` should lose its reference to `auth::service` completely. All of the management of effective service levels has already been moved to `auth_integration`. However, the reference is still needed when dropping a distributed service level because we need to update the corresponding attribute for relevant roles. That should not lead to invalid accesses, though. Dropping a service level should not be possible when `auth::service` is not initialized.	2025-08-26 18:41:43 +02:00
Dawid Mędrek	e929279d74	service/qos: Reload effective SL cache conditionally Since `service_level_controller` outlives `auth_integration`, it may happen that we try to access it when it has already been deinitialized. To prevent that, we only try to reload or clear the effective service level cache when the object is still alive. These changes solve an existing problem with an invalid memory access. For more context, see issue scylladb/scylladb#24792. We provide a reproducer test that consistently fails before these changes but passes after them. Fixes scylladb/scylladb#24792	2025-08-26 18:41:40 +02:00
Dawid Mędrek	7d0086b093	service/qos: Introduce auth_integration We introduce a new type, `auth_integration`, that will be used internally by `service_level_controller`. Its purpose is to take over the responsibility for managing effective service levels. The main problem of the current implementation of service level controller is its dependency on `auth::service`, whose lifetime is strictly nested within the lifetime of service level controller. That may lead, and already has led, to invalid memory accesses; for an example, see issue scylladb/scylladb#24792. Our strategy is to split service level controller into smaller parts and ensure that we access `auth::service` only when it's valid to do so. This commit is the first step towards that. We don't change anything in the logic yet, just add the new type. Further adjustments will be made in following commits.	2025-08-26 18:41:34 +02:00
Nadav Har'El	87dd96f9a2	Merge ' Alternator: DynamoDB compatible WCU Calculation via Read-Before-Write Support' from Amnon Heiman This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism. Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior. To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences. In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem. This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary. 
New feature, no need to backport Closes scylladb/scylladb#24436 * github.com:scylladb/scylladb: alternator/test_returnconsumedcapacity.py: Test forced read before write alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write executor.cc: get_previous_item with consistency level executor: Extend API of put_or_delete_item alternator/executor.cc: Accurate WCU for put, update, delete config: add alternator_force_read_before_write	2025-08-24 11:38:24 +03:00
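The rule that WCU is computed from the larger of the old and new item sizes can be illustrated with a short sketch. It assumes the usual DynamoDB convention of one WCU per 1 KB written, rounded up, with a minimum of one unit per write; the function and constant names are hypothetical and this is not the code in `alternator/executor.cc`.

```python
import math

# Assumption: one write capacity unit covers up to 1 KB, as in DynamoDB.
WCU_UNIT_BYTES = 1024


def wcu_for_write(new_item_bytes: int, old_item_bytes: int = 0) -> int:
    """Estimate WCU for a write, given the new item size and the previous item size.

    Without read-before-write the old size is unknown, which corresponds to
    old_item_bytes=0 here and may under-report relative to DynamoDB.
    """
    billed_bytes = max(new_item_bytes, old_item_bytes)
    # Assumption: every write consumes at least one unit.
    return max(1, math.ceil(billed_bytes / WCU_UNIT_BYTES))
```

For example, overwriting a 3000-byte item with a 100-byte item yields 3 units when the old size is known and 1 unit when it is not, which is the discrepancy the forced read is meant to remove.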
Avi Kivity	8815491085	treewide: include boost headers as "system" headers Boost is external to the project so treat its headers as "system" headers and include them with angle brackets. Closes scylladb/scylladb#25619	2025-08-22 17:21:24 +03:00
Piotr Dulikowski	5709d94826	Merge 'cql3: Warn when creating RF-rack-invalid keyspace' from Dawid Mędrek Although RF-rack-valid keyspaces are not universally enforced yet (they're governed by the configuration option `rf_rack_valid_keyspaces`), we'd like to encourage the user to abide by the restriction. To that end, we're introducing a warning when creating or altering a keyspace. If the configuration option is disabled, but the user is trying to create an RF-rack-invalid keyspace, they'll receive a warning. If the option is turned off, we will also log all of the RF-rack-invalid keyspaces at start-up. We provide validation tests. Fixes scylladb/scylladb#23330 Backport: we'd like to encourage the user to abide by the restriction even when they don't enforce it to make it easier in the future to adjust the schema when there's no way to disable it anymore. Because of that, we'd like to backport it to all relevant versions, starting with 2025.1. Closes scylladb/scylladb#24785 * github.com:scylladb/scylladb: main: Log RF-rack-invalid keyspaces at startup cql3/statements: Fix indentation cql3: Warn when creating RF-rack-invalid keyspace	2025-08-22 11:33:32 +02:00
Evgeniy Naydanov	ab15c94a09	test.py: dtest/commitlog_test: add test_pinned_cl_segment_doesnt_resurrect_data test_pinned_cl_segment_doesnt_resurrect_data was not moved in #24946 from scylla-dtest to this repo, because it's marked as xfail (#14879), but actually the issue is fixed and there is no reason to keep the test in scylla-dtest. Also remove unused imports. Closes scylladb/scylladb#25592	2025-08-22 11:30:10 +03:00
Raphael S. Carvalho	149f9d8448	replica: Fix race between drop table and merge completion handling Consider this: 1) merge finishes, wakes up fiber to merge compaction groups 2) drop table happens, which in turn invokes truncate underneath 3) merge fiber stops old groups 4) truncate disables compaction on all groups, but the ones stopped 5) truncate performs a check that compaction has been disabled on all groups, including the ones stopped 6) the check fails because groups being stopped didn't have compaction explicitly disabled on them To fix it, the check on step 6 will ignore groups that have been stopped, since those are not eligible for having compaction explicitly disabled on them. The compaction check is there, so ongoing compaction will not propagate data being truncated, but here it happens in the context of drop table which doesn't leave anything behind. Also, a group stopped is somewhat equivalent to compaction disabled on it, since the procedure to stop a group stops all ongoing compaction and eventually removes its state from compaction manager. Fixes #25551. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#25563	2025-08-22 10:19:43 +03:00
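The six steps in the commit message above can be modelled in a few lines. The following is a simplified, single-threaded sketch of the check before and after the fix; `CompactionGroup` and the function names are hypothetical and do not mirror the actual `replica` code.

```python
from dataclasses import dataclass


@dataclass
class CompactionGroup:
    """Hypothetical model of a compaction group's relevant state."""
    name: str
    stopped: bool = False
    compaction_disabled: bool = False


def disable_compaction(groups):
    """Step 4: truncate disables compaction on all groups except stopped ones."""
    for g in groups:
        if not g.stopped:
            g.compaction_disabled = True


def check_before_fix(groups):
    """Steps 5 and 6: the old check also covers stopped groups, so it can fail."""
    return all(g.compaction_disabled for g in groups)


def check_after_fix(groups):
    """The fixed check ignores stopped groups, which cannot be explicitly disabled."""
    return all(g.compaction_disabled for g in groups if not g.stopped)
```

With one group stopped by the merge fiber (step 3) and one still live, `check_before_fix` returns False after `disable_compaction`, while `check_after_fix` returns True.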
Botond Dénes	3dcb596201	Merge 'test: properly unset recovery_leader in the recovery procedure tests' from Patryk Jędrzejczak After changing the type of the `recovery_leader` config option from `sstring` to `UUID` in #25032, setting `recovery_leader` to an empty string became an incorrect way to unset it. The following error started to appear in the recovery procedure tests: ``` init - marshaling error: UUID string size mismatch: '' : recovery_leader ``` We unset `recovery_leader` properly in this PR. To do it, we introduce a simple way to remove config options in tests. Backport is unneeded. This error was harmless, and Scylla ignored `recovery_leader` after logging the error as expected by the tests. Closes scylladb/scylladb#25365 * github.com:scylladb/scylladb: test: properly unset recovery_leader in the recovery procedure tests test: manager_client: allow removing a config option test: manager_client: add docstring to server_update_config	2025-08-22 10:09:37 +03:00
Patryk Jędrzejczak	193a74576a	test/cluster/conftest: cluster_con: provide default values for port and use_ssl Some cluster tests use `cluster_con` when they need a different load balancing policy or auth provider. However, no test uses a port other than 9042 or enables SSL, but all tests must pass `9042, False` because these parameters don't have default values. This makes the code more verbose. Also, it's quite obvious that 9042 stands for port, but it's not obvious what `False` is related to, so there is a need to check the definition of `cluster_con` while reading any test that uses it. No reason to backport, it's only a minor refactoring. Closes scylladb/scylladb#25516	2025-08-22 09:51:24 +03:00
Andrzej Jackowski	86fc513bd9	auth: allow dropping roles in saslauthd_authenticator Before this change, `saslauthd_authenticator` prevented dropping roles. The current documentation instructs users to `Ensure Scylla has the same users and roles as listed in the LDAP directory`. Therefore, ScyllaDB should allow dropping roles so administrators can remove obsolete roles from both LDAP and ScyllaDB. The code change is minimal — dropping a role is a no-op, similar to the existing no-op implementations for successful `create` and `alter` operations. `saslauthd_authenticator_test` is updated to verify that dropping a role doesn't throw anymore. Fixes: scylladb/scylladb#25571 Closes scylladb/scylladb#25574	2025-08-22 09:40:44 +03:00
Dawid Mędrek	837d267cbf	main: Log RF-rack-invalid keyspaces at startup When the configuration option `rf_rack_valid_keyspaces` is enabled and there is an RF-rack-invalid keyspace, starting a node fails. However, when the configuration option is disabled, but there still is a keyspace that violates the condition, we'd like Scylla to print a warning informing the user about the fact. That's what happens in this commit. We provide a validation test.	2025-08-21 19:35:33 +02:00
Dawid Mędrek	60ea22d887	cql3: Warn when creating RF-rack-invalid keyspace Although RF-rack-valid keyspaces are not universally enforced yet (they're governed by the configuration option `rf_rack_valid_keyspaces`), we'd like to encourage the user to abide by the restriction. To that end, we're introducing a warning when creating or altering a keyspace. If the configuration option is disabled, but the user is trying to create an RF-rack-invalid keyspace, they'll receive a warning. We provide a validation test.	2025-08-21 19:29:33 +02:00
Evgeniy Naydanov	3a98331731	test.py: don't fail if use multiple tests from one dir in commandline There is a stash item, REPEATED_FILES, for directory items, which is used to cut recursion. But if multiple tests from one directory are added to the ./test.py command line, this solution prevents the non-first tests from being handled properly, because the directory was already collected for the first one. Change the behavior to store in the stash only the files that are in the process of repetition, rather than all repeated files. Rename the stash item to REPEATING_FILES to reflect this change. Closes scylladb/scylladb#25611	2025-08-21 19:43:13 +03:00

9440 Commits