Commit Graph

49196 Commits

Emil Maskovsky
5dac4b38fb test/gossiper: add reproducible test for race condition during node decommission
This change introduces a targeted test that simulates the gossiper race
condition observed during node decommissioning. The test delays gossip
state application and host ID lookup to reliably reproduce the scenario
where `gossiper::get_host_id()` is called on a removed endpoint,
potentially triggering an abort in `apply_new_states`.

A dedicated error injection widens the race window to increase the
likelihood of hitting the race condition. The injection delays the
application of gossip state updates for the specific node that is being
decommissioned, which should then result in the server abort in the
gossiper.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25721

Backport: The test is primarily for an issue found in 2025.1, so it
needs to be backported to all the 2025.x branches.

Closes scylladb/scylladb#25685
2025-09-01 13:59:47 +02:00
Petr Gusev
2e757d6de4 cas: pass timeout_if_partially_accepted := write to accept_proposal()
Write requests cannot be safely retried if some replicas respond with
accepts and others with rejects. In this case, the coordinator is
uncertain about the outcome of the LWT: a subsequent LWT may either
complete the Paxos round (if a quorum observed the accept) or overwrite it
(if a quorum did not). If the original LWT was actually completed by
later rounds and the coordinator retried it, the write could be applied
twice, potentially overwriting effects of other LWTs that slipped in
between. Read requests do not have this problem, so they
can be safely retried.

Before this commit, handler->accept_proposal was called with
timeout_if_partially_accepted := true. This caused both read and write
requests to throw an "uncertainty" timeout to the user in the case
of the contention described above. After this commit, we throw an
"uncertainty" timeout only for write requests, while read requests
are instead retried in the loop in sp::cas.
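
The retry policy can be sketched coordinator-side (illustrative Python, not Scylla's actual sp::cas code; `run_paxos_round` and `UncertaintyTimeout` are made-up names):

```python
class UncertaintyTimeout(Exception):
    """The outcome of a partially accepted write is unknown."""

def cas(run_paxos_round, is_write, max_rounds=10):
    # Sketch of the retry loop: a partially accepted WRITE must not be
    # retried (it might get applied twice), so it surfaces an
    # "uncertainty" timeout; a READ is safe to retry in the loop.
    for _ in range(max_rounds):
        accepted, partially_accepted = run_paxos_round()
        if accepted:
            return True
        if partially_accepted and is_write:
            raise UncertaintyTimeout("write partially accepted; outcome unknown")
        # reads (and fully rejected proposals) go around the loop again
    raise TimeoutError("CAS contention: rounds exhausted")
```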

Closes scylladb/scylladb#25602
2025-09-01 14:31:04 +03:00
Pavel Emelyanov
840cdab627 api: Move /load and /metrics/load handlers code to column_family.cc
Both handlers need the database to proceed and thus must be registered
(and unregistered) in a group that captures the database for its handlers.

Once moved, the get_cf_stats() method they use can be made local to the
column_family.cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25671
2025-09-01 08:11:00 +02:00
Dawid Mędrek
fc50e9d0a4 test/perf: Require smp=1 in perf_cache_eviction
Trying to run the test with more than one shard results in a failure
when generating sharding metadata:

```
ERROR 2025-08-27 16:00:17,551 [shard  0:main] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /tmp/scylla-c9fa42fe/ks/cf-2938a030834e11f0a561ffa33feb022d/me-3gt6_12wh_1gifk2ijgeu1ovc1m5-big-Data.db). Aborting
```

Let's require that the test be run with a single shard.

Closes scylladb/scylladb#25703
2025-09-01 08:59:35 +03:00
Nadav Har'El
6d1abc5b2c utils/base64: fix misleading code and comment (no functional change)
utils/base64.cc had some strange code with a strange comment in
base64_begins_with().

The code had

        base.substr(operand.size() - 4, operand.size())

The comment claims that this is "last 4 bytes of base64-encoded string",
but this comment is misleading - operand is typically shorter than base
(that is the whole point of base64_begins_with()), so the real
intention of the code is not to find the *last* 4 bytes of base, but rather
the *next* four bytes after the (operand.size() - 4) bytes we already compared.
These four bytes may need the full power of base64_decode_string()
because they may or may not contain padding.

But, if we really want the next 4 bytes, why pass operand.size() as the
length of the substring? operand.size() is at least 4 (it's a multiple of
4, and if it's 0 we returned earlier), but it could be more. We don't
need more, we just need 4. It's not really wrong to take more than 4 (so
this patch doesn't *fix* any bug), but it can be wasteful. So this code
should be:

        base.substr(operand.size() - 4, 4)

We already have a test, test_base64_begins_with, in
test/boost/alternator_unit_test.cc that takes encoded base64 strings up
to 12 characters in length (corresponding to decoded strings up to 8
characters) and substrings from length 0 to the base string's length,
and checks that base64_begins_with succeeds.
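
For reference, the intended logic can be sketched in Python (an illustrative reimplementation of the idea, not the C++ code itself):

```python
import base64

def base64_begins_with(base: str, operand: str) -> bool:
    # Does the decoded form of operand prefix the decoded form of base?
    if not operand:
        return True
    if len(operand) % 4 != 0 or len(base) < len(operand):
        return False
    k = len(operand) - 4
    # All quads before the last one encode raw bytes with no padding,
    # so they can be compared textually.
    if base[:k] != operand[:k]:
        return False
    # The last quad of operand may contain '=' padding, so decode it
    # and compare against the decode of the NEXT 4 bytes of base --
    # only 4 bytes are needed, hence substr(operand.size() - 4, 4).
    op_tail = base64.b64decode(operand[k:])
    base_tail = base64.b64decode(base[k:k + 4])
    return base_tail[:len(op_tail)] == op_tail
```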

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25712
2025-09-01 08:57:50 +03:00
Andrei Chekun
e55c8a9936 test.py: modify run to use different junit output filenames
Currently, run executes pytest twice without modifying the path of the
JUnit XML report, so the second pytest execution overwrites the first
report. This PR fixes the issue so that both reports are stored.
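
The idea can be illustrated with a small helper (hypothetical; test.py's actual implementation may differ):

```python
from pathlib import Path

def junit_report_path(base: Path, run_index: int) -> Path:
    # Give each pytest invocation its own JUnit XML path so a second
    # run does not overwrite the first run's report; the result is then
    # passed to pytest via --junitxml=<path>.
    return base.with_name(f"{base.stem}.{run_index}{base.suffix}")
```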

Closes scylladb/scylladb#25726
2025-09-01 08:56:48 +03:00
Ernest Zaslavsky
05154e131a cleanup: Add missing #pragma once
Add missing `#pragma once` to an include header

Closes scylladb/scylladb#25761
2025-09-01 06:41:57 +03:00
Botond Dénes
fbff8d3b2d Merge 'vector_store_client: disable Nagle's algorithm on the http client' from Pawel Pery
Nagle’s algorithm and the delayed ACK algorithm are enabled by default on sockets in Linux. As a result, we can experience 40ms of latency simply waiting for an ACK on the client side. Disabling Nagle’s algorithm (using TCP_NODELAY) should fix the issue (the client won’t wait 40ms for ACKs).

This change sets `TCP_NODELAY` on every socket created by the `http_client`.

Checking for dead peers or a dead network helps manage the lifetime of the http client. This change also sets the TCP_KEEPALIVE option on the http client's socket.
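
In socket-API terms the change amounts to setting two options on each new connection (sketched here with Python's socket module; the actual code uses Seastar's socket API):

```python
import socket

def tune_http_socket(sock: socket.socket) -> None:
    # Disable Nagle's algorithm: send small writes immediately instead
    # of waiting (up to ~40ms) for the peer's ACK of previous data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Enable keepalive probes so that dead peers (or a dead network)
    # are detected even on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
```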

Fixes: VECTOR-169

Closes scylladb/scylladb#25401

* github.com:scylladb/scylladb:
  vector_store_client: set keepalive for the http client's socket
  vector_store_client: disable Nagle's algorithm on the http client
2025-09-01 06:26:06 +03:00
Jenkins Promoter
619b4102bd Update pgo profiles - x86_64 2025-09-01 05:08:56 +03:00
Jenkins Promoter
783f866bd3 Update pgo profiles - aarch64 2025-09-01 05:05:17 +03:00
Avi Kivity
dfc7957a73 Merge 'test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints' from Benny Halevy
Following up on 6129411a5e,
improve test_vnode_keyspace_describe_ring by verifying that the
endpoints listed by describe_ring match those returned by the
`natural_endpoints` API (for random tokens).
The latter are calculated using an independent code path
directly from the effective_replication_map.

* test exists currently only on master, no backport required

Closes scylladb/scylladb#25610

* github.com:scylladb/scylladb:
  test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints
  test/pylib/rest_client: add natural_endpoints function
2025-08-31 20:36:15 +03:00
Avi Kivity
bae66cc0d8 Merge 'types: add byte-comparable format support for collections' from Lakshmi Narayanan Sreethar
This PR builds on the byte comparable support introduced in #23541 to add byte comparable support for all the collection types.

This implementation adheres to the byte-comparable format specification in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md

Refs https://github.com/scylladb/scylladb/issues/19407

New feature - backport not required.

Closes scylladb/scylladb#25603

* github.com:scylladb/scylladb:
  types/comparable_bytes: add compatibility testcases for collection types
  types/comparable_bytes: update compatibility testcase to support collection types
  types/comparable_bytes: support empty type
  types/comparable_bytes: support reversed types
  types/comparable_bytes: support vector cql3 type
  types/comparable_bytes: support tuple and UDT cql3 type
  types/comparable_bytes: support map cql3 type
  types/comparable_bytes: support set and list cql3 types
  types/comparable_bytes: introduce encode/decode_component
  types/comparable_bytes: introduce to_comparable_bytes/from_comparable_bytes
2025-08-31 15:53:27 +03:00
Avi Kivity
600349e29a Merge 'tasks: return task::impl from make_and_start_task ' from Aleksandra Martyniuk
Currently, make_and_start_task returns a pointer to task_manager::task
that hides the implementation details. If we need to access
the implementation (e.g. because we want a task to "return" a value),
we need to create and start the task step by step explicitly.

Return task_manager::task::impl from make_and_start_task. Use it
where possible.

Fixes: https://github.com/scylladb/scylladb/issues/22146.

Optimization; no backport

Closes scylladb/scylladb#25743

* github.com:scylladb/scylladb:
  tasks: return task::impl from make_and_start_task
  compaction: use current_task_type
  repair: add new param to tablet_repair_task_impl
  repair: add new params to shard_repair_task_impl
  repair: pass argument by value
2025-08-31 15:44:37 +03:00
Nadav Har'El
ff91027eac utils, alternator: fix detection of invalid base-64
This patch fixes an error-path bug in the base-64 decoding code in
utils/base64.cc, which among other things is used in Alternator to decode
blobs in JSON requests.

The base-64 decoding code has a lookup table, which was wrongly sized 255
bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF)
was included in an invalid base-64 string, instead of detecting that this
is an invalid byte (since the only valid bytes in a base-64 string are
A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a
nonsense 6-bit part, or even crash on an out-of-bounds read.
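
The shape of the fix can be sketched as follows (illustrative Python; the real table lives in utils/base64.cc and may use a different invalid-byte sentinel):

```python
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
INVALID = 0xFF  # sentinel for "not a base-64 character"

def build_decode_table():
    # The table must cover all 256 possible input bytes: with only 255
    # entries, looking up the byte 0xFF reads out of bounds (or treats
    # an invalid byte as a nonsense 6-bit part).
    table = [INVALID] * 256
    for i, ch in enumerate(BASE64_ALPHABET):
        table[ord(ch)] = i
    return table
```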

Besides the trivial fix, this patch also includes a reproducing test,
which tries to write a blob as a supposedly base-64 encoded string with
a 0xFF byte in it. The test fails before this patch (the write succeeds,
unexpectedly), and passes after this patch (the write fails as
expected). The test also passes on DynamoDB.

Fixes #25701

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25705
2025-08-31 15:38:01 +03:00
Avi Kivity
1f4c9b1528 Merge 'system_keyspace: add peers cache to get_ip_from_peers_table' from Petr Gusev
The gossiper can call `storage_service::on_change` frequently (see  scylladb/scylla-enterprise#5613), which may cause high CPU load and even trigger OOMs or related issues.

This PR adds a temporary cache for `system.peers` to resolve host_id -> ip without hitting storage on every call. The cache is short-lived to handle the unlikely case where `system.peers` is updated directly via CQL.

This is a temporary fix; a more thorough solution is tracked in https://github.com/scylladb/scylladb/issues/25620.
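
The caching idea can be sketched like this (hypothetical Python; `PeersIpCache` and `loader` are illustrative names, not Scylla's API):

```python
import time

class PeersIpCache:
    """Short-lived cache over an expensive system.peers read."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader          # reads system.peers (expensive)
        self._ttl = ttl_seconds        # short TTL handles direct CQL updates
        self._clock = clock
        self._map = {}
        self._loaded_at = -float("inf")

    def get_ip(self, host_id):
        now = self._clock()
        if now - self._loaded_at > self._ttl:
            self._map = self._loader() # refresh the whole map at once
            self._loaded_at = now
        return self._map.get(host_id)  # no storage hit on every call
```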

Fixes scylladb/scylladb#25660

backport: this patch needs to be backported to all supported versions (2025.1/2/3).

Closes scylladb/scylladb#25658

* github.com:scylladb/scylladb:
  storage_service: move get_host_id_to_ip_map to system_keyspace
  system_keyspace: use peers cache in get_ip_from_peers_table
  storage_service: move get_ip_from_peers_table to system_keyspace
2025-08-31 15:34:35 +03:00
Piotr Wieczorek
5add43e15c alternator: streams: Address minor incompatibilities with DynamoDB in GetRecords response.
This commit adds missing fields to GetRecords responses: `awsRegion` and
`eventVersion`. We also considered changing `eventSource` from
`scylladb:alternator` to `aws:dynamodb` and setting `SizeBytes` subfield
inside the `dynamodb` field.

We set `awsRegion` to the datacenter's name of the node that received
the request. This is in line with the AWS documentation, except that
Scylla has no direct equivalent of a region, so we use the datacenter's
name, which is analogous to DynamoDB's concept of region.

The field `eventVersion` determines the structure of a Record. It is
updated whenever the structure changes. We think that adding a field
`userIdentity` bumped the version from `1.0` to `1.1`. Currently, Scylla
doesn't support this field (#11523), hence we use the older 1.0 version.

We have decided to leave `eventSource` as is, since it is easy to change
it to the `aws:dynamodb` value used by DynamoDB if problems arise.

Not setting the `SizeBytes` subfield inside the `dynamodb` field was
dictated by the lack of apparent use cases. The documentation is unclear
about how `SizeBytes` is calculated, and after experimenting a little,
I haven't found an obvious pattern.
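
Putting these decisions together, a GetRecords entry now looks roughly like this (sketch covering only the fields discussed above, not the full record schema):

```python
def make_record(datacenter: str, dynamodb_payload: dict) -> dict:
    return {
        "awsRegion": datacenter,               # DC name stands in for a region
        "eventVersion": "1.0",                 # no userIdentity (#11523) -> pre-1.1
        "eventSource": "scylladb:alternator",  # deliberately NOT aws:dynamodb
        "dynamodb": dynamodb_payload,          # SizeBytes deliberately not set
    }
```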

Fixes: #6931

Closes scylladb/scylladb#24903
2025-08-31 14:55:47 +03:00
Avi Kivity
bf9a963582 utils: mark crc barrett tables const
They're marked constinit, but constinit does not imply const. Since
they're not supposed to be modified, mark them const too.

Closes scylladb/scylladb#25539
2025-08-31 11:37:39 +03:00
Avi Kivity
bc5773f777 Merge 'Add out of space prevention mechanisms' from Łukasz Paszkowski
When scaling out is delayed or fails, it is crucial to ensure that clusters remain operational
and recoverable even under extreme conditions. To achieve this, the following proactive measures
are implemented:
- reject writes
      - includes: inserts, updates, deletes, counter updates, hints, read+repair and lwt writes
      - applicable to: user tables, views, CDC log, audit, cql tracing
- stop running compactions/repairs and prevent from starting new ones
- reject incoming tablet migrations

The aforementioned mechanisms are automatically enabled when a node's disk utilization reaches
the critical level (default: 98%) and disabled when the utilization drops below the threshold.

Apart from that, the series adds tests that require mounted volumes to simulate running out of space.
The paths to the volumes can be provided via a pytest argument, i.e. `--space-limited-dirs`.
When not provided, tests are skipped.

Test scenarios:

1. Start a cluster and write data until one of the nodes reaches 90% of the disk utilization
2. Perform an **operation** that would take the nodes over 100%
3. The nodes should not exceed the critical disk utilization (98% by default)
4. Scale out the cluster by adding one node per rack
5. Retry or wait for the **operation** from step 2

The **operation** is: writing data, running compactions, building materialized views, running repair,
migrating tablets (caused by RF change, decommission).

The test is successful if no nodes run out of space, the **operation** from step 2 is
aborted/paused/timed out, and the **operation** from step 5 is successful.

`perf-simple-query --smp 1 -m 1G` results obtained for fixed 400MHz frequency:

Read path (before)

```
instructions_per_op:
	mean=   39661.51 standard-deviation=34.53
	median= 39655.39 median-absolute-deviation=23.33
	maximum=39708.71 minimum=39622.61
```

Read path (after)

```
instructions_per_op:
	mean=   39691.68 standard-deviation=34.54
	median= 39683.14 median-absolute-deviation=11.94
	maximum=39749.32 minimum=39656.63
```

Write path (before):

```
instructions_per_op:
	mean=   50942.86 standard-deviation=97.69
	median= 50974.11 median-absolute-deviation=34.25
	maximum=51019.23 minimum=50771.60
```

Write path (after):

```
instructions_per_op:
	mean=   51000.15 standard-deviation=115.04
	median= 51043.93 median-absolute-deviation=52.19
	maximum=51065.81 minimum=50795.00
```

Fixes: https://github.com/scylladb/scylladb/issues/14067
Refs: https://github.com/scylladb/scylladb/issues/2871

No backport, as it is a new feature.

Closes scylladb/scylladb#23917

* github.com:scylladb/scylladb:
  tests/cluster: Add new storage tests
  test/scylla_cluster: Override workdir when passed via cmdline
  streaming: Reject incoming migrations
  storage_service: extend locator::load_stats to collect per-node critical disk utilization flag
  repair_service: Add a facility to disable the service
  compaction_manager: Subscribe to out of space controller
  compaction_manager: Replace enabled/disabled states with running state
  database: Add critical_disk_utilization mode database can be moved to
  disk_space_monitor: add subscription API for threshold-based disk space monitoring
  docs: Add feature documentation
  config: Add critical_disk_utilization_level option
  replica/exceptions: Add a new custom replica exception
2025-08-30 18:47:57 +03:00
Petr Gusev
898531fe7c client_state: decoroutinize check_internal_table_permissions
This function is on a hot path; better to avoid allocating
coroutine frames.

Fixes scylladb/scylladb#25501

Closes scylladb/scylladb#25689
2025-08-30 18:46:54 +03:00
Avi Kivity
5c4a8ee134 Update seastar submodule
* seastar 0a90f7945...c2d989333 (7):
  > Add missing `#pragma once` to response_parser.rl
  > simple-stream: avoid memcpy calls in fragmented streams for constant sizes
  > reactor: Move stopping activity out of main loop
  > Add sequential buffer size options to IOTune
  > disable exception interception when ASAN enabled
  > file, io_queue: Drop maybe_priority_class_ref{} from internal calls
  > reactor: Equip make_task() and lambda_task with concepts

Closes scylladb/scylladb#25737
2025-08-30 14:53:34 +03:00
Calle Wilund
cc9eb321a1 commitlog: Ensure segment deletion is re-entrant
Fixes #25709

If we have large allocations spanning more than one segment, and
the internal references from the lead segment to secondary segments
are the only thing keeping a segment alive, the implicit drop in
discard_unused_segments and orphan_all can cause a recursive call
to discard_unused_segments, which in turn can lead to vector
corruption/crash, or even a double free of a segment (iterator confusion).

We need to separate the modification of the vector (_segments) from
the actual releasing of the objects. Using temporaries is the easiest
solution.

To further reduce recursion, we can also do an early clear of
segment dependencies in callbacks from segment release (cf release).
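
The fix can be modeled in a few lines (illustrative Python mirroring the structure, not the commitlog code itself):

```python
class Segment:
    def __init__(self, mgr, keeps=None):
        self.mgr = mgr
        self.keeps = keeps   # secondary segment kept alive only by this one
        self.refs = 0

    def unused(self):
        return self.refs == 0

    def release(self):
        # Dropping the reference to the secondary segment may re-enter
        # discard_unused_segments() on the manager.
        if self.keeps is not None:
            self.keeps.refs -= 1
            self.keeps = None
            self.mgr.discard_unused_segments()

class SegmentManager:
    def __init__(self):
        self._segments = []

    def discard_unused_segments(self):
        # Move unused segments into a temporary FIRST, so any recursive
        # call triggered by release() sees a consistent _segments vector
        # instead of one being mutated (and iterated) under it.
        unused = [s for s in self._segments if s.unused()]
        self._segments = [s for s in self._segments if not s.unused()]
        for s in unused:
            s.release()
```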

Closes scylladb/scylladb#25719
2025-08-30 08:24:57 +02:00
Piotr Dulikowski
7ccb50514d Merge 'Introduce view building coordinator' from Michał Jadwiszczak
This patch introduces `view_building_coordinator`, a single entity within the whole cluster responsible for building tablet-based views.

The view building coordinator takes a slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one for each tablet replica of the base table.
The coordinator builds one base table at a time and can choose another once all views of the currently processed base table are built.
The tasks are started by setting the `STARTED` state and are executed by the node-local view building worker. The tasks are scheduled so that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views, but then all of these tasks share the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends the `work_on_view_building_tasks` RPC to start the tasks and receive their results.
This RPC is resilient to RPC failures and raft leader changes: if one RPC call started a batch of tasks but then failed (for instance, the raft leader changed and the caller aborted waiting for the response), the next RPC call will attach itself to the already started batch.

The coordinator plugs into the handling of tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts the necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation.
If the operation fails at any point, the aborted tasks are rolled back.

The view building coordinator can also handle staging sstables using process_staging view building tasks. We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149).

For detailed description check: `docs/dev/view-building-coordinator.md`

Fixes https://github.com/scylladb/scylladb/issues/22288
Fixes https://github.com/scylladb/scylladb/issues/19149
Fixes https://github.com/scylladb/scylladb/issues/21564
Fixes https://github.com/scylladb/scylladb/issues/17603
Fixes https://github.com/scylladb/scylladb/issues/22586
Fixes https://github.com/scylladb/scylladb/issues/18826
Fixes https://github.com/scylladb/scylladb/issues/23930

---

This PR is a reimplementation of https://github.com/scylladb/scylladb/pull/21942

Closes scylladb/scylladb#23760

* github.com:scylladb/scylladb:
  test/cluster: add view build status tests
  test/cluster: add view building coordinator tests
  utils/error_injection: allow to abort `injection_handler::wait_for_message()`
  test: adjust existing tests
  utils/error_injection: add injection with `sleep_abortable()`
  db/view/view_builder: ignore `no_such_keyspace` exception
  docs/dev: add view building coordinator documentation
  db/view/view_building_worker: work on `process_staging` tasks
  db/view/view_building_worker: register staging sstable to view building coordinator when needed
  db/view/view_building_worker: discover staging sstables
  db/view/view_building_worker: add method to register staging sstable
  db/view/view_update_generator: add method to process staging sstables instantly
  db/view/view_update_generator: extract generating updates from staging sstables to a method
  db/view/view_update_generator: ignore tablet-based sstables
  db/view/view_building_coordinator: update view build status on node join/left
  db/view/view_building_coordinator: handle tablet operations
  db/view: add view building task mutation builder
  service/topology_coordinator: run view building coordinator
  db/view: introduce `view_building_coordinator`
  db/view/view_building_worker: update built views locally
  db/view: introduce `view_building_worker`
  db/view: extract common view building functionalities
  db/view: prepare to create abstract `view_consumer`
  message/messaging_service: add `work_on_view_building_tasks` RPC
  service/topology_coordinator: make `term_changed_error` public
  db/schema_tables: create/cleanup tasks when an index is created/dropped
  service/migration_manager: cleanup view building state on drop keyspace
  service/migration_manager: cleanup view building state on drop view
  service/migration_manager: create view building tasks on create view
  test/boost: enable proxy remote in some tests
  service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()`
  service/migration_manager: coroutinize `prepare_new_view_announcement()`
  service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine`
  service: reload `view_building_state_machine` on group0 apply()
  service/vb_coordinator: add currently processing base
  db/system_keyspace: move `get_scylla_local_mutation()` up
  db/system_keyspace: add `view_building_tasks` table
  db/view: add view_building_state and views_state
  db/system_keyspace: add method to get view build status map
  db/view: extract `system.view_build_status_v2` cql statements to system_keyspace
  db/system_keyspace: move `internal_system_query_state()` function earlier
  db/view: ignore tablet-based views in `view_builder`
  gms/feature_service: add VIEW_BUILDING_COORDINATOR feature
2025-08-29 17:28:44 +02:00
Aleksandra Martyniuk
7fe1ad1f63 tasks: return task::impl from make_and_start_task
Currently, make_and_start_task returns a pointer to task_manager::task
that hides the implementation details. If we need to access
the implementation (e.g. because we want a task to "return" a value),
we need to create and start the task step by step explicitly.

Return task_manager::task::impl from make_and_start_task. Use it
where possible.

Fixes: https://github.com/scylladb/scylladb/issues/22146.
2025-08-29 17:12:07 +02:00
Aleksandra Martyniuk
0844a057d1 compaction: use current_task_type 2025-08-29 17:08:00 +02:00
Łukasz Paszkowski
e34deea50e tests/cluster: Add new storage tests
The storage submodule contains tests that require mounted volumes
to be executed. The volumes are created automatically with the
`volumes_factory` fixture.

The tests in this suite are executed with the custom launcher
`unshare -mr pytest`

Test scenarios (when one node reaches critical disk utilization):
1. Reject user table writes
2. Disable/Enabled compaction
3. Reject split compactions
4. New split compactions not triggered
5. Abort tablet repair
6. Disable/Enabled incoming tablet migrations
7. Restart a node while a tablet split is triggered
2025-08-29 14:56:13 +02:00
Łukasz Paszkowski
4bb5696a5d test/scylla_cluster: Override workdir when passed via cmdline
Currently, workdir is set in the ScyllaCluster constructor and does
not take into account that the value could be overridden via cmdline
arguments. When this happens, some files (logs, configs) are stored
under one path and others (data) under a different one.

The patch allows overriding the value when it is passed via cmdline
arguments, so that all files are stored under the same path.
2025-08-29 14:56:13 +02:00
Łukasz Paszkowski
7cfedb1214 streaming: Reject incoming migrations
When a replica operates in the critical disk utilization mode, all
incoming migrations are rejected by rejecting the incoming sstable
files.

In the topology_coordinator, the rejected tablet is moved into the
cleanup_target state in order to revert the migration. Otherwise, a
retry happens and the cluster stays in the tablet_migration transition
state, preventing any other topology changes from happening, e.g.
scaling out.

Once the tablet migration is rejected, the load balancer will schedule
a new migration.
2025-08-29 14:56:13 +02:00
Łukasz Paszkowski
54201960e6 storage_service: extend locator::load_stats to collect per-node critical disk utilization flag
This commit extends the TABLE_LOAD_STATS RPC with information on whether
a node operates in the critical disk utilization mode.

This information will be needed to distinguish between the causes of an
interrupted tablet migration/repair.
2025-08-29 14:56:13 +02:00
Łukasz Paszkowski
9809800aa8 repair_service: Add a facility to disable the service
The repair service currently has two functions, stop() and shutdown(),
that stop all ongoing repairs and prevent any further repairs from being
started.

The repair_service can be stopped only once; once stopped, it cannot
be restarted. We would like, however, to enable/disable the repair
service many times.
many times.

Similarly to compaction_manager, the repair service provides two new functions:
- drain() - aborts all ongoing local repair tasks and disables the service,
            i.e. no new local tasks will be scheduled and data received from
            the repair master is rejected. It is still possible, though, to
            schedule a global repair request
- enable() - enables the service

By default, the repair service is enabled immediately once started.

For tablet-based keyspaces, the new facility prevents tablets from being
repaired. Whenever the repair_service is disabled and the request to repair
a tablet arrives, an exception is returned.

Once the exception is thrown, the tablet is moved into the end_repair
state and the operation will be retried later. Hence, disabling the service
does not fail the global tablet repair request.
2025-08-29 14:56:13 +02:00
Łukasz Paszkowski
9539e80e54 compaction_manager: Subscribe to out of space controller 2025-08-29 14:56:07 +02:00
Aleksandra Martyniuk
f3b43b6384 repair: add new param to tablet_repair_task_impl
Currently, sched_info is set immediately after tablet_repair_task_impl
is created.

Pass this param to constructor instead. It's a preparation for
the following changes.
2025-08-29 14:37:00 +02:00
Aleksandra Martyniuk
57b47e282e repair: add new params to shard_repair_task_impl
Currently, neighbors and small_table_optimization_ranges_reduced_factor
are set immediately after shard_repair_task_impl is created.

Pass these params to constructor instead. It's a preparation for following
changes.
2025-08-29 14:27:00 +02:00
Aleksandra Martyniuk
6a0d8728de repair: pass argument by value
shard_repair_task_impl constructor gets some of its arguments by const
reference. Due to that those arguments are copied when they could be
moved.

Get shard_repair_task_impl constructor arguments by value. Use std::move
where possible.
2025-08-29 14:24:47 +02:00
Łukasz Paszkowski
40c40be8a6 compaction_manager: Replace enabled/disabled states with running state
Using a single state variable to keep track whether compaction
manager is enabled/disabled is insufficient, as multiple services
may independently request compactions to be disabled.

To address this, a counter is introduced to record how many times
the compaction manager has been drained. The manager is considered
enabled only when this counter reaches zero.

With the counter introduced, the enabled and disabled states become
obsolete, so they are replaced with a single running state.
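
The counter-based scheme can be sketched as (hypothetical Python; method names mirror the commit message, not the actual compaction_manager API):

```python
class CompactionManager:
    def __init__(self):
        self._drain_count = 0   # how many services asked to disable us

    def drain(self):
        self._drain_count += 1
        # ...stop ongoing compactions here...

    def enable(self):
        assert self._drain_count > 0
        self._drain_count -= 1

    def compactions_enabled(self):
        # Enabled only when no service holds an outstanding drain.
        return self._drain_count == 0
```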
2025-08-29 13:47:01 +02:00
Łukasz Paszkowski
3d03b88719 database: Add critical_disk_utilization mode database can be moved to
When the database operates in the critical disk utilization mode, all
mutation writes (including inserts, updates, deletes, counter updates,
hints, read+repair, and LWT writes) to user tables and to the tables
associated with them, such as views, CDC log, and audit, are rejected
with a clear error exception.

The mode is meant to be used with the disk space monitor in order
to prevent any user writes when node's disk utilization is too high.
2025-08-29 13:46:45 +02:00
Lakshmi Narayanan Sreethar
ce0c29e024 types/comparable_bytes: add compatibility testcases for collection types
This patch adds compatibility testcases for the following cql3 types:
set, list, map, tuple, vector, and reversed types.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
4547f6f188 types/comparable_bytes: update compatibility testcase to support collection types
The `abstract_type::from_string()` method used to parse the input data
doesn't support collections yet. So the collection testdata will be
passed as JSON strings to the testcase. This patch updates the testcase
to adapt to this workaround.

Also, extended the testcase to verify that Scylla's implementation can
successfully decode the byte comparable output encoded by Cassandra.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
0997b3533c types/comparable_bytes: support empty type
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
b799101a09 types/comparable_bytes: support reversed types
A reversed type is first encoded using the underlying type and then all
the bits are flipped to ensure that the lexicographical sort order is
reversed. During decode, the bytes are flipped first and then decoded
using the underlying type.
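
The flip itself is a one-liner (Python sketch; the ordering property is shown here for equal-length encodings — the real format's termination rules make it hold in general):

```python
def flip_bits(encoded: bytes) -> bytes:
    # Invert every bit so that lexicographic comparison is reversed.
    return bytes(b ^ 0xFF for b in encoded)
```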

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
6c2a3e2c51 types/comparable_bytes: support vector cql3 type
The CQL vector type encoding is similar to the lists, where each element
is transformed into a byte-comparable format and prefixed with a
component marker. The sequence is terminated with a terminator marker to
indicate the end of the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
1ccfe522f1 types/comparable_bytes: support tuple and UDT cql3 type
The CQL tuple and UDT types share the same internal implementation and
therefore use the same byte comparable encoding. The encoding is similar
to lists, where each element is transformed into a byte-comparable
format and prefixed with a component marker. The sequence is terminated
with a terminator marker to indicate the end of the collection.

TODO: Add duplicate test items to maps, lists and sets
      For maps, add more entries that share keys
      ex map1 : key1 : value1, key2 : value2
         map2 : key1 : value4
         map3 : key2 : value5 etc

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
ca38c15a97 types/comparable_bytes: support map cql3 type
The CQL map type is encoded as a sequence of key-value pairs. Each key
and each value is individually prefixed with a component marker, and the
sequence is terminated with a terminator marker to indicate the end of
the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
Lakshmi Narayanan Sreethar
4d5e5f0c84 types/comparable_bytes: support set and list cql3 types
The CQL set and list types are encoded as a sequence of elements, where
each element is transformed into a byte-comparable format and prefixed
with a component marker. The sequence is terminated with a terminator
marker to indicate the end of the collection.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:22 +05:30
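A minimal model of this scheme (Python for illustration; the marker values below are made up for the sketch, the real constants live in the Scylla sources). The terminator must compare lower than the component marker so that a collection sorts before any collection it is a prefix of:

```python
NEXT_COMPONENT = 0x40   # illustrative value; must be greater than TERMINATOR
TERMINATOR = 0x38       # so a shorter collection sorts before any extension

def encode_collection(elements, encode_element) -> bytes:
    """Encode a set/list: each element is prefixed with a component
    marker and the whole sequence ends with a terminator marker."""
    out = bytearray()
    for e in elements:
        out.append(NEXT_COMPONENT)
        out += encode_element(e)
    out.append(TERMINATOR)
    return bytes(out)
```

With this layout, plain lexicographic comparison of the encoded bytes orders collections element by element, with shorter collections first.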
Lakshmi Narayanan Sreethar
8e46e8be01 types/comparable_bytes: introduce encode/decode_component
The components of a collection, such as an element from a list, set, or
vector; a key or value from a map; or a field from a tuple, share the
same encode and decode logic. During encode, the component is transformed
into the byte comparable format and is prefixed with the `NEXT_COMPONENT`
marker. During decode, the component is transformed back into its
serialized form and is prefixed with the serialized size.

A null component is encoded as a single `NEXT_COMPONENT_NULL` marker and
during decode, a `-1` is written to the serialized output.

This commit introduces a few helper methods that implement the
above-mentioned encode and decode logic.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:21 +05:30
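The two helpers can be sketched as follows (Python for illustration; the names and marker values are hypothetical, chosen only so that the null marker compares lower than the component marker and nulls therefore sort first). The decode side reports `-1` as the serialized length of a null component, matching the CQL serialized form of null:

```python
NEXT_COMPONENT = 0x40        # illustrative marker values; the real constants
NEXT_COMPONENT_NULL = 0x3E   # must keep NULL < NEXT_COMPONENT so that null
                             # components sort before non-null ones

def encode_component(value) -> bytes:
    """Prefix a byte-comparable component with its marker; a null
    component is just the NULL marker with no payload."""
    if value is None:
        return bytes([NEXT_COMPONENT_NULL])
    return bytes([NEXT_COMPONENT]) + value

def decode_component(encoded, pos, decode_payload):
    """Return (serialized_length, payload, next_pos); a null component
    decodes to length -1, with no payload bytes consumed."""
    if encoded[pos] == NEXT_COMPONENT_NULL:
        return -1, None, pos + 1
    payload, next_pos = decode_payload(encoded, pos + 1)
    return len(payload), payload, next_pos
```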
Lakshmi Narayanan Sreethar
47e88be6e0 types/comparable_bytes: introduce to_comparable_bytes/from_comparable_bytes
Added helper functions to_comparable_bytes() and from_comparable_bytes()
to let collection encode/decode methods invoke encode/decode of the
underlying types.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-08-29 12:26:09 +05:30
Łukasz Paszkowski
3e740d25b5 disk_space_monitor: add subscription API for threshold-based disk space monitoring
Introduce the `subscribe` method to disk_space_monitor, allowing clients to
register callbacks triggered when disk utilization crosses a configurable
threshold.

The API supports flexible trigger options, including the direction of
the threshold crossing (above/below). This enables more granular and
efficient disk space monitoring for consumers.
2025-08-28 18:06:37 +02:00
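A toy model of such a subscription API (Python; the names and shape are invented for illustration, not Scylla's actual C++ interface). The key behavior is edge-triggered delivery: a callback fires when utilization crosses the threshold in the subscribed direction, not on every poll while it stays on one side:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Direction(Enum):
    ABOVE = "above"   # fire when utilization crosses above the threshold
    BELOW = "below"   # fire when it drops back below

@dataclass
class Subscription:
    threshold: float                    # e.g. 0.9 == 90% utilization
    direction: Direction
    callback: Callable[[float], None]

class DiskSpaceMonitor:
    """Hypothetical sketch of a threshold-based subscription API."""

    def __init__(self):
        # Track (subscription, was_above) to detect crossings.
        self._subs = []

    def subscribe(self, sub: Subscription) -> None:
        self._subs.append((sub, False))

    def poll(self, utilization: float) -> None:
        for i, (sub, was_above) in enumerate(self._subs):
            is_above = utilization >= sub.threshold  # == counts as above
            crossed = is_above != was_above
            if crossed and (
                (sub.direction is Direction.ABOVE and is_above)
                or (sub.direction is Direction.BELOW and not is_above)
            ):
                sub.callback(utilization)
            self._subs[i] = (sub, is_above)
```

A consumer such as a write-rejection mechanism would subscribe once and react only to the crossing events, rather than re-checking disk usage on every write.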
Łukasz Paszkowski
c2de678a87 docs: Add feature documentation
Adds a user-facing page in /docs/troubleshooting/error-messages.
2025-08-28 18:06:37 +02:00
Łukasz Paszkowski
535c901e50 config: Add critical_disk_utilization_level option
The option defines the threshold at which the defensive mechanisms
that prevent nodes from running out of space (e.g. rejecting user
writes) are activated.

Its default value is 98% of the disk capacity.
2025-08-28 18:06:37 +02:00
Łukasz Paszkowski
132fd1e3f2 replica/exceptions: Add a new custom replica exception
The new exception `critical_disk_utilization_exception` is thrown
when user table mutation writes are blocked, e.g. because a critical
disk utilization level has been reached.

This new exception is then handled on the coordinator side, where it
is transformed into a `mutation_write_failure_exception` with a
meaningful error message: "Write rejected due to critical disk
utilization".
2025-08-28 18:06:37 +02:00
Petr Gusev
4b907c7711 storage_service: move get_host_id_to_ip_map to system_keyspace
Reimplemented the function to use the peers cache. It could be replaced
with get_ip_from_peers_table, but that would create a coroutine frame for
each call.
2025-08-28 12:48:46 +02:00