We store the per-shard chunk count in a uint64_t vector
global_offset, and then convert the counts to offsets with
a prefix sum:
```c++
// [1, 2, 3, 0] --> [0, 1, 3, 6]
std::exclusive_scan(global_offset.begin(), global_offset.end(),
                    global_offset.begin(), 0, std::plus());
```
However, std::exclusive_scan takes the accumulator type from the
initial value, 0, which is an int, rather than from the range being
iterated, which is uint64_t.
As a result, the prefix sum is computed as a 32-bit integer value. If
it exceeds 0x8000'0000, it becomes negative, is then sign-extended to
64 bits, and is stored as a huge 64-bit number. Later on,
we try to find an sstable with this chunk and fail, crashing on
an assertion.
An example of the failure can be seen here: https://godbolt.org/z/6M8aEbo57
The fix is simple: the initial value is passed as uint64_t instead of int.
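A minimal self-contained sketch of the fixed call; spelling the initial value as `uint64_t` makes it the accumulator type:
```c++
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<uint64_t> global_offset{1, 2, 3, 0};
    // With a uint64_t initial value, std::exclusive_scan accumulates in
    // 64 bits, so the prefix sum can no longer wrap at 0x8000'0000.
    std::exclusive_scan(global_offset.begin(), global_offset.end(),
                        global_offset.begin(), uint64_t{0}, std::plus<>());
    // global_offset is now [0, 1, 3, 6]
}
```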
Fixes https://github.com/scylladb/scylladb/issues/27417
Closes scylladb/scylladb#27418
This PR adds support for limiting the maximum shares allocated to a
compaction scheduling class by the compaction controller. It introduces
a new configuration parameter, compaction_max_shares, which, when set
to a non-zero value, will cap the shares allocated to compaction jobs.
This PR also exposes the shares computed by the compaction controller
via metrics, for observability purposes.
Fixes https://github.com/scylladb/scylladb/issues/9431
Enhancement. No need to backport.
NOTE: Replaces PR https://github.com/scylladb/scylladb/pull/26696
Ran a test in which the backlog raised the need for max shares (normalized backlog above normalization_factor), and experimented with different values of the new compaction_max_shares option (500, 1000, 2000, 250, 50) to verify that it works.
Closes scylladb/scylladb#27024
* github.com:scylladb/scylladb:
db/config: introduce new config parameter `compaction_max_shares`
compaction_manager:config: introduce max_shares
compaction_controller: add configurable maximum shares
compaction_controller: introduce `set_max_shares()`
Add a method to dynamically adjust the maximum output of control points
in the compaction controller. This is required for supporting runtime
configuration of the maximum shares allocated to the compaction process
by the controller.
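As a hedged sketch only (the member and helper names are illustrative, not the actual controller code), the cap could be applied like this:
```c++
// Clamp the controller's computed shares; a maximum of 0 means "no cap",
// matching the behavior of the compaction_max_shares option described above.
float compaction_controller::cap_shares(float shares) const {
    if (_max_shares > 0) {
        return std::min(shares, _max_shares);
    }
    return shares;
}
```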
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.
A new header, `stdafx.hh`, is added; don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.
The feature is enabled by default; use the CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable it.
The feature should be disabled when checking headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.
Note: the following configuration needs to be added to ccache.conf:
sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime
Closes scylladb/scylladb#26617
Currently sstables_manager keeps a reference on the global db::config to configure itself. Most other services use their own specific configs, with much less data on board, for the same purpose (e.g. #24841, #19051 and #23705 did the same for other services). This PR applies this approach to sstables_manager as well.
Mostly it moves various values from db::config onto the newly introduced struct sstables_manager::config, but it also adds specific tracking of sstable_file_io_extensions and patches tools/scylla-sstable not to use sstables_manager as a "proxy" object for getting db::config along its calls.
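A rough sketch of the resulting service-specific config, with field names taken from the commit list below (the types are assumptions):
```c++
// Hypothetical shape of the new struct: only the knobs sstables_manager
// actually needs, instead of a reference to the whole db::config.
struct config {  // nested in sstables_manager
    utils::updateable_value<double> sstable_summary_ratio;
    utils::updateable_value<uint32_t> column_index_size;
    utils::updateable_value<uint32_t> column_index_auto_scale_threshold;
    bool enable_sstable_key_validation = false;
    std::vector<sstring> data_file_directories;
    size_t available_memory = 0;
};
```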
Shuffling components dependencies, no need to backport
Closes scylladb/scylladb#27021
* github.com:scylladb/scylladb:
sstables_manager: Drop db::config from sstables_manager
tools/sstable: Make shard_of_with_tablets use db::config argument
tools/sstable: Add db::config& to all operations
tools/sstable: Get endpoints from storage manager
sstables_manager: Hold sstable IO extensions on it
sstables: Manager helper to grab file io extensions
sstables_manager: Move default format on config
sstables_manager: Move enable_sstable_data_integrity_check on config
sstables_manager: Move data_file_directories on config
sstables_manager: Move components_memory_reclaim_threshold on config
sstables_manager: Move column_index_auto_scale_threshold on config
sstables_manager: Move column_index_size on config
sstables_manager: Move sstable_summary_ratio on config
sstables_manager: Move enable_sstable_key_validation on config
sstables_manager: Move available_memory on config
code: Introduce sstables_manager::config
sstables: Patch get_local_directories() to work on vector of paths
code: Rename sstables_manager::config() into db_config()
Consider the following:
1) single-key read starts, blocks on replica e.g. waiting for memory.
2) the same replica is migrated away
3) single-key read expires, coordinator abandons it, releases erm.
4) migration advances to cleanup stage, barrier doesn't wait on
timed-out read
5) compaction group of the replica is deallocated on cleanup
6) that single-key read resumes, but doesn't find the sstable set (post cleanup)
7) with abort-on-internal-error turned on, node crashes
It's fine for abandoned (= timed out) reads to fail, since the
coordinator is gone.
For active reads (non timed out), the barrier will wait for them
since their coordinator holds erm.
The solution consists of failing reads whose underlying tablet
replica has been cleaned up, by converting the internal error
into a plain exception.
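A hedged sketch of that conversion (the check and names are illustrative):
```c++
// If the tablet replica's compaction group is gone (cleaned up), fail the
// resumed read with a plain exception instead of an internal error, so
// abort-on-internal-error no longer crashes the node.
if (!has_storage_group(tablet_id)) {
    throw std::runtime_error(fmt::format(
            "read failed: tablet replica {} was cleaned up", tablet_id));
}
```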
Fixes #26229.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#27078
This patch adds a metric for pre-compression size of sstable files.
This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.
As for the implementation:
Even before this patch, tables and sstable sets were already tracking their total physical file size.
Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats.
To implement the new metric, we turn the size delta that is passed around from a one-dimensional into a two-dimensional value, which includes both the physical and the pre-compression size.
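A minimal sketch of that two-dimensional delta, under illustrative names:
```c++
// Replaces the former plain byte counter: both sizes travel together
// whenever an sstable is added to or removed from a set.
struct disk_space_delta {
    int64_t on_disk = 0;             // actual (post-compression) file size
    int64_t before_compression = 0;  // hypothetical uncompressed Data.db size
    disk_space_delta& operator+=(const disk_space_delta& o) {
        on_disk += o.on_disk;
        before_compression += o.before_compression;
        return *this;
    }
};
```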
New functionality, no backport needed.
Closes scylladb/scylladb#26996
* github.com:scylladb/scylladb:
replica/table: add a metric for hypothetical total file size without compression
replica/table: keep track of total pre-compression file size
Currently the manager holds a reference on db::config, and when sstables IO
extensions are needed it grabs them from this config. Since db::config
is going to be removed from the sstables manager, it should keep
track of either all config extensions or only those it needs. This patch
makes the latter choice and keeps a reference to sstable_file_io_ext. on the
manager. The reference is passed as a constructor argument, not via the
manager config, but that is an arbitrary choice; there is no specific
reason not to put it on the config itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It defaults explicitly to the `me` type, but places that can write sstables
override it with the db::config value: replica::database, tests and the
scylla-sstable tool.
Live-updateable, so use updateable_value<> type.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set its default value to the one from db/config.cc. Only
replica::database may want to re-configure it. Also not live-updateable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set its default value to the one from db/config.cc. Only the
replica::database and tests may want to re-configure it.
This one is live-updateable, so use updateable_value<> type.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This PR enables integrity check of both checksum and digest for repair/streaming.
In the past, streaming readers only verified the checksum of compressed SSTables.
This change extends the checks to include the digest and the checksum (CRC) for both compressed and uncompressed SSTables. These additional checks require reading the digest and CRC components from disk, which may cause some I/O overhead. For uncompressed SSTables, this involves loading the data and computing checksums and a digest from it, while for compressed SSTables - where checksums are already verified inline - the only extra cost is reading and verifying the digest. If the reader range doesn't cover the full SSTable, the digest is not loaded and the check is skipped.
To support testing of these changes, a new option was added to the random_mutation_generator that allows disabling compression.
Several new test cases were added to verify that the repair_reader correctly detects corruption. These tests corrupt digest or data component of an SSTable and confirm that the system throws the expected `malformed_sstable_exception`.
Backport is not required, it is an improvement
Refs #21776
Closes scylladb/scylladb#26444
* github.com:scylladb/scylladb:
boost/repair_test: add repair reader integrity verification test cases
test/lib: allow to disable compression in random_mutation_generator
sstables: Skip checksum and digest reads for unlinked SSTables
table: enable integrity checks for streaming reader
table: Add integrity option to table::make_sstable_reader()
sstables: Add integrity option to create_single_key_sstable_reader
Set its default value to the one from db/config.cc. Only the
replica::database may want to re-configure it.
This one is live-updateable, so use updateable_value<> type.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set its default value to the one from db/config.cc. Only
replica::database may want to re-configure it. Also not live-updateable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set its default value to the one from db/config.cc. Only
replica::database may want to re-configure it. Also not live-updateable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make it OFF by default and update only those callers that may have it
ON -- replica::database, tests and the scylla-sstable tool.
Also not live-updateable, so plain bool.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, this parameter is passed to sstables_manager as an explicit
constructor argument.
Also, it's not live-updateable, so it gets a plain size_t type.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is a specific configuration for sstables_manager. All places that
construct the sstables manager are updated to provide the config to it.
For now the config is empty and exists alongside db::config. Further
patches will populate the former with data, and the latter will
eventually be removed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.
Every table and sstable set keeps track of the total file size
of contained sstables.
Due to a feature request, we also want to keep track of the hypothetical
file size if Data files were uncompressed, to add a metric that
shows the compression ratio of sstables.
We achieve this by replacing the relevant `uint64_t bytes_on_disk`
counters everywhere with a struct that contains both the actual
(post-compression) size and the hypothetical pre-compression size.
This patch isn't supposed to change any observable behavior.
In the next patch, we will use these changes to add a new metric.
When applying a counter mutation, use apply_on_shards to apply the
mutation on all write shards, similarly to the way other mutations are
applied in the storage proxy. Previously the mutation was applied only
on the current shard which is the read shard.
This is needed to respect the write_both stages of intranode migration
where we need to apply the mutation on both the old and the new shards.
Refactor the counter update to split the functions and have them called
by the storage proxy to prepare for a later change.
Previously, in mutate_counter, the storage proxy called the replica
function apply_counter_update, which does a few things:
1. checks that the operation can be done: check timeout, disk utilization
2. acquire counter locks
3. do read-modify-write and transform the counter mutation
4. apply the mutation in the replica
In this commit we change it so that these functions are split and called
from the storage proxy, so that we have better control from the storage
proxy when we change it later to work across multiple shards. For
example, we will want to acquire locks on multiple shards, transform it
on one shard, and then apply the mutation on multiple shards.
After the change, it works as follows in the storage proxy (a sketch follows the list):
1. acquire counter locks
2. call replica prepare to check the operation and transform the mutation
3. call replica apply to apply the transformed mutation
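A hedged sketch of that sequence; the function names follow the description above, not the actual signatures:
```c++
// 1. take the counter locks, 2. prepare (read-modify-write transform) on one
// shard, 3. apply the transformed mutation, later on possibly several shards.
auto guard = co_await acquire_counter_locks(db, m, timeout);
auto update = co_await db.prepare_counter_update(m, guard, timeout);
co_await db.apply_counter_update(std::move(update), guard, timeout);
```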
Add a RAII guard for counter update that holds the counter locks and the
table operation, and extract the creation of the guard to a separate
function.
This prepares it for a later change where we will want to obtain the
guard externally from the storage proxy.
The compaction manager backlog is exposed via metrics, but if static
shares are set, the backlog is never calculated. As a result, there is
no way to determine the backlog or whether the static shares need
adjustment. Fix that by calculating the backlog even when static shares are
set.
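A sketch of the fix under assumed names: compute the backlog unconditionally so the metric stays meaningful, and let static shares only bypass the controller's output:
```c++
// The backlog now always feeds the exported metric; static shares merely
// skip the share-adjustment step.
float backlog = calculate_backlog();
_stats.backlog = backlog;            // exposed via metrics
if (!_static_shares) {
    update_shares(backlog);
}
```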
Fixes #26287
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#26778
This change adds the ability to move tablet sizes in load_stats after a tablet migration or a table resize (split/merge). This is needed because the size-based load balancer needs tablet size data that is as accurate as possible, in order to work on a fresh tablet size distribution and issue correct tablet migrations.
This is the second part of the size based load balancing changes:
- First part for tablet size collection via load_stats: #26035
- Second part reconcile load_stats: #26152
- The third part for load_sketch changes: #26153
- The fourth part which performs tablet load balancing based on tablet size: #26254
This is a new feature and backport is not needed.
Closes scylladb/scylladb#26152
* github.com:scylladb/scylladb:
load_balancer: load_stats reconcile after tablet migration and table resize
load_stats: change data structure which contains tablet sizes
Previously, streaming readers only verified the checksum of compressed SSTables.
This patch extends the checks to also include the digest and the uncompressed checksum (CRC).
These additional checks require reading the digest and CRC components from disk,
which may cause some I/O overhead. For uncompressed SSTables, this involves loading and computing checksums and digest from the data,
while for compressed SSTables - where checksums are already verified inline - the only extra cost is reading and verifying the digest.
If the reader range doesn't cover the full SSTable, the digest check is skipped.
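An illustrative-only sketch of that rule (the call shapes are assumptions):
```c++
// The digest covers the whole Data.db file, so it can only be verified when
// the read spans the full sstable; partial reads keep the CRC checks only.
if (integrity == sstables::integrity_check::yes && covers_full_sstable(range)) {
    co_await load_digest(sst);  // verified once the stream is fully consumed
}
```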
Added an sstables::integrity_check parameter to create_single_key_sstable_reader methods across its implementations.
This allows callers to enable SSTable integrity checks during single-key reads.
Before this patch, when a base table has many materialized views,
each write to this table can start up to 128 view updates in parallel.
With high client write concurrency, the actual concurrency of writes
executed on the node may grow unexpectedly, which can lead to higher
latency and higher memory usage compared to a sequential approach.
In this patch we add a per-shard, per-service-level semaphore which
limits the number of concurrent view updates processed on the shard
in this service level to a constant value. We take one unit from the
semaphore for each local view update write, and release it when it
finishes. The remote view updates do not take units from the semaphore
because they don't consume nearly as much processing power and they
are limited by another semaphore based on their memory usage.
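A minimal sketch of the described limiting, assuming a per-service-level `seastar::semaphore` (names are illustrative):
```c++
// Each local view update takes one unit, capping concurrency at the
// semaphore's initial count; remote updates bypass this semaphore.
future<> apply_local_view_update(mutation m) {
    auto units = co_await seastar::get_units(_view_update_sem, 1);
    co_await do_apply_view_update(std::move(m));
    // the unit is released when `units` goes out of scope
}
```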
Fixes https://github.com/scylladb/scylladb/issues/25341
Closes scylladb/scylladb#25456
* github.com:scylladb/scylladb:
mv: limit concurrent view updates from all sources
database: rename _view_update_concurrency_sem to _view_update_memory_sem
Before this patch, when a base table has many materialized views,
each write to this table can start up to 128 view updates in parallel.
With high client write concurrency, the actual concurrency of writes
executed on the node may grow unexpectedly, which can lead to higher
latency and higher memory usage compared to a sequential approach.
In this patch we add a per-shard, per-service-level semaphore which
limits the number of concurrent view updates processed on the shard
in this service level to a constant value. We take one unit from the
semaphore for each local view update write, and release it when it
finishes. The remote view updates do not take units from the semaphore
because they don't consume nearly as much processing power and they
are limited by another semaphore based on their memory usage.
The effect of this patch can also be observed when writing to a base
table with a large number of materialized views, like in the
materialized_views_test.py::TestMaterializedViews::test_many_mv_concurrent
dtest. In that test, if we perform a full scan in parallel to a write
workload with a concurrency of 100 to a table with 100 views, the scan
would sometimes time out because it would effectively get 1/10,000 of the CPU.
With this patch, the CPU concurrency of view updates was limited to 128
(we ran both writes and scan in the same service level), and the scan
no longer timed out.
Fixes https://github.com/scylladb/scylladb/issues/25341
Problems addressed by this PR
* Missing barrier before cleanup: If a node was bootstrapped before cleanup, some request coordinators could still be in `write_both_read_new` and send stale requests to replicas being cleaned up.
* Sessions not drained before cleanup: We lacked protection against stale streaming or repair operations.
* `sstable_vnodes_cleanup_fiber()` calling `flush_all_tables()` under group0 lock: This caused SCT test failures (see [this comment](https://github.com/scylladb/scylladb/issues/25333#issuecomment-3298859046) for details).
* Issues with `storage_proxy::start_write()` used by `sstable_vnodes_cleanup_fiber`:
* The result of `start_write()` was not held during `abstract_write_response_handler::apply_locally`, so coordinator-local writes were not properly awaited.
* Synchronization was racy: `start_write()` was not atomic with the fence check, allowing stale writes to sneak in if `fence_version` changed in between (see the sketch after this list).
* It waited for all writes, including local tables and tablet-based tables, which is redundant because `sstable_vnodes_cleanup_fiber` does not apply to them.
* It also waited for writes with versions greater than the current `fence_version`, which is unnecessary.
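A hedged sketch of the atomic fence check, following the `run_fenceable_write` commit below (types and signatures are assumptions):
```c++
// The write operation is registered and the fence version is checked in the
// same scope, so a fence bump can no longer slip in between the two.
future<> storage_proxy::run_fenceable_write(fencing_token token, mutation m) {
    auto op = start_write();               // held across the whole local apply
    if (token.topology_version < _fence_version) {
        throw stale_topology_exception(token.topology_version, _fence_version);
    }
    co_await apply_locally(std::move(m));  // op keeps the write awaited
}
```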
Fixes scylladb/scylladb#26150
backport: this PR fixes several issues with the vnodes cleanup procedure, but they don't seem critical enough to deserve backporting
Closes scylladb/scylladb#26315
* https://github.com/scylladb/scylladb:
test_automatic_cleanup: add test_cleanup_waits_for_stale_writes
test_fencing: fix due to new version increment
test_automatic_cleanup: clean it up
storage_proxy: wait for closing sessions in sstable cleanup fiber
storage_proxy: rename await_pending_writes -> await_stale_pending_writes
storage_proxy: use run_fenceable_write
storage_proxy: abstract_write_response_handler: apply_locally: extract post fence check
storage_proxy: introduce run_fenceable_write
storage_proxy: move update_fence_version from shared_token_metadata
storage_proxy: fix start_write() operation scope in apply_locally
storage_proxy: move post fence check into handle_write
storage_proxy: move fencing into mutate_counter_on_leader_and_replicate
storage_proxy::handle_read: add fence check before get_schema
storage_service: rebrand cleanup_fiber to vnodes_cleanup_fiber
sstable_cleanup_fiber: use coroutine::parallel_for_each
storage_service: sstable_cleanup_fiber: move flush_all_tables out of the group0 lock
topology_coordinator: barrier before cleanup
topology_coordinator: small start_cleanup refactoring
global_token_metadata_barrier: add fenced flag
In theory, scylla-sstable write is an awesome and flexible tool to generate sstables with arbitrary content. This is convenient for tests and could come in clutch in a disaster scenario where certain system tables' content needs to be manually re-created -- system tables that are not writable directly via CQL.
In practice, in its current form this operation is so convoluted to use that even its own author shuns it. This is because the JSON specification of the sstable content is the same as that of the scylla-sstable dump-data output: it contains every single piece of information about the mutation content. While this is an advantage for dump-data, allowing users to inspect the data in its entirety, it is a huge disadvantage for write, because all these details have to be filled in, down to the last timestamp, to generate an sstable. On top of that, the tool doesn't even support any of the more advanced data types, like collections, UDTs and counters.
This PR proposes a new way of generating sstables: based on the success of scylla-sstable query, it introduces CQL support for scylla-sstable write. The content of the sstable can now be specified via standard INSERT, UPDATE and DELETE statements, which are applied to a memtable, then flushed into the sstable.
To avoid boundless memory consumption, the memtable is flushed every time it reaches 1MiB in size; consequently, the command can generate multiple output sstables.
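A rough sketch of that flush loop (the 1MiB threshold is from the text; the memtable API shape is an assumption):
```c++
// Apply each parsed CQL statement to a memtable; flush and start a fresh
// memtable whenever the threshold is crossed, producing multiple sstables.
constexpr size_t flush_threshold = 1 << 20;  // 1MiB
for (auto& stmt : statements) {
    mt->apply(to_mutation(stmt));
    if (mt->occupancy().total_space() >= flush_threshold) {
        co_await write_sstable(std::move(mt));
        mt = make_memtable(schema);
    }
}
```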
The new CQL input-format is made the default; this is safe, as nobody is using this command anyway. Hopefully this PR will change that.
Fixes: https://github.com/scylladb/scylladb/issues/26506
New feature, no backport.
Closes scylladb/scylladb#26515
* github.com:scylladb/scylladb:
test/cqlpy/test_tools.py: add test for scylla-sstable write --input-format=cql
replica/mutation_dump: add support for virtual tables
tools/scylla-sstable: print_query_results_json(): handle empty value buffer
tools/scylla-sstable: add cql support to write operation
tools/scylla-sstable: write_operation(): fix indentation
tools/scylla-sstable: write_operation(): prepare for a new input-format
tools/scylla-sstable: generalize query_operation_validate_query()
tools/scylla-sstable: move query_operation_validate_query()
tools/scylla-sstable: extract schema transformation from query operation
replica/table: add virtual write hook to the other apply() overload too
`select * from mutation_fragment()` queries don't return partitions which are completely empty or only contain tombstones which are all garbage collectible. This is because the underlying `mutation_dump` mechanism has a separate query to discover partitions for scans. This query is a regular mutation scan, which is subject to query compaction and garbage collection. Disable the query compaction for mutation queries executed on behalf of mutation fragment queries, so *all* data is visible in the result, even that which is fully garbage collectible.
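A hedged sketch of the knob, based on the `tombstone_gc_state` factory commit below (the reader call is illustrative):
```c++
// Partition discovery runs with GC disabled, so partitions holding only
// purgeable tombstones still show up in MUTATION_FRAGMENTS() scans.
auto gc_state = tombstone_gc_state::no_gc();  // nothing is considered purgeable
auto rd = make_partition_scan_reader(schema, permit, range, gc_state);
```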
Fixes scylladb/scylladb#23707.
Scans for mutation-fragment are very rare, so a backport is not necessary. We can backport on-demand.
Closes scylladb/scylladb#26227
* github.com:scylladb/scylladb:
replica/mutation_dump: multi_range_partition_generator: disable garbage-collection
replica: add tombstone_gc_enabled parameter to mutation query methods
mutation/mutation_compactor: remove _can_gc member
tombstone_gc: add tombstone_gc_state factory methods for gc_all and no_gc
This patch changes the tablet size map in load_stats. Previously, this
data structure was:
std::unordered_map<range_based_tablet_id, uint64_t> tablet_sizes;
and is changed into:
std::unordered_map<table_id, std::unordered_map<dht::token_range, uint64_t>> tablet_sizes;
This allows for improved performance of tablet size reconciliation.
In the following commit, we'll introduce a new semaphore for view updates
that limits their concurrency by view update count. To avoid confusion,
we rename the existing semaphore that tracks the memory used by concurrent
view updates and related objects accordingly.
This is a follow-up of the previous fix: https://github.com/scylladb/scylladb/pull/26030
The test test_user_writes_rejection starts a 3-node cluster and
creates a large file on one of the nodes, to trigger the out-of-space
prevention mechanism, which should reject writes on that node.
It waits for the log message 'Setting critical disk utilization mode: true'
and then executes a write expecting the node to reject it.
Currently, the message is logged before the `_critical_disk_utilization`
variable is actually updated. This causes the test to fail sporadically
if it runs quickly enough.
The fix splits the logging into two steps:
1. "Asked to set critical disk utilization mode" - logged before any action
2) "Set critical disk utilization mode" - logged after `_critical_disk_utilization` has been updated
The tests are updated to wait for the second message.
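A sketch of the reordered logging (the message texts are quoted from above; the logger name is an assumption):
```c++
dblog.info("Asked to set critical disk utilization mode: {}", enabled);
_critical_disk_utilization = enabled;  // state updated before the 2nd message
dblog.info("Set critical disk utilization mode: {}", enabled);
```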
Fixes https://github.com/scylladb/scylladb/issues/26004
Closes scylladb/scylladb#26392
Integrates GCP object storage as a working storage backend for scylla sstables, as well as backup storage.
Adds an abstraction layer (at the moment heavily designed around the s3 client interface and usage) to allow the "storage" etc. layers of sstable management to pick transparently between "s3" and "gs" providers.
This modifies the scylla config such that endpoints can optionally (through a "type" param) reference a GS backend.
Similarly with storage_options.
Also adds some IO wrapping primitives to make it more feasible to place some logic at a mid level of the implementation stack (such as making networked storage files, ranged reading, etc.).
The test s3 fixture is replaced (where appropriate) with an `object_storage` fixture that multiplexes the test across both backends.
Unit tests are duplicated; the GS versions use a boost test fixture for GCS, defaulting to a local fake.
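A hedged sketch of the provider abstraction described above; the interface shape is illustrative, with `object_name` and `seekable_data_source` taken from the commit list below:
```c++
// One client interface, two providers, selected by the endpoint "type" param.
class object_storage_client {
public:
    virtual ~object_storage_client() = default;
    virtual future<> upload_object(object_name name, file f) = 0;
    virtual seekable_data_source download_object(object_name name) = 0;
};

std::unique_ptr<object_storage_client> make_client(const endpoint_config& cfg) {
    return cfg.type == provider_type::gs ? make_gs_client(cfg)
                                         : make_s3_client(cfg);
}
```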
Fixes #25359
Fixes #26453
Closes scylladb/scylladb#26186
* github.com:scylladb/scylladb:
docs::dev::object_storage: Add some initial info on GS storage
docs/dev: Add mention of (nested) docker usage in testing.md
sstables::object_storage_client: Forward memory limit semaphore to GS instance
utils::gcp::object_storage: Add optional memory limits to up/download
sstables::object_storage_client: Add multi-upload support for GS
utils::gcp::storage: Add merge objects operation
test_backup/test_basic: Make tests multiplex both s3 and gs backends
test::cluster::conftest: Add support for multiple object storage backends
boost::gcs_storage_test: reindent
boost::gcs_storage_test: Convert to use fixture
tests::boost: Add GS object storage cases to mirror S3 ones
tests::lib::gcs_fixture: Add a reusable test fixture for real/fake GS/GCS
tests::lib::test_utils: Add overloads/helpers for reading and (temp) writing env
sstables::object_storage_client: Add google storage implementation
test_services: Allow testing with GS object storage parameters
utils::gcp::gcp_credentials: Add option to create uninitialized credentials
utils::gcp::object_storage: Make create_download_source return seekable_data_source
utils::gcp::object_storage: Add defensive copies of string_view params
utils::gcp::object_storage: Add missing retry backoff increase
utils::gcp::object_storage: Add timestamp to object listing
utils::gcp::object_storage: Add paging support to list_objects
object_storage_client: Add object_name wrapper type
utils::gcp::object_storage: Add optional abort_source
utils::rest::client: Add abort_source support
sstables: Use object_storage_client for remote storage
sstables::object_storage_client: Add abstraction layer for OS clients (s3 initial)
s3::upload_progress: Promote to general util type
storage_options: Abstract s3 to "object_storage" and add gs as option
sstables::file_io_extension: Change "creator" callback to just data_source
utils::io-wrappers: Add ranged data_source
utils::io-wrappers: Add file wrapper type for seekable_source
utils::seekable_source: Add a seekable IO source type
object_storage_endpoint_param: Add gs storage as option
config: break out object_storage_endpoint_param preparing for multi storage
This patch series introduces several tests that check the number of exceptions thrown during various replica operations. The goal is to have a set of tests that can catch situations where the number of exceptions per operation increases, making exception-throw regressions easier to catch.
The tests cover the apply-counter-update and apply functionalities in the database layer.
There are more paths that could be checked, like various semaphore wait timeouts located deeper in the code; this set of tests does not cover all code paths.
Fixes #18164
This is an improvement. No backport needed.
Closes scylladb/scylladb#25992
* github.com:scylladb/scylladb:
test: cluster: test replica write timeout
database: parameterize apply_counter_update_delay_5s injector value
test: cluster: test replica exceptions - test rate limit exceptions
This patch introduces test `test_replica_database_apply_timeout`.
It tests a timeout on a database write. The test uses an error injection
that returns a timeout error if the injection `database_apply_force_timeout`
is enabled.
Refs #18164
Parameterize the `apply_counter_update_delay_5s` injector value. Instead of
sleeping 5s when the injection is active, read a parameter value that
specifies the sleep duration. To reflect these changes, the injection is
renamed to `apply_counter_update_delay_ms` and the sleep duration is
specified in milliseconds.
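A hedged sketch of the parameterized injection (the injector API shape is an assumption; the name and unit come from the message above):
```c++
co_await utils::get_local_injector().inject("apply_counter_update_delay_ms",
        [] (auto& handler) -> future<> {
    // sleep for the configured number of milliseconds instead of a fixed 5s
    auto ms = std::stoll(std::string(handler.get("value").value_or("0")));
    co_await seastar::sleep(std::chrono::milliseconds(ms));
});
```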
Refs #18164
The series adds an experimental flag for strongly consistent tables and extends the "CREATE KEYSPACE" DDL with a `consistency` option that allows specifying the consistency mode for the keyspace.
Closes scylladb/scylladb#26116
* github.com:scylladb/scylladb:
schema: Allow configuring consistency setting for a keyspace
db: experimental consistent-tablets option
This patchset improves the atomicity and clarity of schema application in
the presence of token metadata updates during schema changes. The primary
focus is to ensure that changes to tablet metadata are applied atomically
as part of the schema commit phase, rather than being replicated to all
cores afterward, which previously violated atomicity guarantees.
Key changes:
- Introduced pending_token_metadata to unify handling of new and existing metadata.
- Split token metadata replication into prepare and commit steps.
- Abstracted schema dependencies in storage_service to support pending schema visibility.
- Applied tablet metadata updates atomically within schema commit phase.
Backport: no, it's a new feature
Fixes: https://github.com/scylladb/scylladb/issues/24414
Closes scylladb/scylladb#25302
* github.com:scylladb/scylladb:
db: schema_applier: update tablet metadata atomically
db: replica: move tables_metadata locking to commit
storage_service: abstract schema dependencies during token metadata update
storage_service: split replicate_to_all_cores to steps
db: schema_applier: unify token_metadata loading
replica: schema_applier: obtain copy of token_metadata at the beginning of schema merge
service: fix dependencies during migration_manager startup
db: schema_applier: move pending_token_metadata to locator
db: always use _tablet_hint as condition for tablet metadata change
db: refactor new_token_metadata into pending_token_metadata
db: rename new_token_metadata to pending_token_metadata
db: schema_applier: move types storage init to merge_types func
db: schema_applier: make merge functions non-static members
db: remove unused proxy from create_keyspace_metadata
We want to add strongly consistent tables as an option. We will have
two kinds of strongly consistent tables: globally consistent and locally
consistent. The former means that requests from all DCs will be globally
linearisable, while the latter means that only requests to the same DC
will be linearisable. To allow configuring all the possibilities, the
patch adds a new parameter to the keyspace definition, "consistency",
which can be configured to be `eventual`, `global` or `local`. A
non-eventual setting is supported for tablets-enabled keyspaces only.
Since we want to start with implementing local consistency, configuring
global consistency will result in an error for now.
Make use of the freshly introduced facility to disable
garbage-collection on a per-query basis for range scans. This is needed
so partitions that only contain garbage-collectible data are not missing
from the partition-list. When using SELECT * FROM MUTATION_FRAGMENTS(),
the user is expecting to see *all* data, even that which is dead and
garbage-collectible.
Include a test which reproduces the issue.