scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 03:30:49 +00:00

Author	SHA1	Message	Date
copilot-swe-agent[bot]	98fafb25b2	Address code review comments: improve documentation and exception handling - Add detailed comments explaining leaf depth calculation - Document prefix encoding format (length in lower 7 bits, value in upper bits) - Replace bare except clauses with specific exception types - Catch only relevant exceptions (gdb.error, MemoryError, ValueError, AttributeError) Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>	2025-12-08 22:08:05 +00:00
copilot-swe-agent[bot]	b17de07c43	Enhance compact_radix_tree wrapper with better documentation and error handling - Add comprehensive usage examples in docstring - Improve error messages for optimized builds - Document limitations and workarounds - Show tree size and layout info when elements not accessible - Provide guidance for users encountering limitations Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>	2025-12-08 22:06:05 +00:00
copilot-swe-agent[bot]	4b7f760a38	Implement compact_radix_tree wrapper with std_map-like API Add wrapper class for compact_radix_tree that provides: - Iteration over elements (__iter__) - Indexing by column id (__getitem__) - Dictionary-like methods (keys, values, items, get) - Length support (__len__) Note: Full tree traversal is limited by compiler optimizations and GDB's inability to call C++ template methods directly. The implementation provides the API framework with best-effort element collection. Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>	2025-12-08 22:04:15 +00:00
copilot-swe-agent[bot]	c824803a24	Initial plan	2025-12-08 21:56:45 +00:00
Amnon Heiman	a213e41250	scylla-node-exporter: Add ethtool to node exporter AWS suggests monitoring several network performance metrics: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html#network-performance-metrics This patch enables the ethtool collector with the specific list of metrics. After this patch the relevant metrics look like: $ curl http://localhost:9100/metrics |& grep ethtool node_ethtool_bw_in_allowance_exceeded{device="ens5"} 0 node_ethtool_bw_out_allowance_exceeded{device="ens5"} 0 node_ethtool_conntrack_allowance_available{device="ens5"} 51303 node_ethtool_conntrack_allowance_exceeded{device="ens5"} 0 node_ethtool_info{bus_info="0000:00:05.0",device="ens5",driver="ena",expansion_rom_version="",firmware_version="",version="6.14.0-1015-aws"} 1 node_ethtool_linklocal_allowance_exceeded{device="ens5"} 0 node_scrape_collector_duration_seconds{collector="ethtool"} 0.001091436 node_scrape_collector_success{collector="ethtool"} 1 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#27358	2025-12-08 14:27:10 +02:00
Dawid Mędrek	58dc414912	test/cluster/mv: Rewrite test_view_building_scheduling_group We rewrite the test to avoid flakiness. Instead of looking at the metrics, we make a trade-off and start depending on a less reliable mechanism -- logs. We grep all relevant messages printed by Scylla in TRACE mode and make sure that they were all printed from a context using the streaming scheduling group. Although it's a "less proper" way of testing, it should be much more dependable and avoid flakiness. Fixes scylladb/scylladb#25957 Closes scylladb/scylladb#26656	2025-12-08 14:24:25 +02:00
Ferenc Szili	d883ff2317	test: fix flakyness caused by TRUNCATE retries The test test_truncate_during_topology_change tests TRUNCATE TABLE while bootstrapping a new node. With tablets enabled TRUNCATE is a global topology operation which needs to serialize with bootstrap. When TRUNCATE TABLE is issued, it first checks if there is an already queued truncate for the same table. This can happen if a previous TRUNCATE operation has timed out, and the client retried. The newly issued truncate will only join the queued one if it is waiting to be processed, and will fail immediately if the TRUNCATE is already being processed. In this test, TRUNCATE will be retried after a timeout (1 minute) due to the default retry policy, and will be retried up to 3 times, while the bootstrap is delayed by 2 minutes. This means that the test can validate the result of a truncate which was started after bootstrap was completed. Because of the way truncate joins existing truncate operations, we can also have the following scenario: - TRUNCATE times out after one minute because the new node is being bootstrapped - the client retries the TRUNCATE command which also times out after 1m - the third attempt is received during TRUNCATE being processed which fails the test This patch changes the retry policy of the TRUNCATE operation to FallthroughRetryPolicy which guarantees that TRUNCATE will not be retried on timeout. It also increases the timeout of the TRUNCATE from 1 to 4 minutes. This way the test will actually validate the performance of the TRUNCATE operation which was issued during bootstrap, instead of the subsequent, retried TRUNCATEs which could have been issued after the bootstrap was complete. Fixes: #26347 Closes scylladb/scylladb#27245	2025-12-08 14:13:26 +02:00
dependabot[bot]	1f777da863	build(deps): bump sphinx-scylladb-theme from 1.8.9 to 1.8.10 in /docs Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.9 to 1.8.10. - [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases) - [Commits](https://github.com/scylladb/sphinx-scylladb-theme/commits) --- updated-dependencies: - dependency-name: sphinx-scylladb-theme dependency-version: 1.8.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Closes scylladb/scylladb#27468	2025-12-08 13:40:51 +02:00
Asias He	faad0167d7	repair: Add tablet repair progress report support This patch adds tablet repair progress report support so that the user could use the /task_manager/task_status API to query the progress. In order to support this, a new system table is introduced to record the user request related info, i.e, start of the request and end of the request. The progress is accurate when tablet split or merge happens in the middle of the request, since the tokens of the tablet are recorded when the request is started and when repair of each tablet is finished. The original tablet repair is considered as finished when the finished ranges cover the original tablet token ranges. After this patch, the /task_manager/task_status API will report correct progress_total and progress_completed. Fixes #22564 Fixes #26896 Closes scylladb/scylladb#26924	2025-12-08 13:35:19 +02:00
Andrei Chekun	0115a21b9a	test.py: fail test when timeout reached for boost test There is a bug in the current pytest boost implementation. When the timeout is reached, the process is killed, but the failure was not correctly propagated, which led to a false positive result. This change fails the test case when the timeout for the process is reached. This is to prevent issues like this https://github.com/scylladb/scylladb/issues/27237 Closes scylladb/scylladb#27463	2025-12-08 11:49:46 +01:00
Pavel Emelyanov	8192f45e84	Merge 'Add option to use sstable identifier in snapshot' from Benny Halevy This change adds a new option to the REST api and correspondingly, to scylla nodetool: use_sstable_identifier. When set, we use the sstable identifier, if available, to name each sstable in the snapshots directory and the manifest.json file, rather than using the sstable generation. This can be used by the user (e.g. Scylla Manager) for global deduplication with tablets, where an sstable may be migrated across shards or across nodes, and in this case, its generation may change, but its sstable identifier remains stable. Currently, Scylla manager uses the sstable generation to detect sstables that are already backed up to object storage and exist in previous backed up snapshots. Historically, the sstable generation was guaranteed to be unique only per table per node, so the dedup code currently checks for deduplication in the node scope. However, with tablet migration, sstables are renamed when migrated to a different shard, i.e. their generation changes, and they may be renamed when migrated to another node, but even if they are not, the dedup logic still assumes uniqueness only within a node. To address both cases, we keep the sstable_id stable throughout the sstable life cycle (since `3a12ad96c7`). Given the globally unique sstable identifier, scylla manager can now detect duplicate sstables in a wider scope. This can be cluster-wide, but we practically need only rack-wide deduplication or dc-wide, as tablets are migrated across racks only on rare occasions (like when converting from a numerical replication factor to a rack list containing a subset of the available racks in a datacenter). 
Fixes #27181 * New feature, no backport required Closes scylladb/scylladb#27184 * github.com:scylladb/scylladb: database: truncate_table_on_all_shards: set use_sstable_identifier to true nodetool: snapshot: add --use-sstable-identifier option api: storage_service: take_snapshot: add use_sstable_identifier option test: database_test: add snapshot_use_sstable_identifier_works test: database_test: snapshot_works: add validate_manifest sstable: write_scylla_metadata: add random_sstable_identifier error injection table: snapshot_on_all_shards: take snapshot_options sstable: add get_format getter sstable: snapshot: add use_sstable_identifier option db: snapshot_ctl: snapshot_options: add use_sstable_identifier options db: snapshot_ctl: move skip_flush to struct snapshot_options	2025-12-08 12:56:12 +03:00
Avi Kivity	45c16553eb	Revert "Update tools/cqlsh submodule" This reverts commit `ff1b212319`. In this commit, the python driver was updated to 3.29.6. That version has a serious flaw - it rejects compression=None settings [1] which cqlsh (legitimately) uses in copyutil.py. The reason this hasn't caused numerous continuous integration failures is that the submodule update commit did not update the frozen toolchain, so the build was effectively running with an older version of the driver. Fix by reverting the change. This allows us to regenerate the frozen toolchain when we need to. Reverted changes: * tools/cqlsh 2240122...6badc99 (2): > Update scylla-driver version to 3.29.6 > Revert "Migrate workflows to Blacksmith" [1] `78f554236f` Closes scylladb/scylladb#27473	2025-12-08 08:50:52 +02:00
Nadav Har'El	c984f557ef	Merge 'alternator: eliminate cross shard ::free for do_batch_write' from Petr Gusev This is an optimization follow-up [for this PR](https://github.com/scylladb/scylladb/pull/27396#issuecomment-3611410774): avoiding destruction of foreign objects on the wrong shard. Releasing objects allocated on a different shard causes their ::free calls to be executed remotely, which adds unnecessary load to the SMP subsystem. Before this PR, a `std::vector<put_or_delete_item>` could be moved to another shard. When the vector was eventually destroyed, its ::free had to be marshalled back to the shard where the memory had originally been allocated. This change avoids that overhead by passing the vector by const reference instead. backport: not needed, this is an optimization Closes scylladb/scylladb#27432 * github.com:scylladb/scylladb: alternator/executor.cc: avoid cross-shard free storage_proxy: cas: take cas_request by raw reference	2025-12-07 22:54:36 +02:00
Andrei Chekun	5e83311305	test.py: switch to ThreadPoolExecutor With python 3.14, the Process fails due to pickling issue with nodes objects. This will eliminate this issue, so we can bump up the python version. Closes scylladb/scylladb#27456	2025-12-07 17:37:25 +02:00
Petr Gusev	f00f7976c1	alternator/executor.cc: avoid cross-shard free This commit is an optimization: avoiding destruction of foreign objects on the wrong shard. Releasing objects allocated on a different shard causes their ::free calls to be executed remotely, which adds unnecessary load to the SMP subsystem. Before this patch, a std::vector could be moved to another shard. When the vector was eventually destroyed, its ::free had to be marshalled back to the shard where the memory had originally been allocated. This change avoids that overhead by passing the vector by const reference instead. Lifetime correctness reasoning for the referenced objects: * the put_or_delete_item refs usages in put_or_delete_item_cas_request are bound to its lifetime * cas_request lifetime is bound to storage_proxy::cas future * we don't release put_or_delete_item-s until all storage_proxy::cas calls are done.	2025-12-07 16:14:56 +01:00
Petr Gusev	c428645d16	storage_proxy: cas: take cas_request by raw reference In the next commit we want to add an optimization that relies on precise control over the lifetime of cas_request. In particular, we want the implementation of this interface in Alternator to operate on raw references that are guaranteed to remain valid only until the cas() future is resolved. We already depend on the same lifetime assumptions in cas_request when used by modification_statement. However, these assumptions are not clearly expressed in the current interface: cas_request is taken by shared_ptr, and nothing prevents cas() from storing that pointer inside paxos_response_handler, which may outlive the cas() future. This commit fixes that by taking cas_request by raw reference. This makes it explicit that cas() does not assume ownership of the object. Callers must ensure that the referenced object remains valid until the returned future is resolved.	2025-12-07 16:14:56 +01:00
Tomasz Grabiec	082342ecad	Attach names to allocating sections for better debuggability Large reserves in allocating_section can cause stalls. We already log reserve increase, but we don't know which table it belongs to: lsa - LSA allocation failure, increasing reserve in section 0x600009f94590 to 128 segments; Allocating sections used for updating row cache on memtable flush are notoriously problematic. Each table has its own row_cache, so its own allocating_section(s). If we attached table name to those sections, we could identify which table is causing problems. In some issues we suspected system.raft, but we can't be sure. This patch allows naming allocating_sections for the purpose of identifying them in such log messages. I use abstract_formatter for this purpose to avoid the cost of formatting strings on the hot path (e.g. index_reader). And also to avoid duplicating strings which are already stored elsewhere. Fixes #25799 Closes scylladb/scylladb#27470	2025-12-07 14:14:25 +02:00
Avi Kivity	47efbdffbc	Merge 'cache, mvcc: Preempt cache update when applying range tombstone from memtable' from Tomasz Grabiec Range tombstones are represented as entry attributes, which applies to the interval between entries. So if a range tombstone covers many rows, to apply it we have to update all covered entries. In some workloads that could be many entries, even the whole cache. Before the patch, we did this update without preemption, which can cause reactor stalls in such workloads. This scenario is already covered by mvcc_tests, e.g. test_apply_to_incomplete_respects_continuity. And I verified that the new preemption point is hit in the test. perf-row-cache-update results show no significant stalls anymore (max 2ms scheduling delay, instead of previous 1.5 s): Generated 1124195 rows Memtable fill took 4179.457520 [ms], {count: 8295, 99%: 0.654949 [ms], max: 32.817176 [ms]} Draining... took 0.000616 [ms] cache: 2506/2948 [MB], memtable: 781/1024 [MB], alloc/comp: 1051/662 [MB] (amp: 0.630) update: 2874.157471 [ms], preemption: {count: 26650, 99%: 1.131752 [ms], max: 2.068762 [ms]}, cache: 3027/3973 [MB], alloc/comp: 3951/2424 [MB] (amp: 0.614), pr/me/dr 1124195/0/0 Fixes #23479 Fixes #2578 Closes scylladb/scylladb#27469 * github.com:scylladb/scylladb: cache, mvcc: Preempt cache update when applying range tombstone from memtable partition_snapshot_row_cursor: Clarify non-obvious semantic difference of range_tombstone() perf-row-cache-update: Add scenario with large tombstone covering many rows	2025-12-07 11:54:15 +02:00
Avi Kivity	d811eeb4ca	Merge 'Make direct failure detector verb handler more efficient' from Gleb Natapov We saw that in large clusters direct failure detector may cause large task queues to be accumulated. The series address this issue and also moves the code into the correct scheduling group. Fixes https://github.com/scylladb/scylladb/issues/27142 Backport to all version where `60f1053087` was backported to since it should improve performance in large clusters. Closes scylladb/scylladb#27387 * github.com:scylladb/scylladb: direct_failure_detector: run direct failure detector in the gossiper scheduling group raft: drop invoke_on from the pinger verb handler direct_failure_detector: pass timeout to direct_fd_ping verb	2025-12-07 11:40:26 +02:00
Marcin Maliszkiewicz	4784e39665	auth: fix ctor signature of certificate_authenticator In `b9199e8b24` we added a cache argument to the constructors of authenticators, but certificate_authenticator was omitted. Class registrator sadly only fails at runtime for such cases. Fixes https://github.com/scylladb/scylladb/issues/27431 Closes scylladb/scylladb#27434	2025-12-07 11:18:42 +02:00
Tomasz Grabiec	d4014b7970	Drop legacy schema support We switched to using v3 schema tables (in system_schema keyspace) in 2017, in `9eb91bc30b`. So no system should have the old schema any more. No need to run legacy_schema_migrator on boot. Closes scylladb/scylladb#27420	2025-12-07 00:09:13 +02:00
Tomasz Grabiec	92b5e4d63d	cache, mvcc: Preempt cache update when applying range tombstone from memtable Range tombstones are represented as entry attributes, which applies to the interval between entries. So if a range tombstone covers many rows, to apply it we have to update all covered entries. In some workloads that could be many entries, even the whole cache. Before the patch, we did this update without preemption, which can cause reactor stalls in such workloads. This scenario is already covered by mvcc_tests, e.g. test_apply_to_incomplete_respects_continuity. And I verified that the new preemption point is hit in the test. perf-row-cache-update results show no significant stalls anymore (max 2ms scheduling delay, instead of previous 1.5 s): Generated 1124195 rows Memtable fill took 4179.457520 [ms], {count: 8295, 99%: 0.654949 [ms], max: 32.817176 [ms]} Draining... took 0.000616 [ms] cache: 2506/2948 [MB], memtable: 781/1024 [MB], alloc/comp: 1051/662 [MB] (amp: 0.630) update: 2874.157471 [ms], preemption: {count: 26650, 99%: 1.131752 [ms], max: 2.068762 [ms]}, cache: 3027/3973 [MB], alloc/comp: 3951/2424 [MB] (amp: 0.614), pr/me/dr 1124195/0/0 Fixes #23479 Fixes #2578	2025-12-06 13:45:35 +01:00
Tomasz Grabiec	e546143fd9	partition_snapshot_row_cursor: Clarify non-obvious semantic difference of range_tombstone()	2025-12-06 01:03:10 +01:00
Tomasz Grabiec	721434054b	perf-row-cache-update: Add scenario with large tombstone covering many rows Fills memtable with rows and a tombstone which deletes all rows which are already in cache. Similar to raft log workload, but more extreme. With -c1 -m4G, observed really bad performance: update: 1711.976196 [ms], preemption: {count: 22603, 99%: 0.943127 [ms], max: 1494.571776 [ms]}, cache: 2148/2906 [MB], alloc/comp: 1334/869 [MB] (amp: 0.651), pr/me/dr 1062186/0/1062187 cache: 2148/2906 [MB], memtable: 738/1024 [MB], alloc/comp: 993/0 [MB] (amp: 0.000) Which means that max reactor stall during cache update was 1.5 [s] 0.7 GB memtables. 2.1 GB in cache.	2025-12-06 01:03:09 +01:00
Nadav Har'El	350cbd1d66	alternator: fix typo of BatchWriteItem in comments The DynamoDB API's "BatchWriteItem" operation is spelled like this, in singular. Some comments incorrectly referred to as BatchWriteItems - in plural. This patch fixes those mistakes. There are no functional changes here or changes to user-facing documents - these mistakes were only in code comments. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27446	2025-12-05 15:08:58 +02:00
Botond Dénes	866c96f536	Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk This pull request adds support for calculating and storing CRC32 digests for all SSTable components. This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream. All important SSTable components (Index, Partitions, Rows, Summary, Filter, CompressionInfo, and TOC) are covered. Several test cases were introduced to verify expected behaviour. Backport is not required, it is a new feature Fixes #20100 Closes scylladb/scylladb#27287 * github.com:scylladb/scylladb: sstable_test: add verification testcases of SSTable components digests persistance sstables: store digest of all sstable components in scylla metadata sstables: Add TemporaryScylla metadata component type sstables: Extract file writer closing logic into separate methods sstables: Add components_digests to scylla metadata components sstables: Implement CRC32 digest-only writer	2025-12-05 11:36:50 +02:00
Botond Dénes	367633270a	Merge 'EAR: handle IPV6 hosts in KMIP and use shared (improved) http parser in AWS/Azure' from Calle Wilund Fixes #27367 Fixes #27362 Fixes #27366 Makes http URL parser handle IPv6. Makes KMIP host setup handle IPv6 hosts + use system trust if no truststore set Moves Azure/KMS code to use shared http URL parser to avoid same regex everywhere. Closes scylladb/scylladb#27368 * github.com:scylladb/scylladb: ear::kms/ear::azure: Use utils::http URL parsing ear::kmip_host: Handle ipv6 hosts + use system trust when not specified utils::http: Handle ipv6 numeric host part in URL:s	2025-12-05 10:43:07 +02:00
Asias He	e97a504775	repair: Allow min max range to be updated for repair history It is observed that: repair - repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: Failed to update system.repair_history table of node d27de212-6f32-4649ad76-a9ef1165fdcb: seastar::rpc::remote_verb_error (repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: range (minimum token,maximum token) is not in the format of (start, end]) This is because repair checks the end of the range to be repaired needs to be inclusive. When small_table_optimization is enabled for regular repair, a (minimum token,maximum token) will be used. To fix, we can relax the check of (start, end] for the min max range. Fixes #27220 Closes scylladb/scylladb#27357	2025-12-05 10:41:25 +02:00
Anna Stuchlik	a5c971d21c	doc: update the upgrade policy to cover non-consecutive minor upgrades Fixes https://github.com/scylladb/scylladb/issues/27308 Closes scylladb/scylladb#27319	2025-12-05 10:31:53 +02:00
Guy Shtub	a0809f0032	Update integration-jaeger.rst Fixing broken link in Jaeger Docs to ScyllaDB Closes scylladb/scylladb#26406	2025-12-05 10:23:07 +02:00
Piotr Dulikowski	bb6e41f97a	index: allow vector indexes without rf_rack_valid_keyspces The rf_rack_valid_keyspaces option needs to be turned on in order to allow creating materialized views in tablet keyspaces with numeric RF per DC. This is also necessary for secondary indexes because they use materialized views underneath. However, this option is _not_ necessary for vector store indexes because those use the external vector store service for querying the list of keys to fetch from the main table, they do not create a materialized view. The rf_rack_valid_keyspaces was, by accident, required for vector indexes, too. Remove the restriction for vector store indexes as it is completely unnecessary. Fixes: SCYLLADB-81 Closes scylladb/scylladb#27447	2025-12-05 09:26:26 +02:00
Marcin Maliszkiewicz	4df6b51ac2	auth: fix cache::prune_all roles iteration During the `b9199e8b24` review it was suggested to use a standard for loop, but when an element is erased this increments an invalid iterator, as the role could have been erased before. This change brings back the original code. Fixes: https://github.com/scylladb/scylladb/issues/27422 Backport: no, offending commit not released yet Closes scylladb/scylladb#27444	2025-12-04 23:35:54 +01:00
Taras Veretilnyk	0c8730ba05	sstable_test: add verification testcases of SSTable components digests persistance Adds a generic test helper that writes a random SSTable, reloads it, and verifies that the persisted CRC32 digest for each component matches the digest computed from disk. These test cases cover all checksummed components.	2025-12-04 21:09:01 +01:00
Taras Veretilnyk	bc2e83bc1f	sstables: store digest of all sstable components in scylla metadata This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream.	2025-12-04 21:00:09 +01:00
Patryk Jędrzejczak	f4c3d5c1b7	Merge 'fix test_coordinator_queue_management flakiness' from Gleb Natapov After `39cec4ae45` node join may fail with either a "request canceled" notification or (very rarely) because the node was banned. Which one occurs depends on timing. The series fixes the test to check for both possibilities. Fixes #27320 No need to backport since the flakiness is in master only. Closes scylladb/scylladb#27408 * https://github.com/scylladb/scylladb: test: fix test_coordinator_queue_management flakiness test/pylib: allow expected_error in server_start to contain regular expression	2025-12-04 16:08:02 +01:00
Tomasz Grabiec	e54abde3e8	Merge 'main: delay setup of storage_service REST API' from Andrzej Jackowski The storage_service REST API uses `group0` internally. Before this patch, it was possible to send an HTTP request before `group0` was initialized, which resulted in a segmentation fault. Therefore, this patch delays the setup of the storage_service REST API. Additionally, `test_rest_api_on_startup` is added to reproduce the problem. Fixes: https://github.com/scylladb/scylladb/issues/27130 No backport. It's a crash fix but possible only if a request is sent in a very specific phase of a node start. Closes scylladb/scylladb#27410 * github.com:scylladb/scylladb: test: add test_rest_api_on_startup main: delay setup of storage_service REST API	2025-12-04 14:56:49 +01:00
Avi Kivity	9696ee64d0	database: fix overflow when computing data distribution over shards We store the per-shard chunk count in a uint64_t vector global_offset, and then convert the counts to offsets with a prefix sum: ```c++ // [1, 2, 3, 0] --> [0, 1, 3, 6] std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), 0, std::plus()); ``` However, std::exclusive_scan takes the accumulator type from the initial value, 0, which is an int, instead of from the range being iterated, which is of uint64_t. As a result, the prefix sum is computed as a 32-bit integer value. If it exceeds 0x8000'0000, it becomes negative. It is then extended to 64 bits and stored. The result is a huge 64-bit number. Later on we try to find an sstable with this chunk and fail, crashing on an assertion. An example of the failure can be seen here: https://godbolt.org/z/6M8aEbo57 The fix is simple: the initial value is passed as uint64_t instead of int. Fixes https://github.com/scylladb/scylladb/issues/27417 Closes scylladb/scylladb#27418	2025-12-04 14:10:53 +01:00
Calle Wilund	8dd69f02a8	ear::kms/ear::azure: Use utils::http URL parsing Fixes #27367 Move to reuse shared code.	2025-12-04 11:38:41 +00:00
Calle Wilund	d000fa3335	ear::kmip_host: Handle ipv6 hosts + use system trust when not specified Fixes #27362 The KMIP host connector should handle ipv4 connections (named or numeric). It also should fall back to system trust when truststore is not specified.	2025-12-04 11:38:41 +00:00
Calle Wilund	4e289e8e6a	utils::http: Handle ipv6 numeric host part in URL:s Fixes #27366 A URL with numeric host part formats special in case of ipv6, to avoid confusion with port part. The parser should handle this. I.e. http://[2001:db8:4006:812::200e]:8080 v2: * Include scheme agnostic parse + case insensitive scheme matching	2025-12-04 11:38:41 +00:00
Benny Halevy	19b6207f17	database: truncate_table_on_all_shards: set use_sstable_identifier to true To facilitate global sstable deduplication on backup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:57:39 +02:00
Benny Halevy	ff52550739	nodetool: snapshot: add --use-sstable-identifier option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:57:39 +02:00
Benny Halevy	e654045755	api: storage_service: take_snapshot: add use_sstable_identifier option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:57:39 +02:00
Benny Halevy	07b92a1ee8	test: database_test: add snapshot_use_sstable_identifier_works Test that taking a snapshot with the use_sstable_identifier option (and injecting `random_sstable_identifier`) produces different file names in the snapshot than the original sstable names, and validate the manifest.json file accordingly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:57:38 +02:00
Benny Halevy	7504d10d9e	test: database_test: snapshot_works: add validate_manifest Validate the manifest.json format by loading it using rjson::parse and then validate its contents to ensure it lists exactly the SSTables present in the snapshot directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:55:50 +02:00
Benny Halevy	28cb300d0a	sstable: write_scylla_metadata: add random_sstable_identifier error injection To be used by a unit test in the following patch for testing the snapshot use_sstable_identifier option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:55:50 +02:00
Benny Halevy	9b3fbedc8c	table: snapshot_on_all_shards: take snapshot_options And pass the use_sstable_identifier down the stack to the sstables layer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:55:50 +02:00
Benny Halevy	420fb1fd53	sstable: add get_format getter To be used by the snapshot code in the following patch for manufacturing a basename using the sstable_id rather than its generation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:55:50 +02:00
Benny Halevy	7c62417b54	sstable: snapshot: add use_sstable_identifier option When set to true, use the sstable_identifier as the sstable name in the snapshot rather than its generation. sstable::snapshot now returns the generation it used for the sstable in the snapshot, based on the `use_sstable_identifier` option, to be used by the upper layer generating the manifest. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-12-04 11:53:32 +02:00
Botond Dénes	9d2f7c3f52	Merge 'mv: allow setting concurrency in PRUNE MATERIALIZED VIEW' from Wojciech Mitros The PRUNE MATERIALIZED VIEW statement is performed as follows: 1. Perform a range scan of the view table from the view replicas based on the ranges specified in the statement. 2. While reading the paged scan above, for each view row perform a read from all base replicas at the corresponding primary key. If a discrepancy is detected, delete the row in the view table. When reading multiple rows, this is very slow because for each view row we need to perform a single row query on multiple replicas. In this patch we add an option to speed this up by performing many of the single base row reads concurrently, at the concurrency specified in the USING CONCURRENCY clause. Aside from the unit test, I checked manually on a 3-node cluster with 10M rows, using vnodes. There were actually no ghost rows in the test, but we still had to iterate over all view rows and read the corresponding base rows. And actual ghost rows, if there are any, should be a tiny fraction of all rows. I compared concurrencies 1,2,10,100 and the results were: * Pruning with concurrency 1 took total 1416 seconds * Pruning with concurrency 2 took total 731 seconds * Pruning with concurrency 10 took total 234 seconds * Pruning with concurrency 100 took total 171 seconds So after a concurrency of 10 or so we're hitting diminishing returns (at least in this setup). At that point we may be no longer bottlenecked by the reads, but by CPU on the shard that's handling the PRUNE. Fixes https://github.com/scylladb/scylladb/issues/27070 Closes scylladb/scylladb#27097 * github.com:scylladb/scylladb: mv: allow setting concurrency in PRUNE MATERIALIZED VIEW cql: add CONCURRENCY to the USING clause	2025-12-04 11:47:41 +02:00
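
The accumulator-type pitfall described in commit `9696ee64d0` (std::exclusive_scan taking its accumulator type from the initial value) can be reproduced in isolation. The following is a minimal sketch, not the ScyllaDB code: the helper names `offsets_int_init` and `offsets_u64_init` are hypothetical, and only the `std::exclusive_scan` call mirrors the snippet quoted in the commit message. It assumes C++17 or later and a 32-bit `int`.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the ScyllaDB tree): converts per-shard
// counts into starting offsets with an in-place prefix sum.
// The initial value 0 is an int, so the running sum is kept in an int
// and is narrowed after every addition.
std::vector<uint64_t> offsets_int_init(std::vector<uint64_t> counts) {
    std::exclusive_scan(counts.begin(), counts.end(), counts.begin(),
                        0, std::plus<>());
    return counts;
}

// Same prefix sum, but the initial value is a uint64_t, so the running
// sum has the same width as the elements being summed.
std::vector<uint64_t> offsets_u64_init(std::vector<uint64_t> counts) {
    std::exclusive_scan(counts.begin(), counts.end(), counts.begin(),
                        uint64_t(0), std::plus<>());
    return counts;
}
```

For small inputs such as `{1, 2, 3, 0}` both variants return `{0, 1, 3, 6}`. With counts whose running sum passes 0x8000'0000, for example `{0x70000000, 0x70000000, 1}`, only the `uint64_t` variant yields the expected third offset 0xE0000000; the `int` variant stores a sign-extended value instead.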


50838 Commits
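
Commit `4df6b51ac2` above concerns incrementing an iterator that was invalidated by an erase. As a general illustration of that hazard, and not a reconstruction of the actual `cache::prune_all` code, the sketch below shows the usual erase-while-iterating pattern for a standard associative container. The `role_map` alias and `prune_if` function are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a role cache; not the real auth cache type.
using role_map = std::map<std::string, int>;

// Erases every entry for which pred returns true and reports how many
// entries were removed. The iterator is advanced either by taking the
// return value of erase() or by ++it, never by incrementing an iterator
// that has just been erased.
template <typename Pred>
std::size_t prune_if(role_map& roles, Pred pred) {
    std::size_t removed = 0;
    for (auto it = roles.begin(); it != roles.end(); ) {
        if (pred(*it)) {
            it = roles.erase(it);  // erase() returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

A loop of the form `for (auto it = m.begin(); it != m.end(); ++it)` that calls `m.erase(it)` in its body would increment an invalidated iterator, which is the class of bug the commit message describes.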