scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Botond Dénes	1c7a6ba140	replica: improve memtable overlap checks for the cache The current memtable overlap check that is used by the cache -- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only checks the active memtable, so memtables which are either being flushed or are already flushed and also have active reads against them do not participate in the overlap check. This can result in temporary data resurrection, where a cache read can garbage-collect a tombstone which still covers data in a flushing or flushed memtable that still has active reads against it. To prevent this, extend the overlap check to also consider all of the memtable list. Furthermore, memtable_list::erase() now places the removed (flushed) memtable in an intrusive list. These entries are alive only as long as there are readers still keeping an `lw_shared_ptr<memtable>` alive. This list is now also consulted on overlap checks. (cherry picked from commit `d126ea09ba`)	2025-04-10 03:17:27 -04:00
Michał Chojnowski	2a74426084	table: fix a race in table::take_storage_snapshot() `safe_foreach_sstable` doesn't do its job correctly. It iterates over an sstable set under the sstable deletion lock in an attempt to ensure that SSTables aren't deleted during the iteration. The thing is, it takes the deletion lock after the SSTable set is already obtained, so SSTables might get unlinked before we take the lock. Remove this function and fix its usages to obtain the set and iterate over it under the lock. Closes scylladb/scylladb#23397 (cherry picked from commit `e23fdc0799`) Closes scylladb/scylladb#23628	2025-04-08 19:07:22 +03:00
Dawid Mędrek	ecdefe801c	main: Refuse to start node when RF-rack-invalid keyspace exists When a node is started with the option `rf_rack_valid_keyspaces` enabled, the initialization will fail if there is an RF-rack-invalid keyspace. We want to force the user to adjust their existing keyspaces when upgrading to 2025.* so that the invariant that every keyspace is RF-rack-valid is always satisfied. Fixes scylladb/scylladb#23300 (cherry picked from commit `0e04a6f3eb`)	2025-03-21 12:27:04 +00:00
Botond Dénes	47989b1503	Merge 'tasks: add tablet resize virtual task' from Aleksandra Martyniuk In this change, tablet_virtual_task starts supporting tablet resize (i.e. split and merge). Users can see running resize tasks - finished tasks are not presented with the task manager API. A new task state "suspended" is added. If a resize was revoked, it will appear to users as suspended. We assume that the resize was revoked when the tablet number didn't change. Fixes: #21366. Fixes: #21367. No backport, new feature Closes scylladb/scylladb#21891 * github.com:scylladb/scylladb: test: boost: check resize_task_info in tablet_test.cc test: add tests to check revoked resize virtual tasks test: add tests to check the list of resize virtual tasks test: add tests to check spilt and merge virtual tasks status test: test_tablet_tasks: generalize functions replica: service: add split virtual task's children replica: service: pass parent info down to storage_group::split tasks: children of virtual tasks aren't internal by default tasks: initialize shard in task_info ctor service: extend tablet_virtual_task::abort service: retrun status_helper struct from tablet_virtual_task::get_status_helper service: extend tablet_virtual_task::wait tasks: add suspended task state service: extend tablet_virtual_task::get_status service: extend tablet_virtual_task::contains service: extend tablet_virtual_task::get_stats service: add service::task_manager_module::get_nodes tasks: add task_manager::get_nodes tasks: drop noexcept from module::get_nodes replica: service: add resize_task_info static column to system.tablets locator: extend tablet_task_info to cover resize tasks	2025-01-17 14:24:07 +02:00
Botond Dénes	55963f8f79	replica: remove noexcept from token -> tablet resolution path The methods to resolve a key/token/range to a table are all noexcept. Yet the method below all of these, `storage_group_for_id()` can throw. This means that if due to any mistake a tablet without local replica is attempted to be looked up, it will result in a crash, as the exception bubbles up into the noexcept methods. There is no value in pretending that looking up the tablet replica is noexcept, remove the noexcept specifiers so that any bad lookup only fails the operation at hand and doesn't crash the node. This is especially relevant to replace, which still has a window where writes can arrive for tablets that don't (yet) have a local replica. Currently, this results in a crash. After this patch, this will only fail the writes and the replace can move on. Fixes: #21480 Closes scylladb/scylladb#22251	2025-01-17 11:24:09 +03:00
Kefu Chai	7215d4bfe9	utils: do not include unused headers these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. please note, because quite a few source files relied on `utils/to_string.hh` to pull in the specialization of `fmt::formatter<std::optional<T>>`, after removing `#include <fmt/std.h>` from `utils/to_string.hh`, we have to include `fmt/std.h` directly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-01-14 07:56:39 -05:00
Aleksandra Martyniuk	7ef6900837	replica: service: pass parent info down to storage_group::split Pass task_info down to storage_group::split. In the following patches, it will be used to set the parent of offstrategy_compaction_task_executor and split_compaction_task_executor running as a part of the split. The task_info param will contain task info of a split virtual task.	2025-01-10 10:03:08 +01:00
Piotr Dulikowski	7383013f43	replica/database: add reader concurrency semaphore groups Replace the reader concurrency semaphores for user reads and view updates with the newly introduced reader concurrency semaphore group, which assigns a semaphore for each service level. Each group is statically assigned to some pool of memory on startup and dynamically distributes this memory between the semaphores, relative to the number of shares of the corresponding scheduling group. The intent of having a separate reader concurrency semaphore for each scheduling group is to prevent priority inversion issues due to reads with different priorities waiting on the same semaphore, as well as to make memory allocation fairer between service levels due to the adjusted number of shares.	2025-01-02 07:13:34 +01:00
Avi Kivity	4905b1bf76	Merge 'table: make update_effective_replication_map sync again' from Benny Halevy Commit `f2ff701489` introduced a yield in update_effective_replication_map that might cause the storage_group manager to be inconsistent with the new effective_replication_map (e.g. if yielding right before calling `handle_tablet_split_completion`). Also, yielding inside storage_service::replicate_to_all_cores update loop means that base tables and their views aren't updated atomically, that caused scylladb/scylladb#17786 This change essentially reverts `f2ff701489` and makes handle_tablet_split_completion synchronous too. The stopped compaction groups future is kept as a member and storage_group_manager::stop() consumes this future during table::stop(). - storage_service: replicate_to_all_cores: update base and view tables atomically Currently, the loop updating all tables (including views) with the new effective_replication_map may yield, and therefore expose a state where the base and view tables effective_replication_map and topology are out of sync (as seen in scylladb/scylladb#17786) To prevent that, loop over all base tables and for each table update the base table and all views atomically, without yielding, and so allow yielding only between base tables. * Regression was introduced in `f2ff701489`, so backport is required to 6.x, 2024.2 Closes scylladb/scylladb#21781 * github.com:scylladb/scylladb: storage_service: replicate_to_all_cores: clear_gently pending erms test_mv_topology_change: drop delay_after_erm_update injection case storage_service: replicate_to_all_cores: update base and view tables atomically table: make update_effective_replication_map sync again	2024-12-30 23:42:06 +02:00
Takuya ASADA	03461d6a54	test: compile unit tests into a single executable To reduce test executable size and speed up compilation time, compile unit tests into a single executable. Here is a file size comparison of the unit test executable: - Before applying the patch $ du -h --exclude='.o' --exclude='.o.d' build/release/test/boost/ build/debug/test/boost/ 11G build/release/test/boost/ 29G build/debug/test/boost/ - After applying the patch du -h --exclude='.o' --exclude='.o.d' build/release/test/boost/ build/debug/test/boost/ 5.5G build/release/test/boost/ 19G build/debug/test/boost/ It reduces executable sizes by 5.5GB on release, and by 10GB on debug. Closes #9155 Closes scylladb/scylladb#21443	2024-12-22 19:14:09 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	e65fc35b5e	replica: do not include unused headers these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21836	2024-12-18 13:52:57 +02:00
Benny Halevy	10c4cf930c	table: make update_effective_replication_map sync again Commit `f2ff701489` introduced a yield in update_effective_replication_map that might cause the storage_group manager to be inconsistent with the new effective_replication_map (e.g. if yielding right before calling `handle_tablet_split_completion`). Also, yielding inside storage_service::replicate_to_all_cores update loop means that base tables and their views aren't updated atomically, that caused scylladb/scylladb#17786 This change essentially reverts `f2ff701489` and makes handle_tablet_split_completion synchronous too. The stopped compaction groups future is kept as a member and storage_group_manager::stop() consumes this future during table::stop(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-12-15 11:45:08 +02:00
Avi Kivity	841481c202	Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb " This rather large patch series moves storage proxy and some adjacent services (like migration manager) to use host ids to identify nodes rather than ips. Messaging service gains a capability to address nodes by host ids (which allows dropping translations from topology coordinator code that worked on host ids already) and also makes sure that a node with incorrect host id will reject a message (can happen during address changes). The series gets rid of the raft address map completely and replaces it with the gossiper address map which is managed by the gossiper since translation is now done in the layer below raft. Fixes: scylladb/scylladb#6403 perf-simple-query -- smp 1 -m 1G output Before: enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 64336.82 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41291 insns/op, 24485 cycles/op, 0 errors) 62669.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41277 insns/op, 24695 cycles/op, 0 errors) 69172.12 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41326 insns/op, 24463 cycles/op, 0 errors) 56706.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41143 insns/op, 24513 cycles/op, 0 errors) 56416.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41186 insns/op, 24851 cycles/op, 0 errors) throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65 instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80 cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70 After: enable-cache=1 Running test with 
config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 65237.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 40733 insns/op, 23145 cycles/op, 0 errors) 59283.09 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40624 insns/op, 23948 cycles/op, 0 errors) 70851.03 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40625 insns/op, 23027 cycles/op, 0 errors) 70549.61 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40650 insns/op, 23266 cycles/op, 0 errors) 68634.96 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40622 insns/op, 22935 cycles/op, 0 errors) throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09 instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33 cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59 CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/ SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/ Tested mixed cluster manually. 
" * 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits) group0: drop unused field from replace_info struct test: rename raft_address_map_test to address_map_test and move if from raft tests raft_address_map: remove raft address map topology coordinator: do not modify expire state for left/new nodes any more in raft address map topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used group0: drop raft address map dependency from raft_rpc group0: move raft_ticker_type definition from raft_address_map.hh storage_service: do not update raft address map on gossiper events group0: drop raft address map dependency from raft_server_with_timeouts group0: move group0 upgrade code to host ids repair: drop raft address map dependency group0: remove unused raft address map getter from raft_group0 group0: drop raft address map from group0_state_machine dependency since it is not used there any more group0: remove dependency on raft address map from group0_state_id_handler gossiper: add get_application_state_ptr that searches by host_id gossiper: change get_live_token_owners to return host ids view: move view building to host id hints: use host id to send hints storage_proxy: remove id_vector_to_addr since it is no longer used db: consistency_level: change is_sufficient_live_nodes to work on host ids ...	2024-12-03 18:18:48 +02:00
Avi Kivity	58baeac0ad	Merge 'compaction: update maintenance sstable set on scrub compaction completion' from Lakshmi Narayanan Sreethar Scrub compaction can pick up input sstables from maintenance sstable set but on compaction completion, it doesn't update the maintenance set leaving the original sstable in set after it has been scrubbed. To fix this, on compaction completion has to update the maintenance sstable if the input originated from there. This PR solves the issue by updating the correct sstable_sets on compaction completion. Fixes #20030 This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2. Closes scylladb/scylladb#21582 * github.com:scylladb/scylladb: compaction: remove unused `update_sstable_lists_on_off_strategy_completion` compaction_group: replace `update_sstable_lists_on_off_strategy_completion` compaction_group: rename `update_main_sstable_list_on_compaction_completion` compaction_group: update maintenance sstable set on scrub compaction completion compaction_group: store table::sstable_list_builder::result in replacement_desc table::sstable_list_builder: remove old sstables only from current list table::sstable_list_builder: return removed sstables from build_new_list	2024-12-02 13:32:49 +02:00
Gleb Natapov	474b47ed22	database: move hits rates handling to host ids Hits rates map is now indexed by ip. Change it to be indexed by host id since this is what storage proxy uses now.	2024-12-02 10:31:12 +02:00
Botond Dénes	055a36ae55	main: dump diagnostics on SIGQUIT Dump a diagnostics report on each shard when receiving a SIGQUIT. The report is logged with a dedicated logger, called diagnostics. The report has multiple parts: * seastar memory diagnostics, similar to that printed by the scylla memory command (from scylla-gdb.py). * reader concurrency semaphore diagnostics for each semaphore. Example report: INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Dumping seastar memory diagnostics Used memory: 3988M Free memory: 58M Total memory: 4G Hard failures: 0 LSA allocated: 4M used: 16 free: 4G Cache: total: 1M used: 642K free: 398K Memtables: total: 3M Regular: real dirty: 0B virt dirty: 0B System: real dirty: 3M virt dirty: 3M Replica: Read Concurrency Semaphores: user: 0/100, 0B/81M, queued: 0 streaming: 0/10, 0B/81M, queued: 0 system: 0/10, 0B/81M, queued: 0 compaction: 0/unlimited, 0B/unlimited view update: 0/50, 0B/40M, queued: 0 Execution Stages: apply stage: Total: 0 Tables - Ongoing Operations: Pending writes (top 10): 0 Total (all) Pending reads (top 10): 0 Total (all) Pending streams (top 10): 0 Total (all) Small pools: objsz spansz usedobj memory unused wst% 8 4K 858 16K 9K 58 10 4K 5 8K 8K 99 12 4K 5 8K 8K 99 14 4K 0 0B 0B 0 16 4K 2k 44K 15K 35 32 4K 4k 136K 16K 11 32 4K 8k 280K 24K 8 32 4K 3k 92K 6K 6 32 4K 4k 140K 21K 14 48 4K 3k 180K 25K 14 48 4K 2k 120K 27K 22 64 4K 2k 156K 18K 11 64 4K 19k 1M 11K 0 80 4K 3k 236K 16K 6 96 4K 6k 572K 49K 8 112 4K 2k 276K 72K 25 128 4K 477 80K 20K 25 160 4K 194 60K 30K 49 192 4K 1k 232K 39K 16 224 4K 2k 468K 15K 3 256 4K 182 100K 55K 54 320 8K 349 152K 43K 28 384 8K 332 288K 164K 56 448 4K 243 180K 74K 40 512 4K 256 244K 116K 47 640 16K 185 192K 76K 39 768 16K 394 432K 137K 31 896 8K 54 192K 144K 75 1024 4K 288 432K 144K 33 1280 32K 92 256K 140K 54 1536 32K 11 128K 111K 86 1792 16K 10 144K 126K 87 2048 8K 487 1M 90K 8 2560 64K 113 384K 100K 26 3072 64K 9 256K 228K 89 3584 32K 
3 288K 277K 96 4096 16K 129 912K 396K 43 5120 128K 21 384K 275K 71 6144 128K 4 512K 486K 94 7168 64K 3 576K 553K 96 8192 32K 373 3M 56K 1 10240 64K 6 832K 770K 92 12288 64K 17 960K 756K 78 14336 128K 2 1M 1M 97 16384 64K 14 1M 992K 81 Page spans: index size free used spans 0 4K 4K 5M 1k 1 8K 8K 2M 213 2 16K 16K 2M 106 3 32K 64K 6M 200 4 64K 64K 4M 71 5 128K 384K 3934M 31k 6 256K 1M 256K 5 7 512K 512K 512K 2 8 1M 2M 0B 2 9 2M 2M 2M 2 10 4M 4M 0B 1 11 8M 16M 0B 2 12 16M 32M 0B 2 13 32M 0B 32M 1 14 64M 0B 0B 0 15 128M 0B 0B 0 16 256M 0B 0B 0 17 512M 0B 0B 0 18 1G 0B 0B 0 19 2G 0B 0B 0 20 4G 0B 0B 0 21 8G 0B 0B 0 22 16G 0B 0B 0 23 32G 0B 0B 0 24 64G 0B 0B 0 25 128G 0B 0B 0 26 256G 0B 0B 0 27 512G 0B 0B 0 28 1T 0B 0B 0 29 2T 0B 0B 0 30 4T 0B 0B 0 31 8T 0B 0B 0 INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Semaphore user with 0/100 count and 0/84850769 memory resources: user request, dumping permit diagnostics: permits count memory table/operation/state 0 0 0B total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 0 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 0 reads_enqueued_for_admission: 0 reads_enqueued_for_memory: 0 reads_admitted_immediately: 0 reads_queued_because_ready_list: 0 reads_queued_because_need_cpu_permits: 0 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 0 current_permits: 0 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Semaphore streaming with 0/10 count and 0/84850769 memory resources: user request, dumping permit diagnostics: permits count memory table/operation/state 0 0 0B total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 6 
total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 6 reads_enqueued_for_admission: 0 reads_enqueued_for_memory: 0 reads_admitted_immediately: 6 reads_queued_because_ready_list: 0 reads_queued_because_need_cpu_permits: 0 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 6 current_permits: 0 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Semaphore compaction with 0/2147483647 count and 0/9223372036854775807 memory resources: user request, dumping permit diagnostics: permits count memory table/operation/state 0 0 0B total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 0 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 0 reads_enqueued_for_admission: 0 reads_enqueued_for_memory: 0 reads_admitted_immediately: 0 reads_queued_because_ready_list: 0 reads_queued_because_need_cpu_permits: 0 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 27 current_permits: 0 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Semaphore system with 0/10 count and 0/84850769 memory resources: user request, dumping permit diagnostics: permits count memory table/operation/state 1 0 0B ./view_builder/active 1 0 0B total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 234 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 234 reads_enqueued_for_admission: 154 reads_enqueued_for_memory: 0 reads_admitted_immediately: 80 
reads_queued_because_ready_list: 154 reads_queued_because_need_cpu_permits: 0 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 235 current_permits: 1 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 INFO 2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT: Semaphore view_update with 0/50 count and 0/42425384 memory resources: user request, dumping permit diagnostics: permits count memory table/operation/state 0 0 0B total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 0 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 0 reads_enqueued_for_admission: 0 reads_enqueued_for_memory: 0 reads_admitted_immediately: 0 reads_queued_because_ready_list: 0 reads_queued_because_need_cpu_permits: 0 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 0 current_permits: 0 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 Fixes: scylladb/scylladb#7400 Closes scylladb/scylladb#21692	2024-11-28 18:52:29 +02:00
Lakshmi Narayanan Sreethar	0e08ccd307	table::sstable_list_builder: return removed sstables from build_new_list Updated the method table::sstable_list_builder::build_new_list() to return the list of sstables that was removed along with the newly built sstable set. This change will be used to unify the `update_sstable_lists` variants in a following patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-11-28 11:25:11 +05:30
Ernest Zaslavsky	793f2c95d1	snapshots: Stop taking snapshots of MVs Stop taking snapshots of MVs and allow taking snapshot of individual tables, now one can take a snapshot of any base table, any view or index. Also add tests to cover new cases both boost test (using cc code) and pytest (using the API) Also, update documentation to reflect the change fixes: #21339 fixes: #20760 Closes scylladb/scylladb#21433	2024-11-26 15:27:30 +02:00
Kefu Chai	a5ee0c896b	treewide: migrate from boost::adaptors::filtered to std::views::filter Modernize the codebase by replacing Boost range adaptors with C++23 standard library views, reducing external dependencies and leveraging modern C++ language features. Key Changes: - Replace `boost::adaptors::filtered` with `std::views::filter` - Remove `#include <boost/range/adaptor/filtered.hpp>` - Utilize standard library range views Motivation: - Reduce project's external dependency footprint - Leverage standard library's range and view capabilities - Improve long-term code maintainability - Align with modern C++ best practices Implementation Challenges and Considerations: 1. Range Conversion and Move Semantics - `std::ranges::to` adaptor requires rvalue references - Necessitated updates to variable and parameter constness - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const` from `common` to enable efficient range conversion 2. Range Iteration and Mutation - Range views may mutate internal state during iteration - Cannot pass ranges by const reference in some scenarios - Solution: Pass ranges by rvalue reference to explicitly indicate state invalidation Limitations: - One instance of `boost::adaptors::filtered` temporarily preserved due to lack of a C++23 alternative for `boost::join()` - A comprehensive replacement will be addressed in a follow-up change This change is part of our ongoing effort to modernize the codebase, reducing external dependencies and adopting modern C++ practices. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21648	2024-11-26 14:26:50 +02:00
Lakshmi Narayanan Sreethar	c4db4abcae	replica/table: implement `get_token_range_after_split()` wrappers Expose the functionality of `tablet_map::get_token_range_after_split()` via the replica::table class. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-11-11 12:24:00 +05:30
Botond Dénes	8938e06ebe	replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter Continuing the previous patch, expose the just added read_ahead parameter of make_multishard_combining_reader_v2(). Set to read_ahead::yes by all callers, keeping the current default.	2024-11-07 02:47:54 -05:00
Botond Dénes	e2344e28b6	replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter Expose the buffer hint functionality added by the previous commits, to callers of make_multishard_streaming_reader(). All callers disable it currently, it will be used in the next patch.	2024-11-07 02:47:46 -05:00
Botond Dénes	519e167611	Merge 'replica/table: check memtable before discarding tombstone during read' from Lakshmi Narayanan Sreethar On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 `perf-simple-query` stats before and after this fix : `build/Dev/scylla perf-simple-query --smp=1 --flush` : ``` // Before this Fix // --------------- 94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors) 97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors) 96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors) 97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors) 97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors) throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79 instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02 cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19 // After this Fix // -------------- 95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors) 97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 
24005 cycles/op, 0 errors) 98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors) 96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors) 93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors) throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21 instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73 cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22 ``` This PR fixes a regression introduced in `ce96b472d3` and should be backported to older versions. Closes scylladb/scylladb#20985 * github.com:scylladb/scylladb: topology-custom: add test to verify tombstone gc in read path replica/table: check memtable before discarding tombstone during read compaction_group: track maximum timestamp across all sstables	2024-10-23 10:28:00 +03:00
Nadav Har'El	5fd3177057	Merge 'mv: add a dedicated read concurrency semaphore for view update read before writes' from Wojciech Mitros When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload. This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency. The second issue fixed by this patch is the concurrency of the view updates that is currently unlimited. Because of that view updates may take up so much memory that we may run out of memory. This is fixed by using the read admission on the view update concurrency semaphore. This limits the number of concurrent view update reads to max_count_concurrent_view_update_reads, all other incoming view update reads are queued using just a small chunk of memory. Without this, the reads would also get queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue. 
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction. This patch also adds a test to confirm that the view update workload doesn't impact the read latency, as well as a test which confirms that we do not run out of memory even under heavy view update workload. The issue of view updates causing increased latencies most often occurs in the following scenario: * we have a medium to high write workload to a table with a materialized view which requires reading from the base table before sending the update to delete the old rows * we have any read workload * one replica is slower or is handling more writes due to an imbalance of data distribution * we write with a cl<ALL, the mentioned replica is replying to write requests slower while new ones keep being sent to it. * each write performs a read first taking resources from the user read concurrency semaphore, so when enough writes accumulate the reads using the semaphore start getting queued * the queue is shared by regular reads and view update reads. When there are enough view update reads in the queue, regular reads start getting increased latencies An sct test (perf-regression-latency-mv-read-concurrency) was prepared to somewhat resemble this scenario: * the tables were prepared satisfying the conditions above * we use a medium write workload and a very low read workload * the imbalance is achieved by writing to just a few (10) partitions - some replicas (and shards) can have twice or more used partitions than others. 
We also keep writing to a limited (though high) number of rows, to cause overwrites which require reading before sending the view update * to minimize the test case, we use a cluster of 3 nodes and rf=2, we write with cl=ONE to have background replica writes and read with cl=ALL to wait for the slower replica to respond. In the test above: * without the fix, the latency of reads increases over 50s * with the fix, the latency of reads stays below 20ms Fixes https://github.com/scylladb/scylladb/issues/8873 Fixes https://github.com/scylladb/scylladb/issues/15805 The patch is not that small and it isn't fixing a regression, so no backports Closes scylladb/scylladb#20887 * github.com:scylladb/scylladb: test: add test for high view update concurrency causing bad_allocs test: add test for high view update concurrency degrading read latency mv: add a dedicated read concurrency semaphore for view update read before writes	2024-10-22 22:17:23 +03:00
Pavel Emelyanov	516a5f06a8	sstables: Open-code format_table_directory_name() moved recently This helper is small enough and it's easier to understand how table directory name is formatted without it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:18:19 +03:00
Pavel Emelyanov	eeb0d637bb	replica,sstables: Move format_table_directory_name() Now this helper is not needed in replica code, as all manipulations of tables' sstables now sit in the sstables/storage.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:17:30 +03:00
Pavel Emelyanov	74728d3889	table: Remove all_datadirs It's write-only now, all the places that wanted to know where table's storage is (well -- "are", there can be several directories) already use storage_options. This finishes the work started by `9fe64b5d70`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:15:54 +03:00
Wojciech Mitros	242079d70b	mv: add a dedicated read concurrency semaphore for view update read before writes When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload. This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency. The second issue fixed by this patch is the concurrency of the view updates that is currently unlimited. Because of that, view updates may take up so much memory that we may run out of memory. This is fixed by using the read admission on the view update concurrency semaphore. This limits the number of concurrent view update reads to max_count_concurrent_view_update_reads, all other incoming view update reads are queued using just a small chunk of memory. Without this, the reads would also get queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue. 
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction. Fixes https://github.com/scylladb/scylladb/issues/8873 Fixes https://github.com/scylladb/scylladb/issues/15805	2024-10-21 11:02:06 +02:00
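
The isolation idea in the commit above can be sketched in a few lines. This is a toy model in Python, not ScyllaDB's C++ reader concurrency semaphore: the class `CountingSemaphore`, the variables `user_reads` and `view_update_reads`, and the capacity of 4 are all hypothetical. Only the "half the capacity" ratio and the idea of queueing view update reads separately come from the commit message.

```python
from collections import deque


class CountingSemaphore:
    """Toy admission controller: admits up to `capacity` reads, queues the rest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = 0
        self.queue = deque()

    def admit(self, read_id):
        """Return True if the read starts now, False if it had to be queued."""
        if self.active < self.capacity:
            self.active += 1
            return True
        self.queue.append(read_id)
        return False


USER_CAPACITY = 4  # illustrative value, not a real ScyllaDB default

user_reads = CountingSemaphore(USER_CAPACITY)
# Per the commit message, the new semaphore has half the user capacity.
view_update_reads = CountingSemaphore(USER_CAPACITY // 2)

# A burst of view update read-before-writes saturates only its own semaphore.
for i in range(10):
    view_update_reads.admit(f"view-update-read-{i}")

# A user read arriving during the burst is still admitted immediately.
user_admitted = user_reads.admit("user-read-0")
```

With a single shared semaphore, the user read in this sketch would have landed behind the queued view update reads, which is the latency problem the commit describes.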
Lakshmi Narayanan Sreethar	5a93277904	replica/table: check memtable before discarding tombstone during read On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-10-18 19:19:58 +05:30
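
The rule stated in the last sentence of that commit message can be written down as a small predicate. This is a hedged sketch in Python, not the actual `replica/table` code: the `Memtable` class, its fields, and `can_purge_tombstone` are hypothetical names, and a real memtable tracks far more than a key set and one timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Memtable:
    """Hypothetical stand-in for a memtable: its keys and its minimum live timestamp."""
    min_live_timestamp: int
    keys: set = field(default_factory=set)


def can_purge_tombstone(key, memtables, max_purgeable_timestamp):
    """A tombstone is kept if any memtable holds the key with a
    minimum live timestamp lower than the maximum purgeable timestamp."""
    return not any(
        key in mt.keys and mt.min_live_timestamp < max_purgeable_timestamp
        for mt in memtables
    )


memtables = [Memtable(min_live_timestamp=5, keys={"pk1"})]

blocked = can_purge_tombstone("pk1", memtables, max_purgeable_timestamp=10)
other_key = can_purge_tombstone("pk2", memtables, max_purgeable_timestamp=10)
newer_data = can_purge_tombstone("pk1", memtables, max_purgeable_timestamp=3)
```

Here `blocked` is false because the memtable may still hold data for `pk1` that the tombstone covers, while the other two calls allow the purge.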
Alexey Novikov	b965729f0a	replica: implement memtable_flush_period_in_ms schema option Implement Cassandra's original schema option memtable_flush_period_in_ms: milliseconds before memtables associated with the table are flushed. There are a few things to note about this patch: * Milliseconds look strange and scary for this option. Unlike Cassandra, we use a minimum value of 60000ms (1min) for this option. * This is a limitation of Cassandra, but it is impossible to set this option for system tables. However, sometimes it could be very useful to use automatic flushing for such tables: some system tables have little traffic and as a result prevent tombstone garbage collection. Fixes #20270 Closes scylladb/scylladb#20999	2024-10-17 13:41:15 +03:00
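
A minimal sketch of the minimum-value rule from that commit, in Python. The function name `validate_memtable_flush_period_ms` is hypothetical, and treating 0 as "periodic flushing disabled" is an assumption carried over from Cassandra's default rather than something the commit message states; only the 60000ms minimum comes from the source.

```python
MIN_FLUSH_PERIOD_MS = 60_000  # 1 minute, the minimum named in the commit message


def validate_memtable_flush_period_ms(period_ms):
    """Return the period if acceptable, otherwise raise ValueError."""
    if period_ms == 0:
        # Assumption: 0 means "no periodic flush", as in Cassandra's default.
        return 0
    if period_ms < MIN_FLUSH_PERIOD_MS:
        raise ValueError(
            f"memtable_flush_period_in_ms must be 0 or at least "
            f"{MIN_FLUSH_PERIOD_MS}, got {period_ms}"
        )
    return period_ms
```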
Benny Halevy	d34878e96c	view: check_needs_view_update_path: get token_metadata_ptr check_needs_view_update_path is async and might yield so the token_metadata reference passed to it must be kept alive throughout the call. Fixes scylladb/scylladb#20979 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20980	2024-10-09 20:56:21 +03:00
Raphael S. Carvalho	93815e0649	replica: Fix tombstone GC during tablet split preparation During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split. Tombstone GC logic today only looks at the underlying group, so compaction in step 2 will disregard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes #20044. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-10-02 11:26:13 -03:00
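
The two-step scenario in that commit can be replayed with a toy model. This is a simplified sketch in Python: `SSTable`, `sstables_to_consult`, and `can_purge` are hypothetical names, the timestamps are made up, and real tombstone GC also depends on conditions such as the grace period that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    group: str          # compaction group the sstable currently belongs to
    min_timestamp: int  # oldest write timestamp stored in the sstable


def sstables_to_consult(all_sstables, compacting, group=None):
    """Sstables checked for possibly shadowed data.
    group=None means every uncompacting sstable of the replica."""
    return [
        s for s in all_sstables
        if s not in compacting and (group is None or s.group == group)
    ]


def can_purge(tombstone_ts, consulted):
    """Purge only if no consulted sstable can hold data as old as the tombstone."""
    return all(tombstone_ts < s.min_timestamp for s in consulted)


a = SSTable("A", group="main", min_timestamp=10)         # holds the deleted data
b = SSTable("B", group="split-ready", min_timestamp=20)  # holds the tombstone (ts=20)
replica = [a, b]

# Before the fix: only the split-ready group is consulted, so A is never seen.
purged_before_fix = can_purge(20, sstables_to_consult(replica, {b}, group="split-ready"))
# After the fix: all uncompacting sstables of the replica are consulted.
purged_after_fix = can_purge(20, sstables_to_consult(replica, {b}))
```

In this model the group-local check purges the tombstone while A still holds the data it shadows, and the replica-wide check keeps it.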
Botond Dénes	9fe64b5d70	Merge 'Remove datadir string from table::config' from Pavel Emelyanov The datadir keeps path to directory where local sstables can be. The very same information is now kept in table's storage options (#20542). This set fixes the remaining places that still use table::config::datadir and table::dir() and removes the datadir field. Closes scylladb/scylladb#20675 * github.com:scylladb/scylladb: treewide: Remove table::config::datadir distributed_loader: Print storage options, not datadir data_dictionary: Add formatter for storage_options test: Construct table_for_tests with table storage options test: Generalize pair of make_table_for_tests helpers tests: Add helper to get snapshot directory from storage options table: snapshot_exists: Get directory from storage options table: snapshot_on_all_shards: Get directory from storage options	2024-09-26 15:26:45 +03:00
Marcin Maliszkiewicz	258ffbd126	replica: remove unused table_selector forward declaration	2024-09-23 12:01:36 +02:00
Benny Halevy	574a08ed96	storage_service: rebuild: warn about tablets-enabled keyspaces Until we automatically support rebuild for tablets-enabled keyspaces, warn the user about them. The reason this is not an error, is that after increasing RF in a new datacenter, the current procedure is to run `nodetool rebuild` on all nodes in that dc to rebuild the new vnode replicas. This is not required for tablets, since the additional replicas are rebuilt automatically as part of ALTER KS. However, `nodetool rebuild` is also run after local data loss (e.g. due to corruption and removal of sstables). In this case, rebuild is not supported for tablets-enabled keyspaces, as tablet replicas that had lost data may have already been migrated to other nodes, and rebuilding the requested node will not know about it. It is advised to repair all nodes in the datacenter instead. Refs scylladb/scylladb#17575 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20375	2024-09-19 14:25:46 +03:00
Pavel Emelyanov	8487f2fd93	treewide: Remove table::config::datadir It's write-only now, all the places that wanted to know where table's storage is, already use storage_options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Botond Dénes	c7c5817808	Merge 'Improve timestamp heuristics for tombstone garbage collection' from Benny Halevy When purging regular tombstone consult the min_live_timestamp, if available. This is safe since we don't need to protect dead data from resurrection, as it is already dead. For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp, if available, otherwise fallback to the min_live_timestamp. If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data. In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable. If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp. 
Fixes scylladb/scylladb#20423 Fixes scylladb/scylladb#20424 > [!NOTE] > no backport needed at this time > We may consider backport later on after given some soak time in master/enterprise > since we do see tombstone accumulation in the field under some materialized views workloads Closes scylladb/scylladb#20446 * github.com:scylladb/scylladb: cql-pytest: add test_compaction_tombstone_gc sstable_compaction_test: add mv_tombstone_purge_test sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection sstable_compaction_test: tombstone_purge_test: add testlog debugging sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp sstable, compaction: add debug logging for extended min timestamp stats compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats compaction: define max_purgeable_fn tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh sstables: scylla_metadata: add ext_timestamp_stats compaction_group, storage_group, table_state: add extended timestamp stats getters sstables, memtable: track live timestamps memtable_encoding_stats_collector: update row_marker: do nothing if missing	2024-09-13 08:56:51 +03:00
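
The fallback order described in that merge message can be sketched as follows. This is an illustrative Python model, not the compaction code: `TimestampStats`, `purge_bound`, and `can_purge` are hypothetical names, and `None` stands for "extended statistic not available".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimestampStats:
    """Per-sstable or per-memtable timestamp statistics (hypothetical shape)."""
    min_timestamp: int                                   # legacy, always present
    min_live_timestamp: Optional[int] = None             # extended stat
    min_live_row_marker_timestamp: Optional[int] = None  # extended stat


def purge_bound(stats, shadowable):
    """Pick the timestamp a tombstone is compared against, per the merge message."""
    if shadowable and stats.min_live_row_marker_timestamp is not None:
        return stats.min_live_row_marker_timestamp
    if stats.min_live_timestamp is not None:
        return stats.min_live_timestamp
    return stats.min_timestamp  # legacy (and inaccurate) fallback


def can_purge(tombstone_ts, stats, shadowable=False):
    """Rows with timestamps older than or equal to the tombstone block the purge."""
    return tombstone_ts < purge_bound(stats, shadowable)


stats = TimestampStats(min_timestamp=1, min_live_timestamp=5,
                       min_live_row_marker_timestamp=10)
legacy = TimestampStats(min_timestamp=3)

shadowable_ok = can_purge(7, stats, shadowable=True)        # 7 < 10
regular_blocked = can_purge(7, stats)                       # 7 >= 5
shadowable_equal = can_purge(10, stats, shadowable=True)    # equal marker blocks
```

The old deleted data at timestamp 1 no longer inhibits garbage collection in this model, because only live timestamps are consulted when the extended statistics are present.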
Piotr Dulikowski	d98708013c	Merge 'view: move view_build_status to group0' from Michael Litvak Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations. The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table. New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table. The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility. When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows. 
Fixes scylladb/scylladb#15329 dtest https://github.com/scylladb/scylla-dtest/pull/4827 Closes scylladb/scylladb#19745 * github.com:scylladb/scylladb: view: test view_build_status table with node replace test/pylib: use view_build_status_v2 table in wait_for_view view_builder: common write view_build_status function view_builder: improve migration to v2 with intermediate phase view: delete node rows from view_build_status on node removal view: sanitize view_build_status during migration view: make old view_build_status table a virtual table replica: move streaming_reader_lifecycle_policy to header file view_builder: test view_build_status_v2 storage_service: add view_build_status to raft snapshot view_builder: migration to v2 db:system_keyspace: add view_builder_version to scylla_local view_builder: read view status from v2 table view_builder: introduce writing status mutations via raft view_builder: pass group0_client and qp to view_builder view_builder: extract sys_dist status operations to functions db:system_keyspace: add view_build_status_v2 table	2024-09-11 13:02:58 +02:00
Benny Halevy	6f202cf48b	compaction_group, storage_group, table_state: add extended timestamp stats getters To return the minimum live timestamp and live row-marker timestamp across a compaction_group, storage_group, or table_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Pavel Emelyanov	b6f662417c	table: Remove unused database& argument from take_snapshot() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20496	2024-09-10 14:53:06 +03:00
Michael Litvak	09eadcff08	replica: move streaming_reader_lifecycle_policy to header file move the class streaming_reader_lifecycle_policy to a header file in order to make it reusable in other places.	2024-09-05 15:42:35 +03:00
Lakshmi Narayanan Sreethar	84d06a13c7	api: compaction: add `consider_only_existing_data` option Added a new parameter `consider_only_existing_data` to major compaction API endpoints. When enabled, major compaction will: - Force-flush all tables. - Force a new active segment in the commit log. - Compact all existing SSTables and garbage-collect tombstones by only checking the SSTables being compacted. Memtables, commit logs, and other SSTables not part of the compaction will not be checked, as they will only contain newer data that arrived after the compaction started. The `consider_only_existing_data` is passed down to the compaction descriptor's `gc_check_only_compacting_sstables` option to ensure that only the existing data is considered for garbage collection. The option is also passed to the `maybe_flush_commitlog` method to make sure all the tables are flushed and a new active segment is created in the commit log. Fixes #19728 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	5e6bffc146	compaction: rename maybe_flush_all_tables to maybe_flush_commitlog Major compaction flushes all tables as a part of flushing the commitlog. After forcing new active segments in the commitlog, all the tables are flushed to enable reclaim of older commitlog segments. The main goal is to flush the commitlog and flushing all the tables is just a dependency. Rename maybe_flush_all_tables to maybe_flush_commitlog so that it reflects the actual intent of the major compaction code. Added a new wrapper method to database::flush_all_tables(), database::flush_commitlog(), that is now called from maybe_flush_commitlog. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Avi Kivity	0acfa4a00d	Merge 'abstract_replication_strategy: make get_ranges async' from Benny Halevy To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 Closes scylladb/scylladb#19758 * github.com:scylladb/scylladb: abstract_replication_strategy: make get_ranges async database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param compaction: task_manager_module: open code maybe_get_keyspace_local_ranges alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder alternator: ttl: can pass const gms::gossiper& to ranges_holder alternator: ttl: ranges_holder_primary: unconstify _token_ranges member alternator: ttl: refactor token_ranges_owned_by_this_shard	2024-08-26 16:56:18 +03:00
Benny Halevy	686a8f2939	abstract_replication_strategy: make get_ranges async To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:57:34 +03:00
Benny Halevy	2bbbe2a8bc	database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param Prepare for making the function async. Then, it will need to hold on to the erm while getting the token_ranges asynchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:55:33 +03:00
Benny Halevy	ea5a0cca10	compaction: task_manager_module: open code maybe_get_keyspace_local_ranges It is used only here and can be simplified by checking if the keyspace replication strategy is per table by the caller. Prepare for making get_keyspace_local_ranges async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Pavel Emelyanov	f7b380d53b	database: Export parse_table_directory_name() helper There's parse_table_directory_name() static helper in database.cc code that is used by methods that parse table tree layout for snapshot. Export this helper for external usage and rename to fit the format_... one introduced by previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:48 +03:00
Pavel Emelyanov	33962946fc	database: Introduce format_table_directory_name() helper This one makes the table directory name (not the full path) out of the table name and uuid. This is to be symmetrical with yet another helper that converts the directory name back to table name and uuid (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:48 +03:00
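
The symmetry between the two helpers named in these commits can be illustrated with a round trip. This Python sketch borrows the helper names from the commit titles but is not the C++ implementation, and the `<table name>-<uuid without dashes>` layout is an assumption about the on-disk naming that the commit messages do not spell out.

```python
import uuid


def format_table_directory_name(name, table_id):
    """Build the directory name (not a full path) from table name and uuid.
    Assumed layout: '<name>-<32 hex digits of the uuid>'."""
    return f"{name}-{table_id.hex}"


def parse_table_directory_name(dirname):
    """Inverse of format_table_directory_name: recover (name, uuid)."""
    name, sep, hex_id = dirname.rpartition("-")
    if not sep or not name:
        raise ValueError(f"not a table directory name: {dirname!r}")
    return name, uuid.UUID(hex=hex_id)


table_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
dirname = format_table_directory_name("users", table_id)
parsed = parse_table_directory_name(dirname)
```

Splitting on the last dash keeps the parse unambiguous even if a table name were to contain a dash itself.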


464 Commits