scylladb

Author	SHA1	Message	Date
Botond Dénes	d85208a574	replica/database: revert initial boost to system semaphore with set_resources() Unlike the current method (which uses consume()), this will also adjust the initial resources, adjusting the semaphore as if it was created with the reduced amount of resources in the first place. This fixes the confusing 90/100 count resources seen in diagnostics dump outputs.	2022-10-17 07:39:20 +03:00
Pavel Emelyanov	f9b57df471	database: Plug/unplug system_keyspace There's a circular dependency between system_keyspace and database. The former needs the latter because it needs to execute local requests via query_processor. The latter needs the former via compaction manager and large data handler: database depends on both, and these, too, need to insert their entries into the system keyspace. To cut this loop, the compaction manager and large data handler both get a weak reference to the system keyspace. Once system keyspace starts, it activates this reference via the database call. When system keyspace is shut down on stop, it deactivates the reference. Technically the weak reference is implemented by marking the system_k.s. object as async_sharded_service, and the "reference" in question is the shared_from_this() pointer. When compaction manager or large data handler need to update a system keyspace's table, they both hold an extra reference on the system keyspace until the entry is committed, thus making sure that sys._k.s. doesn't stop from under their feet. At the same time, unplugging the reference on shutdown makes sure that no new entry updates will appear and the system_k.s. will eventually be released. It's not a classical C++ reference, because system_keyspace starts after and stops before database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 16:20:59 +03:00
Avi Kivity	20bad62562	Merge 'Detect and record large collections' from Benny Halevy This series adds support for detecting collections that have too many items and recording them in `system.large_cells`. A configuration variable was added to db/config: `compaction_collection_items_count_warning_threshold`, set to 10000 by default. Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table. Documentation has been updated accordingly. A new column was added to system.large_cells: `collection_items`. Similar to the `rows` column in system.large_partitions, `collection_items` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't. Note that the collection may be recorded in system.large_cells due to its size, like any other cell, and/or due to the number of items in it, if it crosses the said threshold. Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions and is a smaller change overall, hence it was preferred. Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature, `LARGE_COLLECTION_DETECTION`, to enable using it. The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled. In addition, unit tests were added in sstable_3_x_test for testing large cell detection by cell size, and large_collection detection by the number of items. 
Closes #11449 Closes #11674 * github.com:scylladb/scylladb: sstables: mx/writer: optimize large data stats members order sstables: mx/writer: keep large data stats entry as members db: large_data_handler: dynamically update config thresholds utils/updateable_value: add transforming_value_updater db/large_data_handler: cql_table_large_data_handler: record large_collections db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler db/large_data_handler: cql_table_large_data_handler: move ctor out of line docs: large-rows-large-cells-tables: fix typos db/system_keyspace: add collection_elements column to system.large_cells gms/feature_service: add large_collection_detection cluster feature test: sstable_3_x_test: add test_sstable_too_many_collection_elements test: lib: simple_schema: add support for optional collection column test: lib: simple_schema: build schema in ctor body test: lib: simple_schema: cql: define s1 as static only if built this way db/large_data_handler: maybe_record_large_cells: consider collection_elements db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells sstables: mx/writer: add large_data_type::elements_in_collection db/large_data_handler: get the collection_elements_count_threshold db/config: add compaction_collection_elements_count_warning_threshold test: sstable_3_x_test: add test_sstable_write_large_cell test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random	2022-10-06 18:28:21 +03:00
Benny Halevy	2c4ff71d2b	db: large_data_handler: dynamically update config thresholds make the various large data thresholds live-updateable and construct the observers and updaters in cql_table_large_data_handler to dynamically update the base large_data_handler class threshold members. Fixes #11685 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:53:40 +03:00
Avi Kivity	37c6b46d26	dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty" The "virtual dirty" term is not very informative. "Virtual" means "not real", but it doesn't say in which way it isn't real. In this case, virtual dirty refers to real dirty memory, minus the portion of memtables that has been written to disk (but not yet sealed; in that case it would not be dirty in the first place). I chose to call "the portion of memtables that has been written to disk" "spooled memory". At least the unique term will cause people to look it up and may be easier to remember. From that we have "unspooled memory". I plan to further change the accounting to account for spooled memory rather than unspooled, as that is a more natural term, but that is left for later. The documentation, config item, and metrics are adjusted. The config item is practically unused so it isn't worth keeping compatibility here.	2022-10-04 14:03:59 +03:00
Avi Kivity	d02c407769	dirty_memory_manager: rename _virtual_region_group Since we folded _real_region_group into _virtual_region_group, the "virtual" tag makes no sense any more, so remove it.	2022-10-04 14:01:45 +03:00
Avi Kivity	bc2fcf5187	dirty_memory_manager: unscramble terminology Before `95f31f37c1` ("Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity"), we had two region_group objects, one _real_region_group and another _virtual_region_group, each with a set of "soft" and "hard" limits and related functions and members. In `95f31f37c1`, we merged _real_region_group into _virtual_region_group, but unfortunately the _real_region_group members received the "hard" prefix when they got merged. This overloads the meaning of "hard" - is it related to soft/hard limit or is it related to the real/virtual distinction? This patch applied some renaming to restore consistency. Anything that came from _virtual_region_group now has "virtual" in its name. Anything that came from _real_region_group now has "real" in its name. The terms are still pretty bad but at least they are consistent.	2022-10-04 13:56:28 +03:00
Benny Halevy	3f8bba202f	db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler For recording collection_elements of large_collections when the large_collection_detection feature is enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:10 +03:00
Benny Halevy	a107f583fd	db/large_data_handler: get the collection_elements_count_threshold Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:11 +03:00
Botond Dénes	95f31f37c1	Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity region_group evolved as a tree, each node of which contains some regions (memtables). Each node has some constraints on memory, and can start flushing and/or stop allocation into its memtables and those below it when those constraints are violated. Today, the tree has exactly two nodes, only one of which can hold memtables. However, all the complexity of the tree remains. This series applies some mechanical code transformations that remove the tree structure and all the excess functionality, leaving a much simpler structure behind. Before: - a tree of region_group objects - each with two parameters: soft limit and hard limit - but only two instances ever instantiated After: - a single region_group object - with three parameters - two from the bottom instance, one from the top instance Closes #11570 * github.com:scylladb/scylladb: dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config dirty_memory_manager: simplify region_group::update() dirty_memory_manager: fold region_group::notify_hard_pressure_relieved into its callers dirty_memory_manager: clean up region_group::do_update_hard_and_check_relief() dirty_memory_manager: make do_update_hard_and_check_relief() a member of region_group dirty_memory_manager: remove accessors around region_group::_under_hard_pressure dirty_memory_manager: merge memory_hard_limit into region_group dirty_memory_manager: rename members in memory_hard_limit dirty_memory_manager: fold do_update() into region_group::update() dirty_memory_manager: simplify memory_hard_limit's do_update dirty_memory_manager: drop soft limit / soft pressure members in memory_hard_limit dirty_memory_manager: de-template do_update(region_group_or_memory_hard_limit) dirty_memory_manager: adjust soft_limit threshold check dirty_memory_manager: drop memory_hard_limit::_name dirty_memory_manager: simplify memory_hard_limit 
configuration dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} dirty_memory_manager: stop inheriting from region_group_reclaimer dirty_memory_manager: test: unwrap region_group_reclaimer dirty_memory_manager: change region_group_reclaimer configuration to a struct dirty_memory_manager: convert region_group_reclaimer to callbacks dirty_memory_manager: consolidate region_group_reclaimer constructors dirty_memory_manager: rename {memory_hard_limit,region_group}::notify_relief dirty_memory_manager: drop unused parameter to memory_hard_limit constructor dirty_memory_manager: drop memory_hard_limit::shutdown() dirty_memory_manager: split region_group hierarchy into separate classes dirty_memory_manager: extract code block from region_group::update dirty_memory_manager: move more allocation_queue functions out of region_group dirty_memory_manager: move some allocation queue related function definitions outside class scope dirty_memory_manager: move region_group::allocating_function and related classes to new class allocation_queue dirty_memory_manager: remove support for multiple subgroups	2022-10-03 13:22:47 +03:00
Avi Kivity	17b1cb4434	dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config Place it along the other parameters.	2022-09-30 22:17:37 +03:00
Avi Kivity	6a02bb7c2b	dirty_memory_manager: merge memory_hard_limit into region_group The two classes always have a 1:1 or 0:1 relationship, and so we can just move all the members of memory_hard_limit into region_group, with the functions that track the relationship (memory_hard_limit::{add,del}()) removed. The 0:1 relationship is maintained by initializing the hard limit parameter with std::numeric_limits<size_t>::max(). Nothing is done unless the _hard_total_memory variable exceeds this parameter, and with this default it never can.	2022-09-30 21:59:38 +03:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
TarasBor	1f4a93da78	Show warning message if `tombstone_warn_threshold` is reached on querier. When the querier reads a page with more tombstones than the `tombstone_warn_threshold` limit, a warning message is printed to the logs. With `tombstone_warn_threshold:0`, the feature is disabled. Refs scylladb#11410	2022-09-22 16:42:31 +03:00
Avi Kivity	b9eb26cd77	dirty_memory_manager: drop memory_hard_limit::_name It's write-only.	2022-09-22 14:01:57 +03:00
Avi Kivity	c64fb66cc3	dirty_memory_manager: simplify memory_hard_limit configuration We observe that memory_hard_limit's reclaim_config is only ever initialized as default, or with just the hard_limit parameter. Since soft_limit defaults to hard_limit, we can collapse the two into a limit. The reclaim callbacks are always left as the default no-op functions, so we can eliminate them too. This fits with memory_hard_limit only being responsible for the hard limit, and for it not having any memtables to reclaim on its own.	2022-09-22 13:56:59 +03:00
Avi Kivity	2f907dc47d	dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} region_group_reclaimer is used to initialize (by reference) instances of memory_hard_limit and region_group. Now that it is a final class, we can fold it into its users by pasting its contents into those users, and using the initializer (reclaim_config) to initialize the users. Note there is a 1:1 relationship between a region_group_reclaimer instance and a {memory_hard_limit,region_group} instance. It may seem like code duplication to paste the contents of one class into two, but the two classes use region_group_reclaimer differently, and most of the code is just used to glue different classes together, so the next patches will be able to get rid of much of it. Some notes: - no_reclaimer was replaced by a default reclaim_config, as that's how no_reclaimer was initialized - all members were added as private, except when a caller required one to be public - an under_presssure() member already existed, forwarding to the reclaimer; this was just removed.	2022-09-22 13:56:59 +03:00
Avi Kivity	d8f857e74b	dirty_memory_manager: stop inheriting from region_group_reclaimer This inheritance makes it harder to get rid of the class. Since there are no longer any virtual functions in the class (apart from the destructor), we can just convert it to a data member. In a few places, we need forwarding functions to make formerly-inherited functions visible to outside callers. The virtual destructor is removed and the class is marked final to verify it is no longer a base class anywhere.	2022-09-22 13:56:59 +03:00
Avi Kivity	1d3508e02c	dirty_memory_manager: change region_group_reclaimer configuration to a struct It's just so much nicer. The "threshold" limit was renamed to "hard_limit" to contrast it with "soft_limit" (in fact threshold is a good name for soft_limit, since it's a point where the behavior begins to change, but that's too much of a change).	2022-09-22 13:56:59 +03:00
Avi Kivity	2c54c7d51e	dirty_memory_manager: convert region_group_reclaimer to callbacks region_group_reclaimer is partially policy (deciding when to reclaim) and partially mechanism (implementing reclaim via virtual functions). Move the mechanism to callbacks. This will make it easy to fold the policy part into region_group and memory_hard_limit. This folding is expected to simplify things since most of region_group_reclaimer is cross-class communication.	2022-09-22 13:56:59 +03:00
Avi Kivity	a72ac14154	dirty_memory_manager: drop unused parameter to memory_hard_limit constructor	2022-09-22 13:56:59 +03:00
Avi Kivity	fca5689052	dirty_memory_manager: drop memory_hard_limit::shutdown() It is empty.	2022-09-22 13:56:59 +03:00
Botond Dénes	05ef13a627	Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundaries, so a large partition is stored in a single file. But that can cause many problems, like memory pressure (e.g.: https://github.com/scylladb/scylladb/issues/4217), and incremental compaction also cannot fulfill its promise, as the file storing the large partition can only be released once exhausted. The first step was to add clustering range metadata for first and last partition keys (retrieved from promoted index), which is crucial to determine disjointness at the clustering level, and also the order in which the disjoint files should be opened for incremental reading. The second step was to extend sstable_run to look at the clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run. The final step was to introduce the option for compaction to split a large partition being written if it has exceeded the size threshold. What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can be safely built on the disjointness invariant established by the aforementioned second step. 
Closes #11233 * github.com:scylladb/scylladb: test: Add test for large partition splitting on compaction compaction: Add support to split large partitions sstable: Extend sstable_run to allow disjointness on the clustering level sstables: simplify will_introduce_overlapping() test: move sstable_run_disjoint_invariant_test into sstable_datafile_test test: lib: Fix inefficient merging of mutations in make_sstable_containing() sstables: Keep track of first partition's first pos and last partition's last pos sstables: Rename min/max position_range to a descriptive name sstables_manager: Add sstable metadata reader concurrency semaphore sstables: Add ability to find first or last position in a partition	2022-09-15 16:08:56 +03:00
Raphael S. Carvalho	e099a9bf3b	sstables_manager: Add sstable metadata reader concurrency semaphore Let's introduce a reader_concurrency_semaphore for reading sstable metadata, to avoid an OOM due to unlimited concurrency. The concurrency on startup is not controlled, so it's important to enforce a limit on the amount of memory used by the parallel readers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Michał Chojnowski	9b6fc553b4	db: commitlog: don't print INFO logs on shutdown The intention was for these logs to be printed during the database shutdown sequence, but it was overlooked that it's not the only place where commitlog::shutdown is called. Commitlogs are started and shut down periodically by hinted handoff. When that happens, these messages spam the log. Fix that by adding INFO commitlog shutdown logs to database::stop, and change the level of the commitlog::shutdown log call to DEBUG. Fixes #11508 Closes #11536	2022-09-14 11:30:53 +03:00
Benny Halevy	3b0147390b	replica: database: get_tombstone_gc_state from compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	5dd15aa3c8	tombstone_gc: introduce tombstone_gc_state and use it to access the repair history maps. In this introductory patch, we temporarily use a default-constructed tombstone_gc_state to access the thread-local maps; those use sites will be replaced in the following patches, which will gradually pass the tombstone_gc_state down from the compaction_manager to where it's used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	3d88fe9729	database: do not drop_repair_history_map_for_table in detach_column_family drop_repair_history_map_for_table is called on each shard when database::truncate is done and the table is stopped. Dropping it before the table is stopped is too early. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Botond Dénes	7d17d675af	utils/logalloc: move global stat accessors to tracker These are pretend free functions that access globals in the background; make them members of the tracker instead, which has everything needed locally to compute them. Callers still have to access these stats through the global tracker instance, but this can be changed to happen through a local instance. Soon...	2022-08-23 10:38:58 +03:00
Botond Dénes	2b1eb6e284	database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() Instead of querier_cache::evict_all_for_table(). The new method covers all queriers and, in addition, any other inactive reads registered on the semaphore. In theory, by the time we detach a table, no regular inactive reads should be in the semaphore anymore, but if any are still there, we had better evict them before the table is destroyed, as they might attempt to access it when destroyed later.	2022-08-15 14:16:41 +03:00
Avi Kivity	e9cbc9ee85	Merge 'Add support for empty replica pages' from Botond Dénes Having many tombstones in a partition is a problem that has been plaguing queries since the inception of Scylla (and even before that, as they are a pain in Apache Cassandra too). Tombstones don't count towards the query's page limit, neither the size nor the row number one. Hence, large spans of tombstones (be that row- or range-tombstones) are problematic: the query can time out while processing this span of tombstones, as it waits for more live rows to fill the page. In the extreme case a partition becomes entirely unreadable, all read attempts timing out, until compaction manages to purge the tombstones. The solution proposed in this PR is to pass down a tombstone limit to replicas: when this limit is reached, the replica cuts the page and marks it as a short one, even if the page is currently empty. To make this work, we use the last-position infrastructure added recently by `3131cbea62`, so that replicas can provide the position of the last processed item to continue the next page from. Without this, no forward progress could be made in the case of an empty page: the query would continue from the same position on the next page, having to process the same span of tombstones. The limit can be configured with the newly added `query_tombstone_limit` configuration item, defaulted to 10000. The coordinator will pass this to the newly added `tombstone_limit` field of `read_command`, if the `replica_empty_pages` cluster feature is set. The upgrade sanity test was conducted as follows: * Created cluster of 3 nodes with RF=3 with master version * Wrote small dataset of 1000 rows. * Deleted prefix of 980 rows. * Started read workload: `scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100` * Also did some manual queries via `cqlsh` with smaller page size and tracing on. 
* Stopped and upgraded each node one-by-one. New nodes were started by `--query-tombstone-page-limit=10`. * Confirmed there are no errors or read-repairs. Perf regression test: ``` build/release/test/perf/perf_simple_query_g -c1 -m2G --concurrency=1000 --task-quota-ms 10 --duration=60 ``` Before: ``` median 133665.96 tps ( 62.0 allocs/op, 12.0 tasks/op, 43007 insns/op, 0 errors) median absolute deviation: 973.40 maximum: 135511.63 minimum: 104978.74 ``` After: ``` median 129984.90 tps ( 62.0 allocs/op, 12.0 tasks/op, 43181 insns/op, 0 errors) median absolute deviation: 2979.13 maximum: 134538.13 minimum: 114688.07 ``` Diff: +~200 instruction/op. Fixes: https://github.com/scylladb/scylla/issues/7689 Fixes: https://github.com/scylladb/scylla/issues/3914 Fixes: https://github.com/scylladb/scylla/issues/7933 Refs: https://github.com/scylladb/scylla/issues/3672 Closes #11053 * github.com:scylladb/scylladb: test/cql-pytest: add test for query tombstone page limit query-result-writer: stop when tombstone-limit is reached service/pager: prepare for empty pages service/storage_proxy: set smallest continue pos as query's continue pos service/storage_proxy: propagate last position on digest reads query: result_merger::get() don't reset last-pos on short-reads and last pages query: add tombstone-limit to read-command service/storage_proxy: add get_tombstone_limit() query: add tombstone_limit type db/config: add config item for query tombstone limit gms: add cluster feature for empty replica pages tree: don't use query::read_command's IDL constructor	2022-08-10 13:38:06 +03:00
Avi Kivity	be44fd63f9	Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy This series converts the synchronous `effective_replication_map::get_range_addresses` to async by calling the replication strategy async entry point with the same name, as its callers are already async or can be made so easily. To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map, let the callers hold an effective_replication_map per keyspace and pass it down to the (now asynchronous) functions that use it (making affected storage_service methods static where possible if they no longer depend on the storage_service instance). Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate that is true for local_strategy and everywhere_replication_strategy, and is false otherwise. With it, functions repeatedly calling calculate_natural_endpoints in a loop, for every token, will call it only once since it will return the same result every time anyhow. 
Refs #11005 Doesn't fix the issue, as the large allocation remains until we make dht::token_range_vector chunked (chunked_vector cannot be used as is at the moment, since we also need to push to the front when unwrapping) Closes #11009 * github.com:scylladb/scylladb: effective_replication_map: make get_range_addresses asynchronous range_streamer: add_ranges and friends: get erm as param storage_service: get_new_source_ranges: get erm as param storage_service: get_changed_ranges_for_leaving: get erm as param storage_service: get_ranges_for_endpoint: get erm as param repair: use get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces storage_service: coroutinize update_pending_ranges effective_replication_map: add get_replication_strategy effective_replication_map: get_range_addresses: use the precalculated replication_map abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies abstract_replication_strategy: reindent utils: sequenced_set: expose set and `contains` method abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set utils: sequenced_set: templatize VectorType utils: sanitize sequenced_set utils: sequenced_set: delete mutable get_vector method	2022-08-09 13:25:53 +03:00
Botond Dénes	1b669cefed	service/storage_proxy: add get_tombstone_limit() To be used by coordinator side code to determine the correct tombstone limit to pass to read-command (tombstone limit field added in the next commit). When this limit is non-zero, the replica will start cutting pages after the tombstone limit is surpassed. This getter works similarly to `get_max_result_size()`: if the cluster feature for empty replica pages is set, it will return the value configured via db::config::query_tombstone_limit. System queries always use a limit of 0 (unlimited tombstones).	2022-08-09 10:00:40 +03:00
Benny Halevy	db5c5ca59e	database: add get_non_local_strategy_keyspaces_erms To be used for getting a coherent set of all keyspaces with non-local replication strategy and their respective effective_replication_map. As an example, use it in this patch in storage_service::update_pending_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	7ee6048255	database: add get_non_local_strategy_keyspaces For node operations, we currently call get_non_system_keyspaces but really want to work on all keyspaces that have a non-local replication strategy, as they are replicated on other nodes. Reflect that in the replica::database function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	c71ef330b2	query-request, everywhere: define and use query_id as a strong type Define query_id as a tagged_uuid so it can be differentiated from other uuid-class types. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:13:28 +03:00
Benny Halevy	2b017ce285	schema, everywhere: define and use table_schema_version as a strong type Define table_schema_version as a distinct tagged_uuid class, so it can be differentiated from other uuid-class types, in particular table_id. Added reversed(table_schema_version) for convenience and uniformity since the same logic is currently open coded in several places. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:45 +03:00
Benny Halevy	257d74bb34	schema, everywhere: define and use table_id as a strong type Define table_id as a distinct utils::tagged_uuid modeled after raft tagged_id, so it can be differentiated from other uuid-class types, in particular from table_schema_version. Fixes #11207 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:41 +03:00
Benny Halevy	37b7a9cce2	utils: get rid of joinpoint Now that it is no longer used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	56f336d1aa	database: get rid of timestamp_func Pass an optional truncated_at time_point to truncate_table_on_all_shards instead of the over-complicated timestamp_func that returns the same time_point on all shards anyhow, and was only used for coordination across shards. Since we now synchronize the internal execution phase in truncate_table_on_all_shards, there is no longer a need for this timestamp_func. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	b640c4fd17	database: truncate: snapshot table in all-shards layer With that, the database layer no longer needs to invoke the private table::snapshot function, so it can be defriended from class table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	af0c71aa12	database: truncate: flush table and views in all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	6e07e6b7ac	database: truncate: stop and disable compaction in all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	e78dad1dfb	database: truncate: move call to set_low_replay_position_mark to all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	a8bd3d97b6	database: truncate: enter per-shard table async_gate in all-shards layer Start moving the per-shard state establishment logic to truncate_table_on_all_shards, so that we eventually do only the truncate logic per se in the per-shard truncate function. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	ff028316f2	database: truncate: move check for schema_tables keyspace to all-shards layer. Now that the per-shard truncate function is called only from truncate_table_on_all_shards, we can reject the schema_tables keyspace in the upper layer. There's no need to check that on each shard. While at it, reuse `is_system_keyspace`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	fbe1fa1370	database: snapshot_table_on_all_shards: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	4d4ca40c38	table: add snapshot_on_all_shards Called from the respective database entry points. Will be called also from the database drop / truncate path and will be used for central coordination of per-shard table::snapshot so we don't have to depend on the snapshot_manager mechanism that is fragile and currently causes abort if we fail to allocate it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	be56a73e78	database: add snapshot_table_on_all_shards We need to snapshot a single table in several paths. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	d96b56fee2	database: rename {flush,snapshot}_on_all and make static Follow the convention of drop_table_on_all_shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
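To show the difference described in d85208a574, here is a toy model of a semaphore's count resources. It is a hypothetical sketch, not Scylla's reader_concurrency_semaphore: the class `toy_semaphore`, its members, and the `available/initial` diagnostics format are assumptions. The 100 and 90 values mirror the "90/100" mentioned in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model of a semaphore's count resources (hypothetical, for illustration).
class toy_semaphore {
    std::int64_t _initial;    // total the semaphore is treated as created with
    std::int64_t _available;  // units currently available
public:
    explicit toy_semaphore(std::int64_t count) : _initial(count), _available(count) {}

    // Takes units away; the initial amount is left unchanged.
    void consume(std::int64_t count) { _available -= count; }

    // Re-bases the semaphore as if it had been created with `count`:
    // initial and available move by the same delta, so units currently
    // held by readers stay accounted for.
    void set_resources(std::int64_t count) {
        assert(count >= 0);
        const std::int64_t delta = count - _initial;
        _initial = count;
        _available += delta;
    }

    // Renders "available/initial", the shape of the confusing 90/100 output.
    std::string diagnostics() const {
        return std::to_string(_available) + "/" + std::to_string(_initial);
    }
};
```

Reverting a boost from 100 to 90 with `consume(10)` leaves the dump at `90/100`, while `set_resources(90)` yields `90/90`.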
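The plug/unplug scheme from f9b57df471 can be approximated in standard C++. This is a minimal sketch under stated assumptions: `std::enable_shared_from_this` substitutes for Seastar's `async_sharded_service`, `sys_ks_plug` and `record()` are hypothetical names, and the real code commits entries asynchronously rather than with a plain vector push.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for system_keyspace. Scylla marks it async_sharded_service;
// std::enable_shared_from_this plays that role in this sketch.
class system_keyspace : public std::enable_shared_from_this<system_keyspace> {
public:
    std::vector<std::string> entries;  // rows "inserted" into a system table
    void insert(std::string entry) { entries.push_back(std::move(entry)); }
};

// Hypothetical holder kept by the compaction manager / large data handler.
class sys_ks_plug {
    std::shared_ptr<system_keyspace> _sys_ks;  // empty while unplugged
public:
    // Called when the system keyspace starts.
    void plug(system_keyspace& ks) {
        assert(!_sys_ks);  // plugging twice would be a lifecycle bug
        _sys_ks = ks.shared_from_this();
    }
    // Called when the system keyspace stops: no new entries get recorded.
    void unplug() { _sys_ks.reset(); }

    // Records an entry if the keyspace is plugged in. The local copy is the
    // "extra reference" that would keep the keyspace alive until an async
    // write completes; in this synchronous sketch it only shows the pattern.
    bool record(std::string entry) {
        std::shared_ptr<system_keyspace> ks = _sys_ks;
        if (!ks) {
            return false;
        }
        ks->insert(std::move(entry));
        return true;
    }
};
```

Because the holder drops its pointer on `unplug()`, the keyspace is released once in-flight writers finish, which matches the "starts after and stops before database" ordering in the commit.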
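The recording rule described in 20bad62562 can be sketched as a single function. Everything here is assumed for illustration: `maybe_record_large_cell()`, `large_cell_entry`, and the parameter names are hypothetical. The sketch also assumes that both thresholds are always checked and that the cluster feature only decides whether the new column is filled in.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical shape of a row destined for system.large_cells.
struct large_cell_entry {
    std::uint64_t cell_size;
    std::uint64_t collection_elements;  // 0 when not a collection (or feature off)
};

// Returns the entry to record, or std::nullopt if the cell is not "large".
// `collection_elements` is std::nullopt for non-collection cells.
std::optional<large_cell_entry> maybe_record_large_cell(
        std::uint64_t cell_size,
        std::optional<std::uint64_t> collection_elements,
        std::uint64_t cell_threshold_bytes,
        std::uint64_t elements_threshold,
        bool large_collection_detection_enabled) {
    const std::uint64_t elements = collection_elements.value_or(0);
    const bool too_big = cell_size > cell_threshold_bytes;
    const bool too_many = elements > elements_threshold;
    if (!too_big && !too_many) {
        return std::nullopt;
    }
    // The new column is only populated once the cluster feature is enabled.
    const std::uint64_t recorded = large_collection_detection_enabled ? elements : 0;
    return large_cell_entry{cell_size, recorded};
}
```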
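The page-cutting rule in the "Add support for empty replica pages" merge (e9cbc9ee85) can be modeled as a small loop. This is a hypothetical sketch, not Scylla's query-result-writer. `page` and `read_page()` are invented names, a partition is reduced to a vector of tombstone flags indexed by clustering position, and a limit of 0 means unlimited, as the message for 1b669cefed states for system queries.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One replica page, heavily simplified.
struct page {
    std::vector<std::size_t> live_rows;        // positions of returned live rows
    std::optional<std::size_t> last_position;  // last processed position, if any
    bool short_read = false;                   // cut early because of tombstones
};

// `is_tombstone[pos]` tells whether the entry at clustering position `pos` is a
// tombstone. Reading starts at `start` and stops after `row_limit` live rows,
// or once `tombstone_limit` tombstones were processed (0 means unlimited).
page read_page(const std::vector<bool>& is_tombstone, std::size_t start,
               std::size_t row_limit, std::size_t tombstone_limit) {
    assert(row_limit > 0);
    page p;
    std::size_t tombstones = 0;
    for (std::size_t pos = start; pos < is_tombstone.size(); ++pos) {
        p.last_position = pos;  // lets the next page resume after this entry
        if (is_tombstone[pos]) {
            if (tombstone_limit != 0 && ++tombstones >= tombstone_limit) {
                p.short_read = true;  // the page may be empty; that is allowed
                return p;
            }
        } else {
            p.live_rows.push_back(pos);
            if (p.live_rows.size() == row_limit) {
                return p;
            }
        }
    }
    return p;
}
```

Consider the 1000-row dataset from the upgrade test with the first 980 rows deleted. With a tombstone limit of 10, this sketch returns 98 empty short pages and then one page with the remaining 20 rows. Each empty page still advances `last_position`, which is the forward progress the merge message says would otherwise be missing.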
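The `has_static_natural_endpoints` optimization from be44fd63f9 amounts to hoisting a token-independent call out of a per-token loop. The sketch below assumes simplified types and signatures: `token`, `endpoint_set`, `replication_strategy`, `everywhere_strategy`, and `endpoints_for()` are illustrative stand-ins, not Scylla's actual declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using token = long;                            // simplified token type
using endpoint_set = std::vector<std::string>;

// Simplified strategy interface (hypothetical signatures).
class replication_strategy {
public:
    virtual ~replication_strategy() = default;
    virtual endpoint_set calculate_natural_endpoints(token t) const = 0;
    // True when the endpoints do not depend on the token.
    virtual bool has_static_natural_endpoints() const { return false; }
};

// Every node replicates everything, so the answer is token-independent.
class everywhere_strategy : public replication_strategy {
    endpoint_set _nodes;
public:
    mutable std::size_t calls = 0;  // counts calculations, for demonstration
    explicit everywhere_strategy(endpoint_set nodes) : _nodes(std::move(nodes)) {}
    endpoint_set calculate_natural_endpoints(token) const override {
        ++calls;
        return _nodes;
    }
    bool has_static_natural_endpoints() const override { return true; }
};

// Maps each token to its endpoints, calculating only once when the strategy
// reports static natural endpoints.
std::map<token, endpoint_set> endpoints_for(const replication_strategy& rs,
                                            const std::vector<token>& tokens) {
    std::map<token, endpoint_set> result;
    if (rs.has_static_natural_endpoints() && !tokens.empty()) {
        const endpoint_set eps = rs.calculate_natural_endpoints(tokens.front());
        for (token t : tokens) {
            result.emplace(t, eps);
        }
        return result;
    }
    for (token t : tokens) {
        result.emplace(t, rs.calculate_natural_endpoints(t));
    }
    return result;
}
```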
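Commits 257d74bb34, 2b017ce285 and c71ef330b2 rely on tag types to keep UUID-based identifiers apart. This is a generic sketch of that technique, not Scylla's `utils::tagged_uuid`. The two-`uint64_t` layout, the constructor, and the tag struct names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The Tag parameter makes otherwise-identical UUID wrappers distinct types.
template <typename Tag>
class tagged_uuid {
    std::uint64_t _msb = 0;
    std::uint64_t _lsb = 0;
public:
    constexpr tagged_uuid() = default;
    constexpr tagged_uuid(std::uint64_t msb, std::uint64_t lsb) : _msb(msb), _lsb(lsb) {}
    constexpr std::uint64_t msb() const { return _msb; }
    constexpr std::uint64_t lsb() const { return _lsb; }
    friend constexpr bool operator==(const tagged_uuid& a, const tagged_uuid& b) {
        return a._msb == b._msb && a._lsb == b._lsb;
    }
};

struct table_id_tag {};
struct table_schema_version_tag {};
using table_id = tagged_uuid<table_id_tag>;
using table_schema_version = tagged_uuid<table_schema_version_tag>;

// Passing a schema version where a table id is expected no longer compiles.
static_assert(!std::is_convertible_v<table_schema_version, table_id>);
static_assert(!std::is_convertible_v<table_id, table_schema_version>);
```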


166 Commits