scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 13:45:53 +00:00

Author	SHA1	Message	Date
Avi Kivity	bc2fcf5187	dirty_memory_manager: unscramble terminology Before `95f31f37c1` ("Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity"), we had two region_group objects, one _real_region_group and another _virtual_region_group, each with a set of "soft" and "hard" limits and related functions and members. In `95f31f37c1`, we merged _real_region_group into _virtual_region_group, but unfortunately the _real_region_group members received the "hard" prefix when they got merged. This overloads the meaning of "hard" - is it related to soft/hard limit or is it related to the real/virtual distinction? This patch applied some renaming to restore consistency. Anything that came from _virtual_region_group now has "virtual" in its name. Anything that came from _real_region_group now has "real" in its name. The terms are still pretty bad but at least they are consistent.	2022-10-04 13:56:28 +03:00
Botond Dénes	95f31f37c1	Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity region_group evolved as a tree, each node of which contains some regions (memtables). Each node has some constraints on memory, and can start flushing and/or stop allocation into its memtables and those below it when those constraints are violated. Today, the tree has exactly two nodes, only one of which can hold memtables. However, all the complexity of the tree remains. This series applies some mechanical code transformations that remove the tree structure and all the excess functionality, leaving a much simpler structure behind. Before: - a tree of region_group objects - each with two parameters: soft limit and hard limit - but only two instances ever instantiated After: - a single region_group object - with three parameters - two from the bottom instance, one from the top instance Closes #11570 * github.com:scylladb/scylladb: dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config dirty_memory_manager: simplify region_group::update() dirty_memory_manager: fold region_group::notify_hard_pressure_relieved into its callers dirty_memory_manager: clean up region_group::do_update_hard_and_check_relief() dirty_memory_manager: make do_update_hard_and_check_relief() a member of region_group dirty_memory_manager: remove accessors around region_group::_under_hard_pressure dirty_memory_manager: merge memory_hard_limit into region_group dirty_memory_manager: rename members in memory_hard_limit dirty_memory_manager: fold do_update() into region_group::update() dirty_memory_manager: simplify memory_hard_limit's do_update dirty_memory_manager: drop soft limit / soft pressure members in memory_hard_limit dirty_memory_manager: de-template do_update(region_group_or_memory_hard_limit) dirty_memory_manager: adjust soft_limit threshold check dirty_memory_manager: drop memory_hard_limit::_name dirty_memory_manager: simplify memory_hard_limit 
configuration dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} dirty_memory_manager: stop inheriting from region_group_reclaimer dirty_memory_manager: test: unwrap region_group_reclaimer dirty_memory_manager: change region_group_reclaimer configuration to a struct dirty_memory_manager: convert region_group_reclaimer to callbacks dirty_memory_manager: consolidate region_group_reclaimer constructors dirty_memory_manager: rename {memory_hard_limit,region_group}::notify_relief dirty_memory_manager: drop unused parameter to memory_hard_limit constructor dirty_memory_manager: drop memory_hard_limit::shutdown() dirty_memory_manager: split region_group hierarchy into separate classes dirty_memory_manager: extract code block from region_group::update dirty_memory_manager: move more allocation_queue functions out of region_group dirty_memory_manager: move some allocation queue related function definitions outside class scope dirty_memory_manager: move region_group::allocating_function and related classes to new class allocation_queue dirty_memory_manager: remove support for multiple subgroups	2022-10-03 13:22:47 +03:00
Avi Kivity	6a02bb7c2b	dirty_memory_manager: merge memory_hard_limit into region_group The two classes always have a 1:1 or 0:1 relationship, and so we can just move all the members of memory_hard_limit into region_group, with the functions that track the relationship (memory_hard_limit::{add,del}()) removed. The 0:1 relationship is maintained by initializing the hard limit parameter with std::numeric_limits<size_t>::max(). The _hard_total_memory variable is always checked if it is greater than this parameter in order to do anything, and with this default it can never be.	2022-09-30 21:59:38 +03:00
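The sentinel trick in the commit above can be sketched as follows. This is a minimal illustration, not the actual region_group code: the `memory_tracker` struct and its member names are hypothetical. The point is that when no hard limit is configured, the limit defaults to the largest `size_t`, so the "greater than limit" check can never fire and no separate flag is needed to record whether a limit exists.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the merged class; names are illustrative only.
struct memory_tracker {
    // "No hard limit" is encoded as the largest representable size.
    std::size_t hard_limit = std::numeric_limits<std::size_t>::max();
    std::size_t hard_total_memory = 0;

    void add(std::size_t n) { hard_total_memory += n; }

    void sub(std::size_t n) {
        assert(n <= hard_total_memory);  // guard against underflow
        hard_total_memory -= n;
    }

    // With the default limit this can never return true.
    bool under_hard_pressure() const {
        return hard_total_memory > hard_limit;
    }
};
```

A tracker left at its default never reports pressure, while one with an explicit `hard_limit` reports it only once the total strictly exceeds that limit.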
Avi Kivity	45ab24e43d	dirty_memory_manager: rename members in memory_hard_limit In preparation for merging memory_hard_limit into region_group, disambiguate similarly named members by adding the word "hard" in random places. memory_hard_limit and region_group are candidates for merging because they constantly reference each other, and memory_hard_limit does very little by itself.	2022-09-30 21:47:33 +03:00
Botond Dénes	060dda8e00	Merge 'Reduce dependencies on large data handler header' from Benny Halevy Reduce the false dependencies on db/large_data_handler.hh by not including it from commonly used header files, and rather including it only in the source files that actually need it. This is in preparation for https://github.com/scylladb/scylladb/issues/11449 Closes #11654 * github.com:scylladb/scylladb: test: lib: do not include db/large_data_handler.hh in test_service.hh test: lib: move sstable test_env::impl ctor out of line sstables: do not include db/large_data_handler.hh in sstables.hh api/column_family: add include db/system_keyspace.hh	2022-09-30 13:27:38 +03:00
Benny Halevy	776b009c0f	test: lib: do not include db/large_data_handler.hh in test_service.hh It was needed for defining and referencing nop_lp_handler and in sstable_3_x_test for testing the large_data_handler. Remove the include from the commonly used header file to reduce the false dependencies on large_data_handler.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 18:36:16 +03:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
Avi Kivity	2f907dc47d	dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} region_group_reclaimer is used to initialize (by reference) instances of memory_hard_limit and region_group. Now that it is a final class, we can fold it into its users by pasting its contents into those users, and using the initializer (reclaim_config) to initialize the users. Note there is a 1:1 relationship between a region_group_reclaimer instance and a {memory_hard_limit,region_group} instance. It may seem like code duplication to paste the contents of one class into two, but the two classes use region_group_reclaimer differently, and most of the code is just used to glue different classes together, so the next patches will be able to get rid of much of it. Some notes: - no_reclaimer was replaced by a default reclaim_config, as that's how no_reclaimer was initialized - all members were added as private, except when a caller required one to be public - an under_pressure() member already existed, forwarding to the reclaimer; this was just removed.	2022-09-22 13:56:59 +03:00
Avi Kivity	d8f857e74b	dirty_memory_manager: stop inheriting from region_group_reclaimer This inheritance makes it harder to get rid of the class. Since there are no longer any virtual functions in the class (apart from the destructor), we can just convert it to a data member. In a few places, we need forwarding functions to make formerly-inherited functions visible to outside callers. The virtual destructor is removed and the class is marked final to verify it is no longer a base class anywhere.	2022-09-22 13:56:59 +03:00
Avi Kivity	26f3a123a5	dirty_memory_manager: test: unwrap region_group_reclaimer In one test, region_group_reclaimer is wrapped in another class just to toggle a bool, but with the new callbacks it's easy to just use a bool instead.	2022-09-22 13:56:59 +03:00
Avi Kivity	1d3508e02c	dirty_memory_manager: change region_group_reclaimer configuration to a struct It's just so much nicer. The "threshold" limit was renamed to "hard_limit" to contrast it with "soft_limit" (in fact threshold is a good name for soft_limit, since it's a point where the behavior begins to change, but that's too much of a change).	2022-09-22 13:56:59 +03:00
Avi Kivity	2c54c7d51e	dirty_memory_manager: convert region_group_reclaimer to callbacks region_group_reclaimer is partially policy (deciding when to reclaim) and partially mechanism (implementing reclaim via virtual functions). Move the mechanism to callbacks. This will make it easy to fold the policy part into region_group and memory_hard_limit. This folding is expected to simplify things since most of region_group_reclaimer is cross-class communication.	2022-09-22 13:56:59 +03:00
Avi Kivity	152136630c	dirty_memory_manager: split region_group hierarchy into separate classes Currently, region_group forms a hierarchy. Originally it was a tree, but previous work whittled it down to a parent-child relationship (with a single, possible optional parent, and a single child). The actual behavior of the parent and child are very different, so it makes sense to split them. The main difference is that the parent does not contain any regions (memtables), but the child does. This patch mechanically splits the class. The parent is named memory_hard_limit (reflecting its role to prevent lsa allocation above the memtable configured hard limit). The child is still named region_group. Details of the transformation: - each function or data member in region_group is either moved to memory_hard_limit, duplicated in memory_hard_limit, or left in region_group. - the _regions and _blocked_requests members, which were always empty in the parent, were not duplicated. Any member that only accessed them was similarly left alone. - the "no_reclaimer" static member which was only used in the parent was moved there. Similarly the constructor which accepted it was moved. - _child was moved to the parent, and _parent was kept in the child (more or less the defining change of the split) Similarly add(region_group) and del(region_group) (which manage _child) were moved. - do_for_each_parent(), which iterated to the top of the tree, was removed and its callers manually unroll the loop. For the parent, this is just a single iteration (since we're iterating towards the root), for the child, this can be two iterations, but the second one is usually simpler since the parent has many members removed. - do_update(), introduced in the previous patch, was made a template that can act on either the parent or the child. It will be further simplified later. - some tests that check now-impossible topologies were removed. 
- the parent's shutdown() is trivial since it has no _blocked_requests, but it was kept to reduce churn in the callers.	2022-09-22 13:56:59 +03:00
Avi Kivity	d21d2cdb3e	dirty_memory_manager: remove support for multiple subgroups We only have one parent/child relationship in the region group hierarchy, so support for more is unneeded complexity. Replace the subgroup vector with a single pointer, and delete a test for the removed functionality.	2022-09-22 13:56:59 +03:00
Avi Kivity	2cec417426	Merge 'tools: use the standard allocator' from Botond Dénes Tools want to be as little disruptive to the environment they run in as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, in fact they don't want to reserve any amount, instead consuming as much as needed on-demand. To achieve this, tools want to use the standard allocator. To achieve this they need a seastar option to instruct seastar to not configure and use the seastar allocator and they need LSA to cooperate with the standard allocator. The former is provided by https://github.com/scylladb/seastar/pull/1211. The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator. Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend. Refs: https://github.com/scylladb/scylladb/issues/9882 This is the last major code piece in scylla for making tools production ready. 
Closes #11510 * github.com:scylladb/scylladb: test/boost: add alternative variant of logalloc test tools: use standard allocator utils/logalloc: add use_standard_allocator_segment_pool_backend() utils/logalloc: introduce segment store backend for standard allocator utils/logalloc: rebase release segment-store on segment-store-backend utils/logalloc: introduce segment_store_backend utils/logalloc: push segment alloc/dealloc to segment_store test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe	2022-09-20 12:59:34 +03:00
Botond Dénes	22128977e4	test/boost: add alternative variant of logalloc test Which initializes LSA with use_standard_allocator_segment_pool_backend(), running the logalloc_test suite on the standard allocator segment pool backend. To avoid duplicating the test code, the new test-file pulls in the test code via #include. I'm not proud of it, but it works and we test LSA with both the debug and standard memory segment stores without duplicating code.	2022-09-16 14:57:23 +03:00
Botond Dénes	e82ea2f3ad	test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe Said test creates two vectors, the vector storage being allocated with the default allocator, while its content being allocated on LSA. If an exception is thrown however, both are freed via the default allocator, triggering an assert in LSA code. Move the cleanup into a `defer()` so the correct cleanup sequence is executed even on exceptions.	2022-09-16 12:16:57 +03:00
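The cleanup pattern described in the commit above can be sketched with a small scope guard. This is a sketch under stated assumptions: `deferred_action` and `run_with_cleanup` are hypothetical names standing in for a `defer()`-style helper, and the event log replaces the real LSA allocations. It shows that the deferred cleanup runs during stack unwinding, before the exception handler, so the intended cleanup order holds on both the normal and the exceptional path.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal scope guard standing in for a defer() helper.
template <typename F>
class deferred_action {
    F _f;
public:
    explicit deferred_action(F f) : _f(std::move(f)) {}
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    // Runs the cleanup when the guard leaves scope, including on unwinding.
    ~deferred_action() { _f(); }
};

// Records the order of events so the cleanup sequence can be inspected.
std::vector<std::string> run_with_cleanup(bool fail) {
    std::vector<std::string> log;
    try {
        auto release = [&log] { log.push_back("free contents"); };
        deferred_action<decltype(release)> cleanup(release);
        log.push_back("work");
        if (fail) {
            throw std::runtime_error("simulated failure");
        }
        log.push_back("done");
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

Calling `run_with_cleanup(true)` and `run_with_cleanup(false)` and comparing the returned logs shows the guard firing in both cases.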
Botond Dénes	05ef13a627	Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundary, so a large partition is stored in a single file. But that can cause many problems, like memory pressure (e.g.: https://github.com/scylladb/scylladb/issues/4217), and incremental compaction can also not fulfill its promise as the file storing the large partition can only be released once exhausted. The first step was to add clustering range metadata for first and last partition keys (retrieved from promoted index), which is crucial to determine disjointness at clustering level, and also the order at which the disjoint files should be opened for incremental reading. The second step was to extend sstable_run to look at clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run. The final step was to introduce the option for compaction to split large partition being written if it has exceeded the size threshold. What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can be safely built on the assumption of the disjoint invariant after the second step aforementioned. 
Closes #11233 * github.com:scylladb/scylladb: test: Add test for large partition splitting on compaction compaction: Add support to split large partitions sstable: Extend sstable_run to allow disjointness on the clustering level sstables: simplify will_introduce_overlapping() test: move sstable_run_disjoint_invariant_test into sstable_datafile_test test: lib: Fix inefficient merging of mutations in make_sstable_containing() sstables: Keep track of first partition's first pos and last partition's last pos sstables: Rename min/max position_range to a descriptive name sstables_manager: Add sstable metadata reader concurrency semaphore sstables: Add ability to find first or last position in a partition	2022-09-15 16:08:56 +03:00
Kamil Braun	728161003a	Merge 'raft server, abort on background errors' from Gusev Petr Halted background fibers render raft server effectively unusable, so report this explicitly to the clients. Fix: #11352 Closes #11370 * github.com:scylladb/scylladb: raft server, status metric raft server, abort group0 server on background errors raft server, provide a callback to handle background errors raft server, check aborted state on public server public api's	2022-09-15 14:12:11 +02:00
Raphael S. Carvalho	20a6483678	test: Add test for large partition splitting on compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:23:19 -03:00
Raphael S. Carvalho	4bc24acf81	sstable: Extend sstable_run to allow disjointness on the clustering level After commit `0796b8c97a`, sstable_run won't accept a fragment that introduces key overlapping. But once we split large partitions, fragments in the same run may store disjoint clustering ranges of the same partition. So we're extending sstable_run to look at clustering dimension, so fragments storing disjoint clustering ranges of the same large partition can co-exist in the same run. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	13942ec947	test: move sstable_run_disjoint_invariant_test into sstable_datafile_test That's where it belongs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	5937765009	sstables: Keep track of first partition's first pos and last partition's last pos With first partition's first position and last partition's last position, we'll be able to determine which fragments composing a sstable run store a large partition that was split. Then sstable run will be able to detect if all fragments storing a given large partition are disjoint in the clustering level. Fixes #10637. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	e099a9bf3b	sstables_manager: Add sstable metadata reader concurrency semaphore Let's introduce a reader_concurrency_semaphore for reading sstable metadata, to avoid an OOM due to unlimited concurrency. The concurrency on startup is not controlled, so it's important to enforce a limit on the amount of memory used by the parallel readers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	9bcad9ffa8	sstables: Add ability to find first or last position in a partition This new method allows sstable to load the first row of the first partition and last row of last partition. That's useful for incremental reading of sstable run which will be split at clustering boundary. To get the first row, it consumes the first row (which can be either a clustering row or range tombstone change) and returns its position_in_partition. To get the last row, it does the same as above but in reverse mode instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:48 -03:00
Petr Gusev	4ff0807cd0	raft server, status metric	2022-09-13 19:34:22 +04:00
Nadav Har'El	8ece63c433	Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails' This series introduces two configurable options when working with TWCS tables: - `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting - `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check. Refs: #6923 Fixes: #9029 Closes #11445 * github.com:scylladb/scylladb: tests: cql_query_test: add mixed tests for verifying TWCS guard rails tests: cql_query_test: add test for TWCS window size tests: cql_query_test: add test for TWCS tables with no TTL defined cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables cql: add max window restriction for TimeWindowCompactionStrategy time_window_compaction_strategy: reject invalid window_sizes cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr	2022-09-12 23:55:51 +03:00
Botond Dénes	9db940ff1b	Merge "Make network_topology_strategy_test use topology" from Pavel Emelyanov " The test in question plays with snitches to simulate the topology over which tokens are spread. This set replaces explicit snitch usage with temporary topology object. Some snitch traces are still left, but those are for token_metadata internal which still call global snitch for DC/RACK. " * 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla: network_topology_strategy_test: Use topology instead of snitch network_topology_strategy_test: Populate explicit topology	2022-09-12 09:40:17 +03:00
Avi Kivity	6c797587c7	dirty_memory_manager: region_group: remove sorting of subgroups dirty_memory_manager tracks lsa regions (memtables) under region_group:s, in order to be able to pick up the largest memtable as a candidate for flushing. Just as region_group:s contain regions, they can also contain other region_group:s in a nested structure. It also tracks the nested region_group that contains the largest region in a binomial heap. This latter facility is no longer used. It saw use when we had the system dirty_memory_manager nested under the user dirty_memory_manager, but that proved too complicated so it was undone. We still nest a virtual region_group under the real region_group, and in fact it is the virtual region_group that holds the memtables, but it is accessed directly to find the largest memtable (region_group::get_largest_region) and so all the mechanism that sorts region_group:s is bypassed. Start to dismantle this house of cards by removing the subgroup sorting. Since the hierarchy has exactly one parent and one child, it's clearly useless. This is seen by the fact that we can just remove everything related. We still need the _subgroups member to hold the virtual region_group; it's replaced by a vector. I verified that the non-intrusive vector is exception safe since push_back() happens at the very end; in any case this is early during setup where we aren't under memory pressure. A few tests that check the removed functionality are deleted. Closes #11515	2022-09-12 09:29:08 +03:00
Petr Gusev	1b5fa4088e	raft server, abort group0 server on background errors	2022-09-12 10:16:43 +04:00
Felipe Mendes	6a3d8607b4	tests: cql_query_test: add mixed tests for verifying TWCS guard rails This patch adds a set of 10 scenarios that have been unveiled during additional testing. In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled - may break the guardrails safe-mode. The situations covered are: - STCS->TWCS with no TTL defined - STCS->TWCS with small TTL - STCS->TWCS with large TTL value - TWCS table with small to large TTL - No TTL TWCS to large TTL and then small TTL - twcs_max_window_count LiveUpdate - Decrease TTL - twcs_max_window_count LiveUpdate - Switch CompactionStrategy - No TTL TWCS table to STCS - Large TTL TWCS table, modify attribute other than compaction and default_time_to_live - Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined	2022-09-11 17:57:14 -03:00
Felipe Mendes	a7a91e3216	tests: cql_query_test: add test for TWCS window size This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy with an incorrect number of compaction windows. The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the test in order to ensure that users can effectively disable the enforcement, should they want.	2022-09-11 17:38:25 -03:00
Felipe Mendes	1c5d46877e	tests: cql_query_test: add test for TWCS tables with no TTL defined This patch adds a testcase for TimeWindowCompactionStrategy tables created with no default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_default_ttl parameter in order to determine whether TWCS tables without TTL should be forbidden or not. The test replays all 3 possible variations of the tri_mode_restriction and verifies tables are correctly created/altered according to the current setting on the replica which receives the request.	2022-09-11 16:55:46 -03:00
Raphael S. Carvalho	f5715d3f0b	replica: Move memtables to compaction_group Now memtables live in compaction_group. Also introduced a function that selects the group based on token, but today table always returns the single group managed by it. Once multiple groups are supported, the function should interpret token content to select the group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	6717d96684	replica: move maintenance SSTable set to compaction_group This commit is restricted to moving maintenance set into compaction_group. Next, we'll introduce compound set into it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	65414e6756	test: sstable_compaction_test: Don't reference main sstable set directly Preparatory change for main sstable set to be moved into compaction group. After that, tests can no longer directly access the main set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	4fa8159a13	test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable column_family_test::add_sstable will soon be changed to run in a thread, and it's not needed in this procedure, so let's remove its usage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Kamil Braun	dba595d347	Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch Broadcast tables are tables for which all statements are strongly consistent (linearizable), replicated to every node in the cluster and available as long as a majority of the cluster is available. If a user wants to store a “small” volume of metadata that is not modified “too often” but provides high resiliency against failures and strong consistency of operations, they can use broadcast tables. The main goal of the broadcast tables project is to solve problems which need to be solved when we eventually implement general-purpose strongly consistent tables: designing the data structure for the Raft command, ensuring that the commands are idempotent, handling snapshots correctly, and so on. In this MVP (Minimum Viable Product), statements are limited to simple SELECT and UPDATE operations on the built-in table. In the future, other statements and data types will be available but with this PR we can already work on features like idempotent commands or snapshotting. Snapshotting is not handled yet which means that restarting a node or performing too many operations (which would cause a snapshot to be created) will give incorrect results. In a follow-up, we plan to add end-to-end Jepsen tests (https://jepsen.io/). With this PR we can already simulate operations on lists and test linearizability in linear complexity. This can also test Scylla's implementation of persistent storage, failure detector, RPC, etc. 
Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing Closes #11164 * github.com:scylladb/scylladb: raft: broadcast_tables: add broadcast_kv_store test raft: broadcast_tables: add returning query result raft: broadcast_tables: add execution of intermediate language raft: broadcast_tables: add compilation of cql to intermediate language raft: broadcast_tables: add definition of intermediate language db: system_keyspace: add broadcast_kv_store table db: config: add BROADCAST_TABLES feature flag	2022-09-09 18:05:37 +02:00
Benny Halevy	d86810d22c	mutation_partition: compact_for_compaction_v2: get tombstone_gc_state To be passed down to compact_mutation_state in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	0627667a06	mutation_partition: compact_for_compaction: get tombstone_gc_state And pass down to `do_compact`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	7e4612d3aa	mutation_readers: pass tombstone_gc_state to compating_reader To be passed further down to `compact_mutation_state` in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:14 +03:00
Benny Halevy	2cd3fc2f36	compaction: table_state: add virtual get_tombstone_gc_state method and override it in table::table_state to get the tombstone_gc_state from the table's compaction_manager. It is going to be used in the next patches to pass the gc state from the compaction_strategy down to sstables and compaction. table_state_for_test was modified to just keep a null tombstone_gc_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Pavel Emelyanov	398e9f8593	network_topology_strategy_test: Use topology instead of snitch Most of the test's cases use rack-inferring snitch driver and get DC/RACK from it via the test_dc_rack() helper. The helper was introduced in one of the previous sets to populate token metadata with some DC/RACK as normal tokens manipulations required respective endpoint in topology. This patch removes the usage of global snitch and replaces it with the pre-populated topology. The pre-population is done in rack-inferring snitch like manner, since token_metadata still uses global snitch and the locations from snitch and this temporary topology should match. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-06 12:26:30 +03:00
Pavel Emelyanov	d8b2940cd8	network_topology_strategy_test: Populate explicit topology There's a test case that makes its own snitch driver that generates pre-calculated DC/RACK data for test endpoints. This patch replaces this custom snitch driver with a standalone topology object. Note: to get DC/RACK info from this topo the get_location() is used since the get_rack()/get_datacenter() are still wrappers around global snitch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-06 12:24:39 +03:00
Mikołaj Grzebieluch	5b1421cc33	db: config: add BROADCAST_TABLES feature flag Add experimental flag 'broadcast-tables' for enabling BROADCAST_TABLES feature. This feature requires raft group0, thus enabling it without RAFT will cause an error.	2022-09-05 11:11:08 +02:00
Botond Dénes	be9d1c4df4	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422	2022-09-04 20:02:50 +03:00
Avi Kivity	421557b40a	Merge "Provide DC/RACK when populating topology" from Pavel E " The topology object maintains all sorts of node/DC/RACK mappings on board. When new entries are added to it the DC and RACK are taken from the global snitch instance which, in turn, checks gossiper, system keyspace and its local caches. This set makes the topology population API require DC and RACK via the call argument. In most of the cases the populating code is the storage service that knows exactly where to get those from. After this set it will be possible to remove the dependency knot consisting of snitch, gossiper, system keyspace and messaging. " * 'br-topology-dc-rack-info' of https://github.com/xemul/scylla: toplogy: Use the provided dc/rack info test: Provide testing dc/rack infos storage_service: Provide dc/rack for snitch reconfiguration storage_service: Provide dc/rack from system ks on start storage_service: Provide dc/rack from gossiper for replacement storage_service: Provide dc/rack from gossiper for remotes storage_service,dht,repair: Provide local dc/rack from system ks system_keyspace: Cache local dc-rack on .start() topology: Some renames after previous patch topology: Require entry in the map for update_normal_tokens() topology: Make update_endpoint() accept dc-rack info replication_strategy: Accept dc-rack as get_pending_address_ranges argument dht: Carry dc-rack over boot_strapper and range_streamer storage_service: Make replacement info a real struct	2022-08-31 12:53:06 +03:00
Tomasz Grabiec	ae8d2a550d	db: schema_tables: Make table creation shadow earlier concurrent changes Issuing two CREATE TABLE statements with a different name for one of the partition key columns leads to the following assertion failure on all replicas: scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed. The reason is that once the create table mutations are merged, the columns table contains two entries for the same position in the partition key tuple. If the schemas were the same, or not conflicting in a way which leads to abort, the current behavior would be to drop the older table as if the last CREATE TABLE was preceded by a DROP TABLE. The proposed fix is to make CREATE TABLE mutation include a tombstone for all older schema changes of this table, effectively overriding them. The behavior will be the same as if the schemas were not different, older table will be dropped. Fixes #11396	2022-08-29 12:06:02 +02:00
Pavel Emelyanov	10e8804417	test: Provide testing dc/rack infos There's a test that's sensitive to correct dc/rack info for testing entries. To populate them it uses global rack-inferring snitch instance or a special "testing" snitch. To make it continue working add a helper that would populate the topology properly (spoiler: next branch will replace it with explicitly populated topology object). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 10:00:04 +03:00
Pavel Emelyanov	4cbe6ee9f4	topology: Require entry in the map for update_normal_tokens() The method in question tries to be on the safest side and adds the endpoint for which it updates the tokens into the topology. From now on it's up to the caller to put the endpoint into topology in advance. So most of what this patch does is place topology.update_endpoint() into the relevant places of the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:44:08 +03:00

1910 Commits