Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundaries, so a large partition is stored in a single file. That causes several problems, such as memory pressure (e.g. https://github.com/scylladb/scylladb/issues/4217), and it also prevents incremental compaction from fulfilling its promise, since the file storing the large partition can only be released once it is exhausted.
The first step was to add clustering range metadata for the first and last partition keys (retrieved from the promoted index), which is crucial for determining disjointness at the clustering level, as well as the order in which the disjoint files should be opened for incremental reading.
The second step was to extend sstable_run to look at the clustering dimension, so that a set of files storing disjoint ranges of the same partition can live in the same sstable run.
The final step was to introduce an option for compaction to split the large partition being written once it exceeds the size threshold.
What's next? Following this series, a reader will be implemented for sstable_run that incrementally opens the per-fragment readers. It can safely rely on the disjointness invariant established in the second step above.
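To illustrate the invariant the series relies on, here is a minimal, self-contained sketch (not ScyllaDB code; the fragment type and integer positions are simplified stand-ins for clustering positions) of checking that fragments storing the same split partition cover non-overlapping clustering ranges, which is also the order in which an incremental reader would open them:
```
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for a run fragment: the clustering range it covers
// for one large partition, expressed as [first, last] positions.
struct fragment {
    int first_pos;
    int last_pos;
};

// Fragments are disjoint at the clustering level if, once sorted by their
// first position, each fragment ends before the next one begins.
bool clustering_disjoint(std::vector<fragment> frags) {
    std::sort(frags.begin(), frags.end(),
              [](const fragment& a, const fragment& b) { return a.first_pos < b.first_pos; });
    for (size_t i = 1; i < frags.size(); ++i) {
        if (frags[i - 1].last_pos >= frags[i].first_pos) {
            return false; // overlap: cannot live in the same run
        }
    }
    return true; // safe to open the fragments one after another
}

int main() {
    // A large partition split into three fragments at clustering boundaries.
    assert(clustering_disjoint({{0, 9}, {10, 19}, {20, 29}}));
    // Overlapping fragments would violate the run invariant.
    assert(!clustering_disjoint({{0, 15}, {10, 29}}));
}
```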
Closes #11233
* github.com:scylladb/scylladb:
test: Add test for large partition splitting on compaction
compaction: Add support to split large partitions
sstable: Extend sstable_run to allow disjointness on the clustering level
sstables: simplify will_introduce_overlapping()
test: move sstable_run_disjoint_invariant_test into sstable_datafile_test
test: lib: Fix inefficient merging of mutations in make_sstable_containing()
sstables: Keep track of first partition's first pos and last partition's last pos
sstables: Rename min/max position_range to a descriptive name
sstables_manager: Add sstable metadata reader concurrency semaphore
sstables: Add ability to find first or last position in a partition
Halted background fibers render the raft server effectively unusable, so
report this explicitly to the clients.
Fix: #11352
Closes #11370
* github.com:scylladb/scylladb:
raft server, status metric
raft server, abort group0 server on background errors
raft server, provide a callback to handle background errors
raft server, check aborted state on public server public api's
After commit 0796b8c97a, sstable_run won't accept a fragment
that introduces partition-key overlap. But once we split large partitions,
fragments in the same run may store disjoint clustering ranges
of the same partition. So we're extending sstable_run to look
at the clustering dimension, so that fragments storing disjoint clustering
ranges of the same large partition can co-exist in the same run.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With the first partition's first position and the last partition's last
position, we'll be able to determine which fragments composing an
sstable run store a large partition that was split.
Then the sstable run will be able to detect whether all fragments storing
a given large partition are disjoint at the clustering level.
Fixes #10637.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Let's introduce a reader_concurrency_semaphore for reading sstable
metadata, to avoid an OOM due to unlimited concurrency.
The concurrency on startup is not controlled, so it's important
to enforce a limit on the amount of memory used by the parallel
readers.
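As a rough illustration of the idea only — ScyllaDB uses Seastar's asynchronous semaphore rather than OS threads, and the names below are made up — bounding concurrent metadata reads with a counting semaphore looks roughly like this:
```
#include <cstdio>
#include <semaphore>
#include <thread>
#include <vector>

// Allow at most 4 sstable-metadata reads in flight at a time; the rest wait.
std::counting_semaphore<4> metadata_read_permits{4};

void read_metadata(int sstable_id) {
    metadata_read_permits.acquire();   // blocks once 4 reads are in flight
    std::printf("reading metadata of sstable %d\n", sstable_id);
    metadata_read_permits.release();
}

int main() {
    std::vector<std::thread> readers;
    for (int i = 0; i < 32; ++i) {      // startup opens many sstables at once
        readers.emplace_back(read_metadata, i);
    }
    for (auto& t : readers) {
        t.join();
    }
}
```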
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This new method allows an sstable to load the first row of the first
partition and the last row of the last partition.
That's useful for incremental reading of an sstable run that was
split at a clustering boundary.
To get the first row, it consumes the first fragment (which can be
either a clustering row or a range tombstone change) and returns
its position_in_partition.
To get the last row, it does the same as above, but in reverse
mode.
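A toy sketch of the idea (generic C++ with a hypothetical fragment type; the real method drives an sstable reader, optionally in reverse mode, rather than a vector):
```
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a mutation fragment carrying a position in partition.
struct fragment {
    int position_in_partition;
};

// Consume the first fragment and return its position.
std::optional<int> first_position(const std::vector<fragment>& frags) {
    if (frags.empty()) {
        return std::nullopt;
    }
    return frags.front().position_in_partition;
}

// Same idea, but consuming in reverse mode: the first fragment seen when
// reading backwards carries the last position of the partition.
std::optional<int> last_position(const std::vector<fragment>& frags) {
    for (auto it = frags.rbegin(); it != frags.rend(); ++it) {
        return it->position_in_partition;
    }
    return std::nullopt;
}

int main() {
    std::vector<fragment> frags{{1}, {5}, {9}};
    assert(first_position(frags) == 1);
    assert(last_position(frags) == 9);
}
```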
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This series introduces two configurable options when working with TWCS tables:
- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - forbids the user from creating TWCS tables whose window count (number of buckets) exceeds a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check.
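The window-count check boils down to simple arithmetic: the number of windows a table accumulates is roughly its default_time_to_live divided by its compaction window size. A hedged sketch of that check (hypothetical names, not the actual validation code):
```
#include <cassert>
#include <cstdint>

// Approximate number of TWCS windows: TTL divided by the window size.
// Both values are taken in seconds here for simplicity.
uint64_t estimated_window_count(uint64_t default_ttl_seconds, uint64_t window_size_seconds) {
    return window_size_seconds ? default_ttl_seconds / window_size_seconds : 0;
}

// Returns true if the table definition should be rejected.
bool violates_max_window_count(uint64_t default_ttl_seconds,
                               uint64_t window_size_seconds,
                               uint64_t twcs_max_window_count) {
    if (twcs_max_window_count == 0) {
        return false; // a setting of 0 disables the check
    }
    return estimated_window_count(default_ttl_seconds, window_size_seconds) > twcs_max_window_count;
}

int main() {
    // 90-day TTL with 1-day windows: 90 windows, above the default limit of 50.
    assert(violates_max_window_count(90 * 86400, 86400, 50));
    // 30-day TTL with 1-day windows: 30 windows, accepted.
    assert(!violates_max_window_count(30 * 86400, 86400, 50));
}
```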
Refs: #6923
Fixes: #9029
Closes #11445
* github.com:scylladb/scylladb:
tests: cql_query_test: add mixed tests for verifying TWCS guard rails
tests: cql_query_test: add test for TWCS window size
tests: cql_query_test: add test for TWCS tables with no TTL defined
cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
cql: add max window restriction for TimeWindowCompactionStrategy
time_window_compaction_strategy: reject invalid window_sizes
cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
"
The test in question plays with snitches to simulate the topology
over which tokens are spread. This set replaces explicit snitch
usage with a temporary topology object.
Some snitch traces are still left, but those are for token_metadata
internals, which still call the global snitch for DC/RACK.
"
* 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla:
network_topology_strategy_test: Use topology instead of snitch
network_topology_strategy_test: Populate explicit topology
dirty_memory_manager tracks lsa regions (memtables) under region_group:s,
in order to be able to pick up the largest memtable as a candidate for
flushing.
Just as region_group:s contain regions, they can also contain other
region_group:s in a nested structure. The manager also tracks, in a
binomial heap, the nested region_group that contains the largest region.
This latter facility is no longer used. It saw use when we had the system
dirty_memory_manager nested under the user dirty_memory_manager, but
that proved too complicated so it was undone. We still nest a virtual
region_group under the real region_group, and in fact it is the
virtual region_group that holds the memtables, but it is accessed
directly to find the largest memtable (region_group::get_largest_region)
and so all the mechanism that sorts region_group:s is bypassed.
Start to dismantle this house of cards by removing the subgroup
sorting. Since the hierarchy has exactly one parent and one child,
it's clearly useless. This is seen by the fact that we can just remove
everything related.
We still need the _subgroups member to hold the virtual region_group;
it's replaced by a vector. I verified that the non-intrusive vector
is exception safe since push_back() happens at the very end; in any
case this is early during setup where we aren't under memory pressure.
A few tests that check the removed functionality are deleted.
Closes #11515
This patch adds a set of 10 scenarios that were unveiled during additional testing.
In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled -
may break the guardrails safe mode. The situations covered are:
- STCS->TWCS with no TTL defined
- STCS->TWCS with small TTL
- STCS->TWCS with large TTL value
- TWCS table with small to large TTL
- No TTL TWCS to large TTL and then small TTL
- twcs_max_window_count LiveUpdate - Decrease TTL
- twcs_max_window_count LiveUpdate - Switch CompactionStrategy
- No TTL TWCS table to STCS
- Large TTL TWCS table, modify attribute other than compaction and default_time_to_live
- Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined
This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy
with an incorrect number of compaction windows.
The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the
test in order to ensure that users can effectively disable the enforcement, should they want to.
This patch adds a testcase for TimeWindowCompactionStrategy tables created with no
default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_default_ttl
parameter in order to determine whether TWCS tables without TTL should be forbidden or not.
The test replays all 3 possible variations of the tri_mode_restriction and verifies tables
are correctly created/altered according to the current setting on the replica which receives
the request.
Now memtables live in compaction_group. This also introduces a function
that selects a group based on the token, but today the table always returns
the single group managed by it. Once multiple groups are supported,
the function should interpret the token content to select the
group.
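A minimal sketch of the selection function described above (hypothetical types; today the token is ignored and the single group is returned, later the token would be mapped to one of several groups):
```
#include <cassert>
#include <vector>

struct compaction_group {
    int id;
};

struct table {
    // Today a table owns exactly one compaction group.
    std::vector<compaction_group> groups{{0}};

    // Select the group responsible for a token. Today the token is ignored;
    // once multiple groups exist, the token would be mapped to one of them,
    // e.g. by the token range it falls into.
    compaction_group& compaction_group_for_token(long token) {
        (void)token;
        return groups.front();
    }
};

int main() {
    table t;
    assert(t.compaction_group_for_token(42).id == 0);
    assert(t.compaction_group_for_token(-7).id == 0);
}
```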
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This commit is restricted to moving the maintenance set into compaction_group.
Next, we'll introduce the compound set into it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Preparatory change for the main sstable set to be moved into the compaction
group. After that, tests can no longer directly access the main
set.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
column_family_test::add_sstable will soon be changed to run in a thread,
and it's not needed in this procedure, so let's remove its usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Broadcast tables are tables for which all statements are strongly
consistent (linearizable), replicated to every node in the cluster and
available as long as a majority of the cluster is available. If a user
wants to store a “small” volume of metadata that is not modified “too
often” but provides high resiliency against failures and strong
consistency of operations, they can use broadcast tables.
The main goal of the broadcast tables project is to solve problems which
need to be solved when we eventually implement general-purpose strongly
consistent tables: designing the data structure for the Raft command,
ensuring that the commands are idempotent, handling snapshots correctly,
and so on.
In this MVP (Minimum Viable Product), statements are limited to simple
SELECT and UPDATE operations on the built-in table. In the future, other
statements and data types will be available but with this PR we can
already work on features like idempotent commands or snapshotting.
Snapshotting is not handled yet which means that restarting a node or
performing too many operations (which would cause a snapshot to be
created) will give incorrect results.
In a follow-up, we plan to add end-to-end Jepsen tests
(https://jepsen.io/). With this PR we can already simulate operations on
lists and test linearizability in linear complexity. This can also test
Scylla's implementation of persistent storage, failure detector, RPC,
etc.
Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing
Closes #11164
* github.com:scylladb/scylladb:
raft: broadcast_tables: add broadcast_kv_store test
raft: broadcast_tables: add returning query result
raft: broadcast_tables: add execution of intermediate language
raft: broadcast_tables: add compilation of cql to intermediate language
raft: broadcast_tables: add definition of intermediate language
db: system_keyspace: add broadcast_kv_store table
db: config: add BROADCAST_TABLES feature flag
and override it in table::table_state to get the tombstone_gc_state
from the table's compaction_manager.
It is going to be used in the next patches to pass the gc state
from the compaction_strategy down to sstables and compaction.
table_state_for_test was modified to just keep a null
tombstone_gc_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Most of the test's cases use the rack-inferring snitch driver and get
DC/RACK from it via the test_dc_rack() helper. The helper was introduced
in one of the previous sets to populate token metadata with some DC/RACK,
as normal token manipulations required the respective endpoint to be in the topology.
This patch removes the usage of the global snitch and replaces it with the
pre-populated topology. The pre-population is done in a rack-inferring-snitch-like
manner, since token_metadata still uses the global snitch, and the
locations from the snitch and this temporary topology should match.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a test case that makes its own snitch driver that generates
pre-calculated DC/RACK data for test endpoints. This patch replaces this
custom snitch driver with a standalone topology object.
Note: to get DC/RACK info from this topology, get_location() is used,
since get_rack()/get_datacenter() are still wrappers around the global
snitch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add an experimental flag 'broadcast-tables' for enabling the BROADCAST_TABLES feature.
This feature requires raft group0, so enabling it without RAFT will cause an error.
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op; there is nothing to do there.
Also add two tests: one targeting this bug and another testing the
crawling reader with random mutations generated for a random schema.
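A schematic illustration of the bug and the fix (a hypothetical consumer, not the actual reader code): the end-of-stream hook used to emit its own partition-end even though the unconditional partition-end hook runs anyway, so the stream ended with two partition-end fragments.
```
#include <cassert>
#include <string>
#include <vector>

struct consumer {
    std::vector<std::string> emitted;

    // Called only when the stream ends on a trailing range tombstone change.
    void consume_end_of_stream() {
        // Before the fix this also emitted "partition-end" here,
        // duplicating the fragment. The fix makes it a no-op.
    }

    // Called unconditionally at the end of every partition.
    void consume_partition_end() {
        emitted.push_back("partition-end");
    }
};

int main() {
    consumer c;
    c.consume_end_of_stream();
    c.consume_partition_end();
    // Exactly one partition-end fragment is emitted per partition.
    assert(c.emitted == std::vector<std::string>{"partition-end"});
}
```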
Fixes: #11421
Closes #11422
"
The topology object maintains all sorts of node/DC/RACK mappings on
board. When new entries are added to it, the DC and RACK are taken
from the global snitch instance which, in turn, checks the gossiper,
system keyspace and its local caches.
This set makes the topology population API require DC and RACK via the
call argument. In most of the cases the populating code is the
storage service, which knows exactly where to get those from.
After this set it will be possible to remove the dependency knot
consisting of snitch, gossiper, system keyspace and messaging.
"
* 'br-topology-dc-rack-info' of https://github.com/xemul/scylla:
toplogy: Use the provided dc/rack info
test: Provide testing dc/rack infos
storage_service: Provide dc/rack for snitch reconfiguration
storage_service: Provide dc/rack from system ks on start
storage_service: Provide dc/rack from gossiper for replacement
storage_service: Provide dc/rack from gossiper for remotes
storage_service,dht,repair: Provide local dc/rack from system ks
system_keyspace: Cache local dc-rack on .start()
topology: Some renames after previous patch
topology: Require entry in the map for update_normal_tokens()
topology: Make update_endpoint() accept dc-rack info
replication_strategy: Accept dc-rack as get_pending_address_ranges argument
dht: Carry dc-rack over boot_strapper and range_streamer
storage_service: Make replacement info a real struct
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:
scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.
The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.
If the schemas were the same, or not conflicting in a way that leads
to an abort, the current behavior would be to drop the older table as if
the last CREATE TABLE was preceded by a DROP TABLE.
The proposed fix is to make the CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different: the older table will be dropped.
Fixes #11396
There's a test that's sensitive to correct dc/rack info for testing
entries. To populate them it uses the global rack-inferring snitch instance
or a special "testing" snitch. To keep it working, add a helper
that populates the topology properly (spoiler: the next branch will
replace it with an explicitly populated topology object).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question tries to be on the safe side and adds the
endpoint for which it updates the tokens into the topology. From now on
it's up to the caller to put the endpoint into the topology in advance.
So most of what this patch does is place topology.update_endpoint() calls
in the relevant places of the code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Tests are the only users of the batch token-updating "sugar", which
actually makes things more complicated.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the state of LSA is scattered across a handful of global variables. This series consolidates all of these into a single one: the shard tracker. Beyond reducing the number of globals (the fewer globals, the better), this paves the way for a planned de-globalization of the shard tracker itself.
There is one separate global left, the static migrators registry. This is left as-is for now.
Closes #11284
* github.com:scylladb/scylladb:
utils/logalloc: remove reclaim_timer:: globals
utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker
utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg
utils/logalloc: move global stat accessors to tracker
utils/logalloc: allocating_section: don't use the global tracker
utils/logalloc: pass down tracker::impl reference to segment_pool
utils/logalloc: move segment pool into tracker
utils/logalloc: add tracker member to basic_region_impl
utils/logalloc: make segment independent of segment pool
Reversing the whole range_tombstone_list
into reversed_range_tombstones is inefficient
and can lead to reactor stalls with a large number of
range tombstones.
Instead, iterate over the range_tombstone_list in the reverse
direction and reverse each range_tombstone as we go,
keeping the result in the optional cookie.reversed_rt member.
While at it, this series contains some other cleanups on this path
to improve the code's readability and maybe make the compiler's life
easier when optimizing the cleaned-up code.
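A minimal sketch of the approach (a simplified range_tombstone with integer bounds, not the real types): instead of materializing a fully reversed list up front, walk the list with reverse iterators and flip each tombstone's bounds as it is emitted, keeping only the current one in an optional.
```
#include <cassert>
#include <optional>
#include <vector>

// Simplified range tombstone: [start, end) in forward clustering order.
struct range_tombstone {
    int start;
    int end;
};

// Reverse a single tombstone for consumption in reverse clustering order.
range_tombstone reversed(const range_tombstone& rt) {
    return {rt.end, rt.start};
}

int main() {
    std::vector<range_tombstone> rts{{0, 2}, {5, 8}, {10, 12}};

    std::optional<range_tombstone> current; // the "cookie" holding one reversed rt
    std::vector<range_tombstone> consumed;

    // Walk the list backwards and reverse each element on demand, so no
    // full reversed copy of the list is ever built.
    for (auto it = rts.rbegin(); it != rts.rend(); ++it) {
        current = reversed(*it);
        consumed.push_back(*current);
    }

    assert(consumed.front().start == 12 && consumed.front().end == 10);
    assert(consumed.back().start == 2 && consumed.back().end == 0);
}
```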
Closes #11271
* github.com:scylladb/scylladb:
mutation: consume_clustering_fragments: get rid of reversed_range_tombstones;
mutation: consume_clustering_fragments: reindent
mutation: consume_clustering_fragments: shuffle emit_rt logic around
mutation: consume, consume_gently: simplify partition_start logic
mutation: consume_clustering_fragments: pass iterators to mutation_consume_cookie ctor
mutation: consume_clustering_fragments: keep the reversed schema in cookie
mutation: clustering_iterators: get rid of current_rt
mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently
These are pretend free functions that access globals in the background;
make them members of the tracker instead, which has everything needed
locally to compute them. Callers still have to access these stats
through the global tracker instance, but this can be changed to happen
through a local instance. Soon....
Currently, frozen_mutation is not consumed in position_in_partition
order as all range tombstones are consumed before all rows.
This violates the range_tombstone_generator invariants
as its lower_bound needs to be monotonically increasing.
Fix this by adding mutation_partition_view::accept_ordered
and rewriting do_accept_gently to do the same,
both making sure to consume the range tombstones
and clustering rows in position_in_partition order,
similar to the mutation consume_clustering_fragments function.
Add a unit test that verifies that.
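Schematically, consuming in position_in_partition order is a two-way merge of the (already position-sorted) range tombstones and clustering rows; a minimal sketch with integer positions standing in for position_in_partition:
```
#include <cassert>
#include <string>
#include <vector>

struct entry {
    int position;      // stand-in for position_in_partition
    std::string kind;  // "rt" or "row"
};

// Merge range tombstones and rows so the consumer sees positions
// in monotonically increasing order, as range_tombstone_generator requires.
std::vector<entry> consume_in_order(const std::vector<int>& rt_positions,
                                    const std::vector<int>& row_positions) {
    std::vector<entry> out;
    size_t i = 0, j = 0;
    while (i < rt_positions.size() || j < row_positions.size()) {
        bool take_rt = j == row_positions.size() ||
                       (i < rt_positions.size() && rt_positions[i] <= row_positions[j]);
        if (take_rt) {
            out.push_back({rt_positions[i++], "rt"});
        } else {
            out.push_back({row_positions[j++], "row"});
        }
    }
    return out;
}

int main() {
    auto merged = consume_in_order({2, 7}, {1, 3, 8});
    for (size_t k = 1; k < merged.size(); ++k) {
        assert(merged[k - 1].position <= merged[k].position); // monotonic
    }
}
```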
Fixes #11198
Closes #11269
* github.com:scylladb/scylladb:
mutation_partition_view: make mutation_partition_view_virtual_visitor stoppable
frozen_mutation: consume and consume_gently in-order
frozen_mutation: frozen_mutation_consumer_adaptor: rename rt to rtc
frozen_mutation: frozen_mutation_consumer_adaptor: return early when flush returns stop_iteration::yes
frozen_mutation: frozen_mutation_consumer_adaptor: consume static row unconditionally
frozen_mutation: frozen_mutation_consumer_adaptor: flush current_row before rt_gen
Currently, frozen_mutation is not consumed in position_in_partition
order as all range tombstones are consumed before all rows.
This violates the range_tombstone_generator invariants
as its lower_bound needs to be monotonically increasing.
Fix this by adding mutation_partition_view::accept_ordered
and rewriting do_accept_gently to do the same,
both making sure to consume the range tombstones
and clustering rows in position_in_partition order,
similar to the mutation consume_clustering_fragments function.
Add a unit test that verifies that.
Fixes #11198
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
from Tomasz Grabiec
This series fixes a lack of mutation associativity, which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
in the mutations applied and read.
No known production impact.
Refs https://github.com/scylladb/scylladb/issues/11307
Closes #11312
* github.com:scylladb/scylladb:
test: mutation_test: Add explicit test for mutation commutativity
test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
db: mutation_partition: Drop unnecessary maybe_shadow()
db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
mutation_partition: row: make row marker shadowing symmetric
This pull request introduces global secondary-indexing for non-frozen collections.
The intent is to enable such queries:
```
CREATE TABLE test(id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));
-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);
SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```
We use the all-familiar materialized views (MVs) here. Scylla treats all
collections the same way - they're a list of (key, value) pairs. In the case
of sets, the value type is a dummy one. In the case of lists, the key type is
TIMEUUID. When describing the design, I will ignore the fact that there is more
than one collection type. Suppose that the columns in the base table
were as follows:
```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```
The MV schema is as follows (the names of columns which are not the same
as in the base table might be different). All the columns here form the primary
key.
```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```
The reason for the last additional column is that the values from a collection might not be unique.
Fixes #2962
Fixes #8745
Fixes #10707
This patch does not implement **local** secondary indexes for collection columns: Refs #10713.
Closes #10841
* github.com:scylladb/scylladb:
test/cql-pytest: un-xfail yet another passing collection-indexing test
secondary index: fix paging in map value indexing
test/cql-pytest: test for paging with collection values index
cql, view: rename and explain bytes_with_action
cql, index: make collection indexing a cluster feature
test/cql-pytest: failing tests for oversized key values in MV and SI
cql: fix secondary index "target" when column name has special characters
cql, index: improve error messages
cql, index: fix default index name for collection index
test/cql-pytest: un-xfail several collecting indexing tests
test/cql-pytest/test_secondary_index: verify that local index on collection fails.
docs/design-notes/secondary_index: add `VALUES` to index target list
test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
test/boost/secondary_index_test: update for non-frozen indexes on collections
test/cql-pytest: Uncomment collection indexes tests that should be working now
cql, index: don't use IS NOT NULL on collection column
cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections
cql3, index: Use entries() indexes on collections for queries
cql3, index: Use keys() and values() indexes on collections for queries.
types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
db/view/view.cc: compute view_updates for views over collections
view info: has_computed_column_depending_on_base_non_primary_key
column_computation: depends_on_non_primary_key_column
schema, index/secondary_index_manager: make schema for index-induced mv
index/secondary_index_manager: extract keys, values, entries types from collection
cql3/statements/: validate CREATE INDEX for index over a collection
cql3/statements/create_index_statement,index_target: rewrite index target for collection
column_computation.hh, schema.cc: collection_column_computation
column_computation.hh, schema.cc: compute_value interface refactor
Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" in the semaphore for any reason will not end up accessing a dangling table reference when destroyed later.
Fixes: https://github.com/scylladb/scylladb/issues/11264
Closes #11273
* github.com:scylladb/scylladb:
querier: querier_cache: remove now unused evict_all_for_table()
database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
reader_concurrency_semaphore: add evict_inactive_reads_for_table()
All users of `column_family_test_config()` get the semaphore parameter
for it from `sstable_test_env`. It is clear that the latter serves as
the storage space for stable objects required by the table config. This
patch just enshrines this fact by moving the config factory method to
`sstable_test_env`, so it can just get what it needs from its members.