scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Nadav Har'El	e4b2dfb54d	alternator ttl: when node is down, secondary node continues to expire The current implementation of the Alternator expiration (TTL) feature has each node scan for expired partitions in its own primary ranges. This means that while a node is down, items in its primary ranges will not get expired. But note that it doesn't have to be this way: If only a single node is down, and RF=3, the items that node owns are still readable with QUORUM - so these items can still be safely read and checked for expiration - and also deleted. This patch implements a fairly simple solution: When a node completes scanning its own primary ranges, it also checks whether any of its secondary ranges (ranges where it is the second replica) has its primary owner down. For such ranges, this node will scan them as well. This secondary scan stops if the remote node comes back up, but in that case it may happen that both nodes will work on the same range at the same time. The risks in that are minimal, though, and amount to wasted work and duplicate deletion records in CDC. In the future we could avoid this by using LWT to claim ownership on a range being scanned. We have a new dtest (see a separate patch), alternator_ttl_tests.py::TestAlternatorTTL::test_expiration_with_down_node, which reproduces this and verifies this fix. The test starts a 5-node cluster, with 1000 items with random tokens which are due to be expired immediately. The test expects to see all items expiring ASAP, but when one of the five nodes is brought down, this doesn't happen: Some of the items are not expired, until this patch is used. Fixes #9787 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211222131933.406148-1-nyh@scylladb.com>	2021-12-26 14:10:52 +02:00
Benny Halevy	f7b8b809d0	sstables: parse chunked_vector<std::integral Members>: maximize chunk size Currently this parse function reads only 100KB worth of members in each iteration. Since the default max_chunk_capacity is 128KB, 100KB underutilizes the chunk capacity, and it could be safely increased to the max to reduce the number of allocations and corresponding calls to read_exactly for large arrays. Expose utils::chunked_vector::max_chunk_capacity so that the caller wouldn't have to guess this number, and use it in parse(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211222103126.1819289-2-bhalevy@scylladb.com>	2021-12-22 15:47:37 +02:00
Benny Halevy	d95f6602a7	sstables: coroutinize parse functions Simplify the implementation using coroutines. This also has the potential to coalesce multiple allocations into one. test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211222103126.1819289-1-bhalevy@scylladb.com>	2021-12-22 15:47:37 +02:00
Benny Halevy	2f2e3b2e84	test: lib: index_reader_assertions: close reader before it is destroyed Otherwise, it may trip an assertion when the underlying file is closed, as seen in e.g.: https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4318/artifact/testlog/x86_64_release/sstable_3_x_test.test_read_rows_only_index.4174.log ``` test/boost/sstable_3_x_test.cc(0): Entering test case "test_read_rows_only_index" sstable_3_x_test: ./seastar/src/core/fstream.cc:205: virtual seastar::file_data_source_impl::~file_data_source_impl(): Assertion `_reads_in_progress == 0' failed. Aborting on shard 0. Backtrace: 0x22557e8 0x2286842 0x7f2799e99a1f /lib64/libc.so.6+0x3d2a1 /lib64/libc.so.6+0x268a3 /lib64/libc.so.6+0x26788 /lib64/libc.so.6+0x35a15 0x222c53d 0x222c548 0xb929cc 0xc0b23b 0xa84bbf 0x24d0111 ``` Decoded: ``` __GI___assert_fail at :? ~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:205 ~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:202 std::default_delete<seastar::data_source_impl>::operator()(seastar::data_source_impl*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361 (inlined by) ~data_source at ././seastar/include/seastar/core/iostream.hh:55 (inlined by) ~input_stream at ././seastar/include/seastar/core/iostream.hh:254 (inlined by) ~continuous_data_consumer at ././sstables/consumer.hh:484 (inlined by) ~index_consume_entry_context at ././sstables/index_reader.hh:116 (inlined by) std::default_delete<sstables::index_consume_entry_context<sstables::index_consumer> >::operator()(sstables::index_consume_entry_context<sstables::index_consumer>*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361 (inlined by) ~index_bound at ././sstables/index_reader.hh:395 (inlined by) ~index_reader at ././sstables/index_reader.hh:435 std::default_delete<sstables::index_reader>::operator()(sstables::index_reader*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361 (inlined by) ~index_reader_assertions at ././test/lib/index_reader_assertions.hh:31 (inlined by) operator() at ./test/boost/sstable_3_x_test.cc:4630 ``` Test: unit(dev), sstable_3_x_test.test_read_rows_only_index(release X 10000) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211222132858.2155227-1-bhalevy@scylladb.com>	2021-12-22 15:33:22 +02:00
Raphael S. Carvalho	e80cb51b6a	distributed_loader: make shutdown clean by properly handling compaction_stopped exception Today, when resharding is interrupted, shutdown will not be clean because stopped exception interrupts the shutdown process. Let's handle stopped exception properly, to allow shutdown process to run to completion. Refs #9759 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211221175717.62293-1-raphaelsc@scylladb.com>	2021-12-22 15:08:31 +02:00
Botond Dénes	def6d48307	Merge 'gdb: Introduce "scylla lsa-check"' from Tomasz Grabiec Catches inconsistencies in LSA state. Currently: - discrepancy between segment set in _closed_segments and shard's segment descriptors - cross-shard segment references in _closed_segments - discrepancy in _closed_occupancy stats and what's in segment descriptors - segments not present in _closed_segments but present in segment descriptors Refs https://github.com/scylladb/scylla/issues/9544 Closes #9834 * github.com:scylladb/scylla: gdb: Introduce "scylla lsa-check" gdb: Make get_base_class_offset() also see indirect base classes	2021-12-22 15:08:31 +02:00
Pavel Emelyanov	7286374dba	migration_manager: Remove last occurrence of get_local_storage_proxy() The migration manager got local storage proxy reference recently, but one method still uses the global call. Fix it. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20211221120034.21824-1-xemul@scylladb.com>	2021-12-22 15:08:31 +02:00
Botond Dénes	aba68c8f83	Merge "reader_concurrency_semaphore: convert to flat_mutation_reader_v2" from Michael " The second patch in this series is a mechanical conversion of reader_concurrency_semaphore to flat_mutation_reader_v2, and caller updates. The first patch is needed to pass the test suite, since without it a real reader version conversion would happen on every entry to and exit from reader_concurrency_semaphore, which is stressful (for example: mutation_reader_test.test_multishard_streaming_reader reaches 8191 conversions for a couple of readers, which somehow causes it to catch SIGSEGV in diverse and seemingly-random places). Note that in a real workload it is unreasonable to expect readers being parked in a reader_concurrency_semaphore to be pristine, so short-circuiting their version conversions will be impossible and this workaround will not really help. " * tag 'rcs-v2-v4' of https://github.com/cmm/scylla: reader_concurrency_semaphore: convert to flat_mutation_reader_v2 short-circuit flat mutation reader upgrades and downgrades	2021-12-22 15:08:31 +02:00
Tomasz Grabiec	3e81318587	gdb: Introduce "scylla lsa-check" Catches inconsistencies in LSA state. Currently: - discrepancy between segment set in _closed_segments and shard's segment descriptors - cross-shard segment references in _closed_segments - discrepancy in _closed_occupancy stats and what's in segment descriptors - segments not present in _closed_segments but present in segment descriptors	2021-12-21 21:18:52 +01:00
Tomasz Grabiec	d754504fa2	gdb: Make get_base_class_offset() also see indirect base classes I need it so that segment_descriptor is seen as inheriting from list_base_hook<>, which it does via log_heap_hook.	2021-12-21 21:18:52 +01:00
Michael Livshin	a1b8ba23d2	reader_concurrency_semaphore: convert to flat_mutation_reader_v2 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-12-21 11:26:17 +02:00
Michael Livshin	9f656b96ac	short-circuit flat mutation reader upgrades and downgrades When asked to upgrade a reader that itself is a downgrade, try to return the original v2 reader instead, and likewise when downgrading upgraded v1 readers. This is desirable because version transformations can result from, say, entering/leaving a reader concurrency semaphore, and the amount of such transformations is practically unbounded. Such short-circuiting is only done if it is safe, that is: the transforming reader's buffer is empty and its internal range tombstone tracking state is discardable. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-12-21 11:26:17 +02:00
Raphael S. Carvalho	64ec1c6ec6	table: Make sure major compaction doesn't miss data in memtable Make sure that major will compact data in all sstables and memtable, as tombstones sitting in memtable could shadow data in sstables. For example, a tombstone in memtable deleting a large partition could be missed in major, so space wouldn't be saved as expected. Additionally, write amplification is reduced as data in memtable won't have to travel through tiers once flushed. Fixes #9514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217160055.96693-2-raphaelsc@scylladb.com>	2021-12-21 07:21:34 +02:00
Raphael S. Carvalho	e1e8e020fe	tests: Allow memtable to be flushed through column_family_for_tests Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217160055.96693-1-raphaelsc@scylladb.com>	2021-12-21 07:21:26 +02:00
Raphael S. Carvalho	e05859c3f9	compaction: kill unused code for resharding_compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217162728.114936-2-raphaelsc@scylladb.com>	2021-12-20 18:21:31 +02:00
Raphael S. Carvalho	d1f2fd7f03	compaction: rename compacting_sstable_writer to compacted_fragments_writer The name compacting_sstable_writer is misleading, as it doesn't perform any compaction. Let's rename it to a name that better reflects what it does. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217162728.114936-1-raphaelsc@scylladb.com>	2021-12-20 18:21:31 +02:00
Avi Kivity	f190434beb	Merge "table,sstable_set: use v2 readers below the cache" from Botond " Convert sstable_set and table::make_sstable_reader() to v2. With this all readers below cache use the v2 format. Tests: unit(dev) " * 'table-make-sstable-reader-v2/v1' of https://github.com/denesb/scylla: table: upgrade make_sstable_reader() to v2 sstables/sstable_set: create_single_key_sstable_reader() upgrade to v2 sstables/sstable_set: remove unused and undefined make_reader() member	2021-12-20 17:53:44 +02:00
Botond Dénes	18cddd3279	table: upgrade make_sstable_reader() to v2 With this all readers below cache use the v2 format (except kl/la readers).	2021-12-20 17:40:46 +02:00
Botond Dénes	9027c6f936	sstables/sstable_set: create_single_key_sstable_reader() upgrade to v2 With this all methods of the sstable set create v2 readers.	2021-12-20 17:17:33 +02:00
Botond Dénes	847eddf19a	sstables/sstable_set: remove unused and undefined make_reader() member	2021-12-20 17:17:31 +02:00
Botond Dénes	55bb70a878	Merge "Make sure TWCS per-window major includes all files" from Raphael " TWCS performs STCS on a window as long as it's the most recent one. From there on, TWCS will compact all files in the past window into a single file. With some moderate write load, it could happen that there's still some compaction activity in that past window, meaning that per-window major may miss some files being currently compacted. As a result, a past window may contain more than 1 file after all compaction activity is done on its behalf, which may increase read amplification. To avoid that, TWCS will now make sure that per-window major is serialized, to make sure no files are missed. Fixes #9553. tests: unit(dev). " * 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla: TWCS: Make sure major on past window is done on all its sstables TWCS: remove needless param for STCS options TWCS: kill unused param in newest_bucket() compaction: Implement strategy control and wire it compaction: Add interface to control strategy behavior.	2021-12-20 17:12:50 +02:00
Avi Kivity	e772fcbd57	Merge "Convert combined reader to v2" from Botond " Users are adjusted by sprinkling `upgrade_to_v2()` and `downgrade_to_v1()` where necessary (or removing any of these where possible). No attempt was made to optimize and reduce the amount of v1<->v2 conversions. This is left for follow-up patches to keep this set small. The combined reader is composed of 3 layers: 1. fragment producer - pop fragments from readers, return them in batches (each fragment in a batch having the same type and pos). 2. fragment merger - merge fragment batches into single fragments 3. reader implementation glue-code Converting layers (1) and (3) was mostly mechanical. The logic of merging range tombstone changes is implemented at layer (2), so the two different producer (layer 1) implementations we have share this logic. Tests: unit(dev) " * 'combined-reader-v2/v4' of https://github.com/denesb/scylla: test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging mutation_reader: convert make_clustering_combined_reader() to v2 mutation_reader: convert position_reader_queue to v2 mutation_reader: convert make_combined_reader() overloads to v2 mutation_reader: combined_reader: convert reader_selector to v2 mutation_reader: convert combined reader to v2 mutation_reader: combined_reader: attach stream_id to mutation_fragments flat_mutation_reader_v2: add v2 version of empty reader test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation	2021-12-20 14:01:03 +02:00
Botond Dénes	7f331cee01	test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging Stressing the range tombstone change merging logic.	2021-12-20 09:29:05 +02:00
Botond Dénes	e1bbc4a480	mutation_reader: convert make_clustering_combined_reader() to v2 Just sprinkle the right amount of downgrade_to_v1() and upgrade_to_v2() into call sites; no attempt at optimization was made.	2021-12-20 09:29:05 +02:00
Botond Dénes	2364144b19	mutation_reader: convert position_reader_queue to v2 By removing the converting (v1->v2) constructor of `reader_and_upper_bound` and adjusting its users.	2021-12-20 09:29:05 +02:00
Botond Dénes	aeddcf50a1	mutation_reader: convert make_combined_reader() overloads to v2 Just sprinkle the right amount of downgrade_to_v1() and upgrade_to_v2() into call sites; no attempt at optimization was made.	2021-12-20 09:29:05 +02:00
Botond Dénes	1554b94b78	mutation_reader: combined_reader: convert reader_selector to v2	2021-12-20 09:29:05 +02:00
Botond Dénes	71835bdee1	mutation_reader: convert combined reader to v2 The meat of the change is on the fragment merger level, which is now also responsible for merging range tombstone changes. The fragment producers are just mechanically converted to v2 by appending `_v2` to the appropriate type names. The beauty of this approach is that range tombstone merging happens in a single place, shared by all fragment producers (there are 2 of them). Selectors and factory functions are left as v1 for now; they will be converted incrementally by the next patches.	2021-12-20 09:29:05 +02:00
Asias He	eba4a4fba4	repair: Allow ignoring dead nodes for replace operation Consider 1) n1, n2, n3, n4, n5 2) n2 and n3 are both down 3) start n6 to replace n2 4) start n7 to replace n3 We want to replace the dead nodes n2 and n3 to fix the cluster to have 5 running nodes. Replace operation in step 3 will fail because n3 is down. We would see errors like below: replace[25edeec0-57d4-11ec-be6b-7085c2409b2d]: Nodes={127.0.0.3} needed for replace operation are down. It is highly recommended to fix the down nodes and try again. In the above example, currently, there is no way to replace any of the dead nodes. Users can either fix one of the dead nodes and run replace, or run the removenode operation to remove one of the dead nodes, then run replace, and then run bootstrap to add another node. Fixing dead nodes is always the best solution but it might not be possible. Running removenode operation is not better than running replace operation (with best effort by ignoring the other dead node) in terms of data consistency. In addition, users have to run bootstrap operation to add back the removed node. So, allowing replacing in such a case is a clear win. This patch adds the --ignore-dead-nodes-for-replace option to allow running the replace operation in best-effort mode. Please note, use this option only if the dead nodes are completely broken and down, and there is no way to fix the node and bring it back. This also means the user has to make sure the ignored dead nodes specified are really down to avoid any data consistency issue. Fixes #9757 Closes #9758	2021-12-20 00:49:03 +02:00
Avi Kivity	7bdc999bba	service: paxos_state: wean off get_local_storage_proxy() Instead of calling get_local_storage_proxy in paxos_state, get it from the caller (who is, in fact, storage_proxy or one of its components). Some of the callers, although they are storage_proxy components, don't have a storage_proxy reference handy and so they ignominiously call get_local_storage_proxy() themselves. This will be adjusted later. The other callers who are, in fact, storage_proxy, have to take special care not to cross a shard boundary. When they do, smp::submit_to() is converted to sharded::invoke_on() in order to get the correct local instance. Test: unit (dev) Closes #9824	2021-12-20 00:31:13 +02:00
Nadav Har'El	252ce8afd4	Merge 'Extend stop compaction api' from Benny Halevy Allow stopping compaction by type on a given keyspace and list of tables. Also add api unit test suite that tests the existing `stop_compaction` api and the new `stop_keyspace_compaction` api. Fixes #9700 Closes #9746 * github.com:scylladb/scylla: api: storage_service: validate_keyspace: improve exception error message api: compaction_manager: add stop_keyspace_compaction api: storage_service: expose validate_keyspace and parse_tables api: compaction_manager: stop_compaction: fix type description compaction_manager: stop_compaction: expose optional table* test: api: add basic compaction_manager test	2021-12-20 00:18:46 +02:00
Pavel Emelyanov	d88ae7edae	Merge 'migration_manager: retire global storage proxy refs' from Avi Kivity Replace get_storage_proxy() and get_local_storage_proxy() with constructor-provided references. Some unneeded cases were removed. Test: unit (dev) Closes #9816 * github.com:scylladb/scylla: migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference migration_manager: don't keep storage_proxy alive during schema_check verb mm: don't capture storage proxy shared_ptr during background schema merge mm: remove stats on schema version get	2021-12-17 17:53:08 +03:00
Raphael S. Carvalho	f508f54f3e	table: move min_compaction_threshold() and compaction_enforce_min_threshold() into table_state Compaction specific methods can be implemented in table_state only, as they aren't needed elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211214191822.164223-1-raphaelsc@scylladb.com>	2021-12-17 10:00:31 +02:00
Piotr Sarna	f49c20aa24	thrift: drop obtaining incorrect permits The thrift layer started partially having admission control after commit `ef1de114f0`, but code inspection suggests that it might cause use-after-free in a few cases, when a permit is obtained more than once per handling - due to the fact that some functions tail-called other functions, which also obtain a permit. These extraneous permits are not taken anymore. Tests: "please trust me" + cassandra-stress in thrift mode Message-Id: <ac5d711288b22c5fed566937722cceeabc234e16.1639394937.git.sarna@scylladb.com>	2021-12-17 09:35:24 +02:00
Avi Kivity	7c23ed888d	Update tools/jmx submodule (dropping unneeded dependencies) * tools/jmx 2c43d99...53f7f55 (1): > pom.xml: drop unneeded logging dependencies	2021-12-16 21:54:36 +02:00
Avi Kivity	a97731a7e5	migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference A static helper also gained a storage_proxy parameter.	2021-12-16 21:05:47 +02:00
Avi Kivity	aca9029c24	migration_manager: don't keep storage_proxy alive during schema_check verb The schema_check verb doesn't leak tasks, so when the verb is unregistered it will be drained. So protection for storage_proxy lifetime can be removed.	2021-12-16 21:04:27 +02:00
Avi Kivity	26c656f6ed	mm: don't capture storage proxy shared_ptr during background schema merge The definitions_update() verb captures a shared_ptr to storage_proxy to keep it alive while the background task executes. This was introduced in (2016!): commit `1429213b4c` Author: Pekka Enberg <penberg@scylladb.com> Date: Mon Mar 14 17:57:08 2016 +0200 main: Defer migration manager RPC verb registration after commitlog replay Defer registering migration manager RPC verbs after commitlog has has been replayed so that our own schema is fully loaded before other other nodes start querying it or sending schema updates. Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com> when moving this code from storage_proxy.cc. Later, better protection with a gate was added: commit `14de126ff8` Author: Pavel Emelyanov <xemul@scylladb.com> Date: Mon Mar 16 18:03:48 2020 +0300 migration_manager: Run background schema merge in gate The call for merge_schema_from in some cases is run in the background and thus is not aborted/waited on shutdown. This may result in use-after-free one of which is merge_schema_from -> read_schema_for_keyspace -> db::system_keyspace::query -> storage_proxy::query -> query_partition_key_range_concurrent in the latter function the proxy._token_metadata is accessed, while the respective object can be already free (unlike the storage_proxy itself that's still leaked on shutdown). Related bug: #5903, #5999 (cannot reproduce though) Tests: unit(dev), manual start-stop dtest(consistency.TestConsistency, dev) dtest(schema_management, dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reviewed-by: Pekka Enberg <penberg@scylladb.com> Message-Id: <20200316150348.31118-1-xemul@scylladb.com> Since now the task execution is protected by the gate and therefore migration_manager lifetime (which is contained within that of storage_proxy, as it is constructed afterwards), capturing the shared_ptr is not needed, and we therefore remove it, as it uses the deprecated global storage_proxy accessors.	2021-12-16 21:01:06 +02:00
Botond Dénes	7db31e1bdb	mutation_reader: combined_reader: attach stream_id to mutation_fragments The fragment producer component of the combined reader returns a batch of fragments on each call to operator()(). These fragments are merged into a single one by the fragment merger. This patch adds a stream id to each fragment in the batch which identifies the stream (reader) it originates from. This will be used in the next patches to associate range-tombstone-changes originating from the same stream with each other.	2021-12-16 14:57:49 +02:00
Botond Dénes	c193bbed82	flat_mutation_reader_v2: add v2 version of empty reader Convert the v1 implementation to v2, downgrade to v1 in the existing `make_empty_flat_reader()`.	2021-12-16 14:57:49 +02:00
Botond Dénes	f15f4952be	test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation Currently the test assumes that fragments represent weakly monotonic upper bounds and therefore unconditionally overwrites the upper-bound on receiving each fragment. Range tombstones, however, violate this, as a range tombstone with a smaller position (lower bound) may have a higher upper bound than some or all fragments that follow it in the stream. This causes test failures after converting the combined reader to v2, but not before, no idea why.	2021-12-16 14:57:49 +02:00
Nadav Har'El	9ae98dbe92	Merge 'Reduce boot time for dtest setup' from Asias He This patch helps to speed up node boot up for test setups like dtest. Nadav reported ``` With Asias's two patches o Scylla, and my patch to enable it in dtest: Boot time of 5 nodes is now down to 9 seconds! Remember we started this exercise with 214 seconds? :-) ``` Closes #9808 * github.com:scylladb/scylla: storage_service: Recheck tokens before throw in storage_service::bootstrap gossip: Dot not wait for gossip to settle if skip_wait_for_gossip_to_settle is zero	2021-12-16 13:44:42 +02:00
Pavel Emelyanov	b2a62d2b59	Merge 'db: range_tombstone_list: Deoverlap empty range tombstones' from Tomasz Grabiec Appending an empty range adjacent to an existing range tombstone would not deoverlap (by dropping the empty range tombstone), resulting in a different (non-canonical) result depending on the order of appending. Suppose that range tombstone [a, b] covers range tombstone [x, x), and [a, x) and [x, b) are range tombstones which correspond to [a, b] split around position x. Appending [a, x) then [x, b) then [x, x) would give [a, b). Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b). The fix is to drop empty range tombstones in range_tombstone_list so that the result is canonical. Fixes #9661 Closes #9764 * github.com:scylladb/scylla: range_tombstone_list: Deoverlap adjacent empty ranges range_tombstone_list: Convert to work in terms of position_in_partition	2021-12-16 10:00:40 +03:00
Avi Kivity	c40043b142	mm: remove stats on schema version get	2021-12-15 18:56:18 +02:00
Nadav Har'El	d323b82cf6	Merge 'Introduce data_dictionary module' from Avi Kivity The full user-defined structure of the database (keyspaces, tables, user-defined types, and similar metadata, often known as the schema in other databases) is needed by much of the front-end code. But in Scylla it is deeply intertwined with the replica data management code - ::database, ::keyspace, and ::table. Not only does the front-end not need data access, it cannot get correct data via these objects since they represent just one replica out of many. This dual role is a frequent cause of recompilations. It was solved to some degree by forward declarations, but there are still a lot of incidental dependencies. To solve this, we introduce a data_dictionary module (and namespace) to exclusively deal with greater schema metadata. It is an interface, with a backing implementation by the existing code, so it doesn't add a new source of truth. The plan is to allow mock implementations for testing as well. Test: unit (dev, release, debug). Closes #9783 * github.com:scylladb/scylla: cql3, related: switch to data_dictionary test: cql_test_env: provide access to data_dictionary storage_proxy: provide access to data_dictionary database: implement data_dictionary interface data_dictionary: add database/keyspace/table objects data_dictionary: move keyspace_metadata to data_dictionary data_dictionary: move user_types_metadata to new module data_dictionary	2021-12-15 18:29:28 +02:00
Avi Kivity	87917d2536	Merge "gms: gossiper: coroutinize a few small functions" from Pavel S " Start converting small functions in gossiper code from using `seastar::thread` context to coroutines. For now, the changes are quite trivial. Later, larger code fragments will be converted to eliminate uses of `seastar::async` function calls. Moving the code to coroutines makes the code a bit more readable and also makes it immediately evident that a given function is async just by looking at the signature (for example, for void-returning functions, a coroutine will return `future<>` instead of `void` in case of a seastar::thread-using function). Tests: unit(dev) " * 'coro_gossip_v1' of https://github.com/ManManson/scylla: gms: gossiper: coroutinize `maybe_enable_features` gms: gossiper: coroutinize `wait_alive` gms: gossiper: coroutinize `add_saved_endpoint` gms: gossiper: coroutinize `evict_from_membership`	2021-12-15 16:02:18 +02:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user-defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00
Avi Kivity	399e2895f1	test: cql_test_env: provide access to data_dictionary Allow tests to have access to the data_dictionary.	2021-12-15 13:54:18 +02:00
Avi Kivity	c2da20484d	storage_proxy: provide access to data_dictionary Probably storage_proxy is not the correct place to supply data_dictionary, but it is available to practically all of the coordinator code, so it is convenient.	2021-12-15 13:54:08 +02:00
Avi Kivity	1de0a4b823	database: implement data_dictionary interface Implement the new data_dictionary interface using the existing ::database, ::keyspace, and ::table classes. The implementation is straightforward. This will allow the coordinator code to access the full schema without depending on the gnarly bits that compose ::database, like reader_concurrency_semaphore or the backlog controller.	2021-12-15 13:53:46 +02:00
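The range-selection rule in commit e4b2dfb54d (alternator ttl) can be illustrated with a minimal C++ sketch. It assumes a hypothetical `token_range` type whose `replicas` list is ordered primary first and secondary second, plus a caller-supplied liveness predicate that stands in for the failure detector. None of these names are Scylla's actual API. The sketch only decides which ranges to scan in one pass. According to the commit message, the real secondary scan also stops if the primary owner comes back up.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a token range: replicas[0] is the primary owner,
// replicas[1] the secondary owner (RF=3 in the commit's example).
struct token_range {
    long start;
    long end;
    std::vector<std::string> replicas;
};

// Returns the ranges `self` should scan for expired items in one pass:
// its own primary ranges first, then the ranges where it is the second
// replica and the primary owner is currently down.
std::vector<token_range> ranges_to_scan(
        const std::string& self,
        const std::vector<token_range>& ring,
        const std::function<bool(const std::string&)>& is_alive) {
    std::vector<token_range> result;
    // Pass 1: primary ranges are always scanned.
    for (const auto& r : ring) {
        if (!r.replicas.empty() && r.replicas[0] == self) {
            result.push_back(r);
        }
    }
    // Pass 2: secondary ranges whose primary owner is down. With a single
    // node down and RF=3 the data is still readable at QUORUM, so this
    // node can check it for expiration and delete it.
    for (const auto& r : ring) {
        if (r.replicas.size() >= 2 && r.replicas[1] == self &&
                !is_alive(r.replicas[0])) {
            result.push_back(r);
        }
    }
    return result;
}
```

Suppose three ranges have n1, n2 and n3 as their respective primaries. While all nodes are up, node n1 scans only its own range. Once n3 is reported down, n1 also takes over the range for which it is the second replica.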
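A small simulation shows the effect of the chunk-size change in commit f7b8b809d0 (sstables: parse chunked_vector). This is a hypothetical sketch, not Scylla's `parse()`. Each batch stands in for one `read_exactly()` call and one chunk allocation. The old limit of 100KB is assumed to mean 100 * 1024 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies `src` into `dst` in batches of at most `batch_bytes` bytes.
// Each batch stands in for one read_exactly() call plus one chunk
// allocation; the return value is the number of batches used.
template <typename T>
std::size_t parse_in_batches(const std::vector<T>& src, std::vector<T>& dst,
                             std::size_t batch_bytes) {
    // Number of members that fit in one batch (at least one).
    const std::size_t per_batch =
        std::max<std::size_t>(1, batch_bytes / sizeof(T));
    std::size_t reads = 0;
    dst.clear();
    dst.reserve(src.size());
    for (std::size_t pos = 0; pos < src.size(); pos += per_batch) {
        const std::size_t n = std::min(per_batch, src.size() - pos);
        dst.insert(dst.end(), src.begin() + pos, src.begin() + pos + n);
        ++reads;
    }
    return reads;
}

// Assumed byte budgets: the old fixed batch (100KB, taken as 100 * 1024)
// and the default chunk capacity (128KB) mentioned in the commit.
constexpr std::size_t old_batch_bytes = 100 * 1024;
constexpr std::size_t chunk_capacity_bytes = 128 * 1024;
```

With 1,000,000 `std::uint64_t` members, 100KB batches hold 12,800 members each and need 79 reads. 128KB batches hold 16,384 members each and need 62 reads. Sizing the batch to the chunk capacity therefore removes roughly a fifth of the reads and allocations.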
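The short-circuit in commit 9f656b96ac (short-circuit flat mutation reader upgrades and downgrades) can be sketched with plain C++ polymorphism. The following are hypothetical stand-ins for Scylla's flat mutation readers:

- `reader_v1` and `reader_v2`
- the two adapters
- the adapters' `buffer` and `tombstone_state_discardable` members

The sketch shows only the rule: unwrap an existing adapter instead of stacking a new one, but only when the adapter holds no pending state.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reader interfaces for the two fragment formats.
struct reader_v1 { virtual ~reader_v1() = default; };
struct reader_v2 { virtual ~reader_v2() = default; };

// Adapter presenting a v2 reader as v1 (a "downgrade").
struct downgrading_reader : reader_v1 {
    std::unique_ptr<reader_v2> underlying;
    std::vector<int> buffer;                  // converted, not yet consumed
    bool tombstone_state_discardable = true;  // range tombstone tracking state
    explicit downgrading_reader(std::unique_ptr<reader_v2> r) : underlying(std::move(r)) {}
};

// Adapter presenting a v1 reader as v2 (an "upgrade").
struct upgrading_reader : reader_v2 {
    std::unique_ptr<reader_v1> underlying;
    std::vector<int> buffer;
    bool tombstone_state_discardable = true;
    explicit upgrading_reader(std::unique_ptr<reader_v1> r) : underlying(std::move(r)) {}
};

std::unique_ptr<reader_v2> upgrade_to_v2(std::unique_ptr<reader_v1> r) {
    // If r merely downgrades a v2 reader and holds no pending state,
    // return the original v2 reader instead of wrapping it again.
    if (auto* d = dynamic_cast<downgrading_reader*>(r.get())) {
        if (d->buffer.empty() && d->tombstone_state_discardable) {
            return std::move(d->underlying);
        }
    }
    return std::make_unique<upgrading_reader>(std::move(r));
}

std::unique_ptr<reader_v1> downgrade_to_v1(std::unique_ptr<reader_v2> r) {
    // Symmetric case: unwrap an upgrade that is safe to discard.
    if (auto* u = dynamic_cast<upgrading_reader*>(r.get())) {
        if (u->buffer.empty() && u->tombstone_state_discardable) {
            return std::move(u->underlying);
        }
    }
    return std::make_unique<downgrading_reader>(std::move(r));
}
```

A downgrade followed by an upgrade hands back the very same v2 object. As a result, repeatedly moving a reader in and out of a reader concurrency semaphore does not build an unbounded stack of adapters.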
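A simplified C++ sketch shows why commit 7db31e1bdb tags each fragment with a stream id before commit 71835bdee1 merges range tombstone changes at the fragment merger level. It makes these simplifying assumptions:

- A tombstone is just a timestamp, with 0 meaning "none".
- Positions are plain integers.
- `rt_change_merger` is a hypothetical name.

Scylla's `range_tombstone_change` and `position_in_partition` are considerably richer. The merger remembers the active tombstone of each stream, so a closing change from one stream does not end a tombstone that is still open in another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified range tombstone change: from `position` on, the stream's
// active tombstone has this timestamp (0 means no tombstone).
struct rt_change {
    int position;
    long timestamp;
};

// A change tagged with the index of the stream (reader) it came from.
struct tagged_change {
    std::size_t stream_id;
    rt_change change;
};

class rt_change_merger {
    std::vector<long> _current;  // active tombstone of each stream
    long _emitted = 0;           // tombstone most recently emitted downstream
public:
    explicit rt_change_merger(std::size_t streams) : _current(streams, 0) {}

    // Merges one batch of changes that all share the same position and
    // returns a merged change only if the effective tombstone changed.
    std::optional<rt_change> merge(const std::vector<tagged_change>& batch) {
        assert(!batch.empty());
        for (const auto& f : batch) {
            assert(f.stream_id < _current.size());
            _current[f.stream_id] = f.change.timestamp;
        }
        // The effective tombstone is the strongest one across all streams.
        const long effective = *std::max_element(_current.begin(), _current.end());
        if (effective == _emitted) {
            return std::nullopt;
        }
        _emitted = effective;
        return rt_change{batch.front().change.position, effective};
    }
};
```

Consider a case where stream 0 opens a tombstone with timestamp 5, and stream 1 then opens and later closes one with timestamp 7. At stream 1's closing position, the merged output drops back to 5 rather than to "no tombstone". Without the stream id, the merger could not tell whose tombstone was being closed.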
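The check relaxed by the `--ignore-dead-nodes-for-replace` option in commit eba4a4fba4 can be expressed as a set difference: the nodes needed for the replace that are down, minus the nodes the operator chose to ignore. This is a hypothetical sketch. The function name and the use of plain node-name strings are assumptions, not Scylla's implementation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the nodes that block a replace operation,
// i.e. nodes needed for it that are down and were not explicitly listed
// as ignorable by the operator.
std::vector<std::string> blocking_down_nodes(
        const std::set<std::string>& needed,
        const std::set<std::string>& down,
        const std::set<std::string>& ignored) {
    std::vector<std::string> blocking;
    for (const auto& n : needed) {
        if (down.count(n) && !ignored.count(n)) {
            blocking.push_back(n);
        }
    }
    return blocking;
}
```

In the commit's scenario, replacing n2 while n3 is also down is blocked by n3. The replace can proceed only if n3 is listed as ignorable.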
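The canonicalization rule from the range_tombstone_list fix (merge b2a62d2b59) can be sketched over integer positions. This is a simplified, hypothetical model with three assumptions:

- All pieces carry the same timestamp.
- Tombstones arrive in non-decreasing start order.
- Overlapping tombstones with different timestamps are not split, whereas the real `range_tombstone_list` would split them.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical range tombstone over integer clustering positions,
// covering the half-open interval [start, end).
struct range_tombstone {
    int start;
    int end;
    long timestamp;
    bool operator==(const range_tombstone& o) const {
        return start == o.start && end == o.end && timestamp == o.timestamp;
    }
};

// Appends `rt` to `list`, assuming non-decreasing start order.
void append_tombstone(std::vector<range_tombstone>& list, const range_tombstone& rt) {
    // The fix described in the commit: an empty range covers nothing, so
    // it is dropped instead of being stored as a separate entry.
    if (rt.start >= rt.end) {
        return;
    }
    if (!list.empty()) {
        range_tombstone& last = list.back();
        // Coalesce with an adjacent or overlapping predecessor carrying the
        // same tombstone, so equal coverage has a single representation.
        if (last.timestamp == rt.timestamp && rt.start <= last.end) {
            last.end = std::max(last.end, rt.end);
            return;
        }
    }
    list.push_back(rt);
}
```

The two append orders from the commit are [0, 5), [5, 10), [5, 5) and [0, 5), [5, 5), [5, 10). Both produce the single tombstone [0, 10), which corresponds to the [a, b) result described above.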


29508 Commits