scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Benny Halevy	5f513ed28b	view_builder: consumer: flush_fragments: close reader on error Make sure to close the reader created by flush_fragments if an exception occurs before it's moved to `populate_views`. Note that it is also ok to close the reader _after_ it has been moved, in case populate_views itself throws after closing the reader that was moved it. For conveience flat_mutation_reader::close supports close-after-move. Fixes #9479 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211024164138.1100304-1-bhalevy@scylladb.com>	2021-10-24 19:53:31 +03:00
Benny Halevy	cddd16f22d	db: view: use effective_replication_map to get_natural_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:55:50 +03:00
Pavel Emelyanov	c504361c15	view_builder: Accept view_build_statuses The code itself is already in relevant .cc file, not move it to the relevant class. The only significant change is where to get token metadata from. In its old location tokens were provided by the storage service itself, now when it's in the view builder there's no "native" place to get them from, however the rest of the view building code gets tokens from global storage proxy, so do the same here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:11:40 +03:00
Pavel Emelyanov	3b6e8c7d93	storage_service: Move view_build_statuses code This code belongs to view builder, so put it into its .cc. No changes, just move. This needs some ugly namespace breakage, but they will be patched away with the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:11:29 +03:00
Avi Kivity	369afe3124	treewide: use coroutine::maybe_yield() instead of co_await make_ready_future() The dedicated API shows the intent, and may be a tiny bit faster. Closes #9382	2021-09-23 12:28:56 +02:00
Pavel Emelyanov	598841a5dd	code: Expell gossiper.hh from other headers This needs to add forward declarations of the gossiper class and re-include some other headers here and there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-22 13:13:06 +03:00
Pavel Emelyanov	0de69136d4	view_update_generator: Register staging sstables in constructor First, it's to fix the discarded future during the register. The future is not actually such, as it's always the no-op ready one as at that stage the view_update_generator is neither aborted nor is in throttling state. Second, this change is to keep database start-up code in main shorter and cleaner. Registering staging sstables belongs to the view_update_generator start code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-15 17:49:06 +03:00
Avi Kivity	115d6d8d4c	system_keyspace: prepare forward-declared members In anticipation of making system_keyspace a class instead of a namespace, rename any member that is currently forward-declared, since one can't forward-declare a class member. Each member is taken out of the system_keyspace namespace and gains a system_keyspace prefix. Aliases are added to reduce code churn. The result isn't lovely, but can be adjusted later.	2021-09-13 15:11:26 +03:00
Avi Kivity	c1028de22a	Merge 'Introduce native reversed format' from Botond Dénes We define the native reverse format as a reversed mutation fragment stream that is identical to one that would be emitted by a table with the same schema but with reversed clustering order. The main difference to the current format is how range tombstones are handled: instead of looking at their start or end bound depending on the order, we always use them as-usual and the reversing reader swaps their bounds to facilitate this. This allows us to treat reversed streams completely transparently: just pass along them a reversed schema and all the reader, compacting and result building code is happily ignorant about the fact that it is a reversed stream. This series is the first step towards implementing efficient reverse reads. It allows us to remove all the special casing we have in various places for reverse reads and thus treating reverse streams transparently in all the middle layers. The only layers that have to know about the actual reversing are mutation sources proper. The plan is that when reading in reverse we create a reversed schema in the top layer then pass this down as the schema for the read. There are two layers that will need to act on this reversed schema: * The layer sitting on top of the first layer which still can't handle reversed streams, this layer will create a reversed reader to handle the transition. * The mutation source proper: which will obtain the underlying schema and will emit the data in reverse order. Once all the mutation sources are able to handle reverse reads, we can get rid of the reverse reader entirely. Refs: #1413 Tests: unit(dev) TODO: * v2 * more testing Also on: https://github.com/denesb/scylla.git reverse-reads/v3 Changelog v3: * Drop the entire schema transformation mechanism; * Drop reversing from `schema_builder()`; * Don't keep any information about whether the schema is reversed or not in the schema itself, instead make reversing deterministic w.r.t. schema version, such that: `s.version() == s.make_reversed().make_reversed().version()`; * Re-reverse range tombstones in `streaming_mutation_freezer`, so `reconcilable_results` sent to the coordinator during read repair still use the old reverse format; v2: * Add `data_type reversed(data_type)`; * Add `bound_kind reverse_kind(bound_kind)`; * Make new API safer to use: - `schema::underlying_type()`: return this when unengaged; - `schema::make_transformed()`: noop when applying the same transformation again; * Generalize reversed into transformation. Add support to transferring to remote nodes and shards by way of making `schema_tables` aware of the transformation; * Use reverse schema everywhere in reverse reader; Closes #9184 * github.com:scylladb/scylla: range_tombstone_accumulator: drop _reversed flag test/boost/mutation_test: add test for mutation::consume() monotonicity test/boost/flat_mutation_reader_test: more reversed reader tests flat_mutation_reader: make_reversing_reader(): implement fast_forward_to(partition_range) flat_mutation_reader: make_reversing_reader(): take ownership of the reader test/lib/mutation_source_test: add consistent log to all methods mutation: introduce reverse() mutation_rebuilder: make it standalone mutation: make copy constructor compatible with mutation_opt treewide: switch to native reversed format for reverse reads mutation: consume(): add native reverse order mutation: consume(): don't include dummy rows query: add slice reversing functions partition_slice_builder: add range mutating methods partition_slice_builder: add constructor with slice query: specific_ranges: add non-const ranges accessor range_tombstone: add reverse() clustering_bounds_comparator: add reverse_kind() schema: introduce make_reversed() schema: add a transforming copy constructor utils: UUID_gen: introduce negate() types: add reversed(data_type) docs: design-notes: add reverse-reads.md	2021-09-09 15:50:22 +03:00
Botond Dénes	f02632aeb0	range_tombstone_accumulator: drop _reversed flag	2021-09-09 15:42:15 +03:00
Piotr Sarna	5d7c765422	db,view: split stopping view builder to drain+stop In order to be able to avoid a deadlock when CQL server cannot be started, the view builder shutdown procedure is now split to two parts - - drain and stop. Drain is performed before storage proxy shutdown, but stop() will be called even before drain is scheduled. The deadlock is as follows: - view builder creates a reader permit in order to be able to read from system tables - CQL server fails to start, shutdown procedure begins - view builder stop() is not called (because it was not scheduled yet), so it holds onto its reader permit - database shutdown procedure waits for all permits to be destroyed, and it hangs indefinitely because view builder keeps holding its permit.	2021-09-08 10:52:40 +02:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Benny Halevy	4439e5c132	everywhere: cleanup defer.hh includes Get rid of unused includes of seastar/util/{defer,closeable}.hh and add a few that are missing from source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:39 +03:00
Piotr Sarna	58196e8ea6	db,view: avoid ignoring failed future in background view updates The code for handling background view updates used to propagate exceptions unconditionally, which leads to "exceptional future ignored" warnings if the update was put to background. From now on, the exception is only propagated if its future is actually waited on. Fixes #6187 Tested manually, the warning was not observed after the patch Closes #9179	2021-08-12 17:32:35 +03:00
Pavel Emelyanov	689a4c1e54	view: Use proxy to get token metadata from The mutate_MV() call needs token metadata and it gets them from global storage service. Fixing it not to use globals is a huge refactoring, so for now just get the tokens from global storage proxy. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-29 05:12:36 +03:00
Nadav Har'El	5183e0cbe9	Merge 'Fix artificial view update size limit' from Piotr Sarna The series which split the view update process into smaller parts accidentally put an artificial 10MB limit on the generated mutation size, which is wrong - this limit is configurable for users, and, what's more important, this data was already validated when it was inserted into the base table. Thus, the limit is lifted. The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but it needs over a minute to run, as opposed to cql-pytest's <1 second. Fixes #9047 Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view) Closes #9048 * github.com:scylladb/scylla: cql-pytest: add a materialized views suite with first cases db,view: drop the artificial limit on view update mutation size	2021-07-15 17:03:07 +03:00
Piotr Sarna	697e2fc66d	db,view: drop the artificial limit on view update mutation size The series which split the view update process into smaller parts accidentally put an artificial 10MB limit on the generated mutation size, which is wrong - this limit is configurable for users, and, what's more important, this data was already validated when it was inserted into the base table. Thus, the limit is lifted. Tests: unit(release), dtest(wide_rows_test)	2021-07-15 14:09:37 +02:00
Botond Dénes	1b7eea0f52	reader_concurrency_semaphore: admission: flip the switch This patch flips two "switches": 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) by now all permits are obtained up-front, so this patch just yanks out the restricted reader from all reader stacks and simultaneously switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By doing this admission is now waited on when creating the permit. (2) we switch to an admission algorithm that adds a new aspect to the existing resource availability: the number of used/blocked reads. Namely it only admits new reads if in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words we only admit new reads if all currently admitted reads requires something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). We flip these two switches at the same time because up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. Another change made to accommodate this switch is the replacement of the replica side read execution stages which the reader concurrency semaphore as an execution stage. This replacement is needed because with the introduction of up-front admission, reads are not independent of each other any-more. One read executed can influence whether later reads executed will be admitted or not, and execution stages require independent operations to work well. By moving the execution stage into the semaphore, we have an execution stage which is in control of both admission and running the operations in batches, avoiding the bad interaction between the two.	2021-07-14 17:19:02 +03:00
Botond Dénes	7bfa40a2f1	treewide: use make_tracking_only_permit() For all those reads that don't (won't or can't) pass through admission currently.	2021-07-14 17:19:02 +03:00
Botond Dénes	f28b5018f2	view/view_update_generator: use obtain_reader_permit()	2021-07-14 16:48:43 +03:00
Piotr Sarna	a1813c9b34	db,view,table: drop unneeded time point parameter Now that restriction checking is translated to the partition-slice-style interface, checking the partition/clustering key restrictions for views can be performed without the time point parameter. The parameter is dropped from all relevant call sites.	2021-07-13 10:40:08 +02:00
Piotr Sarna	37fc3f4b5b	db,view: migrate key checks from the deprecated is_satisfied_by Last two users of the mutation-based is_satisfied_by function were in the partition/clustering key checks. These functions are now translated to use the original API.	2021-07-13 10:40:07 +02:00
Piotr Sarna	d6b0a8338a	db,view: migrate checking view filter to new is_satisfied_by In order to unify the interfaces, the is_satisfied_by flavor for mutations is getting deprecated. In order to be able to remove it, one of its biggest users, the matches_view_filter() function, is translated to the other variant.	2021-07-13 10:04:03 +02:00
Piotr Sarna	786db7e9a8	db,view: add a helper result builder class In order to migrate from mutation-based restriction checks, code in view.cc needs to have a way of translating results to partition-slice-based representation. A slightly simplified builder from multishard_mutation_query.cc is injected into the view code.	2021-07-13 10:04:03 +02:00
Piotr Sarna	32d87837b1	db,view: move make_partition_slice helper function up No functional changes, it will be needed for a future patch.	2021-07-13 10:04:02 +02:00
Avi Kivity	9059514335	build, treewide: enable -Wpessimizing-move warning This warning prevents using std::move() where it can hurt - on an unnamed temporary or a named automatic variable being returned from a function. In both cases the value could be constructed directly in its final destination, but std::move() prevents it. Fix the handful of cases (all trivial), and enable the warning. Closes #8992	2021-07-08 17:52:34 +03:00
Piotr Sarna	bf0777e97a	view: generate view updates in smaller parts In order to avoid large allocations and too large mutations generated from large view updates, granularity of the process is broken down from per-partition to smaller chunks. The view update builder now produces partial updates, no more than 100 view rows at a time.	2021-07-08 11:17:27 +02:00
Piotr Sarna	679dc4d824	db,view: move view_update_builder to the header The builder is going to be used directly by the callers, which requires making its definition public. No semantic changes were intended.	2021-07-08 11:17:27 +02:00
Pavel Emelyanov	64bb16af8a	view_update_generator: Remove unused struct sstable_with_table Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-18 20:19:35 +03:00
Piotr Sarna	1fb831c8c1	db,view: limit the number of simultaneous view update futures Previously the view update code generated a continuation for each view update and stored them all in a vector. In certain cases the number of updates can grow really large (to millions and beyond), so it's better to only store a limited amount of these futures at a time.	2021-06-17 10:20:52 +02:00
Piotr Sarna	a7f7716ecf	db,view: use chunked_vector for view updates The number of view updates can grow large, especially in corner cases like removing large base partitions. Chunked vector prevents large allocations.	2021-06-17 10:15:17 +02:00
Piotr Sarna	f832a30388	db,view,table: futurize calculating affected ranges In order to avoid stalls on large inputs, calculating affected ranges is now able to yield.	2021-06-16 09:51:31 +02:00
Piotr Sarna	3592d9b36e	db,view: use chunked vector for view affected ranges There were large allocation reportsa from vectors used for calculating affected ranges. In order to reduce the pressure on the allocator, chunked vector is used for storing intermediate results.	2021-06-15 10:30:27 +02:00
Piotr Sarna	8a049c9116	view: fix use-after-move when handling view update failures The code was susceptible to use-after-move if both local and remote updates were going to be sent. The whole routine for sending view updates is now rewritten to avoid use-after-move. Refs #8830 Tests: unit(release), dtest(secondary_indexes_test.py:TestSecondaryIndexes.test_remove_node_during_index_build)	2021-06-14 09:36:10 +02:00
Piotr Sarna	7cdbb7951a	db,view: explicitly move the mutation to its helper function The `apply_to_remote_endpoints` helper function used to take its `mut` parameter by reference, but then moved the value from it, which is confusing and prone to errors. Since the value is moved-from, let's pass it to the helper function as rvalue ref explicitly.	2021-06-14 09:34:40 +02:00
Piotr Sarna	88d4a66e90	db,view: pass base token by value to mutate_MV The base token is passed cross-continuations, so the current way of passing it by const reference probably only works because the token copying is cheap enough to optimize the reference out. Fix by explicitly taking the token by value.	2021-06-14 09:30:38 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extent copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Emelyanov	1ce0682821	view: Get database from stprage_proxy The db::view code already uses proxy rather actively, so instead of depending on the storage service to be at hands it's better to make db::view require the proxy. For now -- via global instance. There's one dependency on storage service left after this patch -- to get the tokens. This piece is to be fixed later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-05-28 18:09:32 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Tomasz Grabiec	abe3d7d7d3	Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity storage_proxy uses std::vector<inet_address> for small lists of nodes - for replication (often 2-3 replicas per operation) and for pending operations (usually 0-1). These vectors require an allocation, sometimes more than one if reserve() is not used correctly. This series switches storage_proxy to use utils::small_vector instead, removing the allocations in the common case. Test results (perf_simple_query --smp 1 --task-quota-ms 10): ``` before: median 184810.98 tps ( 91.1 allocs/op, 20.1 tasks/op, 54564 insns/op) after: median 192125.99 tps ( 87.1 allocs/op, 20.1 tasks/op, 53673 insns/op) ``` 4 allocations and ~900 instructions are removed (the tps figure is also improved, but it is less reliable due to cpu frequency changes). The type change is unfortunately not contained in storage_proxy - the abstraction leaks to providers of replica sets and topology change vectors. This is sad but IMO the benefits make it worthwhile. I expect more such changes can be applied in storage_proxy, specifically std::unordered_set<gms::inet_address> and vectors of response handles. Closes #8592 * github.com:scylladb/scylla: storage_proxy, treewide: use utils::small_vector inet_address_vector:s storage_proxy, treewide: introduce names for vectors of inet_address utils: small_vector: add print operator for std::ostream hints: messages.hh: add missing #include	2021-05-06 18:00:54 +02:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Nadav Har'El	64a4e5e059	cross-tree: reduce dependency on db/config.hh and database.hh Every time db/config.hh is modified (e.g., to add a new configuration option), 110 source files need to be recompiled. Many of those 110 didn't really care about configuration options, and just got the dependency accidentally by including some other header file. In this patch, I remove the include of "db/config.hh" from all header files. It is only needed in source files - and header files only need forward declarations. In some cases, source files were missing certain includes which they got incidentally from db/config.hh, so I had to add these includes explicitly. After this patch, the number of source files that get recompiled after a change to db/config.hh goes down from 110 to 45. It also means that 65 source files now compile faster because they don't include db/config.hh and whatever it included. Additionally, this patch also eliminates a few unnecessary inclusions of database.hh in other header files, which can use a forward declaration or database_fwd.hh. Some of the source files including one of those header files relied on one of the many header files brought in by database.hh, so we need to include those explicitly. In view_update_generator.hh something interesting happened - it needs database.hh because of code in the header file, but only included database_fwd.hh, and the only reason this worked was that the files including view_update_generator.hh already happened to unnecessarily include database.hh. So we fix that too. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210505121830.964529-1-nyh@scylladb.com>	2021-05-05 15:24:25 +03:00
Nadav Har'El	58e275e362	cross-tree: reduce dependency on db/config.hh and database.hh Every time db/config.hh is modified (e.g., to add a new configuration option), 110 source files need to be recompiled. Many of those 110 didn't really care about configuration options, and just got the dependency accidentally by including some other header file. In this patch, I remove the include of "db/config.hh" from all header files. It is only needed in source files - and header files only need forward declarations. In some cases, source files were missing certain includes which they got incidentally from db/config.hh, so I had to add these includes explicitly. After this patch, the number of source files that get recompiled after a change to db/config.hh goes down from 110 to 45. It also means that 65 source files now compile faster because they don't include db/config.hh and whatever it included. Additionally, this patch also eliminates a few unnecessary inclusions of database.hh in other header files, which can use a forward declaration or database_fwd.hh. Some of the source files including one of those header files relied on one of the many header files brought in by database.hh, so we need to include those explicitly. In view_update_generator.hh something interesting happened - it needs database.hh because of code in the header file, but only included database_fwd.hh, and the only reason this worked was that the files including view_update_generator.hh already happened to unnecessarily include database.hh. So we fix that too. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210505102111.955470-1-nyh@scylladb.com>	2021-05-05 13:23:00 +03:00
Benny Halevy	dad6c94476	view_builder: stop: close all build_step readers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	02d74e1530	view_update_generator: start: close staging_sstable_reader when done The staging_sstable_reader has to be closed before it's destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	1e1c8ea824	view: build_progress_virtual_reader: implement close method Close underlying reader. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	2d8b00f2d8	view: generate_view_updates: close builder readers when done Make sure to close the builder's _updates and optional _existings readers before they are destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00

1 2 3 4 5 ...

308 Commits