scylladb

Author	SHA1	Message	Date
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector with utils::small_vector. The motivation for this change was the observation that the profiler points to memory allocation functions as the ones where we spend most time, and while they have a large number of callers, storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing, so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians (read): ./perf_before 343086.80, ./perf_after 360720.53, diff 5.1%. Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids; mutation_fragment_merger: use small_vector; auth: use small_vector in resource; auth: avoid list-initialisation of vectors; idl: serialiser: add serialiser for utils::small_vector; idl: serialiser: deduplicate vector serialisers; utils: introduce small_vector; intrusive_set_external_comparator: make iterator nothrow move constructible; mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Paweł Dziepak	a014367c5b	mutation_fragment_merger: use small_vector	2018-12-06 14:21:04 +00:00
Botond Dénes	ee193f1ab4	multishard_combining_reader: pause readers after reading ahead Readers created or resumed just to read ahead should be paused right after, to avoid consuming all available permits on the shards they operate on, causing a deadlock.	2018-12-06 13:20:30 +02:00
Botond Dénes	170fa382fa	multishard_combining_reader: pause all EOS'd readers Previously the last shard reader to reach EOS wasn't paused. This is a mistake and can contribute to causing deadlocks when the number of concurrently active readers on any shard is limited.	2018-12-06 10:30:43 +02:00
Paweł Dziepak	402902ac78	mutation_fragment_merger: value-initialise iterator ForwardIterators are default constructible, but they have to be value-initialised to compare equal to other value-initialised instances of that iterator.	2018-12-05 20:07:29 +00:00
Botond Dénes	22b14d593b	multishard_combining_reader: use pause-resume API Refactor the multishard combining reader to make use of the new pause-resume API to pause inactive shard readers. Make the pause-resume API mandatory to implement, as by now all existing clients have adopted it.	2018-12-04 08:51:05 +02:00
Botond Dénes	9601d23e0d	foreign_reader: add pause-resume API Allow pausing the reader and later resuming it. Pausing the reader waits on the ongoing read-ahead (if any), executes any pending `next_partition()` calls and then detaches the shard reader's buffer. The paused shard reader is returned to the client. Resuming the reader consists of getting the previously detached reader back, or one that has the same position as the old reader had. This API allows for making the inactive shard readers of the `multishard_combining_reader` evictable. The API is private; it's only accessible to classes knowing the full definition of the `foreign_reader` (which resides in a .cc file).	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the multishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementations for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	301abaca07	multishard_combining_reader: drop unnecessary `reader_promise` member The `reader_promise` member of the `shard_reader` was used to synchronize a foreground request to create the underlying reader with an ongoing background request with the same goal. This is however unnecessary. The underlying reader is created in the background only as part of a read ahead. In this case there is no need for an extra synchronization point; the foreground reader create request can just wait for the read ahead to finish, for which a means already exists. Furthermore, foreground reader create requests are always followed by a `fill_buffer()` request, so by waiting on the read ahead we ensure that the following `fill_buffer()` call will not block.	2018-12-04 08:51:05 +02:00
Botond Dénes	a73175fdbc	multishard_combining_reader: drop tracking of pending next_partition calls Shard readers used to track pending `next_partition()` calls that they couldn't execute because their underlying reader wasn't created yet. These pending calls were then executed after the reader was created. However the only situation where a shard reader can receive a `next_partition()` call before its underlying reader is created is when `next_partition()` is called on the multishard reader before a single fragment is read. In this case we know we are at a partition boundary and thus this call has no effect; therefore it is safe to ignore it.	2018-12-04 08:51:05 +02:00
Botond Dénes	ab3e639c3b	foreign_reader: use bool for pending_next_partition Foreign reader doesn't execute `next_partition()` calls straight away, when this would require interaction with the remote reader. Instead these calls are "remembered" and executed on the next occasion the foreign reader has to interact with the remote reader. This was implemented with a counter that counts the number of pending `next_partition()` calls. However when `next_partition()` is called multiple times, without interleaving calls to `operator()()` or `fast_forward_to()`, only the first such call has effect. Thus it doesn't make sense to count these calls, it is enough to just set a flag if there was at least one such call.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	0cb7c43fb5	reader_concurrency_semaphore: add dedicated .cc file As we are about to extend the functionality of the reader concurrency semaphore, adding more method implementations that need to go to a .cc file, it's time we create a dedicated file, instead of continuing to shove them into unrelated .cc files (mutation_reader.cc).	2018-12-03 13:37:02 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
George Kollias	c2343dc841	Make restricting reader fill_buffer more efficient Currently, restricting_mutation_reader::fill_buffer just reads the lower-layer reader's fragments one by one without doing any further transformations. This change just swaps the parent-child buffers in a single step, as suggested in #3604, hence removing any possible per-fragment overhead. I couldn't find any test that exercises restricting_mutation_reader as a mutation source, so I added test_restricted_reader_as_mutation_source in mutation_reader_test. Tests: unit (release), though these 4 tests are failing regardless of my changes (they fail on master for me as well): snitch_reset_test, sstable_mutation_test, sstable_test, sstable_3_x_test. Fixes: #3604 Signed-off-by: George Kollias <georgioskollias@gmail.com> Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>	2018-10-22 11:36:54 +03:00
Botond Dénes	dfad223ea2	multishard_mutation_reader: shard_reader: don't do concurrent read-aheads multishard_mutation_reader starts read-aheads on the shards-to-be-read-soon. When doing this it didn't check whether the respective shards already had an ongoing read-ahead. This led to a single shard executing multiple concurrent read-aheads. This is damaging for multiple reasons: * Can lead to concurrent access of the remote reader's data members. * The `shard_reader` was designed around a single read-ahead and thus will synchronise foreground reads with only the last one. The practical implication of this seen so far was that queries reading a large number of rows (large enough to reliably trigger the bug) would stop the read early, due to the `combined_mutation_reader`'s internal accounting being messed up by concurrent access. Also add a unit test. Instead of coming up with a very specific and very contrived unit test, use the test case that detected this bug in the first place: count(*) on a table with lots of rows (>1000). This unit test should serve well for detecting any similar bugs in the future. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <ff1c49be64e2fb443f9aa8c5c8d235e682442248.1536746388.git.bdenes@scylladb.com>	2018-09-12 11:43:18 +01:00
Botond Dénes	6a07b8ae83	multishard_mutation_reader: update shard_reader's comment The `adandoned` member was renamed to `stopped`. Update the comment accordingly. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1d655785f28fe1e5fa041f2f49852f0ad88be53e.1536743950.git.bdenes@scylladb.com>	2018-09-12 11:32:08 +02:00
Botond Dénes	49704755b0	combined_mutation_reader: propagate timeout in fill_buffer() All user reads go through the combined reader. Not propagating the timeout down from there means that the storage layer's timeout functionality is effectively disabled. Spotted while reading the code. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <7fc10eca1c231dd04ac433913d9e6a51b6b17139.1536657041.git.bdenes@scylladb.com>	2018-09-11 15:44:28 +02:00
Botond Dénes	a011a9ebf2	mutation_reader: multishard_combining_reader: support custom dismantler Add a dismantler functor parameter. When the multishard reader is destroyed this functor will be called for each shard reader, passing a future to a `stopped_foreign_reader`. This future becomes available when the shard reader is stopped, that is, when it finished all in-progress read-aheads and/or pending next partition calls. The intended use case for the dismantler functor is a client that needs to be notified when readers are destroyed and/or has to have access to any unconsumed fragments from the foreign readers wrapping the shard readers.	2018-09-03 10:31:44 +03:00
Botond Dénes	f13b878a94	mutation_reader: pass all standard reader params to `remote_reader_factory` Extend `remote_reader_factory` interface so that it accepts all standard mutation reader creation parameters. This allows factory lambdas to be truly stateless, not having to capture any standard parameters that are needed for creating the reader. Standard parameters are those accepted by `mutation_source::make_reader()`.	2018-09-03 10:31:44 +03:00
Botond Dénes	8915293257	multishard_combining_reader: fix incorrect comment	2018-09-03 10:31:44 +03:00
Botond Dénes	81a03db955	mutation_reader: reader_selector: use ring_position instead of token sstable_set::incremental_selector was migrated to ring position; follow suit and migrate the reader_selector to use ring_position as well. Beyond correctness, this also improves efficiency in the case of dense tables, avoiding prematurely selecting sstables that share the token but start at different keys, although one could argue that this is a niche case.	2018-07-04 17:42:37 +03:00
Botond Dénes	a8e795a16e	sstable_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for several reasons. The sub-range of the token-space that an sstable covers is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. The usage of token for providing the next position where the intersecting sstables change [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop, as a token cannot express inclusiveness/exclusiveness and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness, e.g. [t1, t2](t2, t3]. To solve these problems, update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space among sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated key of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys so it can also include information about inclusiveness/exclusiveness of the next interval, guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
Botond Dénes	78ecf2740a	mutation_reader_merger::maybe_add_readers(): remove early return It's unnecessary (doesn't prevent anything). The code without it expresses intent better (and is shorter by two lines).	2018-07-02 11:41:09 +03:00
Botond Dénes	d26b35b058	mutation_reader_merger: get rid of _key `_key` is only used in a single place and this does not warrant storing it in a member. Also get rid of current_position() which was used to query `_key`.	2018-07-02 11:40:43 +03:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Tomasz Grabiec	bb96518cc5	mutation_reader: Make empty mutation source advertize no partitions So that perf_row_cache_update will always populate cache.	2018-05-30 12:18:56 +02:00
Avi Kivity	7d29addb1f	mutation_reader: optimize make_combined_reader for the single-reader case If we're given a single reader (can be common in a low-write-rate table, where most of the data will be in a single large sstable, or in leveled tables) then we can avoid the overhead of the combining reader by returning the single input. Tests: unit (release) Message-Id: <20180513130333.15424-1-avi@scylladb.com>	2018-05-13 20:07:10 +02:00
Botond Dénes	7a3eab90c8	multishard_combining_reader: use optimized optional for the shard reader Use flat_mutation_reader_opt instead of std::optional<flat_mutation_reader>.	2018-05-10 13:06:47 +03:00
Botond Dénes	04643fb223	multishard_combining_reader: prepare for read-ahead outliving the reader When the multishard reader is destroyed there might be several pending read-aheads running in the background. These read-aheads need their associated reader to stay alive until after the read-ahead completes. To solve this, move the flat_mutation_reader into a struct and manage this struct's lifetime through a shared pointer. Fibers associated with read-aheads that might outlive the multishard reader will hold on to a copy of the shared pointer, keeping the underlying reader alive until they complete. To avoid doing any extra work, a flag is added to this state which is set when the multishard reader is destroyed. When this flag is set, pending continuations will return early. All this is encapsulated in multishard_combining_reader::shard_reader; the multishard reader code itself need not be changed.	2018-04-30 17:16:21 +03:00
Botond Dénes	a05d398be7	foreign_reader: prepare for read-ahead outliving the reader The foreign reader keeps track of ongoing read-aheads via a foreign_ptr to the read-ahead's future on the remote shard. This pointer is overwritten after each "remote call" to the remote reader with a pointer to the new read-ahead's future. There are several problems with the current implementation: 1) A new read-ahead is launched after each "remote call" unconditionally, even if the remote reader is at EOS. This will start an unnecessary read-ahead when the reader is already finished and may soon be destroyed (legally) by the client. 2) The pointer to the remote read-ahead future is not set to nullptr when a remote call is issued. Thus in the destructor, where we attach a continuation to the read-ahead's future to extend the reader's lifetime until after the read-ahead finishes, we might attach a continuation to a future that already has one and run into a failed assert(). To fix these issues, reset the read-ahead pointer to nullptr each time a remote call is issued and don't start a new read-ahead if the remote reader is at EOS. This way we can ensure that when the reader is destroyed we either have a valid and non-stale read-ahead future or none at all, and can reliably decide whether we need to extend the lifetime of the remote reader or not.	2018-04-30 14:34:43 +03:00
Botond Dénes	704d3d8421	multishard_combining_reader: avoid creating the shard reader twice The multishard reader creates its shard readers on demand when they are first used. However, at this time the reader might already be in the process of being created, initiated by a previous read-ahead. To avoid creating the shard reader twice, before creating the reader check whether there are any read-aheads in progress. If there are, the read-ahead already created (is creating or will create) the reader, so synchronise with it. Synchronisation happens via a promise: the read-ahead creates a promise which will be fulfilled when the reader is created. A concurrent create_reader() call will wait on this promise instead of attempting to create a new reader.	2018-04-30 14:34:43 +03:00
Botond Dénes	f9464cfcd7	multishard_combining_reader: read_ahead: don't assume reader is created Currently it is assumed that when read_ahead is called the reader is already created. Under most circumstances this will not be true. It was blind (bad) luck that we didn't hit this before (during testing).	2018-04-30 14:34:43 +03:00
Botond Dénes	d9fceb398a	multishard_combining_reader: move read-ahead related methods To the group of methods that do not assume the reader is already created. A patch will follow that will update read_ahead() to not assume that the reader is created.	2018-04-30 14:34:43 +03:00
Botond Dénes	5dcfaa68f6	multishard_combining_reader: avoid looking up the shard reader twice	2018-04-30 14:34:43 +03:00
Botond Dénes	79504a7d28	multishard_combining_reader: use optional for maybe created reader After a little "research" [1] it turns out my initial fears were completely groundless: std::optional::operator->() and std::optional::operator*() don't involve an unnecessary branch, and thus there is no need to hand-roll an optional with a separate bool. [1] http://en.cppreference.com/w/cpp/utility/optional/operator	2018-04-30 14:34:37 +03:00
Botond Dénes	07fb2e9c4d	make_foreign_reader: don't wrap local readers If the to-be-wrapped reader is local (lives on the same shard where make_foreign_reader() is called) there is no need to wrap it with foreign_reader. Just return it as is. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <886ed883b707f163603a40b56b8823f2bb6c47c6.1523873224.git.bdenes@scylladb.com>	2018-04-16 15:11:20 +03:00
Botond Dénes	3a6f397fd0	Add multishard_combined_reader Takes care of reading a range from all shards that own a subrange in the range. The read happens sequentially, reading from one shard at a time. Under the hood it uses combined_mutation_reader and foreign_reader, the former providing the merging logic and the latter taking care of transferring the output of the remote readers to the local shard. Readers are created on demand by a reader-selector implementation that creates readers for yet unvisited shards as the read progresses. The read starts with a concurrency of one, that is, the reader reads from a single shard at a time. The concurrency is exponentially increased (to a maximum of the number of shards) when a reader's buffer is empty after moving to the next shard. This condition is important as we only want to increase concurrency for sparse tables that have little data, where the reader has to move between shards often. When concurrency is > 1, the reader issues background read-aheads to the next shards so that by the time it needs to move to them they have the data ready. For dense tables (where we rarely cross shards) we rely on the foreign_reader to issue sufficient read-aheads on its own to avoid blocking.	2018-04-11 10:03:47 +03:00
Botond Dénes	2c0f8d0586	Add foreign_reader Local representative of a reader located on a remote shard. Manages its lifecycle and takes care of seamlessly transferring fragments produced by the remote reader. Fragments are copied between the shards in batches, a bufferful at a time. To maximize throughput, read-ahead is used. After each fill_buffer() or fast_forward_to() a read-ahead (a fill_buffer() on the remote reader) is issued. This read-ahead runs in the background and is brought back to the foreground on the next fill_buffer() or fast_forward_to() call.	2018-04-11 09:22:45 +03:00
Botond Dénes	f488ae3917	Add buffer_size() to flat_mutation_reader buffer_size() exposes the collective size of the external memory consumed by the mutation fragments in the flat reader's buffer. This provides a basis to build basic memory accounting on. Although this is not the entire memory consumption of any given reader, it is the most volatile component and usually by far the largest one too.	2018-03-13 10:34:34 +02:00
Botond Dénes	212b2dabc4	Resource-based cache eviction Readers serving user reads need to obtain a permit to start reading. There exists a restriction on how many active readers can be admitted based on their count and their memory consumption. Since the saved readers of cached queriers are technically active (they hold a permit) they can block new readers from obtaining a permit. New readers have a higher priority because a cached reader might be abandoned or, at best, used later, so in the face of memory pressure we evict cached readers to free up permits for new readers. Cached queriers are evicted in LRU order, as the oldest queriers are the most likely to be evicted based on their TTL anyway.	2018-03-13 10:34:34 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Botond Dénes	dfa04c3fea	Add reader_concurrency_semaphore This semaphore implements the new dual, count- and memory-based active reader limiting. As purely memory-based limiting proved to cause problems on big boxes, admitting a large number of readers (more than any disk could handle), the previous count-based limit is reintroduced in addition to the existing memory-based limit. When creating new readers, the count-based limit is checked first. If that clears, the memory limit is checked before admitting the reader. reader_concurrency_semaphore wraps the two semaphores that implement these limits and enforces the correct order of limit checking. This class also completely replaces the restricted_reader_config struct; it encapsulates all data and related functionality of the latter, making client code simpler.	2018-03-08 14:12:12 +02:00
Botond Dénes	d5bb8a47fc	mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh In preparation for reader_concurrency_semaphore being added to the file. The reader_resource_tracker is really only a helper class for reader_concurrency_semaphore, so the latter is better suited to provide the name of the file.	2018-03-08 10:29:16 +02:00
Botond Dénes	206e7d40d4	restricted_mutation_reader: switch to std::variant Tests: unit-tests(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <a8930b764171db131d9d8d5fe4035014ecb452f4.1519391304.git.bdenes@scylladb.com>	2018-02-25 14:35:57 +02:00
Piotr Jastrzebski	37285ad7fa	Delete unused make_reader_returning Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	864db78fcf	Delete unused make_reader_returning_many Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	ff4ffc1c64	Delete unused make_empty_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00


136 Commits
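
Commit 402902ac78 relies on a guarantee the C++ standard gives forward iterators since C++14: value-initialised iterators of the same type compare equal to each other, whereas a merely default-initialised iterator may hold an indeterminate value. The sketch below is illustrative only; `merger_state` and `is_unset` are hypothetical names, and `std::vector<int>::iterator` stands in for the merger's actual iterator type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the merger's iterator type.
using iterator_type = std::vector<int>::iterator;

// Hypothetical holder: the member is value-initialised with {}, so it is
// guaranteed to compare equal to any other value-initialised iterator.
struct merger_state {
    iterator_type it{};
};

// Well-defined only while s.it is itself value-initialised: comparing a
// value-initialised iterator with one that points into a real sequence is
// outside the standard's guarantee.
bool is_unset(const merger_state& s) {
    return s.it == iterator_type{};
}
```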
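
Commit ab3e639c3b replaces a counter of deferred `next_partition()` calls with a flag, because consecutive calls without an intervening `operator()()` or `fast_forward_to()` have the same effect as a single one. A minimal sketch of that idea, assuming a hypothetical `deferring_reader` whose `fill_buffer()` is the next point of contact with the remote reader (this is not the actual `foreign_reader` code):

```cpp
#include <cassert>

// Hypothetical reader that defers next_partition() until it next talks to
// the remote reader; names are illustrative only.
class deferring_reader {
    bool _pending_next_partition = false;  // a flag, not a counter
    int _remote_skips = 0;                 // skips actually forwarded to the "remote" side
public:
    void next_partition() {
        // Repeated calls collapse into one, so remembering "at least one" suffices.
        _pending_next_partition = true;
    }
    void fill_buffer() {
        if (_pending_next_partition) {
            ++_remote_skips;               // execute the deferred skip exactly once
            _pending_next_partition = false;
        }
        // ... read from the remote reader here ...
    }
    int remote_skips() const { return _remote_skips; }
};
```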
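
Commit c2343dc841 replaces per-fragment forwarding in `restricting_mutation_reader::fill_buffer()` with a single buffer swap. The sketch below contrasts the two approaches on a `std::deque` of strings standing in for the real fragment buffer; the types and function names are hypothetical, and falling back to per-fragment moves when the parent buffer is not empty is an assumption made here to preserve fragment order.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

using fragment = std::string;          // hypothetical stand-in for mutation_fragment
using buffer = std::deque<fragment>;   // hypothetical stand-in for the reader's buffer

// Per-fragment forwarding: one move plus one pop for every fragment.
void fill_by_moving(buffer& parent, buffer& child) {
    while (!child.empty()) {
        parent.push_back(std::move(child.front()));
        child.pop_front();
    }
}

// Buffer swap: a constant-time hand-over when the parent's buffer is empty.
void fill_by_swap(buffer& parent, buffer& child) {
    if (parent.empty()) {
        std::swap(parent, child);
    } else {
        fill_by_moving(parent, child);  // keep already-buffered fragments first
    }
}
```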
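
Commit a8e795a16e explains why tokens alone cannot guarantee forward progress: the boundary between [t1, t2] and (t2, t3] is the same token, t2, on both sides. The following is a simplified model, not `dht::ring_position_view`: a position is a token plus a hypothetical weight (-1 before, 0 at, +1 after the keys with that token), intervals are half-open over such positions, and `select_at()`/`walk()` are illustrative names. Because each returned `next_position` is strictly greater than the position it was selected with, the walk always terminates.

```cpp
#include <cassert>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical model: token plus weight (-1 before, 0 at, +1 after its keys).
struct ring_pos {
    long token;
    int weight;
    friend bool operator<(const ring_pos& a, const ring_pos& b) {
        return std::tie(a.token, a.weight) < std::tie(b.token, b.weight);
    }
};

// Half-open interval [start, end) over ring positions, owned by one sstable.
struct interval { ring_pos start, end; int sstable; };

struct selection { std::optional<int> sstable; ring_pos next_position; };

// Return the sstable covering pos and the position where the selection changes.
selection select_at(const std::vector<interval>& intervals, ring_pos pos) {
    for (const auto& i : intervals) {
        if (!(pos < i.start) && pos < i.end) {
            return {i.sstable, i.end};
        }
    }
    return {std::nullopt, pos};
}

// Walk all intervals from `from`, collecting the sstables in visit order.
std::vector<int> walk(const std::vector<interval>& intervals, ring_pos from) {
    std::vector<int> visited;
    for (auto pos = from;;) {
        auto s = select_at(intervals, pos);
        if (!s.sstable) {
            break;
        }
        visited.push_back(*s.sstable);
        pos = s.next_position;  // strictly greater than pos: guaranteed progress
    }
    return visited;
}
```

With tokens only, the end of the first interval could only be reported as `t2`, and selecting at `t2` would return that same interval again.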
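
Commit 7d29addb1f skips the combining reader when there is only one input. A sketch of that shortcut with a hypothetical minimal `reader` hierarchy (not Scylla's `flat_mutation_reader` or the real `make_combined_reader()` signature):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal reader interface.
struct reader {
    virtual ~reader() = default;
    virtual std::string name() const = 0;
};

struct leaf_reader : reader {
    std::string n;
    explicit leaf_reader(std::string s) : n(std::move(s)) {}
    std::string name() const override { return n; }
};

// Hypothetical merging reader; the merge logic itself is omitted.
struct combined_reader : reader {
    std::vector<std::unique_ptr<reader>> inputs;
    explicit combined_reader(std::vector<std::unique_ptr<reader>> in) : inputs(std::move(in)) {}
    std::string name() const override { return "combined(" + std::to_string(inputs.size()) + ")"; }
};

std::unique_ptr<reader> make_combined(std::vector<std::unique_ptr<reader>> readers) {
    if (readers.size() == 1) {
        // Nothing to merge: hand back the single input and avoid the merging overhead.
        return std::move(readers.front());
    }
    return std::make_unique<combined_reader>(std::move(readers));
}
```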
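
Commit 3a6f397fd0 describes the concurrency policy of the multishard reader: start at one shard and grow exponentially, capped at the number of shards, when a reader's buffer is empty after moving to the next shard. A sketch of just that policy, assuming doubling as the growth factor (the commit only says "exponentially") and using a hypothetical `concurrency_controller` class:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: concurrency starts at 1 and doubles, capped at the
// shard count, whenever a shard move finds the reader's buffer empty.
class concurrency_controller {
    unsigned _concurrency = 1;
    unsigned _shard_count;
public:
    explicit concurrency_controller(unsigned shard_count) : _shard_count(shard_count) {}

    void on_shard_move(bool buffer_empty_after_move) {
        // Only sparse data (empty buffer right after a move) justifies more concurrency.
        if (buffer_empty_after_move) {
            _concurrency = std::min(_concurrency * 2, _shard_count);
        }
    }

    unsigned concurrency() const { return _concurrency; }
};
```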
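
Commit dfa04c3fea orders the two admission checks: count first, then memory. The sketch below is a synchronous simplification under stated assumptions: the real `reader_concurrency_semaphore` is built on seastar semaphores and queues waiting readers instead of rejecting them, and `dual_limit_admission` and the per-reader `memory_estimate` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, synchronous model of dual count + memory admission.
class dual_limit_admission {
    int _count_available;
    std::ptrdiff_t _memory_available;
public:
    dual_limit_admission(int count, std::ptrdiff_t memory)
        : _count_available(count), _memory_available(memory) {}

    // Check the count limit first; only if it clears, check memory.
    bool try_admit(std::ptrdiff_t memory_estimate) {
        if (_count_available <= 0) {
            return false;  // too many active readers
        }
        if (_memory_available < memory_estimate) {
            return false;  // not enough memory budget left
        }
        --_count_available;
        _memory_available -= memory_estimate;
        return true;
    }

    // Return the resources held by an admitted reader.
    void release(std::ptrdiff_t memory_estimate) {
        ++_count_available;
        _memory_available += memory_estimate;
    }
};
```

Both checks run before any bookkeeping, so a rejected reader leaves both counters untouched.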