scylladb

Author	SHA1	Message	Date
Benny Halevy	4e3dcfd7d6	reader_concurrency_semaphore: use permit timeout for admission Now that the timeout is stored in the reader permit use it for admission rather than a timeout parameter. Note that evictable_reader::next_partition currently passes db::no_timeout to resume_or_create_reader, which propagated to maybe_wait_readmission, but it seems to be an oversight of the f_m_r api that doesn't pass a timeout to next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	eeab5f77d9	repair: row_level: read_mutation_fragment: set reader timeout The timeout needs to be propagated to the reader's permit. Reset it to db::no_timeout in repair_reader::pause(). Warn if set_timeout asks to change the timeout too far into the past (100ms). It is possible that it will be passed a past timeout from the rpc path, where the message timeout is applied (as duration) over the local lowres_clock time and parallel read_data messages that share the query may end up having close, but different timeout values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:40 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Benny Halevy	8674746fdd	flat_mutation_reader: detach_buffer: mark as noexcept Since detach_buffer is used before closing and destroying the reader, we want to mark it as noexcept to simplify the caller's error handling. Currently, although it does construct a new circular_buffer, none of the constructors used may throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>	2021-07-25 12:02:27 +03:00
Botond Dénes	8fc55fa5bf	reader_concurrency_semaphore: get rid of struct permit_list struct permit_list exists so the intrusive list declaration which needs the definition of reader_permit can be hidden in the .cc. But it turns out that if the hook type is fully spelled out, the intrusive list declaration doesn't need T to be defined. Exploit this to get rid of this extra indirection. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210720073121.63027-2-bdenes@scylladb.com>	2021-07-20 10:35:12 +03:00
Botond Dénes	11b39cbc23	reader_concurrency_semaphore: merge permit_stats into stats If there was any reason to have them separate when permit_stats was conceived, it is gone now, so merge the two. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210720073121.63027-1-bdenes@scylladb.com>	2021-07-20 10:35:12 +03:00
Botond Dénes	27fbca84f6	reader_concurrency_semaphore: remove prethrow_action The semaphore accepts a functor in its constructor which is run just before throwing on wait queue overload. This is used exclusively to bump a counter in the database::stats, which counts queue overloads. However, there is now an identical counter in reader_concurrency_semaphore::stats, so the database can just use that directly and we can retire the now unused prethrow action. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210716111105.237492-1-bdenes@scylladb.com>	2021-07-19 15:47:37 +03:00
Botond Dénes	b81f39cec9	reader_permit: add operator<< for reader_resources And use it in tests, it results in actually useful error messages.	2021-07-14 17:19:02 +03:00
Botond Dénes	1666ad078a	reader_concurrency_semaphore: add reads_{admitted,enqueued} stats Primarily for tests, but we could also export these, should we want to.	2021-07-14 17:19:02 +03:00
Botond Dénes	5b8d6f02eb	reader_permit: remove now unused wait_admission()	2021-07-14 17:19:02 +03:00
Botond Dénes	c86573813f	reader_concurrency_semaphore: remove now unused obtain_permit_nowait()	2021-07-14 17:19:02 +03:00
Botond Dénes	1b7eea0f52	reader_concurrency_semaphore: admission: flip the switch This patch flips two "switches": 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) by now all permits are obtained up-front, so this patch just yanks out the restricted reader from all reader stacks and simultaneously switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By doing this admission is now waited on when creating the permit. (2) we switch to an admission algorithm that adds a new aspect to the existing resource availability: the number of used/blocked reads. Namely it only admits new reads if in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words we only admit new reads if all currently admitted reads require something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). We flip these two switches at the same time because up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. Another change made to accommodate this switch is the replacement of the replica side read execution stages with the reader concurrency semaphore as an execution stage. This replacement is needed because with the introduction of up-front admission, reads are not independent of each other anymore. One read executed can influence whether later reads executed will be admitted or not, and execution stages require independent operations to work well. By moving the execution stage into the semaphore, we have an execution stage which is in control of both admission and running the operations in batches, avoiding the bad interaction between the two.	2021-07-14 17:19:02 +03:00
Botond Dénes	00511100a4	reader_concurrency_semaphore: remove now unused make_permit()	2021-07-14 17:19:02 +03:00
Botond Dénes	af8f39a775	reader_concurrency_semaphore: make it an execution stage The execution stage functionality is exposed via two new member functions, `with_permit()` and `with_ready_permit()`. Both accept a function to be run. The former obtains a permit then runs the passed in function through the execution stage. The latter allows an already obtained permit to be passed in.	2021-07-14 16:48:43 +03:00
Botond Dénes	5d3ddba2c7	reader_concurrency_semaphore: make_permit(): add up-front admission variants Three new methods are added for creating permits: 1) obtain_permit() 2) obtain_permit_nowait() 3) make_tracking_only_permit() (1) is meant to replace `make_permit()` + `wait_admission()`, by integrating the waiting for admission into the process of creating the permit. This is the method meant to be used to create permits from here on, ensuring that each read passes admission before even being started. (2) is a bridge between the old and new world. Up-front admission cannot coexist with the restricted reader in the same read, so those reads that have a restricted reader in their stack can use this method to create a non-admitted permit to be admitted by the restricted reader later. Once we have migrated all reads to (1) or (2), we can get rid of the restricted reader and just replace (2) with (1) in the codebase. (2) returns a future to make this a simple rename, the churn of dealing with a future<reader_permit> return type already having been dealt with by then. (3) is for reads that bypass admission, yet their resource usage does participate in the admission of other reads. This is the equivalent of reads that don't pass admission at all. The following patches will gradually transition the codebase away from the old permit API, and once the transition is complete, we can switch over to do the admission up-front at once.	2021-07-14 16:48:43 +03:00
Botond Dénes	844a99a91a	reader_concurrency_semaphore: prepare for up-front admission We want to make permits be admitted up-front, before even being created. As part of this change, we will get rid of the `wait_admission()` method on the permit, instead, the permit will be created as a result of waiting for admission (just like back some time ago). To allow evicted readers to wait for re-admission, a new method `maybe_wait_readmission()` is created, which waits for readmission if the permit is in evicted state. Also refactor the internals of the semaphore to support and favor up-front admission code. As up-front admission is the future we want the permit code to be organized in such a way that it is natural to use with it. This means that the "old-style" admission code might suffer but we tolerate this as it is on its way out. To this end the following changes were done: * Add a _base_resources field to reader_permit which tracks the base cost of said permit. This is passed in the constructor and is used in the first and subsequent admissions. * The base cost is now managed internally by the permit, instead of relying on an external `resource_units` instance, though the old way is still supported temporarily. * Change the admission pipeline to favor the new permit-internally managed base cost variant. * Compatibility with old-style admission: permits are created with 0 base resources, base resources are set with the compatibility method `set_base_resources()` right before admission, then externalized again after admission with `base_resource_as_resource_units()`. These methods will be gone when the old style admission is retired (together with `wait_admission()`).	2021-07-14 16:48:43 +03:00
Botond Dénes	05e6881c73	reader_permit: allow constructing reader_permit from impl& By enabling shared from this for impl and adding a reader permit constructor which takes a shared pointer to an impl. This allows impl members to invoke functions requiring a `reader_permit` instance as a parameter.	2021-07-14 16:48:43 +03:00
Botond Dénes	aa480fa3f9	reader_permit: allow marking blocked Distinguish between permits that are blocked and those that are not. Conceptually a blocked permit is one that needs to wait on either I/O or a remote shard to proceed. This information will be used by admission, which will only admit new reads when all currently used ones are blocked. More on that in the commit introducing this new admission type. This patch only adds the infrastructure, block sites are not marked yet.	2021-07-14 16:48:43 +03:00
Botond Dénes	a5dc48b4b1	reader_permit: allow marking it as used Distinguish between permits that are used and those that are not. These are two subtypes of the current 'active' state (and replace it). Conceptually a permit is used when any readers associated with it have a pending call to any of their async methods, i.e. the consumer is actively consuming from them. This information will be used for admission, together with a new blocked state introduced by a future patch. This patch only adds the infrastructure, use sites are not marked yet.	2021-07-14 16:48:43 +03:00
Botond Dénes	a251cc2368	reader_permit: introduce evicted state We want to introduce more fine-grained states for permits than what we have currently, splitting the current 'active' state into multiple sub-states. As a preparatory step, introduce an evicted state too, to keep track of permits that were evicted while being inactive. This will be important in determining what permits need to re-wait admission, once we keep permits across pages. Having an evicted state also aids validating internal state transitions.	2021-07-14 16:48:43 +03:00
Botond Dénes	5416fc6d1b	reader_concurrency_semaphore: add current_permits to permit_stats	2021-07-14 16:48:43 +03:00
Botond Dénes	c97fc16105	reader_concurrency_semaphore: extract waiter admission into separate function Because soon we will have more than one place to trigger waiter admission from.	2021-07-14 16:48:43 +03:00
Botond Dénes	f8004c652b	reader_concurrency_semaphore: relax _stopped check when destroying a used semaphore Further relax the conditions under which we abort on destroying an unstopped semaphore. We already allow destroying completely unused semaphores, this patch further relaxes this to allow destroying formerly used but presently not used semaphores without stopping. We still call `on_internal_error_noexcept()` even if destroying the semaphore is safe, because without calling `stop()`, destroying the semaphore depends on luck, which we shouldn't rely on.	2021-07-12 15:53:00 +03:00
Botond Dénes	750b20fd85	reader_concurrency_semaphore: allow destroying without stop() when not used yet To make it easier to construct objects with semaphore members. When the construction of such an object fails, it can now just destroy its semaphore member as usual, without having to employ tricks to make sure it is stopped beforehand.	2021-07-12 15:53:00 +03:00
Botond Dénes	03959a332b	reader_concurrency_semaphore: add permit-stats Which stores permit related stats. For now only total number of permits is maintained which is useful to determine whether the semaphore was used already or not.	2021-07-12 15:53:00 +03:00
Avi Kivity	9059514335	build, treewide: enable -Wpessimizing-move warning This warning prevents using std::move() where it can hurt - on an unnamed temporary or a named automatic variable being returned from a function. In both cases the value could be constructed directly in its final destination, but std::move() prevents it. Fix the handful of cases (all trivial), and enable the warning. Closes #8992	2021-07-08 17:52:34 +03:00
Botond Dénes	42bd5c980f	reader_concurrency_semaphore: assert(_stopped) in the destructor Now that there are no more global semaphores, which are impossible to stop properly, we can resolve the related FIXME and arm the assert in the semaphore destructor. We can also remove all the other cleanup code from the destructor as it is taken care of by stop(), which we now assert to have been run.	2021-07-08 16:53:38 +03:00
Botond Dénes	09309f5dbf	reader_concurrency_semaphore: on_permit_created(): remove noexcept The permit creation path enters the semaphore's permit gate in on_permit_created(). Entering this gate can throw so this method is not noexcept. Remove the noexcept specifier accordingly. Also enter the gate before adding the permit to the permit list, to save some work when this fails. Fixes: #8933 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210628074941.32878-1-bdenes@scylladb.com>	2021-06-28 11:04:38 +03:00
Botond Dénes	578a092e4a	reader_concurrency_semaphore: wait for all permits to be destroyed in stop() To prevent use-after-free resulting from any permit out-living the semaphore.	2021-06-16 11:29:36 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Botond Dénes	a6166671ef	reader_concurrency_semaphore: dump_reader_diagnostics(): print more information in the header Provide a quick summary in the first line of the printout, about the available/initial resources, number of queued reads and number of inactive reads.	2021-05-10 10:15:47 +03:00
Botond Dénes	0a908a47d6	reader_concurrency_semaphore: dump_reader_diagnostics(): cap number of printed lines This report is logged, so we don't want huge printouts, cap the table at 20 lines, and print only a summary for the rest. For manual dumps, allow the limit to be set to a custom value, including no limit at all.	2021-05-10 10:15:47 +03:00
Botond Dénes	f0fc3eaefc	reader_concurrency_semaphore: dump_reader_diagnostics(): sort lines in descending order So the largest memory consumers are at the top.	2021-05-10 10:15:47 +03:00
Botond Dénes	06e17c48e5	reader_concurrency_semaphore: dump_reader_diagnostics(): merge all states into a single table The goal of the printout is to allow finding the culprit for semaphore related problems and this usually involves finding the table/op/state eating the most memory. This is much easier when all the permit summaries are in a single table.	2021-05-10 10:15:47 +03:00
Botond Dénes	595a44bee2	reader_concurrency_semaphore: dump_reader_diagnostics(): separate number of permits and count resources Currently we have a single "count" column and it is not at all clear what it refers to: the number of permits or count resources used by them. Whichever it is, it only represents one of them, so in this commit we add a "permits" column, which in addition to clearing things up, supplies further information to the printout.	2021-05-10 10:15:47 +03:00
Botond Dénes	d246e2df0a	reader_concurrency_semaphore: add dump_diagnostics() Allow semaphore related tests to include a diagnostics printout in error messages to help determine why the test failed.	2021-04-26 15:56:56 +03:00
Botond Dénes	caaa8ef59a	reader_permit: always forward resources This commit conceptually reverts `4c8ab10`. Said commit was meant to prevent the scenario where memory-only permits -- those that don't pass admission but still consume memory -- completely prevent the admission of reads, possibly even causing a deadlock because a permit might even block its own admission. The protection introduced by said commit however proved to be very problematic. It made the status of resources on the permit very hard to reason about and created loopholes via which permits could accumulate without tracking or they could even leak resources. Instead of continuing to patch this broken system, this commit does away with this "protection" based on the observation that deadlocks are now prevented anyway by the admission criteria introduced by `0fe75571d9`, which admits a read anyway when all the initial count resources are available (meaning no admitted reader is alive), regardless of availability of memory. The benefit of this revert is that the semaphore now knows about all the resources and is able to do its job better as it is not "lied to" about resources by the permits. Furthermore the status of a permit's resources is much simpler to reason about, there are no more loopholes in unexpected state transitions to swallow/leak resources. To prove that this revert is indeed safe, in the next commit we add robust tests that stress test admission on a highly contested semaphore. This patch also does away with the registered/admitted differentiation of permits, as this doesn't make much sense anymore, instead these two are unified into a single "active" state. One can always tell whether a permit was admitted or not from whether it owns count resources anyway.	2021-04-26 15:56:56 +03:00
Botond Dénes	2b66f7222e	reader_concurrency_semaphore: inactive_read_handle: abandon(): close reader `fa43d7680` recently introduced mandatory closing of readers before they are destroyed. One reader destroy path that was left not closing the reader before destruction is `inactive_reader_handle::abandon()`. This path is executed when the handle is destroyed while still referring to a non-evicted inactive read. This patch fixes it up to close the reader and adds a small unit test which checks that this happens.	2021-04-26 15:56:54 +03:00
Benny Halevy	a144819683	reader_concurrency_semaphore: unregister_inactive_read: close reader also on internal error "forward" the unregister to the other semaphore in case on_internal_error throws rather than aborting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	c8e30db5db	reader_concurrency_semaphore: close evicted reader Close readers in the background: - evicted based on ttl, or - those that weren't admitted by register_inactive_read - those that are destroyed in clear_inactive_reads. Use a gate for waiting on these background closes in stop(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	be1cafc1a5	reader_concurrency_semaphore: do_wait_admission: close evicted readers enqueue_waiter before evicting readers and start a loop in the background to dequeue and close inactive_readers until either the _wait_list is empty or there are no more inactive_readers to evict. We admit the read synchronously only if the wait_list is empty and the semaphore has_available_units to satisfy admission. We need to enqueue the reader before starting to evict readers to make sure any evicted resources are assigned to the waiter at the head of the queue and not "stolen" in case we yield and some other caller grabs them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	43bf0f9356	reader_concurrency_semaphore: add stop method In addition to clear_inactive_reads, that's currently called when the database object is destroyed, introduce a stop() method that will: 1. wait on all background closes of inactive_reads. 2. close all present inactive_reads and waits on their close. 3. signal waiters on the wait_list via broken() with a proper exception indicating that the semaphore was closed. In addition, assert in the semaphore's destructor that it has no remaining inactive reads. Stop must be called from whoever owns the r_c_s. Mainly, from database::stop. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	2f4134e1cc	reader_concurrency_semaphore: broken: make broken_semaphore the default exception Rather than explicitly generating it by all callers and then not using the argument at all. Prepare for providing a different exception_ptr from a stop() path to be introduced in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	81391b845f	reader_permit: expose description method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Botond Dénes	4762b84b44	reader_concurrency_semaphore: remove now unused may_proceed()	2021-03-30 17:54:34 +03:00
Botond Dénes	94c7e619af	reader_concurrency_semaphore: restructure do_wait_admission() Currently the code is structured such that first the conditions required for admission are checked. The success paths have early returns and if all of them fail, we fall back to enqueueing the permit. This patch restructures the code such that the wait conditions are checked first, and if all of them fail, we fall back to admitting the permit. This structure allows for easier introduction of additional wait/admit conditions in the future.	2021-03-30 17:51:17 +03:00
Botond Dénes	d1dd55d98f	reader_concurrency_semaphore: extract enqueueing logic into enqueue_waiter() Besides making the code more readable, this also enables restructuring `do_wait_admission()`, without moving too much code around. As a bonus, queue length is now only checked when the permit actually has to be enqueued.	2021-03-30 17:49:30 +03:00
Botond Dénes	d90cd6402c	reader_concurrency_semaphore: make admission conditions consistent Currently there are two places where we check admission conditions: `do_wait_admission()` and `signal()`. Both use `has_available_units()` to check resource availability, but the former has some additional resource related conditions on top (in `may_proceed()`), which lead to the two paths working with slightly different conditions. To fix, push down all resource availability related checks to `has_available_units()` to ensure admission conditions are consistent across all paths.	2021-03-30 17:39:57 +03:00
Avi Kivity	a8463cfb37	Merge "reader_permit: signal leaked resources" from Botond " When a permit is destroyed we check if it still holds on to any resources in the destructor. Any resources the permit still holds on are leaked resources, as users should have released these. Currently we just invoke `on_internal_error_noexcept()` to handle this, which -- depending on the configuration -- will result in an error message or an assert. In the former case, the resources will be leaked for good. This mini-series fixes this, by signaling back these resources to the semaphore. This helps avoid an eventual complete dry-up of all semaphore resources and a subsequent complete shutdown of reads. Tests: unit(release, debug) " * 'reader-permit-signal-leaked-resources/v1' of https://github.com/denesb/scylla: reader_permit: signal leaked resources test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore	2021-03-29 17:57:31 +03:00
Botond Dénes	d64b1fdd6a	reader_permit: signal leaked resources When destroying a permit with leaked resources we call `on_internal_error_noexcept()` in the destructor. This method logs an error or asserts depending on the configuration. When not asserting, we need to return the leaked units to the semaphore, otherwise they will be leaked for good. We can do this because we know exactly how many resources the user of the permit leaked (never signalled).	2021-03-26 14:23:32 +02:00
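
The admission rule described in commit 1b7eea0f52 above (admit a new read only when enough resources are available and every currently used permit is blocked) can be illustrated with a minimal sketch. This is a simplified model and not ScyllaDB's implementation: the names `resources`, `semaphore_model`, `used_permits`, `blocked_permits` and `can_admit` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical, simplified model of the admission rule; none of these
// types or members are taken from the ScyllaDB source tree.
struct resources {
    int count;    // read slots
    long memory;  // bytes
};

struct semaphore_model {
    resources available;          // resources currently free
    std::size_t used_permits;     // permits whose readers have a pending async call
    std::size_t blocked_permits;  // used permits waiting on I/O or a remote shard

    // A new read is admitted only if its base cost fits into the available
    // resources AND no used permit is currently competing for CPU, i.e.
    // every used permit is blocked.
    bool can_admit(const resources& base_cost) const {
        const bool enough_resources = available.count >= base_cost.count
                                   && available.memory >= base_cost.memory;
        const bool all_used_are_blocked = used_permits == blocked_permits;
        return enough_resources && all_used_are_blocked;
    }
};
```

In this model a cache read that is running (used but not blocked) keeps new reads waiting, while a read that misses and blocks on disk I/O lets the next one in, which matches the stated goal of a concurrency of 1 for cache reads.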


109 Commits