scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Botond Dénes	c4faa05888	reader_concurrency_semaphore: s/description/operation/ in diagnostics dumps "description" is not what the respective column contains, so fix the header.	2023-06-07 14:21:48 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to the updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use the task queues list for IO class names and shares, but to detect whether it should, it checks if the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Avi Kivity	97694d26c4	Merge 'reader_permit: minor improvements to resource consume/release safety' from Botond Dénes This PR contains some small improvements to the safety of consuming/releasing resources to/from the semaphore: * reader_permit: make the low-level `consume()/signal()` API private, making the only user (an RAII class) friend. * reader_resources: split `reset()` into `noexcept` and potentially throwing variant. * reader_resources::reset_to(): try harder to avoid calling `consume()` (when the new resource amount is smaller than the previous one) Closes #13678 * github.com:scylladb/scylladb: reader_permit: resource_units::reset_to(): try harder to avoid calling consume() reader_permit: split resource_units::reset() reader_permit: make consume()/signal() API private	2023-05-14 14:14:23 +03:00
Botond Dénes	b790f14456	reader_concurrency_semaphore: execution_loop(): trigger admission check when _ready_list is empty The execution loop consumes permits from the _ready_list and executes them. The _ready_list usually contains a single permit. When the _ready_list is not empty, new permits are queued until it becomes empty. The execution loop relies on admission checks triggered by the read releasing resources, to bring in any queued read into the _ready_list, while it is executing the current read. But in some cases the current read might not free any resources and thus fail to trigger an admission check and the currently queued permits will sit in the queue until another source triggers an admission check. I don't yet know how this situation can occur, if at all, but it is reproducible with a simple unit test, so it is best to cover this corner-case in the off-chance it happens in the wild. Add an explicit admission check to the execution loop, after the _ready_list is exhausted, to make sure any waiters that can be admitted with an empty _ready_list are admitted immediately and execution continues. Fixes: #13540 Closes #13541	2023-05-08 17:11:41 +03:00
Botond Dénes	c1e8e86637	reader_concurrency_semaphore: reader_permit: clean-up after failed memory requests When requesting memory via `reader_permit::request_memory()`, the requested amount is added to `_requested_memory` member of the permit impl. This is because multiple concurrent requests may be blocked and waiting at the same time. When the requests are fulfilled, the entire amount is consumed and individual requests track their requested amount with `resource_units` to release later. There is a corner-case related to this: if a reader permit is registered as inactive while it is waiting for memory, its active requests are killed with `std::bad_alloc`, but the `_requested_memory` field is not cleared. If the read survives because the killed requests were part of a non-vital background read-ahead, a later memory request will also include the amount from the failed requests. This extra amount will not be released and hence will cause a resource leak when the permit is destroyed. Fix by detecting this corner case and clearing the `_requested_memory` field. Modify the existing unit test for the scenario of a permit waiting on memory being registered as inactive, to also cover this corner case, reproducing the bug. Fixes: #13539 Closes #13679	2023-05-07 14:06:51 +03:00
Avi Kivity	f125a3e315	Merge 'tree: finish the reader_permit state renames' from Botond Dénes In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR, however, covered only the states themselves and their usages, as well as the documentation in `docs/dev`. This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date. Closes #13573 * github.com:scylladb/scylladb: reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes reader_concurrency_semaphore: update API w.r.t. recent permit state name changes reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes	2023-05-04 18:29:04 +03:00
Kefu Chai	48387a5a9a	reader_concurrency_semaphore: fix signed/unsigned comparison a signed/unsigned comparison can overflow. and GCC-13 rightly points this out. so let's use `std::cmp_greater_equal()` when comparing unsigned and signed for greater-or-equal. ``` /home/kefu/dev/scylladb/reader_concurrency_semaphore.cc:931:76: error: comparison of integer expressions of different signedness: ‘long int’ and ‘uint64_t’ {aka ‘long unsigned int’} [-Werror=sign-compare] 931 \| if (_resources.memory <= 0 && (consumed_resources().memory + r.memory) >= get_kill_limit()) [[unlikely]] { \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-29 17:02:25 +08:00
Botond Dénes	88c19b23dc	reader_permit: resource_units::reset_to(): try harder to avoid calling consume() Currently, the `reset_to()` implementation calls `consume(new_amount)` (if not zero), then calls `signal(old_amount)`. This means that even if `reset_to()` is a net reduction in the amount of resources, there is a call to `consume()` which can now potentially throw. Add a special case for when the new amount of resources is strictly smaller than the old amount. In this case, just call `signal()` with the difference. This not only avoids a potential `std::bad_alloc`, but also helps relieve memory pressure when this is most needed, by not failing calls to release memory.	2023-04-26 07:41:57 -04:00
Botond Dénes	2449b714df	reader_permit: split resource_units::reset() Into reset_to() and reset_to_zero(). The latter replaces `reset()` with the default 0 resources argument, which was often called from noexcept contexts. Splitting it out from `reset()` allows for a specialized implementation that is guaranteed to be `noexcept` indeed and thus peace of mind.	2023-04-26 07:41:57 -04:00
Botond Dénes	ecbb118d32	reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes Update comments, test names, etc. that are still using the old terminology for permit state names, bring them up to date with the recent state name changes.	2023-04-19 05:31:27 -04:00
Botond Dénes	e71d6566ab	reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes They are still using the old terminology for permit state names, bring them up to date with the recent state name changes.	2023-04-19 05:20:44 -04:00
Botond Dénes	804403f618	reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes They are still using the old terminology for permit state names, bring them up to date with the recent state name changes.	2023-04-19 05:20:42 -04:00
Botond Dénes	89328ce447	reader_concurrency_semaphore: update API w.r.t. recent permit state name changes It is still using the old terminology for permit state names, bring it up to date with the recent state name changes.	2023-04-19 05:18:13 -04:00
Botond Dénes	3919effe2d	reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes It is still using the old terminology for permit state names, bring it up to date with the recent state name changes.	2023-04-19 05:17:34 -04:00
Botond Dénes	943ae7fc69	reader_permit: give better names to active* states The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflect their true meaning and give less room for misinterpretation. The changes are: * active/unused -> active * active/used -> active/need_cpu * active/blocked -> active/await Hopefully the new names do a better job at conveying what these states really mean: * active - a regular admitted permit, which is active (as opposed to an inactive permit). * active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state. * active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability).	2023-04-14 08:40:46 -04:00
Botond Dénes	bd57471e54	reader_concurrency_semaphore: don't evict inactive readers needlessly Inactive readers should only be evicted to free up resources for waiting readers. Evicting them when waiters are not admitted for any other reason than resources is wasteful and leads to extra load later on when these evicted readers have to be recreated and requeued. This patch changes the logic on both the registering path and the admission path to not evict inactive readers unless there are readers actually waiting on resources. A unit-test is also added, reproducing the overly-aggressive eviction and checking that it doesn't happen anymore. Fixes: #11803 Closes #13286	2023-04-13 15:20:18 +03:00
Botond Dénes	d5488dba69	reader_permit: set_trace_state(): emit trace message linking to previous page This method is called on the start of each page, updating the trace state stored on the permit to that of the current page. When doing so, emit a trace message, containing the session id of the previous page, so the per-page sessions can be stitched together later. Note that this message is only emitted if the cached read survived between the pages. Example: Tracing session: dcfc1570-ca3c-11ed-88d0-24443f03a8bb activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-24 08:10:27.271000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-24 08:10:27.271864 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-24 08:10:27.271958 \| 127.0.0.1 \| 94 \| 127.0.0.1 Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-24 08:10:27.271995 \| 127.0.0.1 \| 132 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-24 08:10:27.271998 \| 127.0.0.1 \| 135 \| 127.0.0.1 Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] \| 2023-03-24 08:10:27.272003 \| 127.0.0.1 \| 140 \| 127.0.0.1 [reader concurrency semaphore] admitted immediately [shard 0] \| 2023-03-24 08:10:27.272006 \| 127.0.0.1 \| 143 \| 127.0.0.1 [reader concurrency semaphore] executing read [shard 0] \| 2023-03-24 08:10:27.272014 \| 127.0.0.1 \| 150 \| 127.0.0.1 Querying cache for range {{3274692326281147944, pk{00026b73}}} and slice {(-inf, +inf)} [shard 0] \| 2023-03-24 08:10:27.272022 \| 127.0.0.1 \| 159 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 3 clustering row(s) (3 live, 0 dead) and 0 range 
tombstone(s) [shard 0] \| 2023-03-24 08:10:27.272076 \| 127.0.0.1 \| 212 \| 127.0.0.1 Caching querier with key ab928e0d-b815-46b7-9a02-1fa2d9549477 [shard 0] \| 2023-03-24 08:10:27.272084 \| 127.0.0.1 \| 221 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-24 08:10:27.272087 \| 127.0.0.1 \| 224 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-24 08:10:27.272106 \| 127.0.0.1 \| 242 \| 127.0.0.1 Request complete \| 2023-03-24 08:10:27.271259 \| 127.0.0.1 \| 259 \| 127.0.0.1 Tracing session: dd3092f0-ca3c-11ed-88d0-24443f03a8bb activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-24 08:10:27.615000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-24 08:10:27.615223 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-24 08:10:27.615310 \| 127.0.0.1 \| 87 \| 127.0.0.1 Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-24 08:10:27.615346 \| 127.0.0.1 \| 124 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-24 08:10:27.615349 \| 127.0.0.1 \| 126 \| 127.0.0.1 Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] \| 2023-03-24 08:10:27.615352 \| 127.0.0.1 \| 130 \| 127.0.0.1 Found cached querier for key ab928e0d-b815-46b7-9a02-1fa2d9549477 and range(s) {{{3274692326281147944, pk{00026b73}}}} [shard 0] \| 2023-03-24 08:10:27.615358 \| 127.0.0.1 \| 135 \| 127.0.0.1 Reusing querier [shard 0] \| 2023-03-24 08:10:27.615362 \| 127.0.0.1 \| 139 \| 127.0.0.1 Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] \| 2023-03-24 08:10:27.615364 \| 127.0.0.1 \| 141 \| 127.0.0.1 [reader concurrency semaphore] 
executing read [shard 0] \| 2023-03-24 08:10:27.615371 \| 127.0.0.1 \| 148 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2023-03-24 08:10:27.615385 \| 127.0.0.1 \| 163 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-24 08:10:27.615583 \| 127.0.0.1 \| 360 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-24 08:10:27.615730 \| 127.0.0.1 \| 507 \| 127.0.0.1 Request complete \| 2023-03-24 08:10:27.615518 \| 127.0.0.1 \| 518 \| 127.0.0.1 See the message: Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] \| 2023-03-24 08:10:27.615364 \| 127.0.0.1 \| 141 \| 127.0.0.1 This is a follow-up to #13255 Refs: #12781 Closes #13318	2023-03-26 18:41:21 +03:00
Botond Dénes	ff87f95a26	reader_concurrency_semaphore: add trace points for important events Notably, for admission, execution and eviction. Registering/unregistering the permit as inactive is not traced, as this happens on every buffer-fill for range scans. Semaphore trace messages have a "[reader_concurrency_semaphore]" prefix to allow them to be clearly associated with the semaphore.	2023-03-22 04:58:18 -04:00
Botond Dénes	1f51f752cc	reader_permit: refresh trace_state on new pages To make sure all tracing done on a certain page will make its way into the appropriate trace session. This is a continuation of the previous patch (which added trace pointer to the permit).	2023-03-22 04:58:10 -04:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Botond Dénes	d6583cad0a	reader_concurrency_semaphore: do_dump_reader_permit_diagnostics(): print the stats Print the semaphore stats below the permit listing and remove the currently redundant "Total: " line. Some of the stats printed here are already exported as metrics, but instead of trying to cherry-pick and risk some metrics falling through the cracks, just print everything, there aren't that many anyway.	2023-03-17 03:15:41 -04:00
Botond Dénes	7b701ac52e	reader_concurrency_semaphore: add stats to record reason for queueing permits When diagnosing problems, knowing why permits were queued is very valuable. Record the reason in new stats, one for each reason a permit can be queued.	2023-03-17 03:15:41 -04:00
Botond Dénes	bb00405818	reader_concurrency_semaphore: can_admit_read(): also return reason for rejection So the caller can bump the appropriate counters or log the reason why the request cannot be admitted.	2023-03-17 03:15:40 -04:00
Botond Dénes	3f0b3489a2	reader_concurrency_semaphore: handle reader blocked on memory becoming inactive Kill said read's memory requests with std::bad_alloc and dequeue it from the memory wait list, then evict it on the spot. Now that `_inactive_reads` just stores permits, we can do this easily.	2023-03-13 08:07:53 -04:00
Botond Dénes	d1bc5f9293	reader_permit: evict inactive read on timeout If the read is inactive when the timeout clock fires, evict it. Now that `_inactive_reads` just stores permits, we can do this easily.	2023-03-13 08:07:53 -04:00
Botond Dénes	6181c08191	reader_concurrency_semaphore: move inactive_read to .cc It is not used in the header anymore and moving it to the .cc allows us to remove the dependency on flat_mutation_reader_v2.hh.	2023-03-13 08:07:53 -04:00
Botond Dénes	e56ec9373d	reader_concurrency_semaphore: store permits in _inactive_reads Add a member of type `inactive_read` to reader permit, and store permit instances in `_inactive_reads`. This list is now just another intrusive list the permit can be linked into, depending on its state. Inactive read handles now just store a reader permit pointer.	2023-03-13 08:07:53 -04:00
Botond Dénes	d11f9efbfe	reader_concurrency_semaphore: inactive_read: de-inline more methods They will soon need to access reader_permit::impl internals, only available in the .cc file.	2023-03-13 08:07:53 -04:00
Botond Dénes	8e296e8e05	reader_concurrency_semaphore: make _ready_list intrusive Following the same scheme we used to make the wait lists intrusive. Permits are added to the intrusive ready list while waiting to be executed and moved back to the _permit_list when de-queued from this list. We now use a condition variable for signaling when there are permits ready to be executed.	2023-03-13 08:07:53 -04:00
Botond Dénes	11dde4b80b	reader_permit: add wait_for_execution state Used while the permit is in the _ready_list, waiting for the execution loop to pick it up. This just acknowledges the existence of this wait-state. This state will now show up in permit diagnostics printouts and we can now determine whether a permit is waiting for execution, without checking which queue it is in.	2023-03-09 07:11:51 -05:00
Botond Dénes	6229f8b1a6	reader_concurrency_semaphore: make wait lists intrusive Instead of using expiring_fifo to store queued permits, use the same intrusive list mechanism we use to keep track of all permits. Permits are now moved between the _permit_list and the wait queues, depending on which state they are in. This means _permit_list is now not the definitive list containing all permits, instead it is the list containing all permits that are not in a more specialized queue at the moment. Code wishing to iterate over all permits should now use foreach_permit(). For outside code, this was already the only way and internal users are already patched. Making the wait lists intrusive allows us to dequeue a permit from any position, with nothing but a permit reference at hand. It also means the wait queues don't have any additional memory requirements, other than the memory for the permit itself. Timeout while being queued is now handled by the permit's on_timeout() callback.	2023-03-09 07:11:49 -05:00
Botond Dénes	9ea9a48dbc	reader_concurrency_semaphore: move most wait_queue methods out-of-line They will soon depend on the definition of the reader_permit::impl, which is only available in the .cc file.	2023-03-09 06:53:11 -05:00
Botond Dénes	1d27dd8f0e	reader_concurrency_semaphore: store permits directly in queues Instead of the `entry` wrapper. In _wait_list and _ready_list, that is. Data stored in the `entry` wrapper is moved to a new `reader_permit::auxiliary_data` type. This makes the reader permit self-sufficient. This in turn prepares the ground for the ability to de-queue a permit from any queue, with nothing but a permit reference at hand: no need to have back pointer to wrappers and/or iterators.	2023-03-09 06:53:11 -05:00
Botond Dénes	bcfb8715f9	reader_permit: introduce (private) operator * and -> Currently the reader_permit has some private methods that only the semaphore's internals call. But this method of communication is not consistent: at other times the semaphore accesses the permit impl directly, calling methods on it. This commit introduces operator * and -> for reader_permit. With this, the semaphore internals always call the reader_permit::impl methods directly, either via a direct reference, or via the above operators. This makes the permit interface a little narrower and reduces boilerplate code.	2023-03-09 06:53:11 -05:00
Botond Dénes	74a5981dbe	reader_concurrency_semaphore: add waiters counter Use it to keep track of all permits that are currently waiting on something: admission, memory or execution. Currently we keep track of size, by adding up the result of size() of the various queues. In future patches we are going to change the queues such that they will not have constant time size anymore, move to an explicit counter in preparation for that. Another change this commit makes is to also include ready list entries in this counter. Permits in the ready list are also waiters, they wait to be executed. Soon we will have a separate wait state for this too.	2023-03-09 06:53:11 -05:00
Botond Dénes	2694aa1078	reader_permit: use check_abort() for timeout Instead of having callers use get_timeout(), then compare it against the current time, set up a timeout timer in the permit, which assigns a new `_ex` member (a `std::exception_ptr`) to the appropriate exception type when it fires. Callers can now just poll check_abort() which will throw when `_ex` is not null. This is more natural and allows for more general reasons for aborting reads in the future. This prepares the ground for timeouts being managed inside the permit, instead of by the semaphore. Including timing out while in a wait queue.	2023-03-09 06:53:09 -05:00
Botond Dénes	23f4e250c2	reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param This param is from a time when _permit_list was not accessible from the outside, so it was passed along the semaphore instance to avoid making the diagnostics methods friends. To allow the semaphore freedom in how permits are stored, the diagnostics code is instead made to use foreach_permit(), instead of accessing the underlying list directly. As the diagnostics code wants reader_permit::impl& directly, a new variant of foreach_permit() passing impl references is introduced.	2023-03-09 05:19:59 -05:00
Botond Dénes	59dc15682b	reader_concurrency_semaphore: make foreach_permit() const It already is conceptually, as it passes const references to the permits it iterates over. The only reason it wasn't const before is a technical issue which is solved here with a const_cast.	2023-03-09 05:19:59 -05:00
Botond Dénes	c86136c853	reader_permit: add get_schema() and get_op_name() accessors	2023-03-09 05:19:59 -05:00
Botond Dénes	9dd2cd07ef	reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept It is in fact noexcept, and is expected to be, so document this.	2023-03-09 05:19:59 -05:00
Botond Dénes	2f4a793457	reader_concurrency_semaphore: clear_inactive_reads(): defer evicting to evict() Instead of open-coding the same, in an incomplete way. clear_inactive_reads() does incomplete eviction in several ways: * it doesn't decrement _stats.inactive_reads * it doesn't set the permit to evicted state * it doesn't cancel the ttl timer (if any) * it doesn't call the eviction notifier on the permit (if there is one) The list goes on. We already have an evict() method that does all this correctly, use that instead of the current badly open-coded alternative. This patch also enhances the existing test for clear_inactive_reads() and adds a new one specifically for `stop()` being called while having inactive reads. Fixes: #13048 Closes #13049	2023-03-07 08:45:04 +03:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Botond Dénes	34cdcaffae	reader_concurrency_semaphore: un-bless permits when they become inactive When the memory consumption of the semaphore reaches the configured serialize threshold, all permits but the blessed one are blocked from consuming any more memory. This ensures that past this limit, only one permit at a time can consume memory. Such a blessed permit can be registered inactive. Before this patch, it would still retain its blessed status when doing so. This could result in this permit being re-queued for admission if it was evicted in the meanwhile, potentially resulting in a complete deadlock of the semaphore: * admission queue permits cannot be admitted because there is no memory * admitted permits are all queued on memory, as none of them are blessed This patch strips the blessed status from the permit when it is registered as inactive. It also adds a unit test to verify this happens. Fixes: #12603 Closes #12694	2023-02-01 21:02:17 +02:00
Botond Dénes	7f8469db27	reader_concurrency_semaphore: add foreach_permit() Allows iterating over all permits.	2023-01-17 05:27:04 -05:00
Botond Dénes	edb32cb171	reader_concurrency_semaphore: add OOM killer When the collective memory consumption of all readers goes above $kill_limit_multiplier * $memory_limit, consume() will throw std::bad_alloc(), instantly unwinding the read that is unlucky enough to have requested the last bytes of memory. This should help in situations where there are some problematic partitions, either because of large cells or because they are scattered in too many sstables. Currently nothing prevents such reads from bringing down the entire node via OOM.	2023-01-17 05:27:04 -05:00
Botond Dénes	8f9e8aafdf	reader_concurrency_semaphore: move consume() out-of-line It's about to get a little bit more complex.	2023-01-17 05:27:04 -05:00
Botond Dénes	e4ef28284b	reader_permit: consume(): make it exception-safe reader_concurrency_semaphore::consume() will soon throw.	2023-01-17 05:27:04 -05:00
Botond Dénes	029269af42	reader_permit: resource_units::reset(): only call consume() if needed reset() is called from the destructor, with null resources. Calling consume() can be avoided in this case and in fact it is required as consume() is soon going to throw in some cases.	2023-01-17 05:27:04 -05:00
Botond Dénes	dd9a0a16e6	reader_concurrency_semaphore: tracked_file_impl: use request_memory() Use the recently added `request_memory()` to acquire the memory units for the I/O. This allows blocking all but one reader when memory consumption grows too high.	2023-01-17 05:27:04 -05:00
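
Several of the commits above (edb32cb171, 48387a5a9a, 88c19b23dc, 2449b714df) describe how memory is consumed and released against a kill limit. The block below is a minimal, self-contained sketch of that behaviour as the commit messages describe it, not the ScyllaDB code. `toy_semaphore`, `toy_units` and all member names are hypothetical. The kill-limit test follows the condition quoted in the compiler error in 48387a5a9a, but the signed/unsigned comparison is written out by hand instead of using `std::cmp_greater_equal()`, so the sketch does not require C++20.

```cpp
// Illustrative sketch only; this is NOT the ScyllaDB implementation.
// toy_semaphore, toy_units and every member name here are hypothetical.
#include <cassert>   // only needed by code that exercises the sketch
#include <cstdint>
#include <new>

class toy_semaphore {
    int64_t _memory_limit;      // configured memory limit
    int64_t _available_memory;  // signed: goes negative when readers over-consume
    uint64_t _kill_limit;       // kill_limit_multiplier * memory_limit
public:
    toy_semaphore(int64_t memory_limit, uint64_t kill_limit_multiplier)
        : _memory_limit(memory_limit)
        , _available_memory(memory_limit)
        , _kill_limit(kill_limit_multiplier * static_cast<uint64_t>(memory_limit)) {}

    int64_t consumed() const noexcept { return _memory_limit - _available_memory; }
    int64_t available() const noexcept { return _available_memory; }

    // Throws std::bad_alloc when the semaphore is already out of memory and
    // this request would bring total consumption to or past the kill limit.
    void consume(int64_t amount) {
        const int64_t total = consumed() + amount;
        // Hand-written signed/unsigned-safe ">=" (stand-in for std::cmp_greater_equal).
        const bool at_kill_limit = total >= 0 && static_cast<uint64_t>(total) >= _kill_limit;
        if (_available_memory <= 0 && at_kill_limit) {
            throw std::bad_alloc();
        }
        _available_memory -= amount;
    }

    // Releasing memory never throws.
    void signal(int64_t amount) noexcept { _available_memory += amount; }
};

// RAII holder for consumed memory, loosely modelled on the resource_units
// behaviour described in the commit messages above.
class toy_units {
    toy_semaphore& _sem;
    int64_t _amount = 0;
public:
    toy_units(toy_semaphore& sem, int64_t amount) : _sem(sem) {
        _sem.consume(amount);  // if this throws, nothing is held
        _amount = amount;
    }
    toy_units(const toy_units&) = delete;
    toy_units& operator=(const toy_units&) = delete;
    ~toy_units() { reset_to_zero(); }

    int64_t amount() const noexcept { return _amount; }

    void reset_to(int64_t new_amount) {
        if (new_amount < _amount) {
            // Net reduction: only release the difference, consume() is never called.
            _sem.signal(_amount - new_amount);
        } else {
            if (new_amount != 0) {
                _sem.consume(new_amount);  // may throw; _amount is unchanged if it does
            }
            _sem.signal(_amount);
        }
        _amount = new_amount;
    }

    // Safe to call from destructors and other noexcept contexts.
    void reset_to_zero() noexcept {
        _sem.signal(_amount);
        _amount = 0;
    }
};
```

With a limit of 100 and a multiplier of 2, a holder of 150 units leaves the semaphore at -50 available; a further `consume(50)` would reach the kill limit of 200 and throws, while `reset_to(40)` on the holder only calls `signal(110)` and cannot fail.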


184 Commits