scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 02:20:37 +00:00

Author	SHA1	Message	Date
Benny Halevy	6e92f07630	reader_concurrency_semaphore: register_inactive_read: make noexcept Catch error to allocate an inactive_read and just log them. Return an empty inactive_read_handle in this case, as if the inactive reader was evicted due to lack of resources. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 22:31:01 +02:00
Benny Halevy	46c2229b78	reader_concurrency_semaphore: separate set_notify_handler from register_inactive_reader Register the inactive reader first with no evict_notify_handler and ttl. Those can be set later, only if registration succeeded. Otherwise, as in the querier example, there is no need to place the querier in the index and erase it on eviction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 22:31:01 +02:00
Benny Halevy	d752ea7e91	reader_concurrency_semaphore: inactive_read: make ttl_timer non-optional By default it will be unarmed and with no callback so there's no need to wrap it in a std::optional. This saves an allocation and another potential error case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 22:31:01 +02:00
Benny Halevy	a12c9638b6	reader_concurrency_semaphore: inactive_read: use intrusive list To simplify insertion and eviction into the inactive_reads container, use an intrusive list that requires a single allocation for the inactive_read object itself. This allows passing a reference to the inactive_read to evict it. Note that the reader will be unlinked automatically from the inactive_readers list if the inactive_read_handle is destroyed. This is okay since there is no need to track the inactive_read if the caller loses the i_r_h (e.g. if an error is thrown). It is also safe to evict the inactive_reader while the i_r_h is alive. In this case the i_r will be unlinked after the flat_mutation_reader it holds is moved out of it. bi::auto_unlink will detect that it's already unlinked when destroyed and do nothing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 22:31:01 +02:00
Benny Halevy	f751e42bf9	reader_concurrency_semaphore: do_wait_admission: use try_evict_one_inactive_read Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:52:16 +02:00
Benny Halevy	81cd3d0c51	reader_concurrency_semaphore: try_evict_one_inactive_read: pass evict_reason So try_evict_one_inactive_read could be used also in do_wait_admission in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Benny Halevy	e072199b8d	reader_concurrency_semaphore: unregister_inactive_read: calling on wrong semaphore is an internal error Calling unregister_inactive_read on the wrong semaphore is a blatant bug so better call on_internal_error so it'd be easier to catch and fix. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Benny Halevy	9c9b4c85ae	reader_concurrency_semaphore: unregister_inactive_read: do nothing if disengaged There is no need to look up the inactive_read if the i_r_h is disengaged; it should not be registered, so just return quickly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Benny Halevy	4e8f29ef14	reader_concurrency_semaphore: inactive_read: keep a flat_mutation_reader There's no need to hold a unique_ptr<flat_mutation_reader> as flat_mutation_reader itself holds a unique_ptr<flat_mutation_reader::impl> and functions as a unique ptr via flat_mutation_reader_opt. With that, unregister_inactive_read was modified to return a flat_mutation_reader_opt rather than a std::unique_ptr<flat_mutation_reader>, keeping exactly the same semantics. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Avi Kivity	913d970c64	Merge "Unify inactive readers" from Botond " Currently inactive readers are stored in two different places: * reader concurrency semaphore * querier cache With the latter registering its inactive readers with the former. This is an unnecessarily complex (and possibly surprising) setup that we want to move away from. This series solves this by moving the responsibility of storing inactive reads solely to the reader concurrency semaphore, including all supported eviction policies. The querier cache is now only responsible for indexing queriers and maintaining relevant stats. This makes the ownership of the inactive readers much more clear, hopefully making Benny's work on introducing close() and abort() a little bit easier. Tests: unit(release, debug:v1) " * 'unify-inactive-readers/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: store inactive readers directly querier_cache: store readers in the reader concurrency semaphore directly querier_cache: retire memory based cache eviction querier_cache: delegate expiry to the reader_concurrency_semaphore reader_concurrency_semaphore: introduce ttl for inactive reads querier_cache: use new eviction notify mechanism to maintain stats reader_concurrency_semaphore: add eviction notification facility reader_concurrency_semaphore: extract evict code into method evict()	2021-02-03 10:59:04 +02:00
Botond Dénes	318b0ef259	reader_concurrency_semaphore: rate-limit diagnostics messages And since now there is no danger of them filling the logs, the log-level is promoted to info, so users can see the diagnostics messages by default. The rate-limit chosen is 1/30s. Refs: #7398 Tests: manual Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201117091253.238739-1-bdenes@scylladb.com>	2020-11-17 11:57:51 +02:00
Piotr Sarna	3ce7848bdf	reader_concurrency_semaphore: add metrics for shed reads When the admission queue capacity reaches its limits, excessive reads are shed in order to avoid overload. Each such operation now bumps the metrics, which can help the user judge if a replica is overloaded.	2020-11-11 19:01:38 +01:00
Avi Kivity	cfada6e04d	reader_concurrency_semaphore: adjust permit_summary construction for clang Clang does not implement P0960R3, parenthesized initialization of aggregates, so we have to use brace initialization in permit_summary. As the parenthesized constructor call is done by emplace_back(), we have to do the braced call ourselves.	2020-10-19 14:57:51 +03:00
Botond Dénes	18454e4a80	reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow The reader concurrency semaphore timing out or its queue being overflown are fairly common events both in production and in testing. At the same time it is a hard to diagnose problem that often has a benign cause (especially during testing), but it is equally possible that it points to something serious. So when this error starts to appear in logs, usually we want to investigate and the investigation is lengthy... either involves looking at metrics or coredumps or both. This patch intends to jumpstart this process by dumping a diagnostics on semaphore timeout or queue overflow. The diagnostics is printed to the log with debug level to avoid excessive spamming. It contains a histogram of all the permits associated with the problematic semaphore organized by table, operation and state. Example: DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics: Permits with state admitted, sorted by memory memory count name 3499M 27 ks.test:data-query 3499M 27 total Permits with state waiting, sorted by count count memory name 1 0B ks.test:drain 7650 0B ks.test:data-query 7651 0B total Permits with state registered, sorted by count count memory name 0 0B total Total: permits: 7678, memory: 3499M This allows determining several things at glance: * What are the tables involved * What are the operations involved * Where is the memory This can speed up a follow-up investigation greatly, or it can even be enough on its own to determine that the issue is benign.	2020-10-13 12:32:14 +03:00
Botond Dénes	27bbf5566d	reader_concurrency_semaphore: link permits into an intrusive list	2020-10-13 12:32:14 +03:00
Botond Dénes	fdb93ae0fd	reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line Soon we will want to add more logic to this now simple handler, move it out-of-line in preparation.	2020-10-13 12:32:14 +03:00
Botond Dénes	85bfd28f4e	reader_concurrency_semaphore: move constructors out-of-line Soon, the semaphore will have a field that will not have a publicly available definition. Move the constructor out-of-line in preparation.	2020-10-13 12:32:13 +03:00
Botond Dénes	70fa543c31	reader_concurrency_semaphore: add state to permits Instead of a simple boolean, designating whether the permit was already admitted or not, add a proper state field with a value for all the different states the permit can be in. Currently there are three such states: * registered - the permit was created and started accounting resource consumption. * waiting - the permit was queued to wait for admission. * admitted - the permit was successfully admitted. The state will be used for debugging purposes, both during coredump debugging as well as for dumping diagnostics data about permits.	2020-10-13 12:32:13 +03:00
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is that of the table the read is executed against, and the operation name is some name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kinds of reads, but different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Botond Dénes	73a6b97c75	reader_permit: add consumed_resources() accessor That allows querying the amount of resources accounted through this permit, and by extension by this logical read.	2020-10-06 08:18:42 +03:00
Botond Dénes	4c8ab10563	reader_permit: only forward resource consumption to semaphore after admission In the next patches we plan to start tracking the memory consumption of the actual allocations made by the circular_buffer<mutation_fragment>, as well as the memory consumed by the mutation fragments. This means that readers will start consuming memory off the permit right after being constructed. Ironically this can prevent the reader from being admitted, due to its own pre-admission memory consumption. To prevent this, hold off on forwarding the memory consumption to the semaphore until the permit is actually admitted.	2020-09-28 08:46:22 +03:00
Botond Dénes	e1eee0dc34	reader_permit: track resource consumed through permit Track all resources consumed through the permit inside the permit. This allows querying how much memory each read is consuming (as there should be one read per permit). Although this might be interesting, especially when debugging OOM cores, the real reason we are doing this is to be able to forward resource consumption to the semaphore only post-admission. More on this in the patch introducing this. Another advantage of tracking resources consumed through the permit is that now we can detect resource leaks in the permit destructor and report them. Even if it is just a case of the holder of the resources wanting to release the resources later, with the permit destroyed it will cause use-after-free.	2020-09-28 08:46:22 +03:00
Botond Dénes	cd953a36fd	reader_permit: move internals to impl In the next patches the reader permit will gain members that are shared across all instances of the same permit. To facilitate this move all internals into an impl class, of which the permit stores a shared pointer. We use a shared_ptr to avoid defining `impl` in the header. This is how the reader permit started in the beginning. We've done a full circle. :)	2020-09-28 08:46:22 +03:00
Botond Dénes	12372731cb	reader_permit: add consume()/signal() And do all consuming and signalling through these methods. These operations will soon be more involved than the simple forwarding they do today, so we want to centralize them to a single method pair.	2020-09-28 08:46:22 +03:00
Botond Dénes	375815e650	reader_permit::resource_units: store permit instead of semaphore In the next patches we want to introduce per-permit resource tracking -- that is, have each permit track the amount of resource consumed through it. For this, we need all consumption to happen through a permit, and not directly with the semaphore.	2020-09-28 08:46:22 +03:00
Botond Dénes	0fe75571d9	reader_concurrency_semaphore: admit one read if no reader is active To ensure progress at all times. This is due to evictable readers, which still hold on to a buffer even when their underlying reader is evicted. As we are introducing buffer and mutation fragment tracking in the next patches, these readers will hold on to memory even in this state, so it may theoretically happen that even though no readers are admitted (all count resources are available) no reader can be admitted due to lack of memory. To prevent such deadlocks we now always admit one reader if all count resources are available.	2020-09-28 08:46:22 +03:00
Botond Dénes	ef0b279c80	reader_concurrency_semaphore: move may_proceed() out-of-line They are only used in the .cc anyway.	2020-09-28 08:46:22 +03:00
Botond Dénes	c18756ce9a	reader_concurrency_semaphore: s/inactive_read_stats/stats/ In preparation for non-inactive read stats being added to the semaphore, rename its existing stats struct and member to a more generic name. Fields whose names only made sense in the context of the old name are adjusted accordingly.	2020-09-23 13:11:55 +03:00
Botond Dénes	a0107ba1c6	reader_permit: reader_resources: make true RAII class Currently in all cases we first deduct the to-be-consumed resources, then construct the `reader_resources` class to protect it (release it on destruction). This is error prone as it relies on no exception being thrown while constructing the `reader_resources`. Albeit the `reader_resources` constructor is `noexcept` right now this might change in the future and as the call sites relying on this are disconnected from the declaration, the one modifying them might not notice. To make this safe going forward, make the `reader_resources` a true RAII class, consuming the units in its constructor and releasing them in its destructor. Fixes: #7256 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com>	2020-09-22 18:13:35 +03:00
Botond Dénes	11105cbb78	reader_concurrency_semaphore: make inactive read handles unique across semaphores Currently inactive read handles are only unique within the same semaphore, allowing for an unregister against another semaphore to potentially succeed. This can lead to disasters ranging from crashes to data corruption. While a handle should never be used with another semaphore in the first place, we have recently seen a bug (#6613) causing exactly that, so in this patch we prevent such unregister operations from ever succeeding by making handles unique across all semaphores. This is achieved by adding a pointer to the semaphore to the handle.	2020-07-23 16:43:33 +03:00
Avi Kivity	4e79296090	tracked_file_impl: inherit disk and memory alignment from underlying file tracked_file_impl is a wrapper around another file, that tracks memory allocated for buffers in order to control memory consumption. However, it neglects to inherit the disk and memory alignment settings from the wrapped file, which can cause unnecessarily-large buffers to be read from disk, reducing throughput. Fix by copying the alignment parameters. Fixes #6290.	2020-06-11 17:43:50 +03:00
Botond Dénes	3cd2598ab3	reader_permit: forbid empty permits Remove `no_reader_permit()` and all ways to create empty (invalid) permits. All permits are guaranteed to be valid now and are only obtainable from a semaphore. `reader_permit::semaphore()` now returns a reference, as it is guaranteed to always have a valid semaphore reference.	2020-05-28 11:34:35 +03:00
Botond Dénes	f417b9a3ea	reader_concurrency_semaphore: remove wait_admission and consume_resources() Permits are now created with `make_permit()` and code is using the permit to do all resource consumption tracking and admission waiting, so we can remove these from the semaphore. This allows us to remove some now unused code from the permit as well, namely the `base_cost` which was used to track the resource amount the permit was created with. Now this amount is also tracked with a `resource_units` RAII object, returned from `reader_permit::wait_admission()`, so it can be removed. Curiously, this reduces the reader permit to a glorified semaphore pointer. Still, the permit abstraction is worth keeping, because it allows us to make changes to how the resource tracking part of the semaphore works, without having to change the huge amount of code sites passing around the permit.	2020-05-28 11:34:35 +03:00
Botond Dénes	bf4ade8917	reader_permit: resource_units: introduce add() Allows merging two resource_units into one.	2020-05-28 11:34:35 +03:00
Botond Dénes	4d7250d12b	reader_permit: add wait_admission We want to make `reader_permit` the single interface through which reads interact with the concurrency limiting mechanism. So far it was only usable to track memory consumption. Add the missing `wait_admission()` and `consume_resources()` to the permit API. As opposed to the `reader_concurrency_semaphore::` equivalents, which returned a permit, the `reader_permit::` variants just return `reader_permit::resource_units`, which is an RAII holder for the acquired units. This also allows for the permit to be created earlier, before the reader is admitted, allowing for tracking pre-admission memory usage as well. In fact this is what we are going to do in the next patches. This patch also introduces a `broken()` method on the reader concurrency semaphore which resolves waiters with an exception. This method is also called internally from the semaphore's destructor. This is needed because the semaphore can now have external waiters, who have to be resolved before the semaphore itself is destroyed.	2020-05-28 11:34:35 +03:00
Botond Dénes	bd793d6e19	reader_permit: resource_units: work in terms of reader_resources Refactor resource_units semantically as well to work in terms of reader_resources, instead of just memory.	2020-05-28 11:34:35 +03:00
Botond Dénes	0f9c24631a	reader_permit: s/memory_units/resource_units/ We want to refactor reader_permit::memory_units to work in terms of reader_resources, as we are planning to use it for guarding count resources as well. This patch makes the first step: renames it from memory_units to resource_units. Since this is a very noisy change, we do it in a separate patch; the semantic change is in the next patch.	2020-05-28 11:34:35 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Botond Dénes	05116ba963	reader_concurrency_semaphore: make signal() noexcept Currently reader_concurrency_semaphore::signal() can fail. This is dangerous in two ways: * It is called from constructors, so the exception can bring down the node. This will convert an `std::bad_alloc` to a crash. * Reads in the queue will be blocked until they either time-out, or another `signal()` succeeds. To solve this, wrap the `reader_permit` constructor, the only code that can throw, with try-catch and forward the exception to the reader admission promise. In practice this will result in the flushing of the reader queue, when we fail to admit a read. Fixes #5741 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206154238.707031-1-bdenes@scylladb.com>	2020-02-06 17:51:03 +02:00
Botond Dénes	434d32befe	reader_permit: tidy up reader_permit::memory_units This patch is a bag of fixes/cleanups that were omitted from the reader memory tracking series due to contributor error. It contains the following changes: * Get rid of unused `increase()` and `decrease()` methods. * Make all constructors and assignment operators `noexcept`. * Make move assignment operator safe w.r.t. self assignment. * `reset()`: consume the new amount before releasing the old amount, to prevent a transient window where new readers might be admitted. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200206143007.633069-1-bdenes@scylladb.com>	2020-02-06 16:35:07 +02:00
Botond Dénes	92fffe51d5	reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively Consume the memory before even submitting the I/O to the underlying `file` object. This is in line with the underlying `file` object allocating the buffer before it forwards the I/O request to the kernel. This extends the "visibility" over the memory consumed by I/O greatly, as it turns out buffers spend most time alive waiting for the I/O to complete and are parsed shortly afterwards.	2020-01-28 08:13:16 +02:00
Botond Dénes	4bb3c7b1f0	reader_concurrency_semaphore: bye reader_resource_tracker Replaced by `reader_permit`, of which it was a mere wrapper in the first place.	2020-01-28 08:13:16 +02:00
Botond Dénes	dea24ca859	reader_permit: expose make_tracked_temporary_buffer() Previously `tracking_file_impl::make_tracked_buf()`. In the next patches we plan on using this outside `tracking_file_impl`, so make it public and templatize on the char type.	2020-01-28 08:13:16 +02:00
Botond Dénes	16cea36a94	reader_permit: introduce make_tracked_file() Free function equivalent of `reader_resource_tracker::track_file()`, using a `reader_permit` directly.	2020-01-28 08:13:16 +02:00
Botond Dénes	1859a03629	reader_permit: introduce memory_units Similar to `seastar::semaphore_units`, this allows consuming and releasing memory via an RAII object. In addition to that, it also allows tracking changing values. This feature was designed to be used for tracking the ever changing memory consumption of the buffers of `flat_mutation_reader`:s. This is now the only supported way of consuming memory from a permit.	2020-01-28 08:13:16 +02:00
Botond Dénes	c0f96db2d9	reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh In the next patches we will replace `reader_resource_tracker` and have code use the `reader_permit` directly. In subsequent patches, the `reader_permit` will get even more usages as we attempt to make the tracking of reader resource more accurate by tracking more parts of it. So the grand plan is that the current `reader_concurrency_semaphore.hh` is split into two headers: * `reader_concurrency_semaphore.hh` - containing the semaphore proper. * `reader_permit.hh` - a very lightweight header, to be used by components which only want to track various parts of the resource consumption of reads.	2020-01-28 08:13:16 +02:00
Botond Dénes	2005495857	reader_concurrency_semaphore: reader_permit: make it a value type Currently `reader_permit` is passed around as `lw_shared_ptr<reader_permit>`, which is clunky to write and use and is also an unnecessary leak of details on how permit ownership is managed. Make `reader_permit` a simple value type, making it a little bit easier and safer to use. In the next patches we will get rid of `reader_resource_tracker` and instead have code use the permit instance directly, so this small improvement in usability will go a long way towards preventing eye sore.	2020-01-28 08:13:16 +02:00
Botond Dénes	89c5fd0c25	reader_concurrency_semaphore::reader_permit: move methods out-of-line In preparation for making the reader_permit a top-level class, and moving it to another file. It is also good practice to define non-performance critical methods out-of-line to reduce header bloat.	2020-01-28 08:13:16 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Botond Dénes	e56c26205f	reader_concurrency_semaphore: add counters for inactive reads Add counters that give insight into inactive read related events. Two counters are added: * permit_based_evictions * population	2019-01-07 16:45:49 +02:00
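
Several of the commits above (a0107ba1c6, 434d32befe, 1859a03629) describe an RAII holder that consumes units in its constructor, releases them in its destructor, and in `reset()` consumes the new amount before releasing the old one. The following is a minimal, self-contained sketch of that pattern only. `toy_semaphore`, `toy_units` and `available_after_scope()` are hypothetical names invented for illustration; they are not ScyllaDB's classes and do not reproduce its API.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a semaphore: it only counts available memory.
class toy_semaphore {
    std::int64_t _available;
public:
    explicit toy_semaphore(std::int64_t memory) noexcept : _available(memory) {}
    void consume(std::int64_t n) noexcept { _available -= n; }
    void signal(std::int64_t n) noexcept { _available += n; }
    std::int64_t available() const noexcept { return _available; }
};

// Hypothetical RAII holder: units are consumed in the constructor and
// released in the destructor, so no call site can deduct without protecting.
class toy_units {
    toy_semaphore* _sem;
    std::int64_t _amount;
public:
    toy_units(toy_semaphore& sem, std::int64_t amount) noexcept
        : _sem(&sem), _amount(amount) {
        _sem->consume(_amount);
    }
    toy_units(const toy_units&) = delete;
    toy_units& operator=(const toy_units&) = delete;
    // A moved-from holder keeps amount 0, so its destructor releases nothing.
    toy_units(toy_units&& o) noexcept
        : _sem(o._sem), _amount(std::exchange(o._amount, 0)) {}
    toy_units& operator=(toy_units&& o) noexcept {
        if (this != &o) {  // safe w.r.t. self-assignment
            _sem->signal(_amount);
            _sem = o._sem;
            _amount = std::exchange(o._amount, 0);
        }
        return *this;
    }
    ~toy_units() { _sem->signal(_amount); }

    // Consume the new amount before releasing the old one, so the semaphore
    // never transiently shows more free memory than it really has.
    void reset(std::int64_t amount) noexcept {
        _sem->consume(amount);
        _sem->signal(_amount);
        _amount = amount;
    }
    std::int64_t amount() const noexcept { return _amount; }
};

// Usage: returns the memory available once the holder has gone out of scope.
inline std::int64_t available_after_scope() {
    toy_semaphore sem(100);
    {
        toy_units u(sem, 30);
        assert(sem.available() == 70);
        u.reset(50);
        assert(sem.available() == 50);
    }
    return sem.available();
}
```

Because the deduction happens inside the constructor, there is no window between consuming the units and constructing the object that guards them, which is the hazard a0107ba1c6 describes.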


54 Commits