scylladb

Author	SHA1	Message	Date
Botond Dénes	a3ae0c7cee	reader_permit: mark check_abort() as const All it does is read one field, making it const makes using it easier.	2025-02-07 01:32:35 -05:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	168ade72f8	treewide: replace formatter<std::string_view> with formatter<string_view> in {fmt} before v10, it provides the specialization of `fmt::formatter<..>` for `std::string_view` as well as the specialization of `fmt::formatter<..>` for `fmt::string_view` which is an implementation builtin in {fmt} for compatibility of pre-C++17. and this type is used even if the code is compiled with C++ standard greater or equal to C++17. also, before v10, the `fmt::formatter<std::string_view>::format()` is defined so it accepts `std::string_view`. after v10, `fmt::formatter<std::string_view>` still exists, but it is now defined using `format_as()` machinery, so its `format()` method does not actually accept `std::string_view`, it accepts `fmt::string_view`, as the former can be converted to `fmt::string_view`. this is why we can inherit from `fmt::formatter<std::string_view>` and use `formatter<std::string_view>::format(foo, ctx);` to implement the `format()` method with {fmt} v9, but we cannot do this with {fmt} v10, and we would have following compilation failure: ``` FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o /home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc /home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format' 254 | return formatter<std::string_view>::format(it->second, ctx); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ /usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument 2759 | FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const | ^ ~~~~~~~~~~~~ ``` because the inherited `format()` method actually comes from `fmt::formatter<fmt::string_view>`. to reduce the confusion, in this change, we just inherit from `fmt::formatter<string_view>`, where `string_view` is actually `fmt::string_view`. this follows the document at https://fmt.dev/latest/api.html#formatting-user-defined-types, and since there is less indirection under the hood -- we do not use the specialization created by `FMT_FORMAT_AS` which inherits from `formatter<fmt::string_view>`, hopefully this can improve the compilation speed a little bit. also, this change addresses the build failure with {fmt} v10. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18299	2024-04-19 07:44:07 +03:00
Kefu Chai	38ae52d5cd	add fmt::formatter for reader_permit::state and reader_resources before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * reader_permit::state * reader_resources Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17707	2024-03-11 09:55:51 +02:00
Lakshmi Narayanan Sreethar	76f0d5e35b	reader_permit: store schema_ptr instead of raw schema pointer Store schema_ptr in reader permit instead of storing a const pointer to schema to ensure that the schema doesn't get changed elsewhere when the permit is holding on to it. Also update the constructors and all the relevant callers to pass down schema_ptr instead of a raw pointer. Fixes #16180 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16658	2024-01-11 08:37:56 +02:00
Botond Dénes	8bfe3ca543	query: move max_result_size to query-request.hh It is currently located in query_class_config.hh, which is named after a now defunct struct. This arrangement is unintuitive and there is no upside to it. The main user of max_result_size is query_comand, so colocate it next to the latter. Closes #14268	2023-06-20 11:37:50 +02:00
Avi Kivity	97694d26c4	Merge 'reader_permit: minor improvements to resource consume/release safety' from Botond Dénes This PR contains some small improvements to the safety of consuming/releasing resources to/from the semaphore: * reader_permit: make the low-level `consume()/signal()` API private, making the only user (an RAII class) a friend. * reader_resources: split `reset()` into `noexcept` and potentially throwing variants. * reader_resources::reset_to(): try harder to avoid calling `consume()` (when the new resource amount is smaller than the previous one) Closes #13678 * github.com:scylladb/scylladb: reader_permit: resource_units::reset_to(): try harder to avoid calling consume() reader_permit: split resource_units::reset() reader_permit: make consume()/signal() API private	2023-05-14 14:14:23 +03:00
Botond Dénes	2449b714df	reader_permit: split resource_units::reset() Into reset_to() and reset_to_zero(). The latter replaces `reset()` with the default 0 resources argument, which was often called from noexcept contexts. Splitting it out from `reset()` allows for a specialized implementation that is indeed guaranteed to be `noexcept`, which gives peace of mind.	2023-04-26 07:41:57 -04:00
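The split between a `noexcept` release path and a possibly throwing resize path can be sketched as follows. This is a minimal, self-contained illustration, not ScyllaDB's code: `toy_semaphore`, `resources` and this `resource_units` are hypothetical stand-ins, and the throwing `consume()` merely simulates a consume that can fail (e.g. with `std::bad_alloc`).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for reader_resources (count + memory).
struct resources {
    long count = 0;
    long memory = 0;
};

// Hypothetical semaphore: consume() may throw, signal() never does.
struct toy_semaphore {
    resources available;
    void consume(resources r) {
        if (r.memory > available.memory) {
            throw std::runtime_error("out of memory"); // simulated failure
        }
        available.count -= r.count;
        available.memory -= r.memory;
    }
    void signal(resources r) noexcept {
        available.count += r.count;
        available.memory += r.memory;
    }
};

// RAII holder of consumed resources (hypothetical sketch).
class resource_units {
    toy_semaphore* _sem;
    resources _res;
public:
    resource_units(toy_semaphore& sem, resources res) : _sem(&sem), _res{} {
        sem.consume(res);
        _res = res;
    }
    ~resource_units() { reset_to_zero(); }
    resource_units(const resource_units&) = delete;
    resource_units& operator=(const resource_units&) = delete;

    // Only releases, so it can be noexcept and is safe in destructors.
    void reset_to_zero() noexcept {
        _sem->signal(_res);
        _res = {};
    }
    // May throw when more resources are needed than currently held.
    void reset_to(resources r) {
        if (r.count <= _res.count && r.memory <= _res.memory) {
            // Shrinking: release the difference, no consume() call at all.
            _sem->signal({_res.count - r.count, _res.memory - r.memory});
        } else {
            // Acquire the new amount before releasing the old one, so a
            // throwing consume() leaves this object unchanged.
            _sem->consume(r);
            _sem->signal(_res);
        }
        _res = r;
    }
    resources get() const noexcept { return _res; }
};
```

Shrinking never calls `consume()`, which mirrors the "try harder to avoid calling consume()" item in the merge commit above.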
Botond Dénes	21988842de	reader_permit: make consume()/signal() API private This API is dangerous: all resource consumption should happen via RAII objects that guarantee that all consumed resources are appropriately released. At this point, said API is just a low-level building block for higher-level RAII objects. To ensure nobody thinks of using it for other purposes, make it private and make external users friends instead.	2023-04-26 07:41:53 -04:00
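The "private low-level API plus RAII friend" pattern described above can be sketched like this. The names `permit` and `units` are hypothetical; the point is only that outside code cannot call `consume()`/`signal()` directly and must go through the RAII type, whose destructor always releases what it consumed.

```cpp
#include <cassert>
#include <utility>

class units; // the only class allowed to touch the raw API

// Hypothetical permit: raw consume()/signal() are private.
class permit {
    long _consumed = 0;
    void consume(long n) { _consumed += n; }
    void signal(long n) noexcept { _consumed -= n; }
    friend class units;
public:
    long consumed() const noexcept { return _consumed; }
    units consume_units(long n); // defined once units is complete
};

// RAII wrapper: consumes on construction, releases on destruction.
class units {
    permit* _p;
    long _n;
public:
    units(permit& p, long n) : _p(&p), _n(n) { p.consume(n); }
    units(units&& o) noexcept : _p(o._p), _n(o._n) { o._n = 0; }
    units(const units&) = delete;
    units& operator=(const units&) = delete;
    units& operator=(units&&) = delete;
    ~units() {
        if (_n) {
            _p->signal(_n);
        }
    }
};

units permit::consume_units(long n) { return units(*this, n); }
```

A moved-from `units` holds nothing, so ownership of the consumed amount is transferred rather than duplicated.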
Botond Dénes	804403f618	reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes They are still using the old terminology for permit state names, bring them up to date with the recent state name changes.	2023-04-19 05:20:42 -04:00
Botond Dénes	89328ce447	reader_concurrency_semaphore: update API w.r.t. recent permit state name changes It is still using the old terminology for permit state names, bring it up to date with the recent state name changes.	2023-04-19 05:18:13 -04:00
Botond Dénes	943ae7fc69	reader_permit: give better names to active* states The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflects their true meaning and gives less room for misinterpretation. The changes are: * active/unused -> active * active/used -> active/need_cpu * active/blocked -> active/await Hopefully the new names do a better job at conveying what these states really mean: * active - a regular admitted permit, which is active (as opposed to an inactive permit). * active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state. * active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability).	2023-04-14 08:40:46 -04:00
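As an illustration of the renamed states and the admission property attached to them, here is a hypothetical sketch. The enumerator spellings are assumptions (only the printed names come from the commit message), and only a subset of the real states is modelled.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical subset of permit states, using the names from this commit.
enum class permit_state { active, active_need_cpu, active_await, inactive };

constexpr std::string_view to_string(permit_state s) {
    switch (s) {
    case permit_state::active: return "active";
    case permit_state::active_need_cpu: return "active/need_cpu";
    case permit_state::active_await: return "active/await";
    case permit_state::inactive: return "inactive";
    }
    return "unknown";
}

// Per the commit message, only active/need_cpu prevents the admission of
// new permits; active/await (waiting on I/O or a remote shard) does not.
constexpr bool blocks_admission(permit_state s) {
    return s == permit_state::active_need_cpu;
}
```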
Botond Dénes	1f51f752cc	reader_permit: refresh trace_state on new pages To make sure all tracing done on a certain page will make its way into the appropriate trace session. This is a continuation of the previous patch (which added trace pointer to the permit).	2023-03-22 04:58:10 -04:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Botond Dénes	11dde4b80b	reader_permit: add wait_for_execution state Used while the permit is in the _ready_list, waiting for the execution loop to pick it up. This just acknowledges the existence of this wait-state. This state will now show up in permit diagnostics printouts and we can now determine whether a permit is waiting for execution, without checking which queue it is in.	2023-03-09 07:11:51 -05:00
Botond Dénes	bcfb8715f9	reader_permit: introduce (private) operator * and -> Currently the reader_permit has some private methods that only the semaphore's internals call. But this method of communication is not consistent, other times the semaphore accesses the permit impl directly, calling methods on that. This commit introduces operator * and -> for reader_permit. With this, the semaphore internals always call the reader_permit::impl methods directly, either via a direct reference, or via the above operators. This makes the permit interface a little narrower and reduces boilerplate code.	2023-03-09 06:53:11 -05:00
Botond Dénes	2694aa1078	reader_permit: use check_abort() for timeout Instead of having callers use get_timeout(), then compare it against the current time, set up a timeout timer in the permit which, when it fires, sets a new `_ex` member (a `std::exception_ptr`) to the appropriate exception. Callers can now just poll check_abort() which will throw when `_ex` is not null. This is more natural and allows for more general reasons for aborting reads in the future. This prepares the ground for timeouts being managed inside the permit, instead of by the semaphore. Including timing out while in a wait queue.	2023-03-09 06:53:09 -05:00
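The polling scheme described above boils down to storing a `std::exception_ptr` and rethrowing it on demand. In this hypothetical sketch the timer is simulated by calling `on_timeout()` directly, and `std::runtime_error` stands in for whatever timeout exception type the real permit uses.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical permit sketch: a timer callback stores an exception,
// readers poll check_abort().
class permit_sketch {
    std::exception_ptr _ex;
public:
    // Would be invoked by the permit's timeout timer when it fires.
    void on_timeout() {
        _ex = std::make_exception_ptr(std::runtime_error("read timed out"));
    }
    // Only reads _ex, so it can be const (cf. the first commit in this log).
    void check_abort() const {
        if (_ex) {
            std::rethrow_exception(_ex);
        }
    }
};
```

Because the abort reason is just an `exception_ptr`, other abort causes could later be stored the same way without changing the callers.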
Botond Dénes	c86136c853	reader_permit: add get_schema() and get_op_name() accessors	2023-03-09 05:19:59 -05:00
Botond Dénes	1a9fdebb49	treewide: adapt to throwing reader_concurrency_semaphore::consume() Said method can now throw `std::bad_alloc` since `aab5954`. All call-sites should have been adapted in the series introducing the throw, but some managed to slip through because the oom unit test didn't run in debug mode. In this commit the remaining unpatched call-sites are fixed.	2023-02-17 00:46:56 -05:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Botond Dénes	ec1c615029	reader_permit: expose operator<<(reader_permit::state)	2023-01-17 05:27:04 -05:00
Botond Dénes	78583b84f1	reader_permit: add id() accessor Effectively returns the address of the underlying permit impl as an `uintptr_t`. This can be used to determine the identity of the permit.	2023-01-17 05:27:04 -05:00
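Using the impl's address as an identity works because all copies of a permit share one impl object. A hypothetical sketch, assuming a `std::shared_ptr`-based handle (the real permit's ownership model may differ):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical impl object shared by all copies of a permit handle.
struct permit_impl {
    int dummy = 0;
};

class permit_handle {
    std::shared_ptr<permit_impl> _impl;
public:
    explicit permit_handle(std::shared_ptr<permit_impl> impl) : _impl(std::move(impl)) {}
    // Identity is the address of the shared impl, so copies compare equal.
    std::uintptr_t id() const noexcept {
        return reinterpret_cast<std::uintptr_t>(_impl.get());
    }
};
```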
Botond Dénes	dd9a0a16e6	reader_concurrency_semaphore: tracked_file_impl: use request_memory() Use the recently added `request_memory()` to acquire the memory units for the I/O. This allows blocking all but one reader when memory consumption grows too high.	2023-01-17 05:27:04 -05:00
Botond Dénes	9ed5d861be	reader_concurrency_semaphore: add request_memory() A possibly blocking request for more memory. If the collective memory consumption of all reads goes above $serialize_limit_multiplier * $memory_limit this request will block for all but one reader (the first requester). Until this situation is resolved, that is, as long as memory consumption stays above the limit explained above, only this one reader is allowed to make progress. This should help rein in the memory consumption of reads in a situation where their memory consumption used to balloon without constraints before.	2023-01-17 05:27:04 -05:00
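The decision rule above can be modelled synchronously. This is a hypothetical sketch: `memory_gate`, its members, and the choice of "first requester after crossing the threshold" as the reader allowed to proceed are assumptions; the real `request_memory()` returns a future and queues the blocked requests instead of returning `false`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical synchronous model of the serialization rule.
struct memory_gate {
    std::int64_t memory_limit;
    double serialize_limit_multiplier;
    std::int64_t consumed = 0;
    std::optional<int> blessed_reader; // the one reader allowed to progress

    // Returns true if reader `id` may consume `bytes` now, false if it must wait.
    bool request_memory(int id, std::int64_t bytes) {
        const auto threshold =
            static_cast<std::int64_t>(serialize_limit_multiplier * memory_limit);
        if (consumed <= threshold) {
            blessed_reader.reset();   // under the limit: no serialization
        } else if (!blessed_reader) {
            blessed_reader = id;      // first requester above the limit
        }
        if (blessed_reader && *blessed_reader != id) {
            return false;             // everyone else has to wait
        }
        consumed += bytes;
        return true;
    }
    void release(std::int64_t bytes) { consumed -= bytes; }
};
```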
Botond Dénes	8b0afc28d4	reader_permit: add make_new_tracked_temporary_buffer() A separate method for callers of make_tracked_temporary_buffer() who are creating new empty tracked buffers of a certain size. make_tracked_temporary_buffer() is about to be changed to be more targeted at callers who call it with pre-consumed memory units.	2023-01-16 02:05:27 -05:00
Botond Dénes	397266f420	reader_permit: add get_state() accessor	2023-01-16 02:05:27 -05:00
Botond Dénes	87e2bf90b9	reader_permit: resource_units: add constructor for already consumed res	2023-01-16 02:05:27 -05:00
Botond Dénes	d2cfc25494	reader_permit: resource_units: remove noexcept qualifier from constructor It won't be noexcept soon. Also make it exception safe.	2023-01-16 02:05:27 -05:00
Botond Dénes	2c0de50969	reader_concurrency_semaphore: add disk_reads and sstables_read stats And the infrastructure to reader_permit to update them. The infrastructure is not wired in yet. These metrics will be used to count the number of reads gone to disk and the number of sstables read currently respectively.	2023-01-03 09:37:29 -05:00
Botond Dénes	669b225c67	reader_permit: resources: remove operator bool and >= These cannot be meaningfully defined for a vector value like resources. To prevent instinctive misuse, remove them. Operator bool is replaced with `non_zero()` which hopefully better expresses what to expect. The comparison operator is just removed and inlined into its only user, which actually helps said user's readability. Closes #11813	2022-10-20 15:25:11 +03:00
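Why `operator bool` and `>=` are ambiguous for a two-component resource value is easy to show. In this hypothetical sketch, `non_zero()` spells out "any component is non-zero", and the caller-side check `covers()` (a hypothetical helper) is written component-wise; two resource values can be such that neither covers the other.

```cpp
#include <cassert>

// Hypothetical two-dimensional resource vector (count, memory).
struct reader_resources_sketch {
    int count = 0;
    long memory = 0;

    // Explicit replacement for an ambiguous operator bool.
    bool non_zero() const noexcept { return count != 0 || memory != 0; }
};

// Instead of a general operator>=, the one caller that needs it spells out
// the component-wise check.
inline bool covers(const reader_resources_sketch& a, const reader_resources_sketch& b) {
    return a.count >= b.count && a.memory >= b.memory;
}
```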
Botond Dénes	61028ad718	evictable_reader: avoid preemption pitfall around waiting for readmission Permits have to wait for re-admission after having been evicted. This happens via `reader_permit::maybe_wait_readmission()`. The user of this method -- the evictable reader -- uses it to re-wait admission when the underlying reader was evicted. There is one tricky scenario however, when the underlying reader is created for the first time. When the evictable reader is part of a multishard query stack, the created reader might in fact be a resumed, saved one. These readers are kept in an inactive state until actually resumed. The evictable reader shares its permit with the to-be-resumed reader so it can check whether it has been evicted while saved and needs to wait readmission before being resumed. In this flow it is critical that there is no preemption point between this check and actually resuming the reader, because if there is, the reader might end up actually recreated, without having waited for readmission first. To help avoid this situation, the existing `maybe_wait_readmission()` is split into two methods: * `bool reader_permit::needs_readmission()` * `future<> reader_permit::wait_for_readmission()` The evictable reader can now ensure there is no preemption point between `needs_readmission()` and resuming the reader. Fixes: #10187 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220315105851.170364-1-bdenes@scylladb.com>	2022-03-15 14:37:22 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	4762ddec0f	reader_permit: add release_base_resource() Signals base resources to the semaphore and zeros it. This basically undoes admission.	2022-01-07 14:06:31 +02:00
Kamil Braun	e8824986dd	reader_permit: make query max result size accessible from the permit This will make it easier, for example, to enforce memory limits in lower levels of the flat_mutation_reader stack. By default the size is unlimited. However, for specific queries it is possible to store a different value (for example, obtained from a `read_command` object) through a setter.	2021-09-14 13:27:25 +02:00
Benny Halevy	4e3dcfd7d6	reader_concurrency_semaphore: use permit timeout for admission Now that the timeout is stored in the reader permit use it for admission rather than a timeout parameter. Note that evictable_reader::next_partition currently passes db::no_timeout to resume_or_create_reader, which propagated to maybe_wait_readmission, but it seems to be an oversight of the f_m_r api that doesn't pass a timeout to next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	eeab5f77d9	repair: row_level: read_mutation_fragment: set reader timeout The timeout needs to be propagated to the reader's permit. Reset it to db::no_timeout in repair_reader::pause(). Warn if set_timeout asks to change the timeout too far into the past (100ms). It is possible that it will be passed a past timeout from the rpc path, where the message timeout is applied (as duration) over the local lowres_clock time and parallel read_data messages that share the query may end up having close, but different timeout values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:40 +03:00
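One reading of the warning rule above, sketched under assumptions: "too far into the past" is interpreted here as "more than 100ms earlier than the currently set timeout", `std::chrono::steady_clock` stands in for `lowres_clock`, `time_point::max()` stands in for `db::no_timeout`, and `timeout_holder` with its `warnings` counter is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>

using clock_type = std::chrono::steady_clock;

// Hypothetical holder of a permit's timeout.
struct timeout_holder {
    clock_type::time_point timeout = clock_type::time_point::max(); // "no timeout"
    int warnings = 0;

    void set_timeout(clock_type::time_point new_timeout) {
        // Small backwards moves are expected (parallel messages carry slightly
        // different deadlines); only moves of more than 100ms are reported.
        if (timeout != clock_type::time_point::max() &&
            new_timeout < timeout - std::chrono::milliseconds(100)) {
            ++warnings;
            std::clog << "warning: timeout moved more than 100ms into the past\n";
        }
        timeout = new_timeout;
    }
};
```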
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Botond Dénes	b81f39cec9	reader_permit: add operator<< for reader_resources And use it in tests, it results in actually useful error messages.	2021-07-14 17:19:02 +03:00
Botond Dénes	5b8d6f02eb	reader_permit: remove now unused wait_admission()	2021-07-14 17:19:02 +03:00
Botond Dénes	1b7eea0f52	reader_concurrency_semaphore: admission: flip the switch This patch flips two "switches": 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) by now all permits are obtained up-front, so this patch just yanks out the restricted reader from all reader stacks and simultaneously switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By doing this, admission is now waited on when creating the permit. (2) we switch to an admission algorithm that adds a new aspect to the existing resource availability: the number of used/blocked reads. Namely it only admits new reads if in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words we only admit new reads if all currently admitted reads require something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). We flip these two switches at the same time because up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits, it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. Another change made to accommodate this switch is the replacement of the replica side read execution stages with the reader concurrency semaphore acting as an execution stage. This replacement is needed because with the introduction of up-front admission, reads are not independent of each other anymore. One read executed can influence whether later reads executed will be admitted or not, and execution stages require independent operations to work well. By moving the execution stage into the semaphore, we have an execution stage which is in control of both admission and running the operations in batches, avoiding the bad interaction between the two.	2021-07-14 17:19:02 +03:00
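The admission criterion described in (2) can be written as a predicate. This is a hypothetical sketch using the state names of this era (later renamed, see the 2023 commit above): `active_used` is a read that needs CPU, and a new read is admitted only if resources suffice and no admitted read is in that state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical states of admitted permits at the time of this commit.
enum class state { active_unused, active_used, active_blocked };

struct resources_sketch {
    int count;
    long memory;
};

// Admit only if (a) enough resources are available and (b) every used read
// is blocked, i.e. no admitted read currently needs CPU to progress.
bool can_admit(const resources_sketch& available, const resources_sketch& needed,
               const std::vector<state>& admitted) {
    const bool have_resources =
        available.count >= needed.count && available.memory >= needed.memory;
    bool any_needs_cpu = false;
    for (state s : admitted) {
        if (s == state::active_used) {
            any_needs_cpu = true;
        }
    }
    return have_resources && !any_needs_cpu;
}
```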
Botond Dénes	844a99a91a	reader_concurrency_semaphore: prepare for up-front admission We want to make permits be admitted up-front, before even being created. As part of this change, we will get rid of the `wait_admission()` method on the permit, instead, the permit will be created as a result of waiting for admission (just like back some time ago). To allow evicted readers to wait for re-admission, a new method `maybe_wait_readmission()` is created, which waits for readmission if the permit is in evicted state. Also refactor the internals of the semaphore to support and favor up-front admission code. As up-front admission is the future we want the permit code to be organized in such a way that it is natural to use with it. This means that the "old-style" admission code might suffer but we tolerate this as it is on its way out. To this end the following changes were done: * Add a _base_resources field to reader_permit which tracks the base cost of said permit. This is passed in the constructor and is used in the first and subsequent admissions. * The base cost is now managed internally by the permit, instead of relying on an external `resource_units` instance, though the old way is still supported temporarily. * Change the admission pipeline to favor the new permit-internally managed base cost variant. * Compatibility with old-style admission: permits are created with 0 base resources, base resources are set with the compatibility method `set_base_resources()` right before admission, then externalized again after admission with `base_resource_as_resource_units()`. These methods will be gone when the old style admission is retired (together with `wait_admission()`).	2021-07-14 16:48:43 +03:00
Botond Dénes	05e6881c73	reader_permit: allow constructing reader_permit from impl& By enabling shared from this for impl and adding a reader permit constructor which takes a shared pointer to an impl. This allows impl members to invoke functions requiring a `reader_permit` instance as a parameter.	2021-07-14 16:48:43 +03:00
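The mechanism named in this commit is `std::enable_shared_from_this`. A hypothetical sketch of an impl handing out permit handles that point back to itself (`shared_permit_impl` and `reader_permit_sketch` are illustrative names, not ScyllaDB's):

```cpp
#include <cassert>
#include <memory>
#include <utility>

class reader_permit_sketch;

// Hypothetical impl that can create permits referring to itself.
class shared_permit_impl : public std::enable_shared_from_this<shared_permit_impl> {
public:
    reader_permit_sketch make_permit(); // defined below
};

class reader_permit_sketch {
    std::shared_ptr<shared_permit_impl> _impl;
public:
    explicit reader_permit_sketch(std::shared_ptr<shared_permit_impl> impl)
        : _impl(std::move(impl)) {}
    shared_permit_impl& impl() const { return *_impl; }
};

reader_permit_sketch shared_permit_impl::make_permit() {
    // Valid only because *this is already owned by a std::shared_ptr.
    return reader_permit_sketch(shared_from_this());
}
```

Note that `shared_from_this()` requires the impl to have been created through a `std::shared_ptr` in the first place.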
Botond Dénes	aa480fa3f9	reader_permit: allow marking blocked Distinguish between permits that are blocked and those that are not. Conceptually a blocked permit is one that needs to wait on either I/O or a remote shard to proceed. This information will be used by admission, which will only admit new reads when all currently used ones are blocked. More on that in the commit introducing this new admission type. This patch only adds the infrastructure, block sites are not marked yet.	2021-07-14 16:48:43 +03:00
Botond Dénes	a5dc48b4b1	reader_permit: allow marking it as used Distinguish between permits that are used and those that are not. These are two subtypes of the current 'active' state (and replace it). Conceptually a permit is used when any readers associated with it have a pending call to any of their async methods, i.e. the consumer is actively consuming from them. This information will be used for admission, together with a new blocked state introduced by a future patch. This patch only adds the infrastructure, use sites are not marked yet.	2021-07-14 16:48:43 +03:00
Botond Dénes	5a20861a1d	reader_permit: add reader_permit_opt	2021-07-14 16:48:43 +03:00
Botond Dénes	a251cc2368	reader_permit: introduce evicted state We want to introduce more fine-grained states for permits than what we have currently, splitting the current 'active' state into multiple sub-states. As a preparatory step, introduce an evicted state too, to keep track of permits that were evicted while being inactive. This will be important in determining what permits need to re-wait admission, once we keep permits across pages. Having an evicted state also aids validating internal state transitions.	2021-07-14 16:48:43 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Botond Dénes	caaa8ef59a	reader_permit: always forward resources This commit conceptually reverts `4c8ab10`. Said commit was meant to prevent the scenario where memory-only permits -- those that don't pass admission but still consume memory -- completely prevent the admission of reads, possibly even causing a deadlock because a permit might even block its own admission. The protection introduced by said commit however proved to be very problematic. It made the status of resources on the permit very hard to reason about and created loopholes via which permits could accumulate without tracking or they could even leak resources. Instead of continuing to patch this broken system, this commit does away with this "protection" based on the observation that deadlocks are now prevented anyway by the admission criteria introduced by `0fe75571d9`, which admits a read anyway when all the initial count resources are available (meaning no admitted reader is alive), regardless of availability of memory. The benefit of this revert is that the semaphore now knows about all the resources and is able to do its job better as it is not "lied to" about resources by the permits. Furthermore the status of a permit's resources is much simpler to reason about, there are no more loopholes in unexpected state transitions to swallow/leak resources. To prove that this revert is indeed safe, in the next commit we add robust tests that stress test admission on a highly contested semaphore. This patch also does away with the registered/admitted differentiation of permits, as this doesn't make much sense anymore, instead these two are unified into a single "active" state. One can always tell whether a permit was admitted or not from whether it owns count resources anyway.	2021-04-26 15:56:56 +03:00
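The deadlock-avoidance rule cited above can be expressed as a small predicate. This is a hypothetical sketch (`res` and `admit` are illustrative names): when all initial count resources are available, no admitted reader is alive, so the read is admitted even if memory is over-committed, which is possible once memory-only consumption is always forwarded to the semaphore.

```cpp
#include <cassert>

struct res {
    int count;
    long memory;
};

// Hypothetical admission check modelling the rule from `0fe75571d9`.
bool admit(const res& available, const res& initial, const res& needed) {
    if (available.count == initial.count) {
        // No admitted reader is alive: admitting cannot deadlock, and
        // refusing could, so admit regardless of memory.
        return true;
    }
    return available.count >= needed.count && available.memory >= needed.memory;
}
```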
Benny Halevy	81391b845f	reader_permit: expose description method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00


78 Commits