scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 20:57:00 +00:00

Author	SHA1	Message	Date
Avi Kivity	f125a3e315	Merge 'tree: finish the reader_permit state renames' from Botond Dénes In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however covered only the states themselves and their usages, as well as the documentation in `docs/dev`. This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date. Closes #13573 * github.com:scylladb/scylladb: reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes reader_concurrency_semaphore: update API w.r.t. recent permit state name changes reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes	2023-05-04 18:29:04 +03:00
Botond Dénes	72003dc35c	readers: evictable_reader: skip progress guarantee when next pos is partition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guaranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is guaranteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563	2023-05-02 16:19:32 +03:00
Botond Dénes	804403f618	reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes They are still using the old terminology for permit state names, bring them up to date with the recent state name changes.	2023-04-19 05:20:42 -04:00
Botond Dénes	14bff955e2	readers/multishard: shard_reader: fast-forward created reader to current range When creating the reader, the lifecycle policy might return one that was saved on the last page and survived in the cache. This reader might have skipped some fast-forwarding ranges while sitting in the cache. To avoid using a reader reading a stale range (from the read's POV), check its read range and fast forward it if necessary.	2023-03-24 08:43:03 -04:00
Botond Dénes	0aa03f85a3	readers/multishard: reader_lifecycle_policy: add get_read_range() Allows retrieving the current read-range for the reader on the given shard (where the method is called).	2023-03-24 08:40:11 -04:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Avi Kivity	29a2788b2e	Merge 'reader_concurrency_semaphore: handle read blocked on memory being registered as inactive' from Botond Dénes A read that requested memory and has to wait for it can be registered as inactive. This can happen for example if the memory request originated from a background I/O operation (a read-ahead maybe). Handling this case is currently very difficult. What we want to do is evict such a read on-the-spot: the fact that there is a read waiting on memory means memory is in demand and so inactive reads should be evicted. To evict this reader, we'd first have to remove it from the memory wait list, which is almost impossible currently, because `expiring_fifo<>`, the type used for the wait list, doesn't allow for that. So in this PR we set out to make this possible first, by transforming all current queues to be intrusive lists of permits. Permits are already linked into an intrusive list, to allow for enumerating all existing permits. We use these existing hooks to link the permits into the appropriate queue, and back to `_permit_list` when they are not in any special queue. To make this possible we first have to make all lists store naked permits, moving all auxiliary data fields currently stored in wrappers like `entry` into the permit itself. With this, all queues and lists in the semaphore are intrusive lists, storing permits directly, which has the following implications: * queues no longer take extra memory, as all of them are intrusive * permits are completely self-sufficient w.r.t to queuing: code can queue or dequeue permits just with a reference to a permit at hand, no other wrapper, iterator, pointer, etc. is necessary. * queues don't keep permits alive anymore; destroying a permit will automatically unlink it from the respective queue, although this might lead to use-after-free. Not a problem in practice, only one code-path (`reader_concurrenc_semaphore::with_permit()`) had to be adjusted. 
After all that extensive preparations, we can now handle the case of evicting a reader which is queued on memory. Fixes: #12700 Closes #12777 * github.com:scylladb/scylladb: reader_concurrency_semaphore: handle reader blocked on memory becoming inactive reader_concurrency_semaphore: move _permit_list next to the other lists reader_permit: evict inactive read on timeout reader_concurrency_semaphore: move inactive_read to .cc reader_concurrency_semaphore: store permits in _inactive_reads reader_concurrency_semaphore: inactive_read: de-inline more methods reader_concurrency_semaphore: make _ready_list intrusive reader_permit: add wait_for_execution state reader_concurrency_semaphore: make wait lists intrusive reader_concurrency_semaphore: move most wait_queue methods out-of-line reader_concurrency_semaphore: store permits directly in queues reader_permit: introduce (private) operator * and -> reader_concurrency_semaphore: remove redundant waiters() member reader_concurrency_semaphore: add waiters counter reader_permit: use check_abort() for timeout reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param reader_concurrency_semaphroe: make foreach_permit() const reader_permit: add get_schema() and get_op_name() accessors reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept	2023-03-15 20:10:19 +02:00
Kefu Chai	e21926f602	flat_mutation_reader_v2: use maybe_yield() when appropriate just came across this part of code, as `maybe_yield()` is a wrapper around "if should_yield(): yield()", so better off using it for more concise code. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13107	2023-03-15 15:58:55 +02:00
Botond Dénes	6181c08191	reader_concurrency_semaphore: move inactive_read to .cc It is not used in the header anymore and moving it to the .cc allows us to remove the dependency on flat_mutation_reader_v2.hh.	2023-03-13 08:07:53 -04:00
Botond Dénes	bcfb8715f9	reader_permit: introduce (private) operator * and -> Currently the reader_permit has some private methods that only the semaphore's internals call. But this method of communication is not consistent, other times the semaphore accesses the permit impl directly, calling methods on that. This commit introduces operator * and -> for reader_permit. With this, the semaphore internals always call the reader_permit::impl methods directly, either via a direct reference, or via the above operators. This makes the permit interface a little narrower and reduces boilerplate code.	2023-03-09 06:53:11 -05:00
Botond Dénes	2694aa1078	reader_permit: use check_abort() for timeout Instead of having callers use get_timeout(), then compare it against the current time, set up a timeout timer in the permit, which assigns a new `_ex` member (a `std::exception_ptr`) to the appropriate exception type when it fires. Callers can now just poll check_abort() which will throw when `_ex` is not null. This is more natural and allows for more general reasons for aborting reads in the future. This prepares the ground for timeouts being managed inside the permit, instead of by the semaphore. Including timing out while in a wait queue.	2023-03-09 06:53:09 -05:00
Kefu Chai	563fbb2d11	build: cmake: extract more subsystem out into its own CMakeLists.txt namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica, service, tools, tracing and transport. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Botond Dénes	46efdfa1a1	Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()` Fixes: #12249 Closes #12978 * github.com:scylladb/scylladb: flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE make_nonforwardable: test through run_mutation_source_tests make_nonforwardable: next_partition and fast_forward_to when single_partition is true make_forwardable: fix next_partition flat_mutation_reader_v2: drop forward_buffer_to nonforwardable reader: fix indentation nonforwardable reader: refactor, extract reset_partition nonforwardable reader: add more tests nonforwardable reader: no partition_end after fast_forward_to() nonforwardable reader: no partition_end after next_partition() nonforwardable reader: no partition_end for empty reader row_cache: pass partition_start though nonforwardable reader	2023-03-01 09:58:14 +02:00
Petr Gusev	989ef9d358	make_nonforwardable: next_partition and fast_forward_to when single_partition is true This flag designates that we should consume only one partition from the underlying reader. This means that attempts to move to another partition should cause an EOS.	2023-02-28 23:42:34 +04:00
Petr Gusev	a67776b750	make_forwardable: fix next_partition When next_partition is called, the buffer could contain partition_start and possibly static_row. In this case clear_buffer_to_next_partition will not remove anything from the buffer and the reader position should not change. Before this patch, however, we used to set _end_of_stream=false, which violated the forwardable-reader contract - the data of the next partition was emitted after the data of the first partition without intermediate EOS. This bug was found when debugging test_make_nonforwardable_from_mutations_as_mutation_source flakiness. A corresponding focused test_make_forwardable_next_partition has been added to exercise this problem.	2023-02-28 23:11:45 +04:00
Petr Gusev	64427b9164	flat_mutation_reader_v2: drop forward_buffer_to This is just a strange method I came across. It effectively does nothing but clear_buffer().	2023-02-28 23:00:02 +04:00
Petr Gusev	a517e1d6ad	nonforwardable reader: fix indentation	2023-02-28 23:00:02 +04:00
Petr Gusev	beeffb899f	nonforwardable reader: refactor, extract reset_partition No observable behaviour changes, just refactor the code.	2023-02-28 23:00:02 +04:00
Petr Gusev	88cd1c3700	nonforwardable reader: no partition_end after fast_forward_to() This patch fixes the problem with method fast_forward_to which is similar to the one with next_partition, no partition_end should be injected for the partition if fast_forward_to was called inside it.	2023-02-28 23:00:02 +04:00
Petr Gusev	8ff96e1bce	nonforwardable reader: no partition_end after next_partition() Before the patch, nonforwardable reader injected partition_end unconditionally. This caused problems in case next_partition() was called, the downstream reader might have already injected its own partition_end marker, and the one from nonforwardable reader was a duplicate. Fixes: #12249	2023-02-28 23:00:02 +04:00
Petr Gusev	9c5c380b0b	nonforwardable reader: no partition_end for empty reader The patch introduces the _partition_is_open flag, inject partition_end only if there was some data in the input reader. A simple unit test has been added for the nonforwardable reader which checks this new behaviour.	2023-02-28 22:59:56 +04:00
Kefu Chai	3ae11de204	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:53 +08:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Tomasz Grabiec	ccf3a13648	range_tombstone_change_merger: Introduce peek() Returns the current tombstone without affecting state.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	42f5a7189d	readers: Extract range_tombstone_change_merger	2023-01-27 19:15:39 +01:00
Botond Dénes	1d273a98b9	readers/multishard: shard_reader::close() silence read-ahead timeouts Timeouts are benign, especially on a read-ahead that turned out to be not needed at all. They just introduce noise in the logs, so silence them. Fixes: #12435 Closes #12441	2023-01-04 16:10:09 +02:00
Tomasz Grabiec	23e4c83155	position_in_partition: Make after_key() work with non-full keys This fixes a long standing bug related to handling of non-full clustering keys, issue #1446. after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys.	2022-12-14 14:47:33 +01:00
Botond Dénes	49ec7caf27	mutation_fragment_stream_validator: avoid allocation when stream is correct Currently the ctor of said class always allocates as it copies the provided name string and it creates a new name via format(). We want to avoid this, now that the validator is used on the read path. So defer creating the formatted name to when we actually want to log something, which is either when log level is debug or when an error is found. We don't care about performance in either case, but we do care about it on the happy path. Further to the above, provide a constructor for string literal names and when this is used, don't copy the name string, just save a view to it. Refs: #11174 Closes #12042	2022-11-22 19:19:18 +02:00
Botond Dénes	5c245b4a5e	mutation_fragment_stream_validator: add a 'none' validation level Which, as its name suggests, makes the validating filter not validate anything at all. This validation level can be used effectively to make it so as if the validator was not there at all.	2022-11-11 09:58:44 +02:00
Botond Dénes	f1a039fc2b	treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} We just added a convenience static factory method for partition start, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Tomasz Grabiec	687df05e28	db: make_forwardable::reader: Do not emit range_tombstone_change with position past the range Since the end bound is exclusive, the end position should be before_key(), not after_key(). Affects only tests, as far as I know, only there we can get an end bound which is a clustering row position. Would cause failures once row cache is switched to v2 representation because of violated assumptions about positions. Introduced in `76ee3f029c` Closes #11823	2022-10-24 17:06:52 +03:00
Tomasz Grabiec	9dae2b9c02	Merge 'mutation_fragment_stream_validator: various API improvements' from Botond Dénes The low-level `mutation_fragment_stream_validator` gets `reset()` methods that until now only the high-level `mutation_fragment_stream_validating_filter` had. Active tombstone validation is pushed down to the low level validator. The low level validator, which was a pain to use until now due to being very fussy on which subset of its API one used, is made much more robust, not requiring the user to stick to a subset of its API anymore. Closes #11614 * github.com:scylladb/scylladb: mutation_fragment_stream_validator: make interface more robust mutation_fragment_stream_validator: add reset() to validating filter mutation_fragment_stream_validator: move active tomsbtone validation into low level validator	2022-10-03 16:23:46 +02:00
Avi Kivity	aca96c4103	readers/multishard: restore shard_reader_v2::do_fill_buffer() indentation	2022-09-30 19:19:51 +03:00
Avi Kivity	b08196f3b3	readers/multishard: convert shard_reader_v2::do_fill_buffer() to a pure coroutine do_fill_buffer() is an eclectic mix of coroutines and continuations. That makes it hard to follow what is running sequentially and concurrently. Convert it into a pure coroutine by changing internal continuations to lambda coroutines. These lambda coroutines are guarded with seastar::coroutine::lambda. Furthermore, a future that is co_awaited is converted to immediate co_await (without an intermediate future), since seastar::coroutine::lambda only works if the coroutine is awaited in the same statement it is defined on.	2022-09-30 19:19:48 +03:00
Botond Dénes	895522db23	mutation_fragment_stream_validator: make interface more robust The validator has several API families with increasing amount of detail. E.g. there is an `operator()(mutation_fragment_v2::kind)` and an overload also taking a position. These different API families currently cannot be mixed. If one uses one overload-set, one has to stick with it, not doing so will generate false-positive failures. This is hard to explain in documentation to users (provided they even read it). Instead, just make the validator robust enough such that the different API subsets can be mixed in any order. The validator will try to make the most of the situation and validate as much as possible. Behind the scenes all the different validation methods are consolidated into just two: one for the partition level, the other for the intra-partition level. All the different overloads just call these methods passing as much information as they have. A test is also added to make sure this works.	2022-09-26 13:26:26 +03:00
Botond Dénes	4d017b6d7e	mutation_fragment_stream_validator: add reset() to validating filter Allow the high level filtering validator to be reset() to a certain position, so it can be used in situations where the consumption is not continuous (fast-forwarding or paging).	2022-09-26 10:17:28 +03:00
Botond Dénes	a8cbf66573	mutation_fragment_stream_validator: move active tomsbtone validation into low level validator Currently the active range tombstone change is validated in the high level `mutation_fragment_stream_validating_stream`, meaning that users of the low-level `mutation_fragment_stream_validator` don't benefit from checking that tombstones are properly closed. This patch moves the validation down to the low-level validator (which is what the high-level one uses under the hood too), and requires all users to pass information about changes to the active tombstone for each fragment.	2022-09-26 10:17:27 +03:00
Tomasz Grabiec	ccbfe2ef0d	Merge 'Fix invalid mutation fragment stream issues' from Botond Dénes Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own. The first two of these issues seem benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state. The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans. Closes #11604 * github.com:scylladb/scylladb: shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied readers/mutation_readers: compacting_reader: remember injected partition-end db/view: view_builder::execute(): only inject partition-start if needed	2022-09-22 17:57:27 +02:00
Botond Dénes	0ccb23d02b	shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied Commit `8ab57aa` added a yield to the buffer-copy loop, which means that the copy can yield before it is done and the multishard reader might see the half-copied buffer and consider the reader done (because `_end_of_stream` is already set), resulting in dropping the remaining part of the buffer and in an invalid stream if the last copied fragment wasn't a partition-end. Fixes: #11561	2022-09-22 13:54:36 +03:00
Botond Dénes	16a0025dc3	readers/mutation_readers: compacting_reader: remember injected partition-end Currently injecting a partition-end doesn't update `_last_uncompacted_kind`, which will allow for a subsequent `next_partition()` call to trigger injecting a partition-end, leading to an invalid mutation fragment stream (partition-end after partition-end). Fix by changing `_last_uncompacted_kind` to `partition_end` when injecting a partition-end, making subsequent injection attempts noop. Fixes: #11608	2022-09-22 13:54:36 +03:00
Botond Dénes	ef7471c460	readers/mutation_reader: stream validator: fix log level detection logic The mutation fragment stream validator filter has a detailed debug log in its constructor. To avoid putting together this message when the log level is above debug, it is enclosed in an if, activated when log level is debug or trace... at least that was intended. Actually the if is activated when the log level is debug or above (info, warn or error) but is only actually logged if the log level is exactly debug. Fix the logic to work as intended. Closes #11603	2022-09-22 09:41:45 +03:00
Benny Halevy	8b38893895	mutation_compactor: pass tombstone_gc_state to compact_mutation_state Used in get_gc_before. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	7e4612d3aa	mutation_readers: pass tombstone_gc_state to compating_reader To be passed further down to `compact_mutation_state` in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:14 +03:00
Raphael S. Carvalho	5757cc5160	mutation_reader_merger: fix indentation Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220803003010.11551-1-raphaelsc@scylladb.com>	2022-08-03 14:33:07 +03:00
Raphael S. Carvalho	934af9be52	mutation_reader_merger: Drop unneeded readers as soon as possible Today, mutation_reader_merger drops unneeded readers in batches of 4, meaning that the merger is having to keep the memory used by 3 unneeded readers in addition to the ones being currently read from. As each may own a lot of memory, the combined effect of this waste, coming from parallel reads, can potentially cause memory pressure. This batching behavior was introduced in `b524f96a74`, when readers had to be destroyed synchronously, as flat_mutation_reader lacked an async close interface. But we have gone a long way since then. Readers can be closed asynchronously and outstanding I/O requests will be cancelled on close. Now, we'll close readers as soon as they're unneeded, one at a time, using a continuation chain. If we're submitting close calls faster than we can retire them, then we wait for their completion, preventing memory usage from growing unbounded. The benefit of this new approach will be very good when combining disjoint readers, where only one is active at a time for producing fragments. As soon as we're done with the current one, then it will be closed allowing its memory to be released, before we move on to the next reader that follows. Refs #11040. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11167	2022-08-01 20:06:29 +03:00
Botond Dénes	4d2ce5c304	mutation_compactor: remove emit_only_live_rows template parameter Now that we use emit_only_live_rows::no everywhere we can remove this template parameter. Only the template parameter is removed, the internal logic around it is left in place (will be removed in a next patch), by hard-wiring `only_live()`.	2022-07-12 08:43:49 +03:00
David Garcia	bb21c3c869	Move dev docs to docs/dev	2022-06-24 18:07:08 +01:00
Michael Livshin	029508b77c	flat_mutation_reader ist tot Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00


105 Commits
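
The abort-polling pattern described in commit 2694aa1078 (a timer stores an exception in `_ex`, and callers poll `check_abort()`) can be sketched roughly as follows. This is a minimal standalone illustration, not ScyllaDB's code: `permit_sketch`, `on_timeout()` and `timed_out_read_is_aborted()` are hypothetical names, the timer is replaced by a manual call, and `std::runtime_error` stands in for whatever exception type the real permit uses.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for a permit; NOT ScyllaDB's reader_permit.
class permit_sketch {
    std::exception_ptr _ex; // stays null until the read is aborted
public:
    // In the commit a timeout timer does this when it fires;
    // in this sketch it is simply called by hand.
    void on_timeout() {
        _ex = std::make_exception_ptr(std::runtime_error("read timed out"));
    }

    // Callers poll this instead of comparing a deadline to the clock.
    // It rethrows the stored exception, if there is one.
    void check_abort() const {
        if (_ex) {
            std::rethrow_exception(_ex);
        }
    }
};

// Returns true if check_abort() is a no-op before the timeout
// and throws the stored exception after it.
bool timed_out_read_is_aborted() {
    permit_sketch p;
    p.check_abort(); // nothing stored yet, so this does not throw
    p.on_timeout();
    try {
        p.check_abort();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Because the stored value is a generic `std::exception_ptr`, the same polling call could carry abort reasons other than timeouts, which matches the motivation given in the commit message.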