Currently the ctor of said class always allocates, as it copies the
provided name string and creates a new name via format().
We want to avoid this, now that the validator is used on the read path.
So defer creating the formatted name to when we actually want to log
something, which is either when the log level is debug or when an error
is found. We don't care about performance in either of those cases, but
we do care about it on the happy path.
Further to the above, provide a constructor for string-literal names;
when it is used, don't copy the name string, just save a view to it.
Refs: #11174
Closes #12042
Which, as its name suggests, makes the validating filter not validate
anything at all. This validation level can be used to effectively
disable the validator, as if it were not there at all.
Since the end bound is exclusive, the end position should be
before_key(), not after_key().
Affects only tests; as far as I know, only there can we get an end
bound which is a clustering row position.
Would cause failures once row cache is switched to the v2
representation, because of violated assumptions about positions.
Introduced in 76ee3f029c
Closes #11823
The low-level `mutation_fragment_stream_validator` gets `reset()` methods that until now only the high-level `mutation_fragment_stream_validating_filter` had.
Active tombstone validation is pushed down to the low level validator.
The low-level validator, which was a pain to use until now due to being very fussy about which subset of its API one used, is made much more robust, no longer requiring the user to stick to a subset of its API.
Closes #11614
* github.com:scylladb/scylladb:
mutation_fragment_stream_validator: make interface more robust
mutation_fragment_stream_validator: add reset() to validating filter
mutation_fragment_stream_validator: move active tombstone validation into low level validator
do_fill_buffer() is an eclectic mix of coroutines and continuations.
That makes it hard to follow what runs sequentially and what runs
concurrently.
Convert it into a pure coroutine by changing internal continuations
to lambda coroutines. These lambda coroutines are guarded with
seastar::coroutine::lambda. Furthermore, a future that is co_awaited
is converted to an immediate co_await (without an intermediate future),
since seastar::coroutine::lambda only works if the coroutine is awaited
in the same statement it is defined in.
The validator has several API families with increasing amount of detail.
E.g. there is an `operator()(mutation_fragment_v2::kind)` and an
overload also taking a position. These different API families
currently cannot be mixed: if one uses one overload-set, one has to
stick with it, otherwise false-positive failures are generated.
This is hard to explain in documentation to users (provided they even
read it). Instead, just make the validator robust enough that the
different API subsets can be mixed in any order. The validator will
make the most of the situation and validate as much as possible.
Behind the scenes all the different validation methods are consolidated
into just two: one for the partition level, the other for the
intra-partition level. All the different overloads just call these
methods passing as much information as they have.
A test is also added to make sure this works.
Allow the high level filtering validator to be reset() to a certain
position, so it can be used in situations where the consumption is not
continuous (fast-forwarding or paging).
Currently the active range tombstone change is validated in the high-level
`mutation_fragment_stream_validating_filter`, meaning that users of
the low-level `mutation_fragment_stream_validator` don't benefit from
checking that tombstones are properly closed.
This patch moves the validation down to the low-level validator (which
is what the high-level one uses under the hood too), and requires all
users to pass information about changes to the active tombstone for each
fragment.
Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own.
The first two of these issues seem benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problems now, but any future change may cause it to enter a corrupt state.
The last one is a non-benign problem (the consumer already reacts badly), causing problems when building query results for range scans.
Closes #11604
* github.com:scylladb/scylladb:
shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
readers/mutation_readers: compacting_reader: remember injected partition-end
db/view: view_builder::execute(): only inject partition-start if needed
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before it is done, and the multishard reader might
see the half-copied buffer and consider the reader done (because
`_end_of_stream` is already set), resulting in dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.
Fixes: #11561
Currently injecting a partition-end doesn't update
`_last_uncompacted_kind`, which allows a subsequent
`next_partition()` call to inject another partition-end, leading to
an invalid mutation fragment stream (partition-end after partition-end).
Fix by setting `_last_uncompacted_kind` to `partition_end` when
injecting a partition-end, making subsequent injection attempts a noop.
Fixes: #11608
The mutation fragment stream validating filter has a detailed debug log
in its constructor. To avoid putting together this message when the log
level is above debug, it is enclosed in an if, meant to activate when
the log level is debug or trace... at least that was the intention. In
reality the if activates when the log level is debug or above (info,
warn or error), yet the message is only actually logged if the log
level is exactly debug. Fix the logic to work as intended.
Closes #11603
Today, mutation_reader_merger drops unneeded readers in batches of 4,
meaning that the merger has to keep the memory used by up to 3
unneeded readers in addition to the ones being currently read from.
As each may own a lot of memory, the combined effect of this waste,
coming from parallel reads, can potentially cause memory pressure.
This batching behavior was introduced in b524f96a74,
when readers had to be destroyed synchronously, as flat_mutation_reader
lacked an async close interface. But we have gone a long way since
then. Readers can be closed asynchronously and outstanding I/O
requests will be cancelled on close.
Now, we close readers as soon as they're unneeded, one at a time,
using a continuation chain. If we're submitting close calls faster
than we can retire them, we wait for their completion,
preventing memory usage from growing unbounded.
This new approach is most beneficial when combining
disjoint readers, where only one is active at a time producing
fragments. As soon as we're done with the current one, it is
closed, allowing its memory to be released before we move on
to the reader that follows.
Refs #11040.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #11167
Now that we use emit_only_live_rows::no everywhere we can remove this
template parameter. Only the template parameter is removed; the
internal logic around it is left in place (to be removed in a later
patch) by hard-wiring `only_live()`.
At this point, none of the remaining uses of
`flat_mutation_reader` (all of which are results of calling
`downgrade_to_v1()` anyway) actually need a full-featured flat
mutation reader with its own separate buffer etc.
`mutation_fragment_v1_stream` can only be constructed by wrapping a
`flat_mutation_reader_v2`. It contains enough functionality for the
remaining consumers of `mutation_fragment_v1` sources and unit tests,
and no more, and it does not buffer.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
The handle must not point at this reader implementation
after it's destroyed.
This fixes a use-after-free when the queue_reader_v2
is destroyed first, as repair_writer_impl::_queue_reader,
before repair_writer_impl::_mq is destroyed.
The issue was introduced in 39205917a8
in the definition of `repair_writer_impl`.
Fixes #10528
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series gets rid of the remaining usage of flat_mutation_reader v1 in compaction
Test: sstable_compaction_test
Closes #10454
* github.com:scylladb/scylla:
compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader: get rid of class filter
compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter
The only user is the tests of downgrade_to_v1(), which use it through
mutation source. To avoid any new users popping up, we make it a private
method of the latter. In the process the pass-through optimization is
dropped; it is not needed for tests anyway.
The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls; we will deal with all of these later,
at once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
"
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
This means there is still conversion going on when reading from cache
and when populating, but when reading from underlying, the stream can
now be passed through as-is, without conversions.
Also, any future v2 related changes to the in-memory storage will now be
limited to the cache reader implementation itself.
In the process, the non-forwarding reader, whose only user is the cache,
is also converted to v2.
"
Performance results reported by Botond:
"
build/release/test/perf/perf_simple_query -c1 -m2G --flush --
duration=20
BEFORE
median 130421.76 tps ( 71.1 allocs/op, 12.1 tasks/op, 47462
insns/op)
median absolute deviation: 319.64
maximum: 131028.33
minimum: 127502.55
AFTER
median 133297.41 tps ( 64.1 allocs/op, 12.2 tasks/op, 45406
insns/op)
median absolute deviation: 2964.24
maximum: 137581.56
minimum: 123739.4
Getting rid of those upgrades/downgrades was good for allocs and instructions.
Curiously, there is a 0.1 rise in the number of tasks per op though.
"
* 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla:
row_cache: update reader implementations to v2
range_tombstone_change_generator: flush(): add end_of_range
readers/nonforwardable: convert to v2
read_context: fix indentation
read_context: coroutinize move_to_next_partition()
row_cache: cache_entry::read(): return v2 reader
row_cache: return v2 readers from make_reader*()
readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
It has a single user, the row cache, which for now has to
upgrade/downgrade around the nonforwardable reader, but this will go
away in the next patches when the row cache readers are converted to v2
proper.
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming into the mutation_fragment_queue can be v1, while the underlying queue_reader receives mutation_fragment_v2, eliminating the last usage of queue_reader (v1). The last commit removes queue_reader, queue_reader_handle and the associated factory functions.
tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)
Closes #10371
* github.com:scylladb/scylla:
readers: Remove queue_reader v1 and associated code.
repair: Make mutation_fragment_queue internally upgrade fragments to v2
repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It is typical in switch statements to select on an enum type and
rely on the compiler to complain if an enum value was missed. But
gcc isn't satisfied, since the enum could hold a value outside the
declared list. Call abort() in this impossible situation to pacify
it.
Extract the common parts of the single mutation reader
and the vector-based variant into mutation_reader_base
and reuse from both readers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
slice_mutations() is currently used only by readers/mutation_readers.cc
so there's no need to expose it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing range tombstones up to
position_in_partition::after_all_clustered_rows(),
the range_tombstone_change_generator now emits
the closing range_tombstone_change, so there's
no need for the upgrading_consumer to do so too.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There are at least one actual and one potential user for it; this
change converts the existing one.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests
This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.
With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.
Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"
* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
readers: remove now unused v1 reader from mutations
test: move away from v1 reader from mutations
test/boost/mutation_reader_test: use fragment_scatterer
test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
test/boost: mutation_fragment_test: refactor fragment_scatterer
readers: remove now unused v1 reversing reader
test/boost/flat_mutation_reader_test: convert to v2
frozen_mutation: fragment_and_freeze(): convert to v2
frozen_mutation: coroutinize fragment_and_freeze()
readers: migrate away from v1 reversing reader
db/virtual_table: use v2 variant of reversing and forwardable readers
replica/table: use v2 variant of reversing reader
sstables/sstable: remove unused make_crawling_reader_v1()
sstables/sstable: remove make_reader_v1()
readers: add v2 variant of reversing reader
readers/reversing: remove FIXME
readers: reader from mutations: use mutation's own schema when slicing
The only internal user is the v1 reader-from-mutations; we use a
downgrade/upgrade pair to be able to use the v2 reversing reader there.
This is ugly, but the v1 reader from mutations is going away soon too,
so it is not a real problem.