scylladb

Author	SHA1	Message	Date
Botond Dénes	0aa03f85a3	readers/multishard: reader_lifecycle_policy: add get_read_range() Allows retrieving the current read-range for the reader on the given shard (where the method is called).	2023-03-24 08:40:11 -04:00
Botond Dénes	1f51f752cc	reader_permit: refresh trace_state on new pages To make sure all tracing done on a certain page will make its way into the appropriate trace session. This is a continuation of the previous patch (which added trace pointer to the permit).	2023-03-22 04:58:10 -04:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Pavel Emelyanov	9cd1f777a5	database.hh: Remove unused headers Use forward declarations when needed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11667	2022-10-04 09:01:38 +03:00
Botond Dénes	7730419f5c	query-result-writer: stop when tombstone-limit is reached The query result writer now counts tombstones and cuts the page (marking it as a short one) when the tombstone limit is reached. This is to avoid timing out on large span of tombstones, especially prefixes. In the case of unpaged queries, we fail the read instead, similarly to how we do with max result size. If the limit is 0, the previous behaviour is used: tombstones are not taken into consideration at all.	2022-08-10 06:03:38 +03:00
Benny Halevy	c71ef330b2	query-request, everywhere: define and use query_id as a strong type Define query_id as a tagged_uuid So it can be differentiated from other uuid-class types. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:13:28 +03:00
Botond Dénes	70b4158ce0	mutation_compactor: detach_state(): make it no-op if partition was exhausted detach_state() allows the user to resume a compaction process later, without having to keep the compactor object alive. This happens by generating and returning the mutation fragments the user has to re-feed to a newly constructed compactor to bring it into the exact same state the current compactor was at the point of stopping the compaction. This state includes the partition-header (partition-start and static-row if any) and the currently active range tombstone. Detaching the state is pointless however when the compaction was stopped such that the currently compacted partition was completely exhausted. Allowing the state to be detached in this case seems benign but it caused a subtle bug in the main user of this feature: the partition range scan algorithm, where the fragments included in the detached state were pushed back into the reader which produced them. If the partition happened to be exhausted -- meaning the next fragment in the reader was a partition-start or EOS -- this resulted in the partition being re-emitted later without a partition-end, resulting in corrupt query-result being generated, in turn resulting in an obscure "IDL frame truncated" error. This patch solves this seemingly benign but sinister bug by making the return value of `detach_state()` an std::optional and returning a disengaged optional when the partition was exhausted.	2022-08-02 06:43:24 +03:00
Botond Dénes	cdd3a364cb	querier: use full_position in shard_mutation_querier Instead of a separate partition key and position-in-partition. This continues the recently started effort to standardize storing of full positions on `full_position`. This patch is also a hidden preparation for read_context::save_readers() (multishard_mutation_query.cc) no longer being able to get partition key from compaction state in the future.	2022-08-02 06:43:24 +03:00
Avi Kivity	00cec159d6	Revert "Merge 'multishard_mutation_query: don't unpop partition header of spent partition' from Botond Dénes" This reverts commit `c3bad157e5`, reversing changes made to `e66809d051`. The checks it adds are triggered by some dtests. While it's possible the check is triggered due to an existing problem, better to investigate it out-of-tree. Fixes #11169.	2022-07-31 15:24:33 +03:00
Botond Dénes	f119554106	mutation_compactor: detach_state(): make it no-op if partition was exhausted detach_state() allows the user to resume a compaction process later, without having to keep the compactor object alive. This happens by generating and returning the mutation fragments the user has to re-feed to a newly constructed compactor to bring it into the exact same state the current compactor was at the point of stopping the compaction. This state includes the partition-header (partition-start and static-row if any) and the currently active range tombstone. Detaching the state is pointless however when the compaction was stopped such that the currently compacted partition was completely exhausted. Allowing the state to be detached in this case seems benign but it caused a subtle bug in the main user of this feature: the partition range scan algorithm, where the fragments included in the detached state were pushed back into the reader which produced them. If the partition happened to be exhausted -- meaning the next fragment in the reader was a partition-start or EOS -- this resulted in the partition being re-emitted later without a partition-end, resulting in corrupt query-result being generated, in turn resulting in an obscure "IDL frame truncated" error. This patch solves this seemingly benign but sinister bug by making the return value of `detach_state()` an std::optional and returning a disengaged optional when the partition was exhausted.	2022-07-28 09:02:26 +03:00
Botond Dénes	afa694a20c	querier: use full_position in shard_mutation_querier Instead of a separate partition key and position-in-partition. This continues the recently started effort to standardize storing of full positions on `full_position`. This patch is also a hidden preparation for read_context::save_readers() (multishard_mutation_query.cc) no longer being able to get partition key from compaction state in the future.	2022-07-28 08:19:23 +03:00
Botond Dénes	ac9935b645	multishard_mutation_query: remove now pointless compact_for_result_state typedef No need to switch on the now defunct emit_only_live_rows.	2022-07-12 08:44:33 +03:00
Botond Dénes	4d2ce5c304	mutation_compactor: remove emit_only_live_rows template parameter Now that we use emit_only_live_rows::no everywhere we can remove this template parameter. Only the template parameter is removed, the internal logic around it is left in place (will be removed in a next patch), by hard-wiring `only_live()`.	2022-07-12 08:43:49 +03:00
Botond Dénes	bedc82e52c	tree: use emit_only_live_rows::no emit_only_live_rows is a convenience so downstream consumers of the mutation compactors don't have to check the `bool is_live` already passed to them. This convenience however causes a template parameter and additional logic for the compactor. As the most prominent of these consumers (the query result builder) will soon have to switch to emit_only_live_rows::no for other reasons anyway (it will want to count tombstones), we take the opportunity to switch everybody to ::no. This can be done with very little additional complexity to these consumers -- basically an additional if or two. This prepares the ground for removing this template parameter and the associated logic from the compactor.	2022-07-12 08:41:51 +03:00
Botond Dénes	742dc10185	querier: querier_cache: de-override insert() methods Soon, the currently two distinct types of queriers will be merged, as the template parameter differentiating them will be gone. This will make using type based overload for insert() impossible, as 2 out of the 3 types will be the same. Use different names instead.	2022-07-12 08:41:48 +03:00
Botond Dénes	fd5f8f2275	query: have replica provide the last position Use the recently introduced query-result facility to have the replica set the position where the query should continue from. For now this is the same as what the implicit position would have been previously (last row in result), but it opens up the possibility to stop the query at a dead row.	2022-06-23 13:36:24 +03:00
Botond Dénes	7b6b7a49cd	multishard_mutation_query: propagate compaction state to result builder Not used in this patch, facilitates further patching.	2022-06-23 13:36:24 +03:00
Botond Dénes	738cb99c53	multishard_mutation_query: defer creating result builder until needed Currently the result builder is created two frames above the method in which it is actually needed. Push down a factory method instead and create it where actually used. This allows us to pass it arguments that are present only in the method which uses it.	2022-06-23 13:36:24 +03:00
Botond Dénes	58d53b66c1	querier: rely on compactor for position tracking For some time now the compactor tracks its own position. The querier can make use of this instead of duplicating this effort.	2022-06-23 13:36:24 +03:00
Benny Halevy	5babc609c6	multishard_mutation_query: do_query: coroutinize save_readers lambda To keep it simple. It is unlikely to throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:31:17 +03:00
Benny Halevy	921092955b	multishard_mutation_query: do_query: prevent exceptions using coroutine::as_future Optimize error handling by preventing exception try/catch using coroutine::as_future. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:31:17 +03:00
Benny Halevy	7a76ba4038	multishard_mutation_query: read_page: prevent exceptions using coroutine::as_future Optimize error handling by preventing exception try/catch using coroutine::as_future to get query::consume_page's result. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:31:15 +03:00
Benny Halevy	817a0f316a	multishard_mutation_query: save_readers: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:23:14 +03:00
Benny Halevy	804d727b8b	multishard_mutation_query: coroutinize save_readers And use smp::invoke_on_all rather than a home-brewed version of parallel_for_each over all shard ids. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:23:14 +03:00
Benny Halevy	22e5352cc2	multishard_mutation_query: lookup_readers: make noexcept So it can be co_awaited efficiently using coroutine::as_future; otherwise, any exceptions will escape `as_future`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:23:14 +03:00
Benny Halevy	ea3935507e	multishard_mutation_query: optimize lookup_readers No need to call _db.invoke_on inside a parallel_for_each loop over all shards. Just use _db.invoke_on_all instead. Besides that, there's no need for a .then continuation for assigning the per-shard reader in _readers[shard]. It can be done by the functor running on each db shard. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 09:23:14 +03:00
Benny Halevy	055141fc2e	multishard_mutation_query: do_query: stop ctx if lookup_readers fails lookup_readers might fail after populating some readers and those better be closed before returning the exception. Fixes #10351 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10425	2022-04-26 11:11:52 +03:00
Botond Dénes	d0ea895671	readers: move multishard reader & friends to reader/multishard.cc Since the multishard reader family weighs more than 1K SLOC, it gets its own .cc file.	2022-03-30 15:42:51 +03:00
Botond Dénes	0b5217052d	querier: switch to v2 compactor output The change is mostly mechanical: update all compactor instances to the _v2 variant and update all call-sites, of which there are not that many. As a consequence of this patch, queries -- both single-partition and range-scans -- now do the v2->v1 conversion in the consumers, instead of in the compactor.	2022-03-11 09:24:05 +02:00
Pavel Emelyanov	063da81ab7	code: Convert nothrow construction assertions into concepts The small_vector also has N>0 constraint that's also converted Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-24 19:44:50 +03:00
Botond Dénes	f1e9e3b3b7	compact_mutation: drop support for v1 input	2022-02-21 12:29:24 +02:00
Botond Dénes	f2e2b84038	multishard_mutation_query: migrate to v2 Mostly mechanical transformation. The main difference is in the detached compaction state, from which we now get the range tombstone change, instead of the range tombstone list. The code around this is a bit awkward, will become simpler when compactor drops v1 support.	2022-02-21 12:29:24 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	134601a15e	Merge "Convert input side of mutation compactor to v2" from Botond " With this series the mutation compactor can now consume a v2 stream. On the output side it still uses v1, so it can now act as an online v2->v1 converter. This allows us to push out v2->v1 conversion to as far as the compactor, usually the next to last component in a read pipeline, just before the final consumer. For reads this is as far as we can go, as the intra-node ABI and hence the result-sets built are v1. For compaction we could go further and eliminate conversion altogether, but this requires some further work on both the compactor and the sstable writer and so it is left to be done later. To summarize, this patchset enables a v2 input for the compactor and it updates compaction and single partition reads to use it. " * 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla: table: add make_reader_v2() querier: convert querier_cache and {data,mutation}_querier to v2 compaction: upgrade compaction::make_interposer_consumer() to v2 mutation_reader: remove unnecessary stable_flattened_mutations_consumer compaction/compaction_strategy: convert make_interposer_consumer() to v2 mutation_writer: migrate timestamp_based_splitting_writer to v2 mutation_writer: migrate shard_based_splitting_writer to v2 mutation_writer: add v2 clone of feed_writer and bucket_writer flat_mutation_reader_v2: add reader_consumer_v2 typedef mutation_reader: add v2 clone of queue_reader compact_mutation: make start_new_page() independent of mutation_fragment version compact_mutation: add support for consuming a v2 stream compact_mutation: extract range tombstone consumption into own method range_tombstone_assembler: add get_range_tombstone_change() range_tombstone_assembler: add get_current_tombstone()	2022-01-12 14:37:19 +02:00
Botond Dénes	790e73141f	compact_mutation: add support for consuming a v2 stream Consuming either a v1 or v2 stream is supported now, but compacted fragments are still emitted in the v1 format, thus the compactor acts as an online downgrader when consuming a v2 stream. This allows pushing out downgrade to v1 on the input side all the way into the compactor. This means that reads for example can now use an all v2 reader pipeline, the still mandatory downgrade to v1 happening at the last possible place: just before creating the result-set. Mandatory because our intra-node ABI is still v1. There are consumers who are ready for v2 in principle (e.g. compaction), they have to wait a little bit more.	2022-01-07 13:42:31 +02:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Michael Livshin	a1b8ba23d2	reader_concurrency_semaphore: convert to flat_mutation_reader_v2 Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2021-12-21 11:26:17 +02:00
Raphael S. Carvalho	c3c23dd1e5	multishard_mutation_query: make multi_range_reader::fill_buffer() work even after EOS If fill_buffer() is called after EOS, the underlying reader will be fast forwarded to a range pointed to by an invalid iterator, producing incorrect results. fill_buffer() is changed to return early if EOS was found, meaning that the underlying reader was already fast forwarded to all ranges managed by multi_range_reader. Usually, consume facilities check for EOS before calling fill_buffer(), but most reader implementations check for EOS to avoid correctness issues. Let's do the same here. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211208131423.31612-1-raphaelsc@scylladb.com>	2021-12-08 15:39:11 +02:00
Botond Dénes	5380cb0102	multishard_mutation_query: don't drop data during stateful multi-range reads When multiple ranges are passed to `multishard_{mutation,data}_query()`, it wraps the multishard reader with a multi-range one. This interferes with the disassembly of the multishard reader's buffer at the end of the page, because the multi-range reader becomes the top-level reader, denying direct access to the multishard reader itself, whose buffer is then dropped. This confuses the reading logic, causing data corruption on the next page(s). A further complication is that the multi-range reader can include data from more than one range in its buffer when filling it. To solve this, a special-purpose multi-range reader is introduced and used instead of the generic one, which solves both these problems by guaranteeing that: * Upon calling fill_buffer(), the entire content of the underlying multishard reader is moved to that of the top-level multi-range reader. So calling `detach_buffer()` guarantees to remove all unconsumed fragments from the top-level readers. * fill_buffer() will never mix data from more than one range. It will always stop on range boundaries and will only cross if the last range was consumed entirely. With this, multi-range reads finally work with reader-saving.	2021-12-03 10:45:06 +02:00
Botond Dénes	953603199e	multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward The reader_lifecycle_policy API was created around the idea of shard readers (optionally) being saved and reused on the next page. To do this, the lifecycle policy has to also be able to control the lifecycle of by-reference parameters of readers: the slice and the range. This was possible from day 1, as the readers are created through the lifecycle policy, which can intercept and replace the said parameters with copies that are created in stable storage. There was one hole in the design though: fast-forwarding, which can change the range of the read, without the lifecycle policy knowing about this. In practice this results in fast-forwarded readers being saved together with the wrong range, their range reference becoming stale. The only lifecycle implementation prone to this is the one in `multishard_mutation_query.cc`, as it is the only one actually saving readers. It will fast-forward its reader when the query happens over multiple ranges. There were no problems related to this so far because no one passes more than one range to said functions, but this is incidental. This patch solves this by adding an `update_read_range()` method to the lifecycle policy, allowing the shard reader to update the read range when being fast forwarded. To allow the shard reader to also have control over the lifecycle of this range, a shared pointer is used. This control is required because when an `evictable_reader` is the top-level reader on the shard, it can invoke `create_reader()` with an edited range after `update_read_range()`, replacing the fast-forwarded-to range with a new one, yanking it out from under the feet of the evictable reader itself. By using a shared pointer here, we can ensure the range stays alive while it is the current one.	2021-12-03 10:27:44 +02:00
Botond Dénes	3210dee4a6	multishard_mutation_query: fix reverse scans The read itself has to be done with the reversed schema (query schema) but the result building has to be done with the table schema. For data queries this doesn't matter, but replicate the distinction for consistency (and because this might change).	2021-11-23 14:22:01 +02:00
Tomasz Grabiec	cc56a971e8	database, treewide: Introduce partition_slice::is_reversed() Cleanup, reduces noise. Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>	2021-10-14 12:39:16 +03:00
Botond Dénes	42b677ef6f	querier: consume_page(): remove now unused max_size parameter	2021-09-29 12:15:48 +03:00
Botond Dénes	41facb3270	treewide: move reversing to the mutation sources Push down reversing to the mutation-sources proper, instead of doing it on the querier level. This will allow us to test reverse reads on the mutation source level. The `max_size` parameter of `consume_page()` is now unused but is not removed in this patch, it will be removed in a follow-up to reduce churn.	2021-09-29 12:15:45 +03:00
Botond Dénes	22e216563a	multishard_mutation_query: set max result size on used permits `08042c1688` added the query max result size to the permit but only set it for single partition queries. This patch does the same for range-scans in preparation of `query::consume_page()` not propagating max size soon.	2021-09-28 17:03:57 +03:00
Botond Dénes	922295dd8e	multishard_mutation_query: add tracepoint with compaction stats Add the content of the compaction stats introduced in the previous patch to the tracing data. This will help diagnose query performance related problems caused by tombstones.	2021-09-22 14:00:24 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00

145 Commits