scylladb

Author	SHA1	Message	Date
Ernest Zaslavsky	d624413ddd	treewide: Move query related files to a new `query` directory As requested in #22120, moved the files and fixed other includes and build system. Moved files: - query.cc - query-request.hh - query-result.hh - query-result-reader.hh - query-result-set.cc - query-result-set.hh - query-result-writer.hh - query_id.hh - query_result_merger.hh Fixes: #22120 This is a cleanup, no need to backport Closes scylladb/scylladb#25105	2025-09-16 23:40:47 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Łukasz Paszkowski	b3bf555036	Fix comments referring to half-reversed (legacy) slices	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	da95f44adc	readers: Use reversed schema and native reversed slices The reconcilable_result is built as it would be constructed for forward read queries for tables with reversed order. Mutations constructed for reversed queries are consumed forward. Drop overloaded reversed functions that reverse read_command and reconcilable_result directly and keep only those requiring smart pointers. They are not used any more.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	fbd324b5cd	mutation_query: Add reversed function to reverse reconcilable_result The reconcilable_result is reversed by reversing mutations for all partitions it holds. Reversing is asynchronous to avoid potential stall. Use for transitions between legacy and native formats and in order to support mixed-nodes clusters.	2024-08-13 10:03:46 +02:00
Pavel Emelyanov	d90db016bf	treewide: Use partition_slice::is_reversed() Continuation of `cc56a971e8`, more noisy places detected Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17763	2024-03-13 08:52:46 +02:00
Botond Dénes	35e6cbf42e	mutation_query: reconcilable_result: add merge_disjoint() Merging two disjoint reconcilable_result instances.	2024-02-21 02:08:48 -05:00
Kefu Chai	c937827308	mutation_query: add formatter for reconcilable_result::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for reconcilable_result::printer, and remove its operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16186	2023-11-26 20:20:50 +02:00
Michał Chojnowski	002357e238	mutation_query: properly send range tombstones in reverse queries reconcilable_result_builder passes range tombstone changes to _rt_assembler using table schema, not query schema. This means that a tombstone with bounds (a; b), where a < b in query schema but a > b in table schema, is not emitted from mutation_query. This is a very serious bug, because it means that such tombstones in reverse queries are not reconciled with data from other replicas. If any queried replica has a row, but not the range tombstone which deleted the row, the reconciled result will contain the deleted row. In particular, range deletes performed while a replica is down, will not later be visible to reverse queries which select this replica, regardless of the consistency level. As far as I can see, this doesn't result in any persistent data loss. Only in that some data might appear resurrected to reverse queries, until the relevant range tombstone is fully repaired.	2023-11-08 14:54:48 +01:00
Tomasz Grabiec	8b7623f49e	database: Fix accounting of small partitions in mutation query The partition key size was ignored by the accounter, as well as the partition tombstone. As a result, a sequence of partitions with just tombstones would be accounted as taking no memory and the page size limiter would not kick in. Fix by accounting the real size of accumulated frozen_mutation. Also, break pages across partitions even if there are no live rows. The coordinator can handle it now. Refs #7933	2023-09-22 02:53:14 -04:00
Botond Dénes	7e7101c180	Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes" This reverts commit `628e6ffd33`, reversing changes made to `45ec76cfbf`. The test included with this PR is flaky and often breaks CI. Revert while a fix is found. Fixes: #15371	2023-09-13 10:45:37 +03:00
Tomasz Grabiec	0d773c9f9f	database: Fix accounting of small partitions in mutation query The partition key size was ignored by the accounter, as well as the partition tombstone. As a result, a sequence of partitions with just tombstones would be accounted as taking no memory and the page size limiter would not kick in. Fix by accounting the real size of accumulated frozen_mutation. Also, break pages across partitions even if there are no live rows. The coordinator can handle it now. Refs #7933	2023-09-11 06:56:13 -04:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined; the language now understands that the comparison is symmetric in the new standard. fortunately, our operator!=() is always equivalent to `! operator==()`, this matches the behavior of the default generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the defaulted operator!=, C++20 also brings to us the defaulted operator==() -- it is able to generate the operator==() as the member-wise lexicographical comparison. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is implemented using the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we fail to mark the operator== with the `const` specifier, in this change, to fulfil the need of C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but it is not always the case, in the class of `version`, we use `version` as the parameter type, to fulfill the need of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantic of the comparison operator. and is a more idiomatic way to pass non-trivial struct as function parameters. please note, because in C++20, both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed. they are the symmetric form of the another variant. if they were not removed, compiler would, for instance, find ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Benny Halevy	c9612855c7	query: coroutinize to_data_query_result Reduce stalls by maybe yielding in-between partitions, and by awaiting unfreeze_gently where possible. Refs #10038 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-05 13:32:25 +03:00
Botond Dénes	0632114a9b	reconcilable_result_builder: remove v1 support Amounts to making the range tombstone consume() overload private. It is still used internally to consume the downgraded (from v2) range tombstones.	2022-03-11 09:24:46 +02:00
Botond Dénes	4629f7d7b5	reconcilable_result_builder: add v2 support Add a `consume()` overload for range tombstone changes and convert them internally to range tombstones, as the underlying reconcilable result is still v1.	2022-03-11 09:24:05 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	c7619de929	mutation_query: reconcilable_result_builder: document reverse query preconditions	2021-09-28 17:03:57 +03:00
Botond Dénes	502a45ad58	treewide: switch to native reversed format for reverse reads We define the native reverse format as a reversed mutation fragment stream that is identical to one that would be emitted by a table with the same schema but with reversed clustering order. The main difference to the current format is how range tombstones are handled: instead of looking at their start or end bound depending on the order, we always use them as-usual and the reversing reader swaps their bounds to facilitate this. This allows us to treat reversed streams completely transparently: just pass a reversed schema along with them and all the reader, compacting and result building code is happily ignorant about the fact that it is a reversed stream.	2021-09-09 15:42:15 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Botond Dénes	54edd613c8	mutation_query: remove now unused mutation_query() If somebody wants to query a generic mutation source in the future, they can still do it via `mutation_querier::consume_page()` and the right result builder.	2021-04-09 13:40:27 +03:00
Botond Dénes	a4facf316d	query: remove the now unused data_query() If somebody wants to query a generic mutation source in the future, they can still do it via `data_querier::consume_page()` and the right result builder.	2021-04-09 13:40:27 +03:00
Benny Halevy	57540dae42	mutation_query: mark reconcilable_result_builder constructor noexcept With result_memory_accounter being nothrow move constructible, reconcilable_result_builder does not throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210215101254.480228-67-bhalevy@scylladb.com>	2021-02-17 18:56:12 +02:00
Botond Dénes	821ed96e0e	mutation_query: introduce query_mutation() This is a replacement of `mutation::query()`, but with an implementation based on the standard query result building code. This will allow us to migrate the remaining `mutation::query()` users off of said method, which in turn will allow us to retire it finally.	2021-01-22 15:27:48 +02:00
Botond Dénes	c4f12221b8	mutation_query: to_data_query_result(): migrate to standard query code Reimplement in terms of the standard query result building code. We want to retire the alternative query result code in `mutation::query()` and `to_data_query_result()` is one of the main users.	2021-01-22 15:27:48 +02:00
Botond Dénes	f097bf3005	mutation_query: mutation_query_stage: add get_stats()	2020-11-17 15:13:21 +02:00
Avi Kivity	9421cfded4	reconcilable_result_builder: don't aggravate out-of-memory condition during recovery Consider an unpaged query that consumes all of available memory, despite `fea5067dfa` which limits them (perhaps the user raised the limit, or this is a system query). Eventually we will see a bad_alloc which will abort the query and destroy this reconcilable_result_builder. During destruction, we first destroy _memory_accounter, and then _result. Destroying _memory_accounter resumes some continuations which can then allocate memory synchronously when increasing the task queue to accommodate them. We will then crash. Had we not crashed, we would immediately afterwards release _result, freeing all the memory that we would ever need. Fix by making _result the last member, so it is freed first. Fixes #7240.	2020-09-15 19:53:05 +02:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified that the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply (using the Python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Botond Dénes	f7a4d19fb1	mutation_partition: abort read when hard limit is exceeded for non-paged reads If the read is not paged (short read is not allowed) abort the query if the hard memory limit is reached. On reaching the soft memory limit a warning is logged. This should allow users to adjust their application code while at the same time protecting the database from the really bad queries. The enforcement happens inside the memory accounter and doesn't require cooperation from the result builders. This ensures the memory limit set for the query is respected for all kinds of reads. Previously non-paged reads simply ignored the memory accounter requesting the read to stop and consumed all the memory they wanted.	2020-07-29 08:32:31 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Botond Dénes	517a941feb	query_class_config: move into the query namespace It belongs there, its name even starts with "query".	2020-07-28 18:00:29 +03:00
Juliusz Stasiewicz	e04fd9f774	counters: Read the state under timeout Counter update is a RMW operation. Until now the "Read" part was not guarded by a timeout, which is changed in this patch. Fixes #5069	2020-06-02 15:10:43 +03:00
Botond Dénes	639bbefcd3	database: use valid permit for counter read-before-write Counter writes involve a read-before-write, which will soon require a valid permit to be passed to it, so make sure we create and pass a valid permit to this read. We use `database::make_query_class_config()` to obtain the semaphore for the read which selects the appropriate user/system semaphore based on the scheduling group the counter write is running in.	2020-05-28 11:34:35 +03:00
Botond Dénes	14743c4412	data_query, mutation_query: use query_class_config We want to move away from the current practice of selecting the relevant read concurrency semaphore inside `table` and instead want to pass it down from `database` so that we can pass down a semaphore that is appropriate for the class of the query. Use the recently created `query_class_config` struct for this. This is added as a parameter to `data_query`, `mutation_query` and propagated down to the point where we create the `querier` to execute the read. We are already propagating a parameter down the same route -- max_memory_reverse_query -- which also happens to be part of `query_class_config`, so simply replace this parameter with a `query_class_config` one. As the lower layers are not prepared for a semaphore passed from above, make sure this semaphore is the same that is selected inside `table`. After the lower layers are prepared for a semaphore arriving from above, we will switch it to be the appropriate one for the class of the query.	2020-05-28 11:34:35 +03:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logalloc: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Botond Dénes	8da88e6cb9	mutation_query: data_query(): remove default timeout	2020-02-27 19:02:40 +02:00
Botond Dénes	fdb45d16de	mutation_query: mutation_query(): remove default timeout	2020-02-27 18:56:30 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Vladimir Davydov	e0b31dd273	query: add flag to return static row on partition with no rows A SELECT statement that has clustering key restrictions isn't supposed to return static content if no regular rows match the restrictions, see #589. However, for the CAS statement we do need to return static content on failure so this patch adds a flag that allows the caller to override this behavior.	2019-10-28 21:50:44 +03:00
Avi Kivity	093d2cd7e5	reconcilable_result: use chunked_vector to hold partitions Usually, a reconcilable_result holds very few partitions (1 is common), since the page size is limited by 1MB. But if we have paging disabled or if we are reconciling a range full of tombstones, we may see many more. This can cause large allocations. Change to chunked_vector to prevent those large allocations, as they can be quite expensive. Fixes #4780.	2019-08-01 18:49:13 +03:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Botond Dénes	3bcd577907	Move reconcilable_result_builder declaration to mutation_query.hh It will be used by code outside of mutation_partition.cc so it needs to be public. The definition remains in mutation_partition.cc.	2018-09-03 10:31:44 +03:00
Botond Dénes	5f726e9a89	querier: move all to query namespace To avoid name clashes.	2018-09-03 10:31:44 +03:00
Avi Kivity	ebff1cfc37	database: make database::_mutation_query_stage inherit the scheduling group Like the preceding patch and for the same reasons, adjust database::_mutation_query_stage to inherit the scheduling group from its caller.	2018-08-24 19:04:49 +03:00
Glauber Costa	9188059427	database: group statements in their own scheduling group When we introduced the CPU scheduler, we also introduced a group for commitlog - but never used it. There is also doubtful value in separating reads from writes, since they are often part of the same workload. To accommodate that, let's rename the query group to "statement" (query is not incorrect, just confusing), and move the write path, currently ungrouped, inside it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-20 16:58:36 -04:00
Botond Dénes	ff808d9ce6	Save and restore queriers in mutation_query() and data_query() Use the querier_cache (represented by the passed-in querier_cache_context) object to look up saved queriers at the start of the page and save them at the end of it if it is likely that there will be more page requests.	2018-03-13 10:34:34 +02:00
Duarte Nunes	6b4b429883	query-result: Introduce class result_options Introduce class result_options to carry result options through the request pipeline, which at this point mean the result type and the digest algorithm. This class allows us to encapsulate the concrete digest algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00


74 Commits