scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 17:40:34 +00:00

Author	SHA1	Message	Date
Nadav Har'El	1aea2136c8	cql: fix regression in SELECT * GROUP BY Recently, the expression-rewrite effort changed the way that GROUP BY is implemented. Usually GROUP BY involves an aggregation function (e.g., if you want a separate SUM per partition). But there's also a query like SELECT p, c1, c2, v FROM tbl GROUP BY p This query is supposed to return one row - the first row in clustering order - per group (in this case, partition). The expression rewrite re-implemented this feature by introducing a new internal aggregator, first(), which returns the first aggregated value. The above query is rewritten into: SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p This case works correctly, and we even have a regression test for it. But unfortunately the rewrite broke the following query: SELECT * FROM tbl GROUP BY p Note the "*" instead of the explicit list of columns. In our implementation, a selection of "*" looks like an empty selection, and it didn't get the "first()" treatment and it remained a "SELECT *" - and wrongly returned all rows instead of just the first one in each partition. This was a regression - it worked correctly in Scylla 5.2 (and also in Cassandra) - see the next patch for a regression test. In this patch we fix this regression. When there is a GROUP BY, the "*" is rewritten to the appropriate list of all visible columns and then gets the first() treatment, so it will return only the first row as expected. The next patch will be a test that confirms the bug and its fix. Fixes #16531 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-25 17:52:57 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Avi Kivity	66c47d40e6	cql3: selection: drop selector_factories, selectables, and selectors The whole class hierarchy is no longer used by anything and we can just delete it.	2023-07-03 19:45:17 +03:00
Avi Kivity	039472ffb9	cql3: selection: don't create selector_factories any more We no longer use selector_factories for anything, so we can drop them.	2023-07-03 19:45:17 +03:00
Avi Kivity	27254c4f50	cql3: selection, select_statement: fine tune add_column_for_post_processing() usage In three cases we need to consult a column that's possibly not explicitly selected: - for the WHERE clause - for GROUP BY - for ORDER BY The return value of the function is the index where the newly-added column can be found. Currently, the index is correct for both the internal column vector and the result set, but soon it won't be. In the first two cases (WHERE clause and GROUP BY), we're interested in the column before grouping, in the last case (ORDER BY) we're interested in the column after grouping, so we need to distinguish between the two. Since we already have selection::index_of() that returns the pre-grouping index, choose the post-grouping index for the return value of selection::add_column_for_post_processing(), and change the GROUP BY code to use index_of(). Comments are added.	2023-07-03 19:45:17 +03:00
Avi Kivity	6bf1bd7130	cql3: selection: evaluate non-aggregating complex selections using expr::evaluate() Now that everything is in place, implement the fast-path transform_input_row() for selection_with_processing. It's a straightforward call to evaluate() in a loop. We adjust add_column_for_post_processing() to also update _selectors, otherwise ORDER BY clauses that require an additional column will not see that column. Since every sub-class implements transform_input_row(), mark the base class declaration as pure virtual.	2023-07-03 19:45:17 +03:00
Avi Kivity	f5eb7fd6dc	cql3: selection: store primary key in result_set_builder expr::evaluate() expects an exploded primary key in its evaluation_inputs structure (this dates back from the conversion of filtering to expressions). But right now, the exploded primary key is only available in the filter. That's easy to fix however: move the primary key containers to result_set_builder and just keep references in the filter. After this, we can evaluate column_value expressions that reference the primary key.	2023-07-03 19:45:17 +03:00
Avi Kivity	aed01018a3	cql3: selection: make result_set_builder::current non-optional<> Previously, we used the engagedness of result_set_builder::optional as a flag, but the previous patch eliminated that and it's always engaged. Remove the optional wrapper to reduce noise.	2023-07-03 19:45:17 +03:00
Avi Kivity	44c8507075	cql3: selection: simplify row/group processing Processing a result set relies on calling result_set_builder::new_row(). This function is quite complex as it has several roles: - complete processing of the previously computed row, if any - determine if GROUP BY grouping has changed, and flush the previous group if so - flush the last group if that's the case This works now, but won't work with expr::evaluate. The reason is that new_row() is called after the partition key and clustering key of the new row have been evaluated, so processing of the previous row will see incorrect data. It works today because we copy the partition key and clustering key into result_set_builder::current, but expr::evaluate uses the exploded partition key and clustering key, which have been clobbered. The solution is to separate the roles. Instead of new_row() that's responsible for completing the previous row and starting a new one, we have start_new_row() that's responsible for what its name says, and complete_row() that's responsible for completing the row and checking for group change. The responsibility for flushing the final group is moved to result_set_builder::build(). This removes the awkward "more_rows_coming" parameter that makes everything more complicated. result_set_builder::current is still optional, but it's always engaged. The next patch will clean that up.	2023-07-03 19:45:17 +03:00
Avi Kivity	f48ecb5049	cql3: selection: short-circuit non-aggregations Currently, selector evaluation assumes the most complex case where we aggregate, so multiple input rows combine into one output row. In effect the query either specifies an outer loop (for the group) and an inner loop (for input rows), or it only specifies the inner loop; but we always perform the outer and inner loop. Prepare to have a separate path for the non-aggregation case by introducing transform_input_row().	2023-07-03 19:45:17 +03:00
Avi Kivity	4a2428e4ec	cql3: selection: drop validate_selectors It's unused. It dates from the (perhaps better) time when regularity of aggregation across selectors was enforced.	2023-07-03 19:45:17 +03:00
Avi Kivity	7c3ceb6473	cql3: select_statement: use prepared selectors Change one more layer of processing to work on prepared rather than raw selectors. This moves the call to prepare the selectors early in select_statement processing. In turn this changes maybe_jsonize_select_clause() and forward_service's mock_selection() to work in the prepared realm as well. This moves us one step closer to using evaluate() to process the select clause, as the prepared selectors are now available in select_statement. We can't use them yet since we can't evaluate aggregations.	2023-07-03 19:45:17 +03:00
Avi Kivity	a338d0455d	cql3: selection: avoid selector_factories in collect_metadata() Generate the column headings in the result set metadata using the newly introduced result_set_metadata mode of the expression printer.	2023-07-03 19:45:17 +03:00
Avi Kivity	a1f4abb753	cql3: selection: convert collect_metadata() to the prepared expression domain Simplifies refactoring later on.	2023-07-03 19:45:17 +03:00
Avi Kivity	91b251f6b4	cql3: selection: convert processes_selection to work on prepared expressions processes_selection() checks whether a selector passes-through a column or applies some form of processing (like a case or function application). It's more sensible to do this in the prepared domain as we have more information about the expression. It doesn't really help here, but it does help the refactoring later in the series.	2023-07-03 19:45:17 +03:00
Avi Kivity	1040589828	cql3: selection: prepare selector expressions Call prepare_expression() on selector expressions to resolve types. This leaves us with just one way to move from the unprepared domain to the prepared domain. The change is somewhat awkward since do_prepare_selectable() is re-doing work that is done by prepare_expression(), but somehow it all works. The next patch will tear down the unnecessary double-preparation.	2023-06-13 21:04:49 +03:00
Avi Kivity	42a1ced73b	cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt The expression system uses managed_bytes_opt for values, but result_set uses bytes_opt. This means that processing values from the result set in expressions requires a copy. Out of the two, managed_bytes_opt is the better choice, since it prevents large contiguous allocations for large blobs. So we switch result_set to use managed_bytes_opt. Users of the result_set API are adjusted. The db::function interface is not modified to limit churn; instead we convert the types on entry and exit. This will be adjusted in a following patch.	2023-05-07 17:17:36 +03:00
Nadav Har'El	843a5dfc15	Merge 'Allow setting permissions for user-defined functions' from Wojciech Mitros This series aims to allow users to set permissions on user-defined functions. The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions Fixes: #5572 Fixes: #10633 Closes #12869 * github.com:scylladb/scylladb: cql3: allow UDTs in permissions on UDFs cql3: add type_parser::parse() method taking user_types_metadata schema_change_test: stop using non-existent keyspace cql3: fix parameter names in function resource constructors cql3: handle complex types as when decoding function permissions cql3: enforce permissions for ALTER FUNCTION cql-pytest: add a (failing) test case for UDT in UDF cql-pytest: add a test case for user-defined aggregate permissions cql-pytest: add tests for function permissions cql3: enforce permissions on function calls selection: add a getter for used functions abstract_function_selector: expose underlying function cql3: enforce permissions on DROP FUNCTION cql3: enforce permissions for CREATE FUNCTION client_state: add functions for checking function permissions cql-pytest: add a case for serializing function permissions cql3: allow specifying function permissions in CQL auth: add functions_resource to resources	2023-03-12 14:04:34 +02:00
Piotr Sarna	4624934032	selection: add a getter for used functions The function allows extracting used function definitions from given selection. Thanks to that, it will be possible to verify if the callee has proper permissions to execute given functions.	2023-03-09 17:51:17 +01:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Avi Kivity	721c05b7ec	cql3: selection: introduce selection_from_partition_slice Since expressions were introduced for SELECT statements, they work with `selection` object to represent which table columns they can work with. Probably a neutral representation would have been better, but that's what we have now. LWT works with partition_slice, so introduce a selection_from_partition_slice() helper to bridge the two worlds.	2023-02-12 17:17:01 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Jadw1	6d977fcf88	cql3: selection: detect parallelize reduction type Detects type of reduction if it is possible. Separate case for `COUNT(*)` is left for compatibility reason. By now only single selection is supported.	2022-07-18 15:25:41 +02:00
Pavel Emelyanov' via ScyllaDB development	a78af050fd	cql: Constify select_statement restrictions It is in fact immutable (both the pointer and the object it points to), so is the pointer copy returned by get_restrictions() method, so are those propagated to filtering stuff. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1028 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220624083351.24970-1-xemul@scylladb.com>	2022-06-24 12:27:36 +03:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Michał Sala	bb7edf3785	cql3: selection: detect if a selection represents count(*) The way that this detection works is a bit clunky, but it does its job given the simplest cases e.g. "SELECT COUNT(*) FROM ks.t". It fails when there are multiple selectors, or when there is a column name specified ("SELECT COUNT(column_name) FROM ks.t").	2022-02-01 21:14:41 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	d768e9fac5	cql3, related: switch to data_dictionary Stop using database (and including database.hh) for schema related purposes and use data_dictionary instead. data_dictionary::database::real_database() is called from several places, for these reasons: - calling yet-to-be-converted code - callers with a legitimate need to access data (e.g. system_keyspace) but with the ::database accessor removed from query_processor. We'll need to find another way to supply system_keyspace with data access. - to gain access to the wasm engine for testing whether user-defined functions compile. We'll have to find another way to do this as well. The change is a straightforward replacement. One case in modification_statement had to change a capture, but everything else was just a search-and-replace. Some files that lost "database.hh" gained "mutation.hh", which they previously had access to through "database.hh".	2021-12-15 13:54:23 +02:00
Avi Kivity	9424f6e12f	cql3: replace seastar::sprint() with fmt::format() sprint() is obsolete. Note some calls were to helper functions that use sprint(), not to sprint() directly, so both the helpers and the callers were modified.	2021-10-27 17:02:00 +03:00
Avi Kivity	2d25705db0	cql3: deinline non-trivial methods in selection.hh This allows us to forward-declare raw_selector, which in turn reduces indirect inclusions of expression.hh from 147 to 58, reducing rebuilds when anything in that area changes. Includes that were lost due to the change are restored in individual translation units. Closes #9434	2021-10-05 12:58:55 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Vojtech Havel	d858c57357	cql3: allow SELECTs restricted by "IN" to retrieve collections This patch enables select cql statements where collection columns are selected columns in queries where clustering column is restricted by "IN" cql operator. Such queries are accepted by cassandra since v4.0. The internals actually provide correct support for this feature already, this patch simply removes relevant cql query check. Tests: cql-pytest (testInRestrictionWithCollection) Fixes #7743 Fixes #4251 Signed-off-by: Vojtech Havel <vojtahavel@gmail.com> Message-Id: <20210104223422.81519-1-vojtahavel@gmail.com>	2021-01-05 14:39:18 +02:00
Dejan Mircevski	df3ea2443b	cql3: Drop all uses_function methods No one seems to call them except for other uses_function methods. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-09-04 17:27:30 +02:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified that the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply (using the Python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Pavel Solodovnikov	f6e765b70f	cql3: pass `column_specification` via lw_shared_ptr `column_specification` class is marked as "final": it's safe to use non-polymorphic pointer "lw_shared_ptr" instead of a more generic "shared_ptr". tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200427084016.26068-1-pa.solodovnikov@scylladb.com>	2020-04-27 12:47:42 +03:00
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace schema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Pavel Solodovnikov	8efb02146f	cql3: const cleanups and API de-pointerization * Pass raw::select_statement::parameters as lw_shared_ptr * Some more const cleanups here and there * lists,maps,sets::equals now accept const-ref to _type_impl instead of shared_ptr Remove unused `get_column_for_condition` from modification_statement.hh * More methods now accept const-refs instead of shared_ptr Every call site where a shared_ptr was required as an argument has been inspected to be sure that no dangling references are possible. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>	2020-02-20 18:14:49 +02:00
Avi Kivity	f8e85e5c2a	cql3: selection: remove now-unneeded include of statement_restrictions.hh Actual users gain #includes of statement_restrictions and query_options that they previously got through selection.hh.	2020-02-09 13:01:32 +02:00
Avi Kivity	710e4ec99d	cql3: deinline result_set_builder::restrictions_filter constructor It stands in the way of #include removal, so it must go. It should have no performance impact as it is too large to be inlined.	2020-02-09 13:00:17 +02:00
Avi Kivity	7474db4075	cql3: selection: remove unnecessary include of selector_factories It is only mentioned in the header file, so the forward declaration can be used and the include moved to the real users.	2020-02-09 12:37:36 +02:00
Pavel Solodovnikov	55a1d46133	cql: some more missing const qualifiers There are several virtual functions in public interfaces named "is_*" that clearly should be marked as "const", so fix that.	2019-11-26 17:57:51 +03:00
Rafael Ávila de Espíndola	d9337152f3	Use threads when executing user functions This adds a requires_thread predicate to functions and propagates that up until we get to code that already returns futures. We can then use the predicate to decide if we need to use seastar::async. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Piotr Sarna	ca6fe598ec	cql3: fix filtering on a static column for empty partitions An otherwise empty partition can still have a valid static column. Filtering didn't take that fact into account and only filtered full-fledged rows, which may result in non-matching rows being returned to the client. Fixes #5248	2019-10-30 15:31:54 +01:00
Dejan Mircevski	d51e4a589d	Implement grouping in selection processing Make result_set_builder obey its _group_by_cell_indices by recognizing group boundaries and resetting the selectors. Also make simple_selectors work correctly when grouping. Fixes #2206. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 11:05:36 -04:00
Dejan Mircevski	c3929aee3a	Propagate GROUP BY indices to result_set_builder Ensure that the indices recorded in select_statement are passed to result_set_builder when one is created for processing the cell values. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-05-08 10:10:10 -04:00
Piotr Sarna	1dadae212a	cql3: add checking for previous partition count to filtering Filtering now needs to take into account per partition limits as well, and for that it's essential to be able to compare partition keys and decide which rows should be dropped - if previous page(s) contained rows with the same partition key, these need to be taken into consideration too.	2019-02-18 11:06:43 +01:00
Piotr Sarna	b965c3778f	cql3: obey per partition limit for filtering Filtering queries now take into account the limit of rows per single partition provided by the user.	2019-02-18 10:29:34 +01:00
Piotr Sarna	87c23372fb	cql3: fix filtering with LIMIT with regard to paging Previously the limit was erroneously applied per page instead of being accumulated, which might have caused returning too many rows. As of now, LIMIT is handled properly inside restrictions filter. Fixes #4100	2019-01-17 13:25:09 +01:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
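
The GROUP BY behaviour described in commit 1aea2136c8 can be illustrated with a small model. This is only a sketch in Python, not ScyllaDB code: the schema, the sample rows, and the helper names (`expand_selection`, `select_group_by`) are hypothetical, and the empty list standing in for "*" follows the commit message's remark that a "*" selection looks like an empty selection. It shows why expanding "*" to the full column list before applying first() yields one row per group.

```python
from itertools import groupby

# Hypothetical schema and data; rows are assumed to arrive in clustering order.
COLUMNS = ["p", "c1", "c2", "v"]
ROWS = [
    {"p": 1, "c1": 0, "c2": 0, "v": "a"},
    {"p": 1, "c1": 0, "c2": 1, "v": "b"},
    {"p": 2, "c1": 5, "c2": 0, "v": "c"},
    {"p": 2, "c1": 6, "c2": 0, "v": "d"},
    {"p": 3, "c1": 1, "c2": 1, "v": "e"},
]


def expand_selection(selection):
    """Model of the fix: an empty selection ("*") becomes all visible columns."""
    return list(COLUMNS) if not selection else list(selection)


def select_group_by(rows, selection, group_col):
    """Return first(col) for every selected column, once per group."""
    columns = expand_selection(selection)
    result = []
    # groupby() splits the input into runs of consecutive rows with equal keys.
    for _, group in groupby(rows, key=lambda row: row[group_col]):
        first_row = next(group)  # the first row of the group in clustering order
        result.append(tuple(first_row[col] for col in columns))
    return result


star_result = select_group_by(ROWS, [], "p")           # models SELECT * ... GROUP BY p
explicit_result = select_group_by(ROWS, ["p", "v"], "p")
```

In this model, skipping `expand_selection` for the empty selection would leave nothing for first() to wrap, which mirrors the regression the commit describes.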


84 Commits