Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like
SELECT p, c1, c2, v FROM tbl GROUP BY p
This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:
SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p
This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:
SELECT * FROM tbl GROUP BY p
Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" is looks like an empty
selection, and it didn't get the "first()" treatment and it remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.
In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
Fixes#16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
In three cases we need to consult a column that's possibly not explicitly
selected:
- for the WHERE clause
- for GROUP BY
- for ORDER BY
The return value of the function is the index where the newly-added
column can be found. Currently, the index is correct for both
the internal column vector and the result set, but soon in won't
be.
In the first two cases (WHERE clause and ORDER BY), we're interested
in the column before grouping, in the last case (ORDER BY) we're interested
in the column after grouping, so we need to distinguish between the two.
Since we already have selection::index_of() that returns the pre-grouping
index, choose the post-grouping index for the return value of
selection::add_column_for_post_processing(), and change the GROUP BY
code to use index_of(). Comments are added.
Now that everything is in place, implement the fast-path
transform_input_row() for selection_with_processing. It's a
straightforward call to evaluate() in a loop.
We adjust add_column_for_post_processing() to also update _selectors,
otherwise ORDER BY clauses that require an additional column will not
see that column.
Since every sub-class implements transform_input_row(), mark
the base class declaration as pure virtual.
expr::evaluate() expects an exploded primary key in its
evaluation_inputs structure (this dates back from the conversion
of filtering to expressions). But right now, the exploded primary
key is only available in the filter.
That's easy to fix however: move the primary key containers
to result_set_builder and just keep references in the filter.
After this, we can evaluate column_value expressions that
reference the primary key.
Previously, we used the engagedness of result_set_builder::optional
as a flag, but the previous patch eliminated that and it's always
engaged. Remove the optional wrapper to reduce noise.
Processing a result set relies on calling result_set_builder::new_row().
This function is quite complex as it has several roles:
- complete processing of the previously computed row, if any
- determine if GROUP BY grouping has changed, and flush the previous group
if so
- flush the last group if that's the case
This works now, but won't work with expr::evaluate. The reason is that
new_row() is called after the partition key and clustering key of the
new row have been evaluated, so processing of the previous row will see
incorrect data. It works today because we copy the partition key and
clustering key into result_set_builder::current, but expr::evaluate
uses the exploded partition key and clustering key, which have been
clobbered.
The solution is to separate the roles. Instead of new_row() that's
responsible for completing the previous row and starting a new one,
we have start_new_row() that's responsible for what its name says,
and complete_row() that's responsible for completing the row and
checking for group change. The responsibity for flushing the final
group is moved to result_set_builder::build(). This removes the
awkward "more_rows_coming" parameter that makes everything more
complicated.
result_set_builder::current is still optional, but it's always
engaged. The next patch will clean that up.
Currently, selector evaluation assumes the most complex case
where we aggregate, so multiple input rows combine into one output row.
In effect the query either specifies an outer loop (for the group)
and an inner loop (for input rows), or it only specifies the inner loop;
but we always perform the outer and inner loop.
Prepare to have a separate path for the non-aggregation case by
introducing transform_input_row().
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
processes_selection() checks whether a selector passes-through a column
or applies some form of processing (like a case or function application).
It's more sensible to do this in the prepared domain as we have more
information about the expression. It doesn't really help here, but
it does help the refactoring later in the series.
Call prepare_expression() on selector expressions to resolve types. This
leaves us with just one way to move from the unprepared domain to the
prepared domain.
The change is somewhat awkward since do_prepare_selectable() is re-doing
work that is done by prepare_expression(), but somehow it all works. The
next patch will tear down the unnecessary double-preparation.
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.
Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.
The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
This series aims to allow users to set permissions on user-defined functions.
The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissionsFixes: #5572Fixes: #10633Closes#12869
* github.com:scylladb/scylladb:
cql3: allow UDTs in permissions on UDFs
cql3: add type_parser::parse() method taking user_types_metadata
schema_change_test: stop using non-existent keyspace
cql3: fix parameter names in function resource constructors
cql3: handle complex types as when decoding function permissions
cql3: enforce permissions for ALTER FUNCTION
cql-pytest: add a (failing) test case for UDT in UDF
cql-pytest: add a test case for user-defined aggregate permissions
cql-pytest: add tests for function permissions
cql3: enforce permissions on function calls
selection: add a getter for used functions
abstract_function_selector: expose underlying function
cql3: enforce permissions on DROP FUNCTION
cql3: enforce permissions for CREATE FUNCTION
client_state: add functions for checking function permissions
cql-pytest: add a case for serializing function permissions
cql3: allow specifying function permissions in CQL
auth: add functions_resource to resources
The function allows extracting used function definitions
from given selection. Thanks to that, it will be possible
to verify if the callee has proper permissions to execute
given functions.
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes#12858
Since expressions were introduced for SELECT statements, they
work with `selection` object to represent which table columns
they can work with. Probably a neutral representation would have
been better, but that's what we have now.
LWT works with partition_slice, so introduce a
selection_from_partition_slice() helper to bridge the two worlds.
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
The way that this detection works is a bit clunky, but it does its job
given the simplest cases e.g. "SELECT COUNT(*) FROM ks.t". It fails when
there are multiple selectors, or when there is a column name specified
("SELECT COUNT(column_name) FROM ks.t").
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether used
defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
sprint() is obsolete. Note some calls where to helper functions that
use sprint(), not to sprint() directly, so both the helpers and
the callers were modified.
This allows us to forward-declare raw_selector, which in turn reduces
indirect inclusions of expression.hh from 147 to 58, reducing rebuilds
when anything in that area changes.
Includes that were lost due to the change are restored in individual
translation units.
Closes#9434
This patch enables select cql statements where collection columns are
selected columns in queries where clustering column is restricted by
"IN" cql operator. Such queries are accepted by cassandra since v4.0.
The internals actually provide correct support for this feature already,
this patch simply removes relevant cql query check.
Tests: cql-pytest (testInRestrictionWithCollection)
Fixes#7743Fixes#4251
Signed-off-by: Vojtech Havel <vojtahavel@gmail.com>
Message-Id: <20210104223422.81519-1-vojtahavel@gmail.com>
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.
The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).
In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.
The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).
Fixes#5101.
The header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains needed declarations
for other headers. So replace shema.hh with schema_fwd.hh
in most of the headers (and remove completely from some).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
* Pass raw::select_statement::parameters as lw_shared_ptr
* Some more const cleanups here and there
* lists,maps,sets::equals now accept const-ref to *_type_impl
instead of shared_ptr
* Remove unused `get_column_for_condition` from modification_statement.hh
* More methods now accept const-refs instead of shared_ptr
Every call site where a shared_ptr was required as an argument
has been inspected to be sure that no dangling references are
possible.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>
This adds a requires_thread predicate to functions and propagates that
up until we get to code that already returns futures.
We can then use the predicate to decide if we need to use
seastar::async.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.
Fixes#5248
Make result_set_builder obey its _group_by_cell_indices by recognizing
group boundaries and resetting the selectors.
Also make simple_selectors work correctly when grouping.
Fixes#2206.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Ensure that the indices recorded in select_statement are passed to
result_set_builder when one is created for processing the cell values.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Filtering now needs to take into account per partition limits as well,
and for that it's essential to be able to compare partition keys
and decide which rows should be dropped - if previous page(s) contained
rows with the same partition key, these need to be taken into
consideration too.
Previously the limit was erroneously applied per page
instead of being accumulated, which might have caused returning
too many rows. As of now, LIMIT is handled properly inside
restrictions filter.
Fixes#4100
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>