We have plenty of code marked with #if 0. Once it was an indication
of missing functionality, but the code has evolved so much it's
useless as an indication and only a distraction.
Delete it.
Closes#14511
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
Fixes: #13942
Message-Id: <ZJ2aeNIBQCtnTaE2@scylladb.com>
SELECT JSON uses selector_factories to obtain the names of the
fields to insert into the json object, and we want to drop
selector_factories entirely. Switch instead to the ":metadata" mode
of printing expressions, which does what we want.
Unfortunately, the switch changes how system functions are converted
into field names. A function such as unixtimestampof() is now rendered
as "system.unixtimestampof()"; before it did not have the keyspace
prefix.
This is a compatiblity problem, albeit an obscure one. Since the new
behavior matches Cassandra, and the odds of hitting this are very low,
I think we can allow the change.
The replica needs to know which columns we're interested in. Iterate
and recurse into all selector expressions to collect all mentioned columns.
We use the same algorithm that create_factories_and_collect_column_definitions()
uses, even though it is quadratic, to avoid causing surprises.
When constructing a selection_with_processing, split the
selectors into an inner loop and an outer loop with split_aggregation().
We can then reimplement add_input_row() and get_output_row() as follows:
- add_input_row(): evaluate the inner loop expressions and store
the results in temporaries
- get_output_row(): evaluate the outer loop expressions, pulling in
values from those temporaries.
reset(), which is called between groups, simply copies the initial
values rathered by split_aggregation() into the temporaries.
The only complexity comes from add_column_for_post_query_processing(),
which essentially re-does the work of split_aggregation(). It would
be much better if we added the column before split_aggregation() was
called, but some refactoring has to take place before that happens.
In three cases we need to consult a column that's possibly not explicitly
selected:
- for the WHERE clause
- for GROUP BY
- for ORDER BY
The return value of the function is the index where the newly-added
column can be found. Currently, the index is correct for both
the internal column vector and the result set, but soon in won't
be.
In the first two cases (WHERE clause and ORDER BY), we're interested
in the column before grouping, in the last case (ORDER BY) we're interested
in the column after grouping, so we need to distinguish between the two.
Since we already have selection::index_of() that returns the pre-grouping
index, choose the post-grouping index for the return value of
selection::add_column_for_post_processing(), and change the GROUP BY
code to use index_of(). Comments are added.
Now that everything is in place, implement the fast-path
transform_input_row() for selection_with_processing. It's a
straightforward call to evaluate() in a loop.
We adjust add_column_for_post_processing() to also update _selectors,
otherwise ORDER BY clauses that require an additional column will not
see that column.
Since every sub-class implements transform_input_row(), mark
the base class declaration as pure virtual.
expr::evaluate() expects an exploded primary key in its
evaluation_inputs structure (this dates back from the conversion
of filtering to expressions). But right now, the exploded primary
key is only available in the filter.
That's easy to fix however: move the primary key containers
to result_set_builder and just keep references in the filter.
After this, we can evaluate column_value expressions that
reference the primary key.
field_selection::type refers to the type of the selection operation,
not the type of the structure being selected. This is what
prepare_expression() generates and how all other expression elements
work, but evaluate() for field_selection thinks it's the type
of the structure, and so fails when it gets an expression
from prepare_expression().
Fix that, and adjust the tests.
Previously, we used the engagedness of result_set_builder::optional
as a flag, but the previous patch eliminated that and it's always
engaged. Remove the optional wrapper to reduce noise.
Processing a result set relies on calling result_set_builder::new_row().
This function is quite complex as it has several roles:
- complete processing of the previously computed row, if any
- determine if GROUP BY grouping has changed, and flush the previous group
if so
- flush the last group if that's the case
This works now, but won't work with expr::evaluate. The reason is that
new_row() is called after the partition key and clustering key of the
new row have been evaluated, so processing of the previous row will see
incorrect data. It works today because we copy the partition key and
clustering key into result_set_builder::current, but expr::evaluate
uses the exploded partition key and clustering key, which have been
clobbered.
The solution is to separate the roles. Instead of new_row() that's
responsible for completing the previous row and starting a new one,
we have start_new_row() that's responsible for what its name says,
and complete_row() that's responsible for completing the row and
checking for group change. The responsibity for flushing the final
group is moved to result_set_builder::build(). This removes the
awkward "more_rows_coming" parameter that makes everything more
complicated.
result_set_builder::current is still optional, but it's always
engaged. The next patch will clean that up.
used_functions() is used to check whether prepared statements need
to be invalidated when user-defined functions change.
We need to skip over empty scalar components of aggregates, since
these can be defined by users (with the same meaning as if the
identity function was used).
The current version of automatic query parallelization works when all
selectors are reducible (e.g. have a state_reduction_function member),
and all the inputs to the aggregates are direct column selectors without
further transformation. The actual column names and reductions need to
be packed up for forward_service to be used.
Convert is_reducible()/get_reductions() to the expression world. The
conversion is fairly straightforward.
contains_ttl/contains_writetime are two attributes of a selection. If a selection
contains them, we must ask the replica to send them over; otherwise we don't
have data to process. Not sending ttl/writetime saves some effort.
The implementation is a straightforward recursive descent using expr::find_in_expression.
Now that we push all GROUP BY queries to selection_with_processing,
we always process rows via transform_input_row() and there's no
reason to keep any state in simple_selectors.
Drop the state and raise an internal error if we're ever
called for aggregation.
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression that is evaluated
over the input vector (once per element). The inner expression calls
the aggregation function, with an extra input parameter (the accumulator).
The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.
The acculator is stored in a temporary.
Simple example:
sum(x)
is transformed into an inner expression:
t1 = (t1 + x) // really sum.aggregation_function
and an outer expression:
result = t1 // really sum.state_to_result_function
Complicated example:
scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))
is transformed into two inner expressions:
t1 = agg1.aggregation_function(t1, x, f1(y))
t2 = agg2.aggregation_function(t2, x, f2(y))
and an outer expression
output = scalar_func(agg1.state_to_result_function(t1),
agg2.state_to_result_function(t2))
There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and short
circuit evaluation to pass the accumulator directly.
Currently, selector evaluation assumes the most complex case
where we aggregate, so multiple input rows combine into one output row.
In effect the query either specifies an outer loop (for the group)
and an inner loop (for input rows), or it only specifies the inner loop;
but we always perform the outer and inner loop.
Prepare to have a separate path for the non-aggregation case by
introducing transform_input_row().
GROUP BY is typically used with aggregation. In one case the aggregation
is implicit:
SELECT a, b, c
FROM tab
GROUP BY x, y, z
One row will appear from each group, even though no aggregation
was specified. To avoid this irregularity, rewrite this query as
SELECT first(a), first(b), first(c)
FROM tab
GROUP BY x, y, z
This allows us to have different paths for aggregations and
non-aggregations, without worrying about this special case.
Avoid mixed aggregate/non-aggregate queries by inserting
calls to the first() function. This allows us to avoid internal
state (simple_selector::_current) and make selector evaluation
stateless apart from explicit temporaries.
We plan to rewrite aggregation queries that have a non-aggregating
selector using the first function, so that all selectors are
aggregates (or none are). Prevent the first function from affecting
metadata (the auto-generated column names), by skipping over the
first function if detected. They input and output types are unchanged
so this only affects the name.
A query of the form `SELECT foo, count(foo) FROM tab` returns the first
value of the foo column along with the count. This can't be parallized
today since the first selector isn't an aggregate.
We plan to rewrite the query internally as `SELECT first(foo), count(foo)
FROM tab`, in order to make the query more regular (no mixing of aggregates
and non-aggregates). However, this will defeat the current check since
after the rewrite, all selectors are aggregates.
Prepare for this by performing the check on a pre-rewrite variable, so
it won't be affected by the query rewrite in the next patch.
Note that although even though we could add support for running
first() in parallel, it's not possible to get the correct results,
since first() is not commutative and we don't reduce in order. It's
also not a particularly interesting query.
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.
The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
Change one more layer of processing to work on prepared
rather than raw selectors. This moves the call to prepare
the selectors early in select_statement processing. In turn
this changes maybe_jsonize_select_clause() and forward_service's
mock_selection() to work in the prepared realm as well.
This moves us one step closer to using evaluate() to process
the select clause, as the prepared selectors are now available
in select_statement. We can't use them yet since we can't evaluate
aggregations.
When returning a result set (and when preparing a statement), we
return metadata about the result set columns. Part of that is the
column names, which are derived from the expressions used as selectors.
Currently, they are computed via selector::column_name(), but as
we're dismantling that hierarchy we need a different way to obtain
those names.
It turns out that the expression formatter is close enough to what
we need. To avoid disturbing the current :user mode, add a new
:metadata mode and apply the adjustments needed to bring it in line
with what column metadata looks like today.
Note that column metadata is visible to applications and they can
depend on it; e.g. the Python driver allows choosing columns based on
their names rather than ordinal position.
processes_selection() checks whether a selector passes-through a column
or applies some form of processing (like a case or function application).
It's more sensible to do this in the prepared domain as we have more
information about the expression. It doesn't really help here, but
it does help the refactoring later in the series.
Currently, each selector expression is individually prepared, then converted
into a selector object that is later executed. This is done (on a vector
of raw selectors) by cql3::selection::raw_selector::to_selectables().
Split that into two phases. The first phase converts raw_selector into
a new struct prepared_selector (a better name would be plain 'selector',
but it's taken for now). The second phase continues the process and
converts prepared_selector into selectables.
This gives us a full view of the prepared expressions while we're
preparing the select clause of the select statement.
Most clauses in a CQL statement don't tolerate aggregate functions,
and so they call verify_no_aggregate_functions(). It can now be
reimplemented in terms of aggregation_depth(), removing some code.
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.
The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).
The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.
When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)` we don't have to bother, since a scalar
is implicity promoted to a vector by evaluating it every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`, but it's not worth following.
A small unit test is added.
Currently, a prepared function_call expression is printed as an
"anonymous function", but it's not really anonymous - the name is
available. Print it out.
This helps in a unit test later on (and is worthwhile by itself).
first(x) returns the first x it sees in the group. This is useful
for SELECT clauses that return a mix of aggregates and non-aggregates,
for example
SELECT max(x), x
with inputs of x = { 1, 2, 3 } is expected to return (3, 1).
Currently, this behavior is handled by individual selectors,
which means they need to contain extra state for this, which
cannot be easily translated to expressions. The new first function
allows translating the SELECT clause above to
SELECT max(x), first(x)
so all selectors are aggregations and can be handled in the same
way.
The first() function is not exposed to users.
A GROUP BY combined with aggregation should produce a single
row per group, except for empty groups. This is in contrast
to an aggregation without GROUP BY, which produces a single
row no matter what.
The existing code only considered the case of no grouping
and forced a row into the result, but this caused an unwanted
row if grouping was used.
Fix by refining the check to also consider GROUP BY.
XFAIL tests are relaxed.
Fixes#12477.
Note, forward_service requires that aggregation produce
exactly one row, but since it can't work with grouping,
it isn't affected.
Closes#14399
`modification_statement::execute_with_condition` validates that a query with
an IF condition can be executed correctly.
There's already a check for empty partition key ranges, but there was no check
for empty clustering ranges.
Let's add a check for the clustering ranges as well, they're not allowed to be empty.
After this change Scylla outputs the same type of message for empty partition and
clustering ranges, which improves UX.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
LWT queries with empty clustering range used to cause a crash.
For example in:
```cql
UPDATE tab SET r = 9000 WHERE p = 1 AND c = 2 AND c = 2000 IF r = 3
```
The range of `c` is empty - there are no valid values.
This caused a segfault when accessing the `first` range:
```c++
op.ranges.front()
```
To fix it let's throw en exception when the clustering range
is empty. Cassandra also rejects queries with `c = 1 AND c = 2`.
There's also a check for empty partition range, as it used
to crash in the past, can't really hurt to add it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The series contains mostly cleanups for query processor and no functional
change. The last patch is a small cleanup for the storage_proxy.
* 'qp-cleanup' of https://github.com/gleb-cloudius/scylla:
storage_proxy: remove unused variable
client_state: co-routinise has_column_family_access function
query_processor: get rid of internal_state and create individual query_satate for each request
cql3: move validation::validate_column_family from client_state::has_column_family_access
client_state: drop unneeded argument from has.*access functions
cql3: move check for dropping cdc tables from auth to the drop statement code itself
query_processor: co-routinise execute_prepared_without_checking_exception_message function
query_processor: co-routinize execute_direct_without_checking_exception_message function
cql3: remove empty statement::validate functions
cql3: remove empty function validate_cluster_support
cql3/statements: fix indentation and spurious white spaces
query_processor: move statement::validate call into execute_with_params function
query_processor: co-routinise execute_with_params function
query_processor: execute statement::validate before each execution of internal query instead of only during prepare
query_processor: get rid of shared internal_query_state
query_processor: co-routinize execute_paged_internal function
query_processor: co_routinize execute_batch_without_checking_exception_message function
query_processor: co-routinize process_authorized_statement function
It's very annoying to add a declaration to expression.hh and watch
the whole world get recompiled. Improve that by moving less-common
functions to a new header expr-utils.hh. Move the evaluation machinery
to a new header evaluate.hh. The remaining definitions in expression.hh
should not change as often, and thus cause less frequent recompiles.
Closes#14346
* github.com:scylladb/scylladb:
cql3: expr: break up expression.hh header
cql3: expr: restrictions.hh: protect against double inclusions
cql3: constants: deinline
cql3: statement_restrictions: deinline
cql3: deinline operation::fill_prepare_context()
There was a bug in describe_statement. If executing `DESC FUNCTION <uda name>` or ` DESC AGGREGATE <udf name>`, Scylla was crashing because the function was found (`functions::find()` searches both UDFs and UDAs) but the function was bad and the pointer wasn't checked after cast.
Added a test for this.
Fixes: #14360Closes#14332
* github.com:scylladb/scylladb:
cql-pytest:test_describe: add test for filtering UDF and UDA
cql3:statements:describe_statement: check pointer to UDF/UDA
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation related names to a new header
expr/evaluate.hh
- move utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.