Commit Graph

3 Commits

Author SHA1 Message Date
Avi Kivity
a26516ef65 cql3: expression: add helper to split expressions with aggregate functions
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression that is evaluated
over the input vector (once per element). The inner expression calls
the aggregation function, with an extra input parameter (the accumulator).

The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.

The acculator is stored in a temporary.

Simple example:

   sum(x)

is transformed into an inner expression:

   t1 = (t1 + x)   // really sum.aggregation_function

and an outer expression:

   result = t1     // really sum.state_to_result_function

Complicated example:

    scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))

is transformed into two inner expressions:

    t1 = agg1.aggregation_function(t1, x, f1(y))
    t2 = agg2.aggregation_function(t2, x, f2(y))

and an outer expression

    output = scalar_func(agg1.state_to_result_function(t1),
                         agg2.state_to_result_function(t2))

There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and short
circuit evaluation to pass the accumulator directly.
2023-07-03 19:45:17 +03:00
Avi Kivity
b1b4a18ad8 cql3: expression: add helpers to manage an expression's aggregation depth
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.

The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).

The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.

When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)` we don't have to bother, since a scalar
is implicity promoted to a vector by evaluating it every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`, but it's not worth following.

A small unit test is added.
2023-07-03 19:45:16 +03:00
Avi Kivity
b858a4669d cql3: expr: break up expression.hh header
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:

 - moving some restrictions-related definitions to
   the existing expr/restrictions.hh
 - moving evaluation related names to a new header
   expr/evaluate.hh
 - move utilities to a new header
   expr/expr-utilities.hh

expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
2023-06-22 14:21:03 +03:00