Commit Graph

244 Commits

Author SHA1 Message Date
Avi Kivity
e739f2b779 cql3: expr: make evaluate() return a cql3::raw_value rather than an expr::constant
An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.

Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type information in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.

So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:

 - cql3::raw_value gains a view() helper to convert to a raw_value_view,
   and is_null_or_unset() to match with expr::constant and reduce further
   churn.
 - some helpers that worked on expr::constant and now receive a
   raw_value now need the type passed via an additional argument. The
   type is computed from the expression by the caller.
 - many type checks during expression evaluation were dropped. This is
   a consequence of static typing - we must trust the expression prepare
   phase to perform full type checking since values no longer carry type
   information.
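
The split between typed expressions and untyped values can be sketched as follows. This is a minimal illustration, not the Scylla code: `constant_expr`, `to_display_string`, and the two-member `data_type` enum are hypothetical stand-ins, and only the idea (the type lives in the expression, the value is just bytes, and helpers receive the type as an extra argument) comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are the real Scylla definitions.
enum class data_type { int32, text };

using bytes = std::vector<std::uint8_t>;

// An untyped value: serialized bytes, or nullopt for null.
struct raw_value {
    std::optional<bytes> data;
    bool is_null() const { return !data.has_value(); }
};

// The expression keeps its type; the value it evaluates to does not.
struct constant_expr {
    data_type type;   // fixed during the prepare phase
    raw_value value;
};

// Evaluation returns only the value.
inline raw_value evaluate(const constant_expr& e) { return e.value; }

// The caller computes the static type from the expression.
inline data_type type_of(const constant_expr& e) { return e.type; }

// A helper that used to read the type from the value now takes it as an
// additional argument and trusts that prepare already type-checked it.
inline std::string to_display_string(const raw_value& v, data_type type) {
    if (v.is_null()) {
        return "null";
    }
    const bytes& b = *v.data;
    if (type == data_type::text) {
        return std::string(b.begin(), b.end());
    }
    // int32: big-endian bytes, no size check (assumed done at prepare time).
    std::uint32_t u = 0;
    for (std::uint8_t byte : b) {
        u = (u << 8) | byte;
    }
    return std::to_string(static_cast<std::int32_t>(u));
}
```

A caller would write `to_display_string(evaluate(e), type_of(e))`, which mirrors "the type is computed from the expression by the caller".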

Closes #10797
2022-06-15 08:47:24 +02:00
Avi Kivity
6d943e6cd0 cql3: expr: drop column_maybe_subscripted
column_maybe_subscripted is a variant<column_value*, subscript*> that
existed for two reasons:
  1. evaluation of subscripts and of columns took different paths.
  2. calculation of the type of column or column[sub] took different paths.

Now that all evaluations go through evaluate(), and the types are
present in the expression itself, there is no need for column_maybe_subscripted
and it is replaced with plain expressions.
2022-06-12 19:21:28 +03:00
Avi Kivity
2aa9199e9a cql3: expr: possible_lhs_values(): open-code get_value_comparator()
get_value_comparator() is going away soon, so open-code it here. It's
not doing much anyway.
2022-06-12 19:14:50 +03:00
Avi Kivity
b1c12073b1 cql3: expr: rationalize lhs/rhs argument order
Some functions accept the right-hand-side as the first argument
and the left-hand-side as the second argument. This is now confusing,
but at least safe-ish, as the arguments have different types. It's
going to become dangerous when we switch to expressions for both sides,
so let's rationalize it by always starting with lhs.

Some parameters were annotated with _lhs/_rhs when it was not clear.
2022-06-12 18:55:24 +03:00
Avi Kivity
9beac1df53 cql3: expr: don't rely on grammar when comparing tuples
The grammar only allows comparing tuples of clustering columns, which
are non-null, but let's not rely on that deep in expression evaluation
as it can be relaxed.
2022-06-12 18:41:03 +03:00
Avi Kivity
9a4f2a8cc3 cql3: expr: wire column_value and subscript to evaluate()
With everything standardized on evaluation_inputs(), it's a matter of calling
get_value().
2022-06-12 18:21:04 +03:00
Avi Kivity
30721fdc4a cql3: get_value(subscript): remove gratuitous pointer
While extracting get_value(subscript) we inherited a pointer due
to the calling convention, we can now remove it.
2022-06-12 18:18:59 +03:00
Avi Kivity
dd2fec9cb1 cql3: expr: reindent get_value(subscript)
Whitespace only change.
2022-06-12 18:04:12 +03:00
Avi Kivity
31b9e2a565 cql3: expr: extract get_value(subscript) from get_value(column_maybe_subscripted)
We wish to wire get_value(subscript) into evaluate (and get rid of
column_maybe_subscripted).
2022-06-12 18:03:03 +03:00
Avi Kivity
248433d7e0 cql3: prepare_expr: prepare subscript type
The type of a subscript expression is the value comparator
of the expression (column) being subscripted, according to our
weird naming.
2022-06-12 17:39:08 +03:00
Avi Kivity
b5287db8ea cql3: expr: drop internal 'column_value_eval_bag'
is_satisfied_by() used an internal column_value_eval_bag type that
was more awkwardly named (and more awkward to use due to more nesting)
than evaluation_inputs. Drop it and use evaluation_inputs throughout.

The thunk is_satisfied_by(evaluation_inputs) that just called
is_satisfied_by(column_value_eval_bag) is dropped.
2022-06-12 17:12:41 +03:00
Avi Kivity
55085906ca cql3: expr: change evaluate() to accept evaluation_inputs
Currently, evaluate() accepts only query_options, which makes
it not useful to evaluate columns. As a result some callers
(column_condition) have to call it directly on the right-hand-side
of binary expressions instead of evaluating the binary expression
itself.

Change it to accept evaluation_inputs as a parameter, but keep
the old signature too, since it is called from many places that
don't have rows.
2022-06-12 16:51:42 +03:00
Avi Kivity
2ecdb219fb cql3: expr: make evaluate(<expression subtype>) static
They aren't called from anywhere outside expression.cc, and
we're playing with the signatures, so hide them to avoid
rebuilds.
2022-06-12 16:13:20 +03:00
Avi Kivity
c80999fab4 cql3: expr: push is_satisfied_by regular and static column extraction to callers
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_satisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.

Push it out to callers, which already eliminates some calls.

We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.

Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
2022-06-12 16:12:41 +03:00
Avi Kivity
4b715226fe cql3: expr: convert is_satisfied_by() signature to evaluation_inputs
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.

Although the new API allows passing no query_options, the view code
keeps passing dummy query_options and improvement is left as a FIXME.
2022-06-12 12:53:44 +03:00
Avi Kivity
7a9b645d64 cql3: expr: introduce evaluation_inputs
An expression may refer to values provided externally: the partition
and clustering keys, the static and regular row (all providing
column values), and the query options (providing values for bind
variables). Currently, different evaluation functions
(evaluate(), get_value(), and is_satisfied_by()) receive different
subsets of these values.

As a first step towards unifying the various ways to evaluate an
expression, collect the parameters in a single structure. Since
different evaluation contexts have different subsets, make everything
optional (via a pointer). Note that callers are expected to verify
using the grammar or prepare phase that they don't refer to values
that are not provided.

The cql3::selection::selection parameter is provided to translate
from query::result_row_view to schema column indexes. This is pretty
bad since it means the translation needs to be done for every
evaluation and is therefore a candidate for removal, but is kept here
since that's how it's currently done.
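
A minimal sketch of such a structure, assuming simplified types: `evaluation_inputs_sketch`, `query_options_sketch`, and the accessor functions are hypothetical names, and values are modelled as strings rather than serialized bytes. The selection parameter mentioned above is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real types.
using value_opt = std::optional<std::string>;

struct query_options_sketch {
    std::vector<value_opt> bind_values;
};

// Every input is optional (a pointer), because different evaluation
// contexts can supply different subsets.
struct evaluation_inputs_sketch {
    const std::vector<std::string>* partition_key = nullptr;
    const std::vector<std::string>* clustering_key = nullptr;
    const std::vector<value_opt>* static_and_regular_columns = nullptr;
    const query_options_sketch* options = nullptr;
};

// Callers are expected to guarantee that the needed input is present;
// this sketch still guards against a missing one.
inline value_opt get_bind_value(const evaluation_inputs_sketch& in, std::size_t idx) {
    if (!in.options) {
        throw std::logic_error("bind variable evaluated without query options");
    }
    return in.options->bind_values.at(idx);
}

inline const std::string& get_partition_key_component(const evaluation_inputs_sketch& in,
                                                      std::size_t idx) {
    if (!in.partition_key) {
        throw std::logic_error("partition key column evaluated without a partition key");
    }
    return in.partition_key->at(idx);
}
```

A context without rows would fill in only `options`, while a filtering context would also point at the keys and column values.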
2022-06-12 12:47:23 +03:00
Avi Kivity
7debf6780c cql3: expr: drop prepare_binop_lhs()
It is now just a thin wrapper around try_prepare_expression(), so
replace it with that.
2022-06-01 18:58:14 +03:00
Avi Kivity
76e0dc66e5 cql3: expr: move implementation of prepare_binop_lhs() to try_prepare_expression()
This unifies the left-hand-side and right-hand-side of expression preparation.
The contents of the visitor in prepare_binop_lhs() is moved to the
visitor in try_prepare_expression(). This usually replaces an
on_internal_error() branch.

An exception is tuple_constructor, which is valid in both the left-hand-side
and right-hand-side (e.g. WHERE (x, y) IN (?, ?, ?)). We previously
enhanced this case to support not having a column_specification, so
we just delete the branch from prepare_binop_lhs.
2022-06-01 18:58:14 +03:00
Avi Kivity
046abc4323 cql3: expr: use recursive descent when preparing subscripts
When encountering a subscript as the left-hand-side of a binary operator,
we assume the subscripted value is a column and process it directly.

As a step towards de-specializing the left-hand-side of binary operators,
use recursive descent into prepare_binop_lhs() instead. This requires
generating a column_specification for arbitrary expressions, so we
add a column_specification_of() function for that. Currently it will
return a good representation for columns (the only input allowed by
the grammar) and a bad representation (the text representation of the
expression) for other expressions. We'll have to improve that when we
relax the grammar.
2022-06-01 18:58:12 +03:00
Avi Kivity
747a1dd244 cql3: expr: allow prepare of tuple_constructor with no receiver
Currently the only expression form that can appear on both the left
hand side of an expression and the right hand side is a tuple constructor,
so consequently it must support both modes of type processing - either
deriving the type from the expression, or imposing a type on the expression.
As an example, in

    WHERE (A, B) = (:a, :b)

the first tuple derives its type from the column types, while the
second tuple has the type of the first tuple imposed on it.

So, we adjust tuple_constructor_prepare_nontuple to support both forms.
This means allowing the receiver not to be present, and calculating the
tuple type if that is the case.
2022-06-01 18:48:55 +03:00
Avi Kivity
b1c8fd8fa5 cql3: expr: drop no longer used printable_relation parameter from prepare_binop_lhs()
Inching ever closer to unifying the two expression preparation variants.
2022-06-01 18:48:03 +03:00
Avi Kivity
4e0a089f3e cql3: expr: print only column name when failing to resolve column
resolve_column() is part of the prepare stage, and tries to
resolve a column name in a query against the table's columns.

If it fails, it prints the containing binary_expression as
context. However, that's unnecessary - the unresolved
column name is sufficient context. So print that.

The motivation is to unify preparation of binary_operator
left-hand-side and right-hand-side - prepare_expression()
doesn't have the extra parameter and it wouldn't make sense
to add it, as expressions might not be children of binary_operators.
2022-06-01 18:48:03 +03:00
Avi Kivity
9e213d979f cql3: expr: pass schema to prepare_expression
Currently prepare_expression is never used where a schema is needed -
it is called for the right-hand-side of binary operators (where we
don't accept columns) or for attributes like WRITETIME or TTL. But
when we unify expression preparation it will need to handle columns
too, and these need the schema to look up the column.

So pass the schema as a parameter. It is optional (a pointer) since
not all contexts will have a schema (for example CREATE AGGREGATE).
2022-06-01 18:48:03 +03:00
Avi Kivity
9a81285206 cql3: expr: prepare_binary_operator: drop unused argument ctx
This brings the calling convention closer to prepare_expression
so we can unify them.
2022-06-01 18:48:03 +03:00
Avi Kivity
9deabdfbf4 cql3: expr: stub type inference for prepare_expression
In CQL (and SQL) types flow in different directions in expression
components. In an expression

  A[:x] = :y

The type of A is known, the type of :x is derived from the type of A,
and the type of :y is derived from the type of A[:x].

Currently prepare_expression() only supports the second mode - an
expression's type is dictated by its caller via the column_specification
parameter. But this means it can only be used to evaluate the
right-hand-side of binary expressions, since the left-hand-side uses
the first mode, where the type is derived from the column, not
imposed by the caller.

To support both modes, make the column_specification parameter optional
(it is already a pointer so just accept null) and also make the returned
expression optional, to indicate failure to infer the type if the
column_specification was not given.

This patch only arranges for the new calling convention (as a new
try_prepare_expression call), it does not actually implement anything.
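
The calling convention can be sketched roughly as below. This is an illustrative model only: `try_prepare_sketch`, `column_ref`, `bind_marker`, and `prepared_expr` are hypothetical names, and the receiver is reduced to a bare type pointer instead of a column_specification.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, simplified model of the two type-flow modes.
enum class data_type { int32, text };

struct column_ref { std::string name; data_type type; };  // type known from the schema
struct bind_marker { std::string name; };                  // type must come from context

using raw_expr = std::variant<column_ref, bind_marker>;

struct prepared_expr {
    raw_expr expr;
    data_type type;
};

// receiver == nullptr: the type must be inferred from the expression itself;
//                      returns nullopt when that is impossible.
// receiver != nullptr: the caller imposes the type on the expression.
inline std::optional<prepared_expr> try_prepare_sketch(const raw_expr& e,
                                                       const data_type* receiver) {
    if (const auto* col = std::get_if<column_ref>(&e)) {
        // A column carries its own type. A fuller version would also check
        // compatibility with the receiver; that is omitted here.
        return prepared_expr{e, col->type};
    }
    if (!receiver) {
        return std::nullopt;  // a bind marker cannot infer its own type
    }
    return prepared_expr{e, *receiver};
}
```

For `A = :y`, the left-hand side is prepared with no receiver, and its inferred type is then passed as the receiver for the right-hand side.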
2022-06-01 18:48:03 +03:00
Avi Kivity
10aa6ddca3 cql3: expr: introduce type_of() to fetch the type of an expression
For most types, we just return the type field. A few expressions have
other methods to access the type, and some expressions cannot survive
prepare and so calling type_of() on them is illegal.
2022-06-01 18:47:58 +03:00
Avi Kivity
43a3c94532 cql3: expr: keep type information in casts
Currently, preparing a cast drops the cast completely (as the
types are verified to be binary compatible). This means we lose
the casted-to type. Since we wish to keep type information, keep the
cast in the prepared expression tree (and therefore the casted-to
type).

Once we do that, we must extend evaluate() to support cast
expressions.
2022-06-01 18:46:55 +03:00
Avi Kivity
0a4a8c6b92 cql3: expr: add type field to subscript, field_selection, and null expressions
Almost all expressions either already have a type field or
have an O(1) way of reaching the type (for example, column_value
can access the type via its column_definition).

Add a type field to the few expression types that don't already
have it. Since prepare_expr() doesn't yet generate these expressions,
we don't have any place to populate it, so it remains null.
2022-06-01 18:45:56 +03:00
Avi Kivity
d984ea1b7a cql3: expr: cast: use data_type instead of cql3_type for the prepared form
A cast expression naturally includes a data type indicating what type
we are casting into. Right now the prepared form uses cql3_type.
Change it to data_type which is what other expressions use to reduce
friction. Since cql3_type is a thin wrapper around data_type, the
change is minimal.

The change propagates to selectable::with_cast, but again it is
minimal.
2022-06-01 12:19:53 +03:00
Avi Kivity
f9b3c6ddbd cql3: expr: drop restrictions on list subscripts
Restriction validation forbids lists (somewhat oddly, it talks about
indexes; validation should make a soft check about indexes (since it
can fall back to filtering) and a hard check about supported filtering
expressions), and enforces a map in another place. Remove the first
restriction and relax the second to allow lists as well as maps as
subscript operands.

Some validation messages are adjusted to reflect that lists are supported.
2022-05-30 13:29:49 +03:00
Avi Kivity
35e0474410 cql3: expr: prepare_expr: support subscripted lists
Infer the type of a list index as int32_type.

The error message when a non-subscriptable type is provided is
changed, so the corresponding test is changed too.
2022-05-30 13:29:49 +03:00
Avi Kivity
8d667e374b cql3: expressions: reindent get_value()
Whitespace-only change.
2022-05-30 13:29:49 +03:00
Avi Kivity
05388f7a2a cql3: expression: evaluate() support subscripting lists
We already support subscripting maps (for filtering WHERE m[3] = 6),
so adding list subscript support is easy. Most of the code is shared.
Differences are:
 - internal list representation is a vector of values, not of key/values
 - key type is int32_type, not defined by map
 - need to check index bounds
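
The three differences can be illustrated with a small sketch. The names `subscript_list` and `subscript_map` and the string-based value representation are hypothetical, and returning null for an out-of-range index is an assumption of this sketch; the commit message only says that bounds must be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified value representations.
using value = std::string;
using list_repr = std::vector<value>;                   // a vector of values
using map_repr = std::vector<std::pair<value, value>>;  // a vector of key/value pairs

// List subscript: the key is always a 32-bit integer and must be bounds-checked.
// Assumption: an out-of-range index yields null (nullopt) rather than an error.
inline std::optional<value> subscript_list(const list_repr& l, std::int32_t idx) {
    if (idx < 0 || static_cast<std::size_t>(idx) >= l.size()) {
        return std::nullopt;
    }
    return l[static_cast<std::size_t>(idx)];
}

// Map subscript: the key type is defined by the map; a missing key yields null.
inline std::optional<value> subscript_map(const map_repr& m, const value& key) {
    for (const auto& [k, v] : m) {
        if (k == key) {
            return v;
        }
    }
    return std::nullopt;
}
```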
2022-05-30 13:29:49 +03:00
Jan Ciolek
f9b1fc0b69 cql: Forbid null in lists of IN values
We used to allow nulls in lists of IN values,
i.e. a query like this would be valid:
SELECT * FROM tab WHERE pk IN (1, null, 2);

This is an old feature that isn't really used
and is already forbidden in Cassandra.

Additionally the current implementation
doesn't allow for nulls inside the list
if it's sent as a bound value.
So something like:
SELECT * FROM tab WHERE pk IN ?;
would throw an error if ? was (1, null, 2).
This is inconsistent.

Allowing it made writing code cumbersome because
this was the only case where having a null
inside of a collection was allowed.
Because of it there needed to be
separate code paths to handle regular lists
and lists of NULL values.

Forbidding it makes the code nicer and consistent
at the cost of a feature that isn't really
important.
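
As a rough illustration of the resulting rule, a single check can cover both the literal form and the bound-value form. The function name, the use of `std::optional<int>` for nullable values, and the error text are all assumptions of this sketch, not the actual implementation.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical model: a nullable int value.
using int_value_opt = std::optional<int>;

// One code path for IN lists, whether they came from a literal such as
// (1, null, 2) or from a bound value: any null is rejected.
inline void validate_in_values(const std::vector<int_value_opt>& values) {
    for (const auto& v : values) {
        if (!v) {
            // The message text is made up for this sketch.
            throw std::invalid_argument("null is not allowed in a list of IN values");
        }
    }
}
```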

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-05-24 00:17:41 +02:00
cvybhu
21453ac9a4 cql3: Remove scalar from bind_variable_scalar_prepare_expression
There is now only one function to prepare bind_variable,
so we can remove 'scalar' from its name.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
9b49d27a8d cql3: expr: Remove shape_type from bind_variable
shape_type was used in prepare_expression to differentiate
between a few cases and create the correct receivers.
This was used by the relation class.

Now creating the correct receiver has been delegated to the caller
of prepare_expression and all bind_variables can be handled
in the same simple way.

shape_type is not needed anymore.

Not having it is better because it simplifies things.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
c0fc82d4be cql3: Remove prepare_expression_multi_column
This function was used by multi_column_relation.hh,
but now it isn't needed anymore.

The only way to prepare a bind_variable is now the standard prepare_expression.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
Michał Sala
f6bdc4d694 cql3: expr: add printer for expression
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.

There is a bunch of things that need to be changed when
compared to the current implementation of operator<<(expression)
to output something parsable.

column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to preserve case sensitivity.

I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').

A list of IN values should be printed inside () instead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this a temporary tuple_constructor is created and printed.
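
The quoting rule for column names can be sketched like this. It relies on the general CQL convention that unquoted identifiers are folded to lowercase and that an embedded double quote is escaped by doubling it; `needs_quoting` and `maybe_quote` are hypothetical names, and the sketch ignores reserved keywords, which would also need quoting.

```cpp
#include <cassert>
#include <string>

// An identifier survives a print/parse round trip unquoted only if it is
// already lowercase and uses just [a-z0-9_], starting with a letter.
// Assumption: reserved keywords are not considered here.
inline bool needs_quoting(const std::string& name) {
    if (name.empty() || !(name[0] >= 'a' && name[0] <= 'z')) {
        return true;
    }
    for (char c : name) {
        const bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return true;
        }
    }
    return false;
}

// Wraps the name in double quotes when needed, doubling embedded quotes.
inline std::string maybe_quote(const std::string& name) {
    if (!needs_quoting(name)) {
        return name;
    }
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') {
            out += '"';
        }
        out += c;
    }
    out += '"';
    return out;
}
```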

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
c4f846dbc8 cql3: expr: expr::to_restriction: Handle token relations
Implement converting token relations to expressions.

The code is mostly taken from functions in token_relation.hh,
because we are replicating functionality of the functions called
token_relation::new_XX_restrictions.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
5fc5012f9b cql3: expr: expr::to_restriction: Handle multi column relations
Implement converting multi column relations to expressions.

The code is mostly taken from functions in multi_column_relation.hh,
because we are replicating functionality of the functions called
multi_column_relation::new_XX_restriction.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
89950e02b5 cql3: expr: Add expr::to_restriction for single column relations
Add a function that will be used to convert expressions
received from the parser to restrictions.

Currently parser creates relations with expressions inside
and then those relations are converted to restrictions.

Once this function is implemented we will be able to skip
creating relations altogether and convert straight from
expression to restriction. This will allow us to remove
the relation class.

Further functionality will be implemented in the following commits.
This commit implements converting single column relations to expressions.

The code is mostly taken from functions in single_column_relation.hh,
because we are replicating functionality of the functions called
single_column_relation::new_XX_restriction.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:57 +02:00
cvybhu
3e5e5c4a17 cql3: expr: Add prepare_binary_operator
Add a function that prepares
a binary_operator received from the parser.

It resolves columns on the LHS, calculates type of LHS,
and prepares RHS with the correct type.

It will be used by expr::to_restriction.

Some basic type checks are performed, but more thorough
checks will be required in expr::to_restriction to fully
validate a relation.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:15:37 +02:00
cvybhu
5dee55d433 cql3: expr: Change how prepare_expression handles bind_variable
The situation with preparing bind_variable is a bit strange,
there are four shapes of bind variables and receiver behaviour
is not in line with other types.

To prepare a bind_variable for a list of IN values for an int column
the current code requires us to pass a receiver of type int.
This is counterintuitive, to prepare a string we pass
a receiver with string type, so to prepare list<int> we should
pass a receiver of type list<int>, not just int.

This commit changes the behaviour in two ways:
- Shape of bind_variable doesn't matter anymore
- The bind_variable gets the receiver passed to prepare_expression,
  no more list<receiver> magic.

Other variants of bind_variable_x_prepare_expression are not removed yet
because they are needed by prepare_expression_multi_column.
They will be removed later, along with bind_variable::shape_type.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:15:36 +02:00
cvybhu
be6e741b6c cql3: expr: Add columns to expr::token struct
The expr::token struct is created when something
like token(p1, p2) occurs in the WHERE clause.

Currently expr::token doesn't keep columns passed
as arguments to the token function.

They weren't needed because token() validation
was done inside token_relation.

Now that we want to use only expressions
we need to have columns inside the token struct
and validate that those are the correct columns.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:03:11 +02:00
cvybhu
b99aae7d41 cql3: expr: Modify list_prepare_expression to handle lists of IN values
The standard CQL list type doesn't allow for nulls inside the collection.

However lists of IN values are the exception where bind nulls are allowed,
for example in restrictions like: p IN (1, 2, null)

To be able to use list_prepare_expression with lists of IN values
a flag is added to specify whether nulls should be allowed.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:03:11 +02:00
cvybhu
2b5818697a cql3: expr: Add expr::as_if for non-const expressions
expr::as_if is our wrapper for std::get_if.

There was a version for const expression*,
but there wasn't one for mutable expression*.

Add the mutable version,
it will be needed in the following commits.
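
A minimal sketch of such a pair of overloads, assuming the expression wraps a std::variant. The struct names and the two-alternative variant are hypothetical and much smaller than the real expression type.

```cpp
#include <cassert>
#include <variant>

// Hypothetical, reduced expression type with two alternatives.
struct column_value_sketch { int index; };
struct constant_sketch { int value; };

struct expression_sketch {
    std::variant<column_value_sketch, constant_sketch> v;
};

// Const version: read-only access to the alternative, or nullptr.
template <typename T>
const T* as_if(const expression_sketch* e) {
    return std::get_if<T>(&e->v);
}

// Mutable version: allows modifying the alternative in place.
template <typename T>
T* as_if(expression_sketch* e) {
    return std::get_if<T>(&e->v);
}
```

Overload resolution picks the mutable version for a non-const pointer, so `as_if<constant_sketch>(&e)->value = 5` compiles only when `e` itself is mutable.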

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:03:11 +02:00
Michał Sala
69bc60ef24 cql3: expr: do not allow unset values inside collections
The semantics of unset values inside collections are undefined.
Previous behavior of transforming list with unset value into unset value
was removed, because I couldn't find a reason for its existence.
2022-04-28 19:33:09 +02:00
Michał Sala
4766e25d6e cql3: expr: prepare_expr: allow bind markers in collection literals
It's easier to allow them than not to do so.
2022-04-28 19:31:09 +02:00
Nadav Har'El
fbb2a41246 expressions: don't dereference invalid map subscript in filter
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).

Cassandra returns an invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:

 * For NULL, we consider m[NULL] to result in NULL - instead of an error.
   This behavior is more consistent with other expressions that contain
   null - for example NULL[2] and NULL<2 both result in NULL as well.
   Moreover, if in the future we allow more complex expressions, such
   as m[a] (where a is a column), we can find the subscript to be null
   for some rows and non-null for other rows - and throwing an "invalid
   query" in the middle of the filtering doesn't make sense.

 * For UNSET_VALUE, we do consider this an error like Cassandra, and use
   the same error message as Cassandra. However, the current implementation
   checks for this error only when the expression is evaluated - not
   before. It means that if the scan is empty before the filtering, the
   error will not be reported and we'll silently return an empty result
   set. We currently consider this ok, but we can also change this in the
   future by binding the expression only once (today we do it on every
   evaluation) and validating it once after this binding.
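
The two decisions can be sketched as follows. This is an illustration under simplifying assumptions: an int-to-int map, a hypothetical `evaluate_map_subscript` function, and a std::variant modelling the three possible subscript states. Checking for the unset value before anything else, and returning null for a missing key, are choices of the sketch rather than statements about the real code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical model of a bound subscript: a real key, NULL, or UNSET_VALUE.
struct null_value {};
struct unset_value {};
using subscript_value = std::variant<int, null_value, unset_value>;
using int_map = std::map<int, int>;

// Evaluates m[key]; nullopt stands for a NULL result.
inline std::optional<int> evaluate_map_subscript(const std::optional<int_map>& m,
                                                 const subscript_value& key,
                                                 const std::string& column) {
    if (std::holds_alternative<unset_value>(key)) {
        // Detected only at evaluation time, as described above.
        throw std::invalid_argument("Unsupported unset map key for column " + column);
    }
    if (!m || std::holds_alternative<null_value>(key)) {
        return std::nullopt;  // m[NULL] is NULL; a null map also yields NULL
    }
    auto it = m->find(std::get<int>(key));
    if (it == m->end()) {
        return std::nullopt;  // assumption: a missing key yields NULL
    }
    return it->second;
}
```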

Fixes #10361
Fixes #10399

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 16:05:34 +03:00
Nadav Har'El
808a93d29b expressions: fix invalid dereference in map subscript evaluation
When we have a filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferenced an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.

The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).

The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.

Fixes #10417

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 14:58:56 +03:00