Commit Graph

162 Commits

Author SHA1 Message Date
Avi Kivity
e739f2b779 cql3: expr: make evaluate() return a cql3::raw_value rather than an expr::constant
An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.

Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type infomation in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.

So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:

 - cql3::raw_value gains a view() helper to convert to a raw_value_view,
   and is_null_or_unset() to match with expr::constant and reduce further
   churn.
 - some helpers that worked on expr::constant and now receive a
   raw_value now need the type passed via an additional argument. The
   type is computed from the expression by the caller.
 - many type checks during expression evaluation were dropped. This is
   a consequence of static typing - we must trust the expression prepare
   phase to perform full type checking since values no longer carry type
   information.

Closes #10797
2022-06-15 08:47:24 +02:00
Avi Kivity
6d943e6cd0 cql3: expr: drop column_maybe_subscripted
column_maybe_subscripted is a variant<column_value*, subscript*> that
existed for two reasons:
  1. evaluation of subscripts and of columns took different paths.
  2. calculation of the type of column or column[sub] took different paths.

Now that all evaluations go through evaluate(), and the types are
present in the expression itself, there is no need for column_maybe_subscripted
and it is replaced with plain expressions.
2022-06-12 19:21:28 +03:00
Avi Kivity
2aa9199e9a cql3: expr: possible_lhs_values(): open-code get_value_comparator()
get_value_comparator() is going away soon, so open-code it here. It's
not doing much anyway.
2022-06-12 19:14:50 +03:00
Avi Kivity
b1c12073b1 cql3: expr: rationalize lhs/rhs argument order
Some functions accept the right-hand-side as the first argument
and the left-hand-side as the second argument. This is now confusing,
but at least safe-ish, as the arguments have different types. It's
going to become dangerous when we switch to expressions for both sides,
so let's rationalize it by always starting with lhs.

Some parameters were annotated with _lhs/_rhs when it was not clear.
2022-06-12 18:55:24 +03:00
Avi Kivity
9beac1df53 cql3: expr: don't rely on grammar when comparing tuples
The grammar only allows comparing tuples of clustering columns, which
are non-null, but let's not rely on that deep in expression evaluation
as it can be relaxed.
2022-06-12 18:41:03 +03:00
Avi Kivity
9a4f2a8cc3 cql3: expr: wire column_value and subscript to evaluate()
With everything standardized on evaluation_inputs(), it's a matter of calling
get_value().
2022-06-12 18:21:04 +03:00
Avi Kivity
30721fdc4a cql3: get_value(subscript): remove gratuitous pointer
While extracting get_value(subscript) we inherited a pointer due
to the calling convention, we can now remove it.
2022-06-12 18:18:59 +03:00
Avi Kivity
dd2fec9cb1 cql3: expr: reindent get_value(subscript)
Whitespace only change.
2022-06-12 18:04:12 +03:00
Avi Kivity
31b9e2a565 cql3: expr: extract get_value(subscript) from get_value(column_maybe_subscripted)
We wish to wire get_value(subscript) into evaluate (and get rid of
column_maybe_subscripted).
2022-06-12 18:03:03 +03:00
Avi Kivity
b5287db8ea cql3: expr: drop internal 'column_value_eval_bag'
is_satisfied_by() used an internal column_value_eval_bag type that
was more awkwardly named (and more awkward to use due to more nesting)
than evaluation_inputs. Drop it and use evaluation_inputs throughout.

The thunk is_satisified_by(evaluation_inputs) that just called
is_satisified_by(column_value_eval_bag) is dropped.
2022-06-12 17:12:41 +03:00
Avi Kivity
55085906ca cql3: expr: change evalute() to accept evaluation_inputs
Currently, evaluate() accepts only query_options, which makes
it not useful to evaluate columns. As a result some callers
(column_condition) have to call it directly on the right-hand-side
of binary expressions instead of evaluating the binary expression
itself.

Change it to accept evaluation_input as a parameter, but keep
the old signature too, since it is called from many places that
don't have rows.
2022-06-12 16:51:42 +03:00
Avi Kivity
2ecdb219fb cql3: expr: make evaluate(<expression subtype>) static
They aren't called from anywhere outside expression.cc, and
we're playing with the signatures, so hide them to avoid
rebuilds.
2022-06-12 16:13:20 +03:00
Avi Kivity
c80999fab4 cql3: expr: push is_satisfied_by regular and static column extraction to callers
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_saitisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.

Push it out to callers, which already eliminates some calls.

We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.

Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
2022-06-12 16:12:41 +03:00
Avi Kivity
4b715226fe cql3: expr: convert is_satisfied_by() signature to evaluation_inputs
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.

Although the new API allows passing no query_options, the view code
keeps passing dummy query_options and improvement is left as a FIXME.
2022-06-12 12:53:44 +03:00
Avi Kivity
10aa6ddca3 cql3: expr: introduce type_of() to fetch the type of an expression
For most types, we just return the type field. A few expressions have
other methods to access the type, and some expressions cannot survive
prepare and so calling type_of() on them is illegal.
2022-06-01 18:47:58 +03:00
Avi Kivity
43a3c94532 cql3: expr: keep type information in casts
Currently, preparing a cast drops the cast completely (as the
types are verified to be binary compatibile). This means we lose
the casted-to type. Since we wish to keep type infomation, keep the
cast in the prepared expression tree (and therefore the casted-to
type).

Once we do that, we must extend evaluate() to support cast
expressions.
2022-06-01 18:46:55 +03:00
Avi Kivity
8d667e374b cql3: expressions: reindent get_value()
Whitespace-only change.
2022-05-30 13:29:49 +03:00
Avi Kivity
05388f7a2a cql3: expression: evaluate() support subscripting lists
We already support subscripting maps (for filtering WHERE m[3] = 6),
so adding list subscript support is easy. Most of the code is shared.
Differences are:
 - internal list representation is a vector of values, not of key/values
 - key type is int32_type, not defined by map
 - need to check index bounds
2022-05-30 13:29:49 +03:00
Jan Ciolek
f9b1fc0b69 cql: Forbid null in lists of IN values
We used to allow nulls in lists of IN values,
i.e. a query like this would be valid:
SELECT * FROM tab WHERE pk IN (1, null, 2);

This is an old feature that isn't really used
and is already forbidden in Cassandra.

Additionally the current implementation
doesn't allow for nulls inside the list
if it's sent as a bound value.
So something like:
SELECT * FROM tab WHERE pk IN ?;
would throw an error if ? was (1, null, 2).
This is inconsistent.

Allowing it made writing code cumbersome because
this was the only case where having a null
inside of a collection was allowed.
Because of it there needed to be
separate code paths to handle regular lists
and lists of NULL values.

Forbidding it makes the code nicer and consistent
at the cost of a feature that isn't really
important.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-05-24 00:17:41 +02:00
Michał Sala
f6bdc4d694 cql3: expr: add printer for expression
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.

There is a bunch of things that need to be changed when
compared to the current implementation of opreatorr<<(expression)
to output something parsable.

column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to perserve case sensitivity.

I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').

A list of IN values should be printed inside () intead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this a temporary tuple_constructor is created and printed.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:17:58 +02:00
cvybhu
be6e741b6c clq3: expr: Add columns to expr::token struct
The expr::token struct is created when something
like token(p1, p2) occurs in the WHERE clause.

Currently expr::token doesn't keep columns passed
as arguemnts to the token function.

They weren't needed because token() validation
was done inside token_relation.

Now that we want to use only expressions
we need to have columns inside the token struct
and validate that those are the correct columns.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
2022-05-16 18:03:11 +02:00
Michał Sala
69bc60ef24 cql3: expr: do not allow unset values inside collections
Semantic of unset values inside collections is undefined.
Previous behavior of transforming list with unset value into unset value
was removed, because I couldn't find a reason for its existence.
2022-04-28 19:33:09 +02:00
Nadav Har'El
fbb2a41246 expressions: don't dereference invalid map subscript in filter
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).

Cassandra returns a invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:

 * For NULL, we consider m[NULL] to result in NULL - instead of an error.
   This behavior is more consistent with other expressions that contain
   null - for example NULL[2] and NULL<2 both result in NULL as well.
   Moreover, if in the future we allow more complex expressions, such
   as m[a] (where a is a column), we can find the subscript to be null
   for some rows and non-null for other rows - and throwing an "invalid
   query" in the middle of the filtering doesn't make sense.

 * For UNSET_VALUE, we do consider this an error like Cassandra, and use
   the same error message as Cassandra. However, the current implementation
   checks for this error only when the expression is evaluated - not
   before. It means that if the scan is empty before the filtering, the
   error will not be reported and we'll silently return an empty result
   set. We currently consider this ok, but we can also change this in the
   future by binding the expression only once (today we do it on every
   evaluation) and validating it once after this binding.

Fixes #10361
Fixes #10399

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 16:05:34 +03:00
Nadav Har'El
808a93d29b expressions: fix invalid dereference in map subscript evaluation
When we have an filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferences an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.

The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).

The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.

Fixes #10417

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-04-24 14:58:56 +03:00
cvybhu
5c199cad45 cql3: expr: possible_lhs_values: Handle subscript
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.

This behaviour is unexpected and potentially
dangerous so it would be better to forbid
passing subscript to possible_lhs_values entirely.

Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are possible values of a subscripted column c where c[0] = 1?
All lists that have 1 on 0th position?

If we wanted to handle this nicely we would have to
change the arguments.
Such refectoring is best left until the time
when this functionality is actually needed,
right now it's hard to predict what interface
will be needed then.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>

Closes #10228
2022-04-11 19:05:09 +03:00
Jan Ciolek
2e7009f427 cql3: expr: is_supported_by: Return false for subscripted values
is_supported_by checks whether a given restriction
can be supported by some index.

Currently when a subscripted value, e.g `m[1]` is encountered,
we ignore the fact that there is a subscript and ask
whether an index can support the `m` itself.

This looks like unintentional behaviour leftover
from the times when column_value had a sub field,
which could be easily forgotten about.

Scylla doesn't support indexes on collection elements at all,
so simply returning false there seems like a good idea.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #10227
2022-03-15 20:19:33 +02:00
Jan Ciolek
e086201420 cql3: expr: Remove sub from column_value
column_value::sub has been replaced by the subscript struct
everywhere, so we can finally remove it.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 22:02:39 +01:00
Jan Ciolek
b80f9e6cf8 cql3: Create a subscript in single_column_relation
When `val[sub]` is parsed, it used to be the case
that column_value with a sub field was created.

Now this has been changed to creating a subscript struct.

This is the only place where a subscripted value can be created.

All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 22:02:39 +01:00
Jan Ciolek
cf6e81e731 cql3: expr: Add subscript to expression
All handlers for subscript have finally been implemented
and subscript can now be added to expression without
any trouble.

All the commented out code that waited for this moment
can now be uncommented.
Every such piece of code had a `TODO(subscript)` note
and by grepping this phrase we can make sure that
we didn't forget any of them.

Right now there is two ways to express a subscripted
column - either by a column_value with a sub field
or by using a subscript struct.

The grammar still uses the old column_value way,
but column_value.sub will be removed soon
and everything will move to the subscript struct.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 22:02:29 +01:00
Jan Ciolek
c39498537c cql3: expr: Handle subscript in fill_prepare_context
fill_prepare_context collects useful information about
the expression involved in query restrictions.

We should collect this information from subscript as well,
just like we do from column_value and its sub.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
811685ad6a cql3: expr: Handle subscript in evaluate
A column_value can't be evaluated,
so a subscripted column can't evaluated be as well.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
db8990436a cql3: expr: Handle subscript in extract_single_column_restrictions_for_column
extract_single_column_restrictions_for_column finds all restrictions
for a column and puts them in a vector.

In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.

This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
2eaa39e1c8 cql3: expr: Handle subscript in search_and_replace
Prepare a handler for subscript in search_and_replace.
Some of the code must be commented out for now
because subscript hasn't been added to expression yet.

It will uncommented later.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
e2d983f659 cql3: expr: Handle subscript in recurse_until
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
2d4174dc46 cql3: expr: Implement operator<< for subscript
expression can be printed using operator<<.
We need to handle subscript there.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
02c3b78e25 cql3: expr: Handle subscript in possible_lhs_values
possible_lhs_values returns set of possible values
for a column given some restrictions.

Current behaviour in case of a subscripted column
is to just ignore the subscript and treat
the restriction as if it were on just the column.

This seems wrong, or at least confusing,
but I won't change it in this patch to preserve the existing behaviour.

Trying to change this to something more reasonable
breaks other code which assumes that possible_lhs_values
returns a list of values.
(See partition_ranges_from_EQs() in cql3/restrictions/statement_restrictions.cc)

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
07fbf74a97 cql3: expr: Handle subscript in is_supported_by
is_supported_by checks whether the given expression
is supported by some index.

The current behaviour seems wrong, but I kept
it to avoid making changes in a refactor PR.

Scylla doesn't have indexes on map entries yet,
so for a subscript the answer is always no.
I think we should just return false there.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
fb59f488df cql3: expr: Handle subscript in is_satisifed_by
For the most part subscript can be handled
in the same way as column_value.

column_value has a sub argument and all
called functions evaluate lhs value using
get_value() which is prepared to handle
subscripted columns.

These functions now take column_maybe_subscripted
so we can pass &subscript to them without a problem.

The difference is in CONTAINS, CONTAINS_KEY and LIKE.

contains() and contains_key() throw an exception
when the passed column has a subscript, so now
we just throw an exception immediately.

like() doesn't have a check for subscripted value,
but from reading its code it's clear that
it's not ready to handle such values,
so an exception is now thrown as well.
It shouldn't break any tests because when one tries
to perform a query like:

`select * from t where m[0] like '%' allow filtering;`

an exception is throw somewhere earlier in the code.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
1edaa3ef0d cql3: expr: Remove unused attribute
Functions that were previously marked as unused to make the code
compile are now used and we can remove the markings.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
cf839807ac cql3: expr: Use column_maybe_subscripted in is_one_of()
is_one_of() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
75c8b2ec6c cql3: expr: Use column_maybe_subscripted in limits()
limits() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
bc8c298be3 cql3: expr: Use column_maybe_subscripted in equal()
equal() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.

To get lhs value the code uses get_value() which is ready to handle subscripted columns.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
ca423a455e cql3: expr: add get_subscripted_column(column_maybe_subscripted)
Add a function that extracts the column_value
from column_maybe_subscripted.

There were already overloads for expression and subscript,
but this one will be needed as well.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
6d42ff580d cql3: expr: Add as_column_maybe_subscripted
Add a convenience function that allows to convert
a reference to expression to column_maybe_subscripted.

It will be useful in a moment.

For now part of it must be commented out
because subscript is not in the expression variant yet.

It will be uncommented once subscript is finally added
to expression.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
d577af0f0c cql3: expr: Make get_value_comparator work with column_maybe_subscripted
There is get_value_comparator(column_value) but soon
we will also need get_value_comparator(column_maybe_subscripted).

Implement it by copying code from get_value_comparator(column_value).

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
a8287a158a cql3: expr: Make get_value work with column_maybe_subscripted
There is a get_value(column_value), but soon we will also
need get_value(column_maybe_subscripted).

Implement get_value(column_maybe_subscripted) by checking
whether the argument is a column_value or subscript
and calling the right code.

Code for handling the subscript case is copied from
get_value(column_value) where sub has value.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
feee6e4ffb cql3: expr: Add column_maybe_subscripted
column_maybe_subscripted is a variant that
can be either a column_value or a subscript.

It will be used as an argument to functions
which used to take column_value.

Right now column_value has a sub field,
but this will be removed soon once
the subscript struct takes over.

Changing the argument type is a smaller change
than rewriting all these functions, although
if they were rewritten the resulting code
would probably be nicer.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:41 +01:00
Jan Ciolek
4d7438d30a cql3: expr: Add get_subscripted_column
Even though the new subscript allows for subscripting anything,
the only thing that is really allowed to be subscripted is a column.

Add a utility function that extracts the column_value
from an expression with is a column_value or subscript.

It will came in handy in the following commits.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-02-27 21:56:30 +01:00
Piotr Sarna
5a13ff09e9 expression: fix get_value for mismatched column definitions
As observed in #10026, after schema changes it somehow happened
that a column defition that does not match any of the base table
columns was passed to expression verification code.
The function that looks up the index of a column happens to return
-1 when it doesn't find anything, so using this returned index
without checking if it's nonnegative results in accessing invalid
vector data, and a segfault or silent memory corruption.
Therefore, an explicit check is added to see if the column was actually
found. This serves two purposes:
 - avoiding segfaults/memory corruption
 - making it easier to investigate the root cause of #10026

Closes #10039
2022-02-07 18:40:48 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00