An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.
Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type infomation in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.
So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:
- cql3::raw_value gains a view() helper to convert to a raw_value_view,
and is_null_or_unset() to match with expr::constant and reduce further
churn.
- some helpers that worked on expr::constant and now receive a
raw_value now need the type passed via an additional argument. The
type is computed from the expression by the caller.
- many type checks during expression evaluation were dropped. This is
a consequence of static typing - we must trust the expression prepare
phase to perform full type checking since values no longer carry type
information.
Closes#10797
column_maybe_subscripted is a variant<column_value*, subscript*> that
existed for two reasons:
1. evaluation of subscripts and of columns took different paths.
2. calculation of the type of column or column[sub] took different paths.
Now that all evaluations go through evaluate(), and the types are
present in the expression itself, there is no need for column_maybe_subscripted
and it is replaced with plain expressions.
Some functions accept the right-hand-side as the first argument
and the left-hand-side as the second argument. This is now confusing,
but at least safe-ish, as the arguments have different types. It's
going to become dangerous when we switch to expressions for both sides,
so let's rationalize it by always starting with lhs.
Some parameters were annotated with _lhs/_rhs when it was not clear.
The grammar only allows comparing tuples of clustering columns, which
are non-null, but let's not rely on that deep in expression evaluation
as it can be relaxed.
is_satisfied_by() used an internal column_value_eval_bag type that
was more awkwardly named (and more awkward to use due to more nesting)
than evaluation_inputs. Drop it and use evaluation_inputs throughout.
The thunk is_satisified_by(evaluation_inputs) that just called
is_satisified_by(column_value_eval_bag) is dropped.
Currently, evaluate() accepts only query_options, which makes
it not useful to evaluate columns. As a result some callers
(column_condition) have to call it directly on the right-hand-side
of binary expressions instead of evaluating the binary expression
itself.
Change it to accept evaluation_input as a parameter, but keep
the old signature too, since it is called from many places that
don't have rows.
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_saitisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.
Push it out to callers, which already eliminates some calls.
We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.
Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.
Although the new API allows passing no query_options, the view code
keeps passing dummy query_options and improvement is left as a FIXME.
For most types, we just return the type field. A few expressions have
other methods to access the type, and some expressions cannot survive
prepare and so calling type_of() on them is illegal.
Currently, preparing a cast drops the cast completely (as the
types are verified to be binary compatibile). This means we lose
the casted-to type. Since we wish to keep type infomation, keep the
cast in the prepared expression tree (and therefore the casted-to
type).
Once we do that, we must extend evaluate() to support cast
expressions.
We already support subscripting maps (for filtering WHERE m[3] = 6),
so adding list subscript support is easy. Most of the code is shared.
Differences are:
- internal list representation is a vector of values, not of key/values
- key type is int32_type, not defined by map
- need to check index bounds
We used to allow nulls in lists of IN values,
i.e. a query like this would be valid:
SELECT * FROM tab WHERE pk IN (1, null, 2);
This is an old feature that isn't really used
and is already forbidden in Cassandra.
Additionally the current implementation
doesn't allow for nulls inside the list
if it's sent as a bound value.
So something like:
SELECT * FROM tab WHERE pk IN ?;
would throw an error if ? was (1, null, 2).
This is inconsistent.
Allowing it made writing code cumbersome because
this was the only case where having a null
inside of a collection was allowed.
Because of it there needed to be
separate code paths to handle regular lists
and lists of NULL values.
Forbidding it makes the code nicer and consistent
at the cost of a feature that isn't really
important.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.
There is a bunch of things that need to be changed when
compared to the current implementation of opreatorr<<(expression)
to output something parsable.
column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to perserve case sensitivity.
I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').
A list of IN values should be printed inside () intead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this a temporary tuple_constructor is created and printed.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
The expr::token struct is created when something
like token(p1, p2) occurs in the WHERE clause.
Currently expr::token doesn't keep columns passed
as arguemnts to the token function.
They weren't needed because token() validation
was done inside token_relation.
Now that we want to use only expressions
we need to have columns inside the token struct
and validate that those are the correct columns.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Semantic of unset values inside collections is undefined.
Previous behavior of transforming list with unset value into unset value
was removed, because I couldn't find a reason for its existence.
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).
Cassandra returns a invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:
* For NULL, we consider m[NULL] to result in NULL - instead of an error.
This behavior is more consistent with other expressions that contain
null - for example NULL[2] and NULL<2 both result in NULL as well.
Moreover, if in the future we allow more complex expressions, such
as m[a] (where a is a column), we can find the subscript to be null
for some rows and non-null for other rows - and throwing an "invalid
query" in the middle of the filtering doesn't make sense.
* For UNSET_VALUE, we do consider this an error like Cassandra, and use
the same error message as Cassandra. However, the current implementation
checks for this error only when the expression is evaluated - not
before. It means that if the scan is empty before the filtering, the
error will not be reported and we'll silently return an empty result
set. We currently consider this ok, but we can also change this in the
future by binding the expression only once (today we do it on every
evaluation) and validating it once after this binding.
Fixes#10361Fixes#10399
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When we have an filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferences an unset optional, and continued
processing the result of this dereference which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.
The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).
The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.
Fixes#10417
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.
This behaviour is unexpected and potentially
dangerous so it would be better to forbid
passing subscript to possible_lhs_values entirely.
Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are possible values of a subscripted column c where c[0] = 1?
All lists that have 1 on 0th position?
If we wanted to handle this nicely we would have to
change the arguments.
Such refectoring is best left until the time
when this functionality is actually needed,
right now it's hard to predict what interface
will be needed then.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes#10228
is_supported_by checks whether a given restriction
can be supported by some index.
Currently when a subscripted value, e.g `m[1]` is encountered,
we ignore the fact that there is a subscript and ask
whether an index can support the `m` itself.
This looks like unintentional behaviour leftover
from the times when column_value had a sub field,
which could be easily forgotten about.
Scylla doesn't support indexes on collection elements at all,
so simply returning false there seems like a good idea.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#10227
column_value::sub has been replaced by the subscript struct
everywhere, so we can finally remove it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When `val[sub]` is parsed, it used to be the case
that column_value with a sub field was created.
Now this has been changed to creating a subscript struct.
This is the only place where a subscripted value can be created.
All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
All handlers for subscript have finally been implemented
and subscript can now be added to expression without
any trouble.
All the commented out code that waited for this moment
can now be uncommented.
Every such piece of code had a `TODO(subscript)` note
and by grepping this phrase we can make sure that
we didn't forget any of them.
Right now there is two ways to express a subscripted
column - either by a column_value with a sub field
or by using a subscript struct.
The grammar still uses the old column_value way,
but column_value.sub will be removed soon
and everything will move to the subscript struct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
fill_prepare_context collects useful information about
the expression involved in query restrictions.
We should collect this information from subscript as well,
just like we do from column_value and its sub.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_single_column_restrictions_for_column finds all restrictions
for a column and puts them in a vector.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Prepare a handler for subscript in search_and_replace.
Some of the code must be commented out for now
because subscript hasn't been added to expression yet.
It will uncommented later.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
possible_lhs_values returns set of possible values
for a column given some restrictions.
Current behaviour in case of a subscripted column
is to just ignore the subscript and treat
the restriction as if it were on just the column.
This seems wrong, or at least confusing,
but I won't change it in this patch to preserve the existing behaviour.
Trying to change this to something more reasonable
breaks other code which assumes that possible_lhs_values
returns a list of values.
(See partition_ranges_from_EQs() in cql3/restrictions/statement_restrictions.cc)
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_supported_by checks whether the given expression
is supported by some index.
The current behaviour seems wrong, but I kept
it to avoid making changes in a refactor PR.
Scylla doesn't have indexes on map entries yet,
so for a subscript the answer is always no.
I think we should just return false there.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
For the most part subscript can be handled
in the same way as column_value.
column_value has a sub argument and all
called functions evaluate lhs value using
get_value() which is prepared to handle
subscripted columns.
These functions now take column_maybe_subscripted
so we can pass &subscript to them without a problem.
The difference is in CONTAINS, CONTAINS_KEY and LIKE.
contains() and contains_key() throw an exception
when the passed column has a subscript, so now
we just throw an exception immediately.
like() doesn't have a check for subscripted value,
but from reading its code it's clear that
it's not ready to handle such values,
so an exception is now thrown as well.
It shouldn't break any tests because when one tries
to perform a query like:
`select * from t where m[0] like '%' allow filtering;`
an exception is throw somewhere earlier in the code.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Functions that were previously marked as unused to make the code
compile are now used and we can remove the markings.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_one_of() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
limits() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
equal() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
To get lhs value the code uses get_value() which is ready to handle subscripted columns.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that extracts the column_value
from column_maybe_subscripted.
There were already overloads for expression and subscript,
but this one will be needed as well.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a convenience function that allows to convert
a reference to expression to column_maybe_subscripted.
It will be useful in a moment.
For now part of it must be commented out
because subscript is not in the expression variant yet.
It will be uncommented once subscript is finally added
to expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There is get_value_comparator(column_value) but soon
we will also need get_value_comparator(column_maybe_subscripted).
Implement it by copying code from get_value_comparator(column_value).
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There is a get_value(column_value), but soon we will also
need get_value(column_maybe_subscripted).
Implement get_value(column_maybe_subscripted) by checking
whether the argument is a column_value or subscript
and calling the right code.
Code for handling the subscript case is copied from
get_value(column_value) where sub has value.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
column_maybe_subscripted is a variant that
can be either a column_value or a subscript.
It will be used as an argument to functions
which used to take column_value.
Right now column_value has a sub field,
but this will be removed soon once
the subscript struct takes over.
Changing the argument type is a smaller change
than rewriting all these functions, although
if they were rewritten the resulting code
would probably be nicer.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Even though the new subscript allows for subscripting anything,
the only thing that is really allowed to be subscripted is a column.
Add a utility function that extracts the column_value
from an expression with is a column_value or subscript.
It will came in handy in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
As observed in #10026, after schema changes it somehow happened
that a column defition that does not match any of the base table
columns was passed to expression verification code.
The function that looks up the index of a column happens to return
-1 when it doesn't find anything, so using this returned index
without checking if it's nonnegative results in accessing invalid
vector data, and a segfault or silent memory corruption.
Therefore, an explicit check is added to see if the column was actually
found. This serves two purposes:
- avoiding segfaults/memory corruption
- making it easier to investigate the root cause of #10026Closes#10039
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937