When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.
Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.
A unit test is added.
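As a rough illustration, the separation step can be sketched with simplified stand-in types (the real helper operates on cql3::expr::expression; the `atom` type and the exact shapes below are illustrative, not the actual Scylla types):

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Illustrative stand-ins for the real expression types.
struct expression;
struct conjunction { std::vector<expression> children; };
struct atom { std::string text; };
struct expression { std::variant<conjunction, atom> v; };

// Flatten a WHERE-clause expression into its top-level AND-ed factors,
// recursing into nested conjunctions.
void boolean_factors(const expression& e, std::vector<expression>& out) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            boolean_factors(child, out);
        }
    } else {
        out.push_back(e);
    }
}

// Example: "a = 1 AND (b = 2 AND c = 3)" separates into three factors.
std::size_t example_factor_count() {
    expression e{conjunction{{
        expression{atom{"a = 1"}},
        expression{conjunction{{expression{atom{"b = 2"}},
                                expression{atom{"c = 3"}}}}},
    }}};
    std::vector<expression> factors;
    boolean_factors(e, factors);
    return factors.size();
}
```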
Add a function that checks if there is an index
which supports one of the columns present in
the given expression.
This functionality will soon be needed for
clustering and nonprimary columns, so it's
good to separate it into a reusable function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function which checks that an expression
contains only binary operators with '='.
Right now this check is done only in a single place,
but soon the same check will have to be done
for clustering columns as well, so the code
is moved to a separate function to prevent duplication.
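A minimal sketch of such a check, using simplified stand-in types (the real function inspects binary_operator nodes inside cql3::expr::expression; the names below are illustrative):

```cpp
#include <string>
#include <variant>
#include <vector>

// Illustrative model of an expression tree holding binary operators.
enum class oper_t { EQ, LT, GT, IN };
struct expression;
struct conjunction { std::vector<expression> children; };
struct binary_operator { oper_t op; };
struct expression { std::variant<conjunction, binary_operator> v; };

// True iff every binary operator in the expression tree is '='.
bool has_only_eq(const expression& e) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            if (!has_only_eq(child)) {
                return false;
            }
        }
        return true;
    }
    return std::get<binary_operator>(e.v).op == oper_t::EQ;
}

// Examples: "a = 1 AND b = 2" passes, "a = 1 AND b < 2" does not.
bool example_all_eq() {
    return has_only_eq(expression{conjunction{{
        expression{binary_operator{oper_t::EQ}},
        expression{binary_operator{oper_t::EQ}}}}});
}
bool example_with_lt() {
    return has_only_eq(expression{conjunction{{
        expression{binary_operator{oper_t::EQ}},
        expression{binary_operator{oper_t::LT}}}}});
}
```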
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a bug which caused incorrect results of limits()
for columns with reversed clustering order.
Such columns have reversed_type as their type and this
needs to be taken into account when comparing them.
The bug was introduced in 6d943e6cd0,
which replaced uses of get_value_comparator
with type_of. The difference between them is that
get_value_comparator applied ->without_reversed()
on the result type.
Because the type was reversed, comparisons like
1 < 2 evaluated to false.
This caused the test testIndexOnKeyWithReverseClustering
to fail, but sadly it wasn't caught by CI because
the CI itself has a bug that makes it skip some tests.
The test passes now, although it has to be run manually
to check that.
Fixes: #10918
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10994
value_for is a method from the restriction class
which finds the value for a given column.
Under the hood it makes use of possible_lhs_values.
It will be needed to implement some functionality
that was implemented using restrictions before.
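The idea can be sketched as follows, with heavily simplified types (the real possible_lhs_values works over serialized values and ranges, not a set of ints; this is only the shape of the logic):

```cpp
#include <optional>
#include <set>

// Simplified stand-in for the result of possible_lhs_values: the set of
// values a column may take under the restriction.
using value_set = std::set<int>;

// Sketch of value_for: return a definite value only when the set of
// possible values contains exactly one element.
std::optional<int> value_for(const value_set& possible) {
    if (possible.size() == 1) {
        return *possible.begin();
    }
    return std::nullopt; // no single definite value
}
```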
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that finds common columns
between two expressions.
It's used in error messages in the original
restrictions code so it must be included
in the new code as well for compatibility.
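A minimal sketch of the intersection step, assuming the column names mentioned by each expression have already been collected into sorted sets (the real function walks two cql3::expr::expression trees collecting column_value nodes):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Intersect the column-name sets of two expressions. std::set iterates in
// sorted order, which is what std::set_intersection requires.
std::vector<std::string> common_columns(const std::set<std::string>& lhs,
                                        const std::set<std::string>& rhs) {
    std::vector<std::string> out;
    std::set_intersection(lhs.begin(), lhs.end(),
                          rhs.begin(), rhs.end(),
                          std::back_inserter(out));
    return out;
}
```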
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Restrictions code keeps restrictions for each column
in a map sorted by their position in the schema.
Then there are methods that allow accessing
the restricted columns in the correct order.
To replicate this in upcoming code
we need functions that implement this functionality.
The original comparator can be found in:
cql3/restrictions/single_column_restrictions.hh
For primary key columns this comparator compares their
positions in the schema.
For non-primary columns the position is assumed to
be clustering_key_size(), which seems pretty random.
To avoid passing the schema to the comparator
for nonprimary columns I just assume the
position is u32::max(). This seems to be
as good of a choice as clustering_key_size().
Originally Cassandra used -1:
bc8a260471/src/java/org/apache/cassandra/config/ColumnDefinition.java (L79-L86)
We never end up comparing columns of different kind using this comparator anyway.
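The ordering described above can be sketched like this (types simplified; the real comparator works on column_definition pointers from the schema):

```cpp
#include <cstdint>
#include <limits>

// Illustrative column metadata: primary key columns have a meaningful
// position in the schema, regular columns do not.
enum class column_kind { partition_key, clustering_key, regular };

struct column_definition {
    column_kind kind;
    uint32_t position; // meaningful only for primary key columns
};

// Primary key columns compare by schema position; regular (nonprimary)
// columns all get u32::max() so they sort after primary key columns.
uint32_t comparison_position(const column_definition& c) {
    if (c.kind == column_kind::regular) {
        return std::numeric_limits<uint32_t>::max();
    }
    return c.position;
}

bool schema_pos_column_less(const column_definition& a,
                            const column_definition& b) {
    return comparison_position(a) < comparison_position(b);
}
```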
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that gets the only column
from a single column restriction expression.
The code would be very similar to
is_single_column_restriction, so a new
function is introduced to reduce duplication.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that checks whether an expression
contains restrictions on exactly one column.
This is a "single_column_restriction"
in the same way that instances of
"class single_column_restriction" are.
It will be used to distinguish
such cases once this class is removed.
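The check can be sketched with simplified stand-in types (the real code walks cql3::expr::expression collecting column_value nodes; the `column_restriction` leaf type here is illustrative):

```cpp
#include <set>
#include <string>
#include <variant>
#include <vector>

// Illustrative model: an expression is a single-column restriction iff
// the set of distinct columns it mentions has exactly one element.
struct expression;
struct conjunction { std::vector<expression> children; };
struct column_restriction { std::string column; };
struct expression { std::variant<conjunction, column_restriction> v; };

void collect_columns(const expression& e, std::set<std::string>& out) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            collect_columns(child, out);
        }
    } else {
        out.insert(std::get<column_restriction>(e.v).column);
    }
}

bool is_single_column_restriction(const expression& e) {
    std::set<std::string> cols;
    collect_columns(e, cols);
    return cols.size() == 1;
}

// Examples: "a = 1 AND a < 5" restricts one column; "a = 1 AND b = 2" two.
bool example_single() {
    return is_single_column_restriction(expression{conjunction{{
        expression{column_restriction{"a"}},
        expression{column_restriction{"a"}}}}});
}
bool example_two() {
    return is_single_column_restriction(expression{conjunction{{
        expression{column_restriction{"a"}},
        expression{column_restriction{"b"}}}}});
}
```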
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Commit e739f2b779 ("cql3: expr: make evaluate() return a
cql3::raw_value rather than an expr::constant") introduced
raw_value::view() as a synonym to raw_value::to_view() to reduce
churn. To fix this duplication, we now remove raw_value::to_view().
raw_value::to_view() was picked for removal because it has fewer
call sites, reducing churn again.
Closes #10819
An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.
Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type information in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.
So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:
- cql3::raw_value gains a view() helper to convert to a raw_value_view,
and is_null_or_unset() to match with expr::constant and reduce further
churn.
- some helpers that worked on expr::constant and now receive a
raw_value now need the type passed via an additional argument. The
type is computed from the expression by the caller.
- many type checks during expression evaluation were dropped. This is
a consequence of static typing - we must trust the expression prepare
phase to perform full type checking since values no longer carry type
information.
Closes #10797
column_maybe_subscripted is a variant<column_value*, subscript*> that
existed for two reasons:
1. evaluation of subscripts and of columns took different paths.
2. calculation of the type of column or column[sub] took different paths.
Now that all evaluations go through evaluate(), and the types are
present in the expression itself, there is no need for column_maybe_subscripted
and it is replaced with plain expressions.
Some functions accept the right-hand-side as the first argument
and the left-hand-side as the second argument. This is now confusing,
but at least safe-ish, as the arguments have different types. It's
going to become dangerous when we switch to expressions for both sides,
so let's rationalize it by always starting with lhs.
Some parameters were annotated with _lhs/_rhs when it was not clear.
The grammar only allows comparing tuples of clustering columns, which
are non-null, but let's not rely on that deep in expression evaluation
as it can be relaxed.
is_satisfied_by() used an internal column_value_eval_bag type that
was more awkwardly named (and more awkward to use due to more nesting)
than evaluation_inputs. Drop it and use evaluation_inputs throughout.
The thunk is_satisfied_by(evaluation_inputs) that just called
is_satisfied_by(column_value_eval_bag) is dropped.
Currently, evaluate() accepts only query_options, which makes
it unable to evaluate columns. As a result some callers
(column_condition) have to call it directly on the right-hand-side
of binary expressions instead of evaluating the binary expression
itself.
Change it to accept evaluation_inputs as a parameter, but keep
the old signature too, since it is called from many places that
don't have rows.
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_satisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.
Push it out to callers, which already eliminates some calls.
We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.
Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.
Although the new API allows passing no query_options, the view code
keeps passing dummy query_options; the improvement is left as a FIXME.
For most types, we just return the type field. A few expressions have
other methods to access the type, and some expressions cannot survive
prepare and so calling type_of() on them is illegal.
Currently, preparing a cast drops the cast completely (as the
types are verified to be binary compatible). This means we lose
the casted-to type. Since we wish to keep type information, keep the
cast in the prepared expression tree (and therefore the casted-to
type).
Once we do that, we must extend evaluate() to support cast
expressions.
We already support subscripting maps (for filtering WHERE m[3] = 6),
so adding list subscript support is easy. Most of the code is shared.
Differences are:
- internal list representation is a vector of values, not of key/values
- key type is int32_type, not defined by map
- need to check index bounds
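The bounds-check difference can be sketched as follows (heavily simplified: real list cells hold serialized bytes, not ints, and this sketch's choice of returning null for an out-of-range index is an assumption, not necessarily what the patch does):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Unlike a map subscript (keyed lookup), a list subscript is an int32
// index into a vector of values and must be bounds-checked.
std::optional<int> list_subscript(const std::vector<int>& list, int32_t index) {
    if (index < 0 || static_cast<std::size_t>(index) >= list.size()) {
        return std::nullopt; // out-of-range index -> null (sketch assumption)
    }
    return list[static_cast<std::size_t>(index)];
}
```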
We used to allow nulls in lists of IN values,
i.e. a query like this would be valid:
SELECT * FROM tab WHERE pk IN (1, null, 2);
This is an old feature that isn't really used
and is already forbidden in Cassandra.
Additionally the current implementation
doesn't allow for nulls inside the list
if it's sent as a bound value.
So something like:
SELECT * FROM tab WHERE pk IN ?;
would throw an error if ? was (1, null, 2).
This is inconsistent.
Allowing it made writing code cumbersome because
this was the only case where having a null
inside a collection was allowed.
Because of it, separate code paths were needed
to handle regular lists
and lists containing NULL values.
Forbidding it makes the code nicer and consistent
at the cost of a feature that isn't really
important.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.
There are several things that need to change,
compared to the current implementation of operator<<(expression),
to output something parsable.
Column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to preserve case sensitivity.
I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').
A list of IN values should be printed inside () instead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this, a temporary tuple_constructor is created and printed.
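The quoting rule for case sensitivity can be sketched like this (an assumption-laden simplification: CQL lowercases unquoted identifiers, so any name that is not purely lower-case [a-z0-9_] needs double quotes; reserved-keyword handling is omitted):

```cpp
#include <cctype>
#include <string>

// Quote an identifier if needed so it parses back to the same name.
// Embedded double quotes are escaped by doubling them.
std::string maybe_quote(const std::string& name) {
    bool needs_quoting = name.empty();
    for (char c : name) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!(std::islower(uc) || std::isdigit(uc) || c == '_')) {
            needs_quoting = true;
        }
    }
    if (!needs_quoting) {
        return name;
    }
    std::string quoted = "\"";
    for (char c : name) {
        if (c == '"') {
            quoted += "\"\""; // double embedded quotes
        } else {
            quoted += c;
        }
    }
    quoted += '"';
    return quoted;
}
```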
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
The expr::token struct is created when something
like token(p1, p2) occurs in the WHERE clause.
Currently expr::token doesn't keep the columns passed
as arguments to the token function.
They weren't needed because token() validation
was done inside token_relation.
Now that we want to use only expressions
we need to have columns inside the token struct
and validate that those are the correct columns.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
The semantics of unset values inside collections are undefined.
The previous behavior of transforming a list with an unset value into
an unset value was removed, because I couldn't find a reason for its existence.
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).
Cassandra returns an invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:
* For NULL, we consider m[NULL] to result in NULL - instead of an error.
This behavior is more consistent with other expressions that contain
null - for example NULL[2] and NULL<2 both result in NULL as well.
Moreover, if in the future we allow more complex expressions, such
as m[a] (where a is a column), we can find the subscript to be null
for some rows and non-null for other rows - and throwing an "invalid
query" in the middle of the filtering doesn't make sense.
* For UNSET_VALUE, we do consider this an error like Cassandra, and use
the same error message as Cassandra. However, the current implementation
checks for this error only when the expression is evaluated - not
before. It means that if the scan is empty before the filtering, the
error will not be reported and we'll silently return an empty result
set. We currently consider this ok, but we can also change this in the
future by binding the expression only once (today we do it on every
evaluation) and validating it once after this binding.
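The two rules above can be sketched with simplified types (real keys and values are serialized bytes; the `subscript_key` type and function names here are illustrative, though the error message matches Cassandra's as described):

```cpp
#include <map>
#include <optional>
#include <stdexcept>

// The subscript is either a real key, NULL, or UNSET.
enum class key_kind { value, null, unset };
struct subscript_key { key_kind kind; int key; };

std::optional<int> eval_map_subscript(const std::map<int, int>& m,
                                      const subscript_key& sub) {
    if (sub.kind == key_kind::null) {
        return std::nullopt; // m[NULL] -> NULL, not an error
    }
    if (sub.kind == key_kind::unset) {
        // Same error message as Cassandra for an unset subscript.
        throw std::invalid_argument("Unsupported unset map key for column m");
    }
    auto it = m.find(sub.key);
    return it == m.end() ? std::nullopt : std::optional<int>(it->second);
}

// Helper for the example: does an UNSET key raise the error?
bool unset_key_throws(const std::map<int, int>& m) {
    try {
        (void)eval_map_subscript(m, {key_kind::unset, 0});
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```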
Fixes #10361
Fixes #10399
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When we have a filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferenced an unset optional, and continued
processing the result of this dereference, which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.
The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).
The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.
Fixes#10417
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.
This behaviour is unexpected and potentially
dangerous so it would be better to forbid
passing subscript to possible_lhs_values entirely.
Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are possible values of a subscripted column c where c[0] = 1?
All lists that have 1 on 0th position?
If we wanted to handle this nicely we would have to
change the arguments.
Such refactoring is best left until the time
when this functionality is actually needed,
right now it's hard to predict what interface
will be needed then.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10228
is_supported_by checks whether a given restriction
can be supported by some index.
Currently when a subscripted value, e.g. `m[1]`, is encountered,
we ignore the fact that there is a subscript and ask
whether an index can support `m` itself.
This looks like unintentional behaviour left over
from the times when column_value had a sub field,
which could easily be forgotten about.
Scylla doesn't support indexes on collection elements at all,
so simply returning false there seems like a good idea.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes #10227
column_value::sub has been replaced by the subscript struct
everywhere, so we can finally remove it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When `val[sub]` is parsed, it used to be the case
that a column_value with a sub field was created.
Now this has been changed to creating a subscript struct.
This is the only place where a subscripted value can be created.
All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
All handlers for subscript have finally been implemented
and subscript can now be added to expression without
any trouble.
All the commented out code that waited for this moment
can now be uncommented.
Every such piece of code had a `TODO(subscript)` note
and by grepping this phrase we can make sure that
we didn't forget any of them.
Right now there are two ways to express a subscripted
column - either by a column_value with a sub field
or by using a subscript struct.
The grammar still uses the old column_value way,
but column_value.sub will be removed soon
and everything will move to the subscript struct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
fill_prepare_context collects useful information about
the expression involved in query restrictions.
We should collect this information from subscript as well,
just like we do from column_value and its sub.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_single_column_restrictions_for_column finds all restrictions
for a column and puts them in a vector.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Prepare a handler for subscript in search_and_replace.
Some of the code must be commented out for now
because subscript hasn't been added to expression yet.
It will be uncommented later.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
possible_lhs_values returns the set of possible values
for a column given some restrictions.
Current behaviour in case of a subscripted column
is to just ignore the subscript and treat
the restriction as if it were on just the column.
This seems wrong, or at least confusing,
but I won't change it in this patch to preserve the existing behaviour.
Trying to change this to something more reasonable
breaks other code which assumes that possible_lhs_values
returns a list of values.
(See partition_ranges_from_EQs() in cql3/restrictions/statement_restrictions.cc)
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_supported_by checks whether the given expression
is supported by some index.
The current behaviour seems wrong, but I kept
it to avoid making changes in a refactor PR.
Scylla doesn't have indexes on map entries yet,
so for a subscript the answer is always no.
I think we should just return false there.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>