Currently, evaluation of a subscript expression x[y] requires that
x be a column_value, but that's completely artificial. Generalize
it to allow any expression.
This is needed after we transform an LWT IF condition from
"a[x] = y" to "func(a)[x] = y", where func casts a from a
map representation of a list back to a list; but it's also generally
useful.
LWT and some list operations represent lists using a form like
their mutations, so that the mutation list keys can be recovered
and used to update the list. But the evaluation machinery knows
nothing about that, and will return the map-form even though the type
system thinks it is a list.
To handle that, add a utility to rewrite the expression so
that the value is re-serialized into the expected list form. The
rewrite is implemented as a scalar function taking the map form and
returning the list form.
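The rewrite can be sketched as follows. All types and names here are invented stand-ins, not the actual Scylla types: the real map keys are timeuuids recovered from mutations and the values are serialized cells.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Stand-ins: LWT represents a list as a map keyed by mutation keys;
// the value order follows the key order.
using list_key = std::uint64_t;   // stand-in for the timeuuid key
using cell = int;                 // stand-in for a serialized cell

// func(a): map representation of a list -> plain list representation.
// The scalar function simply drops the keys, preserving key order.
std::vector<cell> map_form_to_list_form(const std::map<list_key, cell>& map_form) {
    std::vector<cell> list_form;
    list_form.reserve(map_form.size());
    for (const auto& [key, value] : map_form) {
        list_form.push_back(value);   // keys are dropped, order preserved
    }
    return list_form;
}
```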
Partial clustering keys can exist in COMPACT STORAGE tables (though they
are exceedingly rare), and when LWT materializes a static row. Harden
extract_column_value() so it is ready for them.
Expression evaluation works with the evaluation_input structure to
compute values. As we move LWT column_condition towards expressions,
we'll start using evaluation_input, so provide this helper to ease
the transition.
Function call evaluation rejects NULL inputs, unnecessarily. Functions
work well with NULL inputs. Fix by relaxing the check.
This currently has no impact because functions are not evaluated via
expressions, but via selectors.
LWT IF clause interprets equality differently from SQL (and the
rest of CQL): it thinks NULL equals NULL. Currently, it implements
binary operators all by itself so the fact that oper_t::EQ (and
friends) means something else in the rest of the code doesn't
bother it. However, we can't unify the code (in
column_condition.cc) with the rest of expression evaluation if
the meaning changes in different places.
To prepare for this, introduce a null_handling_style field to
binary_operator that defaults to `sql` but can be changed to
`lwt_nulls` to indicate this special semantic.
A few unit tests are added. LWT itself still isn't modified.
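A sketch of the two semantics. The enum mirrors the commit's null_handling_style names; the evaluation logic is illustrative only, with std::optional standing in for NULL:

```cpp
#include <optional>

enum class null_handling_style {
    sql,        // NULL = NULL evaluates to NULL (treated as not-true)
    lwt_nulls,  // NULL = NULL evaluates to true, NULL = x to false
};

using value = std::optional<int>;   // nullopt models NULL

// Returns nullopt to represent a NULL result under sql semantics.
std::optional<bool> eval_eq(const value& lhs, const value& rhs,
                            null_handling_style style) {
    if (!lhs || !rhs) {
        if (style == null_handling_style::lwt_nulls) {
            return bool(lhs) == bool(rhs);  // NULL equals only NULL
        }
        return std::nullopt;                // sql: the result is NULL
    }
    return *lhs == *rhs;
}
```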
Currently, evaluate_binop_sides() returns std::nullopt if either
side is NULL.
Since we wish to add binary operators that do consider NULL on
each side, make evaluate_binop_sides return the original NULLs
instead (as managed_bytes_opt).
Ultimately I think evaluate_binop_sides() should disappear, but before
that we have to improve unset value checking.
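Roughly, with std::optional<std::string> standing in for managed_bytes_opt (the real function evaluates the two expression sides; this sketch only shows the NULL pass-through contract and one caller applying sql semantics on top):

```cpp
#include <optional>
#include <string>
#include <utility>

using managed_bytes_opt = std::optional<std::string>;  // stand-in

// After the change: both sides are returned as-is, NULLs included;
// each caller decides what a NULL operand means for its operator.
std::pair<managed_bytes_opt, managed_bytes_opt>
evaluate_binop_sides(managed_bytes_opt lhs, managed_bytes_opt rhs) {
    return {std::move(lhs), std::move(rhs)};
}

// An sql-semantics EQ built on top: NULL on either side -> NULL result.
std::optional<bool> eq_with_sql_nulls(managed_bytes_opt lhs, managed_bytes_opt rhs) {
    auto [l, r] = evaluate_binop_sides(std::move(lhs), std::move(rhs));
    if (!l || !r) {
        return std::nullopt;
    }
    return *l == *r;
}
```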
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alternative: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).
This is done by implementing fmt::formatter::parse() for the
expression type, and using expression::printer internally.
Since sometimes we pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement types.
Uses for expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed since
ExpressionElement:s can be formatted directly.
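The spec handling can be sketched without fmt. The real code lives in fmt::formatter<expression>::parse(); the function name and error text here are invented:

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Parsing the custom format specifier: "{:user}" vs "{:debug}",
// defaulting to debug when no specifier is given.
enum class print_mode { user, debug };

print_mode parse_expr_format_spec(std::string_view spec) {
    if (spec.empty() || spec == "debug") {
        return print_mode::debug;   // the default remains debug
    }
    if (spec == "user") {
        return print_mode::user;
    }
    throw std::runtime_error("invalid format specifier for expression");
}
```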
Closes #12702
evaluation_inputs is a struct which contains
data needed to evaluate expressions - values
of columns, bind variables and other data.
is_one_of() is a function used to evaluate
IN restrictions. It checks whether the LHS
is one of the elements of the RHS list.
Generally when evaluating expressions we get
the evaluation_inputs{} as an argument and
we should pass them along to any functions
that evaluate subexpressions.
is_one_of() got the inputs as an argument,
but didn't pass them along to equal();
instead it created a new empty evaluation_inputs{}
and gave that to equal().
At first I thought this was a bug - with missing
information there could be a crash if equal()
tried to evaluate an expression with a bind_variable.
It turns out that in this particular case equal()
won't use the evaluation_inputs{} at all.
The LHS and RHS passed to it are just constant values,
which were already evaluated to serialized bytes
before calling evaluate().
It's still better to pass the inputs argument along
if possible. If in the future equal() required
these inputs for some reason, missing inputs
could lead to an unexpected crash.
I couldn't find any tests that would detect this case,
so such a bug could stay undetected until an unhappy user
found it when their cluster crashed.
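The fix amounts to forwarding the caller's context. A heavily simplified sketch; the real evaluation_inputs and equal() signatures differ:

```cpp
#include <vector>

// Stand-in for the evaluation context (column values, bind variables, ...).
struct evaluation_inputs {
    const std::vector<int>* bind_values = nullptr;
};

bool equal(int lhs, int rhs, const evaluation_inputs& inputs) {
    (void)inputs;   // not needed for pre-evaluated constants today
    return lhs == rhs;
}

// Before the fix: equal(lhs, e, evaluation_inputs{}) - context dropped.
// After the fix: the caller's inputs are passed along.
bool is_one_of(int lhs, const std::vector<int>& rhs,
               const evaluation_inputs& inputs) {
    for (int e : rhs) {
        if (equal(lhs, e, inputs)) {
            return true;
        }
    }
    return false;
}
```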
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Tests are similarly relaxed. A test is added in lwt_test to show
that insertion of a list with NULL is still rejected, though we
allow NULLs in IF conditions.
One test is changed from a list of longs to a list of ints, to
prevent churn in the test helper library.
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.
The CQL binary protocol introduced "unset" values in version 4
of the protocol. Unset values can be bound to variables, which
cause certain CQL fragments to be skipped. For example, the
fragment `SET a = :var` will not change the value of `a` if `:var`
is bound to an unset value.
Unsets, however, are very limited in where they can appear. They
can only appear at the top-level of an expression, and any computation
done with them is invalid. For example, `SET list_column = [3, :var]`
is invalid if `:var` is bound to unset.
This causes the code to be littered with checks for unset, and there
are plenty of tests dedicated to catching unsets. However, a simpler
way is possible - prevent the infiltration of unsets at the point of
entry (when evaluating a bind variable expression), and introduce
guards to check for the few cases where unsets are allowed.
This is what this long patch does. It performs the following:
(general)
1. unset is removed from the possible values of cql3::raw_value and
cql3::raw_value_view.
(external->cql3)
2. query_options is fortified with a vector of booleans,
unset_bind_variable_vector, where each boolean corresponds to a bind
variable index and is true when it is unset.
3. To avoid churn, two compatibility structs are introduced:
cql3::raw_value{,_view}_vector_with_unset, which can be constructed
from a std::vector<raw_value{,_view}>, which is what most callers
have. They can also be constructed with explicit unset vectors, for
the few cases where they are needed.
(cql3->variables)
4. query_options::get_value_at() now throws if the requested bind variable
is unset. This replaces all the throwing checks in expression evaluation
and statement execution, which are removed.
5. A new query_options::is_unset() is added for the users that can tolerate
unset; though it is not used directly.
6. A new cql3::unset_operation_guard class guards against unsets. It accepts
an expression, and can be queried whether an unset is present. Two
conditions are checked: the expression must be a singleton bind
variable, and at runtime it must be bound to an unset value.
7. The modification_statement operations are split into two, via two
new subclasses of cql3::operation. cql3::operation_no_unset_support
ignores unsets completely. cql3::operation_skip_if_unset checks if
an operand is unset (luckily all operations have at most one operand that
tolerates unset) and applies unset_operation_guard to it.
8. The various sites that accept expressions or operations are modified
to check for should_skip_operation(). These are the loops around
operations in update_statement and delete_statement, and the checks
for unset in attributes (LIMIT and PER PARTITION LIMIT).
(tests)
9. Many unset tests are removed. It's now impossible to enter an
unset value into the expression evaluation machinery (there's
just no unset value), so it's impossible to test for it.
10. Other unset tests now have to be invoked via bind variables,
since there's no way to create an unset cql3::expr::constant.
11. Many tests have their exception message match strings relaxed.
Since unsets are now checked very early, we don't know the context
where they happen. It would be possible to reintroduce it (by adding
a format string parameter to cql3::unset_operation_guard), but it
seems not to be worth the effort. Usage of unsets is rare, and it is
explicit (at least with the Python driver, an unset cannot be
introduced by omission).
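A sketch of points 2, 4-6. The struct and field names follow the commit, but the fields and the int value type are invented; the real code deals with serialized values and expressions:

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

struct query_options {
    std::vector<std::optional<int>> values;  // bound values
    std::vector<bool> unset;                 // point 2: per-variable unset flag

    bool is_unset(std::size_t i) const { return unset[i]; }  // point 5

    // Point 4: throws if the requested bind variable is unset.
    std::optional<int> get_value_at(std::size_t i) const {
        if (unset[i]) {
            throw std::runtime_error("unset value where it is not allowed");
        }
        return values[i];
    }
};

// Point 6: tolerates unset only when the expression is a lone bind
// variable; here the "expression" is reduced to an optional variable index.
struct unset_operation_guard {
    std::optional<std::size_t> singleton_bind_index;  // e.g. `SET a = :var`

    bool is_unset(const query_options& qo) const {
        return singleton_bind_index.has_value() && qo.is_unset(*singleton_bind_index);
    }
};
```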
I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't
recognize unsets) with cql3::maybe_unset_value (that does), but that
caused huge amounts of churn, so I abandoned that in favor of the
current approach.
Closes #12517
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
Previously it was impossible to use expr::evaluate()
to get the value of a conjunction of elements
separated by ANDs.
Now it has been implemented.
NULL is treated as an "unknown value" - maybe true, maybe false.
`TRUE AND NULL` evaluates to NULL because it might be true but also might be false.
`FALSE AND NULL` evaluates to FALSE because no matter what value NULL acts as, the result will still be FALSE.
Unset and empty values are not allowed.
Usually in CQL the rule is that when NULL occurs in an operation the whole expression
becomes NULL, but here we decided to deviate from this behavior.
Treating NULL as an "unknown value" is the standard SQL way of handling NULLs in conjunctions.
It works this way in MySQL and Postgres so we do it this way as well.
The evaluation short-circuits. Once FALSE is encountered the function returns FALSE
immediately without evaluating any further elements.
It works this way in Postgres as well, for example:
`SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error
but `SELECT false AND 1/0 = 0` will successfully evaluate to FALSE.
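The rules above can be sketched with std::optional<bool> standing in for the three-valued result, and lazily evaluated elements modeling the short-circuiting (so a `1/0 = 0` element is never run after FALSE):

```cpp
#include <functional>
#include <optional>
#include <vector>

using tri_bool = std::optional<bool>;   // nullopt models NULL ("unknown")

tri_bool evaluate_conjunction(const std::vector<std::function<tri_bool()>>& elements) {
    bool saw_null = false;
    for (const auto& element : elements) {
        tri_bool v = element();
        if (v.has_value() && !*v) {
            return false;    // FALSE AND anything -> FALSE; stop immediately
        }
        if (!v.has_value()) {
            saw_null = true; // TRUE AND NULL -> NULL (might be true or false)
        }
    }
    return saw_null ? tri_bool{} : tri_bool{true};
}
```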
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Our `null` expression, after the prepare stage, is redundant with a
`constant` expression containing the value NULL.
Remove it. Its role in the unprepared stage is taken over by
untyped_constant, which gains a new type_class enumeration to
represent it.
Some subtleties:
- Usually, handling of null and untyped_constant, or null and constant
was the same, so they are just folded into each other
- LWT "like" operator now has to discriminate between a literal
string and a literal NULL
- prepare and test_assignment were folded into the corresponding
untyped_constant functions. Some care had to be taken to preserve
error messages.
Closes #12118
The messages used to contain UNSET_VALUE
in capital letters, but the tests
expect messages with 'unset value'.
Change the message so that it can
match the expected error text in tests.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Originally I put braces around the cases because
there were local variables that I didn't want
to be shadowed.
Now there are no variables so the braces
can be removed without any problems.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When evaluating a binary operation with
operations like EQUAL, LESS_THAN, IN
the logic of the operation is put
in a separate function to keep things clean.
IS NOT NULL is the only exception;
it has its evaluate implementation
right in the evaluate(binary_operator)
function.
It would be cleaner to have it in
a separate dedicated function,
so it's moved to one.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There is a more general version of limits()
which takes expressions as both the lhs and rhs
arguments.
There is no need for a specialized overload.
This specialized overload takes a tuple_constructor
as lhs, but we call evaluate() on both sides
of a binary operator before checking equality,
so this won't be useful at all.
Having multiple functions increases the risk
that one of them has a bug, while giving
dubious benefit.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Expressions like:
123 = NULL
NULL = 123
NULL = NULL
NULL != 123
should be tolerated, but evaluate to NULL.
The current code assumes that a binary operator
can only evaluate to a boolean - true or false.
Now a binary operator can also evaluate to NULL.
This should happen in cases when one of the
operator's sides is NULL.
A special class is introduced to represent a value
that can be one of three things: true, false or null.
It's better than using std::optional<bool>,
because optional has implicit conversions to bool
that could cause confusion and bugs.
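A minimal sketch of such a class; the real class name and interface in the tree may differ. The point is that, unlike std::optional<bool>, there is no implicit conversion to bool, so `if (result)` cannot silently mean "is engaged":

```cpp
#include <optional>

class bool_or_null {
    std::optional<bool> _value;   // disengaged represents NULL
    constexpr bool_or_null() = default;
public:
    constexpr bool_or_null(bool b) : _value(b) {}
    static constexpr bool_or_null null() { return bool_or_null{}; }

    // Explicit queries only; no operator bool().
    bool is_null() const { return !_value.has_value(); }
    bool is_true() const { return _value.has_value() && *_value; }
    bool is_false() const { return _value.has_value() && !*_value; }
};
```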
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There is a more general version of equal()
which takes expressions as both the lhs and rhs
arguments.
There is no need for a specialized overload.
This specialized overload takes a tuple_constructor
as lhs, but we call evaluate() on both sides
of a binary operator before checking equality,
so this won't be useful at all.
Having multiple functions increases the risk
that one of them has a bug, while giving
dubious benefit.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_satisfied_by has to check if a binary_operator is satisfied
by some values. It used to be impossible to evaluate
a binary_operator, so is_satisfied_by had code to check
if it's satisfied for a limited number of cases
occurring when filtering queries.
Now evaluate(binary_operator) has been implemented
and is_satisfied_by can use it to check if a binary_operator
evaluates to true.
This is cleaner and reduces code duplication.
Additionally, CQL tests will test the new evaluate() implementation.
There is one special case with token().
When is_satisfied_by sees a restriction on token
it assumes that it's satisfied because it's
sure that these token restrictions were used
to generate partition ranges.
I had to leave this special case in because it's impossible
to evaluate(token). Once this is implemented I will remove
the special case because it's risky and prone to cause
bugs.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The code to evaluate binary operators
was copied from is_satisfied_by.
is_satisfied_by wasn't able to evaluate
IS NOT NULL restrictions, so when such a restriction
was encountered it threw an exception.
Implement proper handling for IS NOT NULL binary operators.
The switch ensures that all variants of oper_t are handled,
otherwise there would be a compilation error.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
evaluate() takes an expression and evaluates it
to a constant value. It wasn't possible to evaluate
binary operators before, so this support is added.
The code is based on is_satisfied_by,
which is currently used to check
whether a binary operator evaluates
to true or false.
It looks like is_satisfied_by and evaluate()
do pretty much the same thing, one could be
implemented using the other.
In the future they might get merged
into a single function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
like() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
contains_key() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
contains() used to only accept column_value as the lhs
to evaluate. Changed it to accept any generic expression.
This will allow evaluating a more diverse set of
binary operators.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Requests like `col IN NULL` used to cause
an error - Invalid null value for column col.
We would like to allow NULLs everywhere.
When a NULL occurs on either side
of a binary operator, the whole operation
should just evaluate to NULL.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes #11775
A binary operator like this:
{1: 2, 3: 4} CONTAINS KEY NULL
used to evaluate to `true`.
This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.
This change is not backwards compatible.
Some existing code might break.
partially fixes: #10359
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A binary operator like this:
[1, 2, 3] CONTAINS NULL
used to evaluate to `true`.
This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.
This change is not backwards compatible.
Some existing code might break.
partially fixes: #10359
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
boolean_factors is a function that takes an expression
and extracts all children of the top level conjunction.
The problem is that it returns a vector<expression>,
which is inefficient.
Sometimes we would like to iterate over all boolean
factors without allocations. for_each_boolean_factor
is implemented for this purpose.
boolean_factors() can be implemented using
for_each_boolean_factor, and doing so reduces
code duplication.
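A sketch over a toy expression type; the real cql3::expr::expression is a much larger variant, but the traversal shape is the same:

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

struct expression;
struct conjunction { std::vector<expression> children; };
struct expression { std::variant<std::string, conjunction> node; };

// Allocation-free traversal: invoke f on every boolean factor,
// flattening nested conjunctions along the way.
void for_each_boolean_factor(const expression& e,
                             const std::function<void(const expression&)>& f) {
    if (auto* conj = std::get_if<conjunction>(&e.node)) {
        for (const auto& child : conj->children) {
            for_each_boolean_factor(child, f);
        }
    } else {
        f(e);
    }
}

// boolean_factors() expressed via the traversal, as in the commit.
std::vector<expression> boolean_factors(const expression& e) {
    std::vector<expression> result;
    for_each_boolean_factor(e, [&](const expression& factor) {
        result.push_back(factor);
    });
    return result;
}
```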
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Previous commit added the ability to use GSI over non-frozen collections in queries,
but only the keys() and values() indexes. This commit adds support for the missing
index type - entries() index.
Signed-off-by: Karol Baryła <karol.baryla@scylladb.com>
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Previous commits added the possibility of creating GSI on non-frozen collections.
This (and next) commit allow those indexes to actually be used by queries.
This commit enables both keys() and values() indexes, as they are pretty similar.
When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.
Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.
A unit test is added.
Add a function that checks if there is an index
which supports one of the columns present in
the given expression.
This functionality will soon be needed for
clustering and nonprimary columns so it's
good to separate it into a reusable function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function which checks that an expression
contains only binary operators with '='.
Right now this check is done only in a single place,
but soon the same check will have to be done
for clustering columns as well, so the code
is moved to a separate function to prevent duplication.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a bug which caused incorrect results of limits()
for columns with reversed clustering order.
Such columns have reversed_type as their type and this
needs to be taken into account when comparing them.
The bug was introduced in 6d943e6cd0, which replaced
uses of get_value_comparator with type_of.
The difference between them is that
get_value_comparator applied ->without_reversed()
on the result type.
Because the type was reversed, comparisons like
1 < 2 evaluated to false.
This caused the test testIndexOnKeyWithReverseClustering
to fail, but sadly it wasn't caught by CI because
the CI itself has a bug that makes it skip some tests.
The test passes now, although it has to be run manually
to check that.
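The effect of the bug can be demonstrated with a toy reversed type; abstract_type here is an invented stand-in for the real type system:

```cpp
// A reversed_type compares in descending order. Using it directly for
// a range check like `1 < 2` gives the wrong answer; the value comparator
// must strip the reversal first (->without_reversed() in the commit).
struct abstract_type {
    bool reversed = false;
    int compare(int a, int b) const {   // <0, 0, >0 like a comparator
        int c = (a < b) ? -1 : (a > b) ? 1 : 0;
        return reversed ? -c : c;
    }
    abstract_type without_reversed() const { return abstract_type{false}; }
};

// Buggy: uses the column type as-is (what type_of returned).
bool less_than_buggy(const abstract_type& t, int a, int b) {
    return t.compare(a, b) < 0;
}

// Fixed: compare with the reversal stripped, as get_value_comparator did.
bool less_than_fixed(const abstract_type& t, int a, int b) {
    return t.without_reversed().compare(a, b) < 0;
}
```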
Fixes: #10918
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10994
value_for is a method from the restriction class
which finds the value for a given column.
Under the hood it makes use of possible_lhs_values.
It will be needed to implement some functionality
that was implemented using restrictions before.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that finds common columns
between two expressions.
It's used in error messages in the original
restrictions code so it must be included
in the new code as well for compatibility.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Restrictions code keeps restrictions for each column
in a map sorted by their position in the schema.
Then there are methods that allow accessing
the restricted columns in the correct order.
To replicate this in upcoming code
we need functions that implement this functionality.
The original comparator can be found in:
cql3/restrictions/single_column_restrictions.hh
For primary key columns this comparator compares their
positions in the schema.
For non-primary columns the position is assumed to
be clustering_key_size(), which seems pretty random.
To avoid passing the schema to the comparator
for nonprimary columns I just assume the
position is u32::max(). This seems to be
as good of a choice as clustering_key_size().
Originally Cassandra used -1:
bc8a260471/src/java/org/apache/cassandra/config/ColumnDefinition.java (L79-L86)
We never end up comparing columns of different kind using this comparator anyway.
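The ordering can be sketched as follows; column_definition here is a simplified stand-in for the real schema type:

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Primary-key columns are ordered by their position in the schema;
// non-primary columns sort after them with position u32::max
// (Cassandra used -1 instead; the exact sentinel doesn't matter, since
// columns of different kinds are never compared with this comparator).
struct column_definition {
    std::string name;
    bool is_primary = false;
    std::uint32_t schema_position = 0;
};

std::uint32_t comparison_position(const column_definition& c) {
    return c.is_primary ? c.schema_position
                        : std::numeric_limits<std::uint32_t>::max();
}

bool column_position_less(const column_definition& a, const column_definition& b) {
    return comparison_position(a) < comparison_position(b);
}
```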
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>