Like mentioned in the previous commit, this changes introduce usage
of vector style type and adjusts the functions using list style type
to distinguish vectors from lists.
Rename collection constructor style list to list_or_vector.
Where the grammar supports IN, we add NOT IN. This includes the WHERE
clause and LWT IF clause.
Evaluation of NOT IN follows from IN.
In statement_restrictions analysis, they are different, as NOT IN
doesn't enable any clever query plan and must filter.
Some tests are added. An error message was changed ('in' changed to 'IN'),
so some tests are adjusted.
Closesscylladb/scylladb#21992
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
this change was created in the same spirit of 505900f18f. because
we are deprecating the operator<< for vector and unorderd_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in this change, we trade the `boost_test_print_type()` overloads
for the generic template of `boost_test_print_type()`, except for
those in the very small tests, which presumably want to keep
themselves relative self-contained.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18727
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.
in this change,
* utils: drop the range formatters in to_string.hh and to_string.c, as
we don't use them anymore. and the tests for them in
test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
we are not able to print the elements using operator<< anymore
after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
due to a bug in {fmt} v9, {fmt} fails to format a range whose
element type is `basic_sstring<uint8_t>`, as it considers it
as a string-like type, but `basic_sstring<uint8_t>`'s char type
is signed char, not char. this issue does not exist in {fmt} v10,
so, in this change, we add a workaround to explicitly specialize
the type trait to assure that {fmt} format this type using its
`fmt::formatter` specialization instead of trying to format it
as a string. also, {fmt}'s generic ranges formatter calls the
pair formatter's `set_brackets()` and `set_separator()` methods
when printing the range, but operator<< based formatter does not
provide these method, we have to include this change in the change
switching to {fmt}, otherwise the change specializing
`fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
for comparing values. but without the operator<< based formatters,
Boost.Test would not be able to print them. after removing
the homebrew formatters, we need to use the generic
`boost_test_print_type()` helper to do this job. so we are
including `test_utils.hh` in tests so that we can print
the formattable types.
* treewide: add "#include "utils/to_string.hh" where
`fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
after dropping the operator<< for vector, we would not able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
less defined the printer for Boost.test
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
when we convert timestamp into string it must look like: '2017-12-27T11:57:42.500Z'
it concerns any conversion except JSON timestamp format
JSON string has space as time separator and must look like: '2017-12-27 11:57:42.500Z'
both formats always contain milliseconds and timezone specification
Fixes#14518Fixes#7997Closes#14726
`expression`'s default constructor is dangerous as an it can leak
into computations and generate surprising results. Fix that by
removing the default constructor.
This is made somewhat difficult by the parser generator's reliance
on default construction, and we need to expand our workaround
(`uninitialized<>`) capabilities to do so.
We also remove some incidental uses of default-constructed expressions.
Closes#14706
* github.com:scylladb/scylladb:
cql3: expr: make expression non-default-constructible
cql3: grammar: don't default-construct expressions
cql3: grammar: improve uninitialized<> flexibility
cql3: grammar: adjust uninitialized<> wrapper
test: expr_test: don't invoke expression's default constructor
cql3: statement_restrictions: explicitly initialize expressions in index match code
cql3: statement_restrictions: explicitly intitialize some expression fields
cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
cql3: expr: prepare_expression: avoid default-constructed expression
cql3: broadcast_tables: prepare new_value without relying on expression default constructor
Since ec77172b4b (" Merge 'cql3: convert
the SELECT clause evaluation phase to expressions' from Avi Kivity"),
we rewrite non-aggregating selectors to include an aggregation, in order
to have the rest of the code either deal with no aggregation, or
all selectors aggregating, with nothing in between. This is done
by wrapping column selectors with "first" function calls: col ->
first(col).
This broke non-aggregating selectors that included the ttl() or
writetime() pseudo functions. This is because we rewrote them as
writetime(first(col)), and writetime() isn't a function that operates
on any values; it operates on mutations and so must have access to
a column, not an expression.
Fix by detecting this scenario and rewriting the expression as
first(writetime(col)).
Unit and integration tests are added.
Fixes#14715.
Closes#14716
prepare_expression() already validates the types and computes
the index of the field; no need to redo that work when
evaluating the expression.
The tests are adjusted to also prepare the expression.
Closes#14562
field_selection::type refers to the type of the selection operation,
not the type of the structure being selected. This is what
prepare_expression() generates and how all other expression elements
work, but evaluate() for field_selection thinks it's the type
of the structure, and so fails when it gets an expression
from prepare_expression().
Fix that, and adjust the tests.
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.
The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).
The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.
When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)` we don't have to bother, since a scalar
is implicity promoted to a vector by evaluating it every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`, but it's not worth following.
A small unit test is added.
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation related names to a new header
expr/evaluate.hh
- move utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
There's no way to evaluate a column_mutation_attribute via CQL
yet (the only user uses old-style cql3::selection::selector), so
we only supply a unit test.
Implement `expr:valuate()` for `expr::field_selection`.
`field_selection` is used to represent access to a struct field.
For example, with a UDT value:
```
CREATE TYPE my_type (a int, b int);
```
The expression `my_type_value.a` would be represented as a `field_selection`, which selects the field `a`.
Evaluating such an expression consists of finding the right element's value in a serialized UDT value and returning it.
Note that it's still not possible to use `field_selection` inside the `WHERE` clause. Enabling it would require changes to the grammar, as well as query planning, Current `statement_restrictions` just reacts with `on_internal_error` when it encounters a `field_selection`.
Nonetheless it's a step towards relaxing the grammar, and now it's finally possible to evaluate all kinds of prepared expressions (#12906)
Fixes: https://github.com/scylladb/scylladb/issues/12906Closes#14235
* github.com:scylladb/scylladb:
boost/expr_test: test evaluate(field_selection)
cql3/expr: fix printing of field_selection
cql3/expression: implement evaluate(field_selection)
types/user: modify idx_of_field to use bytes_view
column_identifer: add column_identifier_raw::text()
types: add read_nth_user_type_field()
types: add read_nth_tuple_element()
Add a unit test which tests evaluating field selections.
Alas at the moment it's impossible to add a cql-pytest,
as the grammar and query planning doesn't handle field
selections inside the WHERE clause.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Let's remove expr::token and replace all of its functionality with expr::function_call.
expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.
The situation with expr::token is a bit complicated.
On one hand side it's supposed to represent the partition token,
but sometimes it's also assumed that it can represent a generic
call to the token() function, for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.
The query planning code assumes that each occurence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planning
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.
Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.
There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it's for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.
One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is representad
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.
Overall I think it would be best to remove expr::token.
Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighted by the cons.
I'm a big fan of having a single way to represent things,
having two separate representations of the same thing
without clear boundaries between them causes trouble.
Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
One test for expr::token uses raw column identifier
in the test.
Let's change it to unresloved_identifier, which is
a standard representation of unresolved column
names in expressions.
Once expr::token is removed it will be possible
to create a function_call with unresolved_identifiers
as arguments.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function to check whether the expression
represents a partition token - that is a call
to the token function with consecutive partition
key columns as the arguments.
For example for `token(p1, p2, p3)` this function
would return `true`, but for `token(1, 2, 3)` or `token(p3, p2, p1)`
the result would be `false`.
The function has a schema argument because a schema is required
to get the list of partition columns that should be passed as
arguments to token().
Maybe it would be possible to infer the schema from the information
given earlier during prepare_expression, but it would be complicated
and a bit dangerous to do this. Sometimes we operate on multiple tables
and the schema is needed to differentiate between them - a token() call
can represent the base table's partition token, but for an index table
this is just a normal function call, not the partition token.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that can be used to check
whether a given expression represents a call
to the token() function.
Note that a call to token() doesn't mean
that the expression represents a partition
token - it could be something like token(1, 2, 3),
just a normal function_call.
The code for checking has been taken from functions::get.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently trying to do prepare_expression(function_call)
with a nullptr receiver fails.
It should be possible to prepare function calls without
a known receiver.
When the user types in: `token(1, 2, 3)`
the code should be able to figure out that
they are looking for a function with name `token`,
which takes 3 integers as arguments.
In order to support that we need to prepare
all arguments that can be prepared before
attempting to find a function.
Prepared expressions have a known type,
which helps to find the right function
for the given arguments.
Additionally the current code for finding
a function requires all arguments to be
assignment_testable, which requires to prepare
some expression types, e.g column_values.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
try_prepare_expression(constant) used to throw an error
when trying to prepeare expr::constant.
It would be useful to be able to do this
and it's not hard to implement.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add tests for preparing expr::cast which contains a bind variable,
with a known receiver.
expr::cast serves as a type hint for the bind variable.
It specifies what should be the type of the bind variable,
we must check that this type is compatible with the receiver
and fail in case it isn't
The following cases are tested:
Valid:
`text_col = (text)?`
`int_col = (int)?`
Invalid:
`text_col = (int)?`
`int_col = (text)?`
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Type inference in cast_prepare_expression was very limited.
Without a receiver it just gave up and said that it can't
infer the type.
It's possible to infer the type - an expression that
casts something to type bigint also has type bigint.
This can be implemented by creating a fake receiver
when the caller didn't specify one.
Type of this fake receiver will be c.type
and c.arg will be prepared using this receiver.
Note that the previous change (changing receiver
to cast_type_receiver in prepare_expression) is required
to keep the behaviour consistent.
Without it we would sometimes prepare c.arg using the
original receiver, and sometimes using a receiver
with type c.type.
Currently it's impossible to test this change
on live code. Every place that uses expr::cast
specifies a receiver.
A unit test is all that can be done at the moment
to ensure correctness.
In the future this functionality will be used in UDFs.
In https://github.com/scylladb/scylladb/pull/12900
it was requested to be able to use a type hint
to specify whether WASM code of the function
will be sent in binary or text form.
The user can convey this by typing
either `(blob)?` or `(text)?`.
In this case there will be no receiver
and type inference would fail.
After this change it will work - it's now possible
to prepare either of those and get an expression
with a known type.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.
the building systems are updated accordingly.
the source files referencing `types.hh` were updated using following
command:
```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```
the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12926
Compiling a pattern is expensive and so we should try to do it
at prepare time, if the pattern is a constant. Add an optimizer
that looks for such cases and replaces them with a unary function
that embeds the compiled pattern.
This isn't integrated yet with prepare_expr(), since the filtering
code isn't ready for generic expressions. Its first user will be LWT,
which contains the optimization already (filtering had it as well,
but lost it sometime during the expression rewrite).
A unit test is added.
LWT IF clause interprets equality differently from SQL (and the
rest of CQL): it thinks NULL equals NULL. Currently, it implements
binary operators all by itself so the fact that oper_t::EQ (and
friends) means something else in the rest of the code doesn't
bother it. However, we can't unify the code (in
column_condition.cc) with the rest of expression evaluation if
the meaning changes in different places.
To prepare for this, introduce a null_handling_style field to
binary_operator that defaults to `sql` but can be changed to
`lwt_nulls` to indicate this special semantic.
A few unit tests are added. LWT itself still isn't modified.
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alterantive: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).
This is done by implementing fmt::formatter::parse() for the
expression type, can using expression::printer internally.
Since sometimes we pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement Types.
Uses for expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed since
ExpressionElement:s can be formatted directly.
Closes#12702
The CQL protocol and specification call for lists with NULLs in
some places. For example, the statement:
```cql
UPDATE tab
SET x = 3
IF y IN (1, 2, NULL)
WHERE pk = 4
```
has a list `(1, 2, NULL)` that contains NULL. Although the syntax is tuple-like, the value is a list;
consider the same statement as a prepared statement:
```cql
UPDATE tab
SET x = :x
IF y IN :y_values
WHERE pk = :pk
```
`:y_values` must have a list type, since the number of elements is unknown.
Currently, this is done with special paths inside LWT that bypass normal
evaluation, but if we want to unify those paths, we must allow NULLs in
lists (except in storage). This series does that.
Closes#12411
* github.com:scylladb/scylladb:
test: materialized view: add test exercising synthetic empty-type columns
cql3: expr: relax evaluate_list() to allow allow NULL elements
types: allow lists with NULL
test: relax NULL check test predicate
cql3, types: validate listlike collections (sets, lists) for storage
types: make empty type deserialize to non-null value
Add unit test which check that preparing binary_operators
which represent IS NOT NULL works as expected
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add unit test which check that preparing binary_operators
with the LIKE operation works as expected.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com
Add unit test which check that preparing binary_operators
with the CONTAINS KEY operation works as expected.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add unit test which check that preparing binary_operators
with the CONTAINS operation works as expected.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add unit test which check that preparing binary_operators
with basic comparison operations works as expected.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Use the newly introduced convenience methods that create
untyped_constant in existing tests.
This will make the code more readable by removing
visual clutter that came with the previous overly
verbose code.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Tests are similarly relaxed. A test is added in lwt_test to show
that insertion of a list with NULL is still rejected, though we
allow NULLs in IF conditions.
One test is changed from a list of longs to a list of ints, to
prevent churn in the test helper library.
Allow transient lists that contain NULL throughout the
evaluation machinery. This makes is possible to evalute things
like `IF col IN (1, 2, NULL)` without hacks, once LWT conditions
are converted to expressions.
A few tests are relaxed to accommodate the new behavior:
- cql_query_test's test_null_and_unset_in_collections is relaxed
to allow `WHERE col IN ?`, with the variable bound to a list
containing NULL; now it's explicitly allowed
- expr_test's evaluate_bind_variable_validates_no_null_in_list was
checking generic lists for NULLs, and was similary relaxed (and
renamed)
- expr_Test's evaluate_bind_variable_validates_null_in_lists_recursively
was similarly relaxed to allow NULLs.