This reverts commit e9343fd382, reversing
changes made to 27138b215b. It causes a
regression in v2 serialization_format support:
collection_serialization_with_protocol_v2_test fails with: marshaling error: read_simple_bytes - not enough bytes (requested 1627390306, got 3)
Fixes#9360
constant is now ready to replace terminal as a final value representation.
Replace bind() with evaluate and shared_ptr<terminal> with constant.
We can't get rid of terminal yet. Sometimes terminal is converted back
to term, which constant can't do. This won't be a problem once we
replace term with expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was some repeating code in expr::get_elements family
of functions. It has been reduced into one function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
We need to be able to access elements of a constant.
Adds functions to easily do it.
Those functions check all preconditions required to access elements
and then use partially_deserialize_* or similar.
It's much more convenient than using partially_deserialize directly.
get_list_of_tuples_elements is useful with IN restrictions like
(a, b) IN [(1, 2), (3, 4)].
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A list representing IN values might contain NULLs before evaluation.
We can remove them during evaluation, because nothing equals NULL.
If we don't remove them, there are gonna be errors, because a list can't contain NULLs.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Adds the functions:
constant evaluate(term*, const query_options&);
raw_value_view evaluate(term*, const query_options&);
These functions take a term, bind it and convert the terminal
to constant or raw_value_view.
In the future these functions will take expression instead of term.
For that to happen bind() has to be implemented on expression,
this will be done later.
Also introduces terminal::get_value_type().
In order to construct a constant from terminal we need to know the type.
It will be implemented in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Adds constant to the expression variant:
struct constant {
raw_value value;
data_type type;
};
This struct will be used to represent constant values with known bytes and type.
This corresponds to the terminal from current design.
bool is removed from expression, now constant is used instead.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
column_value_tuple overlaps both column_value and tuple_constructor
(in different respects) and can be replaced by a combination: a
tuple_constructor of column_value. The replacement is more expressive
(we can have a tuple of column_value and other expression types), though
the code (especially grammar) do not allow it yet.
So remove column_value_tuple and replace it everywhere with
tuple_constructor. Visitors get the merged behavior of the existing
tuple_constructor and column_value_tuple, which is usually trivial
since tuple_constructor and column_value_tuple came from different
hierarchies (term::raw and relation), so usually one of the types
just calls on_internal_error().
The change results in awkwards casts in two areas: WHERE clause
filtering (equal() and related), and clustering key range evaluations
(limits() and related). When equal() is replaced by recursive
evaluate(), the casts will go way (to be replaced by the evaluate())
visitor. Clustering key range extraction will remain limited
to tuples of column_value, so the prepare phase will have to vet
the expressions to ensure the casts don't fail (and use the
filtering path if they will).
Tests: unit (dev)
Closes#9274
Introduce a general-purpose search and replace function to manipulate
expressions, and use it to simplify replace_column_def() and
replace_token().
Closes#9259
* github.com:scylladb/scylla:
cql3: expr: rewrite replace_token in terms of search_and_replace()
cql3: expr: rewrite replace_column_def in terms of search_and_replace()
cql3: expr: add general-purpose search-and-replace
Use search_and_replace() to simplify replace_token(). Note the conversion
does not have 100% fidelity - the previous implementation throws on
some impossible subexpression types, and the new one passes them
through. It should be the caller's responsibility anyway, not a
side effect of replacing tokens, and since these subexpressions are
impossible there is no real effect on execution.
Note that this affects only TOKEN() calls on the partition key
columns in the right order. Other uses of the token function
(say with constants) won't be translated to the token subexpression
type. So something like
WHERE token(pk) = token(?)
would only see the left-hand side replaced, not the right-hand
side, even if it were an expression rather than a term.
We're won't introduce new expression types that are equivalent
to column_value, and search_and_replace() takes care of all
expressions that need to recurse, so we don't need std::visit()
for the search/replace lambda.
Add a recursive search-and-replace function on expressions. The
caller provides a search/replace function to operate on subexpressions,
returning nullopt if they want the default behavior of recursively
copying, or a new expression to terminate the search (in the current
subtree) and replace the current node with the returned expression.
To avoid endlessly specifying the subexpression types that get the
the common behavior (copying) since they don't contain any subexpressions,
we add a new concept LeafExpression to signify them.
Existing functions such as replace_token() can be reimplemented in
terms of search_and_replace, but that is left for later.
Convert the user_types::literal raw to a new expression type
usertype_constructor. I used "usertype" to convey that is is a
((user type) constructor), not a (user (type constructor)).
Add set and map styles to collection_constructor. Maps are implemented as
collection_constructor{tuple_constructor{key, value}...}. This saves
having a new expression type, and reduces the effort to implement
recursive descent evaluation for this omitted expression type.
Introduce a collection_constructor (similar to C++'s std::initializer_list)
to hold subexpressions being gathered into a list. Since sets, maps, and
lists construction share some attributes (all elements must be of the
same type) collection_constructor will be used for all of them, so it
also holds an enum. I used "style" for the enum since it's a weak
attribute - an empty set is also an empty map. I chose collection_constructor
rather than plain 'collection' to highlight that it's not the only way
to get a collection (selecting a collection column is another, as an
example) and to hint at what it does - construct a collection from
more primitive elements.
Introduce tuple_constructor (not a literal, since (?, ?) and (column_value,
column_value) are not literals) to represent a tuple constructed from
subexpressions. In the future we can replace column_value_tuple
with tuple_constructor(column_value, column_value, ...), but this is
not done now.
I chose the name 'tuple_constructor' since other expressions can represent
tuples (e.g. my_tuple_column, :bind_variable_of_tuple_type,
func_returning_tuple()). It also explains what the expression does.
Introduce a new expression untyped_constant that corresponds to
constants::literal, which is removed. untyped_constant is rather
ugly in that it won't exist post-prepare. We should probably instead
replace it with typed constants that use the widest possible type
(decimal and varint), and select a narrower type during the prepare
phase when we perform type inference. The conversion itseld is
straightforward.
Convert the four forms of abstract_marker to expr::bind_variable (the
name was chosen since variable is the role of the thing, while "marker"
refers more to the grammar). Having four variants is unnecessary, but
this patch doesn't do anything about that.
The cast expression has two operands: the subexpression to cast and the
type to cast to. Since prepared and unprepared expressions are the
same type, we don't have to do anything, but prepared and unprepared
types are different. So add a variant to be able to support both.
The reason the selectable->expression transformation did not need to
do this is that casts in a selector cannot accept a user defined type.
Note those casts also have different syntax and different execution,
so we'll have to choose whether to unify the two semantics, or whether
to keep them separate. This patch does not force anything (but does hint
at unification by not including any discriminant beyond the type's
rawness). The string representation matches the part of the grammar
it was derived from (or conversion back to CQL will yield wrong
results).
Since expressions can nest, and since we won't covert everything at once,
add a way to store a term::raw as an expression. We can now have a
term::raw that is internally an expression, and an expression that is
implemented as term::raw.
A term_raw_expression is a term::raw that holds an expression. It will
be used to incrementally convert the source base to expressions, while
still exposing the result to the common interface of shared_ptr<term::raw>.
Add a function, which given an expression and a column,
extracts all restrictions involving this column.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a field_selection variant element to expression. Like function_call
and cast, the structure from which a field is selectewd cannot yet be
an expression, since not all seletable::raw:s are converted. This will
be done in a later pass. This is also why printing a field selection now
does not print the selected expression; this will also be corrected later.
Add a cast variant element to expression. Like function_call, the
argument being converted cannot yet be an expression, since not
all seletable::raw:s are converted. This will be done in a later
pass. This is also why printing a cast now does not print the
casted expression; this will also be corrected later.
Rather than creating a new variant element in expression, we extend
function_call to handle both named and anonymous functions, since
most of the processing is the same.
Add a function_call variant element to hold function calls. Note
that because not all selectables are yet converted, function call
arguments are still of type selectable::raw. They will be converted
to expressions later. This is also why printing a function now
does not print its arguments; this will also be corrected later.
Create a new element in the expression variant, column_mutation_attribute,
signifying we're picking up an attribute of a column mutation (not a
column value!). We use an enum rather than a bool to choose between
writetime and ttl (the two mutation attributes) for increased
explicitness.
Although there can only be one type for the column we're operating
on (it must be an unresolved_identifer), we use a nested_expression.
This is because we'll later need to also support a column_value
as the column type after we prepare it. This is somewhat similar
to the address of operator in C, which syntactically takes any
expression but semantically operates only on lvalues.
The exression type cannot be a member of a struct that is an
element of the expression variant. This is because it would then
be required to contain itself. So introduce a nested_expression
type to indirectly hold an expression, but keep the value semantics
we expect from expressions: it is copyable and a copy has separate
identity and storage.
In fact binary_operator had to resort to this trick, so it's converted
to nested_expression in the next patch.
Introduce unresolved_identifer as an unprepared counterpart to column_value.
column_identifier_raw no longer inherits from selectable::raw, but
methods for now to reduce churn.
Adds replace_token function which takes an expression
and replaces all left hand side occurences of token()
with the given column definition.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently, column names can only appear in a boolean binary expression,
but not on their own. This means that in the statement
SELECT a FROM tab WHERE a > 3;
We can represent the WHERE clause as an expression, but not the selector.
To pave the way for using expressions in selector contexts, we promote
the elements of binary_operator::lhs (column_value, column_value_tuple,
token) to be expressions in their own right. binary_operator::lhs
becomes an expression (wrapped in unique_ptr, because variants can't
contain themselves).
Note that all three new possibilities make sense in a selector:
SELECT column FROM tab
SELECT token(pk) FROM tab
SELECT function_that_accepts_a_tuple((col1, col2)) FROM tab
There is some fallout from this:
- because binary_operator contains a unique_ptr, it is no longer
copyable. We add a copy constructor and assignment operator to
compensate.
- often, the new elements don't make sense when evaluating a boolean
expression, which is the only context we had before. We call
on_internal_error in these cases. The parser right now prevents such
cases from being constructed in the first place (this is equivalent to
if (some_struct_value) in C).
- in statement_restrictions.cc, we need to evalute the lhs in the context
of the full binary operator. I introduced with_current_binary_operator()
for this; an alternative approach is to create a new sub-visitor.
Closes#8797
std::bind() copies the bound parameters for safekeeping. Here this
includes expr, which can be quite heavyweight. Use std::ref() to
prevent copying. This is safe since the bound expression is executed
and discarded before has_supporting_index() returns.
Closes#8791
Right now, binary_operator::lhs is a variant<column_value,
std::vector<column_value>, token>. The role of the second branch
(a vector of column values) is to represent a tuple of columns
e.g. "WHERE (a, b, c) = ?"), but this is not clear from the type
name.
Inroduce a wrapper type around the vector, column_value_tuple, to
make it clear we're dealing with tuples of CQL references (a
column_value is really a column_ref, since it doesn't actually
contain any value).
Closes#8208
Make to_range able to handle rvalues. We will pass managed_bytes&& to it
in the next patch to avoid pointless copying.
The public declaration of to_range is changed to a concrete function to avoid
having to explicitly instantiate to_range for all possible reference types of
clustering_key_prefix.
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.
This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.
After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.