Let's remove expr::token and replace all of its functionality with expr::function_call.
expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.
The situation with expr::token is a bit complicated.
On the one hand it's supposed to represent the partition token,
but on the other it's sometimes assumed that it can represent a generic
call to the token() function; for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.
The query planning code assumes that each occurrence of expr::token
represents the partition token without checking the arguments.
Because of this, allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planner
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.
Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.
There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it is for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.
One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is represented
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.
Overall I think it would be best to remove expr::token.
Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighed by the cons.
I'm a big fan of having a single way to represent things;
having two separate representations of the same thing
without clear boundaries between them causes trouble.
Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
get_lhs_receiver looks at the prepared LHS of a binary operator
and creates a receiver corresponding to this LHS expression.
This receiver is later used to prepare the RHS of the binary operator.
It's able to handle a few expression types - the ones that are currently
allowed to be on the LHS.
One of those types is `expr::token`, to handle restrictions like `token(p1, p2) = 3`.
Soon token will be replaced by `expr::function_call`, so the function will need
to handle `function_calls` to the token function.
Although we expect there to be only calls to the `token()` function,
as other functions are not allowed on the LHS, it can be made generic
over all function calls, which will help in future grammar extensions.
The function calls that it can currently get are calls to the token function,
but they're not validated yet, so it could also be something like `token(pk, pk, ck)`.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Printing for function_call is a bit strange.
When printing an unprepared function it prints
the name and then the arguments.
For a prepared function it prints <anonymous function>
as the name and then the arguments.
Prepared functions have a name() method, but printing
doesn't use it - maybe not all functions have a valid name?
The token() function will soon be represented as a function_call
and it should be printable in a user-readable way.
Let's add an if which prints `token(arg1, arg2)`
instead of `<anonymous function>(arg1, arg2)` when printing
a call to the token function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
possible_lhs_values takes an expression and a column
and finds all possible values for the column that make
the expression true.
Apart from finding column values it's also capable of finding
all matching values for the partition key token.
When a nullptr column is passed, possible_lhs_values switches
into token values mode and finds all values for the token.
This interface isn't ideal.
It's confusing to pass a nullptr column when one wants to
find values for the token. It would be better to have a flag,
or just have a separate function.
Additionally in the future expr::token will be removed
and we will use expr::is_partition_token_for_schema
to find all occurrences of the partition token.
expr::is_partition_token_for_schema takes a schema
as an argument, which possible_lhs_values doesn't have,
so it would have to be extended to get the schema from
somewhere.
To fix these two problems let's split possible_lhs_values
into two functions - one that finds possible values for a column,
which doesn't require a schema, and one that finds possible values
for the partition token and requires a schema:
value_set possible_column_values(const column_definition* col, const expression& e, const query_options& options);
value_set possible_partition_token_values(const expression& e, const query_options& options, const schema& table_schema);
This will make the interface cleaner and enable smooth transition
once expr::token is removed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
In possible_lhs_values there was a message talking
about is_satisfied_by. It looks like a badly
copy-pasted message.
Change it to say possible_lhs_values, as it should.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Just like has_token, replace_token will use
expr::is_partition_token_for_schema to find all instances
of the partition token to replace.
Let's prepare for this change by adding a schema argument
to the function before making the big change.
It's unused at the moment, but having a separate commit
should make it easier to review.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
In the future expr::token will be removed and checking
whether there is a partition token inside an expression
will be done using expr::is_partition_token_for_schema.
This function takes a schema as an argument,
so all functions that will call it also need
to get the schema from somewhere.
Right now it's an unused argument, but in the future
it will be used. Adding it in a separate commit
makes it easier to review.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function to check whether the expression
represents a partition token - that is a call
to the token function with consecutive partition
key columns as the arguments.
For example for `token(p1, p2, p3)` this function
would return `true`, but for `token(1, 2, 3)` or `token(p3, p2, p1)`
the result would be `false`.
The function has a schema argument because a schema is required
to get the list of partition columns that should be passed as
arguments to token().
Maybe it would be possible to infer the schema from the information
given earlier during prepare_expression, but it would be complicated
and a bit dangerous to do this. Sometimes we operate on multiple tables
and the schema is needed to differentiate between them - a token() call
can represent the base table's partition token, but for an index table
this is just a normal function call, not the partition token.
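A minimal sketch of the semantics described above, using hypothetical stand-in types (`column_ref`, `schema_model`, `token_call_model`) rather than Scylla's real expression and schema classes:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of the check: an expression counts as "the partition
// token" only when it is a call to token() whose arguments are exactly
// the partition key columns, in declaration order.
struct column_ref { std::string name; };
struct schema_model { std::vector<column_ref> partition_key; };
struct token_call_model { std::vector<std::string> args; };

bool is_partition_token_for_schema(const token_call_model& call, const schema_model& s) {
    if (call.args.size() != s.partition_key.size()) {
        return false;  // missing or extra arguments
    }
    for (std::size_t i = 0; i < call.args.size(); ++i) {
        if (call.args[i] != s.partition_key[i].name) {
            return false;  // wrong column, wrong order, or a literal argument
        }
    }
    return true;
}
```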
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that can be used to check
whether a given expression represents a call
to the token() function.
Note that a call to token() doesn't mean
that the expression represents a partition
token - it could be something like token(1, 2, 3),
just a normal function_call.
The code for checking has been taken from functions::get.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently trying to do prepare_expression(function_call)
with a nullptr receiver fails.
It should be possible to prepare function calls without
a known receiver.
When the user types in: `token(1, 2, 3)`
the code should be able to figure out that
they are looking for a function with name `token`,
which takes 3 integers as arguments.
In order to support that we need to prepare
all arguments that can be prepared before
attempting to find a function.
Prepared expressions have a known type,
which helps to find the right function
for the given arguments.
Additionally the current code for finding
a function requires all arguments to be
assignment_testable, which requires preparing
some expression types, e.g. column_values.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
try_prepare_expression(constant) used to throw an error
when trying to prepare expr::constant.
It would be useful to be able to do this
and it's not hard to implement.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Make it possible to do test_assignment for column_values.
It's implemented using the generic expression assignment
testing function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
test_assignment checks whether a value of some type
can be assigned to a value of different type.
There is no implementation of test_assignment
for expr::constant, but I would like to have one.
Currently there is a custom implementation
of test_assignment for each type of expression,
but generally each of them boils down to checking:
```
type1->is_value_compatible_with(type2)
```
Instead of implementing another type-specific function
I added expression_test_assignment and used it to
implement test_assignment for constant.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`!operator==()`, which matches the behavior of the
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings us
the defaulted operator==() -- the compiler can generate an
operator==() that performs a member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if an operator==() is implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is replaced with the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we failed to mark operator== with the `const`
specifier; in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but that is not always the case: in the
`version` class, we used `version` as the parameter type. to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass a non-trivial struct as a function parameter.
please note, because in C++20 both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric forms of other variants. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13687
column_specification::name is a shared pointer, so it should be
dereferenced before printing - because we want to print the name, not
the pointer.
Fix a few instances of this mistake in prepare_expr.cc. Other instances
were already correct.
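A minimal illustration of the mistake (with plain std::string and ostream formatting standing in for the actual Scylla types and formatter): streaming the shared_ptr itself prints a pointer value, while streaming the dereferenced object prints the name we actually want.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical helper, not the real code: format a receiver's name for
// an error message.
std::string receiver_name(const std::shared_ptr<std::string>& name) {
    std::ostringstream os;
    os << *name;   // correct: dereference first to print the name
    // os << name; // wrong: would print an address like 0x55e3a1b2c3d0
    return os.str();
}
```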
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Spans are more flexible and can be constructed from any contiguous
container (such as small_vector), or a subrange of such a container.
This can save allocations, so change the signature to accept a span.
Spans cannot be constructed from std::initializer_list, so one such
call site is changed to construct a span directly from the single
argument.
before this change, we just print out the addresses of the elements
in `column_defs`, if the arguments passed to `token()` function are
not valid. this is not quite helpful from the user's perspective. as
user would be more interested in the values. also, we could print
more accurate error message for different error.
in this change, following Cassandra 4.1's behavior, three cases are
identified, and corresponding errors are returned respectively:
* duplicated partition keys
* wrong order of partition key
* missing keys
where, if the partition key order is wrong, instead of printing the
keys specified by user, the correct order is printed in the error
message for helping user to correct the `token()` function.
for better performance, the checks are performed only if the keys
do not match, based on the assumption that the error handling path
is not likely to be executed.
tests are added accordingly. they tested with Canssandra 4.1.1 also.
Fixes#13468
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13470
serialize_listlike() is called with a range of either managed_bytes
or managed_bytes_opt. If the former, then iterating and assigning
to a loop induction variable of type managed_bytes_opt& will bind
the reference to a temporary managed_bytes_opt, which gcc dislikes.
Fix by performing the binding in a separate statement, which allows
for lifetime extension.
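An illustration of the pattern the fix uses, with `int` standing in for managed_bytes and std::optional for managed_bytes_opt: iterating a range of T while the body needs an optional<T> creates a temporary optional per element, and binding that temporary in its own statement makes the lifetime extension explicit.

```cpp
#include <optional>
#include <vector>

// Hypothetical example, not the real serialize_listlike(): the loop
// variable stays auto&, and the conversion to an optional happens in a
// separate statement whose reference binding extends the temporary's
// lifetime for the rest of the iteration.
int count_non_null(const std::vector<int>& values) {
    int n = 0;
    for (const auto& v : values) {
        const std::optional<int>& maybe = v;  // binding in its own statement
        if (maybe) {
            ++n;
        }
    }
    return n;
}
```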
CQL supports type casting using C-style casts.
For example it's possible to do: `blob_column = (blob)funcReturningInt()`
This functionality is pretty limited; we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes stay unchanged after the conversion.
This means that it's legal to cast an int to blob (an int is just a 4-byte blob), but it's illegal to cast a bigint to int (that would change 8 bytes -> 4 bytes).
This simplifies things, to cast we can just reinterpret the value as the other type.
Another use of C-style casts are type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type.
For example: `overloadedFunction((int)?)`
Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int.
An interesting thing is that such casts don't have to be explicit. CQL allows putting an int value in a place where a blob value is expected, and it will be converted automatically without any explicit cast.
---
I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast.
But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected.
For example it was possible to do:
```cql
blob_column = (blob)funcReturningInt()
```
But this didn't work at all:
```cql
blob_column = (blob)(int)12323
```
It tried to prepare `untyped_constant(12323)` with a `blob` receiver, which fails.
This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way.
There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting.
Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does.
I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues.
This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility.
---
This PR improves `prepare_expression` for `expr::cast` in the following ways:
1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible
2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression.
3) Add pytest tests for C-style casts
`2)` is needed for #12900; the other changes are just something I decided to do since I was already working on this piece of code.
Closes#13053
* github.com:scylladb/scylladb:
expr_test: more tests for preparing bind variables with type hints
prepare_expr: implement preparing expr::cast with no receiver
prepare_expr: use :user formatting in cast_prepare_expression
prepare_expr: remove std::get<> in cast_prepare_expression
prepare_expr: improve cast_prepare_expression
prepare_expr: improve readability in cast_prepare_expression
cql-pytest: test expr::cast in test_cast.py
Type inference in cast_prepare_expression was very limited.
Without a receiver it just gave up and said that it can't
infer the type.
It's possible to infer the type - an expression that
casts something to type bigint also has type bigint.
This can be implemented by creating a fake receiver
when the caller didn't specify one.
Type of this fake receiver will be c.type
and c.arg will be prepared using this receiver.
Note that the previous change (changing receiver
to cast_type_receiver in prepare_expression) is required
to keep the behaviour consistent.
Without it we would sometimes prepare c.arg using the
original receiver, and sometimes using a receiver
with type c.type.
Currently it's impossible to test this change
on live code. Every place that uses expr::cast
specifies a receiver.
A unit test is all that can be done at the moment
to ensure correctness.
In the future this functionality will be used in UDFs.
In https://github.com/scylladb/scylladb/pull/12900
it was requested to be able to use a type hint
to specify whether WASM code of the function
will be sent in binary or text form.
The user can convey this by typing
either `(blob)?` or `(text)?`.
In this case there will be no receiver
and type inference would fail.
After this change it will work - it's now possible
to prepare either of those and get an expression
with a known type.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
By default expressions are printed using the {:debug} formatting,
which is intended for internal use. Error messages should use the
{:user} formatting instead.
cast_prepare_expression uses the default formatting in a few places
that are user facing, so let's change it to use {:user} formatting.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
A few times throughout cast_prepare_expression there's
a line which uses std::get<> to get the raw type of the cast.
`std::get<shared_ptr<cql3_type::raw>>(c.type)`
This is a dangerous thing to do. It might turn out that the variant
holds a different alternative and then it'll start throwing bad_variant_access.
In this case this would happen if someone called cast_prepare_expression
on an expression that is already prepared.
It's possible to modify the code in a way that avoids doing the std::get
altogether.
It makes the code more resilient and gives me peace of mind.
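A sketch of the hazard and the alternative (`raw_type` and `prepared_type` are illustrative stand-ins for the real variant alternatives): std::get throws bad_variant_access when the variant holds the other alternative, while std::get_if returns nullptr and lets the caller handle the "already prepared" case gracefully.

```cpp
#include <string>
#include <variant>

// Hypothetical model of a cast's type field, which may be either an
// unprepared raw type name or an already-prepared type.
struct raw_type { std::string name; };
struct prepared_type { int id; };
using cast_type = std::variant<raw_type, prepared_type>;

bool needs_preparation(const cast_type& t) {
    // std::get<raw_type>(t) would throw bad_variant_access if t holds
    // prepared_type; std::get_if never throws.
    return std::get_if<raw_type>(&t) != nullptr;
}
```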
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Preparing expr::cast had some artificial limitations.
Things like this worked:
`blob_col = (blob)funcReturnsInt()`
But this didn't:
`blob_col = (blob)(int)1234`
This is caused by the line:
`prepare_expression(c.arg, db, keyspace, schema_opt, receiver)`
Here the code prepares the expression to be cast using the original
receiver which was passed to cast_prepare_expression.
In the example above this meant that it tried to prepare
untyped_constant(1234) using a receiver with type blob.
This failed because an integer literal is invalid for a blob column.
To me it looks like a mistake. What it should do instead
is prepare the int literal using the type (int) and then
see if int can be cast to blob, by checking if these types
have compatible binary representation.
This can be achieved by using `cast_type_receiver` instead of `receiver`.
Making this small change makes it possible to use the cast
in many situations where it was previously impossible.
The tests have to be updated to reflect the change;
some of them now deviate from Cassandra, so they have
to be marked scylla_only.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
cast_prepare_expression takes care of preparing expr::cast,
which is responsible for CQL C-style casts.
At first glance it can be hard to figure out what exactly
it does, so I added some comments to make things clearer.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a bug in `expr::search_and_replace`:
it didn't preserve the `order` field of binary_operator.
The `order` field is used to mark relations created
using the SCYLLA_CLUSTERING_BOUND marker.
It is a CQL feature used for internal queries inside Scylla.
It means that we should handle the restriction as a raw
clustering bound, not as an expression in the CQL language.
Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues,
the database could end up selecting the wrong clustering ranges.
Fixes: #13055
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#13056
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Currently, evaluation of a subscript expression x[y] requires that
x be a column_value, but that's completely artificial. Generalize
it to allow any expression.
This is needed after we transform a LWT IF condition from
"a[x] = y" to "func(a)[x] = y", where func casts a from a
map representation of a list back to a list; but it's also generally
useful.
LWT and some list operations represent lists using a form like
their mutations, so that the mutation list keys can be recovered
and used to update the list. But the evaluation machinery knows
nothing about that, and will return the map-form even though the type
system thinks it is a list.
To handle that, add a utility to rewrite the expression so
that the value is re-serialized into the expected list form. The
rewrite is implemented as a scalar function taking the map form and
returning the list form.
Partial clustering keys can exist in COMPACT STORAGE tables (though they
are exceedingly rare), and when LWT materializes a static row. Harden
extract_column_value() so it is ready for them.
Expression evaluation works with the evaluation_input structure to
compute values. As we move LWT column_condition towards expressions,
we'll start using evaluation_input, so provide this helper to ease
the transition.
Both LWT IF clause and SELECT WHERE clause check that a duration type
isn't used in an ordered comparison, since duration types are unordered
(is 1mo more or less than 30d?). As a first step towards centralizing this
check, move the check from restrictions into prepare. When LWT starts using
prepare, the duplication will be removed.
The error message was changed: the word "slice" is an internal term, and
a comparison does not necessarily have to be in a restriction (which is
also an internal term).
Tests were adjusted.
Compiling a pattern is expensive and so we should try to do it
at prepare time, if the pattern is a constant. Add an optimizer
that looks for such cases and replaces them with a unary function
that embeds the compiled pattern.
This isn't integrated yet with prepare_expr(), since the filtering
code isn't ready for generic expressions. Its first user will be LWT,
which contains the optimization already (filtering had it as well,
but lost it sometime during the expression rewrite).
A unit test is added.
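A sketch of the optimization, with std::regex standing in for the actual pattern machinery: when the pattern is a constant, compile it once at prepare time and capture the compiled form inside a unary function, instead of recompiling on every row evaluated.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical prepare-time step: the expensive regex construction
// happens once, and the returned closure embeds the compiled pattern.
std::function<bool(const std::string&)> prepare_like(const std::string& pattern) {
    return [re = std::regex(pattern)](const std::string& value) {
        return std::regex_match(value, re);  // uses the pre-compiled pattern
    };
}
```

The evaluation path then calls the returned function per row without touching the pattern text again.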
Function call evaluation rejects NULL inputs, unnecessarily. Functions
work well with NULL inputs. Fix by relaxing the check.
This currently has no impact because functions are not evaluated via
expressions, but via selectors.
LWT IF clause interprets equality differently from SQL (and the
rest of CQL): it thinks NULL equals NULL. Currently, it implements
binary operators all by itself so the fact that oper_t::EQ (and
friends) means something else in the rest of the code doesn't
bother it. However, we can't unify the code (in
column_condition.cc) with the rest of expression evaluation if
the meaning changes in different places.
To prepare for this, introduce a null_handling_style field to
binary_operator that defaults to `sql` but can be changed to
`lwt_nulls` to indicate this special semantic.
A few unit tests are added. LWT itself still isn't modified.
Currently, evaluate_binop_sides() returns std::nullopt if either
side is NULL.
Since we wish to add binary operators that do consider NULL on
each side, make evaluate_binop_sides return the original NULLs
instead (as managed_bytes_opt).
Ultimately I think evaluate_binop_sides() should disappear, but before
that we have to improve unset value checking.
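A sketch of the new contract, with std::optional<int> standing in for managed_bytes_opt: instead of collapsing to "no result" when either side is NULL, both sides are returned as-is, and each operator decides for itself what NULL means (SQL semantics vs LWT's NULL-equals-NULL).

```cpp
#include <optional>
#include <utility>

using value_opt = std::optional<int>;  // stand-in for managed_bytes_opt

// Hypothetical simplified version: NULLs are preserved, not swallowed.
std::pair<value_opt, value_opt> evaluate_binop_sides(value_opt lhs, value_opt rhs) {
    return {std::move(lhs), std::move(rhs)};
}

// LWT-style equality: an empty optional compares equal to an empty
// optional, mirroring "NULL equals NULL" in LWT IF conditions.
bool lwt_equal(const value_opt& a, const value_opt& b) {
    return a == b;
}
```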
We have a cql3::expr::expression::printer wrapper that annotates
an expression with a debug_mode boolean prior to formatting. The
fmt library, however, provides a much simpler alternative: a custom
format specifier. With this, we can write format("{:user}", expr) for
user-oriented prints, or format("{:debug}", expr) for debug-oriented
prints (if nothing is specified, the default remains debug).
This is done by implementing fmt::formatter::parse() for the
expression type, and using expression::printer internally.
Since sometimes we pass expression element types rather than
the expression variant, we also provide a custom formatter for all
ExpressionElement types.
Uses for expression::printer are updated to use the nicer syntax. In
one place we eliminate a temporary that is no longer needed since
ExpressionElement:s can be formatted directly.
Closes#12702
`evaluation_inputs` is a struct which contains data needed to evaluate expressions - values of columns, bind variables and other data.
`is_one_of()` is a function used to evaluate `IN` restrictions. It checks whether the LHS is one of the elements of the RHS list.
Generally when evaluating expressions we get the `evaluation_inputs` as an argument and we should pass them along to any functions that evaluate subexpressions.
`is_one_of()` got the inputs as an argument, but didn't pass them along to `equal()`; instead it created new empty `evaluation_inputs{}` and gave that to `equal()`.
At first [I thought this was a bug](https://github.com/scylladb/scylladb/pull/12356#discussion_r1084300969) - with missing information there could be a crash if `equal()` tried to evaluate an expression with a `bind_variable`.
It turns out that in this particular case `equal()` won't use the `evaluation_inputs` at all. The LHS and RHS passed to it are just constant values, which were already evaluated to serialized bytes before calling `evaluate()`, so there is no bug.
It's still better to pass the inputs argument along if possible. If in the future `equal()` required these inputs for some reason, missing inputs could lead to an unexpected crash.
I couldn't find any tests that would detect this case, so such a bug could stay undetected until an unhappy user finds it because their cluster crashed.
I added some tests to make sure that it's covered from now on.
Closes#12701
* github.com:scylladb/scylladb:
cql-pytest: test filtering using list with bind variable
test/expr_test: test <int_value> IN (123, ?, 456)
cql3: expr: don't pass empty evaluation_inputs in is_one_of
evaluation_inputs is a struct which contains
data needed to evaluate expressions - values
of columns, bind variables and other data.
is_one_of() is a function used to evaluate
IN restrictions. It checks whether the LHS
is one of the elements of the RHS list.
Generally when evaluating expressions we get
the evaluation_inputs{} as an argument and
we should pass them along to any functions
that evaluate subexpressions.
is_one_of() got the inputs as an argument,
but didn't pass them along to equal();
instead it created new empty evaluation_inputs{}
and gave that to equal().
At first I thought this was a bug - with missing
information there could be a crash if equal()
tried to evaluate an expression with a bind_variable.
It turns out that in this particular case equal()
won't use the evaluation_inputs{} at all.
The LHS and RHS passed to it are just constant values,
which were already evaluated to serialized bytes
before calling evaluate().
It's still better to pass the inputs argument along
if possible. If in the future equal() required
these inputs for some reason, missing inputs
could lead to an unexpected crash.
I couldn't find any tests that would detect this case,
so such a bug could stay undetected until an unhappy user
finds it because their cluster crashed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
since format_to() is defined in both the fmt and std namespaces,
without specifying which one to use, we'd fail to build with a
standard library which implements std::format_to(). yes, we are
`using namespace std` somewhere.
this change should address the FTBFS with GCC-13.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The CQL protocol and specification call for lists with NULLs in
some places. For example, the statement:
```cql
UPDATE tab
SET x = 3
IF y IN (1, 2, NULL)
WHERE pk = 4
```
has a list `(1, 2, NULL)` that contains NULL. Although the syntax is tuple-like, the value is a list;
consider the same statement as a prepared statement:
```cql
UPDATE tab
SET x = :x
IF y IN :y_values
WHERE pk = :pk
```
`:y_values` must have a list type, since the number of elements is unknown.
Currently, this is done with special paths inside LWT that bypass normal
evaluation, but if we want to unify those paths, we must allow NULLs in
lists (except in storage). This series does that.
Closes#12411
* github.com:scylladb/scylladb:
test: materialized view: add test exercising synthetic empty-type columns
cql3: expr: relax evaluate_list() to allow allow NULL elements
types: allow lists with NULL
test: relax NULL check test predicate
cql3, types: validate listlike collections (sets, lists) for storage
types: make empty type deserialize to non-null value
prepare_expression didn't allow preparing binary_operators,
so it's now implemented.
If prepare_binary_operator is unable to infer
the types it will fail with an exception instead
of returning std::nullopt, but we can live with
that for now.
Preparing binary_operators inside the WHERE
clause is currently more complicated than just
calling prepare_binary_operator. Preparation
of the WHERE clause is done inside statement_restrictions
constructor. It's done by iterating over all binary_operators,
validating them and then preparing. The validation contains
additional checks with custom error messages.
Preparation has to be done after validation,
because otherwise the error messages will change
and some tests will start failing.
Because of that we can't just call prepare_expression
on the WHERE clause yet.
It's still useful to have the ability to prepare
binary_operators using prepare_expression.
In cases where we know that the WHERE clause is valid,
we can just call prepare_expression and be done with it.
Once grammar is fully relaxed the artificial constraints
checked by the validation code will be removed and
it will be possible to prepare the whole WHERE clause
using just prepare_expression.
prepare_expression does a bit more than
prepare_binary_operator. In case where
both sides of the binary_operator are known
it will evaluate the whole binary_operator
to a constant value.
Query analysis code is NOT ready
to encounter constant boolean values inside
the WHERE clause, so for the WHERE we still use
prepare_binary_operator which doesn't
evaluate the binary_operator to a
constant value.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When preparing a binary operator we first prepare the LHS,
which gives us information about its type and allows
to infer the desired type of RHS.
Then the RHS is prepared with the expectation that it
is compatible with the inferred type.
This is enough for all types of operations apart
from IS NOT NULL.
For IS NOT we should also check that the RHS value
is actually null. It's not enough to check that
RHS is of right type.
Before this change preparing `int_col IS NOT 123`
would end in success, which is wrong.
The missing check doesn't cause any real problems,
it's impossible for the user to produce such input
because the parser will reject it.
Still it's better to have the check because
in the future the grammar might get more relaxed
and the parser could become more generic,
making it possible to write such things.
It would be better to introduce unary_operators,
but that's a bigger change.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
For some reason we passed an empty keyspace name
to prepare_expression when preparing the LHS
of a binary operator.
This doesn't look correct. We have keyspace
name available from the schema_ptr so let's use that.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
prepare_binary_operator takes a schema_ptr,
but it would be useful to take a reference to schema instead.
Every schema_ptr can be easily converted to a reference
so there is no loss of functionality.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Tests are similarly relaxed. A test is added in lwt_test to show
that insertion of a list with NULL is still rejected, though we
allow NULLs in IF conditions.
One test is changed from a list of longs to a list of ints, to
prevent churn in the test helper library.
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.