When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.
Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.
A unit test is added.
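As a rough illustration, the separation step can be sketched with simplified stand-in types (the real helper operates on cql3::expr::expression; the `atom` type and the exact shapes below are illustrative, not the actual Scylla types):

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Illustrative stand-ins for the real expression types.
struct expression;
struct conjunction { std::vector<expression> children; };
struct atom { std::string text; };
struct expression { std::variant<conjunction, atom> v; };

// Flatten a WHERE-clause expression into its top-level AND-ed factors,
// recursing into nested conjunctions.
void boolean_factors(const expression& e, std::vector<expression>& out) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            boolean_factors(child, out);
        }
    } else {
        out.push_back(e);
    }
}

// Example: "a = 1 AND (b = 2 AND c = 3)" separates into three factors.
std::size_t example_factor_count() {
    expression e{conjunction{{
        expression{atom{"a = 1"}},
        expression{conjunction{{expression{atom{"b = 2"}},
                                expression{atom{"c = 3"}}}}},
    }}};
    std::vector<expression> factors;
    boolean_factors(e, factors);
    return factors.size();
}
```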
Add a function that checks if there is an index
which supports one of the columns present in
the given expression.
This functionality will soon be needed for
clustering and nonprimary columns, so it's
good to separate it into a reusable function.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function which checks that an expression
contains only binary operators with '='.
Right now this check is done only in a single place,
but soon the same check will have to be done
for clustering columns as well, so the code
is moved to a separate function to prevent duplication.
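A minimal sketch of such a check, using simplified stand-in types (the real function inspects binary_operator nodes inside cql3::expr::expression; the names below are illustrative):

```cpp
#include <string>
#include <variant>
#include <vector>

// Illustrative model of an expression tree holding binary operators.
enum class oper_t { EQ, LT, GT, IN };
struct expression;
struct conjunction { std::vector<expression> children; };
struct binary_operator { oper_t op; };
struct expression { std::variant<conjunction, binary_operator> v; };

// True iff every binary operator in the expression tree is '='.
bool has_only_eq(const expression& e) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            if (!has_only_eq(child)) {
                return false;
            }
        }
        return true;
    }
    return std::get<binary_operator>(e.v).op == oper_t::EQ;
}

// Examples: "a = 1 AND b = 2" passes, "a = 1 AND b < 2" does not.
bool example_all_eq() {
    return has_only_eq(expression{conjunction{{
        expression{binary_operator{oper_t::EQ}},
        expression{binary_operator{oper_t::EQ}}}}});
}
bool example_with_lt() {
    return has_only_eq(expression{conjunction{{
        expression{binary_operator{oper_t::EQ}},
        expression{binary_operator{oper_t::LT}}}}});
}
```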
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a bug which caused incorrect results of limits()
for columns with reversed clustering order.
Such columns have reversed_type as their type and this
needs to be taken into account when comparing them.
The bug was introduced in 6d943e6cd0,
which replaced uses of get_value_comparator
with type_of. The difference between them is that
get_value_comparator applied ->without_reversed()
on the result type.
Because the type was reversed, comparisons like
1 < 2 evaluated to false.
This caused the test testIndexOnKeyWithReverseClustering
to fail, but sadly it wasn't caught by CI because
the CI itself has a bug that makes it skip some tests.
The test passes now, although it has to be run manually
to check that.
Fixes: #10918
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10994
value_for is a method from the restriction class
which finds the value for a given column.
Under the hood it makes use of possible_lhs_values.
It will be needed to implement some functionality
that was implemented using restrictions before.
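The idea can be sketched as follows, with heavily simplified types (the real possible_lhs_values works over serialized values and ranges, not a set of ints; this is only the shape of the logic):

```cpp
#include <optional>
#include <set>

// Simplified stand-in for the result of possible_lhs_values: the set of
// values a column may take under the restriction.
using value_set = std::set<int>;

// Sketch of value_for: return a definite value only when the set of
// possible values contains exactly one element.
std::optional<int> value_for(const value_set& possible) {
    if (possible.size() == 1) {
        return *possible.begin();
    }
    return std::nullopt; // no single definite value
}
```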
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that finds common columns
between two expressions.
It's used in error messages in the original
restrictions code so it must be included
in the new code as well for compatibility.
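A minimal sketch of the intersection step, assuming the column names mentioned by each expression have already been collected into sorted sets (the real function walks two cql3::expr::expression trees collecting column_value nodes):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Intersect the column-name sets of two expressions. std::set iterates in
// sorted order, which is what std::set_intersection requires.
std::vector<std::string> common_columns(const std::set<std::string>& lhs,
                                        const std::set<std::string>& rhs) {
    std::vector<std::string> out;
    std::set_intersection(lhs.begin(), lhs.end(),
                          rhs.begin(), rhs.end(),
                          std::back_inserter(out));
    return out;
}
```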
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Restrictions code keeps restrictions for each column
in a map sorted by their position in the schema.
Then there are methods that allow accessing
the restricted columns in the correct order.
To replicate this in upcoming code
we need functions that implement this functionality.
The original comparator can be found in:
cql3/restrictions/single_column_restrictions.hh
For primary key columns this comparator compares their
positions in the schema.
For non-primary columns the position is assumed to
be clustering_key_size(), which seems pretty random.
To avoid passing the schema to the comparator
for nonprimary columns I just assume the
position is u32::max(). This seems to be
as good of a choice as clustering_key_size().
Originally Cassandra used -1:
bc8a260471/src/java/org/apache/cassandra/config/ColumnDefinition.java (L79-L86)
We never end up comparing columns of different kind using this comparator anyway.
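The ordering described above can be sketched like this (types simplified; the real comparator works on column_definition pointers from the schema):

```cpp
#include <cstdint>
#include <limits>

// Illustrative column metadata: primary key columns have a meaningful
// position in the schema, regular columns do not.
enum class column_kind { partition_key, clustering_key, regular };

struct column_definition {
    column_kind kind;
    uint32_t position; // meaningful only for primary key columns
};

// Primary key columns compare by schema position; regular (nonprimary)
// columns all get u32::max() so they sort after primary key columns.
uint32_t comparison_position(const column_definition& c) {
    if (c.kind == column_kind::regular) {
        return std::numeric_limits<uint32_t>::max();
    }
    return c.position;
}

bool schema_pos_column_less(const column_definition& a,
                            const column_definition& b) {
    return comparison_position(a) < comparison_position(b);
}
```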
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that gets the only column
from a single column restriction expression.
The code would be very similar to
is_single_column_restriction, so a new
function is introduced to reduce duplication.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that checks whether an expression
contains restrictions on exactly one column.
This is a "single_column_restriction"
in the same way that instances of
"class single_column_restriction" are.
It will be used to distinguish
such cases once this class is removed.
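The check can be sketched with simplified stand-in types (the real code walks cql3::expr::expression collecting column_value nodes; the `column_restriction` leaf type here is illustrative):

```cpp
#include <set>
#include <string>
#include <variant>
#include <vector>

// Illustrative model: an expression is a single-column restriction iff
// the set of distinct columns it mentions has exactly one element.
struct expression;
struct conjunction { std::vector<expression> children; };
struct column_restriction { std::string column; };
struct expression { std::variant<conjunction, column_restriction> v; };

void collect_columns(const expression& e, std::set<std::string>& out) {
    if (const conjunction* conj = std::get_if<conjunction>(&e.v)) {
        for (const expression& child : conj->children) {
            collect_columns(child, out);
        }
    } else {
        out.insert(std::get<column_restriction>(e.v).column);
    }
}

bool is_single_column_restriction(const expression& e) {
    std::set<std::string> cols;
    collect_columns(e, cols);
    return cols.size() == 1;
}

// Examples: "a = 1 AND a < 5" restricts one column; "a = 1 AND b = 2" two.
bool example_single() {
    return is_single_column_restriction(expression{conjunction{{
        expression{column_restriction{"a"}},
        expression{column_restriction{"a"}}}}});
}
bool example_two() {
    return is_single_column_restriction(expression{conjunction{{
        expression{column_restriction{"a"}},
        expression{column_restriction{"b"}}}}});
}
```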
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Commit e739f2b779 ("cql3: expr: make evaluate() return a
cql3::raw_value rather than an expr::constant") introduced
raw_value::view() as a synonym to raw_value::to_view() to reduce
churn. To fix this duplication, we now remove raw_value::to_view().
raw_value::to_view() was picked for removal because it has fewer
call sites, reducing churn again.
Closes #10819
An expr::constant is an expression that happens to represent a constant,
so it's too heavyweight to be used for evaluation. Right now the extra
weight is just a type (which causes extra work by having to maintain
the shared_ptr reference count), but it will grow in the future to include
source location (for error reporting) and maybe other things.
Prior to e9b6171b5 ("Merge 'cql3: expr: unify left-hand-side and
right-hand-side of binary_operator prepares' from Avi Kivity"), we had
to use expr::constant since there was not enough type information in
expressions. But now every expression carries its type (in programming
language terms, expressions are now statically typed), so carrying types
in values is not needed.
So change evaluate() to return cql3::raw_value. The majority of the
patch just changes that. The rest deals with some fallout:
- cql3::raw_value gains a view() helper to convert to a raw_value_view,
and is_null_or_unset() to match with expr::constant and reduce further
churn.
- some helpers that worked on expr::constant and now receive a
raw_value now need the type passed via an additional argument. The
type is computed from the expression by the caller.
- many type checks during expression evaluation were dropped. This is
a consequence of static typing - we must trust the expression prepare
phase to perform full type checking since values no longer carry type
information.
Closes #10797
column_maybe_subscripted is a variant<column_value*, subscript*> that
existed for two reasons:
1. evaluation of subscripts and of columns took different paths.
2. calculation of the type of column or column[sub] took different paths.
Now that all evaluations go through evaluate(), and the types are
present in the expression itself, there is no need for column_maybe_subscripted
and it is replaced with plain expressions.
Some functions accept the right-hand-side as the first argument
and the left-hand-side as the second argument. This is now confusing,
but at least safe-ish, as the arguments have different types. It's
going to become dangerous when we switch to expressions for both sides,
so let's rationalize it by always starting with lhs.
Some parameters were annotated with _lhs/_rhs when it was not clear.
The grammar only allows comparing tuples of clustering columns, which
are non-null, but let's not rely on that deep in expression evaluation
as it can be relaxed.
is_satisfied_by() used an internal column_value_eval_bag type that
was more awkwardly named (and more awkward to use due to more nesting)
than evaluation_inputs. Drop it and use evaluation_inputs throughout.
The thunk is_satisfied_by(evaluation_inputs) that just called
is_satisfied_by(column_value_eval_bag) is dropped.
Currently, evaluate() accepts only query_options, which makes
it unable to evaluate columns. As a result some callers
(column_condition) have to call it directly on the right-hand-side
of binary expressions instead of evaluating the binary expression
itself.
Change it to accept evaluation_inputs as a parameter, but keep
the old signature too, since it is called from many places that
don't have rows.
is_satisfied_by() rearranges the static and regular columns from
query::result_row_view form (which is a use-once iterator) to
std::vector<managed_bytes_opt> (which uses the standard value
representation, and allows random access which expression
evaluation needs). Doing it in is_satisfied_by() means that it is
done every time an expression is evaluated, which is wasteful. It's
also done even if the expression doesn't need it at all.
Push it out to callers, which already eliminates some calls.
We still pass cql3::expr::selection, which is a layering violation,
but that is left to another time.
Note that in view.cc's check_if_matches(), we should have been
able to move static_and_regular_columns calculation outside the
loop. However, we get crashes if we do. This is likely due to a
preexisting bug (which the zero iterations loop avoids). However,
in selection.cc, we are able to avoid the computation when the code
claims it is only handling partition keys or clustering keys.
Callers are converted, but the internals are kept using the old
conventions until more APIs are converted.
Although the new API allows passing no query_options, the view code
keeps passing dummy query_options; the improvement is left as a FIXME.
For most types, we just return the type field. A few expressions have
other methods to access the type, and some expressions cannot survive
prepare and so calling type_of() on them is illegal.
Currently, preparing a cast drops the cast completely (as the
types are verified to be binary compatible). This means we lose
the casted-to type. Since we wish to keep type information, keep the
cast in the prepared expression tree (and therefore the casted-to
type).
Once we do that, we must extend evaluate() to support cast
expressions.
We already support subscripting maps (for filtering WHERE m[3] = 6),
so adding list subscript support is easy. Most of the code is shared.
Differences are:
- internal list representation is a vector of values, not of key/values
- key type is int32_type, not defined by map
- need to check index bounds
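The bounds-check difference can be sketched as follows (heavily simplified: real list cells hold serialized bytes, not ints, and this sketch's choice of returning null for an out-of-range index is an assumption, not necessarily what the patch does):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Unlike a map subscript (keyed lookup), a list subscript is an int32
// index into a vector of values and must be bounds-checked.
std::optional<int> list_subscript(const std::vector<int>& list, int32_t index) {
    if (index < 0 || static_cast<std::size_t>(index) >= list.size()) {
        return std::nullopt; // out-of-range index -> null (sketch assumption)
    }
    return list[static_cast<std::size_t>(index)];
}
```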
We used to allow nulls in lists of IN values,
i.e. a query like this would be valid:
SELECT * FROM tab WHERE pk IN (1, null, 2);
This is an old feature that isn't really used
and is already forbidden in Cassandra.
Additionally the current implementation
doesn't allow for nulls inside the list
if it's sent as a bound value.
So something like:
SELECT * FROM tab WHERE pk IN ?;
would throw an error if ? was (1, null, 2).
This is inconsistent.
Allowing it made writing code cumbersome because
this was the only case where having a null
inside a collection was allowed.
Because of it, separate code paths were needed
to handle regular lists
and lists containing NULL values.
Forbidding it makes the code nicer and consistent
at the cost of a feature that isn't really
important.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
expression::printer is used to print CQL expressions
in a pretty way that allows them to be parsed back
to the same representation.
There are several things that need to change,
compared to the current implementation of operator<<(expression),
to output something parsable.
Column names should be printed without 'unresolved_identifier()'
and sometimes they need to be quoted to preserve case sensitivity.
I needed to write new code for printing constant values
because the current one did debug printing
(e.g. a set was printed as '1; 2; 3').
A list of IN values should be printed inside () instead of [],
but because it is internally represented as a list it is
by default printed with [].
To fix this, a temporary tuple_constructor is created and printed.
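The quoting rule for case sensitivity can be sketched like this (an assumption-laden simplification: CQL lowercases unquoted identifiers, so any name that is not purely lower-case [a-z0-9_] needs double quotes; reserved-keyword handling is omitted):

```cpp
#include <cctype>
#include <string>

// Quote an identifier if needed so it parses back to the same name.
// Embedded double quotes are escaped by doubling them.
std::string maybe_quote(const std::string& name) {
    bool needs_quoting = name.empty();
    for (char c : name) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!(std::islower(uc) || std::isdigit(uc) || c == '_')) {
            needs_quoting = true;
        }
    }
    if (!needs_quoting) {
        return name;
    }
    std::string quoted = "\"";
    for (char c : name) {
        if (c == '"') {
            quoted += "\"\""; // double embedded quotes
        } else {
            quoted += c;
        }
    }
    quoted += '"';
    return quoted;
}
```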
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
The expr::token struct is created when something
like token(p1, p2) occurs in the WHERE clause.
Currently expr::token doesn't keep the columns passed
as arguments to the token function.
They weren't needed because token() validation
was done inside token_relation.
Now that we want to use only expressions
we need to have columns inside the token struct
and validate that those are the correct columns.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
The semantics of unset values inside collections are undefined.
The previous behavior of transforming a list with an unset value into
an unset value was removed, because I couldn't find a reason for its existence.
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).
Cassandra returns an invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:
* For NULL, we consider m[NULL] to result in NULL - instead of an error.
This behavior is more consistent with other expressions that contain
null - for example NULL[2] and NULL<2 both result in NULL as well.
Moreover, if in the future we allow more complex expressions, such
as m[a] (where a is a column), we can find the subscript to be null
for some rows and non-null for other rows - and throwing an "invalid
query" in the middle of the filtering doesn't make sense.
* For UNSET_VALUE, we do consider this an error like Cassandra, and use
the same error message as Cassandra. However, the current implementation
checks for this error only when the expression is evaluated - not
before. It means that if the scan is empty before the filtering, the
error will not be reported and we'll silently return an empty result
set. We currently consider this ok, but we can also change this in the
future by binding the expression only once (today we do it on every
evaluation) and validating it once after this binding.
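The two rules above can be sketched with simplified types (real keys and values are serialized bytes; the `subscript_key` type and function names here are illustrative, though the error message matches Cassandra's as described):

```cpp
#include <map>
#include <optional>
#include <stdexcept>

// The subscript is either a real key, NULL, or UNSET.
enum class key_kind { value, null, unset };
struct subscript_key { key_kind kind; int key; };

std::optional<int> eval_map_subscript(const std::map<int, int>& m,
                                      const subscript_key& sub) {
    if (sub.kind == key_kind::null) {
        return std::nullopt; // m[NULL] -> NULL, not an error
    }
    if (sub.kind == key_kind::unset) {
        // Same error message as Cassandra for an unset subscript.
        throw std::invalid_argument("Unsupported unset map key for column m");
    }
    auto it = m.find(sub.key);
    return it == m.end() ? std::nullopt : std::optional<int>(it->second);
}

// Helper for the example: does an UNSET key raise the error?
bool unset_key_throws(const std::map<int, int>& m) {
    try {
        (void)eval_map_subscript(m, {key_kind::unset, 0});
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```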
Fixes #10361
Fixes #10399
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When we have a filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferenced an unset optional, and continued
processing the result of this dereference, which resulted in undefined
behavior - sometimes we were lucky enough to get "marshaling error"
but other times Scylla crashed.
The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).
The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.
Fixes#10417
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.
This behaviour is unexpected and potentially
dangerous so it would be better to forbid
passing subscript to possible_lhs_values entirely.
Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are possible values of a subscripted column c where c[0] = 1?
All lists that have 1 on 0th position?
If we wanted to handle this nicely we would have to
change the arguments.
Such refactoring is best left until the time
when this functionality is actually needed,
right now it's hard to predict what interface
will be needed then.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10228
is_supported_by checks whether a given restriction
can be supported by some index.
Currently when a subscripted value, e.g. `m[1]`, is encountered,
we ignore the fact that there is a subscript and ask
whether an index can support `m` itself.
This looks like unintentional behaviour left over
from the times when column_value had a sub field,
which could easily be forgotten about.
Scylla doesn't support indexes on collection elements at all,
so simply returning false there seems like a good idea.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes #10227
column_value::sub has been replaced by the subscript struct
everywhere, so we can finally remove it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When `val[sub]` is parsed, it used to be the case
that a column_value with a sub field was created.
Now this has been changed to creating a subscript struct.
This is the only place where a subscripted value can be created.
All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
All handlers for subscript have finally been implemented
and subscript can now be added to expression without
any trouble.
All the commented out code that waited for this moment
can now be uncommented.
Every such piece of code had a `TODO(subscript)` note
and by grepping this phrase we can make sure that
we didn't forget any of them.
Right now there are two ways to express a subscripted
column - either by a column_value with a sub field
or by using a subscript struct.
The grammar still uses the old column_value way,
but column_value.sub will be removed soon
and everything will move to the subscript struct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
fill_prepare_context collects useful information about
the expression involved in query restrictions.
We should collect this information from subscript as well,
just like we do from column_value and its sub.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_single_column_restrictions_for_column finds all restrictions
for a column and puts them in a vector.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Prepare a handler for subscript in search_and_replace.
Some of the code must be commented out for now
because subscript hasn't been added to expression yet.
It will be uncommented later.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
possible_lhs_values returns the set of possible values
for a column given some restrictions.
Current behaviour in case of a subscripted column
is to just ignore the subscript and treat
the restriction as if it were on just the column.
This seems wrong, or at least confusing,
but I won't change it in this patch to preserve the existing behaviour.
Trying to change this to something more reasonable
breaks other code which assumes that possible_lhs_values
returns a list of values.
(See partition_ranges_from_EQs() in cql3/restrictions/statement_restrictions.cc)
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_supported_by checks whether the given expression
is supported by some index.
The current behaviour seems wrong, but I kept
it to avoid making changes in a refactor PR.
Scylla doesn't have indexes on map entries yet,
so for a subscript the answer is always no.
I think we should just return false there.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>