scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Kefu Chai	d7a404e1ec	alternator: add formatter for alternator::calculate_value_caller before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `alternator::calculate_value_caller`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17259	2024-02-11 11:49:46 +02:00
Nadav Har'El	04e5082d52	alternator: limit expression length and recursion depth DynamoDB limits all expressions (ConditionExpression, UpdateExpression, ProjectionExpression, FilterExpression, KeyConditionExpression) to just 4096 bytes. Until now, Alternator did not enforce this limit, and we had an xfailing test showing this. But it turns out that not enforcing this limit can be dangerous: The user can pass arbitrarily-long and arbitrarily nested expressions, such as: a<b and (a<b and (a<b and (a<b and (a<b and (a<b and (...)))))) or ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( and those can cause recursive algorithms in Alternator's parser and later when applying expressions to recurse very deeply, overflow the stack, and crash. This patch includes new tests that demonstrate how Scylla crashes during parsing before enforcing the 4096-byte length limit on expressions. The patch then enforces this length limit, and these tests stop crashing. We also verify that deeply-nested expressions shorter than the 4096-byte limit are apparently short enough for our recursion ability, and work as expected. Unfortunately, running these tests many times showed that the 4096-byte limit is not low enough to avoid all crashes so this patch needs to do more: The parsers created by ANTLR are recursive, and there is no way to limit the depth of their recursion (i.e., nothing like YACC's YYMAXDEPTH). Very deep recursion can overflow the stack and crash Scylla. After we limited the length of expression strings to 4096 bytes this was almost enough to prevent stack overflows. But unfortunately the tests revealed that even limited to 4096 bytes, the expression can sometimes recurse too deeply: Consider the expression "((((((....((((" with 4000 parentheses. To realize this is a syntax error, the parser needs to do a recursive call 4000 times. 
Or worse - because of other Antlr limitations (see rants in comments in expressions.g) it's actually 12000 recursive calls, and each of these calls has a pretty large frame. In some cases, this overflows the stack. The solution used in this patch is not pretty, but works. We add to rules in alternator/expressions.g that recurse (there are two of those - "value" and "boolean_expression") an integer "depth" parameter, which we increase when the rule recurses. Moreover, we add a so-called predicate "{depth<MAX_DEPTH}?" that stops the parsing when this limit is reached. When the parsing is stopped, the user will see a special kind of parse error, saying "expression nested too deeply". With this last modification to expressions.g, the tests for deeply-nested but still-below-4096-bytes expressions (test_limits.py::test_deeply_nested_expression_*) would not fail sporadically as they did without it. While adding the "expression nested too deeply" case, I also made the general syntax-error reporting in Alternator nicer: It no longer prints the internal "expression_syntax_error" type name (an exception type will only be printed if some sort of unexpected exception happens), and it prints the character position where the syntax error (or too deeply nested expression) was recognized. Fixes #14473 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14477	2023-07-31 08:57:54 +03:00
Piotr Sarna	c613d1ce87	alternator: migrate expression parsers to string_view Following the advice in the FIXME note, helper functions for parsing expressions are now based on string views to avoid a few unnecessary conversions to std::string. Tests: unit(dev) Closes #10013	2022-02-04 12:34:19 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Nadav Har'El	282742a469	alternator: fix query with both projection and filtering We had a bug when a Query/Scan had both projection (ProjectionExpression or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter). The problem was that projection left only the requested attributes, and the filter might have needed - and not got - additional attributes. The solution in this patch is to add to the generated JSON item also the extra attributes needed by filtering (if any), run the filter on that, and only at the end remove the extra filtering attributes from the item to be returned. The two tests test_query_filter.py::test_query_filter_and_attributes_to_get test_filter_expression.py::test_filter_expression_and_projection_expression which failed before this patch now pass so we drop their "xfail" tag. Fixes #6951. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
Piotr Sarna	4de23d256e	alternator,utils: move rjson.hh to utils/ rjson is going to replace libjsoncpp, so it's moved from alternator to the common utils/ directory.	2020-07-03 08:30:01 +02:00
Nadav Har'El	8c026b9f10	alternator: move some code out of executor.cc The source file alternator/executor.cc has grown too much, reaching almost 4,000 lines. In this patch I move about 400 lines out of executor.cc: 1. Some functions related to serialization of sets and lists were moved to serialization.cc, 2. Functions related to evaluating parsed expressions were moved to expressions.cc. The header file expressions_eval.hh was also removed - the calculate_value() functions now live in expressions.cc, so we can just define them in expressions.hh, no need for a separate header file. This patch just moves code around. It doesn't make any functional changes. Refs #5783. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	0b9f25ab50	alternator: implement FilterExpression This patch provides a complete implementation for the FilterExpression parameter - the newer syntax for filtering the results of the Query or Scan operations. The implementation is pretty straightforward - we already added earlier a result-filtering framework to Alternator, and used it for the older filtering syntax - QueryFilter and ScanFilter. All we had to do now was to run the FilterExpression (which has the same syntax as a ConditionExpression) on each individual item. The previous cleanup patches were important to reduce the friction of running these expressions on the items. After the previous patches fixing small esoteric bugs in a few expression functions, with this patch all the tests in test_filter_expression.py now pass, and so do the two FilterExpression tests in test_query.py and test_scan.py. As far as I know (and of course minus any bugs we'll discover later), this marks the FilterExpression feature complete. Fixes #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	13ef31f38b	alternator: refactor resolving of references in expressions In the DynamoDB API, expressions (e.g., ConditionExpression and many more) may contain references to column names ("#name") or to values (":val") given in a separate part of the request - ExpressionAttributeNames and ExpressionAttributeValues respectively. Before this patch, we resolved these references as part of the expression's evaluation. This approach had two downsides: 1. It often misdiagnosed (both false negatives and false positives) cases of unused names and values in expressions. We already had two xfailing tests with examples - which pass after this patch. This patch also adds two additional tests, which failed before this patch and pass with it. 2. In one of the following patches we will add support for FilterExpression, where the same expression is used repeatedly on many items. It is a waste (as well as makes the code uglier) to resolve the same references again and again each time the expression is evaluated. We should be able to do it just once. So this patch introduces an intermediate step between parsing and evaluating an expression - "resolving" the expression. The new resolve_() functions modify the already parsed expression, replacing references to attribute names and constant values by the actual names and values taken from the request. The resolve_() functions also keep track of which references were used, making it very easy to check (as DynamoDB does) if there are any unused names or values, before starting the evaluation. The interface of evaluate() functions becomes much simpler - they no longer need to know the original request (which was previously needed for ExpressionAttributeNames/Values), the table's schema (which was previously needed only for some error checking), or keep track of which references were used. 
This simplification is helpful for using the expressions in contexts where these things (request and schema) are no longer conveniently available, namely in FilterExpression. A small side-benefit of this patch is that it moves a bit of code, which handled resolving of references in expressions, from executor.cc to expressions.cc. This is just the first step in a bigger effort to reduce the size of executor.cc by moving code to smaller source files. There is no attempt in this patch to move as much code as we can. We will move more code in a separate patch in this series. Fixes #6572. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 11:57:13 +03:00
Nadav Har'El	b50274e8a7	alternator: add support for ConditionExpression This patch adds support for the ConditionExpression parameter of the item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem. We already supported conditional updates/put/delete using the "Expected" parameter. The ConditionExpression parameter implemented here provides a very similar feature, using a different - and also newer and more powerful - syntax. The implementation here reuses much of our existing expression-parsing infrastructure. Unsurprisingly, ConditionExpression's syntax has much in common with UpdateExpression (which we already support) and also many of the comparison functions already implemented for "Expected". However, it's still quite a bit of new code, because of the many different comparisons, functions, and syntax variations we need to support. This patch also expands alternator-test/test_condition_expression.py with a few additional corner cases discovered during the development of this patch. Almost all of the tests for this feature (35 out of 39) now pass. Two tests still fail because we don't yet support nested attributes (this is a missing feature across Alternator), and two tests fail because of minor idiosyncrasies in DynamoDB's error path that we chose not to duplicate yet (but still remember the difference in the form of an xfailing test). Fixes #5035 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:33 +02:00
Nadav Har'El	c9eb9d9c76	alternator: update license blurbs Update all the license blurbs to the one we use in the open-source Scylla project, licensed under the AGPL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825160321.10016-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	a8dd3044e2	alternator: support (most of) ProjectionExpression DynamoDB has two similar parameters - AttributesToGet and ProjectionExpression - which are supported by the GetItem, Scan and Query operations. Until now we supported only the older AttributesToGet, and this patch adds support to the newer ProjectionExpression. Besides having a different syntax, the main difference between AttributesToGet and ProjectionExpression is that the latter also allows fetching only a specific nested attribute, e.g., a.b[3].c. We do not support this feature yet, although it would not be hard to add it: With our current data representation, it means fetching the top-level attribute 'a', whose value is a JSON, and then post-filtering it to take out only the '.b[3].c'. We'll do that later. This patch also adds more test cases to test_projection_expression.py. All tests except three which check the nested attributes now pass, and those three xfail (they succeed on DynamoDB, and fail as expected on Alternator), reminding us what still needs to be done. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:01 +03:00
Nadav Har'El	829bafd181	alternator: add expression parsers The DynamoDB protocol is based on JSON, and most DynamoDB requests describe the operation and its parameters via JSON objects such as maps and lists. However, in some types of requests an "expression" is passed as a single string, and we need to parse this string. These cases include: 1. Attribute paths, such as "a[3].b.c", are used in projection expressions as well as inside other expressions described below. 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f", used in conditional updates, filters, and other places. 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d" This patch introduces the framework to parse these expressions, and an implementation of parsing update expressions. These update expressions will be used in the UpdateItem operation in the next patch. All these expression syntaxes are very simple: Most of them could be parsed as regular expressions, or at most a simple hand-written lexical analyzer and recursive-descent parser. Nevertheless, we decided to specify these parsers in the same ANTLR3 language already used in the Scylla project for parsing CQL, hopefully making these parsers easier to reason about, and easier to change if needed - and reducing the amount of boilerplate code. The parsing of update expressions is mostly complete except that in SET actions, only the "path = value" form is supported and not yet forms such as "path1 = path2" (which does read-before-write) or "path1 = path1 + value" or "path = function(...)". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:12 +03:00
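
The depth-limiting idea described in commit 04e5082d52 can be illustrated with a small Python sketch. This is not the ANTLR grammar in alternator/expressions.g: it is a hand-written recursive-descent parser for a toy grammar, and `parse_nested`, `ExpressionSyntaxError`, and the value of `MAX_DEPTH` are all assumptions made for illustration (the commit message does not state the real depth limit). Only the 4096-byte length limit and the "expression nested too deeply" message come from the commit message.

```python
MAX_LENGTH = 4096  # expression length limit named in the commit message
MAX_DEPTH = 100    # hypothetical value; the real limit is not given in the log


class ExpressionSyntaxError(Exception):
    """Hypothetical error type for malformed or too deeply nested input."""


def parse_nested(expr: str) -> int:
    """Parse the toy grammar  value := NAME | '(' value ')'.

    Returns the nesting depth at which the innermost NAME was found.
    """
    # First line of defence: reject over-long expressions before parsing.
    if len(expr) > MAX_LENGTH:
        raise ExpressionSyntaxError("expression size exceeds limit")
    pos = 0

    def value(depth: int) -> int:
        nonlocal pos
        # Second line of defence: the rule carries its own recursion depth
        # and refuses to recurse past MAX_DEPTH.
        if depth >= MAX_DEPTH:
            raise ExpressionSyntaxError(
                f"expression nested too deeply at char {pos}")
        if pos < len(expr) and expr[pos] == "(":
            pos += 1
            inner = value(depth + 1)
            if pos >= len(expr) or expr[pos] != ")":
                raise ExpressionSyntaxError(f"expected ')' at char {pos}")
            pos += 1
            return inner
        start = pos
        while pos < len(expr) and expr[pos].isalnum():
            pos += 1
        if pos == start:
            raise ExpressionSyntaxError(f"expected a name at char {pos}")
        return depth

    result = value(0)
    if pos != len(expr):
        raise ExpressionSyntaxError(f"unexpected input at char {pos}")
    return result
```

With these assumed limits, an input of 4000 opening parentheses passes the length check but is rejected by the depth check long before the recursion could exhaust the stack, which is the failure mode the commit message describes.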

14 Commits
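
The separate "resolving" step described in commit 13ef31f38b can be sketched along the same lines. This is a simplified Python illustration, not Alternator's code: the real resolve_() functions operate on a parsed expression in expressions.cc, whereas this sketch assumes a flat list of string tokens, and the function name `resolve_references`, the tuple representation, and the error messages are hypothetical.

```python
def resolve_references(tokens, attribute_names, attribute_values):
    """Replace '#name' and ':val' tokens by their definitions, once.

    tokens           -- flat token list (a stand-in for a parsed expression)
    attribute_names  -- dict such as {"#n": "name"}  (ExpressionAttributeNames)
    attribute_values -- dict such as {":v": 3}       (ExpressionAttributeValues)

    Raises ValueError for a missing reference or for an unused definition.
    """
    used_names, used_values = set(), set()
    resolved = []
    for tok in tokens:
        if tok.startswith("#"):
            if tok not in attribute_names:
                raise ValueError(f"missing ExpressionAttributeNames entry {tok}")
            used_names.add(tok)
            resolved.append(("path", attribute_names[tok]))
        elif tok.startswith(":"):
            if tok not in attribute_values:
                raise ValueError(f"missing ExpressionAttributeValues entry {tok}")
            used_values.add(tok)
            resolved.append(("value", attribute_values[tok]))
        else:
            resolved.append(("token", tok))
    # Because usage was tracked while resolving, the unused-definition check
    # is a simple set difference performed before any evaluation starts.
    unused = (set(attribute_names) - used_names) | (set(attribute_values) - used_values)
    if unused:
        raise ValueError("unused references: " + ", ".join(sorted(unused)))
    return resolved
```

The resolved list no longer depends on the request, so it could be evaluated against many items without repeating the lookup, which is the motivation the commit message gives for FilterExpression.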