scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Piotr Szymaniak	e588c8667f	alternator: Limit attribute name lengths Attribute names are now checked against DynamoDB-compatible length limits. When exceeded, Alternator emits an exception identical or similar to the DDB one. It might be worth noting that DDB emits more than one kind of exception string for some exceptions. The tests' catch clauses handle all the observed kinds of messages from DynamoDB. The validation differentiates between key and non-key attributes and applies the limit accordingly. AWS DDB raises exceptions with somewhat different contents when the get request contains ProjectionExpression, so this case needed separate treatment to emit the corresponding exception string. The length-validating function was declared and defined in expressions.hh/.cc respectively, because that's where the relevant parsing happens. ** Tests The following tests were validated when handling this issue: test_limit_attribute_length_nonkey_good, test_limit_attribute_length_nonkey_bad, test_limit_attribute_length_key_good, test_limit_attribute_length_key_bad, test_limit_attribute_length_gsi_lsi_good, test_limit_attribute_length_gsi_lsi_bad, test_limit_attribute_length_gsi_lsi_projection_bad. Some of the tests were expanded into being more granular. Namely, there is a new test function `test_limit_attribute_length_key_bad_incoherent_names` which groups tests with too long attribute names in the case of incorrect (incoherent) user requests. Similarly, there is a new test function `test_limit_attribute_length_gsi_lsi_bad_incoherent_names`. All the tests now cover each combination of the key/keys being too long. Both the new functions contain tests that verify that ScyllaDB throws length-related exceptions (instead of the coherency-related), similar to what DynamoDB does. The new test test_limit_gsiu_key_len_bad covers the case of too long attribute name inside GlobalSecondaryIndexUpdates. 
The new test test_limit_gsiu_key_len_bad_incoherent_names covers the case of incorrect (incoherent) user requests containing too long attribute names and GlobalSecondaryIndexUpdates. test_limit_attribute_length_key_bad was found to have contained an illegal KeySchema structure. Some of the tests had their match clause corrected. All the tests are stripped of the xfail flag except test_limit_attribute_length_key_bad, which has it changed since it still fails due to Projection in GSI and LSI not being implemented in Alternator. The xfail now points to #5036. Fixes scylladb/scylladb#9169 Closes scylladb/scylladb#23097	2025-04-27 18:39:20 +03:00
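The key versus non-key length check described in e588c8667f can be sketched roughly as follows. This is a minimal illustration, not the function from expressions.cc: the helper names, the error text, and the two numeric limits (255 and 65535 bytes) are assumptions, since the commit message does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Assumed limits for illustration; the commit message does not give the numbers.
constexpr std::size_t max_key_attr_name_len = 255;
constexpr std::size_t max_nonkey_attr_name_len = 65535;

// Hypothetical helper: throws if `name` exceeds the limit for its kind.
void validate_attr_name_length(std::string_view name, bool is_key) {
    const std::size_t limit = is_key ? max_key_attr_name_len : max_nonkey_attr_name_len;
    if (name.size() > limit) {
        throw std::invalid_argument(
            "attribute name too long: " + std::to_string(name.size()) +
            " bytes, limit is " + std::to_string(limit));
    }
}

// Convenience wrapper so callers can probe a name without try/catch.
bool attr_name_length_ok(std::string_view name, bool is_key) {
    try {
        validate_attr_name_length(name, is_key);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

With these assumed limits, a 256-byte name is rejected as a key attribute name but accepted as a non-key one.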
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	00810e6a01	treewide: include seastar/core/format.hh instead of seastar/core/print.hh The latter includes the former and, in addition to `seastar::format()`, `print.hh` also provides helpers like `seastar::fprint()` and `seastar::print()`, which are deprecated and not used by scylladb. Previously, we included `seastar/core/print.hh` for using `seastar::format()`. and in seastar 5b04939e, we extracted `seastar::format()` into `seastar/core/format.hh`. this allows us to include a much smaller header. In this change, we just include `seastar/core/format.hh` in place of `seastar/core/print.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21574	2024-11-14 17:45:07 +02:00
Kefu Chai	59eb2ab119	treewide: s/boost::algorithm::any_of/std::ranges::any_of/ now that we are allowed to use C++23. we now have the luxury of using `std::ranges::any_of`. in this change, we replace `boost::algorithm::any_of` with `std::ranges::any_of` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-05 14:06:09 +08:00
Kefu Chai	3e84d43f93	treewide: use seastar::format() or fmt::format() explicitly before this change, we rely on `using namespace seastar` to use `seastar::format()` without qualifying the `format()` with its namespace. this works fine until we changed the format string parameter type of `seastar::format()` from `const char*` to `fmt::format_string<...>`. this change practically invited `seastar::format()` to the club of `std::format()` and `fmt::format()`, where all members accept a templated parameter as its `fmt` parameter. and `seastar::format()` is not the best candidate anymore. although argument-dependent lookup (ADL for short) favors the function which is in the same namespace as its parameter, `using namespace` makes `seastar::format()` more competitive, so both `std::format()` and `seastar::format()` are considered as candidates. that is what is happening in scylladb at quite a few call sites of `format()`, hence ADL is not able to tell which function is the winner in the name lookup: ``` /__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous 265 \| return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id()); \| ^~~~~~ /usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 4290 \| format(format_string<_Args...> __fmt, _Args&&... __args) \| ^ /__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 143 \| format(fmt::format_string<A...> fmt, A&&... 
a) { \| ^ ``` in this change, we change all `format()` to either `fmt::format()` or `seastar::format()` with the following rules: - if the caller expects an `sstring` or `std::string_view`, change to `seastar::format()` - if the caller expects an `std::string`, change to `fmt::format()`. because, `sstring::operator std::basic_string` would incur a deep copy. we will need another change to enable scylladb to compile with the latest seastar. namely, to pass the format string as a templated parameter down to helper functions which format their parameters. to minimize the scope of this change, let's include that change when bumping up the seastar submodule. as that change will depend on the seastar change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-11 23:21:40 +03:00
Kefu Chai	4e9596a5a9	treewide: replace std::result_of_t with std::invoke_result_t in theory, std::result_of_t should have been removed in C++20. and std::invoke_result_t is available since C++17. thanks to libstdc++, the tree is compiling. but we should not rely on this. so, in this change, we replace all `std::result_of_t` with `std::invoke_result_t`. actually, clang + libstdc++ is already warning us like: ``` In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9: In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11: In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38: Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations] 2624 \| using result_of_t = typename result_of<_Tp>::type; \| ^ /home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here 518 \| if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) { \| ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18835	2024-05-26 16:45:42 +03:00
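The difference between the two traits in 4e9596a5a9 is mostly one of spelling: `std::result_of_t` takes a single function-type argument, while `std::invoke_result_t` takes the callable type and the argument types as separate template parameters. A small sketch, using a hypothetical `consumer` type modeled loosely on the `consume_end_of_stream` check quoted in the warning above:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a consumer type; not a Scylla class.
struct consumer {
    void consume_end_of_stream() {}
    int consume_count() { return 0; }
};

// Old spelling (deprecated in C++17, removed in C++20):
//   std::result_of_t<decltype(&C::consume_end_of_stream)(C&)>
// New spelling: callable type first, then each argument type.
template <typename C>
constexpr bool end_of_stream_returns_void =
    std::is_same_v<std::invoke_result_t<decltype(&C::consume_end_of_stream), C&>, void>;

constexpr bool count_returns_int =
    std::is_same_v<std::invoke_result_t<decltype(&consumer::consume_count), consumer&>, int>;

static_assert(end_of_stream_returns_void<consumer>);
```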
Kefu Chai	57c408ab5d	alternator: add fmt::formatter for alternator::parsed::path before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `alternator::parsed::path`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17458	2024-02-22 16:40:01 +02:00
Nadav Har'El	04e5082d52	alternator: limit expression length and recursion depth DynamoDB limits the length of all expressions (ConditionExpression, UpdateExpression, ProjectionExpression, FilterExpression, KeyConditionExpression) to just 4096 bytes. Until now, Alternator did not enforce this limit, and we had an xfailing test showing this. But it turns out that not enforcing this limit can be dangerous: The user can pass arbitrarily-long and arbitrarily nested expressions, such as: a<b and (a<b and (a<b and (a<b and (a<b and (a<b and (...)))))) or ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( and those can cause recursive algorithms in Alternator's parser and later when applying expressions to recurse very deeply, overflow the stack, and crash. This patch includes new tests that demonstrate how Scylla crashes during parsing before enforcing the 4096-byte length limit on expressions. The patch then enforces this length limit, and these tests stop crashing. We also verify that deeply-nested expressions shorter than the 4096-byte limit are apparently short enough for our recursion ability, and work as expected. Unfortunately, running these tests many times showed that the 4096-byte limit is not low enough to avoid all crashes so this patch needs to do more: The parsers created by ANTLR are recursive, and there is no way to limit the depth of their recursion (i.e., nothing like YACC's YYMAXDEPTH). Very deep recursion can overflow the stack and crash Scylla. After we limited the length of expression strings to 4096 bytes this was almost enough to prevent stack overflows. But unfortunately the tests revealed that even limited to 4096 bytes, the expression can sometimes recurse too deeply: Consider the expression "((((((....((((" with 4000 parentheses. To realize this is a syntax error, the parser needs to do a recursive call 4000 times. 
Or worse - because of other Antlr limitations (see rants in comments in expressions.g) it's actually 12000 recursive calls, and each of these calls has a pretty large frame. In some cases, this overflows the stack. The solution used in this patch is not pretty, but works. We add to rules in alternator/expressions.g that recurse (there are two of those - "value" and "boolean_expression") an integer "depth" parameter, which we increase when the rule recurses. Moreover, we add a so-called predicate "{depth<MAX_DEPTH}?" that stops the parsing when this limit is reached. When the parsing is stopped, the user will see a special kind of parse error, saying "expression nested too deeply". With this last modification to expressions.g, the tests for deeply-nested but still-below-4096-bytes expressions (test_limits.py::test_deeply_nested_expression_*) no longer fail sporadically as they did without it. While adding the "expression nested too deeply" case, I also made the general syntax-error reporting in Alternator nicer: It no longer prints the internal "expression_syntax_error" type name (an exception type will only be printed if some sort of unexpected exception happens), and it prints the character position where the syntax error (or too deeply nested expression) was recognized. Fixes #14473 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14477	2023-07-31 08:57:54 +03:00
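The two guards from 04e5082d52 (a length cap on the expression string, and a depth parameter threaded through the recursive rules) can be illustrated with a hand-written recursive-descent parser. This is only a sketch of the idea: the real change lives in the ANTLR grammar expressions.g, the toy grammar below is invented, and the `max_depth` value of 400 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

constexpr std::size_t max_expression_length = 4096; // limit named in the commit message
constexpr int max_depth = 400;                      // assumed value, for illustration only

// Toy grammar: value := 'a' | '(' value ')'
// `depth` is passed down and checked before each recursive step.
std::size_t parse_value(std::string_view s, std::size_t pos, int depth) {
    if (depth >= max_depth) {
        throw std::runtime_error("expression nested too deeply at position " + std::to_string(pos));
    }
    if (pos < s.size() && s[pos] == 'a') {
        return pos + 1;
    }
    if (pos < s.size() && s[pos] == '(') {
        const std::size_t after = parse_value(s, pos + 1, depth + 1);
        if (after < s.size() && s[after] == ')') {
            return after + 1;
        }
        throw std::runtime_error("syntax error at position " + std::to_string(after));
    }
    throw std::runtime_error("syntax error at position " + std::to_string(pos));
}

// Returns an empty string on success, otherwise the error text.
std::string check_expression(std::string_view s) {
    if (s.size() > max_expression_length) {
        return "expression too long";
    }
    try {
        if (parse_value(s, 0, 0) != s.size()) {
            return "syntax error: trailing input";
        }
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

The length check alone still admits a string of 1000 opening parentheses; the depth check is what stops the recursion early in that case.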
Nadav Har'El	a4087f58df	alternator: fix error path for size() function on constants The DynamoDB documentation for the size() function claims that it only works on paths (attribute names or references), but it actually works on constants from the query (e.g., ":val") as well. It turns out that Alternator supports this undocumented case already, but gets the error path wrong: Usually, when size() is calculated on the data, if the data has the wrong type of size() (e.g., an integer), the condition simply doesn't match. But if the value comes from the query - it should generate an error that the query is wrong - ValidationException. This patch fixes this case, and also adds tests for it that pass on both DynamoDB and Alternator (after this patch). Fixes #14592 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14593	2023-07-12 12:29:05 +03:00
Kefu Chai	56c3462cba	alternator: correct format string when formatting the error message for `api_error::validation`, we always include the caller in the error message, but in this case, forgot to pass the `caller` to `seastar::format()`. if fmtlib actually formats them, it would throw. so let's pass `caller` to `seastar::format()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14589	2023-07-09 22:25:13 +03:00
Marcin Maliszkiewicz	6f055ca5f9	alternator: evaluate expressions as false for stored malformed binary data We'll try to distinguish the case when data comes from the storage rather than the user request. Such an attribute can be used in expressions, and when it can't be decoded it should make the expression evaluate as false to simply exclude the row during a filter query or scan. Note that this change focuses on the binary type; for other types we may have some inconsistencies in the implementation.	2023-01-16 15:15:27 +01:00
Piotr Sarna	c613d1ce87	alternator: migrate expression parsers to string_view Following the advice in the FIXME note, helper functions for parsing expressions are now based on string views to avoid a few unnecessary conversions to std::string. Tests: unit(dev) Closes #10013	2022-02-04 12:34:19 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Nadav Har'El	5e52858295	rjson, alternator: rename set() functions to add() The rjson::set() sounds like it can set any member of a JSON object (i.e., map), but that's not true :-( It calls the RapidJson function AddMember() so it can only add a member to an object which doesn't have a member with the same name (i.e., key). If it is called with a key that already has a value, the result may have two values for the same key, which is ill-formed and can cause bugs like issue #9542. So in this patch we begin by renaming rjson::set() and its variant to rjson::add() - to suggest to its user that this function only adds members, without checking if they already exist. After this rename, I was left with dozens of calls to the set() functions that need to be changed to either add() - if we're sure that the object cannot already have a member with the same name - or to replace() if it might. The vast majority of the set() calls were starting with an empty item and adding members with fixed (string constant) names, so these can be trivially changed to add(). It turns out that all other set() calls - except the one fixed in issue #9542 - can also use add() because there are various "excuses" why we know the member names will be unique. A typical example is a map with column-name keys, where we know that the column names are unique. I added comments in front of such non-obvious uses of add() which are safe. Almost all uses of rjson except a handful are in Alternator, so I verified that all Alternator test cases continue to pass after this patch. Fixes #9583 Refs #9542 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211104152540.48900-1-nyh@scylladb.com>	2021-11-04 16:35:38 +01:00
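The add-versus-replace distinction from 5e52858295 can be modeled without RapidJSON. The sketch below uses a plain member list as a stand-in for a JSON object; `toy_object`, `add_member`, `replace_member` and `count_key` are hypothetical names, not the rjson API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a JSON object stored as an ordered member list; not the rjson API.
using toy_object = std::vector<std::pair<std::string, std::string>>;

// Appends unconditionally, so an existing key ends up duplicated.
void add_member(toy_object& obj, std::string key, std::string value) {
    obj.emplace_back(std::move(key), std::move(value));
}

// Overwrites an existing member, or appends if the key is absent.
void replace_member(toy_object& obj, const std::string& key, std::string value) {
    for (auto& member : obj) {
        if (member.first == key) {
            member.second = std::move(value);
            return;
        }
    }
    obj.emplace_back(key, std::move(value));
}

// Counts how many members carry the given key.
std::size_t count_key(const toy_object& obj, const std::string& key) {
    std::size_t n = 0;
    for (const auto& member : obj) {
        if (member.first == key) {
            ++n;
        }
    }
    return n;
}
```

Calling `add_member` twice with the same key leaves two members with that key, which is the ill-formed state the commit message warns about; `replace_member` keeps a single member.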
Nadav Har'El	7e6c5394f3	alternator: move list_concatenate() function The list_concatenate() function was only used for UpdateExpression's ADD operation, so we made it a static function in the source file where it was used. In the next patch, we'll want to use it in another place (AttributeUpdates' ADD operation), so let's move it to the same file where similar functions for sets exist. This patch is almost entirely a code move, but also makes one small change: list_concatenate() used to throw an exception if one of the arguments wasn't a list, but the text of this exception was specific to UpdateExpression. So in the new version, we return a null value in this case - and the caller checks for it and throws the right exception. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-11-03 10:19:26 +02:00
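The interface change in 7e6c5394f3 (return a null value instead of throwing, and let each caller pick its own error text) might look like this in miniature. The value type, the caller name and the error message are all invented for illustration; the real function works on rjson values.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Toy value type: either a list of strings or a plain number; not the rjson type.
using toy_list = std::vector<std::string>;
using toy_value = std::variant<toy_list, double>;

// Returns std::nullopt (the "null value") if either argument is not a list.
std::optional<toy_list> list_concatenate(const toy_value& a, const toy_value& b) {
    const auto* la = std::get_if<toy_list>(&a);
    const auto* lb = std::get_if<toy_list>(&b);
    if (!la || !lb) {
        return std::nullopt;
    }
    toy_list out = *la;
    out.insert(out.end(), lb->begin(), lb->end());
    return out;
}

// Hypothetical caller: it, not list_concatenate(), decides the error text.
toy_list update_expression_concat(const toy_value& a, const toy_value& b) {
    auto result = list_concatenate(a, b);
    if (!result) {
        throw std::invalid_argument("UpdateExpression: both operands must be lists");
    }
    return *result;
}
```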
Piotr Dulikowski	5a0942a0f8	utils,alternator: move base64 code from alternator to utils The base64 encoding/decoding functions will be used for serialization of hint sync point descriptions. Base64 format is not specific to Alternator, so it can be moved to utils.	2021-08-09 09:24:36 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Nadav Har'El	7789606545	alternator: limit the depth of nested paths DynamoDB limits the depth of a nested path in expressions (e.g. "a.b.c.d") to 32 levels. This patch adds the same limit also to Alternator. The exact value of this limit is less important (although it did make sense to choose the same limit as DynamoDB does), but it's important to have some limit: It's often convenient to handle paths with a recursive algorithm, and if we allow unlimited path depth, it can result in unlimited recursion depth, and a crash. Let's avoid this possibility. We detect the over-long path while building the parsed::path object in the parser, and generate a parse error. This patch also includes a test that verifies that both Alternator and DynamoDB have the same 32-level nesting limit on paths. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-02-14 12:21:34 +02:00
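As a rough illustration of 7789606545, the depth check can be done while the path object is being built, so an over-long path is rejected before any recursive algorithm sees it. The sketch splits on dots only (no `[3]` indexes), and it assumes the root attribute counts as one of the 32 levels; the commit message does not say how the levels are counted, and the error text is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

constexpr std::size_t max_path_depth = 32; // limit named in the commit message

// Toy path builder: splits "a.b.c" on dots and fails once the limit is exceeded.
// Assumption: the root attribute counts as one level.
std::vector<std::string> build_path(std::string_view text) {
    std::vector<std::string> components;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = text.find('.', start);
        components.emplace_back(
            text.substr(start, dot == std::string_view::npos ? dot : dot - start));
        if (components.size() > max_path_depth) {
            throw std::runtime_error(
                "document path exceeded " + std::to_string(max_path_depth) + " nesting levels");
        }
        if (dot == std::string_view::npos) {
            return components;
        }
        start = dot + 1;
    }
}

// Test helper: builds "a.a.a..." with n components (n >= 1).
std::string make_path(std::size_t n) {
    std::string p = "a";
    for (std::size_t i = 1; i < n; ++i) {
        p += ".a";
    }
    return p;
}
```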
Nadav Har'El	f78d33dd73	alternator: make parsed::path object printable Make the parsed::path object printable - which is useful for error messages. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-02-14 12:21:34 +02:00
Nadav Har'El	e52785be08	alternator: support attribute paths in ConditionExpression, FilterExpression This patch fully implements support for attribute paths (e.g. a.b.c, a.d[3]) for the ConditionExpression in conditional updates, and FilterExpression in queries and scans. After this patch, all previously-xfailing tests in test_projection_expression.py and test_filter_expression.py now pass. The fix is simple: Both ConditionExpression and FilterExpression use the function calculate_value() to calculate the value of the expression. When this function calculates the value of a path, it mustn't just take the top-level attribute - it needs to walk into the specific sub-object as specified by the attribute path. This is not the end of attribute path support, UpdateExpression and ReturnValues are not yet fully supported. This will come in following patches. Refs #5024 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-02-08 19:19:09 +02:00
Nadav Har'El	104ef5242b	alternator: support attribute paths in ProjectionExpression This patch fully implements support for attribute paths (e.g. a.b.c, a.d[3]) for the ProjectionExpression in the various operations where this parameter is supported - GetItem, BatchGetItem, Query and Scan. After this patch, all xfailing tests in test_projection_expression.py now pass. In the previous patch we remembered in the "attrs_to_get" object not only the top-level attributes to read from the table, but also how to filter from it only the desired pieces of the nested document. In this patch we add a filter() function to do this filtering, and call it in the right places to post-process the JSON objects we read from the table. We also had to fix reference resolution in paths to resolve all the components of the path (e.g., #name1.#name2) and not just the top-level attribute. This is not the end of attribute path support, there are still other expressions (ConditionExpression, UpdateExpression, FilterExpression, ReturnValues) where they are not yet supported. This will come in following patches. Refs #5024 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-02-08 14:16:40 +02:00
Nadav Har'El	653610f4bc	alternator: fix ValidationException in FilterExpression - and more The first condition expressions we implemented in Alternator were the old "Expected" syntax of conditional updates. That implementation had some specific assumptions on how it handles errors: For example, in the "LT" operator in "Expected", the second operand is always part of the query, so an error in it (e.g., an unsupported type) resulted in a ValidationException error. When we implemented ConditionExpression and FilterExpression, we wrongly used the same functions check_compare(), check_BETWEEN(), etc., to implement them. This results in some inaccurate error handling. The worst example is what happens when you use a FilterExpression with an expression such as "x < y" - this filter is supposed to silently skip items whose "x" and "y" attributes have unsupported or different types, but in our implementation a bad type (e.g., a list) for y resulted in a ValidationException which aborted the entire scan! Interestingly, in one case (that of BEGINS_WITH) we actually noticed the slightly different behavior needed and implemented the same operator twice - with ugly code duplication. But in other operators we missed this problem completely. This patch first adds extensive tests of how the different expressions (Expected, QueryFilter, FilterExpression, ConditionExpression) and the different operators handle various input errors - unsupported types, missing items, incompatible types, etc. Importantly, the tests demonstrate that there is often different behavior depending on whether the bad input comes from the query, or from the item. Some of the new tests fail before this patch, but others pass and were useful to verify that the patch doesn't break anything that already worked correctly previously. As usual, all the tests pass on Cassandra. Finally, this patch fixes all these problems. 
The comparison functions like check_compare() and check_BETWEEN() now not only take the operands, they also take booleans saying if each of the operands came from the query or from an item. The old-syntax caller (Expected or QueryFilter) always says that the first operand is from the item and the second is from the query - but in the new-syntax caller (ConditionExpression or FilterExpression) any or all of the operands can come from the query and need verification. The old duplicated code for check_BEGINS_WITH() - which had a TODO to remove it - is finally removed. Instead we use the same idea of passing booleans saying if each of its operands came from an item or from the query. Fixes #8043 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2021-02-08 14:16:30 +02:00
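The idea in 653610f4bc of passing a "came from the query" flag with each operand can be sketched as below. This is a toy model rather than the real check_compare(): the value type, the function name and the error text are assumptions. A bad type supplied in the request is reported as an error, while a bad type found in a stored item simply makes the comparison not match.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Toy value: number, string, or list (lists are treated as not orderable by '<').
using toy_value = std::variant<double, std::string, std::vector<int>>;

bool is_orderable(const toy_value& v) {
    return !std::holds_alternative<std::vector<int>>(v);
}

// Hypothetical "less than" check that knows where each operand came from.
bool check_less_than(const toy_value& v1, bool v1_from_query,
                     const toy_value& v2, bool v2_from_query) {
    // A bad type in the query is the user's error and is reported as such.
    if ((v1_from_query && !is_orderable(v1)) || (v2_from_query && !is_orderable(v2))) {
        throw std::invalid_argument("ValidationException: operand type not supported by '<'");
    }
    // A bad or mismatched type in a stored item is just a non-match.
    if (!is_orderable(v1) || !is_orderable(v2) || v1.index() != v2.index()) {
        return false;
    }
    return v1 < v2;
}
```

In this model, an old-syntax caller would always pass `false` for the first flag and `true` for the second, while a new-syntax caller sets each flag according to whether the operand is a constant from the request or an attribute of the item.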
Nadav Har'El	282742a469	alternator: fix query with both projection and filtering We had a bug when a Query/Scan had both projection (ProjectionExpression or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter). The problem was that projection left only the requested attributes, and the filter might have needed - and not got - additional attributes. The solution in this patch is to add to the generated JSON item the extra attributes needed by filtering (if any), run the filter on that, and only at the end remove the extra filtering attributes from the item to be returned. The two tests test_query_filter.py::test_query_filter_and_attributes_to_get and test_filter_expression.py::test_filter_expression_and_projection_expression, which failed before this patch, now pass so we drop their "xfail" tag. Fixes #6951. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-10-05 02:19:22 +03:00
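The three steps of the fix in 282742a469 (widen the item with the attributes the filter needs, run the filter, then strip the extras) can be sketched on a flat string map. The function below is hypothetical and supports only a single equality filter; the real code operates on JSON items and full filter expressions.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

using toy_item = std::map<std::string, std::string>;

// Hypothetical flow: returns the projected item if it passes the filter, else nullopt.
std::optional<toy_item> project_and_filter(const toy_item& stored,
                                           const std::set<std::string>& projected,
                                           const std::string& filter_attr,
                                           const std::string& filter_value) {
    // 1. Build the item from the projection plus the attribute the filter needs.
    toy_item item;
    for (const auto& [name, value] : stored) {
        if (projected.count(name) || name == filter_attr) {
            item.emplace(name, value);
        }
    }
    // 2. Run the filter on the widened item.
    auto it = item.find(filter_attr);
    if (it == item.end() || it->second != filter_value) {
        return std::nullopt;
    }
    // 3. Remove the filter-only attribute before returning.
    if (!projected.count(filter_attr)) {
        item.erase(it);
    }
    return item;
}
```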
Dejan Mircevski	fb6c011b52	everywhere: Insert space after `switch` Quoth @avikivity: "switch is not a function, and we celebrate that by putting a space after it like other control-flow keywords." https://github.com/scylladb/scylla/pull/7052#discussion_r471932710 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-08-18 14:31:04 +03:00
Nadav Har'El	bca88521ba	alternator: use api_error::validation() All the places in conditions.cc, expressions.cc and serialization.cc where we constructed an api_error, we always used the ValidationException type string, which the code repeated dozens of times. This patch converts all these places to use the factory function api_error::validation(). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-07-23 15:36:39 +03:00
Piotr Sarna	e59d41dad6	alternator: use plain function pointer instead of std::function Since all function handlers are plain functions without any state, there's no need for wrapping them with a 32-byte std::function when a plain function pointer would suffice. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <913c1de7d02c252b40dc0c545989ec83fe74e5a9.1592291413.git.sarna@scylladb.com>	2020-06-16 12:08:21 +03:00
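A stateless handler table of the kind e59d41dad6 describes might look like this: a static map from function name to a plain function pointer. The handler signature and the two handlers are invented for the example; keying the map by `std::string_view` is an additional assumption here and is only safe because every key is a string literal.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical handler signature; the real handlers take Alternator-specific types.
using function_handler = int (*)(const std::vector<int>& args);

int handle_size(const std::vector<int>& args) { return static_cast<int>(args.size()); }
int handle_first(const std::vector<int>& args) { return args.at(0); }

// Looks up the handler by name and calls it; unknown names are an error.
int call_function(std::string_view name, const std::vector<int>& args) {
    // Keys are string literals, so the string_view keys never dangle.
    static const std::unordered_map<std::string_view, function_handler> handlers = {
        {"size", handle_size},
        {"first", handle_first},
    };
    const auto it = handlers.find(name);
    if (it == handlers.end()) {
        throw std::invalid_argument("unknown function: " + std::string(name));
    }
    return it->second(args);
}
```

A plain function pointer is the size of one pointer and needs no type erasure, which is why it is sufficient when no handler captures state.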
Piotr Sarna	b1684cf2e1	alternator: move function handlers to a lookup map Instead of a long chain of `if` statements, handlers are now created in a static map. Fixes a TODO in the code. Tests: unit(dev) Message-Id: <0ea577a44dd56859da170fe82c16c8f810f9d695.1592232448.git.sarna@scylladb.com>	2020-06-15 23:44:45 +03:00
Nadav Har'El	493d7e6716	alternator: avoid unnecessary conversion to string In a couple of places, where we already have a std::string_view, there is no need to convert it to a std::string (which requires allocation). One cool observation (by Piotr Sarna) is that a map over std::string_view is fine when the strings in the map are always string constants. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	8c026b9f10	alternator: move some code out of executor.cc The source file alternator/executor.cc has grown too much, reaching almost 4,000 lines. In this patch I move about 400 lines out of executor.cc: 1. Some functions related to serialization of sets and lists were moved to serialization.cc, 2. Functions related to evaluating parsed expressions were moved to expressions.cc. The header file expressions_eval.hh was also removed - the calculate_value() functions now live in expressions.cc, so we can just define them in expressions.hh, no need for a separate header files. This patch just moves code around. It doesn't make any functional changes. Refs #5783. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	0b9f25ab50	alternator: implement FilterExpression This patch provides a complete implementation for the FilterExpression parameter - the newer syntax for filtering the results of the Query or Scan operations. The implementation is pretty straightforward - we already added earlier a result-filtering framework to Alternator, and used it for the older filtering syntax - QueryFilter and ScanFilter. All we had to do now was to run the FilterExpression (which has the same syntax as a ConditionExpression) on each individual item. The previous cleanup patches were important to reduce the friction of running these expressions on the items. After the previous patches fixing small esoteric bugs in a few expression functions, with this patch all the tests in test_filter_expression.py now pass, and so do the two FilterExpression tests in test_query.py and test_scan.py. As far as I know (and of course minus any bugs we'll discover later), this marks the FilterExpression feature complete. Fixes #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	13ef31f38b	alternator: refactor resolving of references in expressions In the DynamoDB API, expressions (e.g., ConditionExpression and many more) may contain references to column names ("#name") or to values (":val") given in a separate part of the request - ExpressionAttributeNames and ExpressionAttributeValues respectively. Before this patch, we resolved these references as part of the expression's evaluation. This approach had two downsides: 1. It often misdiagnosed (both false negatives and false positives) cases of unused names and values in expressions. We already had two xfailing tests with examples - which pass after this patch. This patch also adds two additional tests, which failed before this patch and pass with it. 2. In one of the following patches we will add support for FilterExpression, where the same expression is used repeatedly on many items. It is a waste (as well as makes the code uglier) to resolve the same references again and again each time the expression is evaluated. We should be able to do it just once. So this patch introduces an intermediate step between parsing and evaluating an expression - "resolving" the expression. The new resolve_() functions modify the already parsed expression, replacing references to attribute names and constant values by the actual names and values taken from the request. The resolve_() functions also keep track of which references were used, making it very easy to check (as DynamoDB does) if there are any unused names or values, before starting the evaluation. The interface of the evaluate() functions becomes much simpler - they no longer need to know the original request (which was previously needed for ExpressionAttributeNames/Values) or the table's schema (which was previously needed only for some error checking), or to keep track of which references were used. 
This simplification is helpful for using the expressions in contexts where these things (request and schema) are no longer conveniently available, namely in FilterExpression. A small side-benefit of this patch is that it moves a bit of code, which handled resolving of references in expressions, from executor.cc to expressions.cc. This is just the first step in a bigger effort to reduce the size of executor.cc by moving code to smaller source files. There is no attempt in this patch to move as much code as we can. We will move more code in a separate patch in this series. Fixes #6572. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 11:57:13 +03:00
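The separate "resolve" step introduced by 13ef31f38b can be illustrated on a toy token list: references are replaced in place once, the set of used references is recorded, and unused names are detected before any evaluation. The types, function names and error strings below are invented; the real resolve_() functions operate on the parsed expression objects.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy parsed expression: a flat list of tokens, some of which are "#name" references.
using toy_expression = std::vector<std::string>;
using toy_names = std::map<std::string, std::string>;

// Replaces each "#ref" token in place and records which references were used.
void resolve(toy_expression& expr, const toy_names& attribute_names,
             std::set<std::string>& used) {
    for (auto& token : expr) {
        if (!token.empty() && token[0] == '#') {
            const auto it = attribute_names.find(token);
            if (it == attribute_names.end()) {
                throw std::invalid_argument("ExpressionAttributeNames missing entry " + token);
            }
            used.insert(token);
            token = it->second;
        }
    }
}

// After resolving, any supplied name that was never used is an error.
void verify_all_used(const toy_names& attribute_names, const std::set<std::string>& used) {
    for (const auto& entry : attribute_names) {
        if (!used.count(entry.first)) {
            throw std::invalid_argument("unused ExpressionAttributeNames entry " + entry.first);
        }
    }
}
```

Once resolved, the expression can be evaluated against many items without consulting the request again, which is the property the commit message needs for FilterExpression.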
Rafael Ávila de Espíndola	555d8fe520	build: Be consistent about system versus regular headers We were not consistent about using '#include "foo.hh"' instead of '#include <foo.hh>' for scylla's own headers. This patch fixes that inconsistency and, to enforce it, changes the build to use -iquote instead of -I to find those headers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200608214208.110216-1-espindola@scylladb.com>	2020-06-10 15:49:51 +03:00
Piotr Sarna	f4e51a96ca	alternator: replace overloaded with overloaded_functor Turns out we already have a utility header for a visitor with overloaded lambdas. This patch purges the explicit reimplementation of the same trick and uses the existing class instead. Message-Id: <60c0b9a978f8208b188ef6ddc0564cb133bed707.1581496049.git.sarna@scylladb.com>	2020-02-12 14:21:42 +02:00
Nadav Har'El	b50274e8a7	alternator: add support for ConditionExpression This patch adds support for the ConditionExpression parameter of the item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem. We already supported conditional updates/put/delete using the "Expected" parameter. The ConditionExpression parameter implemented here provides a very similar feature, using a different - and also newer and more powerful - syntax. The implementation here reuses much of our existing expression-parsing infrastructure. Unsurprisingly, ConditionExpression's syntax has much in common with UpdateExpression (which we already support) and also with many of the comparison functions already implemented for "Expected". However, it's still quite a bit of new code, because of the many different comparisons, functions, and syntax variations we need to support. This patch also expands alternator-test/test_condition_expression.py with a few additional corner cases discovered during the development of this patch. Almost all of the tests for this feature (35 out of 39) now pass. Two tests still fail because we don't yet support nested attributes (this is a missing feature across Alternator), and two tests fail because of minor idiosyncrasies in DynamoDB's error path that we chose not to duplicate yet (but still remember the difference in the form of an xfailing test). Fixes #5035 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-01-23 13:57:33 +02:00
Nadav Har'El	c9eb9d9c76	alternator: update license blurbs Update all the license blurbs to the one we use in the open-source Scylla project, licensed under the AGPL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190825160321.10016-1-nyh@scylladb.com>	2019-09-11 18:01:05 +03:00
Nadav Har'El	2fb77ed9ad	alternator: use std::visit for reading std::variant The idiomatic way to use an std::variant depending on the type it holds is to use std::visit. This modern API makes it unnecessary to write many boiler-plate functions to test and cast the type of the variant, and makes it impossible to forget one of the options. So in this patch we throw out the old ways, and welcome the new. Thanks to Piotr Sarna for the idea. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190704205625.20300-1-nyh@scylladb.com>	2019-09-11 15:33:57 +03:00
Nadav Har'El	a8dd3044e2	alternator: support (most of) ProjectionExpression DynamoDB has two similar parameters - AttributesToGet and ProjectionExpression - which are supported by the GetItem, Scan and Query operations. Until now we supported only the older AttributesToGet, and this patch adds support to the newer ProjectionExpression. Besides having a different syntax, the main difference between AttributesToGet and ProjectionExpression is that the latter also allows fetching only a specific nested attribute, e.g., a.b[3].c. We do not support this feature yet, although it would not be hard to add it: With our current data representation, it means fetching the top-level attribute 'a', whose value is a JSON, and then post-filtering it to take out only the '.b[3].c'. We'll do that later. This patch also adds more test cases to test_projection_expression.py. All tests except three which check the nested attributes now pass, and those three xfail (they succeed on DynamoDB, and fail as expected on Alternator), reminding us what still needs to be done. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:15:01 +03:00
Nadav Har'El	829bafd181	alternator: add expression parsers The DynamoDB protocol is based on JSON, and most DynamoDB requests describe the operation and its parameters via JSON objects such as maps and lists. However, in some types of requests an "expression" is passed as a single string, and we need to parse this string. These cases include: 1. Attribute paths, such as "a[3].b.c", are used in projection expressions as well as inside other expressions described below. 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f", used in conditional updates, filters, and other places. 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d" This patch introduces the framework to parse these expressions, and an implementation of parsing update expressions. These update expressions will be used in the UpdateItem operation in the next patch. All these expression syntaxes are very simple: Most of them could be parsed as regular expressions, or at most a simple hand-written lexical analyzer and recursive-descent parser. Nevertheless, we decided to specify these parsers in the same ANTLR3 language already used in the Scylla project for parsing CQL, hopefully making these parsers easier to reason about, and easier to change if needed - and reducing the amount of boilerplate code. The parsing of update expressions is mostly complete except that in SET actions, only the "path = value" form is supported and not yet forms such as "path1 = path2" (which does read-before-write) or "path1 = path1 + value" or "path = function(...)". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-09-11 15:06:12 +03:00

38 Commits