This adds to the grammar the option to SELECT a specific key in a
collection column using subscript syntax.
For example:
SELECT map['key'] FROM table
SELECT map['key1']['key2'] FROM table
The key can also be parameterized in a prepared query. For this we need
to pass the query options to result_set_builder where we process the
selectors.
Fixes scylladb/scylladb#7751
This allows using a subscript on a set column, in addition to map/list,
which was already possible.
The behavior is compatible with Cassandra - a subscript with a specific value
returns the value if it's found in the set, and null otherwise.
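The lookup semantics described above (a map subscript returns the mapped value or null; a set subscript returns the element itself or null) can be sketched as follows. This is a minimal model, not Scylla's implementation: `std::optional` stands in for a nullable CQL value, and the `value` alias and the `subscript()` overloads are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in: std::optional models a CQL value that may be null.
using value = std::optional<std::string>;

// map['key']: the mapped value if the key is present, otherwise null.
value subscript(const std::map<std::string, std::string>& m, const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) {
        return std::nullopt;
    }
    return it->second;
}

// set['elem']: the element itself if it is in the set, otherwise null
// (the Cassandra-compatible behavior).
value subscript(const std::set<std::string>& s, const std::string& elem) {
    if (s.count(elem) == 0) {
        return std::nullopt;
    }
    return elem;
}
```

Nested subscripts such as `map['key1']['key2']` would apply the same lookup again to the value returned by the first one.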
Where the grammar supports IN, we add NOT IN. This includes the WHERE
clause and LWT IF clause.
Evaluation of NOT IN follows from IN.
In statement_restrictions analysis, they are different, as NOT IN
doesn't enable any clever query plan and must filter.
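A minimal sketch of why NOT IN "follows from IN" during evaluation while still requiring a filtering pass. The functions `evaluate_in()`, `evaluate_not_in()` and `filter_not_in()` are hypothetical, plain `int`s stand in for CQL values, and NULL handling is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// IN: true when the value equals any element of the list.
bool evaluate_in(int value, const std::vector<int>& list) {
    return std::find(list.begin(), list.end(), value) != list.end();
}

// NOT IN: evaluated as the negation of IN.
bool evaluate_not_in(int value, const std::vector<int>& list) {
    return !evaluate_in(value, list);
}

// NOT IN gives no key range to narrow the read, so every candidate row is
// read and filtered, which is what this loop models.
std::vector<int> filter_not_in(const std::vector<int>& rows, const std::vector<int>& list) {
    std::vector<int> kept;
    std::copy_if(rows.begin(), rows.end(), std::back_inserter(kept),
                 [&](int v) { return evaluate_not_in(v, list); });
    return kept;
}
```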
Some tests are added. An error message was changed ('in' changed to 'IN'),
so some tests are adjusted.
Closes scylladb/scylladb#21992
Our "sstring_view" is a historic alias for the standard std::string_view.
The cql3/ directory used this old alias in a few random places; let's
change them to use the standard type name.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Using the standard library is preferred over boost.
In cql3/expr/expression.cc, to_sorted_vector got more of a
facelift: it was modernized to also use std::unique
and, while at it, to move its input range into the uniquely sorted
result vector.
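The sort-unique-move pattern described above might look like this. It is a sketch under stated assumptions, not the code in cql3/expr/expression.cc: the input is taken by value, so callers that pass an rvalue avoid a copy, and its elements are then moved into the result vector.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical to_sorted_vector-style helper.
template <typename Range>
std::vector<typename Range::value_type> to_sorted_vector(Range input) {
    // Move (rather than copy) the elements out of the input range.
    std::vector<typename Range::value_type> v(
        std::make_move_iterator(std::begin(input)),
        std::make_move_iterator(std::end(input)));
    std::sort(v.begin(), v.end());
    // std::unique shifts the distinct elements to the front; erase the rest.
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```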
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move all of the blatantly restriction-related expression utilities
to statement_restrictions.cc.
Some are so blatant as to include the word "restriction" in their name.
Others are just so specialized that they cannot be used for anything else.
The motivation is that further refactoring will be simplified if it can
happen within the same module, as there will not be a need to prove
it has no effect elsewhere.
Most of the declarations are made non-public (in .cc file) to limit
proliferation. A few are needed for tests or in select_statement.cc
and so are kept public.
Other than that, the only changes are namespace qualifications and
removal of a now-duplicate definition ("inclusive").
Closes scylladb/scylladb#20732
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this worked fine until we changed the parameter type
of the format string of `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as their `fmt`
parameter, so `seastar::format()` is no longer the best candidate.
argument-dependent lookup (ADL for short) adds `std::format()` to the
candidate set whenever an argument is a type from namespace `std`,
while `using namespace seastar` makes `seastar::format()` visible to
ordinary unqualified lookup. so both `std::format()` and
`seastar::format()` are considered as candidates, and neither is
a better match than the other.
that is what is happening in scylladb at quite a few call sites of
`format()`, hence the compiler is not able to tell which function
is the winner:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`,
because `sstring::operator std::basic_string` would incur a deep
copy.
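The ambiguity can be reproduced without {fmt} or Seastar. In this hypothetical model, `lib_std` plays the role of namespace `std` and `lib_seastar` the role of namespace `seastar`; the two `format()` templates have the same parameter lists, so overload resolution has no way to prefer one over the other.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for namespace std: an argument type plus a format().
namespace lib_std {
struct view {
    const char* data;
};

template <typename... Args>
std::string format(const char* fmt, Args&&...) {
    return std::string("std:") + fmt;
}
}  // namespace lib_std

// Hypothetical stand-in for namespace seastar, with the same parameter list.
namespace lib_seastar {
template <typename... Args>
std::string format(const char* fmt, Args&&...) {
    return std::string("seastar:") + fmt;
}
}  // namespace lib_seastar

// Mirrors `using namespace seastar;` in the code base.
using namespace lib_seastar;

std::string describe(lib_std::view v) {
    // An unqualified `format("{}", v)` would not compile here: ordinary lookup
    // finds lib_seastar::format via the using-directive, ADL finds
    // lib_std::format because `v` is a lib_std type, and neither candidate is
    // a better match. A qualified call disables ADL and picks one explicitly.
    return lib_seastar::format("{}", v);
}
```

Qualifying the call, as this change does with `seastar::format()` and `fmt::format()`, means only the named function is considered.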
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule, as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is done to ease code reuse in the following commit.
It would also help should we ever want to properly mount the functions
class to the schema object instead of static storage.
{fmt} before v10 provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view`, which is an implementation built into {fmt} for
compatibility with pre-C++17. and this type is used even if the code is
compiled with a C++ standard greater than or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using the `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`; it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.
this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would get the following compilation failure:
```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
254 | return formatter<std::string_view>::format(it->second, ctx);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
2759 | FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
| ^ ~~~~~~~~~~~~
```
because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood (we do not
use the specialization created by `FMT_FORMAT_AS`, which inherits
from `formatter<fmt::string_view>`), hopefully this can improve
the compilation speed a little bit. this change also addresses
the build failure with {fmt} v10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18299
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt, which is std::optional<managed_bytes>.
Finally, there's the std::optional<T>::optional(std::optional<U>&&)
converting constructor (and its `const std::optional<U>&` peer).
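A small illustration of that converting constructor, using stand-ins: `std::string_view` plays the role of the view type returned by `read_field`, and `std::string` the role of `managed_bytes`; `make_value()` and `from_field()` are hypothetical. Because `std::string_view` converts to `std::string` only explicitly, this model needs direct-initialization; with an implicitly convertible pair of types, passing the optional directly would work.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Stand-ins (assumptions): std::string_view ~ the View read_field yields,
// std::string ~ managed_bytes, std::optional<std::string> ~ managed_bytes_opt.
std::optional<std::string> make_value(std::optional<std::string> v) {
    return v;
}

std::optional<std::string> from_field(std::optional<std::string_view> field) {
    // No manual "if (field) wrap *field, else nullopt" is needed: the
    // optional<T>(optional<U>&&) converting constructor does exactly that.
    // It is explicit here only because string_view -> string is explicit.
    return make_value(std::optional<std::string>(std::move(field)));
}
```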
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18128
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by
1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context
this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding a large chunk of code to the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:
1. to format to `fmt::context`
2. to format using `fmt::to_string()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view`'s operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value`'s operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.
We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
`nonwrapping_` designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
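To make the wrapping/non-wrapping distinction concrete, here is an illustrative sketch (not Scylla's interval template) over a signed token ring with inclusive bounds. A wrapping interval whose start is greater than its end wraps past the maximum token, following the Cassandra convention mentioned above; the struct names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Non-wrapping: requires start <= end and covers [start, end].
struct nonwrapping_interval {
    std::int64_t start;
    std::int64_t end;
    bool contains(std::int64_t t) const { return start <= t && t <= end; }
};

// Wrapping: if start > end, the interval runs from start up to the maximum
// token and continues from the minimum token up to end.
struct wrapping_interval {
    std::int64_t start;
    std::int64_t end;
    bool contains(std::int64_t t) const {
        if (start <= end) {
            return start <= t && t <= end;
        }
        return t >= start || t <= end;
    }
};
```

The extra branch in `wrapping_interval::contains()` is the kind of special case that every user of a wrapping type has to keep in mind, which is why making it explicit in the type name is useful.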
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closesscylladb/scylladb#17428
This change introduces a specialization of fmt::formatter
for cql3::expr::oper_t. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of cql3::expr::oper_t without the defined formatter
resulted in a compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16719
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
this change is a cleanup.
a `const` qualifier on a non-class type returned by value has no
effect, so these `const` specifiers are useless. let's drop them.
and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`expression` is a std::variant with 16 different variants
that represent different types of AST nodes.
Let's add documentation that explains what each of these
16 types represents. For people who are not familiar with expression
code it might not be clear what each of them does, so let's add
clear descriptions for all of them.
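As an illustration of the kind of per-alternative documentation described above, here is a hypothetical three-alternative miniature of such a variant. The names `constant`, `column_value` and `function_call` echo common expression kinds, but these are simplified stand-ins, not the actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

struct expression;

/// A literal value, e.g. `42`.
struct constant {
    int value;
};

/// A reference to a column of the table, e.g. `pk`.
struct column_value {
    std::string name;
};

/// A call to a scalar function, e.g. `f(pk, 42)`.
struct function_call {
    std::string name;
    std::vector<expression> args;
};

/// The AST node: exactly one of the alternatives above.
struct expression {
    std::variant<constant, column_value, function_call> v;
};

// Helper for visiting a variant with a set of lambdas (C++17 idiom).
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

std::string render(const expression& e) {
    return std::visit(overloaded{
        [](const constant& c) { return std::to_string(c.value); },
        [](const column_value& c) { return c.name; },
        [](const function_call& f) {
            std::string out = f.name + "(";
            for (std::size_t i = 0; i < f.args.size(); ++i) {
                if (i > 0) {
                    out += ", ";
                }
                out += render(f.args[i]);
            }
            return out + ")";
        },
    }, e.v);
}
```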
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes scylladb/scylladb#15767
When doing a SELECT CAST(b AS int), Cassandra returns a column named
cast(b as int). Currently, Scylla uses a different name -
system.castasint(b). For Cassandra compatibility, we should switch to
the same name.
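A trivial sketch of the Cassandra-style name; `cast_column_name()` is hypothetical and assumes the type name is already in its canonical lowercase spelling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the column name for `CAST(<column> AS <type>)`,
// e.g. "cast(b as int)" instead of "system.castasint(b)".
std::string cast_column_name(const std::string& column, const std::string& type_name) {
    return "cast(" + column + " as " + type_name + ")";
}
```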
Fixes #14508
Closes scylladb/scylladb#14800
When preparing a `field_selection`, we need to prepare the UDT value,
and then verify that it has this field.
`field_selection_test_assignment` prepares the UDT value using the same
receiver as the whole `field_selection`. This is wrong, this receiver
has the type of the field, and not the UDT.
It's impossible to create a receiver for the UDT. Many different UDTs
can produce an `int` value when the field `a` is selected.
Therefore the receiver should be `nullptr`.
No unit test is added, as this bug doesn't currently cause any issues.
Preparing a column value doesn't do any type checks, so nothing fails.
Still it's good to fix it, just to be correct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes scylladb/scylladb#14788
do not use a multi-line comment (a `//` comment ending with a backslash). this silences the warning from GCC:
```
In file included from ./cql3/prepare_context.hh:19,
from ./cql3/statements/raw/parsed_statement.hh:14,
from build/debug/gen/cql3/CqlParser.hpp:62,
from build/debug/gen/cql3/CqlParser.cpp:44:
./cql3/expr/expression.hh:490:1: error: multi-line comment [-Werror=comment]
490 | /// Custom formatter for an expression. Supports multiple modes:\
| ^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15471
Before choosing a function, we prepare the arguments that can be
prepared without a receiver. Preparing an argument makes
its type known, which allows choosing the best overload
among many possible functions.
The function that prepared the argument passes the unprepared
argument by mistake. Let's fix it so that it actually uses
the prepared argument.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes #14786
`expression`'s default constructor is dangerous, as it can leak
into computations and generate surprising results. Fix that by
removing the default constructor.
This is made somewhat difficult by the parser generator's reliance
on default construction, and we need to expand our workaround
(`uninitialized<>`) capabilities to do so.
We also remove some incidental uses of default-constructed expressions.
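A minimal sketch of the idea: the type itself has no default constructor, and a wrapper gives parser-generated code something default-constructible to declare up front. This `uninitialized<>` is a hypothetical reimplementation for illustration only; the grammar's real wrapper may behave differently (for example, it may not throw).

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for `expression`: it can only be created from an explicit value.
struct expression {
    explicit expression(std::string t) : text(std::move(t)) {}
    std::string text;
};
static_assert(!std::is_default_constructible<expression>::value,
              "no accidental default-constructed expressions");

// Hypothetical uninitialized<T>: default-constructible so generated parser
// code can declare it before assigning, but reading it first is an error.
template <typename T>
class uninitialized {
    std::optional<T> _val;
public:
    uninitialized() = default;
    uninitialized& operator=(T v) {
        _val = std::move(v);
        return *this;
    }
    T& get() {
        if (!_val) {
            throw std::logic_error("read of an uninitialized value");
        }
        return *_val;
    }
};
```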
Closes #14706
* github.com:scylladb/scylladb:
cql3: expr: make expression non-default-constructible
cql3: grammar: don't default-construct expressions
cql3: grammar: improve uninitialized<> flexibility
cql3: grammar: adjust uninitialized<> wrapper
test: expr_test: don't invoke expression's default constructor
cql3: statement_restrictions: explicitly initialize expressions in index match code
cql3: statement_restrictions: explicitly intitialize some expression fields
cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
cql3: expr: prepare_expression: avoid default-constructed expression
cql3: broadcast_tables: prepare new_value without relying on expression default constructor
Since ec77172b4b ("Merge 'cql3: convert
the SELECT clause evaluation phase to expressions' from Avi Kivity"),
we rewrite non-aggregating selectors to include an aggregation, in order
to have the rest of the code either deal with no aggregation, or
all selectors aggregating, with nothing in between. This is done
by wrapping column selectors with "first" function calls: col ->
first(col).
This broke non-aggregating selectors that included the ttl() or
writetime() pseudo functions. This is because we rewrote them as
writetime(first(col)), and writetime() isn't a function that operates
on any values; it operates on mutations and so must have access to
a column, not an expression.
Fix by detecting this scenario and rewriting the expression as
first(writetime(col)).
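The rewrite can be sketched on a toy tree of unary calls. This is a model under stated assumptions, not the actual rewrite code: nodes are identified by name strings, and `column()`, `call()`, `wrap_in_first()` and `render()` are hypothetical helpers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy AST: a leaf is a column; an inner node is a unary function call.
struct node {
    std::string name;             // function name, or column name for leaves
    std::shared_ptr<node> arg;    // null for a column
};
using ptr = std::shared_ptr<node>;

ptr column(std::string c) { return std::make_shared<node>(node{std::move(c), nullptr}); }
ptr call(std::string f, ptr a) { return std::make_shared<node>(node{std::move(f), std::move(a)}); }

// Wrap a non-aggregating selector in first(). writetime()/ttl() need the
// column itself, so they are wrapped as a whole instead of wrapping their
// argument: writetime(col) -> first(writetime(col)).
ptr wrap_in_first(const ptr& e) {
    if (!e->arg) {
        return call("first", e);                  // col -> first(col)
    }
    if (e->name == "writetime" || e->name == "ttl") {
        return call("first", e);                  // keep writetime(col) intact
    }
    return call(e->name, wrap_in_first(e->arg));  // f(x) -> f(<x rewritten>)
}

std::string render(const ptr& e) {
    return e->arg ? e->name + "(" + render(e->arg) + ")" : e->name;
}
```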
Unit and integration tests are added.
Fixes #14715.
Closes #14716
prepare_expression() already validates the types and computes
the index of the field; no need to redo that work when
evaluating the expression.
The tests are adjusted to also prepare the expression.
Closes #14562
There is no obvious default expression, so better not to allow
default construction of expressions to prevent unintended values
from leaking in. Resolves a FIXME.
We're about to remove expression's default constructor, so adjust
the usertype_constructor code that checks whether a field has an
initializer or whether we must supply a NULL to not rely on it.
fmtlib uses `{}` as the placeholder for the formatted argument, not
`{}}`.
so let's correct it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14586
field_selection::type refers to the type of the selection operation,
not the type of the structure being selected. This is what
prepare_expression() generates and how all other expression elements
work, but evaluate() for field_selection thinks it's the type
of the structure, and so fails when it gets an expression
from prepare_expression().
Fix that, and adjust the tests.
Aggregate functions cannot be evaluated directly, since they implicitly
refer to state (the accumulator). To allow for evaluation, we
split the expression into two: an inner expression that is evaluated
over the input vector (once per element). The inner expression calls
the aggregation function, with an extra input parameter (the accumulator).
The outer expression is evaluated once per input vector; it calls
the final function, and its input is just the accumulator. The outer
expression also contains any expressions that operate on the result
of the aggregate function.
The accumulator is stored in a temporary.
Simple example:
sum(x)
is transformed into an inner expression:
t1 = (t1 + x) // really sum.aggregation_function
and an outer expression:
result = t1 // really sum.state_to_result_function
Complicated example:
scalar_func(agg1(x, f1(y)), agg2(x, f2(y)))
is transformed into two inner expressions:
t1 = agg1.aggregation_function(t1, x, f1(y))
t2 = agg2.aggregation_function(t2, x, f2(y))
and an outer expression
output = scalar_func(agg1.state_to_result_function(t1),
agg2.state_to_result_function(t2))
There's a small wart: automatically parallelized queries can generate
"reducible" aggregates that have no state_to_result function, since we
want to pass the state back to the coordinator. Detect that and short
circuit evaluation to pass the accumulator directly.
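The inner/outer split and the short circuit for reducible aggregates can be modeled in a few lines. `aggregate` and `evaluate_aggregate()` are hypothetical; the accumulator is a plain `long` local that plays the role of the temporary `t1`, and the two `std::function`s stand in for `aggregation_function` and `state_to_result_function`.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical model of a split aggregate.
struct aggregate {
    long initial_state;
    std::function<long(long acc, long x)> aggregation_function;
    // Absent for "reducible" aggregates, whose raw state is sent back to the
    // coordinator; evaluation then short-circuits to the accumulator.
    std::optional<std::function<long(long acc)>> state_to_result_function;
};

long evaluate_aggregate(const aggregate& agg, const std::vector<long>& inputs) {
    long t1 = agg.initial_state;                  // the temporary
    for (long x : inputs) {
        t1 = agg.aggregation_function(t1, x);     // inner expression, once per element
    }
    if (!agg.state_to_result_function) {
        return t1;                                // pass the accumulator directly
    }
    return (*agg.state_to_result_function)(t1);  // outer expression, once per vector
}
```

For `sum(x)` both steps are trivial, which matches the simple example above; an aggregate like `avg` would need a richer state than a single `long`.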
We plan to rewrite aggregation queries that have a non-aggregating
selector using the first function, so that all selectors are
aggregates (or none are). Prevent the first function from affecting
metadata (the auto-generated column names), by skipping over the
first function if detected. The input and output types are unchanged,
so this only affects the name.
Temporaries are similar to bind variables - they are values provided from
outside the expression. While bind variables are provided by the user, temporaries
are generated internally.
The intended use is for aggregate accumulator storage. Currently aggregates
store the accumulator in aggregate_function_selector::_accumulator, which
means the entire selector hierarchy must be cloned for every query. With
expressions, we can have a single expression object reused for many computations,
but we need a way to inject the accumulator into an aggregation, which this
new expression element provides.
When returning a result set (and when preparing a statement), we
return metadata about the result set columns. Part of that is the
column names, which are derived from the expressions used as selectors.
Currently, they are computed via selector::column_name(), but as
we're dismantling that hierarchy we need a different way to obtain
those names.
It turns out that the expression formatter is close enough to what
we need. To avoid disturbing the current :user mode, add a new
:metadata mode and apply the adjustments needed to bring it in line
with what column metadata looks like today.
Note that column metadata is visible to applications and they can
depend on it; e.g. the Python driver allows choosing columns based on
their names rather than ordinal position.
Most clauses in a CQL statement don't tolerate aggregate functions,
and so they call verify_no_aggregate_functions(). It can now be
reimplemented in terms of aggregation_depth(), removing some code.
We define the "aggregation depth" of an expression by how many
nested aggregation functions are applied. In CQL/SQL, legal
values are 0 and 1, but for generality we deal with any aggregation depth.
The first helper measures the maximum aggregation depth along any path
in the expression graph. If it's 2 or greater, we have something like
max(max(x)) and we should reject it (though these helpers don't). If
we get 1 it's a simple aggregation. If it's zero then we're not aggregating
(though CQL may decide to aggregate anyway if GROUP BY is used).
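A sketch of the first helper on a toy tree (hypothetical `expr` type and `col()`/`call()` helpers; the real code walks the expression variant): the depth of a node is the deepest depth among its arguments, plus one if the node itself is an aggregate call.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST: a column leaf, or a function call that may be an aggregate.
struct expr {
    std::string name;
    bool is_aggregate = false;
    std::vector<std::shared_ptr<expr>> args;  // empty for a column
};
using expr_ptr = std::shared_ptr<expr>;

expr_ptr col(std::string name) {
    return std::make_shared<expr>(expr{std::move(name), false, {}});
}

expr_ptr call(std::string name, bool is_aggregate, std::vector<expr_ptr> args) {
    return std::make_shared<expr>(expr{std::move(name), is_aggregate, std::move(args)});
}

// Maximum number of nested aggregate calls along any path from the root.
// 0: no aggregation; 1: plain aggregation; >= 2: e.g. max(max(x)).
int aggregation_depth(const expr& e) {
    int deepest_arg = 0;
    for (const auto& a : e.args) {
        deepest_arg = std::max(deepest_arg, aggregation_depth(*a));
    }
    return deepest_arg + (e.is_aggregate ? 1 : 0);
}
```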
The second helper edits an expression to make sure the aggregation depth
along any path that reaches a column is the same. Logically,
`SELECT x, max(y)` does not make sense, as one is a vector of values
and the other is a scalar. CQL resolves the problem by defining x as
"the first value seen". We apply this resolution by converting the
query to `SELECT first(x), max(y)` (where `first()` is an internal
aggregate function), so both selectors refer to scalars that consume
vectors.
When a scalar is consumed by an aggregate function (for example,
`SELECT max(x), min(17)`), we don't have to bother, since a scalar
is implicitly promoted to a vector by evaluating it for every row. There
is some ambiguity if the scalar is a non-pure function (e.g.
`SELECT max(x), min(random())`), but it's not worth pursuing.
A small unit test is added.
Currently, a prepared function_call expression is printed as an
"anonymous function", but it's not really anonymous - the name is
available. Print it out.
This helps in a unit test later on (and is worthwhile by itself).
Adding a function declaration to expression.hh causes many
recompilations. Reduce that by:
- moving some restrictions-related definitions to
the existing expr/restrictions.hh
- moving evaluation related names to a new header
expr/evaluate.hh
- moving utilities to a new header
expr/expr-utilities.hh
expression.hh contains only expression definitions and the most
basic and common helpers, like printing.
Make evaluate()'s body more regular, then exploit it by
replacing the long list of branches with a lambda template.
Closes #14306
* github.com:scylladb/scylladb:
cql3: expr: simplify evaluate()
cql3: expr: standardize evaluate() branches to call do_evaluate()
cql3: expr: rename evaluate(ExpressionElement) to do_evaluate()
Spans are slightly cleaner, slightly faster (as they avoid an indirection),
and allow for replacing some of the arguments with small_vector:s.
Closes #14313
Now that all branches in the visitor are uniform and consist
of a single call to do_evaluate() overloads, we can simplify
by calling a lambda template that does just that.
evaluate(expression) calls the various evaluate(ExpressionElement)
overloads to perform its work. However, if we add an ExpressionElement
and forget to implement its evaluate() overload, we'll end up
with infinite recursion (the new element implicitly converts back to an
expression and calls evaluate(expression) again). It will be caught
immediately, but it's better to avoid it.
Also sprinkle static:s on do_evaluate() where missing.
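A minimal model of the resulting shape, with a hypothetical two-alternative variant. Because the overloads are named `do_evaluate()` and there is no `do_evaluate(expression)`, a missing overload is a compile error instead of a silent conversion back to `expression` followed by a recursive call.

```cpp
#include <cassert>
#include <variant>

// Hypothetical expression elements.
struct constant {
    int value;
};
struct negation {
    int operand;
};
using expression = std::variant<constant, negation>;

// One do_evaluate() overload per element; static gives internal linkage.
static int do_evaluate(const constant& c) { return c.value; }
static int do_evaluate(const negation& n) { return -n.operand; }

// The whole dispatcher is a single generic lambda.
int evaluate(const expression& e) {
    return std::visit([](const auto& elem) { return do_evaluate(elem); }, e);
}
```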
Enhance evaluation_inputs with timestamps and ttls, and use
them to evaluate writetime/ttl.
The data structure is compatible with the current way of doing
things (see result_set_builder::_timestamps, result_set_builder::_ttls).
We use std::span<> instead of std::vector<> as it is more general
and a tiny bit faster.
The algorithm is taken from writetime_or_ttl_selector::add_input().