Getting token() function first tries to find a schema for underlying
table and continues with nullptr if there's no one. Later, when creating
token_fct, the schema is passed as is and referenced. If it's null crash
happens.
It used to throw before 5983e9e7b2 (cql3: test_assignment: pass optional
schema everywhere) on missing schema, but this commit changed the way
schema is looked up, so nullptr is now possible.
fixes: #18637
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18639
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Fix fromJson(null) to return null, not a error as it did before this patch.
We use "null" as the default value when unwrapping optionals
to avoid bad optional access errors.
Fixes: scylladb#7912
Signed-off-by: Michael Huang <michaelhly@gmail.com>
Closesscylladb/scylladb#15481
Code in functions.cc creates the different TYPEasblob() and blobasTYPE()
functions for all type names TYPE. The functions for the "counter" type
were skipped, supposedly because "counters are not supported yet". But
counters are supported, so let's add the missing functions.
The code fix is trivial, the tests that verify that the result behaves
like Cassandra took more work.
After this patch, unimplemented::cause::COUNTERS is no longer used
anywhere in the code. I wanted to remove it, but noticed that
unimplemented::cause is a graveyard of unused causes, so decided not
to remove this one either. We should clean it up in a separate patch.
Fixes#14742
Also includes tests for tangently-related issues:
Refs #12607
Refs #14319
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Type inference for function calls is a bit complicated:
- a function argument can be inferred from the signature: a call to
my_func(:arg) will infer :arg's type from the function signature
- a function signature can be inferred from its argument types:
a call to max(my_column) will select the correct max() signature
(as max is generic) from my_column's type
Currently, functions::get() implements this by invoking
dynamic_cast<selector*> on the argument. If the caller of
functions::get() is the SELECT clause preparation, then the
cast will succeed and we'll be able to find the type. If not,
we fail (and fall back to inferring the argument types from a
non-generic function signature).
Since we're about to move selectors to expressions, the dynamic_cast
will fail, so we must replace it with a less fragile approach.
The fix is to augment assignment_testable (the interface representing
a function argument) with an intentionally-awkwardly-named
assignment_testable_type_opt(), that sees whether we happen to know
the type for the argument in order to implement signature-from-argument
inference.
A note about assignment_testable: this is a bridge interface
that is the least common denominator of anything that calls functions.
Since we're moving towards expressions, there are fewer implementations of
the interface as the code evolves.
test_assignment() and related functions check for type compatibility between
a right-hand-side and a left-hand-side.
It started its life with a limited functionality for INSERT and UPDATE,
but now it's about to be used for cast expression in selectors, which
can cast a column_value. A column_value is still an unresolved_identifier
during the prepare phase, and cannot be resolved without a schema.
To prepare for this, pass an optional schema everywhere.
Ultimately, test_assignment likely needs to be folded into prepare_expr(),
but before that prepare_expr() has to be used everywhere.
count(col), unlike count(*), does not count rows for which col is NULL.
However, if col's data type is not a scalar (e.g. a collection, tuple,
or user-defined type) it behaves like count(*), counting NULLs too.
The cause is that get_dynamic_aggregate() converts count() to
the count(*) version. It works for scalars because get_dynamic_aggregate()
intentionally fails to match scalar arguments, and functions::get() then
matches the arguments against the pre-declared count functions.
As we can only pre-declare count(scalar) (there's an infinite number
of non-scalar types), we change the approach to be the same as min/max:
we make count() a generic function. In fact count(col) is much better
as a generic function, as it only examines its input to see if it is
NULL.
A unit test is added. It passes with Cassandra as well.
Fixes#14198.
Closes#14199
The method `functions::get` is used to get the `functions::function` object
of the CQL function called using `expr::function_call`.
Until now `functions::get` required the caller to pass both the keyspace
and the column family.
The keyspace argument is always needed, as every CQL function belongs
to some keyspace, but the column family isn't used in most cases.
The only case where having the column family is really required
is the `token()` function. Each variant of the `token()` function
belongs to some table, as the arguments to the function are the
consecutive partition key columns.
Let's make the column family argument optional. In most cases
the function will work without information about column family.
In case of the `token()` function there's gonna be a check
and it will throw an exception if the argument is nullopt.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `function_name` without the help of `operator<<`.
the corresponding `operator<<()` are dropped dropped in this change,
as all its callers are now using fmtlib for formatting now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13608
Spans are more flexible and can be constructed from any contiguous
container (such as small_vector), or a subrange of such a container.
This can save allocations, so change the signature to accept a span.
Spans cannot be constructed from std::initializer_list, so one such
call site is changed to use construct a span directly from the single
argument.
Currently, aggregate functions are implemented in a statefull manner.
The accumulator is stored internally in an aggregate_function::aggregate,
requiring each query to instantiate new instances (see
aggregate_function_selector's constructor, and note how it's called
from selector::new_instance()).
This makes aggregates hard to use in expressions, since expressions
are stateless (with state only provided to evaluate()). To facilitate
migration towards stateless expressions, we define a
stateless_aggregate_function (modeled after user-defined aggregates,
which are already stateless). This new struct defines the aggregate
in terms of three scalar functions: one to aggregate a new input into
an accumulator (provided in the first parameter), one to finalize an
accumulator into a result, and one to reduce two accumulators for
parallelized aggregation.
All existing native aggregate functions are converted to the new model, and
the old interface is removed. This series does not yet convert selectors to
expressions, but it does remove one of the obstacles.
Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache, memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where it CL>1 will cause non-coordinator code to run multiple times).
Closes#13105
* github.com:scylladb/scylladb:
cql3/selection, forward_service: use use stateless_aggregate_function directly
db: functions: fold stateless_aggregate_function_adapter into aggregate_function
cql3: functions: simplify accumulator_for template
cql3: functions: base user-defined aggregates on stateless aggregates
cql3: functions: drop native_aggregate_function
cql3: functions: reimplement count(column) statelessly
cql3: functions: reimplement avg() statelessly
cql3: functions: reimplement sum() statelessly
cql3: functions: change wide accumulator type to varint
cql3: functions: unreverse types for min/max
cql3: functions: rename make_{min,max}_dynamic_function
cql3: functions: reimplement min/max statelessly
cql3: functions: reimplement count(*) statelessly
cql3: functions: simplify creating native functions even more
cql3: functions: add helpers for automating marshalling for scalar functions
types: fix big_decimal constructor from literal 0
cql3: functions: add helper class for internal scalar functions
db: functions: add stateless aggregate functions
db, cql3: move scalar_function from cql3/functions to db/functions
min() and max() had two implementations: one static (for each type in
a select list) and one dynamic (for compound types). Since the
dynamic implementation is sufficient, we only reimplement that. This
means we don't use the automarshalling helpers, since we don't do any
arithemetic on values apart from comparison, which is conveniently
provided by abstract_type.
now that fmtlib provides fmt::join(). see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view
there is not need to revent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().
as fmt::join() returns an join_view(), this could improve the
performance under certain circumstances where the fully materialized
string is not needed.
please note, the goal of this change is to use fmt::join(), and this
change does not intend to improve the performance of existing
implementation based on "operator<<" unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.
some noteworthy things related to this change:
* unlike the existing `join()`, `fmt::join()` returns a view. so we
have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
for instance, a `const std::string*`. but operator<<() always print
a typed pointer. so if we want to format a typed pointer, we either
need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
`operator<<(std::ostream& os, const column_definition* cd)`, so we
have to use a wrapper class of `maybe_column_definition` for printing
a pointer to `column_definition`. since the overload is only used
by the two overloads of
`statement_restrictions::add_single_column_parition_key_restriction()`,
the operator<< for `const column_definition*` is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since "token()" computes the token for a given partition key,
if we pass the key of the wrong type, it should reject.
in this change,
* we validate the keys before returning the "token()" function.
* drop the "xfail" decorator from two of the tests. they pass
now after this fix.
* change the tests which previously passed the wrong number of
arguments containing null to "token()" and expect it to return
null, so they verify that "token()" should reject these
arguments with the expected error message.
Fixes#10448
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12991
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Currently, if a UDA uses a UDF that's being replaced,
the UDA will still keep using the old UDF until the
node is restarted.
This patch fixes this behavior by checking all UDAs
when replacing a UDF and updating them if necessary.
Fixes#12709
When deciding whether two functions have the same
signature, we have to check if they have the same name
and parameter types. Additionally, if they're represented
by pointers, we need to check if any of them is a nullptr.
This logic is used multiple times, so it's extracted to
a separate function.
To use this function, the `used_by_user_aggregate` method
takes now a function instead of name and types list - we
can do it because we always use it with an existing user
function (that we're trying to drop).
The method will also be useful when we'll be not dropping,
but replacing a user function.
We will want to reuse the functions that we get from an aggregate
without making a deep copy, and it's only possible if we get
pointers from the aggregate instead of actual values.
When dropping a UDF we're checking if it's not begin used in any UDAs
and fail otherwise. However, we're only checking its state function
and final function, and it may also be used as its reduce function.
This patch adds the missing checks and a test for them.
Currently, when dropping a function, we only check if there exist
an aggregate that uses a function with the same name as its state
function or final function. This may cause the drop to fail even
when it's just another UDF with the same name that's used in the
aggregate, even when the actual dropped function is not used there.
This patch fixes this by checking whether not only the name of the
UDA's sfunc and finalfunc, but also their argument types.
Currently, nothing prevents us from dropping a user type
used in a user function, even though doing so may make us
unable to use the function correctly.
This patch prevents this behavior by checking all function
argument and return types when executing a drop type statement
and preventing it from completing if the type is referenced
by any of them.
Closes#12680
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
`mock_get` was created only for forward_service use, thus it only checks for
aggregate functions if no declared function was found.
The reason for this function is, there is no serialization of `cql3::selection::selection`,
so functions lying underneath these selections has to be refound.
Most of this code is copied from `functions::get()`, however `functions::get()` is not used because it requires to
mock or serialize expressions and `functions::find()` is not enough,
because it does not search for dynamic aggregate functions
Makes final function and initial condition to be optional while
creating UDA. No final function means UDA returns final state
and defeult initial condition is `null`.
Fixes: #10324
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether used
defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
prepare_term now takes an expression and returns a prepared expression.
It should be renamed to prepare_expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
prepare_term is now the only function that uses terms.
Change it so that it returns expression instead of term
and remove all occurences of expr::to_expression(prepare_term(...))
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Adds a new function - expr::fill_prepare_context.
This function has the same functionality as term::fill_prepare_context, which will be removed soon.
fill_prepare_context used to take its argument with a const qualifier, but it turns out that the argume>
It sets the cache ids of function calls corresponding to partition key restrictions.
New function doesn't have const to make this clear and avoid surprises.
Added expr::visit that takes an argument without const qualifier.
There were some problems with cache_ids in function_call.
prepare_context used to collect ::shared_ptr<functions::function_call>
of some function call, and then this allowed it to clear
cache ids of all involved functions on demand.
To replicate this prepare_context now collects
shared pointers to expr::function_call cache ids.
It currently collects both, but functions::function_call will be removed soon.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
function_call can be evaluated now.
The code matches the one from functions::function_call::bind.
I needed to add cache id to function_call in order for it ot work properly.
See the blurb in struct function_call for more information.
New code corresponds to bind() in cql3/functions/functions.cc.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a method that converts given term to the matching expression.
It will be used as an intermediate step when implementing evaluate(expression).
evaluate(term) will convert the term to the expression and then call evaluate(expression).
For terminals this is simply calling get() to serialize the value.
For non-terminals the implementation is more complicated and will be implemeted in the following commits.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
term::bind_and_get is not needed anymore, remove it.
Some classes use bind_and_get internally, those functions are left intact
and renamed to bind_and_get_internal.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
To convert a terminal to expr::constant we need know the value type.
Implement getting value type for terminals in constants.hh.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This reverts commit e9343fd382, reversing
changes made to 27138b215b. It causes a
regression in v2 serialization_format support:
collection_serialization_with_protocol_v2_test fails with: marshaling error: read_simple_bytes - not enough bytes (requested 1627390306, got 3)
Fixes#9360
term::bind_and_get is not needed anymore, remove it.
Some classes use bind_and_get internally, those functions are left intact
and renamed to bind_and_get_internal.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Start using evaluate_to_raw_value instead of bind_and_get.
This is a step towards using only evaluate.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
To convert a terminal to expr::constant we need know the value type.
Implement getting value type for terminals in constants.hh.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
column_specification_or_tuple was introduced since some terms
were prepared using a single receiver e.g. (receiver = <term>) and
some using multiple receivers (e.g. (r1, r2) = <term>. Some
term types supported both.
To hide this complexity, the term->expr conversion used a single
interface for both variations (column_expression_or_tuple), but now
that we got rid of the term class and there are no virtual functions
any more, we can just use two separate functions for the two variants.
Internally we still use column_expression_or_tuple, it can be
removed later.
Remove cql3::functions::function_call::raw and replace it with
cql3::expr::function_call, which already existed from the selector
migration to expressions. The virtual functions implementing term::raw
are made free functions and remain in place, to ease migration and
review.
Note that preparing becomes a more complicated as it needs to
account for anonymous functions, which were not representable
in the previous structure (and still cannot be created by the
parser for the term::raw path).
The parser now wraps all its arguments with the term::raw->expr
bridge, since that's what expr::function_call expects, and in
turn wraps the function call with an expr->term::raw bridge, since
that's what the rest of the parser expects. These will disappear
when the migration completes.
In order to replace the term::raw hierarchy with expressions,
we need to unify the signatures of term::raw::prepare() and
term::multi_column_raw::prepare(). This is because we'll only have
one expression type to represent both single values and tuples
(although, different subexpression types will may used).
The difference in the two prepare() signatures is the
`receiver` parameter - which is a (type, name) pair used
to perfom type inference on the expression being prepared,
with the name used to report errors. In a perfect world, this
would just be an expression - a tuple or a singular expression
as the case requires. But we don't have the needed expression
infrastructure yet - general tuples or name-annotated expressions.
Resolve the problem by introducing a variant for the single-value
and tuple. This is more or less creating a mini-expression type
used just for this. Once our expression type grows the needed
capabilities, it can replace this type.
Note that for some cases, this replaces compile-time checks by
runtime checks (which should never trigger). In other cases
the classes really needed both interfaces, so the new variant
is a better fit.
And reuse these values when handling `bounce_to_shard` messages.
Otherwise such a function (e.g. `uuid()`) can yield a different
value when a statement re-executed on the other shard.
It can lead to an infinite number of `bounce_to_shard` messages
sent in case the function value is used to calculate partition
key ranges for the query. Which, in turn, will cause crashes
since we don't support bouncing more than one time and the second
hop will result in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can only accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.
Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.
The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>