Due to an overzealous assertion put in the code (in one of the last iterations, by the way!), it was impossible to create an aggregate that accepts multiple arguments. The behavior is now fixed, and a test case is provided for it.
Tests: unit(release)
Closes #9211
* github.com:scylladb/scylla:
cql-pytest: add test case for UDA with multiple args
cql3: fix aggregates with > 1 argument
Merged patch series by Benny Halevy:
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
Test: unit(dev, debug)
* tag 'deferred_action-noexcept-v1' of github.com:bhalevy/scylla:
everywhere: make deferred actions noexcept
cql3: prepare_context: mark methods noexcept
commitlog: segment, segment_manager: mark methods noexcept
everywhere: cleanup defer.hh includes
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
Test: unit(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The iterator interface of loading_shared_values/loading_cache is dangerous/fragile because
an iterator doesn't "lock" the entry it points to: if there is a
preemption point between acquiring a non-end() iterator and
dereferencing it, the corresponding cache entry may have already been evicted (for
whatever reason, e.g. cache size constraints or expiration), and then
dereferencing may end up in a use-after-free. We don't have any
protection against it in the value_extractor_fn today.
And this is in addition to #8920.
So, instead of trying to fix the iterator interface, this patch kills two
birds with one stone: we ditch the iterator interface
completely and return value_ptr from find(...) instead, the same one we
return from the loading_cache::get_ptr(...) asynchronous APIs.
A similar rework is done to loading_shared_values, which loading_cache is
based on: we drop the iterator interface and return
loading_shared_values::entry_ptr from find(...) instead.
loading_cache::value_ptr already takes care of "lock"ing the returned value so that it
remains readable even if it's evicted from the cache by the time
one tries to read it. And of course it also takes care of updating the
last read timestamp and moving the corresponding item to the top of the
MRU list.
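A minimal sketch of why an owning handle is safer than an iterator, assuming a toy cache whose entries are held by std::shared_ptr. The names toy_cache, find, evict and contains are hypothetical and do not mirror the real loading_cache API; the timestamp and MRU bookkeeping done by the real value_ptr is not modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy cache (not the real loading_cache). Entries are shared,
// so a handle returned by find() keeps its entry alive even if the cache
// drops the entry in the meantime.
class toy_cache {
public:
    using value_ptr = std::shared_ptr<const std::string>;

    void put(int key, std::string value) {
        _entries[key] = std::make_shared<std::string>(std::move(value));
    }

    // Returns an owning handle (or nullptr), never an iterator into _entries.
    value_ptr find(int key) const {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Simulates eviction due to size constraints or expiration.
    void evict(int key) { _entries.erase(key); }

    bool contains(int key) const { return _entries.count(key) != 0; }

private:
    std::unordered_map<int, value_ptr> _entries;
};
```

With an iterator, the same sequence (find, then eviction at a preemption point, then dereference) would read an erased map node; with the handle, the value stays readable.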
Fixes #8920
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20210817222404.3097708-1-vladz@scylladb.com>
Function call selectors correctly checked whether their arguments
are required to run in a threaded context, but forgot to check
the function itself; this check is now done.
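A minimal sketch of the corrected check, assuming a simplified selector tree; the selector struct and the function_requires_thread flag are hypothetical stand-ins, not Scylla's selector classes. The function-call node needs a threaded context if the function itself does or if any argument does.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical selector tree: a leaf (e.g. a column) or a function call
// with argument selectors.
struct selector {
    bool function_requires_thread = false;  // property of the called function
    std::vector<std::shared_ptr<selector>> args;

    bool requires_thread() const {
        // The check on the function itself (the part that was missing).
        if (function_requires_thread) {
            return true;
        }
        // The check on the arguments (already present before the fix).
        for (const auto& a : args) {
            if (a->requires_thread()) {
                return true;
            }
        }
        return false;
    }
};
```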
A user-defined aggregate is represented as an aggregate
which calls its state function on each input row
and then finalizes its execution by calling its final function
on the final state, after all rows were already processed.
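The state/final function scheme can be sketched as a fold, here with two input arguments per row to match the multi-argument case fixed above. The user_aggregate template is a hypothetical model for illustration, not Scylla's aggregate_function interface.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <vector>

// Hypothetical model of a user-defined aggregate: the state function folds
// each row's arguments into the state, and the final function turns the
// last state into the result.
template <typename State, typename Result, typename... Args>
struct user_aggregate {
    State initial_state;
    std::function<State(State, Args...)> state_fn;
    std::function<Result(State)> final_fn;

    template <typename Row>
    Result run(const std::vector<Row>& rows) const {
        State s = initial_state;
        for (const Row& row : rows) {
            // Unpack the row's columns as the state function's extra arguments.
            s = std::apply([&](const auto&... cols) { return state_fn(s, cols...); }, row);
        }
        return final_fn(s);
    }
};
```

For example, an aggregate over (a, b) whose state function adds a * b and whose final function returns the state computes a dot product.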
What should the following pair of statements do?
CREATE INDEX xyz ON tbl(a)
CREATE INDEX IF NOT EXISTS xyz ON tbl(b)
There are two reasonable choices:
1. An index with the name xyz already exists, so the second command should
do nothing, because of the "IF NOT EXISTS".
2. The index on tbl(b) does *not* yet exist, so the command should try to
create it. And when it can't (because the name xyz is already taken),
it should produce an error message.
Currently, Cassandra implements choice 1, and Scylla implements choice 2.
After some discussions on the mailing list, we agreed that Scylla's
choice is the better one and Cassandra's choice could be considered a
bug: the "IF NOT EXISTS" feature is meant to allow idempotent creation of
an index, not to make it easy to make mistakes without noticing.
The second command listed above is most likely a mistake by the user,
not anything intentional: the command intended to ensure that an index
on column b exists, but after the silent success of the command, no such
index exists.
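A minimal sketch of choice 2, assuming a toy catalog that maps index names to the indexed column; create_index and index_catalog are hypothetical and not Scylla's schema code. IF NOT EXISTS only suppresses the error when the existing index is the same one being requested.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical catalog: index name -> indexed column.
using index_catalog = std::map<std::string, std::string>;

enum class create_result { created, already_exists };

create_result create_index(index_catalog& catalog, const std::string& name,
                           const std::string& column, bool if_not_exists) {
    auto it = catalog.find(name);
    if (it == catalog.end()) {
        catalog.emplace(name, column);
        return create_result::created;
    }
    // Idempotent case: the same index already exists.
    if (if_not_exists && it->second == column) {
        return create_result::already_exists;
    }
    // The name is taken by a different index: report it instead of
    // silently succeeding without creating the requested index.
    throw std::invalid_argument("index " + name + " already exists");
}
```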
So this patch doesn't change any Scylla code (it just adds a comment);
rather, it adds a test which "enshrines" the current behavior.
The test passes on Scylla and fails on Cassandra so we tag it
"cassandra_bug", meaning that we consider this difference to be
intentional and we consider Cassandra's behavior in this case to be wrong.
Fixes #9182.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210811113906.2105644-1-nyh@scylladb.com>
This trial patch set moves compaction_strategy.hh and compaction_garbage_collector.hh to the compaction directory and drops the two unused types compact_for_mutation_query_state and compact_for_data_query_state.
Closes #9156
* github.com:scylladb/scylla:
compaction: Move compaction_garbage_collector.hh to compaction dir
compaction: Move compaction_strategy.hh to compaction dir
mutation_compactor: Drop compact_for_mutation_query_state and compact_for_data_query_state
Calculating clustering ranges on a local index has been rewritten to use the new `expression` variant.
This allows us to finally remove the old `bounds_ranges` function.
Closes #9080
* github.com:scylladb/scylla:
cql3: Remove unused functions like bounds_ranges
cql3: Use expressions to calculate the local-index clustering ranges
statement_restrictions_test: tests for extracting column restrictions
expression: add a function to extract restrictions for a column
Finding clustering ranges has been rewritten to use the new
expression variant.
Old bounds_ranges() and other similar ones are no longer needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Removes the old code used to calculate the local-index clustering range
and replaces it with new code based on the expression variant.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that, given an expression and a column,
extracts all restrictions involving this column.
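A minimal sketch of such an extraction, assuming a simplified expression that is either a single-column restriction or a conjunction of sub-expressions; the types and extract_column_restrictions are hypothetical and much simpler than the real cql3::expr variant.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified expression tree.
struct restriction {
    std::string column;
    std::string op;  // e.g. "=", "<", ">"
    int value;
};
struct expression;  // completed below
struct conjunction { std::vector<expression> children; };  // AND of children
struct expression { std::variant<restriction, conjunction> v; };

// Appends every restriction on `column` to `out`, descending into nested ANDs.
void extract_column_restrictions(const expression& e, const std::string& column,
                                 std::vector<restriction>& out) {
    if (auto r = std::get_if<restriction>(&e.v)) {
        if (r->column == column) {
            out.push_back(*r);
        }
    } else {
        for (const expression& child : std::get<conjunction>(e.v).children) {
            extract_column_restrictions(child, column, out);
        }
    }
}
```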
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When a WHERE clause contains a multi-column restriction and an indexed
regular column, we must filter the results. It is generally not
possible to craft the index-table query so it fetches only the
matching rows, because that table's clustering key doesn't match up
with the column tuple.
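A minimal sketch of the post-filtering step, assuming candidate rows have already been fetched through the index on the regular column v; row and filter_rows are hypothetical. std::tuple's relational operators compare lexicographically, like a CQL tuple comparison such as (c1, c2) > (1, 5).

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical row returned by the index lookup.
struct row {
    int c1, c2;  // base-table clustering columns
    int v;       // indexed regular column
};

// WHERE v = ? AND (c1, c2) > (?, ?): the index narrows by v only, so the
// multi-column restriction is applied as a filter on the candidates.
std::vector<row> filter_rows(const std::vector<row>& candidates, int v,
                             std::tuple<int, int> lower_bound) {
    std::vector<row> result;
    for (const row& r : candidates) {
        if (r.v == v && std::make_tuple(r.c1, r.c2) > lower_bound) {
            result.push_back(r);
        }
    }
    return result;
}
```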
Fixes#9085.
Tests: unit (dev, debug)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #9122
"
Previously, the following functions were
incorrectly marked as pure, meaning that the
function is executed at the "prepare" step:
* `currenttimestamp()`
* `currenttime()`
* `currentdate()`
* `currenttimeuuid()`
For functions that possibly depend on timing or a random seed,
this is clearly a bug. Cassandra doesn't have a notion of pure
functions, so they are lazily evaluated.
Make Scylla match Cassandra behavior for these functions.
Add a unit test for the fix (excluding the `currentdate()` function,
because there is no way to use a synthetic clock with the query
processor, and sleeping for a whole day to demonstrate correct
behavior is clearly not an option).
Also, extend the cql-pytest for #8604: since there are now more
non-deterministic CQL functions, they are all covered by the test
now.
Fixes: #8816
"
* 'timeuuid_function_pure_annotation_v3' of https://github.com/ManManson/scylla:
test: test_non_deterministic_functions: test more non-pure functions
cql3: change `current*()` CQL functions to be non-pure
These include the following:
* `currenttimestamp()`
* `currenttime()`
* `currentdate()`
* `currenttimeuuid()`
Previously, they were incorrectly marked as pure, meaning
that the function is executed at the "prepare" step.
For functions that possibly depend on timing or a random seed,
this is clearly a bug. Cassandra doesn't have a notion of pure
functions, so they are lazily evaluated.
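A toy model of the difference, assuming a pure call is folded once at prepare time while a non-pure call is evaluated on every execution; prepared_call and the fake clock are hypothetical, not Scylla's function_call implementation.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical prepared function call.
class prepared_call {
public:
    prepared_call(std::function<long()> fn, bool pure)
        : _fn(std::move(fn)), _pure(pure) {
        if (_pure) {
            _folded = _fn();  // evaluated once, at the "prepare" step
        }
    }

    long execute() const {
        // Pure: reuse the prepare-time value. Non-pure: evaluate lazily.
        return _pure ? *_folded : _fn();
    }

private:
    std::function<long()> _fn;
    bool _pure;
    std::optional<long> _folded;
};
```

Marking a clock-dependent function as pure freezes its value at prepare time, which is the bug described above.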
Make Scylla match Cassandra behavior for these functions.
Add a unit test for the fix (excluding the `currentdate()` function,
because there is no way to use a synthetic clock with the query
processor, and sleeping for a whole day to demonstrate correct
behavior is clearly not an option).
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The code for checking that an MV's select statement doesn't
have any bind markers uses the wrong method, so the check always
yields `false`, regardless of whether the statement contains bind markers.
`prepare_context::empty()` is a misleading name because
it doesn't check whether the current instance is empty; instead, it creates
an empty instance wrapped in a `lw_shared_ptr`.
Thus, the code in `create_view_statement::announce_migration()`
tests the newly created pointer rather than the collected bind markers,
and that test is always false.
Use `get_variable_specifications().empty()` to check whether the
specifications vector inside the `prepare_context`
instance is empty.
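A reduced sketch of the pitfall, assuming std::shared_ptr in place of lw_shared_ptr; prepare_context_sketch and both checks are hypothetical. A static factory named empty() can be called through an instance and reads like a predicate, yet it only returns a fresh, non-null pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct prepare_context_sketch {
    std::vector<std::string> specs;  // one entry per bind marker

    // Factory, not a predicate: returns a new, empty instance.
    static std::shared_ptr<prepare_context_sketch> empty() {
        return std::make_shared<prepare_context_sketch>();
    }
    const std::vector<std::string>& get_variable_specifications() const { return specs; }
};

// Buggy: ctx.empty() calls the static factory; the returned pointer is
// never null, so this always returns false.
bool has_bind_markers_buggy(const prepare_context_sketch& ctx) {
    return !ctx.empty();
}

// Fixed: inspects the specifications actually collected.
bool has_bind_markers(const prepare_context_sketch& ctx) {
    return !ctx.get_variable_specifications().empty();
}
```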
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
There's no point in returning the `_specs` vector by value in this
case; just return a const reference. All existing callers make
a copy anyway.
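A small sketch of the accessor change, using a hypothetical specs_holder class: returning a const reference avoids a copy on every call, and a caller that needs its own vector still gets one by copy-initializing from the result.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class specs_holder {  // hypothetical
    std::vector<std::string> _specs;
public:
    explicit specs_holder(std::vector<std::string> s) : _specs(std::move(s)) {}

    // No copy here; callers decide whether they need one.
    const std::vector<std::string>& get_specs() const { return _specs; }
};
```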
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
"
`function_call` AST nodes are created for each function
with side effects in a CQL query, i.e. non-deterministic
functions (`uuid()`, `now()` and some other timeuuid-related ones).
These nodes are evaluated either when the query itself is executed
or when query restrictions are computed (e.g. partition/clustering
key ranges for LWT requests).
We need to cache the calls since otherwise, when handling a
`bounce_to_shard` request for an LWT query, we can
enter an infinite bouncing loop (if a function is used
to calculate partition key ranges for the query), because the
results can be different each time.
Furthermore, we don't support bouncing more than once:
returning a `bounce_to_shard` message a second time
results in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if the above is true and the call is part
of an LWT statement.
Function calls are indexed by the order in which they appear
within a statement while parsing. There is no need to
include any kind of statement identifier in the cache key,
since `query_options` (which holds the cache) is limited
to a single statement anyway.
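A minimal sketch of such a per-statement cache, assuming results are keyed by the id assigned during prepare and that calls without an id are simply evaluated; function_call_cache and its evaluate method are hypothetical, not the actual query_options interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>

// Hypothetical cache of non-deterministic function results for one statement.
class function_call_cache {
public:
    using fn_id = unsigned;  // position of the call within the statement

    int64_t evaluate(std::optional<fn_id> id, const std::function<int64_t()>& fn) {
        if (!id) {
            return fn();  // no id assigned: the call is not cached
        }
        auto it = _values.find(*id);
        if (it != _values.end()) {
            return it->second;  // re-execution (e.g. after a bounce) reuses the value
        }
        int64_t v = fn();
        _values.emplace(*id, v);
        return v;
    }

private:
    std::unordered_map<fn_id, int64_t> _values;
};
```

Because the cache travels with the statement's options, the second execution computes the same partition key and the bounce cannot repeat.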
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement; hence they
can accept only the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed, and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. In any case, we don't have even
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Add a test written in `cql-pytest` framework to verify
that both prepared and unprepared lwt statements handle
`bounce_to_shard` messages correctly in such scenario.
Fixes: #8604
Tests: unit(dev, debug)
NOTE: the patchset uses `query_options` as a container for
cached values. This doesn't look clean, and `service::query_state`
seems to be a better place to store them. But `query_state` is not
forwarded to most of the CQL code, so using it would mean amending
a huge number of places.
The series presents a trade-off to avoid forwarding `query_state`
everywhere (but maybe that needs to be done nonetheless).
"
* 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla:
cql-pytest: add a test for non-pure CQL functions
cql3: cache function calls evaluation for non-deterministic functions
cql3: rename `variable_specifications` to `prepare_context`
As preparation for converting term::raw to an expression, make it
forward-declarable so that we can have a term::raw that is an
expression, and an expression that is a term::raw, without driving
the compiler insane.
Closes #9101
hierarchy with expressions' from Avi Kivity
Currently, the grammar has two parallel hierarchies. One hierarchy is
used in the WHERE clause, and is based on a combination of `term`
and expressions. The other is used in the SELECT clause and
uses the cql3::selection::selectable hierarchy. There is some overlap
between the hierarchies: both can name columns. Logically, however,
they overlap completely - in SQL anything you can select you can
filter on, and vice versa. So merging the two hierarchies is important if
we want to enrich CQL. This series does that, partially (see below),
converting the SELECT clause to expressions.
There is another hierarchy split: between the "raw", pre-prepare object
hierarchy, and post-prepare non-raw. This series limits itself to converting
the raw hierarchy and leaves the non-raw hierarchy alone.
An important design choice is not to have this raw/non-raw split in expressions.
Note that most of the hierarchy is completely parallel: addition is addition
both before prepare and after prepare (but see [1]). The main difference
is around identifiers - before preparation they are unresolved, and after
preparation they become `column_definition` objects. We resolve that by
having two separate types: `unresolved_identifier` for the pre-prepare phase,
and the existing `column_value` for post-prepare phase.
Alternative choices would be to keep a separate expression::raw variant, or
to template the expression variant on whether it is raw or not. I think either would
cause undue bloat and confusion.
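A minimal sketch of the chosen design, assuming a single variant that holds both unresolved_identifier (pre-prepare) and column_value (post-prepare), with prepare() resolving names against a schema. The field layout, the toy schema map and prepare() itself are simplified assumptions, not the actual cql3 code.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Simplified stand-ins for the types named above.
struct column_definition { std::string name; int id; };
struct unresolved_identifier { std::string name; };       // before prepare
struct column_value { const column_definition* col; };    // after prepare
struct constant { int value; };

// One variant for both phases; only identifiers change form.
using expression = std::variant<constant, unresolved_identifier, column_value>;

using schema = std::map<std::string, column_definition>;  // hypothetical

expression prepare(const expression& e, const schema& s) {
    if (auto id = std::get_if<unresolved_identifier>(&e)) {
        auto it = s.find(id->name);
        if (it == s.end()) {
            throw std::invalid_argument("unknown column " + id->name);
        }
        return column_value{&it->second};
    }
    return e;  // constants (and already-resolved columns) are unchanged
}
```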
Note the series introduces many on_internal_error() calls. This is because
there is not a lot of overlap in the hierarchies today; you can't have a cast in
the WHERE clause, for example. These on_internal_error() calls cannot be
triggered since the grammar does not yet allow such expressions to be
expressed. As we expand the grammar, they will have to be replaced with
working implementations.
Lastly, field selection is expressible in both hierarchies. This series does not yet
merge the two representations (`column_value.sub` vs `field_selection`), but it
should be easy to do so later.
[1] the `+` operator can also be translated to list concatenation, which we may
choose to represent by yet another type.
Test: unit(dev)
Closes #9087
* github.com:scylladb/scylla:
cql3: expression: update find_atom, count_if for function_call, cast, field_selection
cql3: expressions: fix printing of nested expressions
cql3: selection: replace selectable::raw with expression
cql3: expression: convert selectable::with_field_selection::raw to expression
cql3: expression: convert selectable::with_cast::raw to expression
cql3: expression: convert selectable::with_anonymous_function::raw to expression
cql3: expression: convert selectable::with_function_call::raw to expressions
cql3: selectable: make selectable::raw forward-declarable
cql3: expressions: convert writetime_or_ttl::raw to expression
cql3: expression: add convenience constructor from expression element to nested expression
utils: introduce variant_element.hh
cql3: expression: use nested_expression in binary_operator
cql3: expression: introduce nested_expression class
Convert column_identifier_raw's use as selectable to expressions
make column_identifier::raw forward declarable
cql3: introduce selectable::with_expression::raw
And reuse these values when handling `bounce_to_shard` messages.
Otherwise such a function (e.g. `uuid()`) can yield a different
value when a statement is re-executed on the other shard.
This can lead to an infinite number of `bounce_to_shard` messages
being sent if the function value is used to calculate partition
key ranges for the query. That, in turn, causes a crash,
since we don't support bouncing more than once: the second
hop crashes.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if the above is true and the call is part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement; hence they
can accept only the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed, and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. In any case, we don't have even
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Convert all known tri-compares that return an int to return std::strong_ordering.
Returning an int is dangerous since the caller can treat it as a bool, and indeed
this series uncovered a minor bug (#9103).
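A minimal illustration of the int-as-bool hazard; tri_compare, equal_buggy and equal_fixed are hypothetical. An int result converts silently to bool, so the buggy helper returns true exactly when the values differ. If tri_compare returned std::strong_ordering instead, the buggy line would not compile, because std::strong_ordering has no conversion to bool.

```cpp
#include <cassert>
#include <string>

// Old-style tri-compare: negative, zero or positive, returned as int.
int tri_compare(const std::string& a, const std::string& b) {
    return a.compare(b);
}

// Bug: the int converts to bool, so this is true when a != b.
bool equal_buggy(const std::string& a, const std::string& b) {
    return tri_compare(a, b);
}

// Correct: compare the result against zero explicitly.
bool equal_fixed(const std::string& a, const std::string& b) {
    return tri_compare(a, b) == 0;
}
```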
Test: unit (dev)
Fixes #1449
Closes #9106
* github.com:scylladb/scylla:
treewide: remove redundant "x <=> 0" compares
test: mutation_test: convert internal tri-compare to std::strong_ordering
utils: int_range: change to std::strong_ordering
test: change some internal comparators to std::strong_ordering
utils: big_decimal: change to std::strong_ordering
utils: fragment_range: change to std::strong_ordering
atomic_cell: change compare_atomic_cell_for_merge() to std::strong_ordering
types: drop scaffolding erected around lexicographical_tri_compare
sstables: keys: change to std::strong_ordering internally
bytes: compare_unsigned(): change to std::strong_ordering
uuid: change comparators to std::strong_ordering
types: convert abstract_type::compare and related to std::strong_ordering
types: reduce boilerplate when comparing empty value
serialized_tri_compare: change to std::strong_ordering
compound_compat: change to std::strong_ordering
types: change lexicographical_tri_compare, prefix_equality_tri_compare to std::strong_ordering
If x is of type std::strong_ordering, then "x <=> 0" is equivalent to
x. These no-ops were inserted during the #1449 fixes, but are now unnecessary.
They have potential for harm, since they can hide an accidental change of the
type of x to an arithmetic type, so remove them.
Ref #1449.
The combination of the new types and these functions cannot happen yet,
but as they are generic functions it is better to implement them in
case it becomes possible later.
Now that all selectable::raw subclasses have been converted to
cql3::selectable::with_expression::raw, the class structure is
just a wrapper around expressions. Peel it, converting the
virtual member functions to free functions, and replacing
object instances with expression or nested_expression as the
case allows.
Add a field_selection variant element to expression. Like function_call
and cast, the structure from which a field is selected cannot yet be
an expression, since not all selectable::raw subclasses are converted. This will
be done in a later pass. This is also why printing a field selection now
does not print the selected expression; this will also be corrected later.
Add a cast variant element to expression. Like function_call, the
argument being converted cannot yet be an expression, since not
all selectable::raw subclasses are converted. This will be done in a later
pass. This is also why printing a cast now does not print the
expression being cast; this will also be corrected later.
Rather than creating a new variant element in expression, we extend
function_call to handle both named and anonymous functions, since
most of the processing is the same.
Add a function_call variant element to hold function calls. Note
that because not all selectables are yet converted, function call
arguments are still of type selectable::raw. They will be converted
to expressions later. This is also why printing a function now
does not print its arguments; this will also be corrected later.
As temporary scaffolding while we're converting selectable::raw
subclasses to expressions, we'll need expressions to refer to
selectable::raw (specifically, function call arguments, which will
end up as expressions as well). To avoid a #include loop, make
selectable::raw forward-declarable by moving it to namespace scope.
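A minimal illustration of the technique: a nested class like `selectable::raw` cannot be forward-declared without the full definition of its enclosing class, while a namespace-scope class can, and a forward declaration is enough to hold it by pointer. The names selectable_raw and function_call_sketch are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Namespace-scope class: this forward declaration needs no #include,
// which breaks the header cycle.
struct selectable_raw;

struct function_call_sketch {
    std::string name;
    std::shared_ptr<selectable_raw> arg;  // pointer to an incomplete type is fine
};

// The full definition may live in a different header.
struct selectable_raw {
    std::string text;
};
```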
Create a new element in the expression variant, column_mutation_attribute,
signifying we're picking up an attribute of a column mutation (not a
column value!). We use an enum rather than a bool to choose between
writetime and ttl (the two mutation attributes) for increased
explicitness.
Although there can only be one type for the column we're operating
on (it must be an unresolved_identifier), we use a nested_expression.
This is because we'll later need to also support a column_value
as the column type after we prepare it. This is somewhat similar
to the address-of operator in C, which syntactically takes any
expression but semantically operates only on lvalues.
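A small sketch of the enum-versus-bool choice, assuming a plain string stands in for the nested column expression; attribute_kind, column_mutation_attribute's fields and to_cql are hypothetical. With an enum, every construction site names the attribute explicitly, while a bool argument would not say whether true means writetime or ttl.

```cpp
#include <cassert>
#include <string>

enum class attribute_kind { writetime, ttl };

struct column_mutation_attribute {
    attribute_kind kind;
    std::string column;  // stands in for the nested column expression
};

std::string to_cql(const column_mutation_attribute& a) {
    const char* fn = a.kind == attribute_kind::writetime ? "WRITETIME" : "TTL";
    return std::string(fn) + "(" + a.column + ")";
}
```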
It is convenient to initialize a nested_expression variable from
one of the types that compose the expression variant, but C++ doesn't
allow it. Add a constructor that does this. Use the new variant_element
concept to constrain the input to be one of the variant's elements.
The expression type cannot be a member of a struct that is an
element of the expression variant. This is because it would then
be required to contain itself. So introduce a nested_expression
type to indirectly hold an expression, but keep the value semantics
we expect from expressions: it is copyable and a copy has separate
identity and storage.
In fact binary_operator had to resort to this trick, so it's converted
to nested_expression in the next patch.
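A minimal sketch of the idea, assuming heavily simplified stand-ins for expression and binary_operator. nested_expression stores the expression on the heap to break the recursion but deep-copies it, so copies have separate identity and storage. The converting constructor is constrained with a C++17 type trait that approximates the variant_element concept; all names here are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <variant>

struct constant;
struct binary_operator;
using expression_variant = std::variant<constant, binary_operator>;

// C++17 approximation of a variant_element concept.
template <typename T, typename Variant>
struct is_variant_element : std::false_type {};
template <typename T, typename... Ts>
struct is_variant_element<T, std::variant<Ts...>>
    : std::disjunction<std::is_same<T, Ts>...> {};

struct expression;  // completed below, after its alternatives

class nested_expression {
    std::unique_ptr<expression> _e;  // heap storage breaks the recursion
public:
    // Converting constructor from any element of the expression variant.
    template <typename T, typename = std::enable_if_t<
        is_variant_element<std::decay_t<T>, expression_variant>::value>>
    nested_expression(T&& element);
    nested_expression(const nested_expression& other);  // deep copy
    nested_expression& operator=(const nested_expression& other);
    nested_expression(nested_expression&&) noexcept;
    nested_expression& operator=(nested_expression&&) noexcept;
    ~nested_expression();
    const expression& get() const;
};

struct constant { int value; };
struct binary_operator { char op; nested_expression lhs; nested_expression rhs; };
struct expression { expression_variant v; };

template <typename T, typename>
nested_expression::nested_expression(T&& element)
    : _e(std::make_unique<expression>(expression{expression_variant(std::forward<T>(element))})) {}
nested_expression::nested_expression(const nested_expression& other)
    : _e(std::make_unique<expression>(*other._e)) {}
nested_expression& nested_expression::operator=(const nested_expression& other) {
    if (this != &other) {
        _e = std::make_unique<expression>(*other._e);
    }
    return *this;
}
nested_expression::nested_expression(nested_expression&&) noexcept = default;
nested_expression& nested_expression::operator=(nested_expression&&) noexcept = default;
nested_expression::~nested_expression() = default;
const expression& nested_expression::get() const { return *_e; }
```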
Introduce unresolved_identifier as an unprepared counterpart to column_value.
column_identifier_raw no longer inherits from selectable::raw, but it
keeps its methods for now to reduce churn.
Otherwise we run into a #include loop when we try to have an expression
with column_identifier::raw: expression.hh -> column_identifier.hh
-> selectable.hh -> expression.hh.
Prepare to migrate selectable::raw subclasses to expressions by
creating a bridge between the two types. with_expression::raw
is a selectable::raw and implements all its methods (right now,
trivially), and its contents is an expression. The methods are
implemented using the usual visitor pattern.
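A minimal sketch of such a bridge, assuming a single virtual method stands in for the selectable::raw interface; selectable_raw_base, with_expression_raw and the two expression alternatives are hypothetical simplifications. The object satisfies the old virtual interface, but its content is an expression and the method is implemented by visiting it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Common C++17 idiom for building a visitor from several lambdas.
template <typename... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs> overloaded(Fs...) -> overloaded<Fs...>;

struct unresolved_identifier { std::string name; };
struct constant { int value; };
using expression = std::variant<unresolved_identifier, constant>;

// Stands in for the old virtual interface.
struct selectable_raw_base {
    virtual ~selectable_raw_base() = default;
    virtual std::string to_string() const = 0;
};

class with_expression_raw : public selectable_raw_base {
public:
    explicit with_expression_raw(expression e) : _e(std::move(e)) {}

    std::string to_string() const override {
        return std::visit(overloaded{
            [](const unresolved_identifier& id) { return id.name; },
            [](const constant& c) { return std::to_string(c.value); },
        }, _e);
    }

private:
    expression _e;
};
```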
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.
Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.
The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>