When `val[sub]` is parsed, a column_value with a sub field
used to be created.
Now a subscript struct is created instead.
This is the only place where a subscripted value can be created.
All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.
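A minimal sketch of the shape described above, assuming simplified stand-in types (the real expression variant is much larger, and the member names `val` and `sub` on the struct are illustrative assumptions):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-ins for the real cql3::expr types.
struct column_value {
    std::string column;  // no `sub` member in this sketch
};

struct constant {
    int value;
};

struct subscript;

// A tiny expression variant; subscript is held by pointer because it is recursive.
using expression = std::variant<column_value, constant, std::shared_ptr<subscript>>;

// Represents `val[sub]` as its own node instead of a field on column_value.
struct subscript {
    expression val;
    expression sub;
};

// What a parser could build for `col[idx]` in this sketch.
inline expression make_subscript(std::string col, int idx) {
    return std::make_shared<subscript>(subscript{column_value{std::move(col)}, constant{idx}});
}
```

Code that handles subscripts then only needs to look for the `subscript` alternative and never inspects `column_value`.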
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
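As an illustration of the three cases, here is a small sketch that produces the single-line identifier for each. The helper names are hypothetical (the actual conversion script is not shown here), and the AGPL-only identifier is assumed to match the AGPL half of the dual-license expression:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper; the actual conversion script is not part of this message.
enum class license_kind { agpl_only, apache_only, dual };

// Returns the SPDX expression for each of the three cases.
inline std::string spdx_expression(license_kind kind) {
    switch (kind) {
    case license_kind::agpl_only:   return "AGPL-3.0-or-later";
    case license_kind::apache_only: return "Apache-2.0";
    case license_kind::dual:        return "(AGPL-3.0-or-later and Apache-2.0)";
    }
    return "";
}

// Builds the single header line that replaces the long license blurb.
inline std::string spdx_header_line(license_kind kind) {
    return " * SPDX-License-Identifier: " + spdx_expression(kind);
}
```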
Closes #9937
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.
data_dictionary::database::real_database() is called from several
places, for these reasons:
- calling yet-to-be-converted code
- callers with a legitimate need to access data (e.g. system_keyspace)
but with the ::database accessor removed from query_processor.
We'll need to find another way to supply system_keyspace with
data access.
- to gain access to the wasm engine for testing whether
user-defined functions compile. We'll have to find another way to
do this as well.
The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.
Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
This PR finally removes the `term` class and replaces it with `expression`.
* There was some trouble with `lwt_cache_id` in `expr::function_call`.
The current code works the following way:
* for each `function_call` inside a `term` that describes a pk restriction, `prepare_context::add_pk_function_call` is called.
* `add_pk_function_call` takes a `::shared_ptr<cql3::functions::function_call>`, sets its `cache_id` and pushes this shared pointer onto a vector of all collected function calls
* Later, when some condition is met, we want to clear the cache ids of all those collected function calls. To do this we iterate through the shared pointers collected in `prepare_context` and clear the cache id for each of them.
This doesn't work with `expr::function_call` because it isn't kept inside a shared pointer.
To solve this I put the `lwt_cache_id` inside a shared pointer and then `prepare_context` collects these shared pointers to cache ids.
I also experimented with doing this without any shared pointers: maybe we could just walk through the expression and clear the cache ids ourselves. The problem is that expressions are copied all the time, so we could clear the cache in one place but forget about a copy. Doing it using shared pointers more closely matches the original behaviour.
The experiment is on the [term2-pr3-backup-altcache](https://github.com/cvybhu/scylla/tree/term2-pr3-backup-altcache) branch
* `shared_ptr<term>` being `nullptr` could mean:
* It represents a cql value `null`
* That there is no value, like `std::nullopt` (for example in `attributes.hh`)
* That it's a mistake, it shouldn't be possible
A good way to distinguish between optional and mistake is to look for `my_term->bind_and_get()`: where it appears, we know that it's not an optional value.
* On the other hand `raw_value` cast to bool means:
* `false` - null or unset
* `true` - some value, maybe empty
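The shared cache id approach described above for `lwt_cache_id` can be sketched roughly as follows. The types are simplified stand-ins and the method name `clear_pk_function_calls_cache` is an assumption; only the idea of collecting shared pointers to the ids is taken from the description:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical simplified types; names mirror the description, not the real headers.
using cache_id_ptr = std::shared_ptr<std::optional<uint8_t>>;

struct function_call {
    // Copies of a function_call share the same cache id slot.
    cache_id_ptr lwt_cache_id = std::make_shared<std::optional<uint8_t>>();
};

class prepare_context {
    std::vector<cache_id_ptr> _pk_cache_ids;
    uint8_t _next_id = 0;
public:
    // Assign an id and remember the shared slot, not the function_call itself.
    void add_pk_function_call(function_call& fn) {
        *fn.lwt_cache_id = _next_id++;
        _pk_cache_ids.push_back(fn.lwt_cache_id);
    }
    // Clearing through the shared slots also reaches every copy of the expression.
    void clear_pk_function_calls_cache() {
        for (auto& id : _pk_cache_ids) {
            id->reset();
        }
    }
};
```

Because a copied `function_call` copies only the pointer, clearing the slot once is visible in every copy, which is the property the walk-the-expression alternative could not guarantee.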
I ran a simple benchmark on my laptop to see how performance is affected:
```
build/release/test/perf/perf_simple_query --smp 1 -m 1G --operations-per-shard 1000000 --task-quota-ms 10
```
* On master (a21b1fbb2f) I get:
```
176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op)
median 176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op)
median absolute deviation: 0.00
maximum: 176506.60
minimum: 176506.60
```
* On this branch I get:
```
172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op)
median 172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op)
median absolute deviation: 0.00
maximum: 172225.30
minimum: 172225.30
```
Closes #9481
* github.com:scylladb/scylla:
cql3: Remove remaining mentions of term
cql3: Remove term
cql3: Rename prepare_term to prepare_expression
cql3: Make prepare_term return an expression instead of term
cql3: expr: Add size check to evaluate_set
cql3: expr: Add expr::contains_bind_marker
cql3: expr: Rename find_atom to find_binop
cql3: expr: Add find_in_expression
cql3: Remove term in operations
cql3: Remove term in relations
cql3: Remove term in multi_column_restrictions
cql3: Remove term in term_slice, rename to bounds_slice
cql3: expr: Remove term in expression
cql3: expr: Add evaluate_IN_list(expression, options)
cql3: Remove term in column_condition
cql3: Remove term in select_statement
cql3: Remove term in update_statement
cql3: Use internal cql format in insert_prepared_json_statement cache
types: Add map_type_impl::serialize(range of <bytes, bytes>)
cql3: Remove term in cql3/attributes
cql3: expr: Add constant::view() method
cql3: expr: Implement fill_prepare_context(expression)
cql3: expr: add expr::visit that takes a mutable expression
cql3: expr: Add receiver to expr::bind_variable
prepare_term now takes an expression and returns a prepared expression.
It should be renamed to prepare_expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
prepare_term is now the only function that uses terms.
Change it so that it returns expression instead of term
and remove all occurrences of expr::to_expression(prepare_term(...))
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Some structs inside the expression variant still contained term.
Replace those terms with expression.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
sprint() is obsolete. Note some calls were to helper functions that
use sprint(), not to sprint() directly, so both the helpers and
the callers were modified.
We have a few cases where a column_definition* is converted
directly to an expression without an explicit call to column_value{}.
The new expression implementation will not allow this, so make
these cases explicit. IMO this is better form than to rely
on the compiler picking the right expression subtype.
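A sketch of what "explicit" means here, using hypothetical simplified types (the real `column_value` and expression variant differ): once the constructor is explicit, a bare `column_definition*` no longer converts silently and the caller has to name the subtype.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical simplified types; the real definitions are much larger.
struct column_definition { std::string name; };

struct column_value {
    const column_definition* col;
    // explicit: a bare column_definition* does not convert implicitly.
    explicit column_value(const column_definition* c) : col(c) {}
};

struct constant { int value; };

using expression = std::variant<column_value, constant>;

// Callers spell out which expression subtype they mean.
inline expression make_column_expression(const column_definition& def) {
    return column_value{&def};
}
```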
To convert a terminal to expr::constant we need to know the value type.
Implement getting value type for terminals in lists.hh.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This reverts commit e9343fd382, reversing
changes made to 27138b215b. It causes a
regression in v2 serialization_format support:
collection_serialization_with_protocol_v2_test fails with: marshaling error: read_simple_bytes - not enough bytes (requested 1627390306, got 3)
Fixes #9360
To convert a terminal to expr::constant we need to know the value type.
Implement getting value type for terminals in lists.hh.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Change term::raw in single_column_relation to expressions. Because a single
raw class is used to represent multiple shapes (IN ? and IN (x, y, z)),
some of the expressions are optional, corresponding to nullables before the
conversion.
to_term() is not converted, since it's part of the larger relation
hierarchy.
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
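To illustrate the requirement, here is a minimal defer-style helper that rejects actions which may throw or return a value. This is a hypothetical stand-in written for this example, not seastar's implementation:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a defer()-style helper; not seastar's code.
template <typename Func>
class deferred_action {
    static_assert(std::is_nothrow_invocable_v<Func>,
                  "deferred actions must be noexcept");
    static_assert(std::is_void_v<std::invoke_result_t<Func>>,
                  "deferred actions must return void");
    Func _func;
    bool _armed = true;
public:
    explicit deferred_action(Func f) noexcept : _func(std::move(f)) {}
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    // Runs the action on scope exit unless it was cancelled.
    ~deferred_action() {
        if (_armed) {
            _func();
        }
    }
    void cancel() noexcept { _armed = false; }
};

// Runs a scope with a cleanup action and reports how many times it fired.
inline int run_with_cleanup() {
    int cleanups = 0;
    {
        // The lambda is marked noexcept, so both static_asserts are satisfied.
        deferred_action cleanup([&cleanups]() noexcept { ++cleanups; });
    }
    return cleanups;
}
```

Dropping `noexcept` from the lambda in this sketch turns the mistake into a compile-time error, which is the kind of breakage this patch prepares for.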
Test: unit(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And reuse these values when handling `bounce_to_shard` messages.
Otherwise such a function (e.g. `uuid()`) can yield a different
value when a statement is re-executed on the other shard.
If the function value is used to calculate partition key ranges
for the query, this can lead to an endless sequence of
`bounce_to_shard` messages. That, in turn, causes a crash, since
we don't support bouncing more than once and the second hop
crashes.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
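The caching idea can be sketched as below. The types are simplified models, and the member name `cached_pk_fn_calls` and the free function `evaluate` are assumptions made for the example:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical simplified model; the real query_options and function_call differ.
struct query_options {
    // Cached results of function calls, keyed by a per-statement cache id.
    std::unordered_map<uint8_t, std::string> cached_pk_fn_calls;
};

struct function_call {
    std::function<std::string()> fn;      // e.g. something like uuid()
    std::optional<uint8_t> lwt_cache_id;  // set only for LWT partition-key calls
};

// Evaluate the call, reusing a previously cached value when one exists.
inline std::string evaluate(const function_call& call, query_options& opts) {
    if (call.lwt_cache_id) {
        auto it = opts.cached_pk_fn_calls.find(*call.lwt_cache_id);
        if (it != opts.cached_pk_fn_calls.end()) {
            return it->second;
        }
    }
    std::string result = call.fn();
    if (call.lwt_cache_id) {
        opts.cached_pk_fn_calls.emplace(*call.lwt_cache_id, result);
    }
    return result;
}
```

If the cached map travels with the bounced request, the second shard computes the same partition key ranges as the first and no further bounce is needed.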
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The class is repurposed to be more generic and also be able
to hold additional metadata related to function calls within
a CQL statement. Rename all methods appropriately.
Visitor functions in AST nodes (`collect_marker_specification`)
are also renamed to a more generic `fill_prepare_context`.
The name `prepare_context` designates that this metadata
structure is a byproduct of `stmt::raw::prepare()` call and
is needed only for "prepare" step of query execution.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Replace operator_type with the nicer-behaved oper_t in CQL parser and,
consequently, in the relation hierarchy and column_condition.
After this, no references to operator_type remain in live code.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
operator_type is awkward because it's not copyable or assignable.
Replace it in expression representation with a new enum class, oper_t.
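A sketch of why an enum class is easier to work with. The enumerator list and the `to_cql` helper are illustrative assumptions, not the actual definition of oper_t:

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>

// Hypothetical subset of operators; the real oper_t has its own enumerator list.
enum class oper_t { EQ, NEQ, LT, LTE, GTE, GT, IN };

// Unlike a non-copyable class, an enum class is trivially copyable and assignable.
static_assert(std::is_trivially_copyable_v<oper_t>);
static_assert(std::is_copy_assignable_v<oper_t>);

// Illustrative helper mapping an operator to its CQL spelling.
constexpr std::string_view to_cql(oper_t op) {
    switch (op) {
    case oper_t::EQ:  return "=";
    case oper_t::NEQ: return "!=";
    case oper_t::LT:  return "<";
    case oper_t::LTE: return "<=";
    case oper_t::GTE: return ">=";
    case oper_t::GT:  return ">";
    case oper_t::IN:  return "IN";
    }
    return "?";
}

struct binary_operator {
    oper_t op;  // stored by value and reassignable
};
```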
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Move the classes representing CQL expressions (and utility functions
on them) from the `restrictions` namespace to a new namespace `expr`.
Most of the restriction.hh content was moved verbatim to
expression.hh. Similarly, all expression-related code was moved from
statement_restrictions.cc verbatim to expression.cc.
As suggested in #5763 feedback
https://github.com/scylladb/scylla/pull/5763#discussion_r443210498
Tests: dev (unit)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Existing AST assumes the single-column expression is a special case of
multi-column expressions, so it cannot distinguish `c=(0)` from
`(c)=(0)`. This leads to incorrect behaviour and dtest failures. Fix
it by separating the two cases explicitly in the AST representation.
Modify AST-creation code to create different AST for single- and
multi-column expressions.
Modify AST-consuming code to handle column_name separately from
vector<column_name>. Drop code relying on cardinality testing to
distinguish single-column cases.
Add a new unit test for `c=(0)`.
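The separation can be pictured with a variant over the two shapes. This is a simplified sketch with hypothetical types; the `rhs` member in particular is only a placeholder:

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified AST; the real types are richer.
struct column_name { std::string name; };

// The left-hand side is either one column or an explicit tuple of columns.
using lhs_t = std::variant<column_name, std::vector<column_name>>;

struct binary_operator {
    lhs_t lhs;
    std::vector<int> rhs;  // illustrative placeholder for the right-hand side
};

// Consumers branch on the alternative instead of testing the column count.
inline bool is_multi_column(const binary_operator& op) {
    return std::holds_alternative<std::vector<column_name>>(op.lhs);
}
```

With this representation a one-element vector still means `(c)=(0)`, so counting columns can no longer confuse it with `c=(0)`.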
Fixes #6825.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Delete unused parts of the old restrictions representation:
- drop all methods, members, and types from class restriction, but
keep the class itself: it's the return type of
relation::to_restriction, which we're keeping intact for now
- drop all subclasses of single_column_restriction and
token_restriction, but keep multi_column_restriction subclasses for
their bounds_ranges method
Keep the restrictions (plural) class, because statement_restrictions
still keeps partition/clustering/other columns in separate
collections.
Move the restriction::merge_with method to primary_key_restrictions,
where it's still being used.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Add expression as a member of restriction. Create or update
expression everywhere restrictions are created or updated.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
* Pass raw::select_statement::parameters as lw_shared_ptr
* Some more const cleanups here and there
* lists,maps,sets::equals now accept const-ref to *_type_impl
instead of shared_ptr
* Remove unused `get_column_for_condition` from modification_statement.hh
* More methods now accept const-refs instead of shared_ptr
Every call site where a shared_ptr was required as an argument
has been inspected to be sure that no dangling references are
possible.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>
De-pointerize cql3 code APIs further: change some call sites
to pass `schema` as const-ref instead of `shared_ptr`.
The affected functions are known to always expect a non-null
pointer to schema and don't store or pass the pointer anywhere
else, so it's safe to give them just a reference.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200218142338.69824-1-pa.solodovnikov@scylladb.com>
`parsed_statement::get_bound_variables` is assumed to always
return a nonnull pointer to `variable_specifications` instance.
In this case using a pointer is superfluous and can be safely
replaced by a plain reference.
Also add a default ctor and a utility method `set_bound_variables`
to the `variable_specifications` class to actually reset the
contents of the class instance.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200120195839.164296-1-pa.solodovnikov@scylladb.com>
Instances of `variable_specifications` are passed around as
shared_ptr's, which are redundant in this case since the class
is marked as `final`. Use `lw_shared_ptr` instead since we know
for sure it's not a polymorphic pointer.
Tests: unit(debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191225232853.45395-1-pa.solodovnikov@scylladb.com>
This restriction leverages like_matcher to perform filtering.
Make single_column_relation::new_LIKE_restriction() return this new
restriction.
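For readers unfamiliar with LIKE semantics, here is a naive matcher where `%` matches any run of characters and `_` matches exactly one. It is only an illustration: escape handling is omitted, and this is not the algorithm like_matcher uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Naive LIKE matching: '%' matches any run of characters, '_' exactly one.
// Illustrative only; escapes are not handled and this is not like_matcher's code.
inline bool like_match(std::string_view pattern, std::string_view text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string_view::npos;  // position of the last '%' seen
    std::size_t mark = 0;                       // text position to retry from
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            star = p++;
            mark = t;
        } else if (p < pattern.size() && (pattern[p] == '_' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string_view::npos) {
            // Backtrack: let the last '%' absorb one more character.
            p = star + 1;
            t = ++mark;
        } else {
            return false;
        }
    }
    // Any trailing '%' in the pattern can match the empty string.
    while (p < pattern.size() && pattern[p] == '%') {
        ++p;
    }
    return p == pattern.size();
}
```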
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Add a new type of relation with operator LIKE. Handle it in
relation::to_restriction by introducing a new virtual method for it.
The temporary implementation of this method returns null; that will be
replaced in a subsequent patch.
Add abstract_type::is_string() to recognize string columns and
disallow LIKE operator on non-string columns.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
There are several places where IN restrictions are not currently supported,
especially in queries involving a secondary index. However, when the IN
restriction has just a single value, it is nothing more than an equality
restriction and can be converted into one and be supported. So this patch
does exactly this.
Note that Cassandra has done this conversion since August 2016, and therefore
supports the special case of single-value IN even where general IN is not
supported. So it's important for Cassandra compatibility that we do this
conversion too.
This patch also includes a test with two queries involving a secondary
index that were previously disallowed because of the "IN" on the primary
key or the indexed column - and are now allowed when the IN restriction
has just a single value. A third query tested is not related to secondary
indexes, but confirms we don't break multi-column single-value IN queries.
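The conversion itself is simple. A sketch with hypothetical simplified types (the real restriction classes are far richer, and `make_in_restriction` is a name invented for this example):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified restriction.
enum class oper { EQ, IN };

struct restriction {
    std::string column;
    oper op;
    std::vector<int> values;
};

// An IN with exactly one value is equivalent to an equality on that value.
inline restriction make_in_restriction(std::string column, std::vector<int> values) {
    if (values.size() == 1) {
        return restriction{std::move(column), oper::EQ, std::move(values)};
    }
    return restriction{std::move(column), oper::IN, std::move(values)};
}
```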
Fixes #4455.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190428160317.23328-1-nyh@scylladb.com>
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
The constraint is no longer relevant, since Cassandra removed
it in version 2.2. In addition the mechanism for handling this
case is already implemented and is identical in case of
clustering keys with single-column EQ (=) and IN relations
(Cartesian product of singular ranges).
A unit test for this case was added.
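The "Cartesian product of singular ranges" can be sketched as follows, with each key component reduced to a list of int values for illustration (one value for EQ, several for IN). The function name and types are assumptions:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each inner vector holds the candidate values of one key component
// (one value for EQ, several for IN). Values are ints for illustration only.
using component_values = std::vector<int>;
using key = std::vector<int>;

inline std::vector<key> cartesian_product(const std::vector<component_values>& components) {
    std::vector<key> result(1);  // start with a single empty key
    for (const auto& values : components) {
        std::vector<key> next;
        next.reserve(result.size() * values.size());
        for (const auto& prefix : result) {
            for (int v : values) {
                key k = prefix;
                k.push_back(v);
                next.push_back(std::move(k));
            }
        }
        result = std::move(next);
    }
    return result;
}
```

Each resulting key corresponds to one singular range to query.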
Fixes #1735
Tests:
1. Unit Tests.
2. Manual testing with the case described in the issue.
3. dtest: ql_additional_tests.py:TestCQL.composite_row_key_test
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <83b43fdc1ca0e0cc287f66f11816fc71b8bd2925.1534430405.git.eliransin@scylladb.com>
This patch fixes several cases where it was disallowed to create
a materialized view with a filter ("where ..."), for no good reason.
After this patch, these cases will be allowed. Fixes #2367.
In ordinary SELECT queries, certain types of filtering which are known to
be deceptively inefficient are not allowed. For example, trying to query
a range of partition keys cannot be done without reading the entire
database (because the murmur3 tokenizer randomizes the order of partitions).
Restricting two partition key components also cannot be done without
reading excessive amount of the entire partition. So Scylla, following
Cassandra, chooses to disallow such SELECT queries, and give an error
message.
However, the same SELECT statements *should* be allowed when defining a
materialized view. In this case, the filter is just used to check an
individual row - not to search for one - so there is no performance
concern.
Unfortunately the existing code did these validations while building the
SELECT statement's "restrictions", in code shared by both uses of SELECT
(query and MV definition). It was easy to move one of the validations
to later code which runs after the restriction has already been built (and
knows if it is working for query or MV), but because of the way the
"restrictions" objects (translated from Cassandra 2's code) hide what they
contain, many of the checks are harder to perform after having built the
restrictions object. So instead, we add in strategic places in the
restriction-handling code a new "allow_filtering" flag. If restrictions
are built with allow_filtering=true, the extra performance-oriented tests
on the filtering restrictions are not done. Materialized views set
allow_filtering=true.
The allow_filtering flag will also be useful later when we want to support
the "ALLOW FILTERING" query option which is currently not supported properly
(we have several open issues on that). However note that this patch doesn't
complete that support: I left a FIXME in the spot where we set
allow_filtering in the Materialized Views case, but in the future we also need
to set it if the user specified "ALLOW FILTERING" in the query.
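A rough sketch of how such a flag short-circuits the performance-oriented checks. The struct, the single `is_range` check, and the function names are hypothetical simplifications of the restriction-handling code:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical simplified model of partition key restrictions.
struct partition_key_restrictions {
    bool is_range = false;  // e.g. a range restriction on the partition key
};

// Throws for filters that would force a scan, unless filtering is allowed.
inline void validate(const partition_key_restrictions& r, bool allow_filtering) {
    if (allow_filtering) {
        return;  // e.g. view definitions, which only check one row at a time
    }
    if (r.is_range) {
        throw std::invalid_argument("range restrictions on the partition key are not supported");
    }
}

// Convenience wrapper reporting whether the restrictions were accepted.
inline bool accepted(const partition_key_restrictions& r, bool allow_filtering) {
    try {
        validate(r, allow_filtering);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```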
This patch also enables several unit tests written by Duarte which used to
fail because of this bug, and now pass. These tests verify that the
restrictions are now allowed and filter the view as desired, but I also
added test code to verify that the same restrictions are still forbidden,
as before, when used in ordinary SELECT queries.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180423124343.17591-1-nyh@scylladb.com>
This is a confusing one, and can be replaced by the fact that dense
schemas have a single regular column.
Ref #1542
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A compact column is a dense schema's single regular column. The fact
that it is a different column_kind has led to various bugs (#1535,
derived by the schema being dense and the column being regular.
Fixes #1542
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Values inside IN () restrictions may be either in a vector _in_values or
a marker (_in_marker or _value). To determine which one is appropriate
we check whether _in_values is empty, which is wrong because an IN clause
can be empty (and there is no marker in such a case). This is fixed by
using the presence of a marker to determine whether a vector of values
or a marker should be used.
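A sketch of the corrected decision, using a hypothetical simplified relation type (the member names loosely follow the ones mentioned above, without the leading underscores):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified IN relation: either literal values or a bind marker.
struct in_relation {
    std::vector<int> in_values;            // may legitimately be empty: `c IN ()`
    std::optional<std::string> in_marker;  // set for `c IN ?`
};

enum class in_source { values, marker };

// Decide based on the marker's presence, not on whether the value list is empty.
inline in_source source_of(const in_relation& r) {
    return r.in_marker ? in_source::marker : in_source::values;
}
```

An empty `IN ()` has no marker, so it is correctly treated as a (zero-length) list of values.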
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>