When executing internal queries, it is important that the developer
decides whether to cache the query internally, since internal
queries are cached indefinitely. It is also important that the programmer
is aware of whether caching is going to happen.
The code contained two "groups" of `query_processor::execute_internal`
overloads: one group caches by default and the other doesn't.
Here we add overloads that eliminate the default values for the caching
behaviour, forcing an explicit parameter for the caching decision.
All call sites were changed to preserve the caching default that was
previously in effect.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
`execute_internal` has a parameter that indicates whether caching a
prepared statement is needed for a specific call. However, this parameter
was a boolean, so it was easy to miss its meaning at the various call sites.
This replaces the parameter type with a more verbose one, so it is clear
from the call site what decision was made.
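The two commits above can be sketched together: the defaulted boolean becomes a self-documenting type that must be passed explicitly. This is a minimal sketch with invented stand-in types; the real `query_processor` API differs (Scylla would typically use a seastar::bool_class tag rather than a plain enum).

```cpp
#include <cassert>
#include <string>

// A verbose, explicit replacement for `bool cache = true`. The name
// cache_internal is an invented stand-in for illustration.
enum class cache_internal { no, yes };

// Stand-in for query_processor: it only records the decision made.
struct query_processor {
    cache_internal last = cache_internal::no;

    // Old (removed) signature:
    //   void execute_internal(const std::string& q, bool cache = true);
    // New signature: no default value, so every call site must spell
    // out the caching decision.
    void execute_internal(const std::string& /*query*/, cache_internal cache) {
        last = cache;
    }
};
```

A call now reads `qp.execute_internal(q, cache_internal::yes)`, which makes the caching decision visible when reviewing call sites.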
Slice restrictions are not allowed on the "duration" type, nor on a
collection, tuple or UDT containing durations. We made an effort to
print a helpful message for the specific case encountered, such as "Slice
restrictions are not supported on UDTs containing duration".
But the if()s were in the wrong order, meaning that a UDT - which is also
a tuple - was reported as a tuple instead of a UDT, as we intended (and as
Cassandra reports it).
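The fix can be illustrated with a toy version of the check: since every UDT is also a tuple, the UDT branch must come first. The predicates below are invented stand-ins for the real type checks.

```cpp
#include <cassert>
#include <string>

// Toy model of the type relationship: every UDT also reports
// is_tuple == true, which is what made the original if() order wrong.
struct type_info {
    bool is_tuple;
    bool is_udt;
};

std::string slice_error(const type_info& t) {
    // Correct order: the most specific check comes first.
    if (t.is_udt) {
        return "Slice restrictions are not supported on UDTs containing duration";
    }
    if (t.is_tuple) {
        return "Slice restrictions are not supported on tuples containing duration";
    }
    return "Slice restrictions are not supported on durations";
}
```

With the branches reversed, the UDT case would fall into the tuple branch and produce the wrong message.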
The wrong message was reproduced in the unit test translated from
Cassandra, select_test.py::testFilteringOnUdtContainingDurations
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220428071807.1769157-1-nyh@scylladb.com>
If we have the filter expression "WHERE m[?] = 2", the existing code
simply assumed that the subscript is an object of the right type.
However, while it should indeed be the right type (we already have code
that verifies that), there are two more options: It can also be a NULL,
or an UNSET_VALUE. Either of these cases causes the existing code to
dereference a non-object as an object, leading to bizarre errors (as
in issue #10361) or even crashes (as in issue #10399).
Cassandra returns an invalid request error in these cases: "Unsupported
unset map key for column m" or "Unsupported null map key for column m".
We decided to do things differently:
* For NULL, we consider m[NULL] to result in NULL - instead of an error.
This behavior is more consistent with other expressions that contain
null - for example NULL[2] and NULL<2 both result in NULL as well.
Moreover, if in the future we allow more complex expressions, such
as m[a] (where a is a column), we can find the subscript to be null
for some rows and non-null for other rows - and throwing an "invalid
query" in the middle of the filtering doesn't make sense.
* For UNSET_VALUE, we do consider this an error like Cassandra, and use
the same error message as Cassandra. However, the current implementation
checks for this error only when the expression is evaluated - not
before. It means that if the scan is empty before the filtering, the
error will not be reported and we'll silently return an empty result
set. We currently consider this ok, but we can also change this in the
future by binding the expression only once (today we do it on every
evaluation) and validating it once after this binding.
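The two cases above can be sketched with stand-in types (the names `null_t`, `unset_t`, and `subscript_map` are invented for this illustration; the real evaluator works on serialized values):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <variant>

// A map subscript can be a real value, NULL, or UNSET_VALUE.
struct null_t {};
struct unset_t {};
using subscript_key = std::variant<int, null_t, unset_t>;

std::optional<int> subscript_map(const std::map<int, int>& m, const subscript_key& key) {
    if (std::holds_alternative<unset_t>(key)) {
        // UNSET is an invalid request; the wording follows Cassandra,
        // as quoted above.
        throw std::invalid_argument("Unsupported unset map key for column m");
    }
    if (std::holds_alternative<null_t>(key)) {
        return std::nullopt;  // m[NULL] evaluates to NULL, not an error
    }
    auto it = m.find(std::get<int>(key));
    return it == m.end() ? std::nullopt : std::optional<int>(it->second);
}
```

Note that, as the commit says, the UNSET check here runs at evaluation time, so it fires only when a row is actually filtered.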
Fixes #10361
Fixes #10399
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When we have a filter such as "WHERE m[2] = 3" (where m is a map
column), if a row had a null value for m, our expression evaluation
code incorrectly dereferenced an unset optional and continued
processing the result of this dereference, which resulted in undefined
behavior - sometimes we were lucky enough to get a "marshaling error",
but other times Scylla crashed.
The fix is trivial - just check before dereferencing the optional value
of the map. We return null in that case, which means that we consider
the result of null[2] to be null. I think this is a reasonable approach
and fits our overall approach of making null dominate expressions (e.g.,
the value of "null < 2" is also null).
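The trivial fix can be sketched as follows (the names here are invented; in the real code the column value is a serialized optional):

```cpp
#include <cassert>
#include <map>
#include <optional>

// The column value itself may be missing (a null map), so the optional
// must be checked before dereferencing; null[key] evaluates to null.
using map_value = std::optional<std::map<int, int>>;

std::optional<int> eval_subscript(const map_value& m, int key) {
    if (!m) {
        return std::nullopt;  // the fix: null map -> null, no dereference
    }
    auto it = m->find(key);   // safe: m is engaged here
    return it == m->end() ? std::nullopt : std::optional<int>(it->second);
}
```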
The test test_filtering.py::test_filtering_null_map_with_subscript,
which used to frequently fail with marshaling errors or crashes, now
passes every time so its "xfail" mark is removed.
Fixes #10417
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.
This behaviour is unexpected and potentially
dangerous, so it is better to forbid
passing a subscript to possible_lhs_values entirely.
Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are the possible values of a subscripted column c where c[0] = 1?
All lists that have 1 at position 0?
If we wanted to handle this nicely we would have to
change the arguments.
Such refactoring is best left until the time
when this functionality is actually needed;
right now it's hard to predict what interface
will be required then.
Signed-off-by: cvybhu <jan.ciolek@scylladb.com>
Closes #10228
The STORAGE option is designed to hold a map of options
used for customizing storage for a given keyspace.
The option is kept in the system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.
Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;
keyspace_name | storage_options | storage_type
---------------+------------------------------------------------+--------------
ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} | S3
```
Makes the final function and initial condition optional when
creating a UDA. Without a final function the UDA returns the final
state, and the default initial condition is `null`.
Fixes: #10324
As the name suggests, for UDFs defined as RETURNS NULL ON NULL
INPUT, we sometimes want to return nulls. However, currently
we do not return nulls; instead, we fail the null check in
init_arg_visitor. Fix by adding null handling before passing
the arguments, the same as in Lua.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Closes #10298
When a query contains an IN restriction on its partition key,
it is currently not eligible for indexing. It was however
erroneously qualified as such, which led to fetching incorrect
results. This commit fixes the issue by not allowing such queries
to undergo indexing, and comes with a regression test.
Fixes #10300
Closes #10302
A user pointed out a misleading error message produced when
an indexed column is queried along with an IN relation
on the partition key. The message suggests that such queries are
not supported, but they are supported - just without indexing.
In particular, with ALLOW FILTERING, such queries are perfectly
fine.
Closes #10299
The error message incorrectly stated that the timeout value cannot
be longer than 24h, but it can - the actual restriction is that the
value cannot be expressed in units like days or months. This was done
to significantly simplify the parsing routines (and because timeouts
counted in days are not expected to be common).
Fixes #10286
Closes #10294
By way of having an implementation of `data_dictionary` and using that.
The schema loader only needs a database to parse cql3 statements, which
are all coordinator-side objects and hence have largely been migrated to
use the data dictionary instead.
A few hard-dependencies on replica:: objects were found and resolved:
* index::secondary_index_manager
* tombstone_gc
The former was migrated to use `data_dictionary::table` instead of
`replica::table`. This in turn requires disentangling
`replica::data_dictionary_impl` from `replica::database`, as currently
the former can only really be used by the latter.
What all of this achieves is that we no longer have to instantiate a
`replica::database` object in `tools::load_schema()`. We want to use the
standard allocator in tools, which means they cannot use LSA memory at
all. The database, on the other hand, creates memtable and row-cache
instances, so it had to go.
Refs: #9882
Tests: unit(dev, schema_loader_test:debug,
cql-pytest/test_tools.py:debug)
* 'tools-schema-loader-database-impl/v2' of https://github.com/denesb/scylla:
tools/schema_loader: use own data dictionary impl
tombstone_gc: switch to using data dictionary
index/secondary_index_manager: switch to using data dictionary
replica/table: add as_data_dictionary()
replica: disentangle data_dictionary_impl from database
replica: move data_dictionary_impl into own header
But only on the surface, the only internal function needing the database
(`needs_repair_before_gc()`) still gets a real database because the
replication factor cannot be obtained from the data dictionary
currently. Although this might not look like an improvement, it is
enough to avoid a `real_database()` call for tables that don't have
tombstone gc mode set to repair.
is_supported_by checks whether a given restriction
can be supported by some index.
Currently, when a subscripted value, e.g. `m[1]`, is encountered,
we ignore the fact that there is a subscript and ask
whether an index can support `m` itself.
This looks like unintentional behaviour leftover
from the times when column_value had a sub field,
which could be easily forgotten about.
Scylla doesn't support indexes on collection elements at all,
so simply returning false there seems like a good idea.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes #10227
Commit 1c99ed6ced added tracing logs
about the index chosen for the query, but aggregate queries have
a separate code path, which wasn't taken into account.
After this patch, tracing for aggregate queries also includes
this additional information.
Closes #10195
The problem was an incompatibility with Cassandra, which accepts a bool
as a string in the `fromJson()` UDF. The remaining difference between
Cassandra and Scylla is that Scylla accepts whitespace around the word
in the string, while Cassandra doesn't. Both are case-insensitive.
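The accepted behaviour can be sketched as a small parser (the function name is invented; the real conversion happens inside the JSON-to-CQL machinery):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Accept a JSON string as a bool: surrounding whitespace is tolerated
// and matching is case-insensitive, per the behaviour described above.
std::optional<bool> parse_json_bool(std::string s) {
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    // Trim leading and trailing whitespace.
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    // Lowercase for case-insensitive comparison.
    for (auto& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (s == "true")  return true;
    if (s == "false") return false;
    return std::nullopt;  // not a bool
}
```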
Fixes: https://github.com/scylladb/scylla/issues/7915
Closes #10134
* github.com:scylladb/scylla:
CQL3/pytest: Updating test_json
CQL3: fromJson accepts string as bool
It seems that batch prepared statements always return false for
depends_on_keyspace and depends_on_column_family. This in turn
makes the removal criteria from the cache always false, with the
result that the queries are never evicted.
Here we change the functions to return the true state, meaning
they return true if any of the sub-queries depends on
the keyspace or column family.
In this fix we first make the API more coherent and then use this new
API to implement the batch statement's dependency test.
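The fix can be sketched with stand-in types (the real prepared-statement classes are more involved; these names are invented for illustration):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a single prepared statement.
struct statement {
    std::string ks, cf;
    bool depends_on(const std::string& keyspace, const std::string* cf_opt) const {
        return ks == keyspace && (!cf_opt || cf == *cf_opt);
    }
};

// Stand-in for a batch: its dependency test is the OR of its
// sub-statements'. Previously this returned false unconditionally,
// so batches were never evicted from the prepared-statement cache.
struct batch_statement {
    std::vector<statement> statements;
    bool depends_on(const std::string& keyspace, const std::string* cf_opt) const {
        for (const auto& s : statements) {
            if (s.depends_on(keyspace, cf_opt)) {
                return true;
            }
        }
        return false;
    }
};
```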
Fixes #10129
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes #10132
* github.com:scylladb/scylla:
prepared_statements: Invalidate batch statement too
cql3 statements: Change dependency test API to express better it's purpose
Currently a subscripted column is expressed using the struct `column_value`:
```c++
/// A column, optionally subscripted by a value (e.g., c1 or c2['abc']).
struct column_value {
    const column_definition* col;
    std::optional<expression> sub; ///< If present, this LHS is col[sub], otherwise just col.
};
```
It would be better to have a generic AST node for expressing arbitrary subscripted values:
```c++
/// A subscripted value, e.g. list_column[2], val[sub]
struct subscript {
    expression val;
    expression sub;
};
```
The `subscript` struct would allow us to express more, for example:
* subscripted `column_identifier`, not only `column_definition` (needed to get rid of `relation` class)
* nested subscripts: `col[1][2]`
Adding `subscript` to the `expression` variant immediately would require implementing all the `expr::visit` handlers in the same commit, so I took a different approach: at first the struct is just there, and the visit handlers are implemented one by one in advance; then, at the end, `subscript` is added to `expression`. This way all the new code can be neatly divided into commits and everything is still bisectable.
There were a few cases where the existing behaviour seemed to make little sense, but I didn't change it, to keep the PR focused on refactoring. I left `FIXME` comments there and will submit separate patches to fix them.
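A toy model of the approach: `subscript` can only join the `expression` variant once std::visit handlers exist for it, which is why the handlers were written first. All names here are simplified stand-ins for the real AST.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <variant>

struct column_value { std::string name; };
struct subscript;  // forward declaration; the node is added to the variant last

using expression = std::variant<column_value, std::shared_ptr<subscript>>;

struct subscript { expression val; expression sub; };

// A visit handler that covers every alternative; adding subscript to the
// variant without this branch would fail to compile.
std::string to_string(const expression& e) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, column_value>) {
            return v.name;
        } else {
            return to_string(v->val) + "[" + to_string(v->sub) + "]";
        }
    }, e);
}
```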
Closes #10139
* github.com:scylladb/scylla:
cql3: expr: Remove sub from column_value
cql3: Create a subscript in single_column_relation
cql3: expr: Add subscript to expression
cql3: Handle subscript in multi_column_range_accumulator
cql3: Handle subscript in selectable_process_selection
cql3: expr: Handle subscript in test_assignment
cql3: expr: Handle subscript in prepare_expression
cql3: Handle subscript in prepare_selectable
cql3: expr: Handle subscript in extract_clustering_prefix_restrictions
cql3: expr: Handle subscript in extract_partition_range
cql3: expr: Handle subscript in fill_prepare_context
cql3: expr: Handle subscript in evaluate
cql3: expr: Handle subscript in extract_single_column_restrictions_for_column
cql3: expr: Handle subscript in search_and_replace
cql3: expr: Handle subscript in recurse_until
cql3: expr: Implement operator<< for subscript
cql3: expr: Handle subscript in possible_lhs_values
cql3: expr: Handle subscript in is_supported_by
cql3: expr: Handle subscript in is_satisifed_by
cql3: expr: Remove unused attribute
cql3: expr: Use column_maybe_subscripted in is_one_of()
cql3: expr: Use column_maybe_subscripted in limits()
cql3: expr: Use column_maybe_subscripted in equal()
cql3: expr: add get_subscripted_column(column_maybe_subscripted)
cql3: expr: Add as_column_maybe_subscripted
cql3: expr: Make get_value_comparator work with column_maybe_subscripted
cql3: expr: Make get_value work with column_maybe_subscripted
cql3: expr: Add column_maybe_subscripted
cql3: expr: Add get_subscripted_column
cql3: expr: Add subscript struct
The problem was an incompatibility with Cassandra, which accepts a bool
as a string in the `fromJson()` UDF. The remaining difference between
Cassandra and Scylla is that Scylla accepts whitespace around the word
in the string, while Cassandra doesn't. Both are case-insensitive.
Fixes: #7915
Passing an integer which exceeds the corresponding type's bounds to
`fromJson()` was causing a silent overflow, e.g. inserting
`fromJson('2147483648')` into an `int` column stored `-2147483648`.
Now this causes a marshal_exception. All integer types are tested against their bounds.
Tests referring to issue https://github.com/scylladb/scylla/issues/7914 in `test/cql-pytest/cassandra_tests/validation/entities/json_test.py` won't pass because the expected error messages differ from the thrown ones. I was wondering what the message should be, because the expected messages in the tests aren't consistent, for instance:
- bigint overflow expects `Expected a bigint value, but got a` message
- short overflow expects `Unable to make short from` message
For now the message is `Value {} out of bound`.
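The bounds check can be sketched as follows (the function name is invented; the real code throws marshal_exception rather than a standard exception):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Instead of silently truncating, parse into a wider type first and
// range-check against the column type's bounds. The error text mirrors
// the "Value {} out of bound" message quoted above.
int32_t json_to_int32(const std::string& json) {
    long long v = std::stoll(json);
    if (v < INT32_MIN || v > INT32_MAX) {
        throw std::runtime_error("Value " + json + " out of bound");
    }
    return static_cast<int32_t>(v);
}
```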
Fixes: https://github.com/scylladb/scylla/issues/7914
Closes #10145
* github.com:scylladb/scylla:
CQL3/pytest: Updating test_json
CQL3: fromJson out of range integer cause as error
Passing an integer which exceeds the corresponding type's bounds to
`fromJson()` was causing a silent overflow, e.g. inserting
`fromJson('2147483648')` into an `int` column stored `-2147483648`.
Now this causes a marshal_exception with a value-out-of-bound
message. Also, all integer types are tested against their bounds.
Fixes: #7914
Recently, coordinator_result was introduced as an alternative for
exceptions. It was placed in the main "exceptions/exceptions.hh" header,
which virtually every single source file in Scylla includes.
But unfortunately, it brings in some heavy header files and templates,
leading to a lot of wasted build time - ClangBuildAnalyzer measured that
we include exceptions.hh in 323 source files, taking almost two seconds
each on average.
In this patch, we split the coordinator_result feature into a separate
header file, "exceptions/coordinator_result", and only the few places
which need it include the header file. Unfortunately, some of these
few places are themselves headers, so the new header file ends up being
included in 100 source files - but 100 is still much less than 323, and
perhaps we can reduce this number further later.
After this patch, the total Scylla object-file size is reduced by 6.5%
(the object size is a proxy for build time, which I didn't directly
measure). ClangBuildAnalyzer reports that now each of the 323 includes
of exceptions.hh only takes 80ms, coordinator_result.hh is only included
100 times, and virtually all the cost to include it comes from Boost's
result.hh (400ms per inclusion).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>
column_value::sub has been replaced by the subscript struct
everywhere, so we can finally remove it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When `val[sub]` is parsed, a column_value with a sub field used to be
created. This has now been changed to create a subscript struct.
This is the only place where a subscripted value can be created.
All the code regarding subscripts now operates using only the
subscript struct, so we will be able to remove column_value::sub soon.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
All handlers for subscript have finally been implemented
and subscript can now be added to expression without
any trouble.
All the commented-out code that was waiting for this moment
can now be uncommented.
Every such piece of code had a `TODO(subscript)` note
and by grepping this phrase we can make sure that
we didn't forget any of them.
Right now there are two ways to express a subscripted
column: either by a column_value with a sub field,
or by using a subscript struct.
The grammar still uses the old column_value way,
but column_value.sub will be removed soon
and everything will move to the subscript struct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_clustering_prefix_restrictions collects restrictions
on clustering key columns.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
The code has been copied from column_value& handler.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_partition_range collects restrictions on partition key columns.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
The code has been copied from column_value& handler.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
fill_prepare_context collects useful information about
the expression involved in query restrictions.
We should collect this information from subscript as well,
just like we do from column_value and its sub.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
extract_single_column_restrictions_for_column finds all restrictions
for a column and puts them in a vector.
In case we encounter col[sub] we treat it as a restriction on col
and add it to the result.
This seems to make some sense and is in line with the current behaviour
which doesn't check whether a column is subscripted at all.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Prepare a handler for subscript in search_and_replace.
Some of the code must be commented out for now,
because subscript hasn't been added to expression yet.
It will be uncommented later.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
possible_lhs_values returns the set of possible values
for a column given some restrictions.
Current behaviour in case of a subscripted column
is to just ignore the subscript and treat
the restriction as if it were on just the column.
This seems wrong, or at least confusing,
but I won't change it in this patch to preserve the existing behaviour.
Trying to change this to something more reasonable
breaks other code which assumes that possible_lhs_values
returns a list of values.
(See partition_ranges_from_EQs() in cql3/restrictions/statement_restrictions.cc)
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_supported_by checks whether the given expression
is supported by some index.
The current behaviour seems wrong, but I kept
it to avoid making changes in a refactor PR.
Scylla doesn't have indexes on map entries yet,
so for a subscript the answer is always no.
I think we should just return false there.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
For the most part subscript can be handled
in the same way as column_value.
column_value has a sub field, and all the
called functions evaluate the lhs value using
get_value(), which is prepared to handle
subscripted columns.
These functions now take column_maybe_subscripted,
so we can pass &subscript to them without a problem.
The difference is in CONTAINS, CONTAINS_KEY and LIKE.
contains() and contains_key() throw an exception
when the passed column has a subscript, so now
we just throw an exception immediately.
like() doesn't have a check for subscripted values,
but from reading its code it's clear that
it's not ready to handle them,
so an exception is now thrown there as well.
This shouldn't break any tests, because when one tries
to perform a query like:
`select * from t where m[0] like '%' allow filtering;`
an exception is thrown earlier in the code.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Functions that were previously marked as unused to make the code
compile are now used and we can remove the markings.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
is_one_of() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
limits() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
equal() used to take column_value which could be subscripted as an argument.
column_value.sub will be removed so this function needs to take column_maybe_subscripted now.
To get lhs value the code uses get_value() which is ready to handle subscripted columns.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add a function that extracts the column_value
from column_maybe_subscripted.
There were already overloads for expression and subscript,
but this one will be needed as well.
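A toy version of the helper described above (the real column_maybe_subscripted is built from the actual AST types; these simplified stand-ins are for illustration only):

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

struct column_value { std::string name; };
struct subscript { column_value val; int sub; };

// Either a plain column or a subscripted one.
using column_maybe_subscripted = std::variant<column_value, subscript>;

// Extract the underlying column in both cases.
column_value get_subscripted_column(const column_maybe_subscripted& c) {
    return std::visit([](const auto& v) -> column_value {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, subscript>) {
            return v.val;  // peel off the subscript
        } else {
            return v;      // already a plain column
        }
    }, c);
}
```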
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>