This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionTest.java into our cql-pytest
framework.
This test file checks various LWT conditional updates which involve
collections or UDTs (there is a separate test file for LWT conditional
updates which do not involve collections, which I haven't translated
yet).
The tests reproduce one known bug:
Refs #5855: lwt: comparing NULL collection with empty value in IF
condition yields incorrect results
And also uncovered three previously-unknown bugs:
Refs #13586: Add support for CONTAINS and CONTAINS KEY in LWT expressions
Refs #13624: Add support for UDT subfields in LWT expression
Refs #13657: Misformatted printout of column name in LWT error message
Beyond those bona-fide bugs, this test also demonstrates several places
where we intentionally deviated from Cassandra's behavior, forcing me
to comment out several checks. These deviations are known, and intentional,
but some of them are undocumented and it's worth listing here the ones
re-discovered by this test:
1. On a successful conditional write, Cassandra returns just True, Scylla
also returns the old contents of the row. This difference is officially
documented in docs/kb/lwt-differences.rst.
2. Scylla allows the test "l = [null]" or "s = {null}" with this weird
null element (the result is false), whereas Cassandra prints an error.
3. Scylla allows "l[null]" or "m[null]" (resulting in null), Cassandra
prints an error.
4. Scylla allows a negative list index, "l[-2]", resulting in null.
Cassandra prints an error in this case.
5. Cassandra allows in "IF v IN (?, ?)" to bind individual values to
UNSET_VALUE and skips them, Scylla treats this as an error. Refs #13659.
6. Scylla allows "IN null" (the condition just fails), Cassandra prints
an error in this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13663
Let's remove expr::token and replace all of its functionality with expr::function_call.
expr::token is a struct whose job is to represent a partition key token.
The idea is that when the user types in `token(p1, p2) < 1234`,
this will be internally represented as an expression which uses
expr::token to represent the `token(p1, p2)` part.
The situation with expr::token is a bit complicated.
On one hand side it's supposed to represent the partition token,
but sometimes it's also assumed that it can represent a generic
call to the token() function, for example `token(1, 2, 3)` could
be a function_call, but it could also be expr::token.
The query planning code assumes that each occurence of expr::token
represents the partition token without checking the arguments.
Because of this allowing `token(1, 2, 3)` to be represented
as expr::token is dangerous - the query planning
might think that it is `token(p1, p2, p3)` and plan the query
based on this, which would be wrong.
Currently expr::token is created only in one specific case.
When the parser detects that the user typed in a restriction
which has a call to `token` on the LHS it generates expr::token.
In all other cases it generates an `expr::function_call`.
Even when the `function_call` represents a valid partition token,
it stays a `function_call`. During preparation there is no check
to see if a `function_call` to `token` could be turned into `expr::token`.
This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented
as `expr::token` and the query planner handles that, but sometimes it might
be represented as `function_call`, which the query planner doesn't handle.
There is also a problem because there's a lot of duplication
between a `function_call` and `expr::token`. All of the evaluation
and preparation is the same for `expr::token` as it's for a `function_call`
to the token function. Currently it's impossible to evaluate `expr::token`
and preparation has some flaws, but implementing it would basically
consist of copy-pasting the corresponding code from token `function_call`.
One more aspect is multi-table queries. With `expr::token` we turn
a call to the `token()` function into a struct that is schema-specific.
What happens when a single expression is used to make queries to multiple
tables? The schema is different, so something that is representad
as `expr::token` for one schema would be represented as `function_call`
in the context of a different schema.
Translating expressions to different tables would require careful
manipulation to convert `expr::token` to `function_call` and vice versa.
This could cause trouble for index queries.
Overall I think it would be best to remove expr::token.
Although having a clear marker for the partition token
is sometimes nice for query planning, in my opinion
the pros are outweighted by the cons.
I'm a big fan of having a single way to represent things,
having two separate representations of the same thing
without clear boundaries between them causes trouble.
Instead of having expr::token and function_call we can
just have the function_call and check if it represents
a partition token when needed.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
`os.P_NOWAIT` is supposed to be used in spawn calls, while `os.WNOHANG`
is used as in the options parameter passed to wait calls. fortunately,
`P_NOWAIT` is defined as "1" in CPython, and `os.WNOHANG` is defined
as "1" in linux kernel. that's why the existing implementation works.
but we should not rely on this coincidence. so, in this change,
`os.P_NOWAIT` is replaced with `os.WNOHANG` for correctness and for
better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13646
This patch contains tests reproducing issue #13601 and the corresponding
Cassandra issue CASSANDRA-18470. These issues are about what the AVG
aggregation does for arbitrary-precision "decimal" numbers - the tests
we add here show examples where the current behavior doesn't make sense:
The problem is that "decimal" has arbitrary precision - so, should an
average of 1/3 be returned as 0.3 or 0.33333333333333333? This is not
specified, so Scylla (and Cassandra) decided to pick the result precision
based on the input precision. In particular, the average of 1 and 2
is returned as 2 (zero digits after the decimal point, like in the
inputs) instead of the expected 1.5. Arguably this isn't useful behavior.
The test adds a second test which fails on Cassandra, but does pass
on Scylla: Cassandra returns as the average of 1, 2, 2, 3 the integer 1
whereas the correct average is 2 (and Scylla returns it correctly).
The reason why this bug is even worse on Cassandra is that Scylla's AVG
only loses precision when dividing the sum and count, but Cassandra
tries to maintain only the average, and loses precision at every step.
Refs #13601
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13603
This `with` context is supposed to disable, then re-enable
autocompaction for the given keyspaces, but it used the wrong API for
it, it used the column_family/autocompaction API, which operates on
column families, not keyspaces. This oversight led to a silent failure
because the code didn't check the result of the request.
Both are fixed in this patch:
* switch to use `storage_service/auto_compaction/{keyspace}` endpoint
* check the result of the API calls and report errors as exceptions
Fixes: #13553Closes#13568
When floating-point data contains +Inf and -Inf, the sum is NaN.
Our SUM() aggregation calculated this sum correctly, but then instead
of returning it, complained that the sum overflowed by narrowing.
This was a false positive: The sum() finalizer wanted to test that no
precision was lost when casting the accumulator to the result type,
so checked that the result before and after the cast are the same.
But specifically for NaN, it is never equal to anything - not even
to itself. This check is wrong for floating point, but moreover -
isn't even necessary when the two types (accumulator type and result
type) are identical so in this patch we skip it in this case.
Note that in the current code, a different accumulator and result type
is only used in the case of integer types; When accumulating floating
point sums, the same type is used, so the broken check will be avoided.
The test for this issue starts to pass with this patch, so the xfail
tag is removed.
Fixes#13551
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds tests to reproduce issue #13551. The issue, discovered
by a dtest (cql_cast_test.py), claimed that either cast() or sum(cast())
from varint type broke. So we add two tests in cql-pytest:
1. A new test file, test_cast_data.py, for testing data casts (a
CAST (...) as ... in a SELECT), starting with testing casts from
varint to other types.
The test uncovers a lot of interesting cases (it is heavily
commented to explain these cases) but nothing there is wrong
and all tests pass on Scylla.
2. An xfailing test for sum() aggregate of +Inf and -Inf. It turns out
that this caused #13551. In Cassandra and older Scylla, the sum
returned a NaN. In Scylla today, it generates a misleading
error message.
As usual, the tests were run on both Cassandra (4.1.1) and Scylla.
Refs #13551.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
scylla-sstable currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable is inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like qurantine, staging etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126Closes#13448
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add tests for schema loading
test/cql-pytest: add no_autocompaction_context
docs: scylla-sstable.rst: remove accidentally added copy-pasta
docs: scylla-sstable.rst: remove paragraph with schema limitations
docs: scylla-sstable.rst: update schema section
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
before this change, we just print out the addresses of the elements
in `column_defs`, if the arguments passed to `token()` function are
not valid. this is not quite helpful from the user's perspective. as
user would be more interested in the values. also, we could print
more accurate error message for different error.
in this change, following Cassandra 4.1's behavior, three cases are
identified, and corresponding errors are returned respectively:
* duplicated partition keys
* wrong order of partition key
* missing keys
where, if the partition key order is wrong, instead of printing the
keys specified by user, the correct order is printed in the error
message for helping user to correct the `token()` function.
for better performance, the checks are performed only if the keys
do not match, based on the assumption that the error handling path
is not likely to be executed.
tests are added accordingly. they tested with Canssandra 4.1.1 also.
Fixes#13468
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13470
Description of storage options is important for S3, as one
needs to know if underlying storage is either local or
remote, and if the latter, details about it.
This relies on server-side desc statement.
$ ./bin/cqlsh.py -e "describe keyspace1;"
CREATE KEYSPACE keyspace1 WITH replication = { ... } AND
storage = {'type': 'S3', 'bucket': 'sstables',
'endpoint': '127.0.0.1:9000'} AND
durable_writes = true;
Fixes#13507.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13510
A set of comprehensive tests covering all the supported ways of providing
the schema to scylla-sstable, either explicitely or implicitely
(auto-detect).
It would have been better if `flush()` could have been called with a
keyspace and optional table param, but changing it now is too much
churn, so we add a dedicated method to flush a keyspace instead.
This is a translation of Cassandra's CQL unit test source file
validation/operations/DeleteTest.java into our cql-pytest framework.
There are 51 tests, and they did not reproduce any previously-unknown
bug, but did provide additional reproducers for three known issues:
Refs #4244 Add support for mixing token, multi- and single-column
restrictions
Refs #12474 DELETE prints misleading error message suggesting ALLOW
FILTERING would work
Refs #13250 one-element multi-column restriction should be handled like
a single-column restriction
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13436
The facilities in run.py script allow launching scylla over temporary
directory, waiting for it to come alive, killing, etc. The limitation of
those is that the work-dir create for scylla is tighly coupled with its
pid. The object-storage test in next patches will need to check that the
sstables are preserved on scylla restart and this hard binding of
workdir to pid won't work.
This patch generalizes the scylla run/abort helpers to accept an
external directory to work on and adds a call to restart scylla process
over existing directory.
And one small related change here -- log file is opened in O_APPEND mode
so that restarted scylla process continues writing into the old file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This reverts commit 32fff17e19, reversing
changes made to 164afe14ad.
This series proved to be problematic, the new test introduced by it
failing quite often. Revert it until the problems are tracked down and
fixed.
`scylla-sstable` currently has two ways to obtain the schema:
* via a `schema.cql` file.
* load schema definition from memory (only works for system tables).
This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable *is* inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file.
This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override.
If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong.
A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes.
This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change.
Example:
```
$ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db
{"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}}
```
As seen above, subdirectories like `qurantine`, `staging` etc are also supported.
Fixes: https://github.com/scylladb/scylladb/issues/10126Closes#13075
* github.com:scylladb/scylladb:
docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section
test/cql-pytest: test_tools.py: add test for schema loading
test/cql-pytest: nodetool.py: add flush_keyspace()
tools/scylla-sstable: reform schema loading mechanism
tools/schema_loader: add load_schema_from_schema_tables()
db/schema_tables: expose types schema
before this change, we use `round(random.random(), 5)` for
the value of `bloom_filter_fp_chance` config option. there are
chances that this expression could return a number lower or equal
to 6.71e-05.
but we do have a minimal for this option, which is defined by
`utils::bloom_calculations::probs`. and the minimal false positive
rate is 6.71e-05.
we are observing test failures where the we are using 0 for
the option, and scylla right rejected it with the error message of
```
bloom_filter_fp_chance must be larger than 6.71e-05 and less than or equal to 1.0 (got 0)
```.
so, in this change, to address the test failure, we always use a number
slightly greater or equal to a number slightly greater to the minimum to
ensure that the randomly picked number is in the range of supported
false positive rate.
Fixes#13313
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13314
It would have been better if `flush()` could have been called with a
keyspace and optional table param, but changing it now is too much
churn, so we add a dedicated method to flush a keyspace instead.
This patch increases the connection timeout in the get_cql_cluster()
function in test/cql-pytest/run.py. This function is used to test
that Scylla came up, and also test/alternator/run uses it to set
up the authentication - which can only be done through CQL.
The Python driver has 2-second and 5-second default timeouts that should
have been more than enough for everybody (TM), but in #13239 we saw
that in one case it apparently wasn't enough. So to be extra safe,
let's increase the default connection-related timeouts to 60 seconds.
Note this change only affects the Scylla *boot* in the test/*/run
scripts, and it does not affect the actual tests - those have different
code to connect to Scylla (see cql_session() in test/cql-pytest/util.py),
and we already increased the timeouts there in #11289.
Fixes#13239
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13291
Cassandra detects when a batch has both an IF EXISTS and IF NOT EXISTS
on the same row, and complains this is not a useful request (after all,
it can never succeed, because the batch can only succeed if both conditions
are true, and that can't be if one checks IF EXISTS and the other
IF NOT EXISTS).
This patch adds a test, test_lwt_with_batch_conflict_1, which checks
that this case results in an error. It passes on Cassandra, but xfails
on Scylla which doesn't report an error in this case.
A second test, test_lwt_with_batch_conflict_2, shows that the detection
of the EXISTS / NOT EXISTS conflict is special, and other conflicts
such as having both "r=1" and "r=2" for the same row, are NOT detected
by Cassandra.
Refs #13011.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13270
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectMultiColumnRelationTest.java into our
cql-pytest framework.
The tests reproduce four already-known Scylla bugs and three new bugs.
All tests pass on Cassandra. Because of these bugs 9 of the 22 tests
are marked xfail, and one is marked skip (it crashes Scylla).
Already known issues:
Refs #64: CQL Multi column restrictions are allowed only on a clustering
key prefix
Refs #4178: Not covered corner case for key prefix optimization in filtering
Refs #4244: Add support for mixing token, multi- and single-column
restrictions
Refs #8627: Cleanly reject updates with indexed values where value > 64k
New issue discovered by these tests:
Refs #13217: Internal server error when null is used in multi-column relation
Refs #13241: Multi-column IN restriction with tuples of different lengths
crashes Scylla
Refs #13250: One-element multi-column restriction should be handled like a
single-column restriction
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13265
It turns out that Cassandra handles a restriction like `(c2) = (1)` just
like `c2 = 1`, and is not limited like multi-column restrictions. In
particular, this query works despite missing "c1", and may also use an
index if c2 is indexed.
But currently in Scylla, `(c2) = (1)` is handled like a multi-column
restriction, so complains if c2 is not the first clustering key column,
and cannot use an index.
This patch adds several tests demonstrating this difference between
Scylla and Cassandra (#13250). The xfailing tests pass on Cassandra
but fail on Scylla.
Refs #13250
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13252
This patch addes a reproducing test for issue #13241, where attempting
a SELECT restriction (b,c,d) IN ((1,2)) - where the tuple is shorter
than needed - crashes Scylla (on segmentation fault) instead of
generating a clean error as it should (and as done on Cassandra).
The test also demonstractes that if the tuple is longer than needed
(instead of shorter), the behavior is correct, and it is also
correct if "=" is used instead of IN. Only the combination of IN
and too-short tuple seems to be broken - but broken in a bad way
(can be used to crash Scylla).
Because the test crashes Scylla when fails, it is marked "skip".
Refs #13241
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13244
The translated Cassandra unit tests in cassandra_tests/validation/operations/
reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION
LIMIT - issue #5361, #5362 and #5363. Unfortunately, those test functions
are very long, and each test fails on all of these issues and a few more,
making it difficult to use these tests to verify when those tests have
been fixed. In other words, ideally a patch for issue 5361 should un-xfail
some reproducing test for this issue - but all the existing tests will
continue to fail after fixing 5361, because of other remaining bugs.
So in this patch, I created a new test file test_group_by.py with my own
tests for the GROUP BY feature. I tried to explore the different
capabilities of the GROUP BY feature, its different success and error
paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT.
As usual, I created many small test functions and not one huge test
function, and as a result we now have 5 xfailing tests which each
reproduces one bug and when the bug is fixed, it will start to pass.
All tests added here pass on Cassandra.
Refs #5361
Refs #5362
Refs #5363
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13136
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.
This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.
The tests reproduce six already-known Scylla issues and one new issue:
Already known issues:
Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
A new issue discovered by these tests:
Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13126
Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g when a column is supposed to be equal to two different values (`c = 0 AND c = 1`).
Empty partition ranges work properly, empty clustering range currently causes a crash (#13129).
I added tests for both of these cases.
Closes#13130
* github.com:scylladb/scylladb:
cql-pytest/test_lwt: test LWT update with empty clustering range
cql-pytest/test_lwt: test LWT update with empty partition range
CQL supports type casting using C-style casts.
For example it's possible to do: `blob_column = (blob)funcReturningInt()`
This functionality is pretty limited, we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes will stay unchanged after the conversion.
This means that it's legal to cast an int to blob (int is just a 4 byte blob), but it's illegal to cast a bigint to int (change 4 bytes -> 8 bytes).
This simplifies things, to cast we can just reinterpret the value as the other type.
Another use of C-style casts are type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type.
For example: `overloadedFunction((int)?)`
Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int.
An interesting thing is that such casts don't have to be explicit. CQL allows to put an int value in a place where a blob value is expected and it will be automatically converted without any explicit casting.
---
I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast.
But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected.
For example it was possible to do:
```cql
blob_column = (blob)funcReturningInt()
```
But this didn't work at all:
```cql
blob_column = (blob)(int)12323
```
It tried to prepare `untyped_contant(12323)` with a `blob` receiver, which fails.
This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way.
There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting.
Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does.
I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues.
This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility.
---
This PR improves `prepare_expression` for `expr::cast` in the following ways:
1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible
2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression.
3) Add pytest tests for C-style casts
`2)` Is needed for #12900, the other changes is just something I decided to do since I was already working on this piece of code.
Closes#13053
* github.com:scylladb/scylladb:
expr_test: more tests for preparing bind variables with type hints
prepare_expr: implement preparing expr::cast with no receiver
prepare_expr: use :user formatting in cast_prepare_expression
prepare_expr: remove std::get<> in cast_prepare_expression
prepare_expr: improve cast_prepare_expression
prepare_expr: improve readability in cast_prepare_expression
cql-pytest: test expr::cast in test_cast.py
This series aims to allow users to set permissions on user-defined functions.
The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissionsFixes: #5572Fixes: #10633Closes#12869
* github.com:scylladb/scylladb:
cql3: allow UDTs in permissions on UDFs
cql3: add type_parser::parse() method taking user_types_metadata
schema_change_test: stop using non-existent keyspace
cql3: fix parameter names in function resource constructors
cql3: handle complex types as when decoding function permissions
cql3: enforce permissions for ALTER FUNCTION
cql-pytest: add a (failing) test case for UDT in UDF
cql-pytest: add a test case for user-defined aggregate permissions
cql-pytest: add tests for function permissions
cql3: enforce permissions on function calls
selection: add a getter for used functions
abstract_function_selector: expose underlying function
cql3: enforce permissions on DROP FUNCTION
cql3: enforce permissions for CREATE FUNCTION
client_state: add functions for checking function permissions
cql-pytest: add a case for serializing function permissions
cql3: allow specifying function permissions in CQL
auth: add functions_resource to resources
Currently, when preparing an authorization statement on a specific
function, we're trying to "prepare" all cql types that appear in
the function signature while parsing the statement. We cannot
do that for UDTs, because we don't know the UDTs that are present
in the databse at parsing time. As a result, such authorization
statements fail.
To work around this problem, we postpone the "preparation" of cql
types until the actual statement validation and execution time.
Until then, we store all type strings in the resource object.
The "preparation" happens in the `maybe_correct_resource` method,
which is called before every `execute` during a `check_access` call.
At that point, we have access to the `query_processor`, and as a
result, to `user_types_metadata` which allows us to prepare the
argument types even for UDTs.
Currently, we're parsing types that appear in a function resource
using abstract_type::parse_type, which only works with simple types.
This patch changes it to db::marshal::type_parser::parse, which
can also handle collections.
We also adjust the test_grant_revoke_udf_permissions test so that
it uses both simple and complex types as parameters of the function
that we're granting/revoking permissions on.
Currently, the ALTER permission is only enforced on ALL FUNCTIONS
or on ALL FUNCTIONS IN KEYSPACE.
This patch enforces the permisson also on a specific function.
Our permissions system is currently incapable of figuring out
user-defined type definitions when preparing functions permissions.
This test case creates such a function, and it passes on Cassandra.
Preparing expr::cast had some artificial limitations.
Things like this worked:
`blob_col = (blob)funcReturnsInt()`
But this didn't:
`blob_col = (blob)(int)1234`
This is caused by the line:
`prepare_expression(c.arg, db, keyspace, schema_opt, receiver)`
Here the code prepares the expression to be cast using the original
receiver which was passed to cast_prepare_expression.
In the example above this meant that it tried to prepare
untyped_constant(1234) using a receiver with type blob.
This failed because an integer literal is invalid for a blob column.
To me it looks like a mistake. What it should do instead
is prepare the int literal using the type (int) and then
see if int can be cast to blob, by checking if these types
have compatible binary representation.
This can be achieved by using `cast_type_receiver` instead of `receiver`.
Making this small change makes it possible to use the cast
in many situations where it was previously impossible.
The tests have to be updated to reflect the change,
some of them ow deviate from Cassandra, so they have
to be marked scylla_only.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This test case checks that granting function permissions
result in correct serialization of the permissions - so that
reading system_auth.role_permissions and listing the permissions
via CQL with `LIST permission OF role` works in a compatible way
with both Scylla and Cassandra.
Add a test case which performs an LWT UPDATE, but the partition key
has 0 possible values, because it's supposed to be equal to two
different values.
Such queries used to cause problems in the past.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Our documentation states that writing an item with "USING TTL 0" means it
should never expire. This should be true even if the table has a default
TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no
USING TTL at all (i.e., it took the default TTL, instead of unlimited).
We had two xfailing tests demonstrating that Scylla's behavior in this
is different from Cassandra. Scylla's behavior in this case was also
undocumented.
By the way, Cassandra used to have the same bug (CASSANDRA-11207) but
it was fixed already in 2016 (Cassandra 3.6).
So in this patch we fix Scylla's "USING TTL 0" behavior to match the
documentation and Cassandra's behavior since 2016. One xfailing test
starts to pass and the second test passes this bug and fails on a
different one. This patch also adds a third test for "USING TTL ?"
with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a
missing "USING TTL".
The origin of this bug was that after parsing the statement, we saved
the USING TTL in an integer, and used 0 for the case of no USING TTL
given. This meant that we couldn't tell if we have USING TTL 0 or
no USING TTL at all. This patch uses an std::optional so we can tell
the case of a missing USING TTL from the case of USING TTL 0.
Fixes#6447
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13079
Add a test which performs an UPDATE and
tries to pass an UNSET_VALUE as a value
for the primary key.
There is also an LWT variant of this test
that tries to set an UNSET_VALUE
in the IF condition.
These two tests are analogous to
test_insert_update_where and
test_insert_update_where_lwt,
but use an UPDATE instead of INSERT.
It's useful to test UPDATE as well as INSERT.
When I was developing a fix for #13001
I initially added the condition for unset value
inside insert_statement, but this didn't handle
update statements. These two tests allowed me
to see that UPDATE still causes a crash.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#13058
When Fedora 37 came out, we discovered that its "pytest" script started
to run Python with the "-s" option, which caused problems for packages
installed personally via pip. We fixed this by adding our own wrapper
script test/pytest.
But this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2152171) was
already fixed in Fedora 37, and the new version already reached our
dbuild. So we no longer need this wrapper script. Let's remove it.
Fixes#12412
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13083
CQL supports C-style casts with the destination type specified
inside parenthesis e.g `blob_column = (blob)funcThatReturnsInt()`.
These casts can be used to convert values of types
that have compatible binary representation, or as a type hint
to specify the type where the situation is ambiguous.
I didn't find any cql-pytest tests for this feature,
so I added some.
It looks like the feature works, but only partially.
Doing things like this works:
`blob_column = (blob)funcThatReturnsInt()`
But trying to do something a bit more complex fails:
`blob_column = (blob)(int)1234`
This is the case in both Cassandra and Scylla,
the tests introduced in this commit pass on both of them.
In future commits I will extend this feature
to support the more complex cases as well,
then some tests will have to be marked scylla_only.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
One test in test/cql-pytest/test_batch.py accidentally had the asyncio
marker, despite not using any async features. Remove it. The test still
runs fine.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13002
When the WASM UDFs were first introduced, the LANGUAGE required in
the CQL statements to use them was "xwasm", because the ABI for the
UDFs was still not specified and changes to it could be backwards
incompatible.
Now, the ABI is stabilized, but if backwards incompatible changes
are made in the future, we will add a new ABI version for them, so
the name "xwasm" is no longer needed and we can finally
change it to "wasm".
Closes#13089
This small series reorganizes the existing functional tests for aggregation (min, max, count) and adds additional tests for sum reproducing the strange (but Cassandra-compatible) behavior described in issue #13027.
Closes#13038
* github.com:scylladb/scylladb:
cql-pytest: add tests for sum() aggregate
test/cql-pytest: move aggregation tests to one file
Since commit 73e258fc34, Scylla has partial
verification for the CLUSTERING ORDER BY clause in CREATE MATERIALIZED
VIEW. Specifically, invalid column names are rejected. But for reasons
explained in issue #12936 and in the test in this patch, Cassandra
demands that if CLUSTERING ORDER BY appears it must list all the
clustering columns, with no duplicates, and do so in the right order.
This patch replaces an existing test which suggested it is fine
(an extention over Cassandra) to accept a partial list of clustering
columns, by a test that verifies that such a partial list, or an
incorrectly-ordered list, or list with duplicates, should be rejected.
The new test fails on Scylla, and passes on Cassandra, so marked as xfail.
Refs #12936.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12938