Commit Graph

352 Commits

Author SHA1 Message Date
Nadav Har'El
ce347f4b67 test/cql-pytest: add test for meaning of fetch_size with filtering
A question was raised on what fetch_size (the requested page size
in a paged scan) counts when there is a filter: does it count the
rows before filtering (as scanned from disk) or after filter (as
will be returned to the client)?

This patch adds a test which demonstrates that Cassandra and Scylla
behave differently in this respect: Cassandra counts post-filtering -
so fetch_size results are actually returned, while Scylla currently
counts pre-filtering.

It is arguable which behavior is the "correct" one - we discuss this in
issue #12102. But we have already had several users (such as #11340)
who complained about Scylla's behavior and expected Cassandra's behavior,
so if we decide to keep Scylla's behavior we should at least explain and
justify this decision in our documentation. Until then, let's have this
test which reminds us of this incompatibility. This test currently passes
on Cassandra and fails (xfail) on Scylla.

Refs #11340
Refs #12102

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12103
2022-11-30 12:27:06 +02:00
Nadav Har'El
8bd8ef3d03 test/cql-pytest: add regression test for old issue
This patch adds a regression test for the old issue #65 which is about
a multi-column (tuple) clustering-column relation in a SELECT when one
these columns has reversed order. It turns out that we didn't notice,
but this issue was already solved - but we didn't have a regression test
for it. So this patch adds just a regression test. The test confirms that
Scylla now behaves like was desired when that issue was opened. The test
also passes on Cassandra, confirming that Scylla and Cassandra behave
the same for such requests.

Fixes #65

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12130
2022-11-30 12:22:21 +02:00
Nadav Har'El
c5121cf273 cql: fix column-name aliases in SELECT JSON
The SELECT JSON statement, just like SELECT, allows the user to rename
selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo".
This specification was not honored: We simply forgot to look at the
alias in SELECT JSON's implementation (we did it correctly in regular
SELECT). So this patch fixes this bug.

We had two tests in cassandra_tests/validation/entities/json_test.py
that reproduced this bug. The checks in those tests now pass, but these
two tests still continue to fail after this patch because of two other
unrelated bugs that were discovered by the same tests. So in this patch
I also add a new test just for this specific issue - to serve as a
regression test.

Fixes #8078

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12123
2022-11-29 18:16:19 +02:00
Nadav Har'El
99a72a9676 Merge 'cql3: expr: make it possible to evaluate expr::binary_operator' from Jan Ciołek
As a part of CQL rewrite we want to be able to perform filtering by calling `evaluate()` on an expression and checking if it evaluates to `true`. Currently trying to do that for a binary operator would result in an error.

Right now checking if a binary operation like `col1 = 123` is true is done using `is_satisfied_by`, which is able to check if a binary operation evaluates to true for a small set of predefined cases.

Eventually once the grammar is relaxed we will be able to write expressions like: `(col1 < col2) = (1 > ?)`, which doesn't fit with what `is_satisfied_by` is supposed to do.
Additionally expressions like `1 = NULL` should evaluate to `NULL`, not `true` or `false`. `is_satsified_by` is not able to express that properly.

The proper way to go is implementing `evaluate(binary_operator)`, which takes a binary operation and returns what the result of it would be.

Implementing `prepare_expression` for `binary_operator` requires us to be able to evaluate it first. In the next PR I will add support for `prepare_expression`.

Closes #12052

* github.com:scylladb/scylladb:
  cql-pytest: enable two unset value tests that pass now
  cql-pytest: reduce unset value error message
  cql3: expr: change unset value error messages to lowercase
  cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected
  cql3: expr: remove needless braces around switch cases
  cql3: move evaluation IS_NOT NULL to a separate function
  expr_test: test evaluating LIKE binary_operator
  expr_test: test evaluating IS_NOT binary_operator
  expr_test: test evaluating CONTAINS_KEY binary_operator
  expr_test: test evaluating CONTAINS binary_operator
  expr_test: test evaluating IN binary_operator
  expr_test: test evaluating GTE binary_operator
  expr_test: test evaluating GT binary_operator
  expr_test: test evaluating LTE binary_operator
  expr_test: test evaluating LT binary_operator
  expr_test: test evaluating NEQ binary_operator
  expr_test: test evaluating EQ binary_operator
  cql3: expr properly handle null in is_one_of()
  cql3: expr properly handle null in like()
  cql3: expr properly handle null in contains_key()
  cql3: expr properly handle null in contains()
  cql3: expr: properly handle null in limits()
  cql3: expr: remove unneeded overload of limits()
  cql3: expr: properly handle null in equality operators
  cql3: expr: remove unneeded overload of equal()
  cql3: expr: use evaluate(binary_operator) in is_satisfied_by
  cql3: expr: handle IS NOT NULL when evaluating binary_operator
  cql3: expr: make it possible to evaluate binary_operator
  cql3: expr: accept expression as lhs argument to like()
  cql3: expr: accept expression as lhs in contains_key
  cql3: expr: accept expression as lhs argument to contains()
2022-11-28 11:30:00 +02:00
Jan Ciolek
77c7d8b8f6 cql-pytest: enable two unset value tests that pass now
While implementing evaluate(binary_operator)
missing checks for unset value were added
for comparisons in filtering code.

Because of that some tests for unset value
started passing.

There are still other tests for unset value
that are failing because Scylla doesn't
have all the checks that it should.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-24 17:07:17 +01:00
Jan Ciolek
5bc0bc6531 cql-pytest: reduce unset value error message
When unset value appears in an invalid place
both Cassandra and Scylla throw an error.

The tests were written with Cassandra
and thus the expected error messages were
exactly the same as produced by Cassandra.

Scylla produces different error messages,
but both databases return messages with
the text 'unset value'.

Reduce the expected message text
from the whole message to something
that contains 'unset value'.

It would be hard to mimic Cassandra's
error messages in Scylla. There is no
point in spending time on that.
Instead it's better to modify the tests
so that they are able to work with
both Cassandra and Scylla.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-24 17:04:07 +01:00
Nadav Har'El
c6bb64ab0e Merge 'Fix LWT insert crash if clustering key is null' from Gusev Petr
[PR](https://github.com/scylladb/scylladb/pull/9314) fixed a similar issue with regular insert statements
but missed the LWT code path.

It's expected behaviour of
`modification_statement::create_clustering_ranges` to return an
empty range in this case, since `possible_lhs_values` it
uses explicitly returns `empty_value_set` if it evaluates `rhs`
to null, and it has a comment about it (All NULL
comparisons fail; no column values match.) On the other hand,
all components of the primary key are required to be set,
this is checked at the prepare phase, in
`modification_statement::process_where_clause`. So the only
problem was `modification_statement::execute_with_condition`
was not expecting an empty `clustering_range` in case of
a null clustering key.

Also this patch contains a fix for the problem with wrong
column name in Scylla error messages. If `INSERT` or `DELETE`
statement is missing a non-last element of
the primary key, the error message generated contains
an invalid column name.

The problem occurs if the query contains a column with the list type,
otherwise
`statement_restrictions::process_clustering_columns_restrictions`
checks that all the components of the key are specified.

Closes #12047

* github.com:scylladb/scylladb:
  cql: refactor, inline modification_statement::validate_primary_key_restrictions
  cql: DELETE with null value for IN parameter should be forbidden
  cql: add column name to the error message in case of null primary key component
  cql: batch statement, inserting a row with a null key column should be forbidden
  cql: wrong column name in error messages
  modification_statement: fix LWT insert crash if clustering key is null
2022-11-24 16:15:27 +02:00
Petr Gusev
f9936bb0cb cql: DELETE with null value for IN parameter should be forbidden
If a DELETE statement contains an IN operator and the
parameter value for it is NULL, this should also trigger
an error. This is in line with how Cassandra
behaves in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
c123f94110 cql: add column name to the error message in case of null primary key component
It's more user-friendly and the error message
corresponds to what Cassandra provides in this case.
2022-11-23 21:39:23 +04:00
Petr Gusev
7730c4718e cql: batch statement, inserting a row with a null key column should be forbidden
Regular INSERT statements with null values for primary key
components are rejected by Scylla since #9286 and #9314.
Batch statements missed a similar check, this patch
fixes it.

Fixes: #12060
2022-11-23 21:39:23 +04:00
Petr Gusev
89a5397d7c cql: wrong column name in error messages
If INSERT or DELETE statement is missing a non-last element of
the primary key, the error message generated contains
an invalid column name.

The problem occurs if the query contains a column with the list type,
otherwise
statement_restrictions::process_clustering_columns_restrictions
checks that all the components of the key are specified.

Fixes: #12046
2022-11-23 21:39:16 +04:00
Jan Ciolek
84501851eb cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected
Scylla doesn't support combining restrictions
on token with other restrictions on partition key columns.

Some pieces of code depend on the assumption
that such combinations are allowed.
In case they were allowed in the future
these functions would silently start
returning wrong results, and we would
return invalid rows.

Add a test that will start failing once
this restriction is removed. It will
warn the developer to change the
functions that used to depend
on the assumption.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-23 13:09:22 +01:00
Petr Gusev
0d443dfd16 modification_statement: fix LWT insert crash if clustering key is null
PR #9314 fixed a similar issue with regular insert statements
but missed the LWT code path.

It's expected behaviour of
modification_statement::create_clustering_ranges to return an
empty range in this case, since possible_lhs_values it
uses explicitly returns empty_value_set if it evaluates rhs
to null, and it has a comment about it (All NULL
comparisons fail; no column values match.) On the other hand,
all components of the primary key are required to be set,
this is checked at the prepare phase, in
modification_statement::process_where_clause. So the only
problem was modification_statement::execute_with_condition
was not expecting an empty clustering_range in case of
a null clustering key.

Fixes: #11954
2022-11-22 16:45:16 +04:00
Nadav Har'El
ff617c6950 cql-pytest: translate a few small Cassandra tests
This patch includes a translation of several additional small test files
from Cassandra's CQL unit test directory cql3/validation/operations.

All tests included here pass on both Cassandra and Scylla, so they did
not discover any new Scylla bugs, but can be useful in the future as
regression tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12045
2022-11-22 07:54:13 +02:00
Nadav Har'El
2ba8b8d625 test/cql-pytest: remove "xfail" from passing test testIndexOnFrozenCollectionOfUDT
We had a test that used to fail because of issue #8745. But this issue
was alread fixed, and we forgot to remove the "xfail" marker. The test
now passes, so let's remove the xfail marker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12039
2022-11-20 19:54:59 +02:00
Jan Ciolek
286f182a8c cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
In issue #12014 a user has encountered an instance of #6200.
When filtering a WHERE clause which contained
both multi-column and regular restrictions,
the regular restrictions were ignored.

Add a test which reproduces the issue
using a reproducer provided by the user.

This problem is tested in another similar test,
but this one reproduces the issue in the exact
way it was found by the user.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:27:42 +01:00
Jan Ciolek
99e1032e34 cql-pytest: enable test for filtering combined multi column and regular column restrictions
The test test_multi_column_restrictions_and_filtering was marked as xfail,
because issue #6200 wasn't fixed. Now that filtering
multi column and other restrictions together has been fixed
the test passes.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-11-18 15:14:32 +01:00
Nadav Har'El
e393639114 test/cql-pytest: reproducer for crash in LWT with null key
This patch adds a reproducer for issue #11954: Attempting an
"IF NOT EXISTS" (LWT) write with a null key crashes Scylla,
instead of producing a simple error message (like happens
without the "IF NOT EXISTS" after #7852 was fixed).

The test passed on Cassandra, but crashes Scylla. Because of this
crash, we can't just mark the test "xfail" and it's temporarily
marked "skip" instead.

Refs #11954.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11982
2022-11-17 07:31:13 +02:00
Nadav Har'El
2f2f01b045 materialized views: fix view writes after base table schema change
When we write to a materialized view, we need to know some information
defined in the base table such as the columns in its schema. We have
a "view_info" object that tracks each view and its base.

This view_info object has a couple of mutable attributes which are
used to lazily-calculate and cache the SELECT statement needed to
read from the base table. If the base-table schema ever changes -
and the code calls set_base_info() at that point - we need to forget
this cached statement. If we don't (as before this patch), the SELECT
will use the wrong schema and writes will no longer work.

This patch also includes a reproducing test that failed before this
patch, and passes afterwords. The test creates a base table with a
view that has a non-trivial SELECT (it has a filter on one of the
base-regular columns), makes a benign modification to the base table
(just a silly addition of a comment), and then tries to write to the
view - and before this patch it fails.

Fixes #10026
Fixes #11542
2022-11-16 13:58:21 +02:00
Nadav Har'El
e4dba6a830 test/cql-pytest: add test for when MV requires IS NOT NULL
As noted in issue #11979, Scylla inconsistently (and unlike Cassandra)
requires "IS NOT NULL" one some but not all materialized-view key
columns. Specifically, Scylla does not require "IS NOT NULL" on the
base's partition key, while Cassandra does.

This patch is a test which demonstrates this inconsistency. It currently
passes on Cassandra and fails on Scylla, so is marked xfail.

Refs #11979

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11980
2022-11-15 14:21:48 +01:00
Nadav Har'El
ff87624fb4 test/cql-pytest: add another regression test for reversed-type bug
In commit 544ef2caf3 we fixed a bug where
a reveresed clustering-key order caused problems using a secondary index
because of incorrect type comparison. That commit also included a
regression test for this fix.

However, that fix was incomplete, and improved later in commit
c8653d1321. That later fix was labeled
"better safe than sorry", and did not include a test demonstrating
any actual bug, so unsurprisingly we never backported that second
fix to any older branches.

Recently we discovered that missing the second patch does cause real
problems, and this patch includes a test which fails when the first
patch is in, but the second patch isn't (and passes when both patches
are in, and also passes on Cassandra).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11943
2022-11-11 11:01:22 +02:00
Nadav Har'El
b9d88a3601 cql/pytest: add reproducer for timestamp column validation issue
This patch adds a reproducing test for issue #11588, which is still open
so the test is expected to fail on Scylla ("xfail), and passes on Cassandra.

The test shows that Scylla allows an out-of-range value to be written to
timestamp column, but then it can't be read back.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11864
2022-11-01 08:11:01 +02:00
Nadav Har'El
c31bf4184f test/cql-pytest: two reproducers for SI returning oversized pages
This patch has two reproducing tests for issue #7432, which are cases
where a paged query with a restriction backed by a secondary-index
returns pages larger than the desired page size. Because these tests
reproduce a still-open bug, they are both marked "xfail". Both tests
pass on Cassandra.

The two tests involve quite dissimilar casess - one involves requesting
an entire partition (and Scylla forgetting to page through it), and the
other involves GROUP BY - so I am not sure these two bugs even have the
same underlying cause. But they were both reported in #7432, so let's
have reproducers for both.

Refs #7432

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11586
2022-10-17 11:36:05 +03:00
Nadav Har'El
9f02431064 test/cql-pytest: fix test_permissions.py when running with "--ssl"
The tests in test_permissions.py use the new_session() utility function
to create a new connection with a different logged-in user.

It models the new connection on the existing one, but incorrectly
assumed that the connection is NOT ssl. This made this test failed
with cql-pytest/run is passed the "--ssl" option.

In this patch we correctly infer the is_ssl state from the existing
cql fixture, instead of assuming it is false. After this pass,
"cql-pytest/run --ssl" works as expected for this test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11742
2022-10-17 06:46:46 +03:00
Jan Ciolek
52bbc1065c cql3: allow lists of IN elements to be NULL
Requests like `col IN NULL` used to cause
an error - Invalid null value for colum col.

We would like to allow NULLs everywhere.
When a NULL occurs on either side
of a binary operator, the whole operation
should just evaluate to NULL.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #11775
2022-10-13 15:11:32 +02:00
Nadav Har'El
ef0da14d6f test/cql-pytest: add simple tests for USE statement
This patch adds a couple of simple tests for the USE statement: that
without USE one cannot create a table without explicitly specifying
a keyspace name, and with USE, it is possible.

Beyond testing these specific feature, this patch also serves as an
example of how to write more tests that need to control the effective USE
setting. Specifically, it adds a "new_cql" function that can be used to
create a new connection with a fresh USE setting. This is necessary
in such tests, because if multiple tests use the same cql fixture
and its single connection, they will share their USE setting and there
is no way to undo or reset it after being set.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11741
2022-10-11 08:20:19 +03:00
Jan Ciolek
a2c359a741 cql3: Make CONTAINS KEY NULL return false
A binary operator like this:
{1: 2, 3: 4} CONTAINS KEY NULL
used to evaluate to `true`.

This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.

This change is not backwards compatible.
Some existing code might break.

partially fixes: #10359

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-10-05 18:15:44 +02:00
Jan Ciolek
bbfef4b510 cql3: Make CONTAINS NULL return false
A binary operator like this:
[1, 2, 3] CONTAINS NULL
used to evaluate to `true`.

This is wrong, any operation involving null
on either side of the operator should evaluate
to NULL, which is interpreted as false.

This change is not backwards compatible.
Some existing code might break.

partially fixes: #10359

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-10-05 18:15:15 +02:00
Botond Dénes
5621cdd7f9 db/view/view_builder: don't drop partition and range tombstones when resuming
The view builder builds the views from a given base table in
view_builder::batch_size batches of rows. After processing this many
rows, it suspends so the view builder can switch to building views for
other base tables in the name of fairness. When resuming the build step
for a given base table, it reuses the reader used previously (also
serving the role of a snapshot, pinning sstables read from). The
compactor however is created anew. As the reader can be in the middle of
a partition, the view builder injects a partition start into the
compactor to prime it for continuing the partition. This however only
included the partition-key, crucially missing any active tombstones:
partition tombstone or -- since the v2 transition -- active range
tombstone. This can result in base rows covered by either of this to be
resurrected and the view builder to generate view updates for them.
This patch solves this by using the detach-state mechanism of the
compactor which was explicitly developed for situations like this (in
the range scan code) -- resuming a read with the readers kept but the
compactor recreated.
Also included are two test cases reproducing the problem, one with a
range tombstone, the other with a partition tombstone.

Fixes: #11668

Closes #11671
2022-10-03 11:28:22 +03:00
Avi Kivity
cf3830a249 Merge 'Add support for TRUNCATE USING TIMEOUT' from Benny Halevy
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter is no longer derived from raw::cf_statement,
and just stores a schema_ptr to get to the keyspace and column_family.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Also, update docs/cql/ddl.rst truncate-statement section respectively.

Closes #11409

* github.com:scylladb/scylladb:
  docs: cql-extensions: add TRUNCATE to USING TIMEOUT section.
  docs: cql: ddl: add support for TRUNCATE USING TIMEOUT
  cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
  cql3: selectStatement: restrict to USING TIMEOUT in grammar
  cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
2022-09-28 18:19:03 +03:00
Benny Halevy
64140ccf05 cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter, statements::truncate_statement, is no longer derived
from raw::cf_statement, and just stores a schema_ptr to get to the
keyspace and column_family names.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
27d3e48005 cql3: selectStatement: restrict to USING TIMEOUT in grammar
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that cql3::statements::raw::select_statement validate_attrs
now asserts that the ttl or the timestamp attributes aren't set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Benny Halevy
0728d33d5f cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar
It is preferred to reject USING TLL / TIMESTAMP at the grammar
level rather than functionally validating the USING attributes.

test_using_timeout was adjusted respectively to expect the
`SyntaxException` error rather than `InvalidRequest`.

Note that now delete_statement ctor asserts that the ttl
attribute is not set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Nadav Har'El
868a884b79 test/cql-pytest: add reproducer for ignored IS NOT NULL
This test reproduces issue #10365: It shows that although "IS NOT NULL" is
not allowed in regular SELECT filters, in a materialized view it is allowed,
even for non-key columns - but then outright ignored and does not actually
filter out anything - a fact which already surprised several users.

The test also fails on Cassandra - it also wrongly allows IS NOT NULL
on the non-key columns but then ignores this in the filter. So the test
is marked with both xfail (known to fail on Scylla) and cassandra_bug
(fails on Cassandra because of what we consider to be a Cassandra bug).

Refs #10365
Refs #11606

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11615
2022-09-26 09:02:08 +03:00
Nadav Har'El
4c93a694b7 cql: validate bloom_filter_fp_chance up-front
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.

Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.

This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).

Fixes #11524.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11576
2022-09-20 06:18:51 +03:00
Jadw1
ba461aca8b cql-pytest: more neutral command in cql_test_connection fixture
I found 'use system` to not be neutral enough (e.g. in case of testing
describe statement). `BEGIN BATCH APPLY BATCH` sounds better.

Closes #11504
2022-09-11 18:49:06 +03:00
Wojciech Mitros
49dba4f0c1 functions: fix dropping of a keyspace with an aggregate in it
Currently, if a keyspace has an aggregate and the keyspace
is dropped, the keyspace becomes corrupted and another keyspace
with the same name cannot be created again

This is caused by the fact that when removing an aggregate, we
call create_aggregate() to get values for its name and signature.
In the create_aggregate(), we check whether the row and final
functions for the aggregate exist.
Normally, that's not an issue, because when dropping an existing
aggregate alone, we know that its UDFs also exist. But when dropping
and entire keyspace, we first drop the UDFs, making us unable to drop
the aggregate afterwards.

This patch fixes this behavior by removing the create_aggregate()
from the aggregate dropping implementation and replacing it with
specific calls for getting the aggregate name and signature.

Additionally, a test that would previously fail is added to
cql-pytest/test_uda.py where we drop a keyspace with an aggregate.

Fixes #11327

Closes #11375
2022-08-25 16:28:57 +02:00
Wojciech Mitros
9e6e8de38f tests: prevent test_wasm from occasional failing
Some cases in test_wasm.py assumed that all cases
are ran in the same order every time and depended
on values that should have been added to tables in
previous cases. Because of that, they were sometimes
failing. This patch removes this assumption by
adding the missing inserts to the affected cases.

Additionally, an assert that confirms low miss
rate of udfs is more precise, a comment is added
to explain it clearly.

Closes #11367
2022-08-25 11:32:06 +03:00
Nadav Har'El
055340ae39 cql-pytest: increase more timeouts
In commit 7eda6b1e90, we increased the
request_timeout parameter used by cql-pytest tests from the default of
10 seconds to 120 seconds. 10 seconds was usually more than enough for
finishing any Scylla request, but it turned out that in some extreme
cases of a debug build running on an extremely over-committed machine,
the default timeout was not enough.

Recently, in issue #11289 we saw additional cases of timeouts which
the request_timeout setting did *not* solve. It turns out that the Python
CQL driver has two additional timeout settings - connect_timeout and
control_connection_timeout, which default to 5 seconds and 2 seconds
respectively. I believe that most of the timeouts in issue #11289
come from the control_connection_timeout setting - by changing it
to a tiny number (e.g., 0.0001) I got the same error messages as those
reported in #11289. The default of that timeout - 2 seconds - is
certainly low enough to be reached on an extremely over-committed
machine.

So this patch significantly increases both connect_timeout and
control_connection_timeout to 60 seconds. We don't care that this timeout
is ridiculously large - under normal operations it will never be reached.
There is no code which loops for this amount of time, for example.

Refs #11289 (perhaps even Fixes, we'll need to see that the test errors
go away).

NOTE: This patch only changes test/cql-pytest/util.py, which is only
used by the cql-pytest test suite. We have multiple other test suites which
copied this code, and those test suites might need fixing separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11295
2022-08-16 19:11:59 +03:00
Piotr Sarna
cf30d4cbcf Merge 'Secondary index of collection columns' from Nadav Har'El
This pull request introduces global secondary-indexing for non-frozen collections.

The intent is to enable such queries:

```
CREATE TABLE test(int id, somemap map<int, int>, somelist<int>, someset<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));

-- index on test(c) is the same as index on (values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);

SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```

We use here all-familiar materialized views (MVs). Scylla treats all the
collections the same way - they're a list of pairs (key, value). In case
of sets, the value type is dummy one. In case of lists, the key type is
TIMEUUID. When describing the design, I will forget that there is more
than one collection type.  Suppose that the columns in the base table
were as follows:

```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```

The MV schema is as follows (the names of columns which are not the same
as in base might be different). All the columns here form the primary
key.

```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```

The reason for the last additional column is that the values from a collection might not be unique.

Fixes #2962
Fixes #8745
Fixes #10707

This patch does not implement **local** secondary indexes for collection columns: Refs #10713.

Closes #10841

* github.com:scylladb/scylladb:
  test/cql-pytest: un-xfail yet another passing collection-indexing test
  secondary index: fix paging in map value indexing
  test/cql-pytest: test for paging with collection values index
  cql, view: rename and explain bytes_with_action
  cql, index: make collection indexing a cluster feature
  test/cql-pytest: failing tests for oversized key values in MV and SI
  cql: fix secondary index "target" when column name has special characters
  cql, index: improve error messages
  cql, index: fix default index name for collection index
  test/cql-pytest: un-xfail several collecting indexing tests
  test/cql-pytest/test_secondary_index: verify that local index on collection fails.
  docs/design-notes/secondary_index: add `VALUES` to index target list
  test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
  cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
  cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
  test/boost/secondary_index_test: update for non-frozen indexes on collections
  test/cql-pytest: Uncomment collection indexes tests that should be working now
  cql, index: don't use IS NOT NULL on collection column
  cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
  cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections
  cql3, index: Use entries() indexes on collections for queries
  cql3, index: Use keys() and values() indexes on collections for queries.
  types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
  cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
  db/view/view.cc: compute view_updates for views over collections
  view info: has_computed_column_depending_on_base_non_primary_key
  column_computation: depends_on_non_primary_key_column
  schema, index/secondary_index_manager: make schema for index-induced mv
  index/secondary_index_manager: extract keys, values, entries types from collection
  cql3/statements/: validate CREATE INDEX for index over a collection
  cql3/statements/create_index_statement,index_target: rewrite index target for collection
  column_computation.hh, schema.cc: collection_column_computation
  column_computation.hh, schema.cc: compute_value interface refactor
  Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`
2022-08-16 14:18:51 +02:00
Nadav Har'El
fbb0b66d0c test/cql-pytest: fix run's "--ssl" option
Commit 23acc2e848 broke the "--ssl" option of test/cql-pytest/run
(which makes Scylla - and cqlpytest - use SSL-encrypted CQL).
The problem was that there was a confusion between the "ssl" module
(Python's SSL support) and a new "ssl" variable. A rename and a missing
"import" solves the breakage.

We never noticed this because Jenkins does *not* run cql-pytest/run
with --ssl (actually, it no longer runs cql-pytest/run at all).
It is still a useful option for checking SSL-related problems in Scylla
and Seastar.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11292
2022-08-16 12:29:05 +02:00
Nadav Har'El
329068df99 test/cql-pytest: un-xfail yet another passing collection-indexing test
After collection indexing has been implemented, yet another test which
failed because of #2962 now passes. So remove the "xfail" marker.

Refs #2962

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
f6f18b187a secondary index: fix paging in map value indexing
When indexing a map column's *values*, if the same value appears more
than once, the same row will appear in the index more than once. We had
code that removed these duplicates, but this deduplication did not work
across page boundaries. We had two xfailing tests to demonstrate this bug.

In this patch we fix this bug by looking at the page's start and not
generating the same row again, thereby getting the same deduplication
we had inside pages - now across pages.

The previously-xfailing tests now pass, and their xfail tag is removed.
I also added another test, for the case where the base table has only
partition keys without clustering keys. This second test is important
because the code path for the partition-key-only case is different,
and the second test exposed a bug in it as well (which is also fixed
in this patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
dc445b9a73 test/cql-pytest: test for paging with collection values index
If a map has several keys with the same value, then the "values(m)" index
must remember all of them as matching the same row - because later we may
remove one of these keys from the map but the row would still need to
match the value because of the remaining keys.

We already had a test (test_index_map_values) that although the same row
appears more than once for this value, when we search for this value the
result only returns the row once. Under the hood, Scylla does find the
same value multiple times, but then eliminates the duplicate matched raw
and returns it only once.

But there is a complication, that this de-duplication does not easily
span *paging*. So in this patch we add a test that checks that paging
does not cause the same row to be returned more than once.
Unfortunately, this test currently fails on Scylla so marked "xfail".
It passes on Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
aa86f808a6 test/cql-pytest: failing tests for oversized key values in MV and SI
In issue #9013, we noticed that if a value larger than 64 KB is indexed,
the write fails in a bad way, and we fixed it. But the test we wrote
when fixing that issue already suggested that something was still wrong:
Cassandra failed the write cleanly, with an InvalidRequest, while Scylla
failed with a mysterious WriteFailure (with a relevant error message
only in the log).

This patch adds several xfailing tests which demonstrate what's still
wrong. This is also summarized in issue #8627:

1. A write of an oversized value to an indexed column returns the wrong
   error message.
2. The same problem also exists when indexing a collection, and the indexed
   key or value is oversized.
3. The situation is even less pleasant when adding an index to a table
   with pre-existing data and an oversized value. In this case, the
   view building will fail on the bad row, and never finish.
4. We have exactly the same bugs not just with indexes but also with
   materialized views. Interestingly, Cassandra has similar bugs in
   materialized views as well (but not in the secondary index case,
   where Cassandra does behave as expected).

Refs #8627.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
2c244c6e09 cql: fix secondary index "target" when column name has special characters
Unfortunately, we encode the "target" of a secondary index in one of
three ways:

1. It can be just a column name
2. It can be a string like keys(colname) - for the new type of
   collection indexes introduced in this series.
3. It can be a JSON map ({ ... }). This form is used for local indexes.

The code parsing this target - target_parser::parse() - needs not to
confuse these different formats. Before this patch, if the column name
contains special characters like braces or parentheses (this is allowed
in CQL syntax, via quoting), we can confuse case 1, 2, and 3: A column
named "keys(colname)" will be confused for case 2, and a column named
"{123}" will be confused with case 3.

This problem can break indexing of some specially-crafted column names -
as reproduced by test_secondary_index.py::test_index_quoted_names.

The solution adopted in this patch is that the column name in case 1
should be escaped somehow so it cannot be possibly confused with either
cases 2 and 3. The way we chose is to convert the column name to CQL (with
column_definition::as_cql_name()). In other words, if the column name
contains non-alphanumeric characters, it is wrapped in quotes and also
quotes are doubled, as in CQL. The result of this can't be confused
with case 2 or 3, neither of which may begin with a quote.

This escaping is not the minimal we could have done, but incidentally it
is exactly what Cassandra does as well, so I used it as well.

This change is *mostly* backward compatible: Already-existing indexes will
still have unescaped column names stored for their "target" string,
and the unescaping code will see they are not wrapped in quotes, and
not change them. Backward compatibility will only fail on existing indexes
on columns whose name begin and end in the quote characters - but this
case is extremely unlikely.

This patch illustrates how un-ideal our index "target" encoding is,
but isn't what made it un-ideal. We should not have used three different
formats for the index target - the third representation (JSON) should
have sufficed. However, two two other representations are identical
to Cassandra's, so using them when we can has its compatibility
advantages.

The patch makes test_secondary_index.py::test_index_quoted_names pass.

Fixes #10707.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
56204a3794 cql, index: improve error messages
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:

  Cannot create index on index_keys_and_values of column x

The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.

It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!

So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)

After this patch, the
test_secondary_index.py::test_index_collection_wrong_type

finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
84461f1827 cql, index: fix default index name for collection index
When creating an index "CREATE INDEX ON tbl(keys(m))", the default name
of the index should be tbl_m_idx - with just "m". The current code
incorrectly used the default name tbl_m_keys_idx, so this patch adds
a test (which passes on Cassandra, and after this patch also on Scylla)
and fixes the default name.

It turns out that the default index name was based on a mysterious
index_target::as_string(), which printed the target "keys(m)" as
"m_keys" without explaining why it was so. This method was actually
used only in three places, and all of them wanted just the column
name, without the "_keys" suffix! So in this patch we rename the
mysterious as_string() to column_name(), and use this function instead.

Now that the default index name uses column_name() and gets just
column_name(), the correct default index name is generated, and the
test passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Nadav Har'El
94ba03a4d6 test/cql-pytest: un-xfail several collecting indexing tests
After the previous patches implemented collection indexing, several
tests in test/cql-pytest/test_secondary_index.py that were marked
with "xfail" started to pass - so here we remove the xfail.

Only three collection indexing tests continue to xfail:
test_secondary_index.py::test_index_collection_wrong_type
test_secondary_index.py::test_index_quoted_names (#10707)
test_secondary_index.py::test_local_secondary_index_on_collection (#10713)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-08-14 10:29:52 +03:00
Michał Radwański
2690ecd65d test/cql-pytest/test_secondary_index: verify that local index on
collection fails.

Collection indexing is being tracked by #2962. Global secondary index
over collection is enabled by #10123. Leave this test to track this
behaviour.

Related issue: #10713
2022-08-14 10:29:52 +03:00