Commit Graph

557 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
14598fdfdd test: add test for compaction strategy validation 2023-09-13 16:59:40 +02:00
Botond Dénes
7e7101c180 Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes"
This reverts commit 628e6ffd33, reversing
changes made to 45ec76cfbf.

The test included with this PR is flaky and often breaks CI.
Revert while a fix is found.

Fixes: #15371
2023-09-13 10:45:37 +03:00
Avi Kivity
628e6ffd33 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator  does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes #14923

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-09-11 19:20:19 +03:00
Pavel Emelyanov
821a9c1fd4 test/cql-pytest: Add enable|disable-binary test case
The test checks that `nodetool disablebinary` makes subsequent queries
fail and `nodetool enablebinary` lets client to establish new
connections.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:38:49 +03:00
Pavel Emelyanov
2c3b30b395 test/pylib: Add nodetool enable|disable-binary commands
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Botond Dénes
b55cead5cd replica/mutation_dump: detect end-of-page in range-scans
The current read-loop fails to detect end-of-page and if the query
result buider cuts the page, it will just proceed to the next
partition. This will result in distorted query results, as the result
builder will request for the consumption to stop after each clustering
row.
To fix, check if the page was cut before moving on to the next
partition.
A unit test reproducing the bug was also added.
2023-09-11 07:02:14 -04:00
Patryk Jędrzejczak
23a4557662 test: test_cdc: introduce wait_for_first_cdc_generation
After introducing the CDC generation publisher,
test_cdc_log_entries_use_cdc_streams could (at least in theory)
fail by accessing system_distributed.cdc_streams_descriptions_v2
before the first CDC generation has been published.

To avoid flakiness, we simply wait until the first CDC generation
is published in a new function -- wait_for_first_cdc_generation.
2023-09-08 09:05:01 +02:00
Botond Dénes
34d94fb549 test/cql-pytest/test_tools.py: improve tempdir usage for scrub tests
Scrub tests use a lot of temporary directories. This is suspected to
cause problems in some cases. To improve the situation, this patch:
* Creates a single root temporary directory for all scrub tests
* All further fixtures create their files/directories inside this root
  dir.
* All scrub tests create their temporary directories within this root
  dir.
* All temporary directories now use an appropriate "prefix", so we can
  tell which temporary directory is part of the problem if a test fails.

Refs: #14309

Closes #15117
2023-09-01 07:17:49 +03:00
Botond Dénes
1609c76d62 tools/scylla-sstable: scrub: don't qurantine sstables after validate
Scylla sstable promises to *never* mutate its input sstables. This
promise was broken by `scylla sstable scrub --scrub-mode=validate`,
because validate moves invalid input sstables into qurantine. This is
unexpected and caused occasional failures in the scrub tests in
test_tools.py. Fix by propagating a flag down to
`scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether
validate should qurantine invalid sstables, then set this flag to false
in `scylla-sstable.cc`. The existing test for validate-mode scrub is
ammended to check that the sstable is not mutated. The test now fails
before the fix and passes afterwards.

Fixes: #14309

Closes #15139
2023-08-23 21:53:12 +03:00
Nadav Har'El
5530c529c2 test/cql-pytest: regression test for old bug with CAST(f AS TEXT) precision
When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd7475c, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs #15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15131
2023-08-23 16:06:52 +03:00
Nadav Har'El
a963b59495 test/cql-pytest: add reproducer for IN not working with secondary index
We already have a test for issue #13533, where an "IN" doesn't work with
a secondary index (the secondary index isn't used in that case, and
instead inefficient filtering is required). Recently a user noticed the
same problem also exists for local secondary indexes - and this patch
includes a reproducing test. The new test is marked xfail, as the issue is
still unfixed. The new test is Scylla-only because local secondary index
is a Scylla-only extension that doesn't exist in Cassandra.

Refs #13533.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15106
2023-08-22 07:25:32 +03:00
Nadav Har'El
18e8e62798 cql-pytest: translate Cassandra's tests for SELECT with LIMIT
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectLimitTest.java into our cql-pytest framework.

The tests reproduce two already-known bugs:

Refs #9879:  Using PER PARTITION LIMIT with aggregate functions should
             fail as Invalid query
Refs #10357: Spurious static row returned from query with filtering,
             despite not matching filter

And also helped discover two new issues:

Refs #15099: Incorrect sort order when combining IN, and ORDER BY
Refs #15109: PER PARTITION LIMIT should be rejected if SELECT DISTINCT
             is used

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15114
2023-08-21 22:29:11 +03:00
Avi Kivity
4f7e83a4d0 cql3: select_statement: reject DISTINCT with GROUP BY on clustering keys
While in SQL DISTINCT applies to the result set, in CQL it applies
to the table being selected, and doesn't allow GROUP BY with clustering
keys. So reject the combination like Cassandra does.

While this is not an important issue to fix, it blocks un-xfailing
other issues, so I'm clearing it ahead of fixing those issues.

An issue is unmarked as xfail, and other xfails lose this issue
as a blocker.

Fixes #12479

Closes #14970
2023-08-07 15:35:59 +03:00
Nadav Har'El
b55b8f29b9 test/cql-pytest: test confirming that casting to counter doesn't work
In the previous patch we implemented CAST operations from the COUNTER
type to various other types. We did not implement the reverse cast,
from different types to the counter type. Should we? In this patch
we add a test that shows we don't need to bother - Cassandra does not
support such casts, so it's fine that we don't too - and indeed the
test shows we don't support them.
It's not a useful operation anyway.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-07-30 20:16:25 +03:00
Nadav Har'El
b513bba201 cql: support casting of counter to other types
We were missing support in the "CAST(x AS type)" function for the counter
type. This patch adds this support, as well as extensive testing that it
works in Scylla the same as Cassandra.

We also un-xfail an existing test translated from Cassandra's unit
test. But note that this old test did not cover all the edge-cases that
the new test checks - some missing cases in the implementation were
not caught by the old test.

Fixes #14501

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-07-30 20:16:25 +03:00
Nadav Har'El
c1762750ed cql: implement missing counterasblob() and blobascounter() functions
Code in functions.cc creates the different TYPEasblob() and blobasTYPE()
functions for all type names TYPE. The functions for the "counter" type
were skipped, supposedly because "counters are not supported yet". But
counters are supported, so let's add the missing functions.

The code fix is trivial, the tests that verify that the result behaves
like Cassandra took more work.

After this patch, unimplemented::cause::COUNTERS is no longer used
anywhere in the code. I wanted to remove it, but noticed that
unimplemented::cause is a graveyard of unused causes, so decided not
to remove this one either. We should clean it up in a separate patch.

Fixes #14742

Also includes tests for tangently-related issues:
Refs #12607
Refs #14319

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-07-30 20:16:25 +03:00
Alexey Novikov
ff721ec3e3 make timestamp string format cassandra compatible
when we convert timestamp into string it must look like: '2017-12-27T11:57:42.500Z'
it concerns any conversion except JSON timestamp format
JSON string has space as time separator and must look like: '2017-12-27 11:57:42.500Z'
both formats always contain milliseconds and timezone specification

Fixes #14518
Fixes #7997

Closes #14726
2023-07-27 12:01:09 +03:00
Nadav Har'El
d2ca600eec test/*/run: kill Scylla with SIGTERM
Today, test/*/run always kills Scylla at the end of the test with
SIGKILL (kill -9), so the Scylla shutdown code doesn't run. It was
believed that a clean shutdown would take a long time, but in fact,
it turns out that 99% of the shutdown time was a silly sleep in the
gossip code, which this patch disables with the "--shutdown-announce-in-ms"
option.

After enabling this option, clean shutdown takes (in a dev build on
my laptop) just 0.02 seconds. It's worth noting that this shutdown
has no real work to do - no tables to flush, and so on, because the
pytest framework removes all the tables in its own fixture cleanup
phase.

So in this patch, to kill Scylla we use SIGTERM (15) instead of SIGKILL.
We then wait until a timeout of 10 seconds (much much more than 0.02
seconds!) for Scylla to exit. If for some reason it didn't exit (e.g.,
it hung during the shutdown), it is killed again with SIGKILL, which
is guaranteed to succed.

This change gives us two advantages

1. Every test run with test/*/run exercises the shutdown path. It is perhaps
   excessive, but since the shutdown is so quick, there is no big downside.

2. In a test-coverage run, a clean shutdown allows flushing the counter
   files, which wasn't possible when Scylla was killed with KILL -9.

Fixes #8543

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14825
2023-07-26 14:06:24 +03:00
Jan Ciolek
cbc97b41d4 cql.g: make the parser reject INSERT JSON without a JSON value
We allow inserting column values using a JSON value, eg:
```cql
INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}';
```

When no JSON value is specified, the query should be rejected.

Scylla used to crash in such cases. A recent change fixed the crash
(https://github.com/scylladb/scylladb/pull/14706), it now fails
on unwrapping an uninitialized value, but really it should
be rejected at the parsing stage, so let's fix the grammar so that
it doesn't allow JSON queries without JSON values.

A unit test is added to prevent regressions.

Refs: https://github.com/scylladb/scylladb/pull/14707
Fixes: https://github.com/scylladb/scylladb/issues/14709

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #14785
2023-07-21 18:52:47 +03:00
Avi Kivity
e00811caac cql3: grammar: reject intValue with no contents
The grammar mistakenly allows nothing to be parsed as an
intValue (itself accepted in LIMIT and similar clauses).

Easily fixed by removing the empty alternative. A unit test is
added.

Fixes #14705.

Closes #14707
2023-07-21 00:24:51 +03:00
Avi Kivity
460b28d067 Merge 'Introduce SELECT MUTATION FRAGMENTS statement' from Botond Dénes
SELECT MUTATION FRAGMENTS is a new select statement sub-type, which allows dumping the underling mutations making up the data of a given table. The output of this statement is mutation-fragments presented as CQL rows. Each row corresponds to a mutation-fragment. Subsequently, the output of this statement has a schema that is different than that of the underlying table.  The output schema is derived from the table's schema, as following:
* The table's partition key is copied over as-is
* The clustering key is formed from the following columns:
    - mutation_source (text): the kind of the mutation source, one of: memtable, row-cache or sstable; and the identifier of the individual mutation source.
    - partition_region (int): represents the enum with the same name.
    - the copy of the table's clustering columns
    - position_weight (int): -1, 0 or 1, has the same meaning as that in position_in_partition, used to disambiguate range tombstone changes with the same clustering key, from rows and from each other.
* The following regular columns:
    - metadata (text): the JSON representation of the mutation-fragment's metadata.
    - value (text): the JSON representation of the mutation-fragment's value.

Data is always read from the local replica, on which the query is executed. Migrating queries between coordinators is frobidden.

More details in the documentation commit (last commit).

Example:
```cql
cqlsh> CREATE TABLE ks.tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck));

cqlsh> DELETE FROM ks.tbl WHERE pk = 0;
cqlsh> DELETE FROM ks.tbl WHERE pk = 0 AND ck > 0 AND ck < 2;
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 0, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 1, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 2, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (1, 0, 0);
cqlsh> SELECT * FROM ks.tbl;

 pk | ck | v
----+----+---
  1 |  0 | 0
  0 |  0 | 0
  0 |  1 | 0
  0 |  2 | 0

(4 rows)
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl);

 pk | mutation_source | partition_region | ck | position_weight | metadata                                                                                                                 | mutation_fragment_kind | value
----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
  1 |      memtable:0 |                0 |    |                 |                                                                                                         {"tombstone":{}} |        partition start |      null
  1 |      memtable:0 |                2 |  0 |               0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} |         clustering row | {"v":"0"}
  1 |      memtable:0 |                3 |    |                 |                                                                                                                     null |          partition end |      null
  0 |      memtable:0 |                0 |    |                 |                                      {"tombstone":{"timestamp":1688122848686316,"deletion_time":"2023-06-30 11:00:48z"}} |        partition start |      null
  0 |      memtable:0 |                2 |  0 |               0 | {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} |         clustering row | {"v":"0"}
  0 |      memtable:0 |                2 |  0 |               1 |                                      {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} | range tombstone change |      null
  0 |      memtable:0 |                2 |  1 |               0 | {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} |         clustering row | {"v":"0"}
  0 |      memtable:0 |                2 |  2 |              -1 |                                                                                                         {"tombstone":{}} | range tombstone change |      null
  0 |      memtable:0 |                2 |  2 |               0 | {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} |         clustering row | {"v":"0"}
  0 |      memtable:0 |                3 |    |                 |                                                                                                                     null |          partition end |      null

(10 rows)
```

Perf simple query:
```
/build/release/scylla perf-simple-query -c1 -m2G --duration=60
```

Before:
```
median 141596.39 tps ( 62.1 allocs/op,  13.1 tasks/op,   43688 insns/op,        0 errors)
median absolute deviation: 137.15
maximum: 142173.32
minimum: 140492.37
```
After:
```
median 141889.95 tps ( 62.1 allocs/op,  13.1 tasks/op,   43692 insns/op,        0 errors)
median absolute deviation: 167.04
maximum: 142380.26
minimum: 141025.51
```

Fixes: https://github.com/scylladb/scylladb/issues/11130

Closes #14347

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: add documentation for the SELECT * FROM MUTATION_FRAGMENTS() statement
  test/topology_custom: add test_select_from_mutation_fragments.py
  test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema
  test/cql-pytest: add test_select_mutation_fragments.py
  test/cql-pytest: move scylla_data_dir fixture to conftest.py
  cql3/statements: wire-in mutation_fragments_select_statement
  cql3/restrictions/statement_restrictions: fix indentation
  cql3/restrictions/statement_restrictions: add check_indexes flag
  cql3/statments/select_statement: add mutation_fragments_select_statement
  cql3: add SELECT MUTATION FRAGMENTS select statement sub-type
  service/pager: allow passing a query functor override
  service/storage_proxy: un-embed coordinator_query_options
  replica: add mutation_dump
  replica: extract query_state into own header
  replica/table: add make_nonpopulating_cache_reader()
  replica/table: add select_memtables_as_mutation_sources()
  tools,mutation: extract the low-level json utilities into mutation/json.hh
  tools/json_writer: fold SstableKey() overloads into callers
  tools/json_writer: allow writing metadata and value separately
  tools/json_writer: split mutation_fragment_json_writer in two classes
  tools/json_writer: allow passing custom std::ostream to json_writer
2023-07-19 11:54:11 +03:00
Avi Kivity
503d21b570 cql3: expr: avoid separating column_mutation_attribute from its column_value when levellizing aggregation depth
Since ec77172b4b (" Merge 'cql3: convert
the SELECT clause evaluation phase to expressions' from Avi Kivity"),
we rewrite non-aggregating selectors to include an aggregation, in order
to have the rest of the code either deal with no aggregation, or
all selectors aggregating, with nothing in between. This is done
by wrapping column selectors with "first" function calls: col ->
first(col).

This broke non-aggregating selectors that included the ttl() or
writetime() pseudo functions. This is because we rewrote them as
writetime(first(col)), and writetime() isn't a function that operates
on any values; it operates on mutations and so must have access to
a column, not an expression.

Fix by detecting this scenario and rewriting the expression as
first(writetime(col)).

Unit and integration tests are added.

Fixes #14715.

Closes #14716
2023-07-19 11:35:01 +03:00
Botond Dénes
6709a71b96 test/cql-pytest: add test_select_mutation_fragments.py 2023-07-19 01:28:28 -04:00
Botond Dénes
05e010b1d3 test/cql-pytest: move scylla_data_dir fixture to conftest.py
It will soon be used by more than one test file.
2023-07-19 01:28:28 -04:00
Nadav Har'El
4ce46a998a cql-pytest: translate Cassandra's tests for BATCH operations
This is a translation of Cassandra's CQL unit test source file
BatchTest.java into our cql-pytest framework.

This test file an old (2014) and small test file, with only a few minimal
testing of mostly error paths in batch statements. All test tests pass in
both Cassandra and Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14733
2023-07-18 17:01:18 +03:00
Michał Jadwiszczak
62ced66702 schema: add scylla specific options to schema description
Add `paxos_grace_seconds`, `tombstone_gc`, `cdc` and `synchronous_updates`
options to schema description.

Fixes: #12389
Fixes: scylladb/scylla-enterprise#2979

Closes #14275
2023-07-18 11:16:19 +03:00
Nadav Har'El
f08bc83cb2 cql-pytest: translate Cassandra's tests for CAST operations
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528
2023-07-12 11:42:04 +03:00
Avi Kivity
f6f974cdeb cql3: selection: fix GROUP BY, empty groups, and aggregations
A GROUP BY combined with aggregation should produce a single
row per group, except for empty groups. This is in contrast
to an aggregation without GROUP BY, which produces a single
row no matter what.

The existing code only considered the case of no grouping
and forced a row into the result, but this caused an unwanted
row if grouping was used.

Fix by refining the check to also consider GROUP BY.

XFAIL tests are relaxed.

Fixes #12477.

Note, forward_service requires that aggregation produce
exactly one row, but since it can't work with grouping,
it isn't affected.

Closes #14399
2023-06-28 18:56:22 +03:00
Michał Jadwiszczak
0a8fcead08 cql3: Specify arguments types in UDA creation errors
Display not only function name but also expected arguments
if `state_function` or `final_function` was not found.

Fixes: #12088

Closes #14278
2023-06-28 15:27:49 +03:00
Jan Ciolek
ccdb26bf9e statements/cas_request: fix crash on empty clustering range in LWT
LWT queries with empty clustering range used to cause a crash.
For example in:
```cql
UPDATE tab SET r = 9000 WHERE p = 1  AND c = 2 AND c = 2000 IF r = 3
```
The range of `c` is empty - there are no valid values.

This caused a segfault when accessing the `first` range:
```c++
op.ranges.front()
```

To fix it let's throw en exception when the clustering range
is empty. Cassandra also rejects queries with `c = 1 AND c = 2`.

There's also a check for empty partition range, as it used
to crash in the past, can't really hurt to add it.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-28 10:18:06 +02:00
Benny Halevy
9231a6c480 cql-pytest: test_using_timestamp: increase ttl
It seems like the current 1-second TTL is too
small for debug build on aarch64 as seen in
https://jenkins.scylladb.com/job/scylla-master/job/build/1513/artifact/testlog/aarch64/debug/cql-pytest.test_using_timestamp.1.log
```
            k = unique_key_int()
            cql.execute(f"INSERT INTO {table} (k, v) VALUES ({k}, {v1}) USING TIMESTAMP {ts} and TTL 1")
            cql.execute(f"INSERT INTO {table} (k, v) VALUES ({k}, {v2}) USING TIMESTAMP {ts}")
>           assert_value(k, v1)

test/cql-pytest/test_using_timestamp.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

k = 10, expected = 2

    def assert_value(k, expected):
        select = f"SELECT k, v FROM {table} WHERE k = {k}"
        res = list(cql.execute(select))
>       assert len(res) == 1
E       assert 0 == 1
E        +  where 0 = len([])
```

Increase the TTL used to write data to de-flake the test
on slow machines running debug build.

Ref #14182

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #14396
2023-06-26 21:35:31 +03:00
Nadav Har'El
0a1283c813 Merge 'cql3:statements:describe_statement: check pointer after casting to UDF/UDA' from Michał Jadwiszczak
There was a bug in describe_statement. If executing `DESC FUNCTION  <uda name>` or ` DESC AGGREGATE <udf name>`, Scylla was crashing because the function was found (`functions::find()` searches both UDFs and UDAs) but the function was bad and the pointer wasn't checked after cast.

Added a test for this.

Fixes: #14360

Closes #14332

* github.com:scylladb/scylladb:
  cql-pytest:test_describe: add test for filtering UDF and UDA
  cql3:statements:describe_statement: check pointer to UDF/UDA
2023-06-22 20:54:25 +03:00
Michał Jadwiszczak
d3d9a15505 cql-pytest:test_describe: add test for filtering UDF and UDA 2023-06-22 18:08:45 +02:00
Botond Dénes
e1c2de4fb8 Merge 'forward_service: fix forgetting case-sensitivity in aggregates ' from Jan Ciołek
There was a bug that caused aggregates to fail when used on column-sensitive columns.

For example:
```cql
SELECT SUM("SomeColumn") FROM ks.table;
```
would fail, with a message saying that there is no column "somecolumn".

This is because the case-sensitivity got lost on the way.

For non case-sensitive column names we convert them to lowercase, but for case sensitive names we have to preserve the name as originally written.

The problem was in `forward_service` - we took a column name and created a non case-sensitive `column_identifier` out of it.
This converted the name to lowercase, and later such column couldn't be found.

To fix it, let's make the `column_identifier` case-sensitive.
It will preserve the name, without converting it to lowercase.

Fixes: https://github.com/scylladb/scylladb/issues/14307

Closes #14340

* github.com:scylladb/scylladb:
  service/forward_service.cc: make case-sensitivity explicit
  cql-pytest/test_aggregate: test case-sensitive column name in aggregate
  forward_service: fix forgetting case-sensitivity in aggregates
2023-06-22 08:25:33 +03:00
Jan Ciolek
854b0301be cql-pytest/test_aggregate: test case-sensitive column name in aggregate
There was a bug which made aggregates fail when used with case-sensitive
column names.
Add a test to make sure that this doesn't happen in the future.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-06-21 14:49:24 +02:00
Nadav Har'El
8a9de08510 sstable: limit compression chunk size to 128 KB
The chunk size used in sstable compression can be set when creating a
table, using the "chunk_length_in_kb" parameter. It can be any power-of-two
multiple of 1KB. Very large compression chunks are not useful - they
offer diminishing returns on compression ratio, and require very large
memory buffers and reading a very large amount of disk data just to
read a small row. In fact, small chunks are recommended - Scylla
defaults to 4 KB chunks, and Cassandra lowered their default from 64 KB
(in Cassandra 3) to 16 KB (in Cassandra 4).

Therefore, allowing arbitrarily large chunk sizes is just asking for
trouble. Today, a user can ask for a 1 GB chunk size, and crash or hang
Scylla when it runs out of memory. So in this patch we add a hard limit
of 128 KB for the chunk size - anything larger is refused.

Fixes #9933

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14267
2023-06-21 14:26:02 +03:00
Tomasz Grabiec
87b4606cd6 Merge 'atomic_cell: compare value last' from Benny Halevy
Currently, when two cells have the same write timestamp
and both are alive or expiring, we compare their value first,
before checking if either of them is expiring
and if both are expiring, comparing their expiration time
and ttl value to determine which of them will expire
later or was written later.

This was based on an early version of Cassandra.
However, the Cassandra implementation rightfully changed in
e225c88a65 ([CASSANDRA-14592](https://issues.apache.org/jira/browse/CASSANDRA-14592)),
where the cell expiration is considered before the cell value.

To summarize, the motivation for this change is three fold:
1. Cassandra compatibility
2. Prevent an edge case where a null value is returned by select query when an expired cell has a larger value than a cell with later expiration.
3. A generalization of the above: value-based reconciliation may cause select query to return a mixture of upserts, if multiple upserts use the same timeastamp but have different expiration times.  If the cell value is considered before expiration, the select result may contain cells from different inserts, while reconciling based the expiration times will choose cells consistently from either upserts, as all cells in the respective upsert will carry the same expiration time.

Fixes #14182

Also, this series:
- updates dml documentation
- updates internal documentation
- updates and adds unit tests and cql pytest reproducing #14182

Closes #14183

* github.com:scylladb/scylladb:
  docs: dml: add update ordering section
  cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp
  mutation_partition: compare_row_marker_for_merge: consider ttl in case expiry is the same
  atomic_cell: compare_atomic_cell_for_merge: update and add documentation
  compare_atomic_cell_for_merge: compare value last for live cells
  mutation_test: test_cell_ordering: improve debuggability
2023-06-20 12:11:48 +02:00
Benny Halevy
31a3152a59 cql-pytest: test_using_timestamp: add tests for rewrites using same timestamp
Add reproducers for #14182:

test_rewrite_different_values_using_same_timestamp verifies
expiration-based cell reconciliation.

test_rewrite_different_values_using_same_timestamp_and_expiration
is a scylla_only test, verifying that when
two cells with same timestamp and same expiration
are compared, the one with the lesser ttl prevails.

test_rewrite_using_same_timestamp_select_after_expiration
reproduces the specific issue hit in #14182
where a cell is selected after it expires since
it has a lexicographically larger value than
the other cell with later expiration.

test_rewrite_multiple_cells_using_same_timestamp verifies
atomicity of inserts of multiple columns, with a TTL.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-20 10:10:39 +03:00
Nadav Har'El
7deba4f4a5 test/cql-pytest: add tests reproducing bugs in compression configuration
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:

Refs #6442: Always print all schema parameters (including default values)

  Scylla doesn't return the default chunk_length_in_kb, but Cassandra
  does.

Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

  Cassandra switched, long ago, the "sstable_compression" attribute's
  name to "class". This can break Cassandra applications that create
  tables (where we won't understand the "class" parameter) and applications
  that inquire about the configuration of existing tables. This patch adds
  tests for both problems.

Refs #9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
            core dump on all nodes

  Our test for this issue hangs Scylla (or crashes, depending on the test
  environment configuration), when a huge allocation is attempted during
  memtable flush. So this test is marked "skip" instead of xfail.

The tests included here also uncovered a new minor/insignificant bug,
where Scylla allows floating point numbers as chunk_length_in_kb - this
number is truncated to an integer, and allowed, unlike Cassandra or
common sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14261
2023-06-20 06:36:13 +03:00
Nadav Har'El
a66c407bf1 Merge 'scylla-sstable: add scrub operation' from Botond Dénes
Exposing scrub compaction to the command-line. Allows for offline scrub of sstables, in cases where online scrubbing (via scylla itself) is not possible or not desired. One such case recently was an sstable from a backup which turned out to be corrupt, `nodetool refresh --load-and-stream` refusing to load it.

Fixes: #14203

Closes #14260

* github.com:scylladb/scylladb:
  docs/operating-scylla/admin-tools: scylla-sstable: document scrub operation
  test/cql-pytest: test_tools.py: add test for scylla sstable scrub
  tools/scylla-sstable: add scrub operation
  tools/scylla-sstable: write operation: add none to valid validation levels
  tools/scylla-sstable: handle errors thrown by the operation
  test/cql-pytest: add option to omit scylla's output from the test output
  tools/scylla-sstable: s/option/operation_option/
  tool/scylla-sstable: add missing comments
2023-06-19 15:40:51 +03:00
Benny Halevy
b0bcad0c91 cql-pytest: rename test-timestamp.py to test_using_timestamp.py
1. Otherwise test.py doesn't recognize it.
2. As it represents what the test does in a better way.
3. Following `test_using_timeout.py` naming convention.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-19 13:26:24 +03:00
Benny Halevy
19208c42dc cql-pytest: test-timestamp: test_key_writetime: update expected errors
The error messages were changed in
b7bbcdd178.

Extend the `match` regular expression param
to pytest.raises to include both old and new message
to remain backward compatible also with Cassandra,
as this test is run against both Cassandra and Scylla.

Note that the test didn't run automatically
since it's named `test-timestamp.py` and test.py
looks up only test scripts beginning with `test_`.
The test will be renamed in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-19 13:25:13 +03:00
Michał Chojnowski
db0871a644 test: test_keyspace: add a test checking that ALTER KEYSPACE preserves UDTs
Reproduces #14139

Closes #14144
2023-06-18 16:50:39 +03:00
Botond Dénes
19708d39ae test/cql-pytest: test_tools.py: add test for scylla sstable scrub
The tests are meant to excercise the command line interface and the
plumbing, not the scrub logic itself, we have dedicated tests for that.
2023-06-16 06:20:14 -04:00
Botond Dénes
e32fdcba06 test/cql-pytest: add option to omit scylla's output from the test output
Scylla's output is often unnecessary to debug a failed test, or even
detrimental because one has to scroll back in the terminal after each
test run, to see the actual test's output. Add an option,
--omit-scylla-output, which when present on the command line of `run`,
the output of scylla will be omitted from the test output.
Also, to help discover this option (and others), don't run the tests
when either -h or --help is present on the command line. Just invoke
pytest (with said option) and exit.
2023-06-16 06:20:14 -04:00
Nadav Har'El
e1513f1199 Merge 'cql3: prepare selectors' from Avi Kivity
CQL statements carry expressions in many contexts: the SELECT, WHERE, SET, and IF clauses, plus various attributes. Previously, each of these contexts had its own representation for an expression, and another one for the same expression but before preparation. We have been gradually moving towards a uniform representation of expressions.

This series tackles SELECT clause elements (selectors), in their unprepared phase. It's relatively simple since there are only five types of expression components (column references, writetime/ttl modifiers, function calls, casts, and field selections). Nevertheless, there isn't much commonality with previously converted expression elements so quite a lot of code is involved.

After the series, we are still left with a custom post-prepare representation of expressions. It's quite complicated since it deals with two passes, for aggregation, so it will be left for another series.

Closes #14219

* github.com:scylladb/scylladb:
  cql3: seletor: drop inheritance from assignment_testable
  cql3: selection: rely on prepared expressions
  cql3: selection: prepare selector expressions
  cql3: expr: match counter arguments to function parameters expecting bigint
  cql3: expr: avoid function constant-folding if a thread is needed
  cql3: add optional type annotation to assignment_testable
  cql3: expr: wire unresolved_identifier to test_assignment()
  cql3: expr: support preparing column_mutation_attribute
  cql3: expr: support preparing SQL-style casts
  cql3: expr: support preparing field_selection expressions
  cql3: expr: make the two styles of cast expressions explicit
  cql3: error injection functions: mark enabled_injections() as impure
  cql3: eliminate dynamic_cast<selector> from functions::get()
  cql3: test_assignment: pass optional schema everywhere
  cql3: expr: prepare_expr(): allow aggregate functions
  cql3: add checks for aggregation functions after prepare
  cql3: expr: add verify_no_aggregate_functions() helper
  test: add regression test for rejection of aggregates in the WHERE clause
  cql3: expr: extract column_mutation_attribute_type
  cql3: expr: add fmt formatter for column_mutation_attribute_kind
  cql3: statements: select_statement: reuse to_selectable() computation in SELECT JSON
2023-06-15 15:59:41 +03:00
Nadav Har'El
3a73048bc9 test/cql-pytest: reproducer for bug of PER PARTITION LIMIT with INDEX
This patch adds an xfailing test reproducing the bug in issue #12762:
When a SELECT uses a secondary index to list rows, if there is also a
PER PARTITION LIMIT given, Scylla forgets to apply it.

The test shows that the PER PARTITION LIMIT is correctly applied when
the index doesn't exist, but forgotten when the index is added.
In contrast, both cases work correctly in Cassandra.

This patch also adds a second variant of this test, which adds filtering
to the mix, and ensures that PER PARTITION LIMIT 1 doesn't give up on the
first row of each partition - but rather looks for the first row that
passes the filter, and only then moves on to the next partition.

Refs #12762.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14248
2023-06-15 09:17:50 +03:00
Nadav Har'El
5a75713ea7 cql-pytest: translate Cassandra's test for UPDATE operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/UpdateTest.java into our cql-pytest framework.

There are 18 tests, and they did not reproduce any previously-unknown
bug, but did provide additional reproducers for two known issues:

Refs #12243: Setting USING TTL of "null" should be allowed

Refs #12474: DELETE/UPDATE print misleading error message suggesting
             ALLOW FILTERING would work
     Note that we knew about this issue for the DELETE operation, and
     the new test shows the same issue exists for UPDATE.

I had to modify some of the tests to allow for different error messages
in ScyllaDB (in cases where the different message makes sense), as well
as cases where we decided to allow in Scylla some behaviors that are
forbidden in Cassandra - namely Refs #12472.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14222
2023-06-14 12:31:15 +03:00
Avi Kivity
e7c1824ed0 test: add regression test for rejection of aggregates in the WHERE clause
The test passes on Cassandra and ScyllaDB.
2023-06-13 21:04:49 +03:00
Avi Kivity
78f4ee385f cql3: functions: fix count(col) for non-scalar types
count(col), unlike count(*), does not count rows for which col is NULL.
However, if col's data type is not a scalar (e.g. a collection, tuple,
or user-defined type) it behaves like count(*), counting NULLs too.

The cause is that get_dynamic_aggregate() converts count() to
the count(*) version. It works for scalars because get_dynamic_aggregate()
intentionally fails to match scalar arguments, and functions::get() then
matches the arguments against the pre-declared count functions.

As we can only pre-declare count(scalar) (there's an infinite number
of non-scalar types), we change the approach to be the same as min/max:
we make count() a generic function. In fact count(col) is much better
as a generic function, as it only examines its input to see if it is
NULL.

A unit test is added. It passes with Cassandra as well.

Fixes #14198.

Closes #14199
2023-06-13 14:40:14 +03:00