Commit Graph

4075 Commits

Author SHA1 Message Date
Tomasz Grabiec
ebcd736343 cache: Fix undefined behavior when populating with non-full keys
Regression introduced in 23e4c8315.

view_and_holder position_in_partiton::after_key() triggers undefined
behavior when the key was not full because the holder is moved, which invalidates the view.

Fixes #12367

Closes #12447
2023-01-10 12:51:54 +02:00
Kamil Braun
822410c49b test/pylib: scylla_cluster: release IPs when cluster is no longer needed
With sufficiently many test cases we would eventually run out of IP
addresses, because IPs (which are leased from a global host registry)
would only be released at the end of an entire test suite.

In fact we already hit this during next promotions, causing much pain
indeed.

Release IPs when a cluster, after being marked dirty, is stopped and
thrown away.

Closes #12482
2023-01-10 06:59:41 +02:00
Avi Kivity
e71e1dc964 Merge 'tools/scylla-sstable: add lua scripting support' from Botond Dénes
Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes.
For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available.

Example:
```lua
function new_stats(key)
    return {
        partition_key = key,
        total = 0,
        partition = 0,
        static_row = 0,
        clustering_row = 0,
        range_tombstone_change = 0,
    };
end

total_stats = new_stats(nil);

function inc_stat(stats, field)
    stats[field] = stats[field] + 1;
    stats.total = stats.total + 1;
    total_stats[field] = total_stats[field] + 1;
    total_stats.total = total_stats.total + 1;
end

function on_new_sstable(sst)
    max_partition_stats = new_stats(nil);
    if sst then
        current_sst_filename = sst.filename;
    else
        current_sst_filename = nil;
    end
end

function consume_partition_start(ps)
    current_partition_stats = new_stats(ps.key);
    inc_stat(current_partition_stats, "partition");
end

function consume_static_row(sr)
    inc_stat(current_partition_stats, "static_row");
end

function consume_clustering_row(cr)
    inc_stat(current_partition_stats, "clustering_row");
end

function consume_range_tombstone_change(crt)
    inc_stat(current_partition_stats, "range_tombstone_change");
end

function consume_partition_end()
    if current_partition_stats.total > max_partition_stats.total then
        max_partition_stats = current_partition_stats;
    end
end

function on_end_of_sstable()
    if current_sst_filename then
        print(string.format("Stats for sstable %s:", current_sst_filename));
    else
        print("Stats for stream:");
    end
    print(string.format("\t%d fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes",
        total_stats.total,
        total_stats.partition,
        total_stats.static_row,
        total_stats.clustering_row,
        total_stats.range_tombstone_change));
    print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes",
        max_partition_stats.total,
        max_partition_stats.partition_key,
        max_partition_stats.static_row,
        max_partition_stats.clustering_row,
        max_partition_stats.range_tombstone_change));
end
```
Running this script wilt yield the following:
```
$ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db
Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db:
        397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes
        Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes
```

Fixes: https://github.com/scylladb/scylladb/issues/9679

Closes #11649

* github.com:scylladb/scylladb:
  tools/scylla-sstable: consume_reader(): improve pause heuristincs
  test/cql-pytest/test_tools.py: add test for scylla-sstable script
  tools: add scylla-sstable-scripts directory
  tools/scylla-sstable: remove custom operation
  tools/scylla-sstable: add script operation
  tools/sstable: introduce the Lua sstable consumer
  dht/i_partitioner.hh: ring_position_ext: add weight() accessor
  lang/lua: export Scylla <-> lua type conversion methods
  lang/lua: use correct lib name for string lib
  lang/lua: fix type in aligned_used_data (meant to be user_data)
  lang/lua: use lua_State* in Scylla type <-> Lua type conversions
  tools/sstable_consumer: more consistent method naming
  tools/scylla-sstable: extract sstable_consumer interface into own header
  tools/json_writer: add accessor to underlying writer
  tools/scylla-sstable: fix indentation
  tools/scylla-sstable: export mutation_fragment_json_writer declaration
  tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
  tools/scylla-sstable: extract json writing logic from json_dumper
  tools/scylla-sstable: extract json_writer into its own header
  tools/scylla-sstable: use json_writer::DataKey() to write all keys
  tools/scylla-types: fix use-after-free on main lambda captures
2023-01-09 20:54:42 +02:00
Raphael S. Carvalho
05ffb024bb replica: Kill table::calculate_shard_from_sstable_generation()
Inferring shard from generation is long gone. We still use it in
some scripts, but that's no longer needed in Scylla, when loading
the SSTables, and it also conflicts with ongoing work of UUID-based
generations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12476
2023-01-09 20:17:57 +02:00
Nadav Har'El
d6e6820f33 Merge 'Drop support for cql binary protocols versions 1 and 2' from Avi Kivity
The CQL binary protocol version 3 was introduced in 2014. All Scylla
version support it, and Cassandra versions 2.1 and newer.

Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer
use 32-bit collection sizes.

Unfortunately, we implemented support for multiple serialization formats
very intrusively, by pushing the format everywhere. This avoids the need
to re-serialize (sometimes) but is quite obnoxious. It's also likely to be
broken, since it's almost untested and it's too easy to write
cql_serialization_format::internal() instead of propagating the client
specified value.

Since protocols 1 and 2 are obsolete for 9 years, just drop them. It's
easy to verify that they are no longer in use on a running system by
examining the `system.clients` table before upgrade.

Fixes #10607

Closes #12432

* github.com:scylladb/scylladb:
  treewide: drop cql_serialization_format
  cql: modification_statement: drop protocol check for LWT
  transport: drop cql protocol versions 1 and 2
2023-01-09 18:52:41 +02:00
Botond Dénes
1d222220e0 test/cql-pytest/test_tools.py: add test for scylla-sstable script
To test the script operation, we use some of the example scripts from
the example directory. Namely, dump.lua and slice.lua. These two scripts
together have a very good coverage of the entire script API. Testing
their functionality therefore also provides a good coverage of the lua
bindings. A further advantage is that since both scripts dump output in
identical format to that of the data-dump operation, it is trivial to do
a comparison against this already tested operation.
A targeted test is written for the sstable skip functionality of the
consumer API.
2023-01-09 09:46:57 -05:00
Tomasz Grabiec
f97268d8f2 row_cache: Fix violation of the "oldest version are evicted first" when evicting last dummy
Consider the following MVCC state of a partition:

   v2: ==== <7> [entry2] ==== <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

Where === means a continuous range and --- means a discontinuous range.

After two LRU items are evicted (entry1 and entry2), we will end up with:

   v2: ---------------------- <9> ===== <last dummy>
   v1: ================================ <last dummy> [entry1]

This will cause readers to incorrectly think there are no rows before
entry <9>, because the range is continuous in v1, and continuity of a
snapshot is a union of continuous intervals in all versions. The
cursor will see the interval before <9> as continuous and the reader
will produce no rows.

This is only temporary, because current MVCC merging rules are such
that the flag on the latest entry wins, so we'll end up with this once
v1 is no longer needed:

   v2: ---------------------- <9> ===== <last dummy>

...and the reader will go to sstables to fetch the evicted rows before
entry <9>, as expected.

The bug is in rows_entry::on_evicted(), which treats the last dummy
entry in a special way, and doesn't evict it, and doesn't clear the
continuity by omission.

The situation is not easy to trigger because it requires certain
eviction pattern concurrent with multiple reads of the same partition
in different versions, so across memtable flushes.

Closes #12452
2023-01-09 16:10:52 +02:00
Nadav Har'El
2d845b6244 test/cql-pytest: a test for more than one equality in WHERE
Cassandra refuses a request with more than one equality relation to the
same column, for example

    DELETE FROM tbl WHERE partitionKey = ? AND partitionKey = ?

It complains that

    partitionkey cannot be restricted by more than one relation if it
    includes an Equal

Currently, Scylla doesn't consider such requests an error. Whether or
not we should be compatible with Cassandra here is discussed in
issue #12472. But as long as we do accept this query, we should be
sure we do the right thing: "WHERE p = 1 AND p = 2" should match
nothing (not the first, or last, value being tested..), and "WHERE p = 1
AND p = 1" should match the matches of p = 1. This patch adds a test
for verify that these requests indeed yield correct results. The
test is scylla_only because, as explained above, Cassandra doesn't
support this feature at all.

Refs #12472

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12473
2023-01-09 11:56:39 +02:00
Alejo Sanchez
d632e1aa7a test/pytest: add missing import, remove unused import
Add missed import time and remove unused name import.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12446
2023-01-08 17:38:46 +02:00
Avi Kivity
5ffe4fee6d Merge 'Remove legacy half reverse' from Michał Radwański
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside partition would be sorted (descending, as all
the mutation fragments would) according to their end (but range
tombstone would still be stored according to their start bound).

As it turns out, mutation::consume, when called with legacy_half_reverse
option produces invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.

In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.

As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:

The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains

```
     if (reverse == consume_in_reverse::yes) {
         while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
             co_await yield();
        }
     } else {
         while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
             co_await yield();
         }
     }
```

So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts-away the range tombstone
changes, and thus the only difference between the consume_in_reverse::no
and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys
and the second one was ordered decreasing. This property is maintained if
we swap out for the consume_in_reverse::yes format.

Refs: #12353

Closes #12453

* github.com:scylladb/scylladb:
  mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse
  mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes
  mutation: move consume_in_reverse def to mutation_consumer.hh
2023-01-08 15:42:00 +02:00
Botond Dénes
c4688563e3 sstables: track decompressed buffers
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.

Fixes: #12448

Closes #12454
2023-01-08 15:34:28 +02:00
Kamil Braun
b77df84543 test: test_topology: make test_nodes_with_different_smp less hacky
The test would use a trick to start a separate Scylla cluster from the
one provided originally by the test framework. This is not supported by
the test framework and may cause unexpected problems.

Change the test to perform regular node operations. Instead of starting
a fresh cluster of 3 nodes, we join the first of these nodes to the
original framework-provided cluster, then decommission the original
nodes, then bootstrap the other 2 fresh nodes.

Also add some logging to the test.

Refs: #12438, #12442

Closes #12457
2023-01-08 15:33:17 +02:00
Avi Kivity
02c9968e73 Merge 'Add WASM UDF implementation in Rust' from Wojciech Mitros
This series adds the implementation and usage of rust wasmtime bindings.

The WASM UDFs introduced by this patch are interruptable and use memory allocated using the seastar allocator.

This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end.
After this patch, `libwasmtime.a` can be removed from the toolchain.
This patch also removes the workaround for #https://github.com/scylladb/scylladb/issues/9387 but it hasn't been tested with ARM yet - if the ARM test causes issues I'll revert this part of the change.

Closes #11351

* github.com:scylladb/scylladb:
  build: remove references to unused c bindings of wasmtime
  test: assert that WASM allocations can fail without crashing
  wasm: limit memory allocated using mmap
  wasm: add configuration options for instance cache and udf execution
  test: check that wasmtime functions yield
  wasm: use the new rust bindings of wasmtime
  rust: add Wasmtime bindings
  rust: add build profiles more aligned with ninja modes
  rust: adjust build according to cxxbridge's recommendations
  tools: toolchain: dbuild: prepare for sharing cargo cache
2023-01-08 15:31:09 +02:00
Nadav Har'El
f5cda3cfc3 test/cql-pytest: add more tests for "timestamp" column type
In issue #3668, a discussion spanning several years theorized that several
things are wrong with the "timestamp" type. This patch begins by adding
several tests that demonstrate that Scylla is in fact behaving correctly,
and mostly identically to Cassandra except one esoteric error handling
case.

However, after eliminating the red herrings, we are left for the real
issue that prompted opening #3668, which is a duplicate of issues #2693
and #2694, and this patch also adds a reproducer for that. The issue is
that Cassandra 4 added support for arithmetic expressions on values,
and timestamps can be added durations, for example:

        '2011-02-03 04:05:12.345+0000' - 1d

is a valid timestamp - and we don't currently support this syntax.
So the new test - which passes on Cassandra 4 and fails on Scylla
(or Cassandra 3) is marked xfail.

Refs #2693
Refs #2694

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12436
2023-01-08 15:00:49 +02:00
Wojciech Mitros
996a942e05 test: assert that WASM allocations can fail without crashing
The main source of big allocations in the WASM UDF implementation
is the WASM Linear Memory. We do not want Scylla to crash even if
a memory allocation for the WASM Memory fails, so we assert that
an exception is thrown instead.

The wasmtime runtime does not actually fail on an allocation failure
(assuming the memory allocator does not abort and returns nullptr
instead - which our seastar allocator does). What happens then
depends on the failed allocation handling of the code that was
compiled to WASM. If the original code threw an exception or aborted,
the resulting WASM code will trap. To make sure that we can handle
the trap, we need to allow wasmtime to handle SIGILL signals, because
that what is used to carry information about WASM traps.

The new test uses a special WASM Memory allocator that fails after
n allocations, and the allocations include both memory growth
instructions in WASM, as well as growing memory manually using the
wasmtime API.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2023-01-06 14:07:29 +01:00
Wojciech Mitros
f05d612da8 wasm: limit memory allocated using mmap
The wasmtime runtime allocates memory for the executable code of
the WASM programs using mmap and not the seastar allocator. As
a result, the memory that Scylla actually uses becomes not only
the memory preallocated for the seastar allocator but the sum of
that and the memory allocated for executable codes by the WASM
runtime.
To keep limiting the memory used by Scylla, we measure how much
memory do the WASM programs use and if they use too much, compiled
WASM UDFs (modules) that are currently not in use are evicted to
make room.
To evict a module it is required to evict all instances of this
module (the underlying implementation of modules and instances uses
shared pointers to the executable code). For this reason, we add
reference counts to modules. Each instance using a module is a
reference. When an instance is destroyed, a reference is removed.
If all references to a module are removed, the executable code
for this module is deallocated.
The eviction of a module is actually acheved by eviction of all
its references. When we want to free memory for a new module we
repeatedly evict instances from the wasm_instance_cache using its
LRU strategy until some module loses all its instances. This
process may not succeed if the instances currently in use (so not
in the cache) use too much memory - in this case the query also
fails. Otherwise the new module is added to the tracking system.
This strategy may evict some instances unnecessarily, but evicting
modules should not happen frequently, and any more efficient
solution requires an even bigger intervention into the code.
2023-01-06 14:07:29 +01:00
Wojciech Mitros
b8d28a95bf wasm: add configuration options for instance cache and udf execution
Different users may require different limits for their UDFs. This
patch allows them to configure the size of their cache of wasm,
the maximum size of indivitual instances stored in the cache, the
time after which the instances are evicted, the fuel that all wasm
UDFs are allowed to consume before yielding (for the control of
latency), the fuel that wasm UDFs are allowed to consume in total
(to allow performing longer computations in the UDF without
detecting an infinite loop) and the hard limit of the size of UDFs
that are executed (to avoid large allocations)
2023-01-06 14:07:27 +01:00
Wojciech Mitros
3214f5c2db test: check that wasmtime functions yield
The new implementation for WASM UDFs allows executing the UDFs
in pieces. This commit adds a test asserting that the UDF is in fact
divided and that each of the execution segments takes no longer than
1ms.
2023-01-06 14:05:53 +01:00
Michał Radwański
1fbf433966 mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside partition would be sorted (descending, as all
the mutation fragments would) according to their end (but range
tombstone would still be stored according to their start bound).

As it turns out, mutation::consume, when called with legacy_half_reverse
option produces invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.

In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.

As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:

The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains

```
     if (reverse == consume_in_reverse::yes) {
         while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
             co_await yield();
        }
     } else {
         while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
             co_await yield();
         }
     }
```

So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts-away the range tombstone
changes, and thus the only difference between the consume_in_reverse::no
and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys
and the second one was ordered decreasing. This property is maintained if
we swap out for the consume_in_reverse::yes format.
2023-01-05 18:48:55 +01:00
Kamil Braun
09da661eeb Merge 'raft: replace experimental raft option with dedicated flag' from Gleb Natapov
Unlike other experimental feature we want to raft to be opt in even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.

* 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev:
  raft: replace experimental raft option with dedicated flag
  main: move supervisor notification about group registry start where it actually starts
2023-01-05 15:21:35 +01:00
Kamil Braun
4268b1bbc2 Merge 'raft: raft_group0, register RPC verbs on all shards' from Gusev Petr
raft_group0 used to register RPC verbs only on shard 0. This worked on
clusters with the same --smp setting on all nodes, since RPCs in this
case are processed on the same shard as the calling code, and
raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added to identify the
problem. Since --smp can only be specified via the command line, a
corresponding parameter was added to the ManagerClient.server_add
method.  It allows to override the default parameters set by the
SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting
individual items.

Fixes: #12252

Closes #12374

* github.com:scylladb/scylladb:
  raft: raft_group0, register RPC verbs on all shards
  raft: raft_append_entries, copy entries to the target shard
  test.py, allow to specify the node's command line in test
2023-01-04 11:11:21 +01:00
Michał Jadwiszczak
83bb77b8bb test/boost/cql_query_test: enable parallelized_aggregation
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.

Closes #12409
2023-01-04 10:11:25 +02:00
Avi Kivity
2739ac66ed treewide: drop cql_serialization_format
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).

A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.

Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.

One test checking the 16-bit serialization format is removed.
2023-01-03 19:54:13 +02:00
Alejo Sanchez
889acf710c test/python: increase CQL connection timeout for...
test_ssl

In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12408
2023-01-03 17:10:46 +02:00
Petr Gusev
8417840647 raft: raft_group0, register RPC verbs on all shards
raft_group0 used to register RPC verbs only on shard 0.
This worked on clusters with the same --smp setting on
all nodes, since RPCs in this case are (usually)
processed on the same shard as the calling code,
and raft_group0 methods only run on shard 0.

A new test test_nodes_with_different_smp was added
to identify the problem.

Fixes: #12252
2023-01-03 17:04:07 +03:00
Petr Gusev
1c23390f12 test.py, allow to specify the node's command line in test
An optional parameter cmdline has been added to
the ManagerClient.server_add method.
It allows you to override the default parameters
set by the SCYLLA_CMDLINE_OPTIONS variable
by changing, adding or deleting individual
items. To change or add a parameter just specify
its name and value one after the other.
To remove parameter use the special keyword
__remove__ as a value. To set a parameter
without a value (such as --overprovisioned)
use the special keyword __missing__ as the value.
2023-01-03 15:24:54 +03:00
Nadav Har'El
eb85f136c8 cql-pytest: document how to write new cql-pytest tests
Add to test/cql-pytest/README.md an explanation of the philosophy
of the cql-pytest test suite, and some guideliness on how to write
good tests in that framework.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12400
2023-01-03 12:13:22 +02:00
Gleb Natapov
1688163233 raft: replace experimental raft option with dedicated flag
Unlike other experimental feature we want to raft to be optional even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
2023-01-03 11:15:11 +02:00
Raphael S. Carvalho
b4e4bbd64a database_test: Reduce x_log2_compaction_group values to avoid timeout
database_test in timing out because it's having to run the tests calling
do_with_cql_env_and_compaction_groups 3x, one for each compaction group
setting. reduce it to 2 settings instead of 3 if running in debug mode.

Refs #12396.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #12421
2023-01-01 13:56:18 +02:00
Nadav Har'El
200bc82913 test/cql-pytest: exit immediately if Scylla is down
In commit acfa180766 we added to
test/cql-pytest a mechanism to detect when Scylla crashes in the middle
of a test function - in which case we report the culprit test and exit
immediately to avoid having a hundred more tests report that they failed
as well just because Scylla was down.

However, if Scylla was *never* up - e.g., if the user ran "pytest" without
ever running Scylla -  we still report hundreds of tests as having failed,
which is confusing and not helpful.

So with this patch, if a connection cannot be made to Scylla at all,
the test exits immediately, explaining what went wrong, not blaming
any specific test:

    $ pytest
    ...
    ! _pytest.outcomes.Exit: Cannot connect to Scylla at --host=localhost --port=9042 !
    ============================ no tests ran in 0.55s =============================

Beyond being a helpful reminder for a developer who runs "pytest" without
having started Scylla first (or using test/cql-pytest/run or test.py to
start Scylla easily), this patch is also important when running tests
through test.py if it reuses an instance of Scylla that crashed during an
earlier pytest file's run.

This patch does not fix test.py - it can still try to run pytest with
a dead Scylla server without checking. But at least with this patch
pytest will notice this problem immediately and won't report hundreds of
test functions having failed. The only report the user will see will be
the last test which crashed Scylla, which will make it easier to find
this failure without being hidden between hundreds of spurious failures.

Fixes #12360

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12401
2022-12-28 13:04:28 +02:00
Alejo Sanchez
d408b711e3 test/python: increase CQL connection timeouts
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to more reasonable time.

These timeout values are the same as used in topology tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12405
2022-12-28 10:06:33 +02:00
Alejo Sanchez
1bfe234133 test/pylib: API get/set logger level of Scylla server
Provide helpers to get and set logger level for Scylla servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #12394
2022-12-25 13:58:43 +02:00
Botond Dénes
b0d95948e1 mutation_compactor: reset stop flag on page start
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous could cause the new page
and subsequently the entire query to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.

This commit also adds a unit test which reproduces the bug.

Fixes: #12361

Closes #12384
2022-12-24 13:52:45 +02:00
Nadav Har'El
ef2e5675ed materialized views, test: add tests for CLUSTERING ORDER BY
In issue #10767, concerned were raised that the CLUSTERING ORDER BY
clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition.

The tests in this patch try to explore the different ways in which
CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allows
us to compare Scylla's behaivor to Cassandra, and to common sense.

The tests discover that the CLUSTERING ORDER BY feature in materialized
views generally works as expected, but there are *three* differences
between Scylla and Cassandra in this feature. We consider two differences
to be bugs (and hence the test is marked xfail) and one a Scylla extension:

1. When a base table has a reverse-order clustering column and this
   clustering column is used in the materialized view, in Cassandra
   the view's clustering order inherits the reversed order. In Scylla,
   the view's clustering order reverts to the default order.
   Arguably, both behaviors can be justified, but usually when in doubt
   we should implement Cassandra's behavior - not pick a different
   behavior, even if the different behavior is also reasonable. So
   this test (test_mv_inherit_clustering_order()) is marked "xfail",
   and a new issue was created about this difference: #12308.

   If we want to fix this behavior to match Cassandra's we should also
   consider backward compatibility - what happens if we change this
   behavior in Scylla now, after we had the opposite behavior in
   previous releases? We may choose to enshrine Scylla's Cassandra-
   incompatible behavior here - and document this difference.

2. The CLUSTERING ORDER BY should, as its name suggests, only list
   clustering columns. In Scylla, specifying other things, like regular
   columns, partition-key columns, or non-existent columns, is silently
   ignored, whereas it should result in an Invalid Request error (as it
   does in Cassandra). So test_mv_override_clustering_order_error()
   is marked "xfail".
   This is the difference already discovered in #10767.

3. When a materialized view has several clustering columns, Cassandra
   requires that a CLUSTERING ORDER BY clause, if present, must specify
   the order of all of *all* clustering columns. Scylla, in contrast,
   allows the user to override the order of only *some* of these columns -
   and the rest get the default order. I consider this to be a
   legitimate Scylla extension, and not a compatibility bug, so marked
   the test with "scylla_only", and no issue was opened about it.

Refs #10767
Refs #12308

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12307
2022-12-22 09:48:16 +02:00
Nadav Har'El
6d2e146aa6 test/cql-pytest.py: add scylla_inject_error() utility
This patch adds a scylla_inject_error(), a context manager which tests
can use to temporarily enable some error injection while some test
code is running. It can be used to write tests that artificially
inject certain errors instead of trying to reach the elaborate (and
often requiring precise timing or high amounts of data) situation where
they occur naturally.

The error-injection API is Scylla-specific (it uses the Scylla REST API)
and does not work on "release"-mode builds (all other modes are supported),
so when Cassandra or release-mode build are being tested, the test which
uses scylla_inject_error() gets skipped.

Example usage:

```python
    from rest_api import scylla_inject_error
    with scylla_inject_error(cql, "injection_name", one_shot=True):
        # do something here
        ...
```

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12264
2022-12-22 09:39:10 +02:00
Nadav Har'El
01f0644b22 Merge 'scylla-gdb.py: introduce scylla get-config-value' from Botond Dénes
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:

    (gdb) scylla get-config-value compaction_static_shares
    value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart

Closes #12362

* github.com:scylladb/scylladb:
  scylla-gdb.py: add scylla get-config-value gdb command
  scylla-gdb.py: extract $downcast_vptr logic to standalone method
  test: scylla-gdb/run: improve diagnostics for failed tests
2022-12-21 18:38:23 +02:00
Botond Dénes
29d49e829e scylla-gdb.py: add scylla get-config-value gdb command
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:
    (gdb) scylla get-config-value compaction_static_shares
    value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart
2022-12-21 03:05:56 -05:00
Botond Dénes
24022c19a6 test: scylla-gdb/run: improve diagnostics for failed tests
By instructing gdb to print the full python stack in case of errors.
2022-12-21 03:05:56 -05:00
Avi Kivity
3fce43124a Merge 'Static compaction groups' from Raphael "Raph" Carvalho
Allows static configuration of number of compaction groups per table per shard.

To bootstrap the project, config option x_log2_compaction_groups was added which controls both number of groups and partitioning within a shard.

With a value of 0 (default), it means 1 compaction group, therefore all tokens go there.
With a value of 3, it means 8 compaction groups, and 3 most-significant-bits of tokens being used to decide which group owns the token.
And so on.

It's still missing:
- integration with repair / streaming
- integration with reshard / reshape.

perf/perf_simple_query --smp 1 --memory 1G

BEFORE
-----
median 61358.55 tps ( 71.1 allocs/op,  12.2 tasks/op,   56375 insns/op,        0 errors)
median 61322.80 tps ( 71.1 allocs/op,  12.2 tasks/op,   56391 insns/op,        0 errors)
median 61058.58 tps ( 71.1 allocs/op,  12.2 tasks/op,   56386 insns/op,        0 errors)
median 61040.94 tps ( 71.1 allocs/op,  12.2 tasks/op,   56381 insns/op,        0 errors)
median 61118.40 tps ( 71.1 allocs/op,  12.2 tasks/op,   56379 insns/op,        0 errors)

AFTER
-----
median 61656.12 tps ( 71.1 allocs/op,  12.2 tasks/op,   56486 insns/op,        0 errors)
median 61483.29 tps ( 71.1 allocs/op,  12.2 tasks/op,   56495 insns/op,        0 errors)
median 61638.05 tps ( 71.1 allocs/op,  12.2 tasks/op,   56494 insns/op,        0 errors)
median 61726.09 tps ( 71.1 allocs/op,  12.2 tasks/op,   56509 insns/op,        0 errors)
median 61537.55 tps ( 71.1 allocs/op,  12.2 tasks/op,   56491 insns/op,        0 errors)

Closes #12139

* github.com:scylladb/scylladb:
  test: mutation_test: Test multiple compaction groups
  test: database_test: Test multiple compaction groups
  test: database_test: Adapt it to compaction groups
  db: Add config for setting static number of compaction groups
  replica: Introduce static compaction groups
  test: sstable_test: Stop referencing single compaction group
  api: compaction_manager: Stop a compaction type for all groups
  api: Estimate pending tasks on all compaction groups
  api: storage_service: Run maintenance compactions on all compaction groups
  replica: table: Adapt assertion to compaction groups
  replica: database: stop and disable compaction on behalf of all groups
  replica: Introduce table::parallel_foreach_table_state()
  replica: disable auto compaction on behalf of all groups
  replica: table: Rework compaction triggers for compaction groups
  replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups
  replica: Adapt table::rebuild_statistics() to compaction groups
  replica: table: Perform major compaction on behalf of all groups
  replica: table: Perform off-strategy compaction on behalf of all groups
  replica: table: Perform cleanup compaction on behalf of all groups
  replica: Extend table::discard_sstables() to operate on all compaction groups
  replica: table: Create compound sstable set for all groups
  replica: table: Set compaction strategy on behalf of all groups
  replica: table: Return min memtable timestamp across all groups
  replica: Adapt table::stop() to compaction groups
  replica: Adapt table::clear() to compaction groups
  replica: Adapt table::can_flush() to compaction groups
  replica: Adapt table::flush() to compaction groups
  replica: Introduce parallel_foreach_compaction_group()
  replica: Adapt table::set_schema() to compaction groups
  replica: Add memtables from all compaction groups for reads
  replica: Add memtable_count() method to compaction_group
  replica: table: Reserve reader list capacity through a callback
  replica: Extract addition of memtables to reader list into a new function
  replica: Adapt table::occupancy() to compaction groups
  replica: Adapt table::active_memtable() to compaction groups
  replica: Introduce table::compaction_groups()
  replica: Preparation for multiple compaction groups
  scylla-gdb: Fix backward compatibility of scylla_memtables command
2022-12-20 17:04:47 +02:00
Nadav Har'El
08c8e0d282 test/alternator: enable tests for long strings of consecutive tombstones
In the past we had issue #7933 where very long strings of consecutive
tombstones caused Alternator's paging to take an unbounded amount of
time and/or memory for a single page. This issue was fixed (by commit
e9cbc9ee85) but the two tests we had
reproducing that issue were left with the "xfail" mark.
They were also marked "veryslow" - each taking about 100 seconds - so
they didn't run by default so nobody noticed they started to pass.

In this patch I make these tests much faster (taking less than a second
together), confirm that they pass - and remove the "xfail" mark and
improve their descriptions.

The trick to making these tests faster is to not create a million
tombstones like we used to: We now know that after string of just 10,000
tombstones ('query_tombstone_page_limit') the page should end, so
we can check specifically this number. The story is more complicated for
partition tombstones, but there too it should be a multiple of
query_tombstone_page_limit. To make the tests even faster, we change
run.py to lower the query_tombstone_page_limit from the default 10,000
to 1000. The tests work correctly even without this change, but they are
ten times faster with it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12350
2022-12-20 07:08:36 +02:00
Raphael S. Carvalho
e7380bea65 test: mutation_test: Test multiple compaction groups
Extends mutation_test to run the tests with more than one
compaction group, in addition to a single one (default).

Piggyback on existing tests. Avoids duplication.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:07 -03:00
Raphael S. Carvalho
e3e7c3c7e5 test: database_test: Test multiple compaction groups
Extends database_test to run the tests with more than one
compaction group, in addition to a single one (default).

Piggyback on existing tests. Avoids duplication.

Caught a bug when snapshotting, in implementation of
table::can_flush(), showing its usefulness.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:07 -03:00
Raphael S. Carvalho
e103e41c76 test: database_test: Adapt it to compaction groups
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 12:36:05 -03:00
Raphael S. Carvalho
c807e61715 test: sstable_test: Stop referencing single compaction group
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:16:20 -03:00
Raphael S. Carvalho
ef8f542d75 replica: Adapt table::active_memtable() to compaction groups
active_memtable() was fine to a single group, but with multiple groups,
there will be one active memtable per group. Let's change the
interface to reflect that.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-12-19 11:15:14 -03:00
Avi Kivity
7c7eb81a66 Merge 'Encapsulate filesystem access by sstable into filesystem_storage subsclass' from Pavel Emelyanov
This is to define the API sstable needs from underlying storage. When implementing object-storage backend it will need to implement those. The API looks like

        future<> snapshot(const sstable& sst, sstring dir, absolute_path abs) const;
        future<> quarantine(const sstable& sst, delayed_commit_changes* delay);
        future<> move(const sstable& sst, sstring new_dir, generation_type generation, delayed_commit_changes* delay);
        void open(sstable& sst, const io_priority_class& pc); // runs in async context
        future<> wipe(const sstable& sst) noexcept;

        future<file> open_component(const sstable& sst, component_type type, open_flags flags, file_open_options options, bool check_integrity);

It doesn't have "list" or alike, because it's not a method of an individual sstable, but rather the one from sstables_manager. It will come as separate PR.

Closes #12217

* github.com:scylladb/scylladb:
  sstable, storage: Mark dir/temp_dir private
  sstable: Remove get_dir() (well, almost)
  sstable: Add quarantine() method to storage
  sstable: Use absolute/relative path marking for snapshot()
  sstable: Remove temp_... stuff from sstable
  sstable: Move open_component() on storage
  sstable: Mark rename_new_sstable_component_file() const
  sstable: Print filename(type) on open-component error
  sstable: Reorganize new_sstable_component_file()
  sstable: Mark filename() private
  sstable: Introduce index_filename()
  tests: Disclosure private filename() calls
  sstable: Move wipe_storage() on storage
  sstable: Remove temp dir in wipe_storage()
  sstable: Move unlink parts into wipe_storage
  sstable: Remove get_temp_dir()
  sstable: Move write_toc() to storage
  sstable: Shuffle open_sstable()
  sstable: Move touch_temp_dir() to storage
  sstable: Move move() to storage
  sstable: Move create_links() to storage
  sstable: Move seal_sstable() to storage
  sstable: Tossing internals of seal_sstable()
  sstable: Move remove_temp_dir() to storage
  sstable: Move create_links_common() to storage
  sstable: Move check_create_links_replay() to storage
  sstable: Remove one of create_links() overloads
  sstable: Remove create_links_and_mark_for_removal()
  sstable: Indentation fix after prevuous patch
  sstable: Coroutinize create_links_common()
  sstable: Rename create_links_common()'s "dir" argument
  sstable: Make mark_for_removal bool_class
  sstable, table: Add sstable::snapshot() and use in table::take_snapshot
  sstable: Move _dir and _temp_dir on filesystem_storage
  sstable: Use sync_directory() method
  test, sstable: Use component_basename in test
  sstables: Move read_{digest|checksum} on sstable
2022-12-18 17:29:35 +02:00
Botond Dénes
64903ba7d5 test/cql-pytest: use pytest site-packages workaround
Recently, the pytest script shipped by Fedora started invoking python
with the `-s` flag, which disables python considering user site
packages. This caused problems for our tests which install the cassandra
driver in the user site packages. This was worked around in e5e7780f32
by providing our own pytest interposer launcher script which does not
pass the above mentioned flag to python. Said patch fixed test.py but
not the run.py in cql-pytest. So if the cql-pytest suite is ran via
test.py it works fine, but if it is invoked via the run script, it fails
because it cannot find the cassandra driver. This patch patches run.py
to use our own pytest launcher script, so the suite can be run via the
run script as well.
Since run.py is shared with the alternator pytest suite, this patch also
fixes said test suite too.

Closes #12253
2022-12-15 16:05:31 +02:00
Benny Halevy
639e247734 test: cql-pytest: test_describe: test_table_options_quoting: USE test_keyspace
Without that, I often (but not always) get the following error:
```
__________________________ test_table_options_quoting __________________________

cql = <cassandra.cluster.Session object at 0x7f1aafb10650>
test_keyspace = 'cql_test_1671103335055'

    def test_table_options_quoting(cql, test_keyspace):
        type_name = f"some_udt; DROP KEYSPACE {test_keyspace}"
        column_name = "col''umn -- @quoting test!!"
        comment = "table''s comment test!\"; DESC TABLES --quoting test"
        comment_plain = "table's comment test!\"; DESC TABLES --quoting test" #without doubling "'" inside comment

>       cql.execute(f"CREATE TYPE \"{type_name}\" (a int)")

test/cql-pytest/test_describe.py:623:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cassandra/cluster.py:2699: in cassandra.cluster.Session.execute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"
```

CQL driver in use ise the scylla driver version 3.25.10.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12329
2022-12-15 14:35:33 +02:00
Botond Dénes
8f8284783a Merge 'Fix handling of non-full clustering keys in the read path' from Tomasz Grabiec
This PR fixes several bugs related to handling of non-full
clustering keys.

One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse
mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.

Fixes #12180

after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.

This will issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.

It probably already causes problems for such keys as after_key() is used
in various parts in the read path.

Refs #1446

Closes #12234

* github.com:scylladb/scylladb:
  position_in_partition: Make after_key() work with non-full keys
  position_in_partition: Introduce before_key(position_in_partition_view)
  db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
  types: Fix comparison of frozen sets with empty values
2022-12-15 10:47:12 +02:00
Pavel Emelyanov
a46d378bee sstable: Remove temp_... stuff from sstable
There's a bunch of helpers around XFS-specific temp-dir sitting in
publie sstable part. Drop it altogether, no code needs it for real.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-12-15 10:14:49 +03:00