4150 Commits

Author SHA1 Message Date
Pavel Emelyanov
dd307d8a42 test: Use tempdir from sstable_test_env
The test cases in sstable_directory_test use a temporary directory that
differs from the one sstables manager starts over. Fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 11:47:06 +03:00
Pavel Emelyanov
0c3799db71 test: Add tmpdir to sstable test env
This adds the test/lib's tmpdir instance _and_ configures the
data_file_directories with this path. This makes sure sstables manager
and the rest of the test use the same directory for sstables. For now
it doesn't change anything, but it helps the next patch.

(A neat side effect of this change is that sstable_test_env is now
 configured the same way as cql_test_env is)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-26 11:47:06 +03:00
Pavel Emelyanov
9ccae1be18 test: Keep db::config as unique pointer
The goal is to make it possible to make config with custom-initialized
options in test_env::impl's constructor initializer list (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-01-25 19:38:47 +03:00
Nadav Har'El
55558e1bd7 test/alternator: check operation on invalid TableName
Issue #12538 suggested that maybe Alternator shouldn't bother reporting an
invalid table name in item operations like PutItem, and that it's enough
to report that the table doesn't exist. But the test added in this patch
shows that DynamoDB, like Alternator, reports the invalid table name in
this case - not just that the table doesn't exist.

That should make us think twice before acting on issue #12538. If we do
what this issue recommended, this test will need to be fixed (e.g., to
accept as correct both types of errors).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12608
2023-01-24 14:14:39 +02:00
Nadav Har'El
ccc2c6b5dd Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun
Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability.

Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection.

Closes #12588

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: better logging for timeout on server startup
  test/pylib: scylla_cluster: use less expensive query to check for CQL availability
2023-01-23 17:00:52 +02:00
Kamil Braun
8a1ea6c49f test/pylib: scylla_cluster: better logging for timeout on server startup
Waiting for server startup is a multi-step procedure: after we start the
actual process, we will:
- try to obtain the Host ID (by querying a REST API endpoint)
- then try to connect a CQL session
- then try to perform a CQL query

The steps are repeated every .1 second until we reach a timeout (the
Host ID step is skipped if we previously managed to obtain it).

On timeout we'd only get a generic "failed to start server" message; it
wouldn't say what we managed to do and what we didn't.

For example, on one of the failed jobs on Jenkins I observed this
timeout error. Looking at the logs of the server, it turned out that the
server printed the "initialization completed" message more than 2
minutes before the actual timeout happened. So for 2 minutes, the test
framework either couldn't obtain the Host ID, or couldn't establish a
CQL connection, or couldn't perform a CQL query, but I wasn't able to
determine fully which one of these was the case.

Improve the code by printing whether we managed to get the Host ID of
the server and if so - whether we managed to connect to CQL.
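
The retry loop and the more detailed timeout message described above can be sketched as follows. This is a minimal illustration, not the code in scylla_cluster: `wait_for_startup` and its three callables are hypothetical names, and the real framework is asynchronous.

```python
import time
from typing import Callable, Optional


def wait_for_startup(get_host_id: Callable[[], Optional[str]],
                     connect_cql: Callable[[], bool],
                     run_query: Callable[[], bool],
                     timeout: float = 5.0,
                     period: float = 0.1) -> str:
    """Retry the startup steps until all succeed or the timeout expires."""
    deadline = time.monotonic() + timeout
    host_id: Optional[str] = None
    connected = False
    while time.monotonic() < deadline:
        # The Host ID step is skipped once it has succeeded.
        if host_id is None:
            host_id = get_host_id()
        if host_id is not None:
            connected = connect_cql()
            if connected and run_query():
                return host_id
        time.sleep(period)
    # Report how far we got instead of a generic failure.
    if host_id is None:
        detail = "could not obtain Host ID"
    elif not connected:
        detail = f"got Host ID {host_id}, but could not connect to CQL"
    else:
        detail = f"got Host ID {host_id} and connected to CQL, but the query failed"
    raise TimeoutError(f"failed to start server: {detail}")
```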
2023-01-23 15:59:42 +01:00
Kamil Braun
0e591606a5 test/pylib: scylla_cluster: use less expensive query to check for CQL availability
The previous CQL query used a range scan which is very inefficient, even
for local tables.

Also add a comment explaining why we need this query.
2023-01-23 15:59:05 +01:00
Nadav Har'El
54f174a1f4 Merge 'test.py: handle broken clusters for Python suite' from Alecco
If the after test check fails (is_after_test_ok is False), discard the cluster and raise exception so context manager (pool) does not recycle it.

Ignore exception re-raised by the context manager.

Fixes #12360

Closes #12569

* github.com:scylladb/scylladb:
  test.py: handle broken clusters for Python suite
  test.py: Pool discard method
2023-01-22 19:58:12 +02:00
Botond Dénes
7f9b39009c reader_concurrency_semaphore_test: leak test: relax iteration limit
This test creates random dummy reads and simulates a query with them.
The test works in terms of iteration (tick), advancing each simulating
read in each iteration. To prevent infinite runtime an iteration limit
of 100 was added to detect a non-converging test and kill it. This limit
proved too strict however and in this patch we bump it to 1000 to
prevent some unlucky seed making this test fail, as seen recently in CI.
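
The role of the iteration limit can be illustrated with a toy simulation. This is only a sketch under assumed parameters; `simulate_reads` and the way each dummy read advances are hypothetical and do not mirror the C++ test.

```python
import random


def simulate_reads(seed: int, n_reads: int = 8, max_ticks: int = 1000) -> int:
    """Advance every unfinished dummy read once per tick; return the ticks used."""
    rng = random.Random(seed)
    # Number of steps each dummy read still has to make.
    remaining = [rng.randint(1, 20) for _ in range(n_reads)]
    for tick in range(1, max_ticks + 1):
        # Each unfinished read makes progress with probability 1/2 in this tick.
        remaining = [r - 1 if rng.random() < 0.5 else r for r in remaining]
        remaining = [r for r in remaining if r > 0]
        if not remaining:
            return tick
    # The limit only exists to stop a run that never converges.
    raise RuntimeError(f"did not converge within {max_ticks} ticks")
```

A limit that is too tight turns an unlucky but valid seed into a failure, which is why the limit is meant to be generous.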

Closes #12580
2023-01-20 15:39:13 +02:00
Nadav Har'El
3d78dbd9f2 test/cql-pytest: regression tests for null lookup in local SI
We noticed that old branches of Scylla had problems with looking up a
null value in a local secondary index - hanging or crashing. This patch
includes tests to reproduce these bugs. The tests pass on current
master - apparently this bug has already been fixed, but we didn't
have a regression test for it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12570
2023-01-19 23:58:33 +02:00
Alejo Sanchez
c886a05b37 test.py: Pool discard method
Add a context manager discard() method to tell it to discard the object.
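
A minimal sketch of the idea, assuming a synchronous toy pool; the `Pool`, `Lease` and `instance()` names are hypothetical and the real pool in test.py is asynchronous:

```python
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class Lease(Generic[T]):
    """Handle given to the user of a pooled object."""

    def __init__(self, obj: T) -> None:
        self.obj = obj
        self.discarded = False

    def discard(self) -> None:
        """Tell the pool not to recycle this object."""
        self.discarded = True


class Pool(Generic[T]):
    """Toy object pool: a borrowed object is recycled unless it was discarded."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build
        self._free: List[T] = []

    @contextmanager
    def instance(self) -> Iterator[Lease[T]]:
        obj = self._free.pop() if self._free else self._build()
        lease = Lease(obj)
        try:
            yield lease
        finally:
            # Broken objects are dropped instead of going back to the pool.
            if not lease.discarded:
                self._free.append(obj)
```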

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-19 21:43:45 +01:00
Kamil Braun
2f84e820fd test/pylib: scylla_cluster: return error details from test framework endpoints
If an endpoint handler throws an exception, the details of the exception
are not returned to the client. Normally this is desirable so that
information is not leaked, but in this test framework we do want to
return the details to the client so it can log a useful error message.

Do it by wrapping every handler into a catch clause that returns
the exception message.
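
The wrapping can be sketched with a plain decorator. This is an illustration only: the `(status, body)` tuple stands in for a real HTTP response object, and `with_error_details` and `before_test` are hypothetical names.

```python
import functools
from typing import Any, Callable, Tuple

# (HTTP status, body): a stand-in for a real response type.
Response = Tuple[int, str]


def with_error_details(handler: Callable[..., Response]) -> Callable[..., Response]:
    """Turn any exception raised by `handler` into a 500 response with its message."""
    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Response:
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # Acceptable in a test framework; a public service would hide this.
            return 500, str(exc)
    return wrapper


@with_error_details
def before_test(test_name: str) -> Response:
    # Hypothetical handler that always fails, to show the error path.
    raise RuntimeError("Failed to start server at host 127.155.129.1.")
```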

Also modify a bit how HTTPErrors are rendered so it's easier to discern
the actual body of the error from other details (such as the params used
to make the request etc.)

Before:
```
E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error
E
E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff
```

After:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body:
E Failed to start server at host 127.155.129.1.
E Check the log files:
E /home/kbraun/dev/scylladb/testlog/test.py.dev.log
E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log
```

Closes #12563
2023-01-19 17:47:13 +02:00
Kamil Braun
3ed3966f13 test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager
When we obtained a new cluster for a test case after the previous test
case left a dirty cluster, we would release the old cluster's used IP
addresses (`_before_test` function). However, we would not release the
last cluster's IP after the last test case. We would run out of IPs with
sufficiently many test files or `--repeat` runs. Fix this.

Also reorder the operations a bit: stop the cluster (and release its
IPs) before freeing up space in the cluster pool (i.e. call
`self.cluster.stop()` before `self.clusters.steal()`). This reduces
concurrency a bit - fewer Scyllas running at the same time, which is
good (the pool size gives a limit on the desired max number of
concurrently running clusters). Killing a cluster is quick so it won't
make a significant difference for the next guy waiting on the pool.

Closes #12564
2023-01-19 17:46:46 +02:00
Nadav Har'El
18be50582d test/cql-pytest: add tests for behavior of unset values
Recently, commit 0b418fa made the checking for "unset" values more
centralized and more robust, but as the tests added in this patch
show, the situation is good (and in particular, that #10358 is
solved).

The tests in this patch check that the behavior of "unset" values in
the CQL v4 protocol matches Cassandra's behavior and its documentation,
and how it compares to our wishes of how we want unset values to behave.

One of these tests fails on Cassandra (we consider this a Cassandra bug).
One test fails on Scylla because it doesn't yet support arithmetic
expressions (Refs #2693).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12534
2023-01-19 15:48:07 +02:00
Nadav Har'El
9433108158 Merge 'Allow transient list values to contain NULLs' from Avi Kivity
The CQL protocol and specification call for lists with NULLs in
some places. For example, the statement:

```cql
UPDATE tab
SET x = 3
IF y IN (1, 2,  NULL)
WHERE pk = 4
```

has a list `(1, 2, NULL)` that contains NULL. Although the syntax is tuple-like, the value is a list;
consider the same statement as a prepared statement:

```cql
UPDATE tab
SET x = :x
IF y IN :y_values
WHERE pk = :pk
```

`:y_values` must have a list type, since the number of elements is unknown.

Currently, this is done with special paths inside LWT that bypass normal
evaluation, but if we want to unify those paths, we must allow NULLs in
lists (except in storage). This series does that.
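
The distinction between transient and stored lists can be sketched as below. This is a toy model with `None` standing in for NULL; both function names are hypothetical and are not the functions touched by the series.

```python
from typing import List, Optional, Sequence


def bind_transient_list(values: Sequence[Optional[int]]) -> List[Optional[int]]:
    """A transient list, such as the right-hand side of IN, may carry NULLs."""
    return list(values)


def validate_list_for_storage(values: Sequence[Optional[int]]) -> List[int]:
    """A list that is about to be stored must still not contain NULL."""
    checked: List[int] = []
    for v in values:
        if v is None:
            raise ValueError("NULL is not allowed inside a stored list")
        checked.append(v)
    return checked
```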

Closes #12411

* github.com:scylladb/scylladb:
  test: materialized view: add test exercising synthetic empty-type columns
  cql3: expr: relax evaluate_list() to allow NULL elements
  types: allow lists with NULL
  test: relax NULL check test predicate
  cql3, types: validate listlike collections (sets, lists) for storage
  types: make empty type deserialize to non-null value
2023-01-19 15:15:16 +02:00
Botond Dénes
d661d03057 Merge 'main, test: integrate perf tools into scylla' from Kefu Chai
following tests are integrated into scylla executable

- perf_fast_forward
- perf_row_cache_update
- perf_simple_query
- perf_sstable

before this change
```console
$ size build/release/scylla
   text    data     bss     dec     hex filename
82284664         288960  335897 82909521        4f11951 build/release/scylla
$ ls -l build/release/scylla
-rwxrwxr-x 1 kefu kefu 1719672112 Jan 19 17:51 build/release/scylla
```
after this change
```console
$ size build/release/scylla
   text    data     bss     dec     hex filename
84349449         289424  345257 84984130        510c142 build/release/scylla
$ ls -l build/release/scylla
-rwxrwxr-x 1 kefu kefu 1774204800 Jan 19 17:52 build/release/scylla
```

Fixes #12484

Closes #12558

* github.com:scylladb/scylladb:
  main: move perf_sstable into scylla
  main: move perf_row_cache_update into scylla
  test: perf_row_cache_update: add static specifier to local functions
  main: move perf_fast_forward into scylla
  main: move perf_simple_query into scylla
  test: extract debug::the_database out
  main: shift the args when checking exec_name
  main: extract lookup_main_func() out
2023-01-19 15:01:30 +02:00
Kamil Braun
147dd73996 test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot
If a cluster fails to boot, it saves the exception in
`self.start_exception` variable; the exception will be rethrown when
a test tries to start using this cluster. As explained in `before_test`:
```
    def before_test(self, name) -> None:
        """Check that  the cluster is ready for a test. If
        there was a start error, throw it here - the server is
        running when it's added to the pool, which can't be attributed
        to any specific test, throwing it here would stop a specific
        test."""
```
It's arguable whether we should blame some random test for a failure
that it didn't cause, but nevertheless, there's a problem here: the
`start_exception` will be rethrown and the test will fail, but then the
cluster will be simply returned to the pool and the next test will
attempt to use it... and so on.

Prevent this by marking the cluster as dirty the first time we rethrow
the exception.
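
A minimal sketch of the fix, assuming a toy `Cluster` class and a list used as the pool; the `is_dirty` attribute and `return_to_pool` helper are hypothetical names chosen for illustration:

```python
from typing import List, Optional


class Cluster:
    """Toy stand-in for a pooled cluster."""

    def __init__(self, start_exception: Optional[Exception] = None) -> None:
        self.start_exception = start_exception
        self.is_dirty = False

    def before_test(self, name: str) -> None:
        if self.start_exception is not None:
            # Mark the cluster dirty before rethrowing, so that it is not
            # handed to the next test as well.
            self.is_dirty = True
            raise self.start_exception


def return_to_pool(cluster: Cluster, pool: List[Cluster]) -> None:
    """Only clean clusters go back to the pool."""
    if not cluster.is_dirty:
        pool.append(cluster)
```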

Closes #12560
2023-01-19 14:26:57 +02:00
Avi Kivity
9029b8dead test: disable commitlog O_DSYNC, preallocation
Commitlog O_DSYNC is intended to make Raft and schema writes durable
in the face of power loss. To make O_DSYNC performant, we preallocate
the commitlog segments, so that the commitlog writes only change file
data and not file metadata (which would require the filesystem to commit
its own log).

However, in tests, this causes each ScyllaDB instance to write 384MB
of commitlog segments. This overloads the disks and slows everything
down.

Fix this by disabling O_DSYNC (and therefore preallocation) during
the tests. They can't survive power loss, and run with
--unsafe-bypass-fsync anyway.

Closes #12542
2023-01-19 11:14:05 +01:00
Kefu Chai
7f5bb19d1f main: move perf_sstable into scylla
* configure.py:
  - include `test/perf/perf_sstable` and its dependencies in scylla_perfs
* test/perf/perf_sstable.cc: change `main()` to
  `perf::scylla_sstable_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_sstable_main()`
* main.cc:
  - dispatch "perf-sstable" subcommand to
    `perf::scylla_sstable_main`

before this change, we have a tool at `test/perf/perf_sstable`
for running performance tests by exercising sstable related operations.

after this change, the `test/perf/perf_sstable` is integrated
into `scylla` as a subcommand. so we can run `scylla perf-sstable
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:52 +08:00
Kefu Chai
240f2c6f00 main: move perf_row_cache_update into scylla
* configure.py:
  - include `test/perf/perf_row_cache_update.cc` in scylla_perfs
* main.cc:
  - dispatch "perf-row-cache-update" subcommand to
    `perf::scylla_row_cache_update_main`
* test/perf/perf_row_cache_update.cc: change `main()` to
  `perf::scylla_row_cache_update_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_row_cache_update_main()`

before this change, we have a tool at `test/perf/perf_row_cache_update`
for running performance tests by updating row cache.

after this change, the `test/perf/perf_row_cache_update` is integrated
into `scylla` as a subcommand. so we can run `scylla perf-row-cache-update
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:46 +08:00
Kefu Chai
4e390b9a05 test: perf_row_cache_update: add static specifier to local functions
now that these functions are only used by the same compilation unit,
they don't need external linkage. so let's hide them using `static`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:46 +08:00
Kefu Chai
228ccdc1c7 main: move perf_fast_forward into scylla
* configure.py:
  - include `test/perf/perf_fast_forward.cc` in scylla_perfs
* main.cc:
  - dispatch "perf-fast-forward" subcommand to
    `perf::scylla_fast_forward_main`
* test/perf/perf_fast_forward.cc: change `main()` to
  `perf::scylla_fast_forward_main()`
* test/perf/entry_point.hh: add
  `perf::scylla_fast_forward_main()`

before this change, we have a tool at `test/perf/perf_fast_forward`
for running performance tests by fast forwarding the reader.

after this change, the `test/perf/perf_fast_forward` is integrated
into `scylla` as a subcommand. so we can run `scylla perf-fast-forward
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:40 +08:00
Kefu Chai
09de031cab main: move perf_simple_query into scylla
* configure.py:
  - include scylla_perfs in scylla
  - move 'test/lib/debug.cc' down scylla_perfs, as the latter uses
    `debug::the_database`
  - link `scylla` against seastar_testing_libs also. because we
    use the helpers in `test/lib/random_utils.hh` for generating
    random numbers / sequences in `perf_simple_query.cc`, and
    `random_utils.hh` references `seastar::testing::local_random_engine`
    as a local RNG. but `seastar::testing::local_random_engine`
    is included in `libseastar_testing.a` or
    `libseastar_perf_testing.a`. since we already have the rules for
    linking against `libseastar_testing.a`, let's just reuse them,
    and link `scylla` against this new dependency.

* main.cc:
  - dispatch "perf-simple-query" subcommand to
    `perf::scylla_simple_query_main`
* test/perf/perf_simple_query.cc: change `main()` to
  `perf::scylla_simple_query_main()`
* test/perf/entry_point.hh: define the main function entries
  so `main.cc` can find them. it's quite like how we collect
  the entries in `tools/entry_point.hh`
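
The subcommand dispatch described above can be modelled roughly as follows. This Python sketch only illustrates the lookup-and-shift idea; the real code is C++ in `main.cc`, and the table contents, the stub entry points and the `--some-option` flag are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

MainFunc = Callable[[List[str]], int]


def scylla_main(args: List[str]) -> int:
    """Stub for the default server entry point."""
    return 0


def perf_simple_query_main(args: List[str]) -> int:
    """Stub for a perf tool entry point."""
    return 0


# Table of subcommands, in the spirit of the entries in entry_point.hh.
SUBCOMMANDS: Dict[str, MainFunc] = {
    "perf-simple-query": perf_simple_query_main,
}


def lookup_main_func(argv: List[str]) -> Tuple[MainFunc, List[str]]:
    """Pick the entry point and the argument list it should receive."""
    if len(argv) > 1 and argv[1] in SUBCOMMANDS:
        # Shift the arguments so the subcommand sees its own name first.
        return SUBCOMMANDS[argv[1]], argv[1:]
    return scylla_main, argv
```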

before this change, we have a tool at `test/perf/perf_simple_query`
for running performance test by sending simple query to a single-node
cluster.

after this change, the `test/perf/perf_simple_query` is integrated
into `scylla` as a subcommand. so we can run `scylla perf-simple-query
[options, ...]` to perform the same tests previously driven by the tool.

Fixes #12484
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:30 +08:00
Kefu Chai
c65692a13a test: extract debug::the_database out
we want to integrate some perf tests into scylla executable, so we
can run them on a regular basis. but `test/lib/cql_test_env.cc`
shares `debug::the_database` with `main.cc`, so we cannot just
compile them into a single binary without changing them.

before this change, both `test/lib/cql_test_env.cc`
and `main.cc` define `debug::the_database`.

after this change, `debug::the_database` is extracted into
`debug.cc`, so it compiles into a separate compilation unit.
and scylla and tests using seastar testing framework are linked
against `debug.cc` via `scylla_core` respectively. this paves the road to
integrating scylla with the tests linking against
`test/lib/cql_test_env.cc`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-01-19 17:42:23 +08:00
Nadav Har'El
0ff0c80496 test/cql-pytest: un-xfail tests for UNSET values
Commit 0b418fa improved the error detection of unset values in
inappropriate CQL statements, and some of the unit tests translated
from Cassandra started to pass, so this patch removes their "xfail"
mark.

In a couple of places Scylla's error message is worded differently
from Cassandra, so the test was modified to look for a shorter
string common to both implementations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12553
2023-01-19 07:47:08 +02:00
Kefu Chai
6a3b19b53d test/perf: replace "std::cout <<" with fmt::print()
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12559
2023-01-19 07:45:13 +02:00
Avi Kivity
aab5954cfb Merge 'reader_concurrency_semaphore: add more layers of defense against OOM' from Botond Dénes
The reader concurrency semaphore has no mechanism to limit the memory consumption of already admitted reads. Once the collective memory consumption of all the admitted reads is above the limit, all it can do is to not admit any more. Sometimes this is not enough and the memory consumption of the already admitted reads balloons to the point of OOMing the node. This pull-request offers a solution to this: it introduces two more layers of defense above this: a soft and a hard limit. Both are multipliers applied on the semaphore's normal memory limit.
When the soft limit threshold is surpassed, all readers but one are blocked via a new blocking `request_memory()` call which is used by the `tracking_file_impl`. The reader to be allowed to proceed is chosen at random, it is the first reader which happens to request memory after the limit is surpassed. This is both very simple and should avoid situations where the algorithm choosing the reader to be allowed to proceed chooses a reader which will then always time out.
When the hard limit threshold is surpassed, `reader_concurrency_semaphore::consume()` starts throwing `std::bad_alloc`. This again will result in eliminating whichever reader was unlucky enough to request memory at the right moment.

With this, the semaphore is now effectively enforcing an upper bound for memory consumption, defined by the hard limit.
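
The two thresholds can be modelled with a small toy class. This is a simplified, synchronous sketch and not the semaphore's implementation: `MemoryLimiter` is a hypothetical name, `MemoryError` stands in for `std::bad_alloc`, and a `False` return stands in for a blocked reader.

```python
from typing import Optional


class MemoryLimiter:
    """Toy model of a memory limit with soft (serialize) and hard (kill) thresholds."""

    def __init__(self, limit: int, serialize_multiplier: float,
                 kill_multiplier: float) -> None:
        # Both thresholds are multipliers applied to the normal limit.
        self.soft_limit = limit * serialize_multiplier
        self.hard_limit = limit * kill_multiplier
        self.consumed = 0
        # The one reader allowed to proceed above the soft limit.
        self.allowed_reader: Optional[str] = None

    def request_memory(self, reader: str, amount: int) -> bool:
        """Return True if memory was granted, False if the reader has to wait."""
        if self.consumed + amount > self.hard_limit:
            # Whoever asks at the wrong moment is eliminated.
            raise MemoryError(f"{reader}: hard memory limit exceeded")
        if self.consumed >= self.soft_limit:
            if self.allowed_reader is None:
                # The first requester after the limit is crossed wins.
                self.allowed_reader = reader
            if reader != self.allowed_reader:
                return False
        self.consumed += amount
        return True
```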

Refs: https://github.com/scylladb/scylladb/issues/11927

Closes #11955

* github.com:scylladb/scylladb:
  test: reader_concurrency_semaphore_test: add tests for semaphore memory limits
  reader_permit: expose operator<<(reader_permit::state)
  reader_permit: add id() accessor
  reader_concurrency_semaphore: add foreach_permit()
  reader_concurrency_semaphore: document the new memory limits
  reader_concurrency_semaphore: add OOM killer
  reader_concurrency_semaphore: make consume() and signal() private
  test: stop using reader_concurrency_semaphore::{consume,signal}() directly
  reader_concurrency_semaphore: move consume() out-of-line
  reader_permit: consume(): make it exception-safe
  reader_permit: resource_units::reset(): only call consume() if needed
  reader_concurrency_semaphore: tracked_file_impl: use request_memory()
  reader_concurrency_semaphore: add request_memory()
  reader_concurrency_semaphore: wrap wait list
  reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters
  test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardcoded buffer size
  reader_permit: add make_new_tracked_temporary_buffer()
  reader_permit: add get_state() accessor
  reader_permit: resource_units: add constructor for already consumed res
  reader_permit: resource_units: remove noexcept qualifier from constructor
  db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier
  scylla-gdb.py: scylla-memory: extract semaphore stats formatting code
  scylla-gdb.py: fix spelling of "graphviz"
2023-01-18 17:02:55 +02:00
Avi Kivity
9a54cb5deb Merge 'cql3/expr: make it possible to prepare binary_operator' from Jan Ciołek
`prepare_expression` takes an unprepared CQL expression straight from the parser output and prepares it. Preparation consists of various type checks that are needed to ensure that the expression is correct and to reason about it.

While `prepare_expression` supports a number of different types of expressions, until now it was impossible to prepare a `binary_operator`. Eventually we would like to be able to prepare all kinds of expressions, so this PR adds the missing support for `binary_operator`.

Closes #12550

* github.com:scylladb/scylladb:
  expr_test: test preparing binary_operator with NULL RHS
  expr_test: test preparing IS NOT NULL binary_operator
  expr_test: test preparing binary_operator with LIKE
  expr_test: test preparing binary_operator with CONTAINS KEY
  expr_test: test preparing binary_operator with CONTAINS
  expr_test: test preparing binary_operator with IN
  expr_test: test preparing binary_operator with =, !=, <, <=, >, >=
  expr_test: use make_*_untyped function in existing tests
  expr_test_utils: add utilities to create untyped_constant
  expr_test_utils: add make_float_* and make_double_*
  cql3: expr: make it possible to prepare binary_operator using prepare_expression
  cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators
  cql3: expr: pass non-empty keyspace name in prepare_binary_operator
  cql3: expr: take reference to schema in prepare_binary_operator
2023-01-18 16:55:18 +02:00
Jan Ciolek
ae0e955b90 expr_test: test preparing binary_operator with NULL RHS
Make sure that preparing binary_operator works properly
when the RHS is NULL.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:46 +01:00
Jan Ciolek
65b8a09409 expr_test: test preparing IS NOT NULL binary_operator
Add a unit test which checks that preparing binary_operators
which represent IS NOT NULL works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:46 +01:00
Jan Ciolek
5b3e6769f1 expr_test: test preparing binary_operator with LIKE
Add a unit test which checks that preparing binary_operators
with the LIKE operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
e876496f7f expr_test: test preparing binary_operator with CONTAINS KEY
Add a unit test which checks that preparing binary_operators
with the CONTAINS KEY operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
c6d2e1a03e expr_test: test preparing binary_operator with CONTAINS
Add a unit test which checks that preparing binary_operators
with the CONTAINS operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
6b147ecaea expr_test: test preparing binary_operator with IN
Add a unit test which checks that preparing binary_operators
with the IN operation works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:45 +01:00
Jan Ciolek
669d791250 expr_test: test preparing binary_operator with =, !=, <, <=, >, >=
Add a unit test which checks that preparing binary_operators
with basic comparison operations works as expected.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
60803d12a9 expr_test: use make_*_untyped function in existing tests
Use the newly introduced convenience methods that create
untyped_constant in existing tests.

This will make the code more readable by removing
visual clutter that came with the previous overly
verbose code.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
819390f9fe expr_test_utils: add utilities to create untyped_constant
expression tests often need to create instances of untyped_constant.
Creating them by hand is tedious because the required code is overly verbose.
Having convenience functions for it speeds up test writing.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Jan Ciolek
362bf7f534 expr_test_utils: add make_float_* and make_double_*
Add utilities to create float and double values in tests.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2023-01-18 12:04:44 +01:00
Nadav Har'El
48e2d6a541 Merge 'utils: throw error on malformed input in base64 decode' from Marcin Maliszkiewicz
Several cases were fixed in these patches, all related to processing of malformed base64 data. The main purpose was to bring the Alternator implementation closer to what DynamoDB does. We now:
- Throw error when padding is missing during base64 decoding
- Throw error when base64 data is malformed
- In alternator when invalid base64 data is fetched from DB (as opposed to being part of user's request) we now exclude such row during filtering
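
The three behaviours listed above can be illustrated with Python's standard `base64` module. This is only a sketch of the semantics, not the C++ code in `utils`; `decode_strict` and `filter_rows` are hypothetical helpers.

```python
import base64
import binascii
from typing import List


def decode_strict(text: str) -> bytes:
    """Decode base64, rejecting missing padding and non-alphabet characters."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"malformed base64: {exc}") from exc


def filter_rows(stored: List[str], wanted: bytes) -> List[str]:
    """Keep rows whose decoded value equals `wanted`; malformed rows never match."""
    kept = []
    for text in stored:
        try:
            if decode_strict(text) == wanted:
                kept.append(text)
        except ValueError:
            # Stored garbage is treated as a failed comparison, not an error.
            continue
    return kept
```

Malformed input that arrives in a request is reported as an error, while malformed data already in storage only causes the row to be filtered out.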

Additionally some small code quality improvements:
- avoid unnecessary type conversions in calls to rjson:from_strings functions
- avoid some copy constructions in calls to rjson:from_strings functions

Fixes https://github.com/scylladb/scylladb/issues/6487

Closes #11944

* github.com:scylladb/scylladb:
  alternator: evaluate expressions as false for stored malformed binary data
  rjson: avoid copy constructors in from_string calls when possible
  alternator: remove unused parameters from describe_items func
  utils: throw error on malformed input in base64 decode
  utils: throw error on missing padding in base64 decode
2023-01-18 12:40:57 +02:00
Avi Kivity
561f4ca057 test: materialized view: add test exercising synthetic empty-type columns
Materialized views inject synthetic empty-type columns in some conditions.
Since we just touched empty-type serialization/deserialization, add a
test to exercise it and make sure it still works.
2023-01-18 10:38:24 +02:00
Avi Kivity
04925a7b29 cql3: expr: relax evaluate_list() to allow NULL elements
Tests are similarly relaxed. A test is added in lwt_test to show
that insertion of a list with NULL is still rejected, though we
allow NULLs in IF conditions.

One test is changed from a list of longs to a list of ints, to
prevent churn in the test helper library.
2023-01-18 10:38:24 +02:00
Avi Kivity
390a0ca47b types: allow lists with NULL
Allow transient lists that contain NULL throughout the
evaluation machinery. This makes it possible to evaluate things
like `IF col IN (1, 2, NULL)` without hacks, once LWT conditions
are converted to expressions.

A few tests are relaxed to accommodate the new behavior:
 - cql_query_test's test_null_and_unset_in_collections is relaxed
   to allow `WHERE col IN ?`, with the variable bound to a list
   containing NULL; now it's explicitly allowed
 - expr_test's evaluate_bind_variable_validates_no_null_in_list was
   checking generic lists for NULLs, and was similarly relaxed (and
   renamed)
 - expr_test's evaluate_bind_variable_validates_null_in_lists_recursively
   was similarly relaxed to allow NULLs.
2023-01-18 10:38:24 +02:00
Avi Kivity
00145f9ada test: relax NULL check test predicate
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.
2023-01-18 10:38:24 +02:00
Avi Kivity
da4abccf89 types: make empty type deserialize to non-null value
The empty type is used internally to implement CQL sets on top
of multi-cell maps. The map's key (an atomic cell) represents the
set value, and the map's value is discarded. Since it's unneeded
we use an internal "empty" type.

Currently, it is deserialized into a `data_value` object representing
a NULL. Since it's discarded, it really doesn't matter.

However, with the impending change to change lists to allow NULLs,
it does matter:

 1. the coordinator sets the 'collections_as_maps' flag for LWT
    requests since it wants list indexes (this affects sets too).
 2. the replica responds by serializing a set as a map.
 3. since we start allowing NULL collection values, we now serialize
    those NULLs as NULLs.
 4. the coordinator deserializes the map, and complains about NULL
    values, since those are not supported.

The solution is simple: deserialize the empty value as a non-NULL
object. We create an empty empty_type_representation and add the
scaffolding needed. Serialization and deserialization is already
coded, it was just never called for NULL values (which were serialized
with size 0, in collections, rather than size -1, luckily).

A unit test is added.
2023-01-18 10:38:24 +02:00
Tomasz Grabiec
563998b69a Merge 'raft: improve group 0 reconfiguration failure handling' from Kamil Braun
Make it so that failures in `removenode`/`decommission` don't lead to reduced availability, and any leftovers in group 0 can be removed by `removenode`:
- In `removenode`, make the node a non-voter before removing it from the token ring. This removes the possibility of having a group 0 voting member which doesn't correspond to a token ring member. We can still be left with a non-voter, but that doesn't reduce the availability of group 0.
- As above but for `decommission`.
- Make it possible to remove group 0 members that don't correspond to token ring members from group 0 using `removenode`.
- Add an API to query the current group 0 configuration.

Fixes #11723.

Closes #12502

* github.com:scylladb/scylladb:
  test: test_topology: test for removing garbage group 0 members
  test/pylib: move some utility functions to util.py
  db: system_keyspace: add a virtual table with raft configuration
  db: system_keyspace: improve system.raft_snapshot_config schema
  service: storage_service: better error handling in `decommission`
  service: storage_service: fix indentation in removenode
  service: storage_service: make `removenode` work for group 0 members which are not token ring members
  service/raft: raft_group0: perform read_barrier in wait_for_raft
  service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode
  test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove
  service/raft: raft_group0: link to Raft docs where appropriate
  service/raft: raft_group0: more logging
  service/raft: raft_group0: separate function for checking and waiting for Raft
2023-01-17 21:23:15 +01:00
Kamil Braun
d134c458e5 test/pylib: increase timeout when waiting for cluster before test
Increase the timeout from default 5 minutes to 10 minutes.
Sent as a workaround for #12546 to unblock next promotions.

Closes #12547
2023-01-17 21:03:09 +02:00
Kamil Braun
4f1c317bdc test: test_raft_upgrade: stop servers gracefully in test_recovery_after_majority_loss
This test is frequently failing due to a timeout when we try to restart
one of the nodes. The shutdown procedure apparently hangs when we try to
stop the `hints_manager` service, e.g.:
```
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop
INFO  2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped
INFO  2023-01-13 03:22:56,997 [shard 0] hints_manager - Stopped
```
observe the 5 minute delay at the end.

There is a known issue about `hints_manager` stop hanging: #8079.

Now, for some reason, this is the only test case that is hitting this
issue. We don't completely understand why. There is one significant
difference between this test case and others: this is the only test case
which kills 2 (out of 3) servers in the cluster and then tries to
gracefully shut down the last server. There's a hypothesis that the last
server gets stuck trying to send hints to the killed servers. We weren't
able to prove/falsify it yet. But if it's true, then this patch will:
- unblock next promotions,
- give us some important information when we see that the issue stops
  appearing.
In the patch we shut down all servers gracefully instead of killing them,
like we do in the other test cases.

Closes #12548
2023-01-17 20:51:09 +02:00
Kamil Braun
5545547d07 test: test_topology: test for removing garbage group 0 members
Verify that `removenode` can remove group 0 members which are not token
ring members.
2023-01-17 12:28:00 +01:00
Kamil Braun
c959ec455a test/pylib: move some utility functions to util.py
They were used in test_raft_upgrade, but we want to use them in other
test files too.
2023-01-17 12:28:00 +01:00
Kamil Braun
a483915c62 db: system_keyspace: add a virtual table with raft configuration
Add a new virtual table `system.raft_state` that shows the currently
operating Raft configuration for each present group. The schema is the
same as `system.raft_snapshot_config` (the latter shows the config from
the last snapshot). In the future we plan to add more columns to this
table, showing more information (like the current leader and term),
hence the generic name.

Adding the table requires some plumbing of
`sharded<raft_group_registry>&` through function parameters to make it
accessible from `register_virtual_tables`, but it's mostly
straightforward.

Also added some APIs to `raft_group_registry` to list all groups and
find a given group (returning `nullptr` if one isn't found, not throwing
an exception).
2023-01-17 12:28:00 +01:00