test/scylla-gdb tests Scylla's gdb debugging tools, and cannot work if
Scylla was compiled without debug information (i.e., the "dev" build mode).
In the past, test/scylla-gdb/run detected this case and printed a clear error:
Scylla executable was compiled without debugging information (-g)
so cannot be used to test gdb. Please set SCYLLA environment variable.
Unfortunately, this detection recently started failing, because even when
Scylla is compiled without debug information we link into it a library
(libwasmtime.a) which has *some* debug information. As a result, instead
of one clear error message, we get all scylla-gdb tests running -
and each of them failing separately. This is ugly and unhelpful.
Each of the tests fails because our "gdb" test fixture tries to load
scylla-gdb.py and fails when the symbols it needs (e.g., "size_t")
cannot be found. So in this patch, we check once for the existence
of this symbol - and if missing we exit pytest instead of failing each
individual test.
Moreover, if loading scylla-gdb.py fails for some other unexpected
reason, let's exit the test as well, instead of failing each individual
test.
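The check described above can be sketched like this (illustrative only, not the actual test/scylla-gdb fixture code; the `ptype size_t` probe and function names are assumptions):

```python
# Illustrative sketch, not the actual test/scylla-gdb fixture: decide once,
# from gdb's reply to a probe like "ptype size_t", whether the executable
# carries usable debug info, and abort the whole session if it does not.
import pytest

def has_debug_info(gdb_output: str) -> bool:
    # gdb answers 'No symbol "size_t" in current context.' when the
    # executable lacks the symbols scylla-gdb.py needs.
    return "No symbol" not in gdb_output

def check_debug_info(gdb_output: str) -> None:
    if not has_debug_info(gdb_output):
        # pytest.exit() stops the entire run with one clear message,
        # instead of letting every individual test fail the same way.
        pytest.exit("Scylla executable was compiled without debugging "
                    "information (-g) so cannot be used to test gdb. "
                    "Please set SCYLLA environment variable.")
```

The key point is session-wide abort (`pytest.exit`) rather than per-test failure.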
Fixes #10863.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10937
Closes #10930
* github.com:scylladb/scylla:
test: perf_row_cache_update: Flush std output after each line
test: perf_row_cache_update: Drain background cleaner before starting the test
test: perf_row_cache_update: Measure memtable filling time
test: perf_row_cache_update: Respect preemption when applying mutations
test: perf_row_cache_update: Drop unused pk variable
Before this patch, the test cql-pytest/test_tools.py left behind
a temporary file in /tmp. It used pytest's "tmp_path_factory" feature,
but it doesn't remove temporary files it creates.
This patch removes the temporary file when the fixture using it ends,
but moreover, it puts the temporary file not in /tmp but rather next
to Scylla's data directory. That directory will be eventually removed
entirely, so even if we accidentally leave a file there, it will
eventually be deleted.
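The fixture pattern described above can be sketched as follows (illustrative names, not the exact cql-pytest code):

```python
# Illustrative sketch of the fixture pattern, not the exact cql-pytest code:
# put the temporary file next to the data directory (which is removed
# wholesale at the end of the run), and clean it up when the fixture ends.
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_file_in(data_dir: str):
    fd, path = tempfile.mkstemp(dir=data_dir)  # not in /tmp
    os.close(fd)
    try:
        yield path
    finally:
        # Teardown cleanup; even if this is skipped (e.g., a crash), the
        # data directory is eventually deleted entirely, taking the file
        # with it.
        if os.path.exists(path):
            os.unlink(path)
```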
Fixes #10924
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10929
There is a bug introduced in e74c3c8 (4.6.0) which makes the memtable
reader skip a range tombstone for a certain pattern of deletions
and under a certain sequence of events.
_rt_stream contains the result of deoverlapping range tombstones which
had the same position, which were skipped from all the versions. The
result of deoverlapping may produce a range tombstone which starts
later, at the same position as a more recent tombstone which has not
been skipped from the partition version yet. If we consume the old
range tombstone from _rt_stream and then refresh the iterators, the
refresh will skip over the newer tombstone.
The fix is to drop the logic which drains _rt_stream so that
_rt_stream is always merged with partition versions.
For the problem to trigger, there have to be multiple MVCC versions
(at least 2) which contain deletions of the following form:
[a, c] @ t0
[a, b) @ t1, [b, d] @ t2
c > b
The proper sequence for such versions is (assuming d > c):
[a, b) @ t1,
[b, d] @ t2
Due to the bug, the reader will produce:
[a, b) @ t1,
[b, c] @ t0
The reader also needs to be preempted right before processing [b, d] @
t2 and iterators need to get invalidated so that
lsa_partition_reader::do_refresh_state() is called and it skips over
[b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it
does emit the proper range tombstone, it's possible that it will violate
fragment order in the stream if _rt_stream accumulated remainders
(possible with 3 MVCC versions).
The problem goes away once MVCC versions merge.
Fixes #10913
Fixes #10830
Closes #10914
The commits here were extracted from PR https://github.com/scylladb/scylla/pull/10835 which implements upgrade procedure for Raft group 0.
They are mostly refactors which don't affect the behavior of the system, except one: the commit 4d439a16b3 causes all schema changes to be bounced to shard 0. Previously, they would only be bounced when the local Raft feature was enabled. I do that because:
1. eventually, we want this to be the default behavior
2. in the upgrade PR I remove the `is_raft_enabled()` function - the function was basically created with the mindset "Raft is either enabled or not" - which was right when we didn't support upgrade, but will be incorrect when we introduce intermediate states (when we upgrade from non-raft-based to raft-based operations); the upgrade PR introduces another mechanism to dispatch based on the upgrade state, but for the case of bouncing to shard 0, dispatching is simply not necessary.
Closes #10864
* github.com:scylladb/scylla:
service/raft: raft_group_registry: add assertions when fetching servers for groups
service/raft: raft_group_registry: remove `_raft_support_listener`
service/raft: raft_group0: log adding/removing servers to/from group 0 RPC map
service/raft: raft_group0: move group 0 RPC handlers from `storage_service`
service/raft: messaging: extract raft_addr/inet_addr conversion functions
service: storage_service: initialize `raft_group0` in `main` and pass a reference to `join_cluster`
treewide: remove unnecessary `migration_manager::is_raft_enabled()` calls
test/boost: memtable_test: perform schema operations on shard 0
test/boost: cdc_test: remove test_cdc_across_shards
message: rename `send_message_abortable` to `send_message_cancellable`
message: change parameter order in `send_message_oneway_timeout`
There are effectively several test cases in this test, each of which
calls scylla_sstable() to prepare, and thus each creates a type in the
same Scylla instance. The 2nd attempt ends up with the "already exists" error:
E cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="A user type of name cql_test_1656396925652.type1 already exists"
tests: unit(dev)
https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1075/
Fixes: #10872
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220628081459.12791-1-xemul@scylladb.com>
A number of improvements in test.py as requested by maintainers:
* don't capture pytest output
* stick to the specific server in control connections
* support --log-level option and pass it to logging module
* when checking if CQL is up, ignore timeout errors
* no longer force schema migration when starting the server
* use test uname, not id, in log output
* improve logging of ScyllaServer
* log what cluster is used for a test
* extend xml output with logs
On the same token, remove mypy warnings and make linter pass on test.py, as well as add some type checking.
Fixes #10871
Fixes #10785
Closes #10902
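One of the items listed above, the --log-level option passed to the logging module, might be plumbed roughly like this (a hedged sketch, not the actual test.py code):

```python
# Hedged sketch of --log-level plumbing (not the actual test.py code):
# accept the option on the command line and hand it to the logging module.
import argparse
import logging

def setup_logging(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    args = parser.parse_args(argv)
    # Map the option string to the logging module's numeric level.
    logging.basicConfig(level=getattr(logging, args.log_level))
    return args.log_level
```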
* github.com:scylladb/scylla:
test.py: extend xml output with logs
test.py: log what cluster is used for a test
test.py: improve logging of ScyllaServer
test.py: use test uname, not id, in log output
test.py: support --log-level option and pass it to logging module
test.py: make ScyllaServer more reliable and fast
test.py: don't capture pytest output
test.py: add type annotations
test.py: convert log_filename to pathlib
test.py: please linter
test.py: remove mypy warnings
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. Adding a user and changing their permissions)
can become pretty annoying.
This patch series makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.
branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache
* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
loading_cache_test: Test loading_cache::reset and loading_cache::update_config
api: Add API for resetting authorization cache
authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
auth_prep_statements_cache: Make auth_prep_statements_cache accept a config struct
utils/loading_cache.hh: Add update_config method
utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
utils/loading_cache.hh: Add reset method
Validate that the size of the cache is zero after calling the
reset method and that the config is being updated correctly
after calling update_config.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
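What that test validates could be sketched like this in Python (illustrative only; the real cache is the C++ template in utils/loading_cache.hh, and these names are assumptions):

```python
# Illustrative Python sketch, not the Scylla C++ implementation: a cache
# whose config can be swapped at runtime, shrinking immediately if needed.
from collections import OrderedDict
from dataclasses import dataclass

@dataclass
class LoadingCacheConfig:
    max_entries: int

class LoadingCache:
    def __init__(self, config):
        self._config = config
        self._entries = OrderedDict()

    def put(self, key, value):
        self._entries[key] = value
        self._evict()

    def update_config(self, config):
        # Live update: apply the new limits without restarting the service.
        self._config = config
        self._evict()

    def reset(self):
        # Flush everything, e.g. after a permissions change.
        self._entries.clear()

    def _evict(self):
        while len(self._entries) > self._config.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
```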
For cases where we have very high values set to permissions_cache validity and
update interval (E.g.: 1 day), whenever a change to permissions is made it's
necessary to update scylla config and decrease these values, since waiting for
all this time to pass wouldn't be viable.
This patch adds an API for resetting the authorization cache so that changing
the config won't be mandatory for these cases.
Usage:
$ curl -X POST http://localhost:10000/authorization_cache/reset
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g., adding a user) can become pretty annoying.
This patch makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch makes authorized_prepared_statements_cache accept a config struct,
similarly to permissions_cache. This will make it easier to make this cache
live updateable on the next patch.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch adds an update_config method in order to allow live updating the
config for permissions_cache. This method is going to be used in the next
patches after making permissions_cache config live updateable.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch renames the permissions_cache_config struct to loading_cache_config
and moves it to utils/loading_cache.hh. This will make it easier to handle
config updates to the authorization caches on the next patches
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Change tests to use async mode and add helpers and tests for schema changes.
These test series will be expanded with topology changes.
Closes #10550
* github.com:scylladb/scylla:
test.py topology: repro for issue #1207
test.py: port fixture fails_without_raft
test.py topology: table methods to add/remove index
test.py topology: add/drop table column helpers
test.py topology: insert sequential row
test.py: remove deprecated test test_null
test.py: managed random tables
test.py: test_keyspace fixture async
test.py: rename fixture test_keyspace to keyspace
test.py topology: test with asyncio
This PR adds necessary modifications to perf_simple_query so that it can be used to test performance of the timeout handling path. With an appropriate combination of flags, it is possible to consistently trigger timeouts on every operation.
The following flags are added:
- `--stop-on-error` - if true (which is the default), the test stops after encountering the first exception and reports it; otherwise it causes errors to be counted and reported at the end.
- `--timeout <x>` - allows using `USE TIMEOUT <x>` in the benchmark query/statement.
- `--bypass-cache` - uses `BYPASS CACHE` in the benchmark query (relevant only to reads).
Examples:
```
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write
131023.65 tps ( 56.2 allocs/op, 13.2 tasks/op, 49784 insns/op, 0 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write --stop-on-error=false --timeout=0s
97163.73 tps ( 53.1 allocs/op, 5.1 tasks/op, 78687 insns/op, 1000000 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000
154060.36 tps ( 63.1 allocs/op, 12.1 tasks/op, 42998 insns/op, 0 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --stop-on-error=false --flush --bypass-cache --timeout=0s
30127.43 tps ( 48.2 allocs/op, 14.3 tasks/op, 312416 insns/op, 1000000 errors)
```
Refs: #2363
Closes #10899
* github.com:scylladb/scylla:
test: perf: add bypass cache argument
test: perf: add timeout argument
test: perf: count errors and report the count in results
test: perf: add stop-on-error argument
test: perf: coroutinize run_worker()
test: perf: fix crash on exception in time_parallel_ex
This fixes a quadratic behavior in case lots of snapshots with range
tombstones are queued for merging. Before the change, new snapshots
were inserted at the front, which is also where the worker looks
at. Merging a version has a linear component in complexity function
which depends on the number of range tombstones. If we merge snapshots
starting from the latest to oldest then the whole process becomes
quadratic because the version which is merged accumulates an
increasing amount of tombstones, ones which were already merged
before. We should instead merge starting from the oldest snapshots;
this way each tombstone is applied exactly once during merge.
This bug got worse after 4bd4aa2e88,
which makes merging tombstones more expensive.
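A toy cost model (not Scylla code) makes the asymptotics concrete: with n snapshots of roughly equal tombstone count, newest-first merging re-scans already-merged tombstones and costs O(n^2), while oldest-first touches each tombstone once:

```python
# Toy cost model, not Scylla code: count how many times range tombstones
# are touched when merging a queue of snapshot versions.
def cost_newest_first(sizes):
    # New snapshots were inserted at the front, where the worker looks, so
    # the version being merged accumulates tombstones already merged before.
    touched, accumulated = 0, 0
    for s in sizes:
        accumulated += s        # the merged version keeps growing
        touched += accumulated  # each merge re-scans what it accumulated
    return touched

def cost_oldest_first(sizes):
    # Merging from the oldest snapshot applies each tombstone exactly once.
    return sum(sizes)
```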
Closes #10916
When the run scripts for tests of cql-pytest, alternator, redis, etc.,
run Scylla, they should set the UBSAN_OPTIONS and ASAN_OPTIONS so that
if the executable is built with sanitizers enabled, it will ignore false
positives that we know about, and fail on real errors.
The change in this patch affects all test/*/run scripts which use this
shared Scylla-starting code. test.py already had the same settings,
and it affected the tests that it knows to run directly (unit tests,
cql-pytest, etc.).
Fixes #10904
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10915
Add test and server logs, as well as the unidiff, to
XML output. This makes Jenkins reports nicer.
While on it, debug & fix bugs in handling of flaky tests:
- the reset would reset a flaky test even after the last attempt
fails, so it would be impossible to see what happened to it
- the args needed to be reset as well, since execution modifies
them
- we would say that we're going to retry the flaky test when in
fact it was the last attempt to run it and no more retries were
planned
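The corrected retry logic can be sketched like this (a hedged sketch with illustrative names, not the actual test.py code):

```python
# Hedged sketch of the corrected flaky-test retry loop (illustrative, not
# the actual test.py code): rebuild args before every attempt, announce a
# retry only when one is actually planned, and leave the state of the final
# failed attempt intact for inspection.
def run_flaky(test_fn, make_args, max_attempts=3, log=lambda msg: None):
    for attempt in range(1, max_attempts + 1):
        args = make_args()  # execution modifies args, so rebuild each time
        if test_fn(args):
            return True
        if attempt < max_attempts:
            # Only reset and announce a retry when another attempt is
            # planned; after the last attempt, keep everything as-is.
            log("flaky test failed, retrying (attempt %d of %d)"
                % (attempt, max_attempts))
    return False
```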
1) Stick to the specific server in control connections.
It could happen that, when starting a cluster and checking
if a specific node is up, the check would actually execute
against an already running node. Prevent this from happening
by setting a white list connection balancing policy for control
connections.
2) When checking if CQL is up, ignore timeout errors
Scylla in debug mode can easily time out on a DDL query,
and the timeout error at start up would lead to the entire cluster
marked as broken. This is too harsh, allow timeouts at start.
3) No longer force schema migration when starting the server
By default, Raft is on, so the nodes are getting schema
through Raft leader. Schema migration significantly slows
down cluster start in debug mode (60 seconds -> 100 seconds),
and even though it was a great test that helped discover
several bugs in Scylla, it shouldn't be part of normal
cluster boot, so disable it.
Repro for a bug in concurrent schema changes when many tables and
indexing are involved.
Alter tables by doing, in parallel, new table creation, altering a table
(_alter), and indexing other tables (_index).
Original repro had sets of 20 of those and slept for 20 seconds to
settle. This repro does it for Scylla with just 1 set and 1 second.
This issue goes away once Raft is enabled.
https://github.com/scylladb/scylla/issues/1207
Originally at https://issues.apache.org/jira/browse/CASSANDRA-10250
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Port fails_without_raft to higher level conftest file for future use in
topology pytests.
While there, make it async.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
For each table keep a counter and insert rows with sequential values
generated correspondingly by each column's type.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Helpers to create keyspace and manage randomized tables.
Fixture drops all created tables still active after the test finishes.
Includes helper methods to verify schema consistency.
These helpers will be used in Raft schema changes tests coming later.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Run test async using a wrapper for Cassandra python driver's future.
The wrapper was suggested by a user and brought forward by @fruch.
It's based on https://stackoverflow.com/a/49351069 .
Redefine pytest event_loop fixture to avoid issues with fixtures with
scope bigger than function (like keyspace).
See https://github.com/pytest-dev/pytest-asyncio/issues/68
Convert sample test_null to async. More useful test cases will come
afterwards.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
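The wrapper pattern referenced above (based on the Stack Overflow answer; a sketch, not the exact test code) bridges the driver's callback-style future into an awaitable asyncio future:

```python
# Sketch of the wrapper pattern: bridge a driver-style future exposing
# add_callbacks(callback, errback) into an awaitable asyncio future.
import asyncio

def wrap_future(response_future, loop):
    aio_future = loop.create_future()
    # Driver callbacks may fire on the driver's own threads, so hop back
    # to the event loop thread before touching the asyncio future.
    response_future.add_callbacks(
        lambda result: loop.call_soon_threadsafe(
            aio_future.set_result, result),
        lambda exc: loop.call_soon_threadsafe(
            aio_future.set_exception, exc))
    return aio_future
```

With the Cassandra driver, `response_future` would be the object returned by `session.execute_async()`.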
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible
Fixes #10796.
Closes #10844
* github.com:scylladb/scylla:
sstables: make generation_type an actual separate type
sstables: use generation_type more soundly
extremum_tracker: do not require default-constructible value types
Fixes #9367
The CL counters pending_allocations and requests_blocked_memory are
exposed in Grafana (etc.) and often referred to as metrics on whether
we are blocking on commit log. But they don't really show this, as
they only measure whether or not we are blocked on the memory bandwidth
semaphore that provides rate back pressure (fixed num bytes/s - sortof).
However, actual tasks in allocation or segment wait is not exposed, so
if we are blocked on disk IO or waiting for segments to become available,
we have no visible metrics.
While the "old" counters certainly are valid, I have yet to ever see them
be non-zero in modern life.
Closes #9368