There are effectively several test cases in this test, each of which
calls scylla_sstable() to prepare, thus each creates a type in the same
Scylla instance. The 2nd attempt ends up with the "already exists" error:
E cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="A user type of name cql_test_1656396925652.type1 already exists"
tests: unit(dev)
https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1075/
Fixes: #10872
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220628081459.12791-1-xemul@scylladb.com>
A number of improvements in test.py as requested by maintainers:
* don't capture pytest output
* stick to the specific server in control connections
* support --log-level option and pass it to logging module
* when checking if CQL is up, ignore timeout errors
* no longer force schema migration when starting the server
* use test uname, not id, in log output
* improve logging of ScyllaServer
* log what cluster is used for a test
* extend xml output with logs
By the same token, remove mypy warnings and make the linter pass on test.py, as well as add some type checking.
Fixes #10871
Fixes #10785
Closes #10902
* github.com:scylladb/scylla:
test.py: extend xml output with logs
test.py: log what cluster is used for a test
test.py: improve logging of ScyllaServer
test.py: use test uname, not id, in log output
test.py: support --log-level option and pass it to logging module
test.py: make ScyllaServer more reliable and fast
test.py: don't capture pytest output
test.py: add type annotations
test.py: convert log_filename to pathlib
test.py: please linter
test.py: remove mypy warnings
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. adding a user and changing their permissions)
can become pretty annoying.
This patch series makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.
branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache
* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
loading_cache_test: Test loading_cache::reset and loading_cache::update_config
api: Add API for resetting authorization cache
authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
auth_prep_statements_cache: Make auth_prep_statements_cache accept a config struct
utils/loading_cache.hh: Add update_config method
utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
utils/loading_cache.hh: Add reset method
Validate that the size of the cache is zero after calling the
reset method and that the config is being updated correctly
after calling update_config.
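As a rough illustration of the semantics the test checks, here is a minimal Python model of a size-bounded cache with `reset` and `update_config` methods. This is only a sketch; the real implementation is C++ in utils/loading_cache.hh, and all names below are illustrative:

```python
from collections import OrderedDict

class LoadingCacheModel:
    """Toy model of a size-bounded cache with a live-updateable config."""

    def __init__(self, max_entries):
        self._config = {"max_entries": max_entries}
        self._entries = OrderedDict()

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._evict()

    def _evict(self):
        # Drop least-recently-used entries beyond the configured bound.
        while len(self._entries) > self._config["max_entries"]:
            self._entries.popitem(last=False)

    def update_config(self, max_entries):
        # Takes effect immediately -- no restart needed.
        self._config["max_entries"] = max_entries
        self._evict()

    def reset(self):
        # Flush everything; the size must be zero afterwards.
        self._entries.clear()

    def __len__(self):
        return len(self._entries)

cache = LoadingCacheModel(max_entries=4)
for i in range(10):
    cache.put(i, i * i)
assert len(cache) == 4          # eviction keeps the bound
cache.update_config(max_entries=2)
assert len(cache) == 2          # shrinking the config evicts immediately
cache.reset()
assert len(cache) == 0          # reset empties the cache
```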
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
For cases where permissions_cache validity and update interval are set to
very high values (e.g. 1 day), whenever a change to permissions is made it
is necessary to update the Scylla config and decrease these values, since
waiting for all this time to pass would not be viable.
This patch adds an API for resetting the authorization cache so that changing
the config won't be mandatory for these cases.
Usage:
$ curl -X POST http://localhost:10000/authorization_cache/reset
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. adding a user) can become pretty annoying.
This patch makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch makes authorized_prepared_statements_cache accept a config struct,
similarly to permissions_cache. This will make it easier to make this cache
live updateable on the next patch.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch adds an update_config method in order to allow live updating the
config for permissions_cache. This method is going to be used in the next
patches after making permissions_cache config live updateable.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch renames the permissions_cache_config struct to loading_cache_config
and moves it to utils/loading_cache.hh. This will make it easier to handle
config updates to the authorization caches on the next patches
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Change tests to use async mode and add helpers and tests for schema changes.
This test series will be expanded with topology changes.
Closes #10550
* github.com:scylladb/scylla:
test.py topology: repro for issue #1207
test.py: port fixture fails_without_raft
test.py topology: table methods to add/remove index
test.py topology: add/drop table column helpers
test.py topology: insert sequential row
test.py: remove deprecated test test_null
test.py: managed random tables
test.py: test_keyspace fixture async
test.py: rename fixture test_keyspace to keyspace
test.py topology: test with asyncio
This PR adds necessary modifications to perf_simple_query so that it can be used to test performance of the timeout handling path. With an appropriate combination of flags, it is possible to consistently trigger timeouts on every operation.
The following flags are added:
- `--stop-on-error` - if true (the default), the test stops after encountering the first exception and reports it; otherwise errors are counted and reported at the end.
- `--timeout <x>` - allows using `USING TIMEOUT <x>` in the benchmark query/statement.
- `--bypass-cache` - uses `BYPASS CACHE` in the benchmark query (relevant only to reads).
Examples:
```
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write
131023.65 tps ( 56.2 allocs/op, 13.2 tasks/op, 49784 insns/op, 0 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write --stop-on-error=false --timeout=0s
97163.73 tps ( 53.1 allocs/op, 5.1 tasks/op, 78687 insns/op, 1000000 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000
154060.36 tps ( 63.1 allocs/op, 12.1 tasks/op, 42998 insns/op, 0 errors)
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --stop-on-error=false --flush --bypass-cache --timeout=0s
30127.43 tps ( 48.2 allocs/op, 14.3 tasks/op, 312416 insns/op, 1000000 errors)
```
Refs: #2363
Closes #10899
* github.com:scylladb/scylla:
test: perf: add bypass cache argument
test: perf: add timeout argument
test: perf: count errors and report the count in results
test: perf: add stop-on-error argument
test: perf: coroutinize run_worker()
test: perf: fix crash on exception in time_parallel_ex
This fixes a quadratic behavior when many snapshots with range
tombstones are queued for merging. Before the change, new snapshots
were inserted at the front, which is also where the worker looks.
Merging a version has a complexity component that is linear in the
number of range tombstones. If we merge snapshots starting from the
latest to the oldest, the whole process becomes quadratic, because the
version being merged accumulates an increasing amount of tombstones,
including ones that were already merged before. We should instead merge
starting from the oldest snapshots; this way each tombstone is applied
exactly once during the merge.
This bug got worse after 4bd4aa2e88,
which makes merging tombstones more expensive.
Closes #10916
When the run scripts for tests of cql-pytest, alternator, redis, etc.,
run Scylla, they should set the UBSAN_OPTIONS and ASAN_OPTIONS so that
if the executable is built with sanitizers enabled, it will ignore false
positives that we know about, and fail on real errors.
The change in this patch affects all test/*/run scripts which use this
shared Scylla-starting code. test.py already had the same settings,
and it affected the tests that it knows to run directly (unit tests,
cql-pytest, etc.).
Fixes #10904
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10915
Add test and server logs, as well as the unidiff, to
XML output. This makes Jenkins reports nicer.
While at it, debug & fix bugs in the handling of flaky tests:
- the reset would reset a flaky test even after the last attempt
fails, so it would be impossible to see what happened to it
- the args needed to be reset as well, since execution modifies
them
- we would say that we're going to retry the flaky test when in
fact it was the last attempt to run it and no more retries were
planned
1) Stick to the specific server in control connections.
It could happen that, when starting a cluster and checking
if a specific node is up, the check would actually execute
against an already running node. Prevent this from happening
by setting a whitelist load balancing policy for control
connections.
2) When checking if CQL is up, ignore timeout errors
Scylla in debug mode can easily time out on a DDL query,
and a timeout error at startup would lead to the entire cluster
being marked as broken. This is too harsh; allow timeouts at start.
3) No longer force schema migration when starting the server
By default, Raft is on, so the nodes are getting schema
through Raft leader. Schema migration significantly slows
down cluster start in debug mode (60 seconds -> 100 seconds),
and even though it was a great test that helped discover
several bugs in Scylla, it shouldn't be part of normal
cluster boot, so disable it.
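The liveness check in item 2 boils down to a retry loop that treats timeouts as "still starting" rather than "broken". A simplified Python sketch, where `check_cql` and the exception type stand in for the actual driver calls test.py uses:

```python
import time

class OperationTimedOut(Exception):
    """Stand-in for the driver's timeout exception."""

def wait_for_cql(check_cql, deadline, poll_interval=0.1):
    """Poll until CQL answers; tolerate timeouts, fail only at the deadline."""
    while time.time() < deadline:
        try:
            check_cql()          # e.g. run a trivial query
            return True
        except OperationTimedOut:
            # Debug-mode Scylla may time out while starting up;
            # that alone should not mark the cluster as broken.
            pass
        time.sleep(poll_interval)
    return False

# Example: a server that times out twice before coming up.
attempts = iter([OperationTimedOut, OperationTimedOut, None])
def check_cql():
    exc = next(attempts)
    if exc:
        raise exc()

assert wait_for_cql(check_cql, deadline=time.time() + 5) is True
```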
Repro for a bug in concurrent schema changes involving many tables and
indexes.
Exercise schema changes by doing, in parallel, new table creation,
altering a table (_alter), and indexing other tables (_index).
The original repro had sets of 20 of those and slept for 20 seconds to
settle. This repro does it for Scylla with just 1 set and 1 second.
This issue goes away once Raft is enabled.
https://github.com/scylladb/scylla/issues/1207
Originally at https://issues.apache.org/jira/browse/CASSANDRA-10250
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Port fails_without_raft to higher level conftest file for future use in
topology pytests.
While there, make it async.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
For each table keep a counter and insert rows with sequential values
generated correspondingly by each column's type.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
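The idea of this helper can be sketched as a simplified stand-alone model; the real helper lives in the topology test fixtures and speaks CQL, and the type names below are a small illustrative subset:

```python
import itertools

# Per-CQL-type generators producing deterministic sequential values
# (the real fixture covers more types than this illustrative subset).
VALUE_GENERATORS = {
    "int":     lambda n: n,
    "text":    lambda n: f"val-{n}",
    "boolean": lambda n: n % 2 == 0,
}

class Table:
    def __init__(self, columns):
        self.columns = columns            # list of (name, cql_type)
        self._counter = itertools.count() # per-table row counter
        self.rows = []

    def insert_sequential_row(self):
        # Generate one value per column from the shared counter value,
        # so rows are reproducible and easy to verify.
        n = next(self._counter)
        row = {name: VALUE_GENERATORS[typ](n) for name, typ in self.columns}
        self.rows.append(row)
        return row

t = Table([("pk", "int"), ("name", "text"), ("flag", "boolean")])
assert t.insert_sequential_row() == {"pk": 0, "name": "val-0", "flag": True}
assert t.insert_sequential_row() == {"pk": 1, "name": "val-1", "flag": False}
```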
Helpers to create a keyspace and manage randomized tables.
Fixture drops all created tables still active after the test finishes.
Includes helper methods to verify schema consistency.
These helpers will be used in Raft schema changes tests coming later.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Run the test async using a wrapper for the Cassandra Python driver's future.
The wrapper was suggested by a user and brought forward by @fruch.
It's based on https://stackoverflow.com/a/49351069 .
Redefine pytest event_loop fixture to avoid issues with fixtures with
scope bigger than function (like keyspace).
See https://github.com/pytest-dev/pytest-asyncio/issues/68
Convert sample test_null to async. More useful test cases will come
afterwards.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
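The wrapper pattern from that answer can be sketched as follows. `ResponseFutureStub` is a stand-in for the driver's callback-style future so the example is self-contained; the real driver delivers callbacks from its own thread, which is why `call_soon_threadsafe` is used:

```python
import asyncio

class ResponseFutureStub:
    """Minimal stand-in for a driver future exposing add_callbacks(cb, errback)."""
    def __init__(self, result=None, error=None):
        self._result, self._error = result, error

    def add_callbacks(self, callback, errback):
        if self._error is not None:
            errback(self._error)
        else:
            callback(self._result)

def wrap_future(response_future):
    """Bridge a callback-style driver future into an asyncio future."""
    loop = asyncio.get_running_loop()
    aio_future = loop.create_future()

    def on_result(result):
        # The driver may call back from another thread, so hop to the loop.
        loop.call_soon_threadsafe(aio_future.set_result, result)

    def on_error(exc):
        loop.call_soon_threadsafe(aio_future.set_exception, exc)

    response_future.add_callbacks(on_result, on_error)
    return aio_future

async def main():
    rows = await wrap_future(ResponseFutureStub(result=[("k", 1)]))
    return rows

assert asyncio.run(main()) == [("k", 1)]
```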
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible
Fixes#10796.
Closes#10844
* github.com:scylladb/scylla:
sstables: make generation_type an actual separate type
sstables: use generation_type more soundly
extremum_tracker: do not require default-constructible value types
Fixes #9367
The CL counters pending_allocations and requests_blocked_memory are
exposed in Grafana (etc.) and often referred to as metrics on whether
we are blocking on the commit log. But they don't really show this, as
they only measure whether or not we are blocked on the memory bandwidth
semaphore that provides rate back pressure (a fixed number of bytes/s,
sort of). However, tasks actually waiting in allocation or for segments
are not exposed, so if we are blocked on disk IO or waiting for segments
to become available, we have no visible metrics.
While the "old" counters certainly are valid, I have yet to see them
be non-zero in practice.
Closes #9368
Currently in docs/alternator/compatibility.md experimental features
and unimplemented features are bunched together under one heading
("unimplemented features"). In this patch we separate them into two
sections. This makes the "unimplemented features" section shorter,
and also allows us to link to the new "experimental features" section
separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #10893
A scan over range tombstones will ignore preemption, which may cause
reactor stalls or read failure due to std::bad_alloc.
This is a regression introduced in
5e97fb9fc4. _lower_bound_changed was
always set to false, which is later checked at the preemption point and
inhibits yielding.
Closes #10900
Adds the "--timeout" argument, which allows specifying a timeout used in
all operations. It works by inserting `USING TIMEOUT <x>` in the
appropriate place in the query.
The flag is most useful when set to zero - with an appropriate
combination of other flags (flush, bypass cache) it guarantees that each
operation will time out and performance of the timeout handling logic
can be measured.
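How the clause slots into the benchmark statement can be illustrated with a short sketch. This is only an approximation of the query construction (the real code builds the CQL string in C++, and the column/table names here are illustrative):

```python
def make_query(table, timeout=None, bypass_cache=False):
    """Build a benchmark read query, optionally appending the
    BYPASS CACHE and USING TIMEOUT clauses (clause order assumed)."""
    q = f'SELECT "C0" FROM {table} WHERE "KEY" = ?'
    if bypass_cache:
        q += " BYPASS CACHE"
    if timeout is not None:
        q += f" USING TIMEOUT {timeout}"
    return q

assert make_query("cf") == 'SELECT "C0" FROM cf WHERE "KEY" = ?'
assert make_query("cf", timeout="0s").endswith("USING TIMEOUT 0s")
assert "BYPASS CACHE" in make_query("cf", bypass_cache=True)
```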
Adds the "--stop-on-error" argument to perf_simple_query. When enabled
(and it is enabled by default), the benchmark will propagate exceptions
if any occur in the tested function. Otherwise, errors will be ignored.
Converts the executor::run_worker() method to a coroutine. This will
allow extending the function in further commits without having to
allocate continuations.
The `time_parallel_ex` function creates a sharded<executor> and uses it
to run the benchmark on multiple shards in parallel. However, if the
benchmarking function throws an exception, the sharded<executor> will be
destroyed without being stopped, which triggers an assertion in
sharded<T> destructor.
This commit makes sure that the executor is stopped before being
destroyed by putting `exec.stop()` into a `seastar::defer`.
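The fix follows the usual defer pattern: arrange for `stop()` to run even when the benchmarked function throws. The same guarantee in Python terms is a try/finally (the C++ code uses seastar::defer around `exec.stop()`; the classes below are toy stand-ins):

```python
class Executor:
    """Toy stand-in for sharded<executor>: it must be stopped
    before destruction, or an assertion would fire."""
    def __init__(self):
        self.stopped = False

    def run(self, benchmark):
        return benchmark()

    def stop(self):
        self.stopped = True

def time_parallel(exec_, benchmark):
    try:
        return exec_.run(benchmark)
    finally:
        # Mirrors seastar::defer: stop() runs on both the normal
        # and the exceptional path, before the executor is destroyed.
        exec_.stop()

e = Executor()
assert time_parallel(e, lambda: 42) == 42
assert e.stopped

e2 = Executor()
try:
    time_parallel(e2, lambda: 1 / 0)
except ZeroDivisionError:
    pass
assert e2.stopped   # stopped even though the benchmark threw
```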
* seastar ff46af9ae0...9c016aeebf (8):
> Merge "Handle overflow in token bucket replenisher" from Pavel E
Fixes #10743
Fixes #10846
> abort_source: request_abort: restore legacy no-args method
> configure.py: do not use distutils
> configure.py: drop unused "import sys"
> Revert "Use recv syscall instead of read in do_read_some()"
> Use recv syscall instead of read in do_read_some()
> Merge 'Add initial support for websocket protocol' from Andrzej Stalke
> Merge 'abort_source: request_abort: allow passing exception to subscribers' from Benny Halevy
Closes #10898
While we're iterating over the fetched keyspace names, some of these
keyspaces may get dropped. Handle that by checking if the keyspace still
exists.
Also, when retrieving the replication strategy from the keyspace, store
the pointer (which is an `lw_shared_ptr`) to the strategy to keep it
alive, in case the keyspace that was holding it gets dropped.
Closes #10861
Consider this:
- User starts a repair job with the HTTP API
- User aborts all repairs
- The repair_info object for the repair job is created
- The repair job is not aborted
In this patch, the repair uuid is recorded before the repair_info object
is created, so that repair can now abort repair jobs in their early stage.
Fixes #10384
Closes #10428
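The race and the fix can be modelled in a few lines of Python (the names and data structures are illustrative; the real code is C++ in the repair module):

```python
aborted_uuids = set()
registered_uuids = set()

def abort_all_repairs():
    # Abort everything registered so far.
    aborted_uuids.update(registered_uuids)

def start_repair(uuid, record_uuid_first):
    if record_uuid_first:
        registered_uuids.add(uuid)   # the fix: record before creating repair_info
    abort_all_repairs()              # user aborts while the job is being set up
    if not record_uuid_first:
        registered_uuids.add(uuid)   # old behaviour: registered too late
    # The job proceeds only if it was not aborted.
    return uuid not in aborted_uuids  # True means the job keeps running

# Old behaviour: the abort misses the job that is still being created.
assert start_repair("job-1", record_uuid_first=False) is True
aborted_uuids.clear(); registered_uuids.clear()
# Fixed behaviour: the early-recorded uuid is caught by the abort.
assert start_repair("job-2", record_uuid_first=True) is False
```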
Otherwise cql_transport::additional_options_for_proto_ext() complains
about the inability to format the enum class value.
Introduced by efc3953c (transport: add rate_limit_error).
Seen with fmt version 8.1.1-5.fc35; a fresher one must have it out of the box.
Fixes #10884
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220627052703.32024-1-xemul@scylladb.com>
Currently, we use the last row in the query result set as the position from which the query is continued on the next page. Since only live rows make it into the query result set, this mandates that the query be stopped on a live row on the replica; otherwise any dead rows or tombstones processed after the live row would have to be re-processed on the next page (and the saved reader would have to be thrown away due to the position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row, and the data set becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained by looking at the last row in the result set; this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall back to the old way of looking at the last row in the result set. As said, for now this is still the same as the content of the new field, so there is no problem in mixed clusters.
Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933
Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.
Closes #10829
* github.com:scylladb/scylla:
query: have replica provide the last position
idl/query: add last_position to query_result
multishard_mutation_query: propagate compaction state to result builder
multishard_mutation_query: defer creating result builder until needed
querier: use full_position instead of ad-hoc struct
querier: rely on compactor for position tracking
mutation_compactor: add current_full_position() convenience accessor
mutation_compactor: s/_last_clustering_pos/_last_pos/
mutation_compactor: add state accessor to compact_mutation
introduce full_position
idl: move position_in_partition into own header
service/paging: use position_in_partition instead of clustering_key for last row
alternator/serialization: extract value object parsing logic
service/pagers/query_pagers.cc: fix indentation
position_in_partition: add to_string(partition_region) and parse_partition_region()
mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh