With compaction groups, automatic flushing may not pick the user
table. Fix it by using explicit flush.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When the memory consumption of the semaphore reaches the configured
serialize threshold, all but the blessed permit is blocked from
consuming any more memory. This ensures that past this limit, only one
permit at a time can consume memory.
Such a blessed permit can be registered inactive. Before this patch, it
would still retain its blessed status when doing so. This could result
in this permit being re-queued for admission if it was evicted in the
meanwhile, potentially resulting in a complete deadlock of the semaphore:
* admission queue permits cannot be admitted because there is no memory
* admitter permits are all queued on memory, as none of them are blessed
This patch strips the blessed status from the permit when it is
registered as inactive. It also adds a unit test to verify this happens.
Fixes: #12603Closes#12694
Currently, nothing prevents us from dropping a user type
used in a user function, even though doing so may make us
unable to use the function correctly.
This patch prevents this behavior by checking all function
argument and return types when executing a drop type statement
and preventing it from completing if the type is referenced
by any of them.
Closes#12680
Add tests which test filtering using IN restriction
with a list which contains a bind variable.
There are other cql-pytest tests which
test IN lists with a bind variable,
but it looks like they don't do filtering.
IN restrictions on primary key columns
are handled in a special way to generate
the right ranges.
These tests hit a different code path as
filtering uses `expr::evaluate()`.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
There was a check for immediate consistency after a decommission
operation has finished in one of the tests, but it turns out that also
after decommission it might take some time for token ring to be updated
on other nodes. Replace the check with a wait.
Also do the wait in another test that performs a sequence of
decommissions. We won't attempt to start another decommission until
every node learns that the previously decommissioned node has left.
Closes#12686
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
free up space in the pool (using `steal`), stop the cluster, then get a
new cluster from the pool.
Between the `steal` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager` would start, because there was free
space in the pool.
This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.
Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.
The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)`
and a `destroy` function passed to the pool (now the pool is responsible
for stopping the cluster and releasing its IPs).
Fixes#11757Closes#12549
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests
test/pylib: pool: introduce `replace_dirty`
test/pylib: pool: replace `steal` with `put(is_dirty=True)`
The "cluster manager" used by the topology test suite uses a UNIX-domain
socket to communicate between the cluster manager and the individual tests.
The socket is currently located in the test directory but there is a
problem: In Linux the length of the path used as a UNIX-domain socket
address is limited to just a little over 100 bytes. In Jenkins run, the
test directory names are very long, and we sometimes go over this length
limit and the result is that test.py fails creating this socket.
In this patch we simply put the socket in /tmp instead of the test
directory. We only need to do this change in one place - the cluster
manager, as it already passes the socket path to the individual tests
(using the "--manager-api" option).
Tested by cloning Scylla in a very long directory name.
A test like ./test.py --mode=dev test_concurrent_schema fails before
this patch, and passes with it.
Fixes#12622Closes#12678
Use the newly introduced key generation facilities, instead of the the
old inflexible alternatives and hand-rolled code.
Most of the migrations are mechanic, but there are two tests that
were tricky to migrate:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same
size. This makes some sense in the case of the key count estimation
test, but makes no sense at all to me in the case of the sstable run
test.
Creating the default schema, used in the default constructor of
table_for_tests. Allows for getting the default schema without creating
an instance first.
Contains methods to generate partition and clustering keys. In the case
of the former, one can specify the shard to generate keys for.
We currently have some methods to generate these but they are not
generic. Therefore the tests are littered by open-coded variants.
The methods introduced here are completely generic: they can generate
keys for any schema.
Allow caller to specify the minimum size in bytes of the generated
value. Only really works with string-like types (and collections of
these).
Also fixed max size enforcement for strings: before this patch, the
provided max size was dividied by wide string size, instead of the
char width of the actual string type the value is generated for.
The test measured copying of the mutation object, but verified the
measurement against mutation_partition::external_memory_usage(). So
anything allocated on the mutation object level would cause the test
to (incorrectly) fail. Fix that by copying only the mutation_partition
part.
Currently not a problem, because the partition_key is stored in the
in-line storage. Would become a problem once inline storage is
reduced.
This patch switches memtable and cache to use mutation_partition_v2,
and all affected algorithms accordingly.
The memtable reader was changed to use the same cursor implementation
which cache uses, for improved code reuse and reducing risk of bugs
due to discrepancy of algorithms which deal with MVCC.
Range tombstone eviction in cache has now fine granularity, like with
rows.
Fixes#2578Fixes#3288Fixes#10587
Partition version merging can now insert sentinels, which may
temporarily increase unspooled memory. It is no longer true that
unspooled monotonically decreases, which the test verified. Relax it,
and only verify that unspooled is smaller than real dirty.
api::new_timestamp() is not monotonic. In
test_single_row_and_tombstone_not_cached_single_row_range1, we
generate a deletion and an insertion in the deleted reange. If they
get the same timestamp, the inserted row will be covered.
This will surface after cache starts to compact rows with range tombstones.
When inserting range tombstones, the test uses api::new_timestamp(),
but when inserting rows, it uses a fixed timestamp of 1. This will be
problematic when rows get compacted with range tombstone, all rows
would get compacted away, which is not expected by the test. To fix
this, let's use the same timestamp source as range tombstones. This
way rows will get a later timestamp.
compact_for_compaction() will perform cell expiration based on
gc_clock::now(), which introduces sporadic mismatches due to expiry
status of a row marker.
Drop this, we can rely on compaction done by is_equal_to_compacted()
This tombstone has a high chance of obliterating all data, which will
make tests which involve partition version merging not very
interesting. The result will be an empty partition with a
tombstone. Reduce its frequency, so that in MVCC there is a
significant chance of having live data in the combined entry where
individual versions come from the generator.
evictable snapshots must have all entries added to the
tracker. Partition version merging assumes this. Before this was
benign, but will start to trigger asserts in mutation_partition_v2.
This will trigger an assert in apply_monotonically() later in the
series, where this row would be merged with a dummy at the same
position. This row must not be marked as non-dummy, there is an
assumption that non-clustering positions are all dummies. There can't
be two entries with the same position an a different dummy status.
We had one test test_gsi.py::test_gsi_identical that didn't work on KA/LA
sstables due to #6157, so it was skipped. Today, Scylla no longer supports
writing these old sstable formats, so the test can never find itself
running on these versions, so should pass. And indeed it does, and the
"skip" marker can be removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12651
Today's sstable_test_env starts with a default-configured db::config and, thus, sstables_manager. Test cases that run in this env always create a tempdir to store sstable files in on their own. Next patching makes sstable-manager and friends fully control the data-dir path in order to support object storage for sstables in a nice way, and this behavior of tests upsets this ongoing work.
Said that, this PR configures sstable_test_env with a tempdir and pins down the cases using it to stick to that directory, rather than to the custom one.
Closes#12641
* github.com:scylladb/scylladb:
test: Use tempdir from sstable_test_env
test: Add tmpdir to sstable test env
test: Keep db::config as unique pointer
After topology changes like removing a node, verify that the set of
group 0 members and token ring members is the same.
Modify `get_token_ring_host_ids` to only return NORMAL members. The
previous version which used the `/storage_service/host_id` endpoint
might have returned non-NORMAL members as well.
Fixes: #12153Closes#12619
There are several helpers to make an sstable for the table and two with most of the arguments are only used by tests. This PR leaves table with just one arg-less call thus making it easier to patch further.
Closes#12636
* github.com:scylladb/scylladb:
table: Shrink sstables making API
tests: Use sstables manager to make sstables
distributed_loader: Add helpers to make sstables for reshape/reshard
If a server is stopped suddenly (i.e. not graceful), schema tables might
be in inconsistent state. Add a test case and enable Scylla
configuration option (force_schema_commit_log) to handle this.
Fixes#12218Closes#12630
* github.com:scylladb/scylladb:
pytest: test start after ungraceful stop
test.py: enable force_schema_commit_log
`ScyllaClusterManager` is used to run a sequence of test cases from
a single test file. Between two consecutive tests, if the previous test
left the cluster 'dirty', meaning the cluster cannot be reused, it would
put the old cluster to the pool with `is_dirty=True`, then get a new
cluster from the pool.
Between the `put` and the `get`, a concurrent test run (with its own
instance of `ScyllaClusterManager`) would start, because there was free
space in the pool.
This resulted in undesirable behavior when we ran tests with
`--repeat X` for a large `X`: we would start with e.g. 4 concurrent
runs of a test file, because the pool size was 4. As soon as one of the
runs freed up space in the pool, we would start another concurrent run.
Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so
on. We would have a large number of concurrent runs, even though the
original 4 runs didn't finish yet. All of these concurrent runs would
compete waiting on the pool, and waiting for space in the pool would
take longer and longer (the duration is linear w.r.t number of
concurrent competing runs). Tests would then time out because they would
have to wait too long.
Fix that by using the new `replace_dirty` function introduced to the
pool. This function frees up space by returning a dirty cluster and then
immediately takes it away to be used for a new cluster. Thanks to this,
we will only have at most as many concurrent runs as the pool size. For
example with --repeat 8 and pool size 4, we would run 4 concurrent runs
and start the 5th run only when one of the original 4 runs finishes,
then the 6th run when a second run finishes and so on.
Fixes#11757
Used to atomically return a dirty object to the pool and then use the
space freed by this object to get another object. Unlike
`put(is_dirty=True)` followed by `get`, a concurrent waiter cannot take
away our space from us.
A piece of `get` was refactored to a private function `_build_and_get`,
this piece is also used in `replace_dirty`.
The pool usage was kind of awkward previously: if the user of a pool
decided that a previously borrowed object should no longer be used,
it was their responsibility to destroy the object (releasing associated
resources and so on) and then call `steal()` on the pool to free space
for a new object.
Change the interface. Now the `Pool` constructor obtains a `destroy`
function additionally to the `build` function. The user calls the
function `put` to return both objects that are still usable and those
aren't. For the latter, they set `is_dirty=True`. The pool will
'destroy' the object with the provided function, which could mean e.g.
releasing associated resources.
For example, instead of:
```
if self.cluster.is_dirty:
self.clusters.stop()
self.clusters.release_ips()
self.clusters.steal()
else:
self.clusters.put(self.cluster)
```
we can now use:
```
self.clusters.put(self.cluster, is_dirty=self.cluster.is_dirty)
```
(assuming that `self.clusters` is a pool constructed with a `destroy`
function that stops the cluster and releases its IPs.)
Also extend the interface of the context manager obtained by
`instance()` - the user must now pass a flag `dirty_on_exception`. If
the context manager exists due to an exception and that flag was `True`,
the object will be considered dirty. The dirty flag can also be set
manually on the context manager. For example:
```
async with (cm := pool.instance(dirty_on_exception=True)) as server:
cm.dirty = await run_test(test, server)
# It will also be considered dirty if run_test throws an exception
```
The test cases in sstable_directory_test use a temporary directory that
differs from the one sstables manager starts over. Fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This adds the test/lib's tmpdir instance _and_ configures the
data_file_directories with this path. This makes sure sstables manager
and the rest of the test use the same directory for sstables. For now
it doesn't change anything, but helps next patching.
(A neat side effect of this change is that sstable_test_env is now
configured the same way as cql_test_env does)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This test uses two many-args helpers from table calss to create sstables
with desired parameters. The table API in question is not used by any
other code but these few places, to it's better to open-code it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The goal is to make it possible to make config with custom-initialized
options in test_env::impl's constructor initializer list (next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
`ScyllaCluster.server_stop` had this piece of code:
```
server = self.running.pop(server_id)
if gracefully:
await server.stop_gracefully()
else:
await server.stop()
self.stopped[server_id] = server
```
We observed `stop_gracefully()` failing due to a server hanging during
shutdown. We then ended up in a state where neither `self.running` nor
`self.stopped` had this server. Later, when releasing the cluster and
its IPs, we would release that server's IP - but the server might have
still been running (all servers in `self.running` are killed before
releasing IPs, but this one wasn't in `self.running`).
Fix this by popping the server from `self.running` only after
`stop_gracefully`/`stop` finishes.
Make an analogous fix in `server_start`: put `server` into
`self.running` *before* we actually start it. If the start fails, the
server will be considered "running" even though it isn't necessarily,
but that is OK - if it isn't running, then trying to stop it later will
simply do nothing; if it is actually running, we will kill it (which we
should do) when clearing after the cluster; and we don't leak it.
Closes#12613
Test case for a start of a server after it was stopped suddenly (instead
of gracefully). This coud cause commitlog flush issues.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
change row purge condition for compacting_reader to remove all expired
rows to avoid read perfomance problems when there are many expired
tombstones in row cache
Refs #2252Closes#12565