test_storage_service_keyspace_cleanup_with_no_owned_ranges
from test_storage_service.py creates snapshots with tags based
on current time. Thus if a test runs on the same node twice
with time distance short enough, there may be a name collision
between the snapshots from two runs. This will cause the second
run to fail on assertions.
Use new_test_snapshot fixture to drop snapshots after the test.
Delete my_snapshot_tags as it's no longer necessary.
Fixes: #15680.
Closesscylladb/scylladb#15683
Rewrite test that checks whether task_manager/wait_task works properly.
The old version didn't work. Delete functions used in old version.
Closes#14959
* github.com:scylladb/scylladb:
test: rewrite wait_task test
test: move ThreadWrapper to rest_util.py
Fixes#9405
`sync_point` API provided with incorrect sync point id might allocate
crazy amount of memory and fail with `std::bad_alloc`.
To fix this, we can check if the encoded sync point has been modified
before decoding. We can achieve this by calculating a checksum before
encoding, appending it to the encoded sync point, and compering it with
a checksum calculated in `db::hints::decode` before decoding.
Closes#14534
* github.com:scylladb/scylladb:
db: hints: add checksum to sync point encoding
db: hints: add the version_size constant
sync point API provided with incorrect sync point id might allocate
crazy amount of memory and fail with std::bad_alloc.
To fix this, we can check if the encoded sync point has been modified
before decoding. We can achieve this by calculating a checksum before
encoding, appending it to the encoded sync point, and compering
it with a checksum calculated in db::hints::decode before decoding.
When running compaction task test on the same scylla instantion
other tests are run, some compaction tasks from other test cases may
be left in task manager. If they stay in memory long enough, they may
get unregistered during the compaction task test and cause bad_request
status.
Drain old compaction tasks before and after each test.
Fixes: #14584.
Closes#14585
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.
After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.
Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.
Fixes#14475Fixes#14618Closes#14641
Return 400 Bad Request instead of 500 Internal Server Error
when user requests task or module which does not exist through
task manager and task manager test apis.
Closes#14166
* github.com:scylladb/scylladb:
test: add test checking response status when requested module does not exist
api: fix indentation
api: throw bad_param_exception when requested task/module does not exists
In task manager and task manager test rest apis when a task or module
which does not exist is requested, we get Internal Server Error.
In such cases, wrap thrown exceptions in bad_param_exception
to respond with Bad Request code.
Modify test accordingly.
In test_compaction_task.py tests concerning the same type of compaction
are squashed together so that they are run synchronously and there is
no data race when the tests are run in parallel.
It is possible that a node will have no owned token ranges
in some keyspaces based on their replication strategy,
if the strategy is configured to have no replicas in
this node's data center.
In this case we should go ahead with cleanup that will
effectively delete all data.
Note that this is current very inefficient as we need
to filter every partition and drop it as unowned.
It can be optimized by either special casing this case
or, better, use skip forward to the next owned range.
This will skip to end-of-stream since there are no
owned ranges.
Fixes#13634
Also, add a respective rest_api unit test
Closes#13849
* github.com:scylladb/scylladb:
test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges
compaction_manager: perform_cleanup: handle empty owned ranges
Add a test for get/enable/disable auto_compaction via to column_family api.
And add log messages for admin operations over that api.
Closes#13566
* github.com:scylladb/scylladb:
api: column_family: add log messages for admin operation
test: rest_api: add test_column_family
Fixes#13332
The tests user the discriminator "system" as prefix to assume keyspaces are marked
"internal" inside scylla. This is not true in enterprise universe (replicated key
provider). It maybe/probably should, but that train is sailing right now.
Fix by removing one assert (not correct) and use actual API info in the alternator
test.
Closes#13333
Task manager task implementations of classes that cover
cleanup keyspace compaction which can be started through
/storage_service/keyspace_compaction/ api.
Top level task covers the whole compaction and creates child
tasks on each shard.
Closes#12712
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test cleanup compaction
compaction: create task manager's task for cleanup keyspace compaction on one shard
compaction: create task manager's task for cleanup keyspace compaction
api: add get_table_ids to get table ids from table infos
compaction: create cleanup_compaction_task_impl
The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping
was flaky. It tests the toppartition request, which unfortunately needs
to choose a sampling duration in advance, and we chose 1 second which we
considered more than enough - and indeed typically even 1ms is enough!
but very rarely (only know of only one occurance, in issue #13223) one
second is not enough.
Instead of increasing this 1 second and making this test even slower,
this patch takes a retry approach: The tests starts with a 0.01 second
duration, and is then retried with increasing durations until it succeeds
or a 5-seconds duration is reached. This retry approach has two benefits:
1. It de-flakes the test (allowing a very slow test to take 5 seconds
instead of 1 seconds which wasn't enough), and 2. At the same time it
makes a successful test much faster (it used to always take a full
second, now it takes 0.07 seconds on a dev build on my laptop).
A *failed* test may, in some cases, take 10 seconds after this patch
(although in some other cases, an error will be caught immediately),
but I consider this acceptable - this test should pass, after all,
and a failure indicates a regression and taking 10 seconds will be
the last of our worries in that case.
Fixes#13223.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13238
Implementation of task_manager's task that covers major keyspace compaction
on one shard.
Closes#12662
* github.com:scylladb/scylladb:
test: extend major keyspace compaction tasks test
compaction: create task manager's task for major keyspace compaction on one shard
Before this patch, all scripts which use test/cql-pytest/run.py
looked for the Scylla executable as their first step. This is usually
the right thing to do, except in two cases where Scylla is *not* needed:
1. The script test/cql-pytest/run-cassandra.
2. The script test/alternator/run with the "--aws" option.
So in this patch we change run.py to only look for Scylla when actually
needed (the find_scylla() function is called). In both cases mentioned
above, find_scylla() will never get called and the script can work even
if Scylla was never built.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13010
Task ttl can be set with task manager test api, which is disabled
in release mode.
Move get_and_update_ttl from task manager test api to task manager
api, so that it can be used in release mode.
Closes#12894
Task manager api will be used in many tests. Thus, to make it easier
api calls to task manager are wrapped into functions in task_manager_utils.py.
Some helpers that may be reused in other tests are moved there too.
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to more reasonable time.
These timeout values are the same as used in topology tests.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#12405