new_test_keyspace is problematic here since
the presence of the banned node can fail the automatic drop of
the test keyspace due to NoHostAvailable (in debug mode for
some reason)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 55b35eb21c)
When the test fails with exception, keep the keyspace
intact for post-mortem analysis.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0fd1b846fe)
Define create_new_test_keyspace that can be used in
cases we cannot automatically drop the newly created keyspace
due to e.g. loss of raft majority at the end of the test.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f946302369)
Following patch will convert topology tests to use
new_test_keyspace and friends.
Some tests restart server and reset the driver connection
so we cannot use the original cql Session for
dropping the created keyspace in the `finally` block.
Pass the ManagerClient instead to get a new cql
session for dropping the keyspace.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 50ce0aaf1c)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This change resolves an issue where selecting a version from the multiversion dropdown on Markdown pages (e.g. https://docs.scylladb.com/manual/stable/alternator/getting-started.html) incorrectly redirected users to the main page instead of the corresponding versioned page.
The underlying cause was that the `multiversion` extension relies on `source_suffix` to identify available pages for URL mapping. Without this configuration, proper redirection fails for `.md` files.
This fix should be backported to `2025.1` to ensure correct behavior. Otherwise, the fix will only take effect in future releases.
Testing locally is non-trivial: clone the repository, apply the changes to each relevant branch, set `smv_remote_whitelist` to "", then run `make multiversionpreview`. Afterward, switch between versions in the dropdown to verify behavior. I've tested it locally, so the best next step is to merge and confirm that it works as expected in the live environment.
Closesscylladb/scylladb#23957
(cherry picked from commit 4ba7182515)
Closesscylladb/scylladb#24010
Currently, flush throws no_such_column_family if a table is dropped. Skip the flush of dropped table instead.
Fixes: #16095.
Needs backport to 2025.1 and 6.2 as they contain the bug
- (cherry picked from commit 91b57e79f3)
- (cherry picked from commit c1618c7de5)
Parent PR: #23876Closesscylladb/scylladb#23905
* github.com:scylladb/scylladb:
test: test table drop during flush
replica: skip flush of dropped table
Currently, stream_session::prepare throws when a table in requests
or summaries is dropped. However, we do not want to fail streaming
if the table is dropped.
Delete table checks from stream_session::prepare. Further streaming
steps can handle the dropped table and finish the streaming successfully.
Fixes: #15257.
Closesscylladb/scylladb#23915
(cherry picked from commit 20c2d6210e)
Closesscylladb/scylladb#24052
When an sstable is identified by sstable_directory as remote-unshared,
it will at some point be moved to the target shard. When it happens a
log-message appears:
sstable_directory - Moving 1 unshared SSTables to shard 1
Processing of tables by sstable_directory often happens in parallel, and
messages from sstable_directory are intermixed. Having a message like
above is not very informative, as it tells nothing about sstables that
are being moved.
Equip the message with ks:cf pair to make it more informative.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#23912
(cherry picked from commit d40d6801b0)
Closesscylladb/scylladb#24015
This is passed by reference to the constructor, but a copy is saved into
the _table_shared_data member. A reference to this member is passed down
to all memtable readers. Because of the copy, the memtable readers save
a reference to the memtable_list's member, which goes away together with
the memtable_list when the storage_group is destroyed.
This causes use-after-free when a storage group is destroyed while a
memtable read is still ongoing. The memtable reader keeps the memtable
alive, but its reference to the memtable_table_shared_data becomes
stale.
Fix by saving a reference in the memtable_list too, so memtable readers
receive a reference pointing to the original replica::table member,
which is stable accross tablet migrations and merges.
The copy was introduced by 2a76065e3d.
There was a copy even before this commit, but in the previous vnode-only
world this was fine -- there was one memtable_list per table and it was
around until the table itself was. In the tablet world, this is no
longer given, but the above commit didn't account for this.
A test is included, which reproduces the use-after-free on memtable
migration. The test is somewhat artificial in that the use-after-free
would be prevented by holding on to an ERM, but this is done
intentionaly to keep the test simple. Migration -- unlike merge where
this use-after-free was originally observed -- is easy to trigger from
unit tests.
Fixes: #23762Closesscylladb/scylladb#23984
(cherry picked from commit 0a9ca52cfd)
Closesscylladb/scylladb#24033
The test populates a table with 50k rows, creates a view on that table
and then compares the time spent in streaming vs. gossip scheduling
groups. It only takes 10s in dev mode on my machine, but is much slower
in debug mode in CI - building the view doesn't finish within 2 minutes.
The bigger the view to build, the more accurrate the measurement;
moreover, the test scenario isn't interesting enough to be worth running
it in debug mode as this should be covered by other tests. Therefore,
just skip this test in debug mode.
Fixes: scylladb/scylladb#23862Closesscylladb/scylladb#23866
(cherry picked from commit 3d73c79a72)
Closesscylladb/scylladb#23881
The loading_cache has a periodic timer which acquires the
_timer_reads_gate. The stop() method first closes the gate and then
cancels the timer - this order is necessary because the timer is
re-armed under the gate. However, the timer callback does not check
whether the gate was closed but tries to acquire it, which might result
in unhandled exception which is logged with ERROR severity.
Fix the timer callback by acquiring access to the gate at the beginning
and gracefully returning if the gate is closed. Even though the gate
used to be entered in the middle of the callback, it does not make sense
to execute the timer's logic at all if the cache is being stopped.
Fixes: scylladb/scylladb#23951Closesscylladb/scylladb#23952
(cherry picked from commit 8ffe4b0308)
Closesscylladb/scylladb#23981
Currently, test_tablet_resize_revoked tries to trigger split revoke
by deleting some rows. This method isn't deterministic and so a test
is flaky.
Use error injection to trigger resize revoke.
Fixes: #22570.
Closesscylladb/scylladb#23966
(cherry picked from commit 1f4edd8683)
Closesscylladb/scylladb#23974
In case when dht::boot_strapper::get_boostrap_tokens fail to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving boostrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data which
implicitly expect that the bootstrap_tokens of the first node in the
cluster will not be empty.
Fix this by adding the missing break.
Fixes: scylladb/scylladb#23897
From the code inspection alone it looks like 2025.1 and 6.2 have this problem, so marking for backport to both of them.
- (cherry picked from commit 66acaa1bf8)
- (cherry picked from commit 845cedea7f)
- (cherry picked from commit 670a69007e)
Parent PR: #23914Closesscylladb/scylladb#23949
* github.com:scylladb/scylladb:
test: cluster: add test_bad_initial_token
topology coordinator: do not proceed further on invalid boostrap tokens
cdc: add sanity check for generating an empty generation
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.
Fixes: https://github.com/scylladb/scylladb/issues/22514.
Needs backport to 2025.1 and 6.2 as they contain the bug.
- (cherry picked from commit 53e0f79947)
- (cherry picked from commit e178bd7847)
Parent PR: #23787Closesscylladb/scylladb#23943
* github.com:scylladb/scylladb:
test: add test for getting tasks children
tasks: check whether a node is alive before rpc
Add missing awaits for the rebuild_repair and repair background actions.
Although the background actions hold the _async_gate
which is closed in topology_coordinator::run(),
stop() still needs to await all background action futures
and handle any errors they may have left behind.
Fixes#23755
* The issue exists since 6.2
- (cherry picked from commit d624795fda)
- (cherry picked from commit 6de79d0dd3)
- (cherry picked from commit 7a0f5e0a54)
Parent PR: #17712Closesscylladb/scylladb#23799
* github.com:scylladb/scylladb:
topology_coordinator: stop: await all background_action_holder:s
topology_coordinator: stop: improve error messages
topology_coordinator: stop: define stop_background_action helper
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.
(cherry picked from commit 53e0f79947)