The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test). After the patch,
we only check that the background reclaim happens *eventually*.
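For illustration only, the difference between a timing check and an
"eventually" check can be sketched as below. The real test is C++ on Seastar;
this Python sketch only shows the shape of the weakened assertion. The names
`wait_until` and `FakeReclaimer` are hypothetical and do not appear in the patch.

```python
import time


def wait_until(predicate, timeout_s=30.0, poll_interval_s=0.01):
    """Poll `predicate` until it returns True or a generous timeout elapses.

    The timeout is only a safety net against hanging forever; it is not a
    performance budget, so a slow or overcommitted machine does not fail.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval_s)
    # One last check after the deadline.
    return predicate()


class FakeReclaimer:
    """Hypothetical stand-in for background reclaim: frees one unit per poll."""

    def __init__(self, free, threshold):
        self.free = free
        self.threshold = threshold

    def poll(self):
        # Simulate background progress, then report whether enough is free.
        self.free += 1
        return self.free >= self.threshold


reclaimer = FakeReclaimer(free=0, threshold=5)
reached = wait_until(reclaimer.poll, timeout_s=5.0, poll_interval_s=0.001)
```

The check passes whenever the threshold is reached, no matter how long
individual steps take, which is the tradeoff described above.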
Fixes https://github.com/scylladb/scylladb/issues/15677
Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.
- (cherry picked from commit c47f438db3)
- (cherry picked from commit 1c1741cfbc)
Parent PR: #24030
Closes scylladb/scylladb#24092
* github.com:scylladb/scylladb:
logalloc_test: don't test performance in test `background_reclaim`
logalloc: make background_reclaimer::free_memory_threshold publicly visible
Currently, stream_manager is initialized after storage_service, and
so it is stopped before storage_service is. In its stop method,
storage_service accesses stream_manager, which is no longer initialized
at that point.
Move the stream_manager initialization before the storage_service initialization.
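A minimal sketch of the ordering problem, assuming services are stopped in
reverse order of initialization. The `Service` class and `run_lifecycle`
helper are hypothetical and only model the dependency described above; they
are not the actual main.cc code.

```python
class Service:
    """Hypothetical service that may use a dependency while stopping."""

    def __init__(self, name, depends_on=None):
        self.name = name
        self.depends_on = depends_on
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        # Stopping requires the dependency to still be running.
        if self.depends_on is not None and not self.depends_on.running:
            raise RuntimeError(
                f"{self.name}.stop() used stopped {self.depends_on.name}")
        self.running = False


def run_lifecycle(order):
    """Start services in the given order, then stop them in reverse order."""
    for svc in order:
        svc.start()
    for svc in reversed(order):
        svc.stop()


def make_services():
    stream_manager = Service("stream_manager")
    storage_service = Service("storage_service", depends_on=stream_manager)
    return stream_manager, storage_service


# Old order: stream_manager starts last, so it is stopped first.
sm, ss = make_services()
try:
    run_lifecycle([ss, sm])
    old_order_failed = False
except RuntimeError:
    old_order_failed = True

# New order: stream_manager starts first, so it outlives storage_service.
sm, ss = make_services()
run_lifecycle([sm, ss])
```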
Fixes: #23207.
Closes scylladb/scylladb#24008
(cherry picked from commit 9c03255fd2)
Closes scylladb/scylladb#24189
The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test).
(cherry picked from commit 1c1741cfbc)
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317
Fixes #22342
Fixes #21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060
Fixes #23699
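As a rough sketch of the two helpers (not the actual test/topology/util code),
the pattern could look like the following. `FakeCql`, its `run` method, the
`test_ks_` name prefix and the simplified signatures are assumptions made so
the example runs without a cluster. That `IF NOT EXISTS` is what makes a
retried CREATE harmless is likewise an inference from the commit titles.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager

_ks_counter = itertools.count(1)


class FakeCql:
    """Hypothetical stand-in for a CQL session; it only records statements."""

    def __init__(self):
        self.statements = []

    async def run(self, statement):
        self.statements.append(statement)


async def create_new_test_keyspace(cql, opts):
    """Create a uniquely named keyspace; the caller is responsible for the drop."""
    keyspace = f"test_ks_{next(_ks_counter)}"
    # Assumption: IF NOT EXISTS lets a retried CREATE succeed if the first one landed.
    await cql.run(f"CREATE KEYSPACE IF NOT EXISTS {keyspace} {opts}")
    return keyspace


@asynccontextmanager
async def new_test_keyspace(cql, opts):
    """Yield a fresh keyspace and drop it only if the body finished without error."""
    keyspace = await create_new_test_keyspace(cql, opts)
    yield keyspace
    # Reached only on success; on an exception the keyspace is left in place.
    await cql.run(f"DROP KEYSPACE {keyspace}")


async def demo_success():
    cql = FakeCql()
    opts = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1}"
    async with new_test_keyspace(cql, opts) as ks:
        await cql.run(f"CREATE TABLE {ks}.t (pk int PRIMARY KEY)")
    return cql.statements


async def demo_failure():
    cql = FakeCql()
    try:
        async with new_test_keyspace(cql, "WITH replication = {}") as ks:
            raise ValueError("simulated test failure")
    except ValueError:
        pass
    return cql.statements


success_statements = asyncio.run(demo_success())
failure_statements = asyncio.run(demo_failure())
```

Tests that must drop the keyspace at a specific point, or where the automatic
drop can fail, would call create_new_test_keyspace directly instead of using
the context manager.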
- (cherry picked from commit 50ce0aaf1c)
- (cherry picked from commit 5d448f721e)
- (cherry picked from commit f946302369)
- (cherry picked from commit 0fd1b846fe)
- (cherry picked from commit a66ddb7c04)
- (cherry picked from commit df84097a4b)
- (cherry picked from commit 59687c25e0)
- (cherry picked from commit fdb339bf28)
- (cherry picked from commit 205ed113dd)
- (cherry picked from commit 57faab9ffa)
- (cherry picked from commit 4fefffe335)
- (cherry picked from commit 480a5837ab)
- (cherry picked from commit fed078a38a)
- (cherry picked from commit c6653e65ba)
- (cherry picked from commit 9c095b622b)
- (cherry picked from commit 0668c642a2)
- (cherry picked from commit 0e11aad9c5)
- (cherry picked from commit ef85c4b27e)
- (cherry picked from commit b13e48b648)
- (cherry picked from commit a82e734110)
- (cherry picked from commit 629ee3cb46)
- (cherry picked from commit 42a104038d)
- (cherry picked from commit d5e3c578f5)
- (cherry picked from commit c05794c156)
- (cherry picked from commit 966cf82dae)
- (cherry picked from commit 11005b10db)
- (cherry picked from commit ff9c8428df)
- (cherry picked from commit 55b35eb21c)
- (cherry picked from commit 5759a97eb4)
- (cherry picked from commit c68d2a471c)
- (cherry picked from commit e05372afa4)
- (cherry picked from commit 380c5e5ac8)
- (cherry picked from commit 3f35491264)
- (cherry picked from commit e72a9d3faa)
- (cherry picked from commit 47326d01b7)
- (cherry picked from commit 72bc4016e7)
- (cherry picked from commit 4fd6c2d24e)
- (cherry picked from commit 50a8f5c1c0)
- (cherry picked from commit 005ceb77d3)
- (cherry picked from commit 649e68c6db)
- (cherry picked from commit 0b88ea9798)
- (cherry picked from commit 6b37d04aa9)
- (cherry picked from commit e59aca66bf)
- (cherry picked from commit 5ff3153912)
- (cherry picked from commit 20f7eda16e)
- (cherry picked from commit f30e4c6917)
- (cherry picked from commit 96d327fb83)
- (cherry picked from commit 16ef78075c)
- (cherry picked from commit 2d4af01281)
- (cherry picked from commit b810791fbb)
- (cherry picked from commit 46b1850f0c)
- (cherry picked from commit 0564e95c51)
- (cherry picked from commit 12f85ce57c)
- (cherry picked from commit 9829b1594f)
- (cherry picked from commit cbe79b20f7)
- (cherry picked from commit cc281ff88d)
Parent PR: #22399
Closes scylladb/scylladb#23408
* github.com:scylladb/scylladb:
test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
test/repair: create_table_insert_data_for_repair: create keyspace with unique name
topology_tasks/test_tablet_tasks: use new_test_keyspace
topology_tasks/test_node_ops_tasks: use new_test_keyspace
topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
topology_custom/test_view_build_status: use new_test_keyspace
topology_custom/test_truncate_with_tablets: use new_test_keyspace
topology_custom/test_topology_failure_recovery: use new_test_keyspace
topology_custom/test_tablets_removenode: use create_new_test_keyspace
topology_custom/test_tablets_migration: use new_test_keyspace
topology_custom/test_tablets_merge: use new_test_keyspace
topology_custom/test_tablets_intranode: use new_test_keyspace
topology_custom/test_tablets_cql: use new_test_keyspace
topology_custom/test_tablets2: use *new_test_keyspace
topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
test/cluster/test_tablets.py: Fix test errorneous indentation
topology_custom/test_tablets: use new_test_keyspace
topology_custom/test_table_desc_read_barrier: use new_test_keyspace
topology_custom/test_shutdown_hang: use new_test_keyspace
topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
topology_custom/test_rpc_compression: use new_test_keyspace
topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
topology_custom/test_raft_no_quorum: use new_test_keyspace
topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
topology_custom/test_query_rebounce: use new_test_keyspace
topology_custom/test_not_enough_token_owners: use new_test_keyspace
topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
topology_custom/test_node_isolation: use create_new_test_keyspace
topology_custom/test_mv_topology_change: use new_test_keyspace
topology_custom/test_mv_tablets_replace: use new_test_keyspace
topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
topology_custom/test_mv_tablets: use new_test_keyspace
topology_custom/test_mv_read_concurrency: use new_test_keyspace
topology_custom/test_mv_fail_building: use new_test_keyspace
topology_custom/test_mv_delete_partitions: use new_test_keyspace
topology_custom/test_mv_building: use new_test_keyspace
topology_custom/test_mv_backlog: use new_test_keyspace
topology_custom/test_mv_admission_control: use new_test_keyspace
topology_custom/test_major_compaction: use new_test_keyspace
topology_custom/test_maintenance_mode: use new_test_keyspace
topology_custom/test_lwt_semaphore: use new_test_keyspace
topology_custom/test_ip_mappings: use new_test_keyspace
topology_custom/test_hints: use new_test_keyspace
topology_custom/test_group0_schema_versioning: use new_test_keyspace
topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
topology_custom/test_read_repair: use new_test_keyspace
topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
test/topology/util: new_test_keyspace: drop keyspace only on success
test/topology/util: refactor new_test_keyspace
test/topology/util: CREATE KEYSPACE IF NOT EXISTS
test/topology/util: new_test_keyspace: accept ManagerClient
And use create_new_test_keyspace when the drop needs
to be explicit.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e59aca66bf)
Some of the statements in the test are not indented properly
and, as a result, are never run. It's most likely a small mistake,
so let's fix it.
Closes scylladb/scylladb#23659
(cherry picked from commit 0ed21d9cc1)
Using the new_test_keyspace fixture is awkward for this test
as it is written to explicitly drop the created keyspaces
at certain points.
Therefore, just use create_new_test_keyspace to standardize the
creation procedure.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e72a9d3faa)
new_test_keyspace is problematic here, since
the presence of the banned node can make the automatic drop of
the test keyspace fail with NoHostAvailable (in debug mode, for
some reason).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 55b35eb21c)