Merge '[Backport 2025.1] logalloc_test: don't test performance in test background_reclaim' from Scylladb[bot]

The test sometimes fails in CI for performance reasons.

There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.

We have seen both failure modes in practice.

This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support it (e.g. by charging
groups thread CPU time instead of wall-clock time), and that would
require an amount of effort disproportionate to the value of the test.

So for now, to unflake the test, this patch removes the performance-testing
part. The tradeoff is a weaker test: after the patch, we only check that
background reclaim happens *eventually*.

Fixes https://github.com/scylladb/scylladb/issues/15677

Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.

- (cherry picked from commit c47f438db3)

- (cherry picked from commit 1c1741cfbc)

Parent PR: #24030

Closes scylladb/scylladb#24092

* github.com:scylladb/scylladb:
  logalloc_test: don't test performance in test `background_reclaim`
  logalloc: make background_reclaimer::free_memory_threshold publicly visible
Author: Pavel Emelyanov
Date: 2025-05-19 12:35:39 +03:00
3 changed files with 8 additions and 13 deletions


@@ -812,27 +812,20 @@ SEASTAR_THREAD_TEST_CASE(background_reclaim) {
         logalloc::shard_tracker().stop().get();
     });
-    sleep(500ms).get(); // sleep a little, to give the reclaimer a head start
     std::vector<managed_bytes> std_allocs;
     size_t std_alloc_size = 1000000; // note that managed_bytes fragments these, even in std
     for (int i = 0; i < 50; ++i) {
+        // Background reclaim is supposed to eventually ensure a certain amount of free memory.
+        while (memory::free_memory() < background_reclaim_free_memory_threshold) {
+            thread::maybe_yield();
+        }
-        auto compacted_pre = logalloc::shard_tracker().statistics().memory_compacted;
-        fmt::print("compacted {} items {} (pre)\n", compacted_pre, evictable_allocs.size());
         std_allocs.emplace_back(managed_bytes::initialized_later(), std_alloc_size);
-        auto compacted_post = logalloc::shard_tracker().statistics().memory_compacted;
-        fmt::print("compacted {} items {} (post)\n", compacted_post, evictable_allocs.size());
-        BOOST_REQUIRE_EQUAL(compacted_pre, compacted_post);
-        // Pretend to do some work. Sleeping would be too easy, as the background reclaim group would use
-        // all that time.
-        //
-        // Use thread_cputime_clock to prevent overcommitted test machines from stealing CPU time
-        // and causing test failures.
-        auto deadline = thread_cputime_clock::now() + 100ms;
-        while (thread_cputime_clock::now() < deadline) {
-            thread::maybe_yield();
-        }
     }
 }


@@ -405,7 +405,7 @@ class background_reclaimer {
     promise<>* _main_loop_wait = nullptr;
     future<> _done;
     bool _stopping = false;
-    static constexpr size_t free_memory_threshold = 60'000'000;
+    static constexpr size_t free_memory_threshold = background_reclaim_free_memory_threshold;
 private:
     bool have_work() const {
 #ifndef SEASTAR_DEFAULT_ALLOCATOR


@@ -29,6 +29,8 @@ constexpr int segment_size_shift = 17; // 128K; see #151, #152
 constexpr size_t segment_size = 1 << segment_size_shift;
 constexpr size_t max_zone_segments = 256;
+constexpr size_t background_reclaim_free_memory_threshold = 60'000'000;
 //
 // Frees some amount of objects from the region to which it's attached.
 //