Merge '[Backport 2025.1] logalloc_test: don't test performance in test background_reclaim' from Scylladb[bot]

The test sometimes fails in CI for performance reasons.

There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.

We have seen both failure modes in practice.

This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support it (e.g. by charging
groups thread CPU time instead of wall-clock time), and that would
require an amount of effort disproportionate to the value of the test.

So for now, to unflake the test, this patch removes the performance-testing
part. The tradeoff is a weaker test: after the patch, we only check that
background reclaim happens *eventually*.

Fixes https://github.com/scylladb/scylladb/issues/15677

Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.

- (cherry picked from commit c47f438db3)

- (cherry picked from commit 1c1741cfbc)

Parent PR: #24030

Closes scylladb/scylladb#24092

* github.com:scylladb/scylladb:
  logalloc_test: don't test performance in test `background_reclaim`
  logalloc: make background_reclaimer::free_memory_threshold publicly visible
Author: Pavel Emelyanov
Date: 2025-05-19 12:35:39 +03:00
3 changed files with 8 additions and 13 deletions


@@ -812,27 +812,20 @@ SEASTAR_THREAD_TEST_CASE(background_reclaim) {
         logalloc::shard_tracker().stop().get();
     });
-    sleep(500ms).get(); // sleep a little, to give the reclaimer a head start
     std::vector<managed_bytes> std_allocs;
     size_t std_alloc_size = 1000000; // note that managed_bytes fragments these, even in std
     for (int i = 0; i < 50; ++i) {
+        // Background reclaim is supposed to eventually ensure a certain amount of free memory.
+        while (memory::free_memory() < background_reclaim_free_memory_threshold) {
+            thread::maybe_yield();
+        }
-        auto compacted_pre = logalloc::shard_tracker().statistics().memory_compacted;
-        fmt::print("compacted {} items {} (pre)\n", compacted_pre, evictable_allocs.size());
         std_allocs.emplace_back(managed_bytes::initialized_later(), std_alloc_size);
-        auto compacted_post = logalloc::shard_tracker().statistics().memory_compacted;
-        fmt::print("compacted {} items {} (post)\n", compacted_post, evictable_allocs.size());
-        BOOST_REQUIRE_EQUAL(compacted_pre, compacted_post);
-        // Pretend to do some work. Sleeping would be too easy, as the background reclaim group would use
-        // all that time.
-        //
-        // Use thread_cputime_clock to prevent overcommitted test machines from stealing CPU time
-        // and causing test failures.
-        auto deadline = thread_cputime_clock::now() + 100ms;
-        while (thread_cputime_clock::now() < deadline) {
-            thread::maybe_yield();
-        }
     }
 }


@@ -405,7 +405,7 @@ class background_reclaimer {
     promise<>* _main_loop_wait = nullptr;
     future<> _done;
     bool _stopping = false;
-    static constexpr size_t free_memory_threshold = 60'000'000;
+    static constexpr size_t free_memory_threshold = background_reclaim_free_memory_threshold;
 private:
     bool have_work() const {
 #ifndef SEASTAR_DEFAULT_ALLOCATOR


@@ -29,6 +29,8 @@ constexpr int segment_size_shift = 17; // 128K; see #151, #152
 constexpr size_t segment_size = 1 << segment_size_shift;
 constexpr size_t max_zone_segments = 256;
+constexpr size_t background_reclaim_free_memory_threshold = 60'000'000;
 //
 // Frees some amount of objects from the region to which it's attached.
 //