Files
scylladb/utils
Avi Kivity 6b415cfd4b Merge 'managed_bytes: in the copy constructor, respect the target preferred allocation size' from Michał Chojnowski
Commit 14bf09f447 added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer.

But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too.

But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes.

In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size. (And therefore is placed by LSA in the standard allocator).

In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments.

Consequences of the bug:

1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2.

2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice though).

3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory.

   There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew.

   But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments.

   If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation.

Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781

This is a regression fix, should be backported to all affected releases.

Closes scylladb/scylladb#23782

* github.com:scylladb/scylladb:
  managed_bytes_test: add a reproducer for #23781
  managed_bytes: in the copy constructor, respect the target preferred allocation size
2025-04-17 21:14:10 +03:00
..
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2024-12-23 23:37:02 +01:00
2024-12-23 23:37:02 +01:00
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2025-04-01 00:07:28 +02:00
2025-04-01 00:07:28 +02:00
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2025-04-01 00:07:28 +02:00
2025-01-26 15:54:06 +02:00
2025-01-14 07:56:39 -05:00
2025-01-14 07:56:39 -05:00
2025-02-25 10:32:32 +03:00
2025-01-14 07:56:39 -05:00