mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.
This patch adds a mechanism which allows to reserve some LSA memory (by
asking the allocator to keep it unused) and make it available for allocation
right when we need to guarantee allocation success.
(cherry picked from commit 7b3f55a65f)
In the next patch, we will want to do the thing as
refill_emergency_reserve() does, just with a quantity different
than _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().
(cherry picked from commit f784be6a7e)
before this change, `reclaim_timer::report()` calls
```c++
fmt::format(", at {}", current_backtrace())
```
which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.
in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.
Fixes#18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18100
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
The log-structured allocator maintains memory reserves to so that
operations using log-strucutured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.
One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.
We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.
Ref #13930.
Closesscylladb/scylladb#15153
This new exception type inherits from std::bad_alloc and allows logalloc
code to add additional information about why the allocation failed. We
currently have 3 different throw sites for std::bad_alloc in logalloc.cc
and when investigating a coredump produced by --abort-on-lsa-bad-alloc,
it is impossible to determine, which throw-site activated last,
triggering the abort.
This patch fixes that by disambiguating the throw-sites and including it
in the error message printed, right before abort.
Refs: #15373Closesscylladb/scylladb#15503
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, `seastar_memory_segment_store_backend`
is class with virtual method, but it does not have a virtual
dtor. but we do use a unique_ptr<segment_store_backend> to
manage the lifecycle of an intance of its derived class.
to enable the compiler to call the right dtor, we should
mark the base class's dtor as virtual. this should address
following warings from Clang-17:
```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'logalloc::seastar_memory_segment_store_backend' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
delete __ptr;
^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::seastar_memory_segment_store_backend>::operator()' requested here
get_deleter()(std::move(__ptr));
^
/home/kefu/dev/scylladb/utils/logalloc.cc:812:20: note: in instantiation of member function 'std::unique_ptr<logalloc::seastar_memory_segment_store_backend>::~unique_ptr' requested here
: _backend(std::make_unique<seastar_memory_segment_store_backend>())
^
```
and
```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'logalloc::segment_store_backend' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::segment_store_backend>::operator()' requested here
get_deleter()(std::move(__ptr));
^
/home/kefu/dev/scylladb/utils/logalloc.cc:811:5: note: in instantiation of member function 'std::unique_ptr<logalloc::segment_store_backend>::~unique_ptr' requested here
contiguous_memory_segment_store()
^
```
Fixes#12872
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#12873
logalloc::tracker has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
reclaim_timer uses a coarse clock, but does not account for
the measurement error introduced by that -- it can falsely
report reclaims as stalls, even if they are shorter by a full
coarse clock tick from the requested threshold
(blocked-reactor-notify-ms).
Notably, if the stall threshold happens to be smaller or equal to coarse
clock resolution, Scylla's log gets spammed with false stall reports.
The resolution of coarse clocks in Linux is 1/CONFIG_HZ. This is
typically equal to 1 ms or 4 ms, and stall thresholds of this order
can occur in practice.
Eliminate false positives by requiring the measured reclaim duration to
be at least 1 clock tick longer than the configured threshold for it to
be considered a stall.
Fixes#10981Closes#11680
Tools want to be as little disrupting to the environment they run in as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, in fact they don't want to reserve any amount, instead consuming as much as needed on-demand.
To achieve this, tools want to use the standard allocator. To achieve this they need a seastar option to to instruct seastar to *not* configure and use the seastar allocator and they need LSA to cooperate with the standard allocator.
The former is provided by https://github.com/scylladb/seastar/pull/1211.
The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator.
Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend.
Refs: https://github.com/scylladb/scylladb/issues/9882
This is the last major code piece in scylla for making tools production ready.
Closes#11510
* github.com:scylladb/scylladb:
test/boost: add alternative variant of logalloc test
tools: use standard allocator
utils/logalloc: add use_standard_allocator_segment_pool_backend()
utils/logalloc: introduce segment store backend for standard allocator
utils/logalloc: rebase release segment-store on segment-store-backend
utils/logalloc: introduce segment_store_backend
utils/logalloc: push segment alloc/dealloc to segment_store
test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe
Creating a standard-memory-allocator backend for the segment store.
This is targeted towards tools, which want to configure LSA with a
segment store backend that is appropriate for the standard allocator
(which they want to use).
We want to be able to use this in both release and debug mode. The
former will be used by tools and the latter will be used to run the
logalloc tests with this new backend, making sure it works and doesn't
regress. For this latter, we have to allow the release and debug stores
to coexist in the same build and for the debug store to be able to
delegate to the release store when the standard allocator backend is
used.
Rebase the seastar allocator based segment store implementation on the
recently introduced segment store backend which is now abstracts away
how memory for segments is obtained.
This patch also introduces an explicit `segment_npos` to be used for
cases when a segment -> index mapping fails (segment doesn't belong to
the store). Currently the seastar allocator based store simply doesn't
handle this case, while the standard allocator based store uses 0 as the
implicit invalid index.
We want to make it possible to select the segment-store to be used for
LSA -- the seastar allocator based one or the standard allocator based
on -- at runtime. Currently this choice is made at compile time via
preprocessor switches.
The current standard memory based store is specialized for debug build,
we want something more similar to the seastar standard memory allocator
based one. So we introduce a segment store backend for the current
seastar allocator based store, which abstracts how the backing memory
for all segments is allocated/freed, while keeping the segment <-> index
mapping common. In the next patches we will rebase the current seastar
allocator based segment store on this backend and later introduce
another backend for standard allocator, targeted for release builds.
Currently the actual alloc/dealloc of memory for segments is located
outside the segment stores. We want to abstract away how segments are
allocated, so we move this logic too into the segment store. For now
this results in duplicate code in the two segment store implementations,
but this will soon be gone.
The logger is proof against allocation failures, except if
--abort-on-seastar-bad-alloc is specified. If it is, it will crash.
The reclaim stall report is likely to be called in low memory conditions
(reclaim's job is to alleviate these conditions after all), so we're
likely to crash here if we're reclaiming a very low memory condition
and have a large stall simultaneously (AND we're running in a debug
environment).
Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily.
Fixes#11549Closes#11555
We want to consolidate all the logalloc state into a single object: the
shard tracker. Replacing this global with a member in said object is
part of this effort.
These are pretend free functions, accessing globals in the background,
make them a member of the tracker instead, which everything needed
locally to compute them. Callers still have to access these stats
through the global tracker instance, but this can be changed to happen
through a local instance. Soon....
Instead, get the tracker instance from the region. This requires adding
a `region&` parameter to `with_reserve()`.
This brings us one step closer to eliminating the global tracker.
Instead of a separate global segment pool instance, make it a member of
the already global tracker. Most users are inside the tracker instance
anyway. Outside users can access the pool through the global tracker
instance.
For now this member is initialized from the global tracker instance. But
it allows the members of region impl to be detached from said global,
making a step towards removing it.
segment has some members, which simply forward the call to a
segment_pool method, via the global segment_pool instance. Remove these
and make the callers use the segment pool directly instead.
Don't open-code calling the region_impl
_listeners->moved() in region move-constructor
and move-assignment op.
The other._impl->_region might be different then &other
post region::merge so let the region_impl
decide which region* is moved from.
The new_region is also set to region_impl->_region
so need to open-code that either in the said call sites.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The other _impl is presumed to be engaged already,
so just call other.get_impl() once for both use cases.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We can't be sure that the other_impl->_region == &other
since it could be a result of a previous merge,
so don't decide for it which region to unlisten to,
let it use its current _region.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Current ~region and region::operator= open-code
region_impl::unlisten. Just call it so it will be
easier to maintain.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
First check if _impl is engaged before accessing it
to set its _region = this in the move constructor and
move assignment operator.
Add unit test for these odd orner cases.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It was added in d20fae96a2
as a precaution not to invalidate iterators while
traversing _regions. However it is not requried as no allocation
is done on this synchronous path - therefore there is no
point in preventing reclaim.
This will allow making the respective functions const
as they merely return stats and do not modify the tracker impl.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>