339 Commits

Author SHA1 Message Date
Tomasz Grabiec
dd0fc35c63 lsa: Export metrics for reclaim/evict/compact time
Currently, we only know about long reclaims from lsa-timing stall
reports. Shorter reclaims can go under the radar.

Those metrics will help to asses increase in LSA activity, which
translates to higher CPU cost of a workload.

reclaim tracks memory which goes to the standard allocator, e.g. when
entering and allocating_section or in the background reclaimer.

evict/compact count activity towrads building LSA reserve, in
allocating_section entry, or naked LSA allocation.

Closes scylladb/scylladb#27774
2026-01-19 12:08:16 +03:00
copilot-swe-agent[bot]
288d4b49e9 Skip backtrace in lsa-timing logs for preemptible reclaim
Preemptible reclaim is only done from the background reclaimer,
so backtrace is not useful. It's also normal that it takes a long time.
Skip the backtrace when reclaim is preemptible to reduce log noise.

Fixes the issue where background reclaim was printing unnecessary
backtraces in lsa-timing logs when operations took longer than the
stall detection threshold.

Closes: #27692

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2025-12-22 20:02:40 +02:00
Tomasz Grabiec
082342ecad Attach names to allocating sections for better debuggability
Large reserves in allocating_section can cause stalls. We already log
reserve increase, but we don't know which table it belongs to:

  lsa - LSA allocation failure, increasing reserve in section 0x600009f94590 to 128 segments;

Allocating sections used for updating row cache on memtable flush are
notoriously problematic. Each table has its own row_cache, so its own
allocating_section(s). If we attached table name to those sections, we
could identify which table is causing problems. In some issues we
suspected system.raft, but we can't be sure.

This patch allows naming allocating_sections for the purpose of
identifying them in such log messages. I use abstract_formatter for
this purpose to avoid the cost of formatting strings on the hot path
(e.g. index_reader). And also to avoid duplicating strings which are
already stored elsewhere.

Fixes #25799

Closes scylladb/scylladb#27470
2025-12-07 14:14:25 +02:00
Avi Kivity
f0ec9dd8f2 Merge 'utils/logalloc: enforce the max contiguous allocation size limit' from Michał Chojnowski
This series fixes the only known violation of logalloc's allocation size limits (in `chunked_managed_vector`), and then it make those limits hard.

Before the series, LSA handles overly-large allocations by forwarding them to the standard allocator. After the series, an attempt to do an overly large allocations via LSA will trigger an `on_internal_error` instead.

We do this because the allocator fallback logic turned out to have subtle and problematic accounting bugs.
We could fix them, or we can remove the mechanism altogether.
It's hard to say which choice is better. This PR arbitrarily makes the choice to remove the mechanism.
This makes the logic simpler, at the risk of escalating some allocation size bugs to crashes.

See the descriptions of individual commits for more details.

Fixes scylladb/scylladb#23850
Fixes scylladb/scylladb#23851
Fixes scylladb/scylladb#23854

I'm not sure if any of this should be backported or not.

The `chunked_managed_vector` fix could be backported, because it's a bugfix. It's an old bug, though, and we have never observed problems related to it.

The changes to `logalloc` aren't supposed to be fixing any observable problem, so a backport probably has more risk than benefit in this case.

Closes scylladb/scylladb#23944

* github.com:scylladb/scylladb:
  utils/logalloc: enforce LSA allocation size limits
  utils/lsa/chunked_managed_vector: fix the calculation of max_chunk_capacity()
2025-05-29 22:11:41 +03:00
Michał Chojnowski
cb02d47b10 utils/logalloc: enforce LSA allocation size limits
In order to guarantee a decent upper limit on fragmentation,
LSA only handles allocations smaller than 0.1 of a segment.

Allocations larger than this limit are permitted, but they are
not placed in LSA segments. Instead, they are forwarded to
the standard allocator.

We don't really have any use case for this "fallback".
As far as I can tell, it only exists for "historical"
reasons, from times where there were some data structures
which weren't fully adapted to LSA yet.

We don't the fallback to be used.
Long-lived standard allocations are undesirable.
They have higher internal fragmentation than LSA
allocations, and they can cause external fragmentation
in the standard allocator. So we want to eliminate them all.

The only reason to keep the fallback is to soften the impact
if some bug results in limit-exceeding LSA allocations happening
in production. In principle, the fallback turns a crash
(or something similarly drastic) into just a performance problem.

However, it turns out that the fallback is buggy.
Recently we had a bug which caused limit-exceeding LSA allocations
to happen.
And then it turned out that LSA reclaim doesn't deal fully correctly
with evictable non-LSA allocations, and the dirty_memory_manager
accounting for non-LSA allocations is completely wrong.
This resulted in subtle, serious, and hard to understand stability
problems in production.

Arguably the biggest problem is that the "fallback" allocations
weren't reported in any way. They were happening in some tests,
but they were silently permitted, so nobody noticed that they
should be eliminated. If we just had a rate-limited error log
that reports fallback allocations, they would have never got
into a release.

So maybe we could fix the fallback, add more tests for it,
add a warning for when it's used, and keep it.

But this PR instead opts for removing the fallback mechanism
altogether and failing fast. After the patch, if a non-conforming
allocation happens, it will trigger an `on_internal_error`.
With this, we risk a greater impact if some non-conforming allocations
happen in production, but we make the system simpler.

It's hard to say if it's a good tradeoff.
2025-05-29 13:05:08 +02:00
Michał Chojnowski
c47f438db3 logalloc: make background_reclaimer::free_memory_threshold publicly visible
Wanted by the change to the background_reclaim test in the next patch.
2025-05-06 18:59:18 +02:00
Michał Chojnowski
7f9152babc utils/lsa/chunked_managed_vector: fix the calculation of max_chunk_capacity()
`chunked_managed_vector` is a vector-like container which splits
its contents into multiple contiguous allocations if necessary,
in order to fit within LSA's max preferred contiguous allocation
limits.

Each limited-size chunk is stored in a `managed_vector`.
`managed_vector` is unaware of LSA's size limits.
It's up to the user of `managed_vector` to pick a size which
is small enough.

This happens in `chunked_managed_vector::max_chunk_capacity()`.
But the calculation is wrong, because it doesn't account for
the fact that `managed_vector` has to place some metadata
(the backreference pointer) inside the allocation.
In effect, the chunks allocated by `chunked_managed_vector`
are just a tiny bit larger than the limit, and the limit is violated.

Fix this by accounting for the metadata.

Also, before the patch `chunked_managed_vector::max_contiguous_allocation`,
repeats the definition of logalloc::max_managed_object_size.
This is begging for a bug if `logalloc::max_managed_object_size`
changes one day. Adjust it so that `chunked_managed_vector` looks
directly at `logalloc::max_managed_object_size`, as it means to.
2025-04-28 12:30:13 +02:00
Amnon Heiman
bf39a760aa utils/logalloc.cc label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_lsa_total_space_bytes
scylla_lsa_non_lsa_used_space_bytes

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
00810e6a01 treewide: include seastar/core/format.hh instead of seastar/core/print.hh
The later includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.

Previously, we include `seastar/core/print.hh` for using
`seastar::format()`. and in seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. this allows us
to include a much smaller header.

In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21574
2024-11-14 17:45:07 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Avi Kivity
656dc438ab utils: logalloc: replace boost with std 2024-10-08 12:07:14 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Michał Chojnowski
7b3f55a65f logalloc: add hold_reserve
mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.

This patch adds a mechanism which allows to reserve some LSA memory (by
asking the allocator to keep it unused) and make it available for allocation
right when we need to guarantee allocation success.
2024-07-08 16:08:27 +02:00
Michał Chojnowski
f784be6a7e logalloc: generalize refill_emergency_reserve()
In the next patch, we will want to do the thing as
refill_emergency_reserve() does, just with a quantity different
than _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().
2024-07-04 12:19:01 +02:00
Benny Halevy
e5ca65f78b test/perf: report also log_allocations/op
Currently perf-simple-query --write ignores
log allocations that happen on the memtable
apply path.

This change adds tracking and accounting
of the number of log allocation,
and reporting of thereof.

For reference, here's the output of
build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1
```
random-seed=1
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
78073.55 tps ( 59.4 allocs/op,  16.3 logallocs/op,  14.3 tasks/op,   52991 insns/op,        0 errors)
77263.59 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53282 insns/op,        0 errors)
79913.07 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53295 insns/op,        0 errors)
79554.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53284 insns/op,        0 errors)
79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)

median 79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)
median absolute deviation: 761.54
maximum: 79913.07
minimum: 77263.59
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:42:41 +03:00
Kefu Chai
fcf7ca5675 utils/logalloc: do not allocate memory in reclaim_timer::report()
before this change, `reclaim_timer::report()` calls

```c++
fmt::format(", at {}", current_backtrace())
```

which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.

in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.

Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18100
2024-04-01 11:01:52 +03:00
Kefu Chai
3d9054991b utils/logalloc: add fmt::formatter for occupancy_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Avi Kivity
ee9cc450d4 logalloc: report increases of reserves
The log-structured allocator maintains memory reserves to so that
operations using log-strucutured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.

One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.

We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.

Ref #13930.

Closes scylladb/scylladb#15153
2023-10-23 13:37:50 +02:00
Botond Dénes
a0c5dee2aa utils/logalloc: introduce logalloc::bad_alloc
This new exception type inherits from std::bad_alloc and allows logalloc
code to add additional information about why the allocation failed. We
currently have 3 different throw sites for std::bad_alloc in logalloc.cc
and when investigating a coredump produced by --abort-on-lsa-bad-alloc,
it is impossible to determine, which throw-site activated last,
triggering the abort.
This patch fixes that by disambiguating the throw-sites and including it
in the error message printed, right before abort.

Refs: #15373

Closes scylladb/scylladb#15503
2023-09-21 17:43:53 +03:00
Pavel Emelyanov
30959fc9b1 lsa, test: Extend memory footprint test with per-type total sizes
When memory footprint test is over it prints total size taken by row
cache, memtable and sstables as well as individual objects' sizes. It's
also nice to know the details on the row-cache's individual objects.
This patch extends the printing with total size of allocated object
types according to migrator_fn types.

Sample output:

    mutation footprint:
     - in cache:     11040928
     - in memtable:  9142424
     - in sstable:
       mc:   2160000
       md:   2160000
       me:   2160000
     - frozen:       540
     - canonical:    827
     - query result: 342

     sizeof(cache_entry) = 64
     sizeof(memtable_entry) = 64
     sizeof(bptree::node) = 288
     sizeof(bptree::data) = 72
     -- sizeof(decorated_key) = 32
     -- sizeof(mutation_partition) = 96
     -- -- sizeof(_static_row) = 8
     -- -- sizeof(_rows) = 24
     -- -- sizeof(_row_tombstones) = 40

     sizeof(rows_entry) = 144
     sizeof(evictable) = 24
     sizeof(deletable_row) = 72
     sizeof(row) = 16
     radix_tree::inner_node::node_sizes =  48 80 144 272 528 1040
     radix_tree::leaf_node::node_sizes =  120 216 416 816 3104
     sizeof(atomic_cell_or_collection) = 16
     btree::linear_node_size(1) = 24
     btree::inner_node_size = 216
     btree::leaf_node_size = 120
    LSA stats:
      N18compact_radix_tree4treeI13cell_and_hashjE9leaf_nodeE: 360
      N5bplus4dataIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 5040
      N5bplus4nodeIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 19296
      17partition_version: 952416
      N11intrusive_b4nodeI10rows_entryXadL_ZNS1_5_linkEEENS1_11tri_compareELm12ELm20ELNS_10key_searchE0ELNS_10with_debugE0EEE: 317472
      10rows_entry: 1429056
      12blob_storage: 254

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15434
2023-09-18 11:23:18 +02:00
Avi Kivity
2303f08eea utils: logalloc: correct asan_interface.h location
It's a system header, so it deserves angle brackets.

Closes #14036
2023-05-29 23:03:25 +03:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Kefu Chai
9520acb1a1 logalloc: mark segment_store_backend's virtual
before this change, `seastar_memory_segment_store_backend`
is class with virtual method, but it does not have a virtual
dtor. but we do use a unique_ptr<segment_store_backend> to
manage the lifecycle of an intance of its derived class.
to enable the compiler to call the right dtor, we should
mark the base class's dtor as virtual. this should address
following warings from Clang-17:

```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'logalloc::seastar_memory_segment_store_backend' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::seastar_memory_segment_store_backend>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/utils/logalloc.cc:812:20: note: in instantiation of member function 'std::unique_ptr<logalloc::seastar_memory_segment_store_backend>::~unique_ptr' requested here
        : _backend(std::make_unique<seastar_memory_segment_store_backend>())
                   ^
```
and
```
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'logalloc::segment_store_backend' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor]
        delete __ptr;
        ^
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::segment_store_backend>::operator()' requested here
          get_deleter()(std::move(__ptr));
          ^
/home/kefu/dev/scylladb/utils/logalloc.cc:811:5: note: in instantiation of member function 'std::unique_ptr<logalloc::segment_store_backend>::~unique_ptr' requested here
    contiguous_memory_segment_store()
    ^
```
Fixes #12872
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12873
2023-02-16 19:05:48 +02:00
Avi Kivity
a2d43bb851 logalloc: disambiguate types and non-type members
logalloc::tracker has some members with the same names as types from
namespace scope. gcc (rightfully) complains that this changes
the meaning of the name. Qualify the types to disambiguate.
2022-11-28 21:58:30 +02:00
Michał Chojnowski
4563cbe595 logalloc: prevent false positives in reclaim_timer
reclaim_timer uses a coarse clock, but does not account for
the measurement error introduced by that -- it can falsely
report reclaims as stalls, even if they are shorter by a full
coarse clock tick from the requested threshold
(blocked-reactor-notify-ms).

Notably, if the stall threshold happens to be smaller or equal to coarse
clock resolution, Scylla's log gets spammed with false stall reports.
The resolution of coarse clocks in Linux is 1/CONFIG_HZ. This is
typically equal to 1 ms or 4 ms, and stall thresholds of this order
can occur in practice.

Eliminate false positives by requiring the measured reclaim duration to
be at least 1 clock tick longer than the configured threshold for it to
be considered a stall.

Fixes #10981

Closes #11680
2022-10-02 13:41:40 +03:00
Avi Kivity
2cec417426 Merge 'tools: use the standard allocator' from Botond Dénes
Tools want to be as little disrupting to the environment they run in as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, in fact they don't want to reserve any amount, instead consuming as much as needed on-demand.
To achieve this, tools want to use the standard allocator. To achieve this they need a seastar option to to instruct seastar to *not* configure and use the seastar allocator and they need LSA to cooperate with the standard allocator.
The former is provided by https://github.com/scylladb/seastar/pull/1211.
The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator.
Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend.

Refs: https://github.com/scylladb/scylladb/issues/9882
This is the last major code piece in scylla for making tools production ready.

Closes #11510

* github.com:scylladb/scylladb:
  test/boost: add alternative variant of logalloc test
  tools: use standard allocator
  utils/logalloc: add use_standard_allocator_segment_pool_backend()
  utils/logalloc: introduce segment store backend for standard allocator
  utils/logalloc: rebase release segment-store on segment-store-backend
  utils/logalloc: introduce segment_store_backend
  utils/logalloc: push segment alloc/dealloc to segment_store
  test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe
2022-09-20 12:59:34 +03:00
Botond Dénes
a55903c839 utils/logalloc: add use_standard_allocator_segment_pool_backend()
Creating a standard-memory-allocator backend for the segment store.
This is targeted towards tools, which want to configure LSA with a
segment store backend that is appropriate for the standard allocator
(which they want to use).
We want to be able to use this in both release and debug mode. The
former will be used by tools and the latter will be used to run the
logalloc tests with this new backend, making sure it works and doesn't
regress. For this latter, we have to allow the release and debug stores
to coexist in the same build and for the debug store to be able to
delegate to the release store when the standard allocator backend is
used.
2022-09-16 13:02:40 +03:00
Botond Dénes
c1c74005b7 utils/logalloc: introduce segment store backend for standard allocator
To be used by tools, this store backend is compatible with the standard
allocator as it acquires the memory arena for segments via mmap().
2022-09-16 12:16:57 +03:00
Botond Dénes
d2a7ebbe66 utils/logalloc: rebase release segment-store on segment-store-backend
Rebase the seastar allocator based segment store implementation on the
recently introduced segment store backend which is now abstracts away
how memory for segments is obtained.
This patch also introduces an explicit `segment_npos` to be used for
cases when a segment -> index mapping fails (segment doesn't belong to
the store). Currently the seastar allocator based store simply doesn't
handle this case, while the standard allocator based store uses 0 as the
implicit invalid index.
2022-09-16 12:16:57 +03:00
Botond Dénes
3717f7740d utils/logalloc: introduce segment_store_backend
We want to make it possible to select the segment-store to be used for
LSA -- the seastar allocator based one or the standard allocator based
on -- at runtime. Currently this choice is made at compile time via
preprocessor switches.
The current standard memory based store is specialized for debug build,
we want something more similar to the seastar standard memory allocator
based one. So we introduce a segment store backend for the current
seastar allocator based store, which abstracts how the backing memory
for all segments is allocated/freed, while keeping the segment <-> index
mapping common. In the next patches we will rebase the current seastar
allocator based segment store on this backend and later introduce
another backend for standard allocator, targeted for release builds.
2022-09-16 12:16:57 +03:00
Botond Dénes
5ea4d7fb39 utils/logalloc: push segment alloc/dealloc to segment_store
Currently the actual alloc/dealloc of memory for segments is located
outside the segment stores. We want to abstract away how segments are
allocated, so we move this logic too into the segment store. For now
this results in duplicate code in the two segment store implementations,
but this will soon be gone.
2022-09-16 12:16:57 +03:00
Avi Kivity
d3b8c0c8a6 logalloc: don't crash while reporting reclaim stalls if --abort-on-seastar-bad-alloc is specified
The logger is proof against allocation failures, except if
--abort-on-seastar-bad-alloc is specified. If it is, it will crash.

The reclaim stall report is likely to be called in low memory conditions
(reclaim's job is to alleviate these conditions after all), so we're
likely to crash here if we're reclaiming a very low memory condition
and have a large stall simultaneously (AND we're running in a debug
environment).

Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily.

Fixes #11549

Closes #11555
2022-09-15 19:24:39 +02:00
Michał Chojnowski
c61b901828 utils: logalloc: prefer memory::free_memory() to memory::stats().free_memory
The former is a shortcut that does not involve a copy of all stats.
This saves some instructions in the hot path.

Closes #11495
2022-09-08 14:12:20 +03:00
Botond Dénes
5bc499080d utils/logalloc: remove reclaim_timer:: globals
One of them (_active_timer) is moved to shard tracker, the other is made
a simple local in reclaim_timer.
2022-08-23 10:38:58 +03:00
Botond Dénes
5f8971173e utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker
We want to consolidate all the logalloc state into a single object: the
shard tracker. Replacing this global with a member in said object is
part of this effort.
2022-08-23 10:38:58 +03:00
Botond Dénes
499b9a3a7c utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg 2022-08-23 10:38:58 +03:00
Botond Dénes
7d17d675af utils/logalloc: move global stat accessors to tracker
These are pretend free functions, accessing globals in the background,
make them a member of the tracker instead, which everything needed
locally to compute them. Callers still have to access these stats
through the global tracker instance, but this can be changed to happen
through a local instance. Soon....
2022-08-23 10:38:58 +03:00
Botond Dénes
f406151a86 utils/logalloc: allocating_section: don't use the global tracker
Instead, get the tracker instance from the region. This requires adding
a `region&` parameter to `with_reserve()`.
This brings us one step closer to eliminating the global tracker.
2022-08-23 10:38:58 +03:00
Botond Dénes
e968866fa1 utils/logalloc: pass down tracker::impl reference to segment_pool
To get rid of some usages of `shard_tracker()`.
2022-08-23 10:38:58 +03:00
Botond Dénes
3bd94e41bf utils/logalloc: move segment pool into tracker
Instead of a separate global segment pool instance, make it a member of
the already global tracker. Most users are inside the tracker instance
anyway. Outside users can access the pool through the global tracker
instance.
2022-08-23 10:38:58 +03:00
Botond Dénes
5b86dfc35a utils/logalloc: add tracker member to basic_region_impl
For now this member is initialized from the global tracker instance. But
it allows the members of region impl to be detached from said global,
making a step towards removing it.
2022-08-23 10:38:58 +03:00
Botond Dénes
f4056bd344 utils/logalloc: make segment independent of segment pool
segment has some members, which simply forward the call to a
segment_pool method, via the global segment_pool instance. Remove these
and make the callers use the segment pool directly instead.
2022-08-23 10:38:58 +03:00
Benny Halevy
1d9862dab3 logalloc: region_impl: add moved method
Don't open-code calling the region_impl
_listeners->moved() in region move-constructor
and move-assignment op.

The other._impl->_region might be different then &other
post region::merge so let the region_impl
decide which region* is moved from.

The new_region is also set to region_impl->_region
so need to open-code that either in the said call sites.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:49:49 +03:00
Benny Halevy
cd4dbb1cae logalloc: region: merge: optimize getting other impl
The other _impl is presumed to be engaged already,
so just call other.get_impl() once for both use cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:49:36 +03:00
Benny Halevy
a547cb79e8 logalloc: region: merge: call region_impl::unlisten
We can't be sure that the other_impl->_region == &other
since it could be a result of a previous merge,
so don't decide for it which region to unlisten to,
let it use its current _region.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:49:27 +03:00
Benny Halevy
003216de59 logalloc: region: call unlisten rather than open coding it
Current ~region and region::operator= open-code
region_impl::unlisten.  Just call it so it will be
easier to maintain.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:49:11 +03:00