Commit Graph

264 Commits

Author SHA1 Message Date
Avi Kivity
5b541bed72 logalloc: drop region_impl public accessors
With the region heap handle removed from logalloc::region, there is
nothing remaining there that needs violation of the abstraction
boundary, so we can drop these hacks.
2022-07-26 11:12:10 +03:00
Avi Kivity
2cb5f79e9d logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc
The region_group mechanism used an intrusive heap handle embedded in
logalloc::region to allow region_group:s to track the largest region. But
with region_group moved out of logalloc, the handle is out of place.

Move it out, introducing a new intermediate class size_tracked_region
to hold the heap handle. We might eventually merge the new class into
memtable (which derives from it), but that requires a large rearrangement
of unit tests, so defer that.
2022-07-26 11:12:10 +03:00
Avi Kivity
ee720fa23b logalloc: relax lifetime rules around region_listener
Currently, a region_listener is added during construction and removed
during destruction. This was done to mimick the old region(region_group&)
constructor, as region_listener replaces region_group.

However, this makes moving the binomial heap handle outside logalloc
difficult. The natural place for the handle is in a derived class
of logalloc::region (e.g. memtable), but members of this derived class
will be destroyed earlier than the logalloc::region here. We could play
trickes with an earlier base class but it's better to just decouple
region lifecycle from listener lifecycle.

Do that be adding listen()/unlisten() methods. Some small awkwardness
remains in that merge() implicitly unlistens (see comment in
region::unlisten).

Unit tests are adjusted.
2022-07-26 11:12:10 +03:00
Avi Kivity
fbe8ea7727 logalloc, dirty_memory_manager: move region_group and associated code
region_group is an abstraction that allows accounting for groups of
regions, but the cost/benefit ratio of maintaining the abstraction
is poor. Each time we need to change decision algorithm of memtable
flushing (admittedly rarely), we need to distill that into an abstraction
for region_groups and then use it. An example is virtual regions groups;
we wanted to account for the partially flushed memtables and had to
invent region groups to stand in their place.

Rather than continuing to invest in the abstraction, break it now
and move it to the memtable dirty memory manager which is responsible
for making those decisions. The relevant code is moved to
dirty_memory_manager.hh and dirty_memory_manager.cc (new file), and
a new unit test file is added as well.

A downside of the change is that unit testing will be more difficult.
2022-07-26 11:12:10 +03:00
Avi Kivity
bffee2540f logalloc: expose tracker_reclaimer_lock
tracker_reclaimer_lock is used by region_group, which is being moved
out of logalloc, so expose it.
2022-07-26 11:12:10 +03:00
Avi Kivity
4ba0658670 logalloc: reimplement tracker_reclaim_lock to avoid using hidden classes
Right now tracker_reclaim_lock uses tracker::impl::reclaiming_lock,
which won't be visible if we want to expose tracker_reclaim_lock and
use it from another translation unit. However, it's simple to switch
to an implementation that doesn't require an unknown-size data member,
and instead increment a counter via a pointer, so do that.
2022-07-26 11:12:10 +03:00
Avi Kivity
652ab6f4a2 logalloc: reduce friendship between region and region_group
- add conversions between region and region_impl
 - add accessor for the binomial heap handle
 - add accessor for region_impl::id()
 - remove friend declarations

This helps in moving region_group to a different source file, where
the definitions of region_impl will not be visible.
2022-07-26 11:12:10 +03:00
Avi Kivity
c91ee9d04e logalloc: decouple region_group from region
As a first step in moving region_group away from logalloc, decouple
communications between region and region_group. We introduce region_listener,
that listens for the events that region passed directly to region_group.
A region_group now installs a region_listener in a region, instead of
having region know about the region_group directly.

This decoupling is still leaky:
 - merge() chooses to forget the merged-from region's region_listener.
  This happens to be suitable for the only user of merge().
 - We're still embedding the binomial heap handle, used by region_group
   to keep track of region sizes, in regions. A complete decoupling would
   transfer that responsibility to region_group.
2022-07-26 11:12:03 +03:00
Michael Livshin
ca21ce8e6f utils: logalloc: fix indentation
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
bcb7404a0e utils: logalloc: split the reclaim_timer in compact_and_evict_locked()
(Into one for the compact part and one for the evict part)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
007d8fb5c9 utils: logalloc: report segment stats if reclaim_segments() times out
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
1d700442ae utils: logalloc: reclaim_timer: add optional extra log callback
The idea is to let the caller add arbitrary extra info to the timeout
report.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
abd7b9f01c utils: logalloc: reclaim_timer: report non-decreasing durations
The hope is that this reduces logspam without losing utility.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
07fdcb268e utils: logalloc: have reclaim_timer print reserve limits
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
256b911fbd utils: logalloc: move reclaim timer destructor for more readability
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
c15e384507 utils: logalloc: define a proper bundle type for reclaim_timer stats
And define/use arithmetics on it.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
0eefbfa3cc utils: logalloc: add arithmetic operations to segment_pool::stats
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Michael Livshin
3fced65542 utils: logalloc: have reclaim timers detect being nested
Make sure that inner timers don't waste CPU measuring anything.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
76ca93b779 utils: logalloc: add more reclaim_timers
Measure stalls at higher resolution.

Refs #6189

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
42db63d012 utils: logalloc: move reclaim_timer to compact_and_evict_locked
track compact_and_evict_locked timing from
all call paths, not only from compact_and_evict.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
fd2b4a4b7d utils: logalloc: pull reclaim_timer definition forward
So it can be used in functions defined earlier in the source file
in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
33785d261e utils: logalloc: reclaim_timer make tracker optional
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
acd82d3b25 utils: logalloc: reclaim_timer: print backtrace if stall detected
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
239992f16c utils: logalloc: reclaim_timer: get call site name
Before adding even more call sites, print the call site
name in the report.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
c4d64c3bf7 utils: logalloc: reclaim_timer: rename set_result
Rename set_result to set_memory_released
to make it clearer what the result means.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
5ce0038e6a utils: logalloc: reclaim_timer: rename _reserve_segments member
Rename reclaim_timer::_reserve_segments to _segments_to_release
as it is clearer and more suitable for later patches
that will add reclaim_timers in more functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Benny Halevy
c34d1a7705 utils: logalloc: reclaim_timer round up microseconds
better report 29000 us than 28999 us.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-14 19:40:09 +03:00
Avi Kivity
528ab5a502 treewide: change metric calls from make_derive to make_counter
make_derive was recently deprecated in favor of make_counter, so
make the change throughput the codebase.

Closes #10564
2022-05-14 12:53:55 +02:00
Botond Dénes
f8da0a8d1e Merge "Conceptualize some static assertions" From Pavel Emelyanov
"
Some templates put constraints onto the involved types with the help of
static assertions. Having them in form of concepts is much better.

tests: unit(dev)
"

* 'br-static-assert-to-concept' of https://github.com/xemul/scylla:
  sstables: Remove excessive type-match assertions
  mutation_reader: Sanitize invocable asserion and concept
  code: Convert is_future result_of assertions into invoke_result concept
  code: Convert is_same+result_of assertions into invocable concepts
  code: Convert nothrow construction assertions into concepts
  code: Convert is_integral assertions to concepts
2022-02-28 13:58:01 +02:00
Tomasz Grabiec
1d75a8c843 lsa: Abort when trying to free a standard allocator object not
allocated through the region

It indicates alloc-dealloc mismatch, and can cause other problems in
the systems like unable to reclaim memory. We want to catch this at
the deallocation site to be able to quickly indentify the offender.

Misbehavior of this sort can cause fake OOMs due to underflow of
_non_lsa_memory_in_use. When it underflows enough,
shard_segment_pool.total_memory() will become 0 and memory reclamation
will stop doing anything.

Refs #10056
2022-02-25 01:42:15 +01:00
Tomasz Grabiec
9dd4153c16 lsa: Abort when _non_lsa_memory_in_use goes negative
It indicates alloc-dealloc mismatch, and can cause other problems in
the systems like unable to reclaim memory. Catch early.

Refs #10056
2022-02-25 01:42:15 +01:00
Pavel Emelyanov
645896335d code: Convert is_same+result_of assertions into invocable concepts
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-24 19:46:10 +03:00
Tomasz Grabiec
7ee79fa770 logalloc: Add more logging
Message-Id: <20220127232009.314402-1-tgrabiec@scylladb.com>
2022-01-28 14:12:33 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
4118f2d8be treewide: replace deprecated seastar::later() with seastar::yield()
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().

This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.

Tests: unit (dev)

Closes #9904
2022-01-12 12:19:19 +01:00
Tomasz Grabiec
7038dc7003 lsa: Fix segment leak on memory reclamation during alloc_buf
alloc_buf() calls new_buf_active() when there is no active segment to
allocate a new active segment. new_buf_active() allocates memory
(e.g. a new segment) so may cause memory reclamation, which may cause
segment compaction, which may call alloc_buf() and re-enter
new_buf_active(). The first call to new_buf_active() would then
override _buf_active and cause the segment allocated during segment
compaction to be leaked.

This then causes abort when objects from the leaked segment are freed
because the segment is expected to be present in _closed_segments, but
isn't. boost::intrusive::list::erase() will fail on assertion that the
object being erased is linked.

Introduced in b5ca0eb2a2.

Fixes #9821
Fixes #9192
Fixes #9825
Fixes #9544
Fixes #9508
Refs #9573

Message-Id: <20211229201443.119812-1-tgrabiec@scylladb.com>
2021-12-30 11:02:08 +02:00
Avi Kivity
f907205b92 utils: logalloc: correct and adjust timing unit in stall report
The stall report uses the millisecond unit, but actually reports
nanoseconds.

Switch to microseconds (milliseconds are a bit too coarse) and
use the safer "duration / 1us" style rather than "duration::count()"
that leads to unit confusion.

Fixes #9733.

Closes #9734
2021-12-06 09:51:57 +02:00
Tomasz Grabiec
bf6898a5a0 lsa: Add sanity checks around lsa_buffer operations
We've been observing hard to explain crashes recently around
lsa_buffer destruction, where the containing segment is absent in
_segment_descs which causes log_heap::adjust_up to abort. Add more
checks to catch certain impossible senarios which can lead to this
sooner.

Refs #9192.
Message-Id: <20211116122346.814437-1-tgrabiec@scylladb.com>
2021-11-16 14:25:02 +02:00
Tomasz Grabiec
4d627affc3 lsa: Mark compact_segment_locked() as noexcept
We cannot recover from a failure in this method. The implementation
makes sure it never happens. Invariants will be broken if this
throws. Detect violations early by marking as noexcept.

We could make it exception safe and try to leave the data structures
in a consistent state but the reclaimer cannot make progress if this throws, so
it's pointless.

Refs #9192
Message-Id: <20211116122019.813418-1-tgrabiec@scylladb.com>
2021-11-16 14:23:10 +02:00
Avi Kivity
c30be50252 utils: logalloc: convert debug printf to fmt::print()
Standardize on one format language.
2021-10-28 10:48:08 +03:00
Avi Kivity
369afe3124 treewide: use coroutine::maybe_yield() instead of co_await make_ready_future()
The dedicated API shows the intent, and may be a tiny bit faster.

Closes #9382
2021-09-23 12:28:56 +02:00
Michael Livshin
0eb2eb1b44 rename coarse_clock to coarse_steady_clock
Also add a comment to explain why it exists.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #9123
2021-08-02 17:41:21 +03:00
Michael Livshin
71d721a97e logalloc: add on-stall memory reclaim diagnostics
Reuse the existing `reclaim_timer` for stall detection.

* Since a timer is now set around every reclaim and compaction, use a
  coarse one for speed.
* Set log level according to conditions (stalls deserve a warning).
* Add compaction/migration/eviction/allocation stats.

Refs #4186.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 21:51:08 +03:00
Michael Livshin
20c760e638 logalloc: split tracker::impl::reclaim into reclaim & reclaim_locked
Similarly to compact_and_evict().

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:34:13 +03:00
Michael Livshin
a96aed3973 logalloc: metrics: remove unneeded captures and a pleonasm
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:34:13 +03:00
Michael Livshin
aa6c8ef582 logalloc: add metrics for evicted and freed memory
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:34:13 +03:00
Michael Livshin
a6283b322b logalloc: count evicted memory
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:34:13 +03:00
Michael Livshin
4bcd91a09a logalloc: count freed memory
(On the individual free() request level, i.e. similarly to allocs)

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-01 19:34:13 +03:00
Tomasz Grabiec
dcd05f77b1 lsa: Avoid excessive eviction if region is not compactible
Introduced in d72b91053b.

If region was not compactible, for example because it has dense
segments, we would keep evicting even though the target for reclaimed
segments was met. In the worst case we may have to evict whole cache.

Refs #9038 (unlikely to be the cause though)
Message-Id: <20210720104039.463662-1-tgrabiec@scylladb.com>
2021-07-20 14:36:14 +03:00
Tomasz Grabiec
50ec3ea295 lsa: Fix misaccunting of used space when allocating lsa_buffers
lsa_buffer allocations are aligned to 4K. If smaller size is
requested, whole 4K is used. However, only requested size was used in
accounting segment occupancy. This can confuse reclaimer which may
think the segment is sparse while it is actually dense, and compacting
it will yield no or little gain. This can cause inefficient memory
reclamation or lack of progress.

Refs #9038
Message-Id: <20210720104110.463812-1-tgrabiec@scylladb.com>
2021-07-20 14:08:06 +03:00