Commit Graph

364 Commits

Author SHA1 Message Date
Calle Wilund
e3153dd5b0 Commitlog replayer: Range-check skip call
Fixes #15269

If the segment being replayed is corrupted/truncated we can attempt skipping
completely bogus byte amounts, which can cause an assert (i.e. crash) in
file_data_source_impl. This is not a crash-level error, so ensure we
range-check the distance in the reader.

v2: Add to corrupt_size if trying to skip more than available. The amount added is "wrong", but will at least
    ensure we log the fact that things are broken

Closes scylladb/scylladb#15270

(cherry picked from commit 6ffb482bf3)
2024-01-05 09:19:28 +02:00
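The range check described in the commit above can be sketched as follows; `replay_reader`, its fields, and the clamping policy are hypothetical stand-ins for the real file_data_source_impl plumbing:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the replay-side reader: clamp a requested
// skip to the bytes actually available, and account the excess as
// corrupt so the breakage is at least visible in logs/metrics.
struct replay_reader {
    uint64_t remaining;            // bytes left in the segment
    uint64_t corrupt_size = 0;     // accumulated "broken" bytes

    // Returns the number of bytes actually skipped.
    uint64_t skip(uint64_t distance) {
        if (distance > remaining) {
            // The amount added is "wrong", but ensures we log that
            // things are broken instead of tripping an assert.
            corrupt_size += distance - remaining;
            distance = remaining;
        }
        remaining -= distance;
        return distance;
    }
};
```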
Calle Wilund
560d3c17f0 commitlog: Add keeping track of approximate lowest GC clock for CF entries
Adds a lowest timestamp of GC clock whenever a CF is added to a CL segment
first. Because GC clock is wall clock time and only connected to TTL (not
cell/row timestamps), this gives a fairly accurate view of GC low bounds
per segment.

Includes of course a function to get the all-segment lowest per CF.
2023-10-17 10:26:41 +00:00
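A minimal sketch of the per-segment bookkeeping described above, with hypothetical `segment_gc_bounds`/`lowest_across` names: only the first add of a CF to a segment records the GC clock (wall-clock time, so the first entry is also the lowest), and the all-segment lower bound is the minimum over segments containing the CF:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using cf_id = uint64_t;
using gc_clock_time = int64_t;

// Per-segment map: GC clock value at the time a CF first hit this segment.
struct segment_gc_bounds {
    std::unordered_map<cf_id, gc_clock_time> lowest;

    void add(cf_id id, gc_clock_time now) {
        // Only the first add per CF counts; GC clock is wall time, so
        // the first entry is also the lowest for this segment.
        lowest.try_emplace(id, now);
    }
};

// The "all-segment lowest per CF" query mentioned in the commit.
inline std::optional<gc_clock_time>
lowest_across(const std::vector<segment_gc_bounds>& segs, cf_id id) {
    std::optional<gc_clock_time> res;
    for (const auto& s : segs) {
        if (auto it = s.lowest.find(id); it != s.lowest.end()) {
            if (!res || it->second < *res) {
                res = it->second;
            }
        }
    }
    return res;
}
```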
Calle Wilund
810d06946f commitlog: Add helper to force new active segment
When called, if active segment holds data, close and replace with pristine one.
2023-10-17 10:26:40 +00:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to the updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairy -- it needs to use the task queues
list for IO class names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Pavel Emelyanov
5aea6938ae commitlog: Introduce and use commitlog sched group
Nowadays all commitlog code runs in whatever sched group it's kicked
from. Since IO prio classes are going to be inherited from the current
sched group, the commitlog IO loops should be moved into the commitlog
sched group rather than inheriting a "random" one.

There are currently two places that need correct context for IO -- the
.cycle() method and segments replenisher.

`$ perf-simple-query --write -c2` results

--- Before the patch ---
194898.36 tps ( 56.3 allocs/op,  12.7 tasks/op,   54307 insns/op,        0 errors)
199286.23 tps ( 56.2 allocs/op,  12.7 tasks/op,   54375 insns/op,        0 errors)
199815.84 tps ( 56.2 allocs/op,  12.7 tasks/op,   54377 insns/op,        0 errors)
198260.98 tps ( 56.3 allocs/op,  12.7 tasks/op,   54380 insns/op,        0 errors)
198572.86 tps ( 56.2 allocs/op,  12.7 tasks/op,   54371 insns/op,        0 errors)

median 198572.86 tps ( 56.2 allocs/op,  12.7 tasks/op,   54371 insns/op,        0 errors)
median absolute deviation: 713.36
maximum: 199815.84
minimum: 194898.36

--- After the patch ---
194751.80 tps ( 56.3 allocs/op,  12.7 tasks/op,   54331 insns/op,        0 errors)
199084.70 tps ( 56.2 allocs/op,  12.7 tasks/op,   54389 insns/op,        0 errors)
195551.47 tps ( 56.3 allocs/op,  12.7 tasks/op,   54385 insns/op,        0 errors)
197953.47 tps ( 56.3 allocs/op,  12.7 tasks/op,   54386 insns/op,        0 errors)
198710.00 tps ( 56.3 allocs/op,  12.7 tasks/op,   54387 insns/op,        0 errors)

median 197953.47 tps ( 56.3 allocs/op,  12.7 tasks/op,   54386 insns/op,        0 errors)
median absolute deviation: 1131.24
maximum: 199084.70
minimum: 194751.80

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14005
2023-05-23 21:25:57 +03:00
Botond Dénes
52e66e38e7 db/commitlog: s/std::regex/boost::regex/
The former is prone to producing stack overflows, as it uses recursion
in its match implementation.

The migration is entirely mechanical.
2023-04-06 09:51:24 -04:00
Kefu Chai
94c6df0a08 treewide: use fmtlib when printing UUID
this change tries to reduce the number of callers using operator<<()
for printing UUID. they are found by compiling the tree after commenting
out `operator<<(std::ostream& out, const UUID& uuid)`. but this change
alone is not enough to drop all callers, as some callers are using
`operator<<(ostream&, const unordered_map&)` and other overloads to
print ranges whose elements contain UUID. so in order to limit the
scope of the change, we are not changing them here.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 15:38:45 +08:00
Botond Dénes
e70be47276 Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund
Fixes #12810

We did not update total_size_on_disk in commitlog totals when use of o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.

Closes #12950

* github.com:scylladb/scylladb:
  commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
  commitlog: change type of stored size
2023-03-02 12:39:11 +02:00
Calle Wilund
97881091d3 commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off
Fixes #12810

We did not update total_size_on_disk in commitlog totals when use of o_dsync was off.
This means we essentially ran with no registered footprint, also causing broken
comparisons in delete_segments.
2023-02-21 16:35:23 +00:00
Calle Wilund
64102780fe commitlog: Use static (reused) regex for (left over) descriptor parse
Refs #11710

Allows reusing regex for segment matching (for opening left-over segments after crash).
Should remove any stalls caused by commitlog replay preparation.

v2: Add unit test for descriptor parsing

Closes #12112
2023-02-21 18:34:04 +02:00
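The static-regex idea can be sketched like this; the descriptor file-name pattern and the `parse_segment_id` helper are assumptions for illustration, not the actual Scylla parser:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// Illustration only: the file-name pattern and helper name are assumed.
// The point is the `static` regex: compiled once, then reused for every
// left-over segment scanned at startup, avoiding repeated-compile stalls.
inline std::optional<uint64_t> parse_segment_id(const std::string& name) {
    static const std::regex descriptor_re(R"(CommitLog-(\d+)-(\d+)\.log)");
    std::smatch m;
    if (!std::regex_match(name, m, descriptor_re)) {
        return std::nullopt;   // not a segment descriptor
    }
    return std::stoull(m[2].str());
}
```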
Calle Wilund
6f972ee68b commitlog: change type of stored size
known_size() is technically not a size_t.
2023-02-21 15:26:02 +00:00
Kefu Chai
afd1221b53 commitlog: mark request_controller_timeout_exception_factory::timeout() noexcept
request_controller_timeout_exception_factory::timeout() creates an
instance of `request_controller_timed_out_error` whose ctor is
default-created by compiler from that of timed_out_error, which is
in turn default-created from that of `std::exception`. and
`std::exception::exception` does not throw, so it's safe to
mark this factory method `noexcept`.

with this specifier, we don't need to worry about exceptions thrown
by it, and don't need to handle them if any in `seastar::semaphore`,
where `timeout()` is called for the customized exception.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12759
2023-02-07 14:38:54 +02:00
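A self-contained model of the reasoning above (simplified class chain, not the real Seastar types): each constructor is compiler-generated from its base, bottoming out in `std::exception`'s non-throwing constructor, so the factory can be marked `noexcept` and the compiler will verify it:

```cpp
#include <cassert>
#include <cstring>
#include <exception>

// Simplified sketch of the chain described in the commit message.
struct timed_out_error : public std::exception {
    const char* what() const noexcept override { return "timed out"; }
};

// ctor default-created by the compiler from that of timed_out_error,
// which is in turn default-created from std::exception's.
struct request_controller_timed_out_error : public timed_out_error {};

struct request_controller_timeout_exception_factory {
    static request_controller_timed_out_error timeout() noexcept {
        return request_controller_timed_out_error();
    }
};
```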
Michał Chojnowski
fa7e904cd6 commitlog: fix total_size_on_disk accounting after segment file removal
Currently, segment file removal first calls `f.remove_file()` and
does `total_size_on_disk -= f.known_size()` later.
However, `remove_file()` resets `known_size` to 0, so in effect
the freed space is not accounted for.

`total_size_on_disk` is not just a metric. It is also responsible
for deciding whether a segment should be recycled -- it is recycled
only if `total_size_on_disk - known_size < max_disk_size`.
Therefore this bug has dire performance consequences:
if `total_size_on_disk - known_size` ever exceeds `max_disk_size`,
the recycling of commitlog segments will stop permanently, because
`total_size_on_disk - known_size` will never go back below
`max_disk_size` due to the accounting bug. All new segments from this
point will be allocated from scratch.

The bug was uncovered by a QA performance test. It isn't easy to trigger --
it took the test 7 hours of constant high load to step into it.
However, the fact that the effect is permanent, and degrades the
performance of the cluster silently, makes the bug potentially quite severe.

The bug can be easily spotted with Prometheus as infinitely rising
`commitlog_total_size_on_disk` on the affected shards.

Fixes #12645

Closes #12646
2023-01-30 12:20:04 +02:00
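The ordering bug is easy to model: because `remove_file()` resets `known_size` to 0, subtracting afterwards frees nothing. A minimal sketch, using a hypothetical `named_file` stand-in rather than the real class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: remove_file() resets the tracked size.
struct named_file {
    uint64_t size = 0;
    uint64_t known_size() const { return size; }
    void remove_file() { size = 0; }   // deletion resets known_size
};

// Buggy order: the freed space is never accounted for.
inline void delete_file_buggy(named_file& f, uint64_t& total_size_on_disk) {
    f.remove_file();
    total_size_on_disk -= f.known_size();  // known_size() is already 0 here
}

// Fixed order: subtract first, then remove.
inline void delete_file_fixed(named_file& f, uint64_t& total_size_on_disk) {
    total_size_on_disk -= f.known_size();
    f.remove_file();
}
```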
Michał Chojnowski
b52bd9ef6a db: commitlog: remove unused max_active_writes()
Dead and misleading code.

Closes #12327
2022-12-16 10:23:03 +02:00
Avi Kivity
19e62d4704 commitlog: delete unused "num_deleted" variable
Since d478896d46 we update the variable, but never read it.
Clang 15 notices and complains. Remove the variable to make it
happy.

Closes #11765
2022-10-13 15:11:32 +02:00
Michał Chojnowski
9b6fc553b4 db: commitlog: don't print INFO logs on shutdown
The intention was for these logs to be printed during the
database shutdown sequence, but it was overlooked that it's not
the only place where commitlog::shutdown is called.
Commitlogs are started and shut down periodically by hinted handoff.
When that happens, these messages spam the log.

Fix that by adding INFO commitlog shutdown logs to database::stop,
and change the level of the commitlog::shutdown log call to DEBUG.

Fixes #11508

Closes #11536
2022-09-14 11:30:53 +03:00
Calle Wilund
a729c2438e commitlog: Make get_segments_to_replay on-demand
Refs #11237

Don't store segments found on init scan in all shard instances,
instead retrieve (based on low time-pos for current gen) when
required. This changes very little, but we at least don't store
pointless string lists in shards 1 to X, and also we can potentially
ask for the list twice. More to the point, this goes better hand-in-hand
with the semantics of "delete_segments", where any file sent in is
considered candidate for recycling, and included in footprint.
2022-08-11 06:41:23 +00:00
Calle Wilund
8116c56807 commitlog: Revert/modify fac2bc4 - do footprint add in delete
Fixes #11184
Fixes #11237

In the previous (broken) fix for #11184 we added the footprint for left-over
files (replay candidates) to the disk footprint on commitlog init.
This effectively prevents us from creating segments iff we have tight
limits. Since we nowadays do quite a bit of inserts _before_ commitlog
replay (system.local, but...) we can end up in a situation where we
deadlock at start because we cannot get to the actual replay that will
eventually free things.
Another, not thought through, consequence is that we add a single
footprint to _all_ commitlog shard instances - even though only
shard 0 will get to actually replay + delete (i.e. drop footprint).
So shards 1-X would all be either locked out or performance degraded.

Simplest fix is to add the footprint in delete call instead. This will
lock out segment creation until delete call is done, but this is fast.
Also ensures that only replay shard is involved.
2022-08-10 08:04:03 +00:00
Calle Wilund
fac2bc41ba commitlog: Include "segments_to_replay" in initial footprint
Fixes #11184

Not including it here can cause our estimate of "delete or not" after replay
to be skewed in favour of retaining segments as (new) recycles (or even flip
a counter), and if we have repeated crash+restarts we could be accumulating
an effectively ever-increasing segment footprint.

Closes #11205
2022-08-05 12:16:53 +03:00
Benny Halevy
5991482049 commitlog: make discard_completed_segments and friends noexcept
To simplify table::seal_active_memtable error handling
and retry logic.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 13:43:17 +03:00
Benny Halevy
acae3cc223 treewide: stop use of deprecated coroutine::make_exception
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.

In cases this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10972
2022-07-07 15:02:16 +03:00
Avi Kivity
33fe28b0c5 Merge 'commitlog allocation/deletion/flush request rate counters + footprint projection' from Calle Wilund
Adds measuring the apparent delta vector of footprint added/removed within
the timer time slice, and potentially include this (if influx is greater
than data removed) in threshold calculation. The idea is to anticipate
crossing usage threshold within a time slice, so request a flush slightly
earlier, hoping this will give all involved more time to do their disk
work.

Obviously, this is very akin to just adjusting the threshold downwards,
but the slight difference is that we take actual transaction rate vs.
segment free rate into account, not just static footprint.

Note: this is a very simplistic version of this anticipation scheme,
we just use the "raw" delta for the timer slice.
A more sophisticated approach would perhaps do either a lowpass
filtered rate (adjust over a longer time), or a regression or whatnot.
But again, the default period of 10s is something of an eternity,
so maybe that is superfluous...

Closes #10651

* github.com:scylladb/scylla:
  commitlog: Add (internal) measurement of byte rates add/release/flush-req
  commitlog: Add counters for # bytes released/flush requested
  commitlog: Keep track of last flush high position to avoid double request
  commitlog: Fix counter descriptor language
2022-07-04 16:26:17 +03:00
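The anticipation scheme, in its simplistic "raw delta per timer slice" form, can be sketched as a pure function; the function name and the clamping at zero are assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the "raw delta per timer slice" anticipation: if more bytes
// flowed in than were released during the slice, lower the effective
// flush threshold by that delta, so the flush is requested earlier.
inline uint64_t effective_threshold(uint64_t threshold,
                                    uint64_t bytes_added,
                                    uint64_t bytes_released) {
    if (bytes_added > bytes_released) {
        uint64_t delta = bytes_added - bytes_released;
        return delta < threshold ? threshold - delta : 0;
    }
    return threshold;   // no net influx: leave the threshold alone
}
```

Unlike simply adjusting the threshold downwards, the reduction tracks the actual add-vs-release rate observed in the slice.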
Pavel Emelyanov
2e1ec36efd commitlog: Add shutdown message
It happens in database::drain(); we know when it starts (after keyspaces
are flushed), and now it's good to know when it completes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-04 13:42:45 +03:00
Calle Wilund
688fd31e64 commitlog: Add counters for actual pending allocations + segment wait
Fixes #9367

The CL counters pending_allocations and requests_blocked_memory are
exposed in Grafana (etc) and often referred to as metrics on whether
we are blocking on commit log. But they don't really show this, as
they only measure whether or not we are blocked on the memory bandwidth
semaphore that provides rate back pressure (fixed num bytes/s - sortof).

However, tasks actually blocked in allocation or segment wait are not
exposed, so if we are blocked on disk IO or waiting for segments to become
available, we have no visible metrics.

While the "old" counters certainly are valid, I have yet to ever see them
be non-zero in modern life.

Closes #9368
2022-06-28 08:36:27 +03:00
Calle Wilund
8b49718203 commitlog: Add (internal) measurement of byte rates add/release/flush-req
Adds measuring the apparent delta vector of footprint added/removed within
the timer time slice, and potentially include this (if influx is greater
than data removed) in threshold calculation. The idea is to anticipate
crossing usage threshold within a time slice, so request a flush slightly
earlier, hoping this will give all involved more time to do their disk
work.

Obviously, this is very akin to just adjusting the threshold downwards,
but the slight difference is that we take actual transaction rate vs.
segment free rate into account, not just static footprint.

Note: this is a very simplistic version of this anticipation scheme,
we just use the "raw" delta for the timer slice.
A more sophisticated approach would perhaps do either a lowpass
filtered rate (adjust over longer time), or a regression or whatnot.
But again, the default period of 10s is something of an eternity,
so maybe that is superfluous...
2022-06-20 15:58:36 +00:00
Calle Wilund
6921210bf5 commitlog: Add counters for # bytes released/flush requested
Adds "bytes_released" and "bytes_flush_requested". The former is the
total bytes released from disk as a result of segment release (counted
as allocation bytes + overhead - not counting unused "waste"); the
latter is the total size for which we have requested flush callbacks to
release data, also counted as actual used bytes in the segments we
request be released.

These counters, together with bytes_written, should in ideal use
cases be at an equilibrium (actually equal), thus observing them
should give an idea of whether we are imbalanced in managing to
release bytes at the same rate as they are allocated (i.e. transaction
rate).
2022-06-20 15:58:36 +00:00
Calle Wilund
336383c87e commitlog: Keep track of last flush high position to avoid double request
Apparent mismerge or something. We already have an unused "_flush_position",
intended to keep track of the last requested high rp.
Now actually update and use it, the latter to avoid sending requests for
segments/cf id:s we have already requested external flush of. Also enables
us to ensure we don't do double bookkeeping here.
2022-06-20 15:58:26 +00:00
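The high-position bookkeeping can be sketched as follows; `flush_tracker` and its members are hypothetical simplifications of the real `_flush_position` handling:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplification: remember the highest replay position we
// already requested a flush for, and skip requests at or below it to
// avoid double requests / double bookkeeping.
struct flush_tracker {
    uint64_t flush_position = 0;

    // Returns true iff a flush request should actually be sent.
    bool request_flush(uint64_t high_rp) {
        if (high_rp <= flush_position) {
            return false;             // already requested up to here
        }
        flush_position = high_rp;
        return true;
    }
};
```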
Calle Wilund
c904b3cf35 commitlog: Fix counter descriptor language
Remove superfluous "a"
2022-06-20 15:54:20 +00:00
Avi Kivity
4b53af0bd5 treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurrences (i.e. the caller is a coroutine).

One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.

Closes #10699
2022-05-31 09:06:24 +03:00
Avi Kivity
528ab5a502 treewide: change metric calls from make_derive to make_counter
make_derive was recently deprecated in favor of make_counter, so
make the change throughout the codebase.

Closes #10564
2022-05-14 12:53:55 +02:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Calle Wilund
0e2a3e02ae commitlog: Fold named_file continuations into caller coroutine frame
Saves a continuation. That matters very little. But...
Uses a special awaiter type on returns from the "then(...)"-wrapping
named_file methods (which use a then([...update]) to keep internal
size counters up-to-date), making the continuation instead a stored func
in the returned awaiter, executed on successful resume of the caller's
co_await.
2022-04-11 16:34:00 +00:00
Calle Wilund
ed8f0df105 commitlog: Use named named_file objects in delete/dispose/recycle lists
Changes the delete/close queue, as well as the deletion queue, into one,
using named_file objects + marker. The recycle list now also contains said
named file type.

This removes the need to re-eval file sizes on disk when deleting etc,
which in turn means we can dispose of recalculate_footprint on errors,
thus making things simpler and safer.
2022-04-11 16:34:00 +00:00
Calle Wilund
cdd4066006 commitlog: Use named_file size tracking instead of segment var
I.e. "auto-keep-track" of disk footprint
2022-04-11 16:34:00 +00:00
Calle Wilund
320c49e8d3 commitlog: Use named_file in segment
Uses named_file instead of file+string in segments.
Does not do anything particularly useful with it.
2022-04-11 16:34:00 +00:00
Calle Wilund
97bf7b1fc8 commitlog: Add "named_file" file wrapping type
For keeping track of file, name and size, even across
close/rename/delete.
2022-04-11 16:34:00 +00:00
Calle Wilund
7dd7760e8d commitlog: Make flush threshold a config parameter 2022-04-11 16:34:00 +00:00
Calle Wilund
d478896d46 commitlog: kill non-recycled segment management
It has been default for a while now. Makes no sense to not do it.
Even hints can use it (even if it makes no difference there)
2022-04-11 16:34:00 +00:00
Calle Wilund
1e66043412 commitlog: Fix double clearing of _segment_allocating shared_future.
Fixes #10020

Previous fix 445e1d3 tried to close one double invocation, but added
another, since it failed to ensure all potential nullings of the opt
shared_future happened before a new allocator could reset it.

This simplifies the code by making clearing the shared_future a
pre-requisite for resolving its contents (as read by waiters).

Also removes any need for try-catch etc.

Closes #10024
2022-02-02 23:26:17 +02:00
Calle Wilund
445e1d3e41 commitlog: Ensure we never have more than one new_segment call at a time
Refs #9896

Found by @eliransin. The call to new_segment was wrapped in with_timeout.
This means that if the primary caller timed out, we would leave new_segment
calls running, but potentially issue new ones for next caller.

This could lead to the reserve segment queue being read simultaneously,
which is not what we want.

Change to always use the shared_future wait for all callers, and clear it
only on result (exception or segment).

Closes #10001
2022-01-31 16:50:22 +02:00
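A single-threaded model of the fix (not Seastar; `allocator_model` and `pending` stand in for the optional shared_future): every caller shares one in-flight allocation, and the pending marker is cleared only when a result is delivered:

```cpp
#include <cassert>
#include <optional>

// Single-threaded model: `pending` stands in for the shared_future;
// a real allocation starts only when nothing is already in flight.
struct allocator_model {
    int allocations = 0;          // how many real new_segment calls happened
    std::optional<int> pending;   // in-flight result shared by all callers

    int get_segment() {
        if (!pending) {
            ++allocations;        // at most one new_segment call at a time
            pending = allocations;
        }
        return *pending;          // every concurrent caller shares the result
    }

    // Cleared only once a result (segment or exception) was delivered.
    void complete() { pending.reset(); }
};
```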
Calle Wilund
43f51e9639 commitlog: Ensure we don't run continuation (task switch) with queues modified
Fixes #9955

In #9348 we handled the problem of failing to delete segment files on disk, and
the need to recompute disk footprint to keep data flow consistent across intermittent
failures. However, because _reserve_segments and _recycled_segments are queues, we
have to empty them to inspect the contents. One would think it is ok for these
queues to be empty for a while, whilst we do some recalculating, including
disk listing -> continuation switching. But then one (i.e. I) misses the fact
that these queues use the pop_eventually mechanism, which does _not_ handle
a scenario where we push something into an empty queue, thus triggering the
future that resumes a waiting task, but then pop the element immediately, before
the waiting task is run. In fact, _iff_ one does this, not only will things break,
they will in fact start creating undefined behaviour, because the underlying
std::queue<T, circular_buffer> will _not_ do any bounds checks on the pop/push
operations -> we will pop an empty queue, immediately making it non-empty, but
using undefined memory (with luck null/zeroes).

Strictly speaking, seastar::queue::pop_eventually should be fixed to handle
the scenario, but nonetheless we can fix the usage here as well, by simply
copying the objects and doing the calculation "in background" while we
potentially start popping the queue again.

Closes #9966
2022-01-26 13:51:01 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Calle Wilund
3c02cab2f7 commitlog: Don't allow error_handler to swallow exception
Fixes #9798

If an exception in allocate_segment_ex is a (sub)type of std::system_error,
commit_error_handler might _not_ cause throw (doh), in which case the error
handling code would forget the current exception and return an unusable
segment.

Now only used as an exception pointer replacer.

Closes #9870
2022-01-03 22:46:31 +02:00
Nadav Har'El
b8786b96f4 commitlog: fix missing wait for semaphore units
Commit dcc73c5d4e introduced a semaphore
for excluding concurrent recalculations - _reserve_recalculation_guard.

Unfortunately, the two places in the code which tried to take this
guard just called get_units() - which returns a future<units>, not
units - and never waited for this future to become available.

So this patch adds the missing "co_await" needed to wait for the
units to become available.

Fixes #9770.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211214122612.1462436-1-nyh@scylladb.com>
2021-12-27 16:56:30 +02:00
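The bug generalizes to any RAII guard that is obtained but never awaited/kept. A synchronous stand-in (not Seastar's semaphore) shows how discarding the result of `get_units()` silently drops the exclusion, which is what the missing `co_await` effectively did:

```cpp
#include <cassert>

// Synchronous stand-in for a semaphore whose get_units() hands back a
// RAII guard; all names here are illustrative.
struct semaphore_model {
    int available;

    struct units {
        semaphore_model* sem;
        explicit units(semaphore_model* s) : sem(s) {}
        units(units&& o) noexcept : sem(o.sem) { o.sem = nullptr; }
        ~units() {
            if (sem) {
                sem->available++;   // guard released
            }
        }
    };

    units get_units() {
        --available;                // guard taken
        return units{this};
    }
};
```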
Avi Kivity
e2c27ee743 Merge 'commitlog: recalculate disk footprint on delete_segment exceptions' from Calle Wilund
If we get errors/exceptions in delete_segments we can (and probably will) lose track of disk footprint counters. This can in turn, if using hard limits, cause us to block indefinitely on segment allocation since we might think we have a larger footprint than we actually do.

Of course, if we actually fail deleting a segment, it is 100% true that we still technically hold this disk footprint (now unreachable), but for cases where for example outside forces (or wacky tests) delete a file behind our backs, this might not be true. One could also argue that our footprint is the segments and file names we keep track of, and the rest is exterior sludge.

In any case, if we have any exceptions in delete_segments, we should recalculate disk footprint based on current state, and restart all new_segment paths etc.

Fixes #9348

(Note: this is based on previous PR #9344 - so shows these commits as well. Actual changes are only the latter two).

Closes #9349

* github.com:scylladb/scylla:
  commitlog: Recalculate footprint on delete_segment exceptions
  commitlog_test: Add test for exception in alloc w. deleted underlying file
  commitlog: Ensure failed-to-create-segment is re-deleted
  commitlog::allocate_segment_ex: Don't re-throw out of function
2021-11-16 17:44:56 +02:00
Calle Wilund
3929b7da1f commitlog: Add explicit track var for "wasted space" to avoid double counting
Refs #9331

In segment::close() we add space to managers "wasted" counter. In destructor,
if we can cleanly delete/recycle the file we remove it. However, if we never
went through close (shutdown - ok, exception in batch_cycle - not ok), we can
end up subtracting numbers that were never added in the first place.
Just keep track of the bytes added in a var.

Observed behaviour in above issue is timeouts in batch_cycle, where we
declare the segment closed early (because we cannot add anything more safely
- chunks could get partial/misplaced). Exception will propagate to caller(s),
but the segment will not go through actual close() call -> destructor should
not assume such.

Closes #9598
2021-11-09 09:15:44 +02:00
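The fix pattern can be sketched with a hypothetical model: the segment remembers exactly how much it added to the shared "wasted" counter, so the destructor can never subtract bytes that were never added:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the fix described above.
struct segment_model {
    uint64_t& wasted_total;       // manager's shared "wasted" counter
    uint64_t wasted_added = 0;    // the explicit tracking var

    explicit segment_model(uint64_t& total) : wasted_total(total) {}

    void close(uint64_t waste) {  // normal close path adds waste
        wasted_total += waste;
        wasted_added = waste;
    }

    ~segment_model() {
        // Subtracts only what was actually added (zero if close() was
        // skipped, e.g. due to an exception in batch_cycle).
        wasted_total -= wasted_added;
    }
};
```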
Calle Wilund
dcc73c5d4e commitlog: Recalculate footprint on delete_segment exceptions
Fixes #9348

If we get exceptions in delete_segments, we can, and probably will, lose
track of footprint counters. We need to recompute the used disk footprint,
otherwise we will flush too often, and even block indefinitely on new_seg
iff using hard limits.
2021-09-15 11:53:03 +00:00
Calle Wilund
21152a2f5a commitlog: Ensure failed-to-create-segment is re-deleted
Fixes #9343

If we fail in allocate_segment_ex, we should push the file opened/created
to the delete set to ensure we reclaim the disk space. We should also
ensure that if we did not recycle a file in delete_segments, we still
wake up any recycle waiters iff we made a file delete instead.

Included a small unit test.
2021-09-15 11:40:34 +00:00
Calle Wilund
f3a9f361b9 commitlog::allocate_segment_ex: Don't re-throw out of function
Fixes #9342

commitlog_error_handler rethrows, but here we do not want that; we need
to run post-handler cleanup (co_await) first.
2021-09-15 11:40:34 +00:00
Avi Kivity
cc8fc73761 Merge 'hints: fix bugs in HTTP API for waiting for hints found by running dtest in debug mode' from Piotr Dulikowski
This series of commits fixes a small number of bugs in the current implementation of the HTTP API that allows waiting until hints are replayed, found by running the `hintedhandoff_sync_point_api_test` dtest in debug mode.

Refs: #9320

Closes #9346

* github.com:scylladb/scylla:
  commitlog: make it possible to provide base segment ID
  hints: fill up missing shards with zeros in decoded sync points
  hints: propagate abort signal correctly in wait_for_sync_point
  hints: fix use-after-free when dismissing replay waiters
2021-09-15 12:55:54 +03:00