155 Commits

Author SHA1 Message Date
Botond Dénes
d0e99e018b reader_concurrency_semaphore: drop unused stop_ext_{pre,post}()
Left over from primordial times, when reader_concurrency_semaphore was
baseclass for extensions in the separate enterprise repository.
Also remove the now unneeded virtual marker from the destructor.

Closes scylladb/scylladb#29399
2026-04-15 14:40:15 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Andrzej Jackowski
10c4b9b5b0 test: verify signal() detects resource negative leak in rcs
reader_concurrency_semaphore::signal() guards against available
resources exceeding the initial limit after a signal, which would
indicate a bug such as double-returning resources. It reports the
issue via on_internal_error_noexcept and clamps resources back to
the initial values. However, before this commit there were no tests
that verified this behavior, so bugs like SCYLLADB-1014 went
undetected.

Add a test that artificially signals resources that were never
consumed and verifies that signal() detects the negative leak and
clamps available resources back to the initial limit.

Refs: SCYLLADB-1014
Fixes: SCYLLADB-1031

Closes scylladb/scylladb#28993
2026-03-20 09:21:20 +03:00
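To illustrate the guard that commit 10c4b9b5b0 tests, here is a minimal sketch of a signal() that detects a negative leak and clamps back to the initial limit. The types `toy_resources` and `clamping_semaphore` are hypothetical stand-ins, and a plain counter replaces on_internal_error_noexcept; the real implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy stand-in for the semaphore's (count, memory) resource pair.
struct toy_resources {
    int count;
    std::int64_t memory;
};

// Hypothetical miniature semaphore; not the real reader_concurrency_semaphore.
class clamping_semaphore {
    toy_resources _initial;
    toy_resources _available;
    unsigned _negative_leaks = 0; // stands in for on_internal_error_noexcept()
public:
    explicit clamping_semaphore(toy_resources initial)
        : _initial(initial), _available(initial) {
        assert(initial.count >= 0 && initial.memory >= 0);
    }

    void consume(toy_resources r) {
        _available.count -= r.count;
        _available.memory -= r.memory;
    }

    // Return resources. If more is available than the initial limit,
    // report the problem and clamp back to the initial values.
    void signal(toy_resources r) {
        _available.count += r.count;
        _available.memory += r.memory;
        if (_available.count > _initial.count || _available.memory > _initial.memory) {
            ++_negative_leaks;
            _available.count = std::min(_available.count, _initial.count);
            _available.memory = std::min(_available.memory, _initial.memory);
        }
    }

    toy_resources available() const { return _available; }
    unsigned negative_leaks() const { return _negative_leaks; }
};
```

Signalling resources that were never consumed, as the added test does, should bump the leak counter once and leave the available resources equal to the initial limit.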
Łukasz Paszkowski
fde09fd136 reader_concurrency_semaphore: Add preemptive_abort_factor to constructors
The new parameter parametrizes the factor used to reject a read
during admission. Its value shall be between 0.0 and 1.0 where
  + 0.0 means a read will never get rejected during admission
  + 1.0 means a read will immediately get rejected during admission

Although passing values outside the interval is possible, they
will have the exact same effect as if they were clamped to [0.0, 1.0].
2026-01-28 14:20:01 +01:00
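The clamping behavior described in commit fde09fd136 can be sketched as follows. The commit message does not spell out the rejection rule, so `should_reject` below is an assumption made for illustration (reject once the time left before the deadline drops below factor times the total timeout); only the endpoint behavior and the clamping come from the message.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using duration = std::chrono::milliseconds;

// Values outside [0.0, 1.0] behave as if they were clamped to that interval.
inline double effective_abort_factor(double factor) {
    return std::clamp(factor, 0.0, 1.0);
}

// Hypothetical rule, NOT taken from the source: reject a read when the time
// remaining until its deadline is below factor * timeout.
inline bool should_reject(double factor, duration timeout, duration remaining) {
    assert(timeout.count() > 0);
    const double f = effective_abort_factor(factor);
    if (f <= 0.0) {
        return false; // 0.0: never rejected during admission
    }
    if (f >= 1.0) {
        return true;  // 1.0: immediately rejected during admission
    }
    return remaining.count() < f * timeout.count();
}
```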
Łukasz Paszkowski
2d3a40e023 permit_reader: Add a new state: preemptive_aborted
A permit gets into the preemptive_aborted state when it:
- times out;
- gets rejected from execution due to a high chance that its execution
  would not finish in time.

Being in this state means the permit was removed from the wait list,
its internal timer was canceled, and the semaphore's
`total_reads_shed_due_to_overload` statistic was incremented.
2026-01-28 14:20:01 +01:00
Łukasz Paszkowski
8829098e90 reader_concurrency_semaphore: Remove cpu_concurrency's default value
Commit 59faa6d introduced a new parameter called cpu_concurrency
and set its default value to 1, which violates commit fbb83dd, which
removed all default values from constructors except the one used by the
unit tests.

The patch removes the default value of the cpu_concurrency parameter
and alters tests to use the test-dedicated reader_concurrency_semaphore
constructor wherever possible.
2026-01-27 15:40:11 +01:00
Benny Halevy
679e73053f reader_concurrency_semaphore: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:48 +03:00
Botond Dénes
f2d5819645 reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.

Fixes: #22588

Closes scylladb/scylladb#22624
2025-02-04 21:27:16 +02:00
Piotr Dulikowski
7383013f43 replica/database: add reader concurrency semaphore groups
Replace the reader concurrency semaphores for user reads and view
updates with the newly introduced reader concurrency semaphore group,
which assigns a semaphore for each service level.

Each group is statically assigned to some pool of memory on startup and
dynamically distributes this memory between the semaphores, relative to
the number of shares of the corresponding scheduling group.

The intent of having a separate reader concurrency semaphore for each
scheduling group is to prevent priority inversion issues due to reads
with different priorities waiting on the same semaphore, as well as make
memory allocation more fair between service levels due to the adjusted
number of shares.
2025-01-02 07:13:34 +01:00
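As a rough illustration of the share-proportional split described in commit 7383013f43, the sketch below divides a memory pool between service levels. The function name `distribute_memory`, the service level names, and the integer rounding are assumptions; the real group may distribute memory differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Split total_memory between service levels in proportion to their shares.
// Integer division drops remainders, so the parts may sum to slightly less
// than total_memory.
inline std::map<std::string, std::int64_t>
distribute_memory(std::int64_t total_memory,
                  const std::map<std::string, std::uint64_t>& shares) {
    assert(total_memory >= 0);
    std::uint64_t total_shares = 0;
    for (const auto& [name, s] : shares) {
        total_shares += s;
    }
    std::map<std::string, std::int64_t> result;
    if (total_shares == 0) {
        return result; // nothing to distribute to
    }
    for (const auto& [name, s] : shares) {
        result[name] = total_memory * static_cast<std::int64_t>(s)
                     / static_cast<std::int64_t>(total_shares);
    }
    return result;
}
```

With 1000 bytes and shares of 300 and 100, the two semaphores would receive 750 and 250 bytes respectively.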
Tomasz Grabiec
bf3d0b3543 reader_concurrency_semaphore: Optimize resource_units destruction by postponing wait list processing
Observed 3% throughput improvement in sstable-heavy workload bounded by CPU.

SStable parsing involves lots of buffer operations which obtain and
destroy resource_units. Before the patch, resource_units destruction
invoked maybe_admit_waiters(), which performs some computations on
waiting permits. We don't really need to admit on each change of
resources, since the CPU is used by other things anyway. We can batch
the computation. There is already a fiber which does this for
processing the _ready_list. We can reuse it for processing _wait_list
as well.

The changes violate an assumption made by tests that releasing
resources immediately triggers an admission check. Therefore, some of
the BOOST_REQUIRE_EQUAL checks need to be replaced with REQUIRE_EVENTUALLY_EQUAL,
as the admission check is now done in the fiber processing the _ready_list.

`perf-simple-query --tablets --smp 1 -m 1G` results obtained for a
fixed 400MHz frequency:

Before:
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

112590.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41353 insns/op,   17992 cycles/op,        0 errors)
122620.68 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41310 insns/op,   17713 cycles/op,        0 errors)
118169.48 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41353 insns/op,   17857 cycles/op,        0 errors)
120634.65 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41328 insns/op,   17733 cycles/op,        0 errors)
117317.18 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41347 insns/op,   17822 cycles/op,        0 errors)

         throughput: mean=118266.52 standard-deviation=3797.81 median=118169.48 median-absolute-deviation=2368.13 maximum=122620.68 minimum=112590.60
instructions_per_op: mean=41337.86 standard-deviation=18.73 median=41346.89 median-absolute-deviation=14.64 maximum=41352.53 minimum=41309.83
  cpu_cycles_per_op: mean=17823.50 standard-deviation=111.75 median=17821.97 median-absolute-deviation=90.45 maximum=17992.04 minimum=17713.00
```

After:
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

123689.63 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40997 insns/op,   17384 cycles/op,        0 errors)
129643.24 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40997 insns/op,   17325 cycles/op,        0 errors)
128907.27 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41009 insns/op,   17325 cycles/op,        0 errors)
130342.56 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40993 insns/op,   17286 cycles/op,        0 errors)
130294.09 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40972 insns/op,   17336 cycles/op,        0 errors)

         throughput: mean=128575.36 standard-deviation=2792.75 median=129643.24 median-absolute-deviation=1718.73 maximum=130342.56 minimum=123689.63
instructions_per_op: mean=40993.51 standard-deviation=13.23 median=40996.73 median-absolute-deviation=3.30 maximum=41008.86 minimum=40972.48
  cpu_cycles_per_op: mean=17331.16 standard-deviation=35.02 median=17324.84 median-absolute-deviation=6.49 maximum=17383.97 minimum=17286.33
```

Closes scylladb/scylladb#21918

[avi: patch was co-authored by Łukasz Paszkowski <lukasz.paszkowski@scylladb.com>]
2024-12-30 23:37:46 +02:00
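The batching idea behind commit bf3d0b3543 can be sketched without any asynchronous machinery. In the toy below, `release()` only records that an admission check is due, and `run_execution_loop_once()` stands in for one iteration of the background fiber. All names are hypothetical and the real semaphore is considerably more involved.

```cpp
#include <cassert>
#include <deque>

// Hypothetical toy; not the real reader_concurrency_semaphore.
class batching_semaphore {
    int _available;
    std::deque<int> _wait_list;      // units needed by each waiter, FIFO
    bool _admission_pending = false; // set on release, cleared by the loop
    unsigned _admission_checks = 0;
    unsigned _admitted = 0;
public:
    explicit batching_semaphore(int units) : _available(units) {
        assert(units >= 0);
    }

    void enqueue(int needed) { _wait_list.push_back(needed); }

    // Cheap path: returning units does not touch the wait list.
    void release(int units) {
        _available += units;
        _admission_pending = true;
    }

    // Stand-in for one iteration of the fiber: one admission pass covers
    // any number of preceding release() calls.
    void run_execution_loop_once() {
        if (!_admission_pending) {
            return;
        }
        _admission_pending = false;
        ++_admission_checks;
        while (!_wait_list.empty() && _wait_list.front() <= _available) {
            _available -= _wait_list.front();
            _wait_list.pop_front();
            ++_admitted;
        }
    }

    unsigned admission_checks() const { return _admission_checks; }
    unsigned admitted() const { return _admitted; }
};
```

This also shows why the tests had to change: after `release()` returns, the waiter is not yet admitted, so an immediate equality check would fail until the loop has run.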
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Botond Dénes
c34127092d reader_concurrency_semaphore: test constructor: don't ignore metrics param
The for_tests constructor has a metrics parameter defaulted to
register_metrics::no, but when delegating to the other constructor, a
hard-coded register_metrics::no is passed. This makes no difference
currently, because all callers use the default and the hard-coded value
corresponds to it. Let's fix it nevertheless to avoid any future
surprises.

Closes scylladb/scylladb#20007
2024-08-04 21:14:42 +03:00
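The pitfall fixed by commit c34127092d is easy to reproduce with any delegating constructor. The sketch below shows the corrected shape, where the defaulted parameter is forwarded instead of a hard-coded value. `metrics_semaphore` and the plain enum are hypothetical simplifications of the real types.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for the real register_metrics type.
enum class register_metrics { no, yes };

// Hypothetical toy class illustrating the delegation only.
class metrics_semaphore {
    std::string _name;
    register_metrics _metrics;
public:
    struct for_tests {};

    metrics_semaphore(std::string name, register_metrics metrics)
        : _name(std::move(name)), _metrics(metrics) {
        assert(!_name.empty());
    }

    // Test constructor: forwards `metrics` to the main constructor.
    // The bug was passing a hard-coded register_metrics::no here, which
    // silently ignored the caller's argument.
    metrics_semaphore(for_tests, std::string name,
                      register_metrics metrics = register_metrics::no)
        : metrics_semaphore(std::move(name), metrics) {}

    bool registers_metrics() const { return _metrics == register_metrics::yes; }
};
```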
Botond Dénes
07c0a8a6f8 reader_concurrency_semaphore: wire in the configurable cpu concurrency
Before this patch, the semaphore was hard-wired to stop admission if
there is even a single permit in the need_cpu state,
thereby keeping the CPU concurrency at 1.
This patch makes use of the new cpu_concurrency parameter, which was
wired in in the last patches, allowing for a configurable amount of
concurrent need_cpu permits. This is to address workloads where some
small subset of reads are expected to be slow, and can hold up faster
reads behind them in the semaphore queue.
2024-06-27 09:57:11 -04:00
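The admission rule described in commit 07c0a8a6f8 reduces to a single comparison. The sketch below is an assumption about its shape, with hypothetical names; it only shows that a limit of 1 reproduces the old hard-wired behavior.

```cpp
#include <cassert>

// Hypothetical snapshot of the CPU-related admission inputs.
struct cpu_admission_state {
    unsigned need_cpu_permits; // permits currently in the need_cpu state
    unsigned cpu_concurrency;  // configured limit
};

// Admission may proceed only while fewer than cpu_concurrency permits need
// the CPU. With cpu_concurrency == 1, a single need_cpu permit blocks it.
inline bool cpu_allows_admission(const cpu_admission_state& s) {
    assert(s.cpu_concurrency > 0);
    return s.need_cpu_permits < s.cpu_concurrency;
}
```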
Botond Dénes
59faa6d4ff reader_concurrency_semaphore: add cpu_concurrency constructor parameter
In the case of the user semaphore, this receives the new
reader_concurrency_semaphore_cpu_limit config item.
Not used yet.
2024-06-27 09:57:11 -04:00
Avi Kivity
fdc1449392 treewide: rename flat_mutation_reader_v2 to mutation_reader
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:

  e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
  08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"

as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.

The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit

  026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"

In turn, flat_mutation_reader was introduced in 2017 in commit

  748205ca75 "Introduce flat_mutation_reader"

to transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.

Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.

Note that mutation_fragment_v2 remains since we still use the original
for compatibility, sometimes.

Some notes about the transition:

 - files were also renamed. In one case (flat_mutation_reader_test.cc), the
   rename target already existed, so we rename to
    mutation_reader_another_test.cc.

 - a namespace 'mutation_reader' with two definitions existed (in
   mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
   class. As a result, a few #includes had to be adjusted.

Closes scylladb/scylladb#19356
2024-06-21 07:12:06 +03:00
Botond Dénes
ba0cc29d82 reader_concurrency_semaphore: make count parameter live-update
So that the amount of count resources can be changed at run-time,
triggered by e.g. a config change.
The previous constant-count based constructor is left intact, to avoid
patching all clients, as only a small subset will want the new
functionality.
2024-06-13 01:59:21 -04:00
Botond Dénes
3c813fbb99 reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
When the new optional parameter has a value, evict only inactive reads,
whose ranges overlap with the provided range. The range for the inactive
read is provided in `register_inactive_read()`. If the inactive read has
no range, overlap is assumed and the read is evicted.
This will be used to evict all inactive reads that could potentially use
a cleaned-up tablet.
2024-04-30 01:31:08 -04:00
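The eviction rule from commit 3c813fbb99 can be sketched with optional ranges. The `token_range` type with inclusive integer bounds and the function `evict_inactive_reads` are hypothetical; the real code works on partition ranges and returns a future.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical inclusive range over integer tokens.
struct token_range {
    long start;
    long end;
};

inline bool overlaps(const token_range& a, const token_range& b) {
    return a.start <= b.end && b.start <= a.end;
}

struct toy_inactive_read {
    int id;
    std::optional<token_range> range; // disengaged if none was registered
};

// Evicts matching reads from `reads` and returns their ids.
// No filter: evict everything. Read without a range: overlap is assumed.
inline std::vector<int> evict_inactive_reads(std::vector<toy_inactive_read>& reads,
                                             const std::optional<token_range>& filter) {
    assert(!filter || filter->start <= filter->end);
    auto keep = [&](const toy_inactive_read& r) {
        return filter && r.range && !overlaps(*r.range, *filter);
    };
    // Kept reads move to the front; relative order is preserved in both halves.
    auto first_evicted = std::stable_partition(reads.begin(), reads.end(), keep);
    std::vector<int> evicted;
    for (auto it = first_evicted; it != reads.end(); ++it) {
        evicted.push_back(it->id);
    }
    reads.erase(first_evicted, reads.end());
    return evicted;
}
```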
Botond Dénes
9e7a957ffb reader_concurrency_semaphore: allow storing a range with the inactive reader
This allows specifying the range the inactive read is reading from. To
be used in the next patch to selectively evict inactive reads whose
range overlaps with a certain (tablet) range.
2024-04-30 01:31:08 -04:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
Avi Kivity
7fce057cda database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics
reader_concurrency_semaphore metrics are triplicated: each metric is registered
for streaming, user, and system classes.

To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated will register its metrics (if its creator asked for it).

Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.

scylla-gdb is adjusted to support the new names.
2023-12-13 09:16:18 -05:00
Botond Dénes
e1b30f50be reader_concurrency_semaphore: add register_metrics constructor parameter
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
2023-12-13 06:25:45 -05:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Botond Dénes
804403f618 reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
They are still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:42 -04:00
Botond Dénes
89328ce447 reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
It is still using the old terminology for permit state names, bring it
up to date with the recent state name changes.
2023-04-19 05:18:13 -04:00
Botond Dénes
3919effe2d reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
It is still using the old terminology for permit state names, bring it
up to date with the recent state name changes.
2023-04-19 05:17:34 -04:00
Botond Dénes
bd57471e54 reader_concurrency_semaphore: don't evict inactive readers needlessly
Inactive readers should only be evicted to free up resources for waiting
readers. Evicting them when waiters are not admitted for any other
reason than resources is wasteful and leads to extra load later on when
these evicted readers have to be recreated and requeued.
This patch changes the logic on both the registering path and the
admission path to not evict inactive readers unless there are readers
actually waiting on resources.
A unit-test is also added, reproducing the overly-aggressive eviction and
checking that it doesn't happen anymore.

Fixes: #11803

Closes #13286
2023-04-13 15:20:18 +03:00
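The policy from commit bd57471e54 can be sketched as a small decision function: inactive reads are evicted only when the waiter is actually blocked on resources, and only as many as needed. Every name below is hypothetical, and each inactive read is assumed to hold the same number of units for simplicity.

```cpp
#include <cassert>

// Hypothetical toy state; the real semaphore tracks count and memory.
struct eviction_state {
    int available_units;
    int inactive_reads;
    int units_per_inactive_read;
};

enum class blocked_on { resources, cpu };

// Returns how many inactive reads were evicted on behalf of a waiter.
inline int evict_for_waiter(eviction_state& s, int needed_units, blocked_on why) {
    assert(s.units_per_inactive_read > 0);
    if (why != blocked_on::resources) {
        return 0; // eviction would not help this waiter, so it is wasteful
    }
    int evicted = 0;
    while (s.available_units < needed_units && s.inactive_reads > 0) {
        --s.inactive_reads;
        s.available_units += s.units_per_inactive_read;
        ++evicted;
    }
    return evicted;
}
```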
Botond Dénes
156e5d346d reader_permit: keep trace_state pointer on permit
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
2023-03-22 04:58:01 -04:00
Botond Dénes
7b701ac52e reader_concurrency_semaphore: add stats to record reason for queueing permits
When diagnosing problems, knowing why permits were queued is very
valuable. Record the reason in new stats, one for each reason a permit
can be queued.
2023-03-17 03:15:41 -04:00
Botond Dénes
bb00405818 reader_concurrency_semaphore: can_admit_read(): also return reason for rejection
So the caller can bump the appropriate counters or log the reason why
the request cannot be admitted.
2023-03-17 03:15:40 -04:00
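Commits 7b701ac52e and bb00405818 together suggest a pattern where the admission check returns a reason and the caller bumps a per-reason counter. A minimal sketch, assuming a hypothetical `queue_reason` enum and check order (the real reasons and their priority may differ):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reasons for queueing a permit.
enum class queue_reason : std::size_t {
    none,
    need_cpu_permits,
    count_resources,
    memory_resources,
    reason_count // number of enumerators above; not a real reason
};

struct admission_inputs {
    unsigned need_cpu_permits;
    int available_count;
    std::int64_t available_memory;
    std::int64_t needed_memory;
};

// Returns queue_reason::none if the read can be admitted, else why not.
inline queue_reason can_admit_read(const admission_inputs& in) {
    if (in.need_cpu_permits > 0) {
        return queue_reason::need_cpu_permits;
    }
    if (in.available_count < 1) {
        return queue_reason::count_resources;
    }
    if (in.available_memory < in.needed_memory) {
        return queue_reason::memory_resources;
    }
    return queue_reason::none;
}

// One counter per reason, indexed by the enum value.
struct queue_stats {
    std::array<std::uint64_t, static_cast<std::size_t>(queue_reason::reason_count)> queued{};

    void record(queue_reason r) {
        assert(r != queue_reason::reason_count);
        if (r != queue_reason::none) {
            ++queued[static_cast<std::size_t>(r)];
        }
    }

    std::uint64_t queued_because(queue_reason r) const {
        return queued[static_cast<std::size_t>(r)];
    }
};
```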
Botond Dénes
4f5657422d reader_concurrency_semaphore: move _permit_list next to the other lists
A mostly cosmetic change. Also add a comment mentioning that this is the
catch-all list.
2023-03-13 08:07:53 -04:00
Botond Dénes
6181c08191 reader_concurrency_semaphore: move inactive_read to .cc
It is not used in the header anymore and moving it to the .cc allows us
to remove the dependency on flat_mutation_reader_v2.hh.
2023-03-13 08:07:53 -04:00
Botond Dénes
e56ec9373d reader_concurrency_semaphore: store permits in _inactive_reads
Add a member of type `inactive_read` to reader permit, and store permit
instances in `_inactive_reads`. This list is now just another intrusive
list the permit can be linked into, depending on its state.
Inactive read handles now just store a reader permit pointer.
2023-03-13 08:07:53 -04:00
Botond Dénes
d11f9efbfe reader_concurrency_semaphore: inactive_read: de-inline more methods
They will soon need to access reader_permit::impl internals, only
available in the .cc file.
2023-03-13 08:07:53 -04:00
Botond Dénes
8e296e8e05 reader_concurrency_semaphore: make _ready_list intrusive
Following the same scheme we used to make the wait lists intrusive.
Permits are added to the intrusive ready list while waiting to be
executed and moved back to the _permit_list when de-queued from this
list.
We now use a condition variable for signaling when there are permits
ready to be executed.
2023-03-13 08:07:53 -04:00
Botond Dénes
6229f8b1a6 reader_concurrency_semaphore: make wait lists intrusive
Instead of using expiring_fifo to store queued permits, use the same
intrusive list mechanism we use to keep track of all permits.
Permits are now moved between the _permit_list and the wait queues,
depending on which state they are in. This means _permit_list is now not
the definitive list containing all permits, instead it is the list
containing all permits that are not in a more specialized queue at the
moment.
Code wishing to iterate over all permits should now use
foreach_permit(). For outside code, this was already the only way and
internal users are already patched.
Making the wait lists intrusive allows us to dequeue a permit from any
position, with nothing but a permit reference at hand. It also means
the wait queues don't have any additional memory requirements, other
than the memory for the permit itself.
Timeout while being queued is now handled by the permit's on_timeout()
callback.
2023-03-09 07:11:49 -05:00
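The property that commit 6229f8b1a6 relies on, dequeuing from any position with only a reference to the element and no extra allocation, can be shown with a hand-rolled intrusive hook. This is a sketch with hypothetical names (`list_hook`, `queued_permit`, `permit_queue`), not the intrusive list implementation the real code uses.

```cpp
#include <cassert>

// Minimal circular doubly-linked hook embedded in each element.
struct list_hook {
    list_hook* prev = this;
    list_hook* next = this;

    list_hook() = default;
    list_hook(const list_hook&) = delete;
    list_hook& operator=(const list_hook&) = delete;
    ~list_hook() { unlink(); }

    bool is_linked() const { return next != this; }

    // O(1) removal from whichever list the hook is in; a no-op if unlinked.
    void unlink() {
        prev->next = next;
        next->prev = prev;
        prev = next = this;
    }

    void link_before(list_hook& pos) {
        assert(!is_linked());
        prev = pos.prev;
        next = &pos;
        pos.prev->next = this;
        pos.prev = this;
    }
};

// The element carries its own hook, so queues need no per-entry memory.
struct queued_permit : list_hook {
    int id;
    explicit queued_permit(int i) : id(i) {}
};

class permit_queue {
    list_hook _head; // sentinel node; the queue owns no elements
public:
    void push_back(queued_permit& p) { p.link_before(_head); }
    bool empty() const { return !_head.is_linked(); }

    template <typename Func>
    void for_each(Func f) const {
        for (const list_hook* h = _head.next; h != &_head; h = h->next) {
            f(static_cast<const queued_permit&>(*h));
        }
    }
};
```

Moving a permit between lists is then `p.unlink()` followed by `other.push_back(p)`. Note that such a list has no constant-time size, which is consistent with the explicit waiters counter introduced in an earlier commit of the series.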
Botond Dénes
9ea9a48dbc reader_concurrency_semaphore: move most wait_queue methods out-of-line
They will soon depend on the definition of the reader_permit::impl,
which is only available in the .cc file.
2023-03-09 06:53:11 -05:00
Botond Dénes
1d27dd8f0e reader_concurrency_semaphore: store permits directly in queues
Instead of the `entry` wrapper. In _wait_list and _ready_list, that is.
Data stored in the `entry` wrapper is moved to a new
`reader_permit::auxiliary_data` type. This makes the reader permit
self-sufficient. This in turn prepares the ground for the ability to
de-queue a permit from any queue, with nothing but a permit reference at
hand: no need to have back pointer to wrappers and/or iterators.
2023-03-09 06:53:11 -05:00
Botond Dénes
f5b80fdfd8 reader_concurrency_semaphore: remove redundant waiters() member
There is now a field in stats with the same information, use that.
2023-03-09 06:53:11 -05:00
Botond Dénes
74a5981dbe reader_concurrency_semaphore: add waiters counter
Use it to keep track of all permits that are currently waiting on
something: admission, memory or execution.
Currently we keep track of size, by adding up the result of size() of
the various queues. In future patches we are going to change the queues
such that they will not have constant-time size anymore, so move to an
explicit counter in preparation for that.
Another change this commit makes is to also include ready list entries
in this counter. Permits in the ready list are also waiters, they wait
to be executed. Soon we will have a separate wait state for this too.
2023-03-09 06:53:11 -05:00
Botond Dénes
23f4e250c2 reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param
This param is from a time when _permit_list was not accessible from the
outside, so it was passed along the semaphore instance to avoid making
the diagnostics methods friends.
To allow the semaphore freedom in how permits are stored, the
diagnostics code is instead made to use foreach_permit(), instead of
accessing the underlying list directly.
As the diagnostics code wants reader_permit::impl& directly, a new
variant of foreach_permit() passing impl references is introduced.
2023-03-09 05:19:59 -05:00
Botond Dénes
59dc15682b reader_concurrency_semaphore: make foreach_permit() const
It already is conceptually, as it passes const references to the permits
it iterates over. The only reason it wasn't const before is a technical
issue which is solved here with a const_cast.
2023-03-09 05:19:59 -05:00
Botond Dénes
34cdcaffae reader_concurrency_semaphore: un-bless permits when they become inactive
When the memory consumption of the semaphore reaches the configured
serialize threshold, all but the blessed permit are blocked from
consuming any more memory. This ensures that past this limit, only one
permit at a time can consume memory.
Such a blessed permit can be registered inactive. Before this patch, it
would still retain its blessed status when doing so. This could result
in this permit being re-queued for admission if it was evicted in the
meanwhile, potentially resulting in a complete deadlock of the semaphore:
* admission queue permits cannot be admitted because there is no memory
* admitted permits are all queued on memory, as none of them are blessed

This patch strips the blessed status from the permit when it is
registered as inactive. It also adds a unit test to verify this happens.

Fixes: #12603

Closes #12694
2023-02-01 21:02:17 +02:00
Botond Dénes
7f8469db27 reader_concurrency_semaphore: add foreach_permit()
Allows iterating over all permits.
2023-01-17 05:27:04 -05:00
Botond Dénes
4c70b58993 reader_concurrency_semaphore: document the new memory limits
2023-01-17 05:27:04 -05:00
Botond Dénes
edb32cb171 reader_concurrency_semaphore: add OOM killer
When the collective memory consumption of all readers goes above
$kill_limit_multiplier * $memory_limit, consume() will throw
std::bad_alloc(), instantly unwinding the read that is unlucky enough
to have requested the last bytes of memory. This should help in situations
where there are some problematic partitions, either because of large
cells or because they are scattered in too many sstables. Currently
nothing prevents such reads from bringing down the entire node via OOM.
2023-01-17 05:27:04 -05:00
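The kill limit from commit edb32cb171 can be sketched as a check inside consume(). The class `oom_guard` is hypothetical, the multiplier is assumed to be an integer, and whether the real code accounts the bytes before or after the check is not stated in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical toy tracker for the collective memory use of all reads.
class oom_guard {
    std::int64_t _kill_limit;
    std::int64_t _consumed = 0;
public:
    oom_guard(std::int64_t memory_limit, std::int64_t kill_limit_multiplier)
        : _kill_limit(memory_limit * kill_limit_multiplier) {
        assert(memory_limit > 0 && kill_limit_multiplier > 0);
    }

    // Throws std::bad_alloc if this request would push consumption above
    // kill_limit_multiplier * memory_limit; the failed request is not accounted.
    void consume(std::int64_t bytes) {
        if (_consumed + bytes > _kill_limit) {
            throw std::bad_alloc();
        }
        _consumed += bytes;
    }

    void signal(std::int64_t bytes) { _consumed -= bytes; }

    std::int64_t consumed() const { return _consumed; }
};
```

Only the read that requests the bytes crossing the limit is unwound; reads that already hold memory are unaffected.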
Botond Dénes
81e2a2be7d reader_concurrency_semaphore: make consume() and signal() private
Using this API is quite dangerous as any mistakes can lead to leaking
resources from the semaphore. Also, soon we will tie this API closer to
permits, so they won't be as generic. Make them private so we don't have
to worry about correct usage. All external users are patched away
already.
2023-01-17 05:27:04 -05:00
Botond Dénes
8f9e8aafdf reader_concurrency_semaphore: move consume() out-of-line
It's about to get a little bit more complex.
2023-01-17 05:27:04 -05:00
Botond Dénes
9ed5d861be reader_concurrency_semaphore: add request_memory()
A possibly blocking request for more memory. If the collective memory
consumption of all reads goes above
$serialize_limit_multiplier * $memory_limit this request will block for
all but one reader (the first requester). Until this situation is
resolved, that is, while memory stays above the limit explained above,
only this one reader is allowed to make progress. This should help rein
in the memory consumption of reads in a situation where their memory
consumption used to balloon without constraints before.
2023-01-17 05:27:04 -05:00
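The serialization rule of request_memory() in commit 9ed5d861be can be modeled synchronously. In the sketch below, a return value of `must_wait` stands in for a request that would block in the real, future-based API. The class `serializing_tracker`, the integer permit ids, and the moment at which the privileged reader is released are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy; blocking is modeled by returning outcome::must_wait.
class serializing_tracker {
    std::int64_t _threshold;
    std::int64_t _consumed = 0;
    std::optional<int> _privileged; // the one reader allowed past the limit
public:
    enum class outcome { granted, must_wait };

    serializing_tracker(std::int64_t memory_limit, std::int64_t serialize_limit_multiplier)
        : _threshold(memory_limit * serialize_limit_multiplier) {
        assert(memory_limit > 0 && serialize_limit_multiplier > 0);
    }

    outcome request_memory(int permit_id, std::int64_t bytes) {
        if (_consumed + bytes > _threshold) {
            if (!_privileged) {
                _privileged = permit_id; // first requester past the limit
            }
            if (*_privileged != permit_id) {
                return outcome::must_wait;
            }
        }
        _consumed += bytes;
        return outcome::granted;
    }

    void release_memory(std::int64_t bytes) {
        _consumed -= bytes;
        if (_consumed <= _threshold) {
            _privileged.reset(); // situation resolved; everyone may proceed
        }
    }

    std::int64_t consumed() const { return _consumed; }
};
```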
Botond Dénes
969beebe5f reader_concurrency_semaphore: wrap wait list
The wait list will become two lists soon. To keep callers simple (as if
there was still one list) we wrap it with a wrapper which abstracts this
away.
2023-01-16 02:05:27 -05:00
Botond Dénes
8658cfc066 reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters
Propagate the recently added
reader_concurrency_semaphore_{serialize,kill}_limit_multiplier config items
to the semaphore. Not used yet.
2023-01-16 02:05:27 -05:00