scylladb/docs/dev/reader-concurrency-semaphore.md
Botond Dénes f017e9f1c6 docs: document the reader concurrency semaphore diagnostics dump
The diagnostics dumped by the reader concurrency semaphore are pretty
common-sight in logs, as soon as a node becomes problematic. The reason
is that the reader concurrency semaphore acts as the canary in the coal
mine: it is the first that starts screaming when the node or workload is
unhealthy. This patch adds documentation of the content of the
diagnostics and how to diagnose common problems based on it.

Fixes: #10471

Closes #11970
2022-12-06 16:24:44 +02:00

Reader concurrency semaphore

The role of the reader concurrency semaphore is to keep resource consumption of reads under a certain limit. Each read has to obtain a permit before it is started. Permits are only issued when there are available resources to start a new read. For more details on its API, check reader_concurrency_semaphore.hh.

There is a separate reader concurrency semaphore for each scheduling group:

  • statement (user reads) - 100 count and 2% of shard memory (queue size: 2% memory / 1KB)
  • system (internal reads) - 10 count and 2% of shard memory (no queue limit)
  • streaming (maintenance operations) - 10 count and 2% of shard memory (no queue limit)
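The limits listed above can be worked out for a given shard with a few lines of arithmetic. The following is a minimal sketch, not code from the project: the function name `semaphore_limits`, the dictionary layout, and the reading of "1KB" as 1024 bytes are all assumptions made for illustration.

```python
KIB = 1024  # assumption: "1KB" in the list above means 1024 bytes


def semaphore_limits(shard_memory_bytes: int) -> dict:
    """Return illustrative per-scheduling-group limits for one shard."""
    memory_limit = shard_memory_bytes * 2 // 100  # 2% of shard memory
    return {
        # user reads: bounded wait queue
        "statement": {"count": 100, "memory": memory_limit,
                      "max_queue": memory_limit // KIB},
        # internal reads and maintenance operations: no queue limit
        "system": {"count": 10, "memory": memory_limit, "max_queue": None},
        "streaming": {"count": 10, "memory": memory_limit, "max_queue": None},
    }


# A hypothetical shard with 10000 MiB of memory.
limits = semaphore_limits(10_000 * 1024 * 1024)
print(limits["statement"])
```

For this hypothetical shard size the memory limit comes out as 209715200 bytes, which happens to equal the memory limit shown in the example diagnostics dump later in this document.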

There are 3 main ways to create permits:

  • obtain_permit() - this is the most generic way to obtain a permit. The method creates a permit, waits for admission (if necessary) and then returns the permit to be used.
  • with_permit() - the permit is created and then waits for admission, as with obtain_permit(). But instead of returning the admitted permit, this method runs the functor passed in as its func parameter once the permit is admitted. This facilitates batch-running cache reads. If a permit is already available (e.g. when resuming a saved paged read), with_ready_permit() can be used to benefit from the batching.
  • make_tracking_only_permit() - make a permit that bypasses admission and is only used to keep track of the memory consumption of a read. Used in places that don't want to wait for admission.
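The difference between the three entry points can be illustrated with a toy model. This is a sketch in Python's asyncio, not the real C++ API: only the method names `obtain_permit`, `with_permit` and `make_tracking_only_permit` come from the text above, while `ToySemaphore`, `ToyPermit`, `consume` and `release` are hypothetical. The toy admits on the count resource only and does not model memory-based admission, timeouts, or the batching of cache reads.

```python
import asyncio


class ToyPermit:
    def __init__(self, sem, admitted):
        self._sem = sem
        self._admitted = admitted  # False for tracking-only permits
        self._memory = 0

    def consume(self, nbytes):
        # Memory is tracked for every permit, admitted or not.
        self._memory += nbytes
        self._sem.tracked_memory += nbytes

    def release(self):
        self._sem.tracked_memory -= self._memory
        self._memory = 0
        if self._admitted:
            self._sem._slots.release()
            self._admitted = False


class ToySemaphore:
    def __init__(self, count):
        self._slots = asyncio.Semaphore(count)
        self.tracked_memory = 0

    async def obtain_permit(self):
        await self._slots.acquire()  # waits for admission if necessary
        return ToyPermit(self, admitted=True)

    async def with_permit(self, func):
        permit = await self.obtain_permit()
        try:
            return await func(permit)  # runs only once admitted
        finally:
            permit.release()

    def make_tracking_only_permit(self):
        return ToyPermit(self, admitted=False)  # bypasses admission


async def demo():
    sem = ToySemaphore(count=1)
    order = []

    async def read(name, permit):
        order.append(name)
        permit.consume(128 * 1024)
        return name

    first = await sem.obtain_permit()  # admitted immediately
    waiter = asyncio.create_task(sem.with_permit(lambda p: read("queued", p)))
    await asyncio.sleep(0)  # let the waiter run; it has to queue
    tracker = sem.make_tracking_only_permit()  # never waits
    tracker.consume(4096)
    order.append("tracking-only")
    first.release()  # frees the slot, admitting the queued read
    await waiter
    tracker.release()
    return order, sem.tracked_memory


print(asyncio.run(demo()))
```

In this run the tracking-only permit proceeds while the `with_permit()` read is still queued behind the first permit, which is the behaviour the list above describes for places that cannot wait for admission.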

A permit is admitted if the following conditions are met:

  • There are enough resources to admit the read. Currently, each permit takes 1 count resource and 128K memory resource on admission.
  • There are no reads which currently only need CPU to make further progress. Permits can opt in to this criterion (blocking other permits from being admitted while they need more CPU) by being marked as "used".
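The two admission conditions can be written as a single predicate. The sketch below is a simplification under stated assumptions: `SemaphoreState` and `can_admit` are hypothetical names, "128K" is taken to mean 128 * 1024 bytes, and the real implementation may apply additional rules that are not described in this document.

```python
from dataclasses import dataclass

ADMISSION_COUNT = 1            # per the text: each permit takes 1 count
ADMISSION_MEMORY = 128 * 1024  # per the text: 128K (assumed to be KiB)


@dataclass
class SemaphoreState:
    count_used: int
    count_limit: int
    memory_used: int
    memory_limit: int
    used_permits: int  # permits marked "used", i.e. needing only CPU


def can_admit(s: SemaphoreState) -> bool:
    # Condition 1: enough count and memory resources for one more read.
    has_resources = (
        s.count_used + ADMISSION_COUNT <= s.count_limit
        and s.memory_used + ADMISSION_MEMORY <= s.memory_limit
    )
    # Condition 2: no admitted read is currently waiting only for CPU.
    no_cpu_bound_reads = s.used_permits == 0
    return has_resources and no_cpu_bound_reads


# Resources are available, but one "used" permit blocks admission.
print(can_admit(SemaphoreState(35, 100, 14858525, 209715200, used_permits=1)))
# Same resources with no "used" permit: admission is possible.
print(can_admit(SemaphoreState(35, 100, 14858525, 209715200, used_permits=0)))
```

The second condition is why a semaphore can refuse new reads while most of its count and memory are still free.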

Reader concurrency semaphore diagnostic dumps

When a read waiting to obtain a permit times out, or if the wait queue of the reader concurrency semaphore overflows, the reader concurrency semaphore will dump diagnostics to the logs, with the aim of helping users to diagnose the problem. Example diagnostics dump:

[shard 1] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 35/100 count and 14858525/209715200 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/description/state
34       34     14M     ks1.table1_mv_0/data-query/active/blocked
1        1      16K     ks1.table1_mv_0/data-query/active/used
7        0      0B      ks1.table1/data-query/waiting
1251     0      0B      ks1.table1_mv_0/data-query/waiting

1293     35     14M     total

Total: 1293 permits with 35 count and 14M memory resources

Note that diagnostics dumps are rate limited to one every 30 seconds (as timeouts usually come in bursts). You might also see a log message to this effect.

The dump contains the following information:

  • The semaphore's name: _read_concurrency_sem;
  • Currently used count resources: 35;
  • Limit of count resources: 100;
  • Currently used memory resources: 14858525;
  • Limit of memory resources: 209715200;
  • Dump of the permit states;
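When going through many logs, the header fields listed above can be extracted mechanically. The following sketch assumes the header always has the shape shown in the example dump; the regular expression and the `parse_header` helper are hypothetical and may need adjusting for other versions or message variants.

```python
import re

# Pattern derived from the single example line in this document (assumption).
HEADER_RE = re.compile(
    r"Semaphore (?P<name>\S+) with "
    r"(?P<count_used>\d+)/(?P<count_limit>\d+) count and "
    r"(?P<mem_used>\d+)/(?P<mem_limit>\d+) memory resources: "
    r"(?P<reason>[^,]+),"
)


def parse_header(line: str) -> dict:
    m = HEADER_RE.search(line)
    if m is None:
        raise ValueError("not a semaphore diagnostics header")
    fields = m.groupdict()
    for key in ("count_used", "count_limit", "mem_used", "mem_limit"):
        fields[key] = int(fields[key])
    return fields


line = (
    "[shard 1] reader_concurrency_semaphore - Semaphore _read_concurrency_sem "
    "with 35/100 count and 14858525/209715200 memory resources: timed out, "
    "dumping permit diagnostics:"
)
header = parse_header(line)
print(header)
```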

Permits are grouped by table, description, and state, and the groups are sorted by memory consumption. The first group in this example contains 34 permits, all for reads against table ks1.table1_mv_0, all of them data-query reads in the active/blocked state.
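The permit groups can be parsed in the same spirit. This is a sketch under assumptions: columns are taken to be whitespace-separated, the table name and description are assumed to contain no "/", and the size suffixes are assumed to be binary units (K = 1024 bytes). `parse_size` and `parse_rows` are hypothetical helper names.

```python
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}  # assumed binary units


def parse_size(text: str) -> int:
    return int(float(text[:-1]) * UNITS[text[-1]])


def parse_rows(dump: str) -> list:
    rows = []
    for line in dump.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # column header, blank line or summary line
        permits, count, memory, group = parts
        if group.count("/") < 2:
            continue  # the "total" row has no table/description/state
        table, description, state = group.strip().split("/", 2)
        rows.append({
            "permits": int(permits), "count": int(count),
            "memory": parse_size(memory),
            "table": table, "description": description, "state": state,
        })
    return rows


DUMP = """\
permits count memory table/description/state
34 34 14M ks1.table1_mv_0/data-query/active/blocked
1 1 16K ks1.table1_mv_0/data-query/active/used
7 0 0B ks1.table1/data-query/waiting
1251 0 0B ks1.table1_mv_0/data-query/waiting

1293 35 14M total
"""
rows = parse_rows(DUMP)
print(sum(r["permits"] for r in rows), sum(r["count"] for r in rows))
```

Summing the parsed groups reproduces the totals in the dump (1293 permits holding 35 count), which is a quick sanity check that no group line was skipped.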

Permits have the following states:

  • waiting - the permit is waiting for admission;
  • active/unused - the permit was admitted but doesn't participate in CPU based admission;
  • active/used - the permit was admitted and it participates in CPU based admission;
  • active/blocked - a previously active/used permit which needs something other than CPU to proceed; it is waiting on I/O or a remote shard;
  • inactive - the read was marked inactive, it can be evicted to make room for admitting more permits if needed;
  • evicted - the read was inactive and then evicted;

The dump can reveal what the bottleneck holding up the reads is:

  • CPU - there will be one active/used permit (there might be active/blocked and active/unused permits too), both count and memory resources are available (not maxed out);
  • Disk - the count resource is maxed out, with active/blocked permits using up all of it;
  • Memory - memory resource is maxed out (usually even above the limit);

There might be inactive reads if CPU is a bottleneck; otherwise, there shouldn't be any (they should be evicted to free up resources).
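The rules of thumb above can be turned into a rough first-pass classifier. The sketch below is a heuristic only: `guess_bottleneck` is a hypothetical helper, the order in which the rules are checked is an assumption, and a real diagnosis should still look at the whole dump.

```python
def guess_bottleneck(count_used, count_limit, mem_used, mem_limit, count_by_state):
    """count_by_state maps a permit state to the count resources it holds."""
    blocked = count_by_state.get("active/blocked", 0)
    used = count_by_state.get("active/used", 0)
    # Memory: the memory resource is maxed out (often above the limit).
    if mem_used >= mem_limit:
        return "memory"
    # Disk: active/blocked permits use up all count resources.
    if blocked >= count_limit:
        return "disk"
    # CPU: a "used" permit exists while count and memory are both available.
    if used >= 1 and count_used < count_limit:
        return "cpu"
    return None  # none of the patterns described above matched


# Values taken from the example dump in this document.
print(guess_bottleneck(
    35, 100, 14858525, 209715200,
    {"active/blocked": 34, "active/used": 1, "waiting": 0},
))
```

By these heuristics the example dump is classified as CPU-bound: there is one active/used permit while both count (35/100) and memory are well below their limits.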