Files
scylladb/tools
Piotr Sarna 3d816b7c16 Merge 'Move the reader concurrency semaphore in front of the cache' from Botond
This patchset combines two important changes to the way reader permits
are created and admitted:
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) Currently permits are created before the read is started, but they
only wait for admission when going to the disk. This leaves the
resources consumption of cache and memtables reads unbounded, possibly
leading to OOM (rare but happens). This series changes this that permits
are admitted at the moment they are creating making admission up-front
-- at least those reads that pass admission at all (some don't).

(2) Admission currently is based on availability of resources. We have a
certain amount of memory available, which derived from the memory
available to the shard, as well a hardcoded count resource. Reads are
admitted when a count and a certain amount (base cost) of memory is
available. This patchset adds a new aspect to this admission process
beyond the existing resource availability: the number of used/blocked
reads. Namely it only admits new reads if in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words we only admit new reads if all currently
admitted reads requires something other than CPU to progress. They are
either waiting on I/O, a remote shard, or attention from their consumers
(not used currently).

The reason for making these two changes at the same time is that
up-front admission means cache reads now need to obtain a permit too.
For cache reads the optimal concurrency is 1. Anything above that just
increases latency (without increasing throughput). So we want to make sure
that if a cache reader hits it doesn't get any competition for CPU and
it can run to completion. We admit new reads only if the read misses and
has to go to disk.

A side effect of these changes is that the execution stages from the
replica-side read path are replaced with the reader concurrency
semaphore as an execution stage. This is necessary due to bad
interaction between said execution stages and up-front admission. This
has an important consequence: read timeouts are more strictly enforced
because the execution stage doesn't have a timeout so it can execute
already timed-out reads too. This is not the case with the semaphore's
queue which will drop timed-out reads. Another consequence is that, now
data and mutation reads share the same execution stage, which increases
its effectiveness, on the other hand system and user reads don't
anymore.

Fixes: #4758
Fixes: #5718

Tests: unit(dev, release, debug)

* 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits)
  test/boost/reader_concurrency_semaphore_test: add used/blocked test
  test/boost/reader_concurrency_semaphore_test: add admission test
  reader_permit: add operator<< for reader_resources
  reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
  table: make_sstable_reader(): fix indentation
  table: clean up make_sstable_reader()
  database: remove now unused query execution stages
  mutation_reader: remove now unused restricting_reader
  sstables: sstable_set: remove now unused make_restricted_range_sstable_reader()
  reader_permit: remove now unused wait_admission()
  reader_concurrency_semaphore: remove now unused obtain_permit_nowait()
  reader_concurrency_semaphore: admission: flip the switch
  database: increase semaphore max queue size
  test: index_with_paging_test: increase semaphore's queue size
  reader_concurrency_semaphore: add set_max_queue_size()
  test: mutation_reader_test: remove restricted reader tests
  reader_concurrency_semaphore: remove now unused make_permit()
  test: reader_concurrency_semaphore_test: move away from make_permit()
  test: move away from make_permit()
  treewide: use make_tracking_only_permit()
  ...
2021-07-14 16:22:56 +02:00
..
2021-07-12 20:07:00 +02:00
2021-06-18 14:19:29 +03:00