Add the following counters:
(1) querier_cache_lookups
(2) querier_cache_misses
(3) querier_cache_drops
(4) querier_cache_time_based_evictions
(5) querier_cache_resource_based_evictions
(6) querier_cache_memory_based_evictions
(6) querier_cache_population
(1) counts the total number of querier cache lookups. Not all
page-fetches will result in a querier lookup. For example the first page
of a query will not do a lookup as there was no previous page to reuse
the querier from. The second, and all subsequent pages however should
attempt to reuse the querier from the previous page.
(2) counts the subset of (1) where the read have missed the querier
cache (failed to find a matching saved querier).
(3) counts the subset of (1) where the querier was recalled and dropped
immediately. This can happen for example if the querier was at the wrong
position.
(4) counts the cached queriers that were evicted due to their TTL
expiring.
(5) counts the cached queriers that were evicted due to reader-resource
(those limited by reader-concurrency limits) shortage.
(6) counts the cached queriers that were evicted due to reaching the
cache's memory limits (currently set to 4% of the shards' memory).
(7) is the current number of entries in the cache
Note:
* The count of cache hits can be derived from these counters as
(1) - (2).
* cache_drop (3) also implies a cache hit (see above). This means that
the number of actually reused queriers is:
(1) - (2) - (3)
Readers serving user-reads need to obtain a permit to start reading.
There exists a restriction on how much active readers can be admitted
based on their count and their memory onsumption.
Since the saved readers of cached queriers are techically active (they
hold a permit) they can block new readers from obtaining a permit.
New readers have a higher priority because a cached reader might be
abandoned or used later at best so in the face of memory pressure we
evict cached readers to free up permits for new readers.
Cached queriers are evicted in LRU order as the oldest queriers are the
most likely to be evicted based on their TTL anyway.
Use the querier_cache (represented by the passed-in
querier_cache_context) object to lookup saved queriers at the start of
the page and save them at the end of it if it is likely that there will
be more page requests.
"
Adds extension points to schema/sstables to enable hooking in
stuff, like, say, something that modifies how sstable disk io
works. (Cough, cough, *encryption*)
Extensions are processed as property keywords in CQL. To add
an extension, a "module" must register it into the extensions
object on boot time. To avoid globals (and yet don't),
extensions are reachable from config (and thus from db).
Table/view tables already contain an extension element, so
we utilize this to persist config.
schema_tables tables/views from mutations now require a "context"
object (currently only extensions, but abstracted for easier
further changes.
Because of how schemas currently operate, there is a super
lame workaround to allow "schema_registry" access to config
and by extension extensions. DB, upon instansiation, calls
a thread local global "init" in schema_registry and registers
the config. It, in turn, can then call table_from_mutations
as required.
Includes the (modified) patch to encapsulate compression
into objects, mainly because it is nice to encapsulate, and
isolate a little.
"
* 'calle/extensions-v5' of github.com:scylladb/seastar-dev:
extensions: Small unit test
sstables: Process extensions on file open
sstables::types: Add optional extensions attribute to scylla metadata
sstables::disk_types: Add hash and comparator(sstring) to disk_string
schema_tables: Load/save extensions table
cql: Add schema extensions processing to properties
schema_tables: Require context object in schema load path
schema_tables: Add opaque context object
config_file_impl: Remove ostream operators
main/init: Formalize configurables + add extensions to init call
db::config: Add extensions as a config sub-object
db::extensions: Configuration object to store various extensions
cql3::statements::property_definitions: Use std::variant instead of any
sstables: Add extension type for wrapping file io
schema: Add opaque type to represent extensions
sstables::compress/compress: Make compression a virtual object
Make sure idx will not be equal to _control_points.size() (and thus
overflow the vector) when looking for the first control-point with
a backlog not smaller then the current one, by stopping when it's equal
to _control_points.size() - 1.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <47841592792573d820650d570fa1ab7e58bdac2c.1518700405.git.bdenes@scylladb.com>
Before 312bd9ce25, boot had to call all shards for each sstable
such that they would agree/disagree on their deletion, an atomic
deletion manager requirement.
After its removal, we can afford to call only the shards that own
a given sstable.
Reducing the operation on each sstable from (SSTABLES) * (SHARD_COUNT)
to usually (SSTABLES). It may be the same as before after resharding,
but resharding is an one-off operation.
Boot time should be significantly reduced for nodes with a high smp
count and column family using leveled strategy (which can end up with
thousands of sstables).
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180220032554.17776-1-raphaelsc@scylladb.com>
"The motivation is that it's no longer needed after new resharding
algorithm that is the sole responsible for working with shared
sstables and regular compaction will not work with those!
So resharding will schedule deletion of shared sstables once it's
certain that shards that own them have the new unshared sstables.
The manager was needed for orchestrating deletion of shared sstable
across shards. It brings extra complexity that's not longer needed,
and it was also overloading shard 0, but the latter could have
been fixed.
Tests:
- unit: release mode
- dtest: resharding_test.py"
* 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla:
Remove SSTable's atomic deletion manager
Stop using SSTable's atomic deletion manager
database: split column_family::rebuild_sstable_list
The motivation is that it's no longer needed after new resharding
algorithm that is the sole responsible for working with shared
sstables and regular compaction will not work with those!
So resharding will schedule deletion of shared sstables once it's
certain that shards that own them have the new unshared sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The motivation is that resharding will not want the code that is
specific to regular compaction after atomic deletion is removed.
Resharding will eventually only need to replace old tables with
new ones, and it will be in charge of deletion of old tables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We have had so far an I/O controller, for compactions and memtables, and
a CPU controller, for memtables only -- since the scheduling was still
quota-based.
Now that the CPU scheduler is fully functional, it is time to do away
with the differences and integrate them both into one. We now have a
memtable controller and a compaction controller, and they control both
CPU and I/O.
In the future, we may want to control processes that don't do one of
them, like cache updates. If that ever happens, we'll try to make
controlling one of them optional. But for now, since the I/O and CPU
controllers for our main two processes would look exactly the same we
should integrate them.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have merged the I/O controller without this, but we want to integrate
the CPU and I/O controllers into one. Currently, the quota can be
statically set for the CPU controller. For now, until we gain more
experience with it we should allow a static value to override the
controller's output as well.
That is particularly important since we don't yet control some
strategies like LCS and the time-based ones. Users in the field may be
using one of those strategies with a static value for background quota.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
It no longer makes sense now that we have the full scheduler +
controllers. In its lieu, we will provide an option to statically set
the controller's shares as a safe guard against us getting this wrong.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Because execution stages defer and batch processing of the function
they run, they escape their fiber's context and therefore the
scheduling group.
Fix (for data_query) by initializing the execution_stage with the
query scheduling_group. To do that we have to move the execution
stage into the database object, so it has access to the scheduling
group during initialization.
Set up scheduling groups for streaming, compaction, memtable flush, query,
and commitlog.
The background writer scheduling group is retired; it is split into
the memtable flush and compaction groups.
Comments from Glauber:
This patch is based in a patch from Avi with the same subject, but the
differences are signficant enough so that I reset authorship. In
particular:
1) A bug/regression is fixed with the boundary calculations for the
memtable controller sampling function.
2) A leftover is removed, where after flushing a memtable we would
go back to the main group before going to the cache group again
3) As per Tomek's suggestion, now the submission of compactions
themselves are run in the compaction scheduling group. Having that
working is what changes this patch the most: we now store the
scheduling group in the compaction manager and let the compaction
manager itself enforce the scheduling group.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
thread_scheduling_groups are converted to plain scheduling_group. Due to
differences in initialization (scheduling_group initializtion defers), we
create the scheduling_groups in main.cc and propagate them to users via
a new class database_config.
The sstable writer loses its thread_scheduling_group parameter and instead
inherits scheduling from its caller.
Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas,
the flush controller was adjusted to return values within the higher ranges.
Requires "workaround" fix for schema_registry and frozen_mutation, since
the former is a free-float thread local, and the latter is a pure data
carrier. frozen_schema can take a parameter for unfreeze, but schema
registry requires being told which the system extensions are.
Introduce class result_options to carry result options through the
request pipeline, which at this point mean the result type and the
digest algorithm. This class allows us to encapsulate the concrete
digest algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Before this patch, our Materialized Views implementation can produce
incorrect results when given concurrent updates of the same base-table
row. Such concurrent updates may result, in certain cases, in two
different rows added to the view table, instead of just one with the latest
data. In this patch we we add locking which serializes the two conflicting
updates, and solves this problem. The locking for a single base-table
column_family is implemented by the row_locker class introduced in a
previous patch.
A long comment in the code of this patch explains in more detail why
this locking is needed, when, and what types of locks are needed: We
sometimes need to lock a single clustering row, sometimes an entire
partition, sometimes an exclusive lock and sometimes a shared lock.
Fixes#3168
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The labels in database active_reads metrics where not define correctly.
Label should be created so it will be possible to select based on their
value.
The current implementation define a label "class" with three instances:
user, streaming, system.
Fixes: #2770
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180123125206.23660-1-amnon@scylladb.com>
After the new compaction controller code, the monitor has to be kept
alive until the sstable is added to the SSTable set.
This is correctly handled for all the writers, except the streaming big.
That flusher is a big confusing, as it builds an sstable list first and
only later adds the elements in the list to the sstable set. The
monitors are destroyed at the end of phase 1, so we will SIGSEGV later
when calling add_sstable().
The fix for this is to make sure the lifetime of the monitors are tied
to the lifetime of the sstables being handled big the big streaming
flush process.
Caught by dtests, update_cluster_layout_tests.py:TestUpdateClusterLayout.add_node_with_large_partition3_test
Fixes#3131
Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout.add_node_with_large_partition3_test now passes.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180118202230.17107-1-glauber@scylladb.com>
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
wan't no timeout there at all.
We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
the the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes#2462
Implementation notes, and general comments about open discussion in 2462:
* Because we are not bypassing the timeout, just setting it high enough,
I consider the concerns about the batchlog moot: if we fail for any
other reason that will be propagated. Last case, because the timeout
is per-CF, we could do what we do for the dirty memory manager and
move the batchlog alone to use a different timeout setting.
* Storage proxy likes specifying its timeouts as a time_point, whereas
when we get low enough as to deal with the read_concurrency_config,
we are talking about deltas. So at some point we need to convert time_points
to durations. We do that in the database query functions.
v2:
- use per-request instead of per-table timeouts.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch enables passing a timeout to the restricted_mutation_reader
through the read path interface -- using fill_buffer and friends. This
will serve as a basis for having per-timeout requests.
The config structure still has a timeout, but that is so far only used
to actually pass the value to the query interface. Once that starts
coming from the storage proxy layer (next patch) we will remove.
The query callers are patched so that we pass the timeout down. We patch
the callers in database.cc, but leave the streaming ones alone. That can
be safely done because the default for the query path is now no_timeout,
and that is what the streaming code wants. So there is no need to
complicate the interface to allow for passing a timeout that we intend
to disable.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In the last patch, we enabled per-request timeouts, we enable timeouts
in fill_buffer. There are many places, though, in which we
fast_forward_to before we fill_buffer, so in order to make that
effective we need to propagate the timeouts to fast_forward_to as well.
In the same way as fill_buffer, we make the argument optional wherever
possible in the high level callers, making them mandatory in the
implementations.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
At the moment, various different subsystems use their different
ideas of what a timeout_clock is. This makes it a bit harder to pass
timeouts between them because although most are actually a lowres_clock,
that is not guaranteed to be the case. As a matter of fact, the timeout
for restricted reads is expressed as nanoseconds, which is not a valid
duration in the lowres_clock.
As a first step towards fixing this, we'll consolidate all of the
existing timeout_clocks in one, now called db::timeout_clock. Other
things that tend to be expressed in terms of that clock--like the fact
that the maximum time_point means no timeout and a semaphore that
wait()s with that resolution are also moved to the common header.
In the upcoming patch we will fix the restricted reader timeouts to
be expressed in terms of the new timeout_clock.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"This patchset implements the compaction controller for I/O shares. The
goal is to automatic adjust compaction shares based on a
strategy-specific backlog. A higher backlog will translate into higher
shares.
As compaction progresses, that reduces the backlog. As new data is
flushed, that increases the backlog. The goal of the controler is to
keep the backlog constant at a certain rate, so that we don't go neither
too fast or too slow.
Tracking reads and writes:
==========================
Tracking of reads and writes happen through the read_monitor and the
write_monitor. The write monitor is an existing interface that has the
purpose of releasing the write permit at particular points of the write
process. We enhance it so to get a reference to an instance that tracks
the current offset inside the sstables::file_writer. This way the
backlog tracker can always know for sure what's the offset of the
current write.
A similar thing is done for reads. The data_consumer already tracks the
position of the current read, and we isolate that into a structure to
which we can get a reference. A read_monitor allows us to connect the
compaction to that reference.
Lifetime management:
====================
In general, tracking objects will be owned by their callers and passed
down as references. The compaction object will own the read monitors and
the compaction write monitors and the memtable flush write monitor will
be kept alive in a do_with block around the flush itself.
The backlog_{write,read}_progress_manager needs to be kept alive until
the SSTable is no longer in progress. For writes, that means until we
are able to add the SSTable charges in full, and for reads (compaction)
that means until we are able to remove the charges in full.
It is important to do that to avoid spikes in the graph. If we remove
the progress managers in a different operation than updating the SSTable
list we will be left in a temporary state where charges appear or
disappear abruptly, to be fixed when the final
add_sstable/remove_sstable happens. So we want those things to happen
together.
The compaction_backlog_tracker is kept alive until the strategy changes,
for example, through ALTER TABLE. Current charges are transferred to the
new strategy's compaction_backlog_tracker object when we do that. If the
type of strategy changes, the current read charges are forgotten. We can
do that because those running compaction will not really contribute to
decrease the backlog of the new compaction strategy.
Tranfer of Charges
==================
When ALTER TABLE happens, we need to transfer ongoing writes to the new
backlog manager. Ongoing reads will still be tracked by the
backlog_manager that originated them.
The rationale for that is that reads still belong to the current
compaction, with the strategy that generated them. But new Tables being
written will add to the backlog of the new strategy.
Note that ALTER TABLE operations not necessarily cause a change of
Strategy. We can be using the same strategy but just changing
properties. If that is the case, we expect no discontinuity in the
backlog graph (tested).
Resharding
==========
Resharding compactions are more complex than normal compactions because
the SSTables are created in one shard and later sent to another shard.
It is better, then, to track resharding compactions separately and let
them have their own backlog tracker, which will insert backlog in
proportion to the amount of data to be resharded.
Memtable Flush I/O Controller
=============================
With the current infrastructure it becomes trivial to add a new
controller, for either I/O or CPU. This patchset then adds an I/O
controller for memtable flushes, using the same backlog algorithm that
we already used for CPU."
* 'compaction-controller-io-v5' of github.com:glommer/scylla:
database: add a controller for I/O on memtable flushes.
document the compaction controller
compaction: adjust shares for compactions
backlog_controllers: implement generic I/O controller
factor out some of the controller code
io shares: multiply all shares by 10
compaction_strategy: implement backlog manager for the SizeTiered strategy
infrastructure for backlog estimator for compaction work.
sstables: notify about end of data component write
sstables: add read_monitor_generator
sstables: add read_monitor
sstables: enhance data consumer with a position tracker
sstables: enhance the file_writer with an offset tracker
sstables: pass references instead of pointers for write_monitor
compaction: control destruction of readers
The algorithm and principle of operation is the same as the CPU
controller. It is, however, always enabled and we will operate on
I/O shares.
I/O-bound workloads are expected to hit the maximum once virtual
dirty fills up and stay there while the load is steady.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compactions can be a heavy disk user and the I/O scheduler can always
guarantee that it uses its fair share of disk.
Such fair share can, however, be a lot more than what compaction indeed
need. This patch draws on the controllers infrastructure to adjust the
I/O shares that the compaction class will get so that compaction
bandwidth is dynamically adjusted.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The control algorithm we are using for memtables have proven itself
quite successful. We will very likely use the same for other processes,
like compactions.
Make the code a bit more generic, so that a new controller has to only
set the desired parameters
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The issue is triggered by compaction of sstables of level higher than 0.
The problem happens when interval map of partitioned sstable set stores
intervals such as follow:
[-9223362900961284625 : -3695961740249769322 ]
(-3695961740249769322 : -3695961103022958562 ]
When selector is called for first interval above, the exclusive lower
bound of the second interval is returned as next token, but the
inclusivess info is not returned.
So reader_selector was returning that there *were* new readers when
the current token was -3695961740249769322 because it was stored in
selector position field as inclusive, but it's actually exclusive.
This false positive was leading to infinite recursion in combined
reader because sstable set's incremental selector itself knew that
there were actually *no* new readers, and therefore *no* progress
could be made.
Fix is to use ring_position in reader_selector, such that
inclusiveness would be respected.
So reader_selector::has_new_readers() won't return false positive
under the conditions described above.
Fixes#2908.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This patch adds infrastucture in various points in the system to allow
us to determine the amount of work present as backlog from compactions.
What needs to be done can be explained in three major pieces:
1) Add hooks in the points where sstables are added or inserted to a
column family (or more precisely, to a compaction_strategy object).
2) Add hooks in reads and write monitors that allows a compaction
backlog estimator (tracker) to become aware of bytes that are
partially written and compacted away.
3) Add a per-column family class (compaction_backlog_tracker) that
can be used to track work that is done and relevant to compactions
(like the two above), and a compaction manager to provide a
system-wide backlog based on the response of the individual trackers.
The definition of how much backlog one has is strategy-specific. The
Null strategy is easy, as it never really has any backlog, and so is the
major strategy - since what it really matters is the backlog of the
underlying compaction strategy.
Although backlogs are strategy-specific, they should be "compatible", in
the sense that if a particular strategy has more work to do, it should
yield a higher number than its counterparts.
All the others are presented in this patch as unimplemented: they will
always advertise a mild backlog that should yield a constant
CPU-utilization if used alone.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We need to notify the monitor that the offset tracker that we are using is
about to be destroyed and will no longer be valid.
While we could modify the file_writer interface so that we could capture
the offset_tracker and take ownership of it - guaranteeing it is alive
until we reach the existing on_write_completed(), this feels like a
layer violation.
It is also potentially useful in general to offer the monitor callers
with knowledge that writing the data portion is done.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Passing the read monitor down to the sstable readers is tricky. The
point of interest - like compaction - are usually very far from the
interfaces that register the monitor, like read_rows. Between the two,
there is usually a mutation_reader, which is and ought to be totally
unaware of the read monitor: technically, a mutation_reader may not even
know it is backed by sstables.
The solution is to create a read_monitor_generator, that can be passed
from the upper layers, like compaction, to the layers that are actually
making the decision of which sstables to create readers for.
Note that we don't need an equivalent piece of infrastructure for
writes, because writes don't happen through hidden layers and have all
the information they need to initialize their monitors.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Callers, like the memtable flusher or compactions will be able to find
out the current amount of bytes written at any time.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This came from Avi's review on the read_monitors. He suggests we
wouldn't keep shared pointers, and would instead have the caller
ensuring lifetime. That makes sense, but having the writer interface
using shared_ptr and the read interface using references would lead to
an inconsistent interface.
For the sake of consistency we will change the write monitor to take
references before we do that. From database.cc's perspective, we could
now keep the monitors in a do_with() block, but we will keep the
shared_ptrs to manage their lifetime in anticipation of upcoming patches
in this series, where we'll have to pass them somewhere else.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"When we get two range tombstones with the same lower bound from
different data sources (e.g. two sstable), which need to be combined
into a single stream, they need to be de-overlapped, because each
mutation fragment in the stream must have a different position. If we
have range tombstones [1, 10) and [1, 20), the result of that
de-overlapping will be [1, 10) and [10, 20]. The problem is that if
the stream corresponds to a clustering slice with upper bound greater
than 1, but lower than 10, the second range tombstone would appear as
being out of the query range. This is currently violating assumptions
made by some consumers, like cache populator.
One effect of this may be that a reader will miss rows which are in
the range (1, 10) (after the start of the first range tombstone, and
before the start of the second range tombstone), if the second range
tombstone happens to be the last fragment which was read for a
discontinuous range in cache and we stopped reading at that point
because of a full buffer and cache was evicted before we resumed
reading, so we went to reading from the sstable reader again. There
could be more cases in which this violation may resurface.
There is also a related bug in mutation_fragment_merger. If the reader
is in forwarding mode, and the current range is [1, 5], the reader
would still emit range_tombstone([10, 20]). If that reader is later
fast forwarded to another range, say [6, 8], it may produce fragments
with smaller positions which were emitted before, violating
monotonicity of fragment positions in the stream.
A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions:
1) relax the assumption (in cache) that streams contain only relevant
range tombstones, and only require that they contain at least all
relevant tombstones
2) allow subsequent range tombstones in a stream to share the same
starting position (position is weakly monotonic), then we don't need
to de-overlap the tombstones in readers.
3) teach combining readers about query restrictions so that they can drop
fragments which fall outside the range
4) force leaf readers to trim all range tombstones to query restrictions
This patch implements solution no 2. It simplifies combining readers,
which don't need to accumulate and trim range tombstones.
I don't like solution 3, because it makes combining readers more
complicated, slower, and harder to properly construct (currently
combining readers don't need to know restrictions of the leaf
streams).
Solution 4 is confined to implementations of leaf readers, but also
has disadvantage of making those more complicated and slower.
There is only one consumer which needs the tombstones with monotonic positions, and
that is the sstable writer.
Fixes #3093."
* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev:
tests: row_cache: Introduce test for concurrent read, population and eviction
tests: sstables: Add test for writing combined stream with range tombstones at same position
tests: memtable: Test that combined mutation source is a mutation source
tests: memtable: Test that memtable with many versions is a mutation source
tests: mutation_source: Add test for stream invariants with overlapping tombstones
tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones
tests: mutation_reader: Test combined reader slicing on random mutations
tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys()
mutation_fragment: Introduce range()
clustering_interval_set: Introduce overlaps()
clustering_interval_set: Extract private make_interval()
mutation_reader: Allow range tombstones with same position in the fragment stream
sstables: Handle consecutive range_tombstone fragments with same position
tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone()
streamed_mutation: Introduce peek()
mutation_fragment: Extract mergeable_with()
mutation_reader: Move definition of combining mutation reader to source file
mutation_reader: Use make_combined_reader() to create combined reader