Commit Graph

229 Commits

Author SHA1 Message Date
Avi Kivity
4082f57edc Merge 'Make commitlog disk limit a hard limit.' from Calle Wilund
Refs #6148

Commitlog disk limit was previously a "soft" limit, in that we allowed allocating new segments, even if we were over
disk usage max. This would also cause us sometimes to create new segments and delete old ones, if badly timed in
needing and releasing segments, in turn causing useless disk IO for pre-allocation/zeroing.

This patch set does:
* Make limit a hard limit. If we have disk usage > max, we wait for delete or recycle.
* Make flush threshold configurable. Default is ask for flush when over 50% usage. (We do not wait for results)
* Make flush "partial". We flush X% of the used space (used - thres/2), and make the rp limit accordingly. This means we will try to clear the N oldest segments, not all. I.e. "lighter" flush. Of course, if the CL is wholly dominated by a single CF, this will not really help much. But when > 1 cf is used, it means we can skip those not having unflushed data < req rp.
* Force more eager flush/recycle if we're out of segments

Note: flush threshold is not exposed in scylla config (yet). Because I am unsure of wording, and even if it should.
Note: testing is sparse, esp. in regard to latency/timeouts added in high usage scenarios. While I can fairly easily provoke "stalls" (i.e. forced waiting for segments to free up) with simple C-S, it is hard to say exactly where in a more sane config (I set my limits looow) latencies will start accumulating.

Closes #7879

* github.com:scylladb/scylla:
  commitlog: Force earlier cycle/flush iff segment reserve is empty
  commitlog: Make segment allocation wait iff disk usage > max
  commitlog: Do partial (memtable) flushing based on threshold
  commitlog: Make flush threshold configurable
  table: Add a flush RP mark to table, and shortcut if not above
2021-02-08 16:44:05 +02:00
Raphael S. Carvalho
e1261d10f1 table: Avoid useless allocations when updating cache on memtable flush completion
we're unconditionally using make_combined_mutation_source(), which causes extra
allocations, even if memtable was flushed into a single sstable, which is the
most common case. memtable will only be flushed into more than one sstable if
TWCS is used and memtable had old data written into it due to out-of-order
writes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210205182028.439948-1-raphaelsc@scylladb.com>
2021-02-06 20:03:33 +02:00
Benny Halevy
22f6023ac3 sstables: sstable_writer_config: add origin member
Add a string describing where the sstables originated
from (e.g. memtable, repair, streaming, compaction, etc.)

If configure_writer is called with a nullptr, the origin
will be equal to an empty string.

Introduce test_env_sstables_manager that provides an overload
of configure_writer with no parmeters that calls the base-class'
configure_writer with "test" origin.  This was to reduce the
code churn in this patch and to keep the tests simple.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-01 16:45:52 +02:00
Botond Dénes
080bc2ffec sstables: pass partition_range to create_single_key_sstable_reader()
We want to unify the various sstable reader creation methods and this
method taking a ring position instead of a partition range like
everybody else stands in the way of that.

This is effect reverts 68663d0de.
2021-01-27 17:38:14 +02:00
Avi Kivity
df3ef800c2 Merge 'Introduce load and stream feature' from Asias He
storage_service: Introduce load_and_stream

=== Introduction ===

This feature extends the nodetool refresh to allow loading arbitrary sstables
that do not belong to a node into the cluster. It loads the sstables from disk
and calculates the owning nodes of the data and streams to the owners
automatically.

From example, say the old cluster has 6 nodes and the new cluster has 3 nodes.
We can copy the sstables from the old cluster to any of the new nodes and
trigger the load and stream process.

This can make restores and migrations much easier.

=== Performance ===

I managed to get 40MB/s per shard on my build machine.
CPU: AMD Ryzen 7 1800X Eight-Core Processor
DISK: Samsung SSD 970 PRO 512GB

Assume 1TB sstables per node, each shard can do 40MB/s, each node has 32
shards, we can finish the load and stream 1TB of data in 13 mins on each
node.

1TB / 40 MB per shard * 32 shard / 60 s = 13 mins

=== Tests ===

backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test
which creates a cluster with 4 nodes and inserts data, then use
load_and_stream to restore to a 2 nodes cluster.

=== Usage ===

curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true

=== Notes ===

Btw, with the old nodetool refresh, the node will not pick up the data
that does not belong to this node but it will not delete it either. One
has to run nodetool cleanup to remove those data manually which is a
surprise to me and probably to users as well. With load and stream, the
process will delete the sstables once it finishes stream, so no nodetool
cleanup is needed.

The name of this feature load and stream follows load and store in CPU world.

Fixes #7831

Closes #7846

* github.com:scylladb/scylla:
  storage_service: Introduce load_and_stream
  distributed_loader: Add get_sstables_from_upload_dir
  table: Add make_streaming_reader for given sstables set
2021-01-18 15:08:19 +02:00
Raphael S. Carvalho
00c29e1e24 table: Move notify_bootstrap_or_replace_*() out of line
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210117045747.69891-9-raphaelsc@scylladb.com>
2021-01-17 10:36:13 +02:00
Calle Wilund
c3d95811da table: Add a flush RP mark to table, and shortcut if not above
Adds a second RP to table, marking where we flushed last.
If a new flush request comes in that is below this mark, we
can skip a second flush.

This is to (in future) support incremental CL flush.
2021-01-05 18:16:09 +00:00
Raphael S. Carvalho
9124a708f1 table: Wire interposer consumer for memtable flush
From now on, memtable flush will use the strategy's interposer consumer
iff split_during_flush is enabled (disabled by default).
It has effect only for TWCS users as TWCS it's the only strategy that
goes on to implement this interposer consumer, which consists of
segregating data according to the window configuration.

Fixes #4617.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-01-04 16:26:07 -03:00
Raphael S. Carvalho
c926a948e5 table: Add write_memtable_to_sstable variant which accepts flat_mutation_reader
This new variant will be needed for interposer consumer.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-01-04 16:23:00 -03:00
Raphael S. Carvalho
32acb44fec table: Allow sstable write permit to be shared across monitors
As a preparation for interposer on flush, let's allow database write monitor
to store a shared sstable write permit, which will be released as soon as
any of the sstable writers reach the sealing stage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-01-04 14:46:43 -03:00
Raphael S. Carvalho
5519fdba72 table: Extend cache update to operate a memtable split into multiple sstables
This extension is needed for future work where a memtable will be segregated
during flush into one sstable or more. So now multiple sstables can be added
to the set after a memtable flush, and compaction is only triggered at the
end.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-01-04 13:24:10 -03:00
Asias He
84f482bde4 table: Add make_streaming_reader for given sstables set
Add a streaming reader that streams from a given sstables set.

Refs #7831
2020-12-30 08:32:42 +08:00
Raphael S. Carvalho
8dd7280107 table: Fix potential reactor stall on LCS compaction completion
On every compaction completion, sstable set is rebuilt from scratch.
With LCS and ~160G of data per shard, it means we'll have to create
a new sstable set with ~1000 entries whenever compaction completes,
which will likely result in reactor stalling for a significant
amount of time.

This is fixed by futurizing build_new_sstable_list(), so it will
yield whenever needed.

Fixes #7758.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-12-28 13:17:50 -03:00
Raphael S. Carvalho
6082da4703 table: decouple preparation from execution when updating sstable set
row cache now allows updater to first prepare the work, and then execute
the update atomically as the last step. let's do that when rebuilding
the set, so now new set is created in the preparation phase, and the
new set replaces the old one in the execution phase, satisfying the
atomicity requirement of row cache.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-12-28 13:17:48 -03:00
Raphael S. Carvalho
43f0200b8f table: change rebuild_sstable_list to return new sstable set
procedure is changed to return the new set, so caller will be responsible
for replacing the old set with the new one. this will allow our future
work where building new set and enabling it will be decoupled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-12-28 13:17:47 -03:00
Raphael S. Carvalho
198b87503f row_cache: allow external updater to decouple preparation from execution
External updater may do some preparatory work like constructing a new sstable list,
and at the end atomically replace the old list by the new one.

Decoupling the preparation from execution will give us the following benefits:
- the preparation step can now yield if needed to avoid reactor stalls, as it's
been futurized.
- the execution step will now be able to provide strong exception guarantees, as
it's now decoupled from the preparation step which can be non-exception-safe.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-12-28 13:17:45 -03:00
Calle Wilund
71c5dc82df database: Verify iff we actually are writing memtables to disk in truncate
Fixes #7732

When truncating with auto_snapshot on, we try to verify the low rp mark
from the CF against the sstables discarded by the truncation timestamp.
However, in a scenario like:

Fill memtables
Flush
Truncate with snapshot A
Fill memtables some more
Truncate
Move snapshot A to upload + refresh (load old tables)
Truncate

The last op will assert, because while we have sstables loaded, which
will be discarded now, we did not in fact generate any _new_ ones
(since memtables are empty), and the RP we get back from discard is
one from an earlier generation set.

(Any permutation of events that create the situation "empty memtable" +
"non-empty sstables with only old tables" will generate the same error).

Added a check that before flushing checks if we actually have any
data, and if not, does not uphold the RP relation assert.

Closes #7799
2020-12-15 16:24:36 +02:00
Piotr Sarna
cd1e351dc1 table: unify waiting for pending operations
In order to reduce code duplication which already caused a bug,
waiting for pending operations is now unified with a single helper
function.
2020-12-15 13:11:25 +01:00
Piotr Sarna
df3204426d database: add a phaser for flush operations
Pending flushes can participate in races when a table
with auto_snapshot==false is dropped. The race is as follows:
1. A flush of table T is initiated
2. The flush operation is preempted
3. Table T is dropped without flushing, because it has auto_snapshot off
4. The flush operation from (2.) wakes up and continues
   working on table T, which is already dropped
5. Segfault/memory corruption

To prevent such races, a phaser for pending flushes is introduced
2020-12-15 12:59:36 +01:00
Avi Kivity
f802356572 Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb""
This reverts commit dc77d128e9. It was reverted
due to a strange and unexplained diff, which is now explained. The
HEAD on the working directory being pulled from was set back, so git
thought it was merging the intended commits, plus all the work that was
committed from HEAD to master. So it is safe to restore it.
2020-12-08 19:19:55 +02:00
Avi Kivity
dc77d128e9 Revert "Merge "raft: fix replication if existing log on leader" from Gleb"
This reverts commit 0aa1f7c70a, reversing
changes made to 72c59e8000. The diff is
strange, including unrelated commits. There is no understanding of the
cause, so to be safe, revert and try again.
2020-12-06 11:34:19 +02:00
Kamil Braun
68663d0de0 sstables: pass ring_position to create_single_key_sstable_reader
instead of partition_range.

It would be best to pass `partition_key` or `decorated_key` here.
However, the implementation of this function needs a `partition_range`
to pass into `sstable_set::select`, and `partition_range` must be
constructed from `ring_position`s. We could create the `ring_position`
internally from the key but that would involve a copy which we want to
avoid.
2020-11-23 12:33:24 +01:00
Kamil Braun
40d8bfa394 sstables: move sstable reader creation functions to sstable_set
Lower level functions such as `create_single_key_sstable_reader`
were made methods of `sstable_set`.

The motivation is that each concrete sstable_set
may decide to use a better sstable reading algorithm specific to the
data structures used by this sstable_set. For this it needs to access
the set's internals.

A nice side effect is that we moved some code out of table.cc
and database.hh which are huge files.
2020-11-19 17:52:39 +01:00
Avi Kivity
5d45662804 database, streaming: remove remnants of memtable-base streaming
Commit e5be3352cf ("database, streaming, messaging: drop
streaming memtables") removed streaming memtables; this removes
the mechanisms to synchronize them: _streaming_flush_gate and
_streaming_flush_phaser. The memory manager for streaming is removed,
and its 10% reserve is evenly distributed between memtables and
general use (e.g. cache).

Note that _streaming_flush_phaser and _streaming_flush_date are
no longer used to syncrhonize anything - the gate is only used
to protect the phaser, and the phaser isn't used for anything.

Closes #7454
2020-11-16 14:32:19 +01:00
Michał Chojnowski
1eb19976b9 database: make changes to durable_writes effective immediately
Users can change `durable_writes` anytime with ALTER KEYSPACE.
Cassandra reads the value of `durable_writes` every time when applying
a mutation, so changes to that setting take effect immediately. That is,
mutations are added to the commitlog only when `durable_writes` is `true`
at the moment of their application.
Scylla reads the value of `durable_writes` only at `keyspace` construction time,
so changes to that setting take effect only after Scylla is restarted.
This patch fixes the inconsistency.

Fixes #3034

Closes #7533
2020-11-06 17:53:22 +01:00
Benny Halevy
82aabab054 table: get rid of reshuffle_sstables
It is unused since 7351db7cab

Refs #6950

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201026074914.34721-1-bhalevy@scylladb.com>
2020-10-26 09:50:21 +02:00
Benny Halevy
70219b423f table: add_sstable: provide strong exception guarantees
Do not leave side-effects on nexception.

Fixes #6658

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201020145429.19426-1-bhalevy@scylladb.com>
2020-10-21 11:40:03 +03:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Avi Kivity
4d6739c2e6 Merge "Use max_concurrent_for_each" from Benny
"
max_concurrent_for_each was added to seastar for replacing
sstable_directory::parallel_for_each_restricted by using
more efficient concurrency control that doesn't create
unlimited number of continuations.

The series replaces the use of sstable_directory::parallel_for_each_restricted
with max_concurrent_for_each and exposes the sstable_directory::do_for_each_sstable
via a static method.

This method is used here by table::snapshot to limit concurrency
do snapshot operations that suffer from the same unbound
concurrency problem sstable_directory solved.

In addition sstable_directory::_load_semaphore that was used
across calls to do_for_each_sstable was replaced by a static per-shard
semaphore that caps concurrency across all calls to `do_for_each_sstable`
on that shard.  This makes sense since the disk is a shared resource.

In the future, we may want to have a load semaphore per device rather than
a single global one.  We should experiment with that.

Test: unit(dev)
"

* tag 'max_concurrent_for_each-v5' of github.com:bhalevy/scylla:
  table: snapshot: use max_concurrent_for_each
  sstable_directory: use a external load_semaphore
  test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory
  distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency
  sstables: sstable_directory: use max_concurrent_for_each
2020-10-12 09:43:12 +03:00
Benny Halevy
1ba9e253c4 table: snapshot: use max_concurrent_for_each
Tables may have thousands of sstables and a number of component files
for each sstables.  Using parallel_for_each on all sstables (and
parallel_for_each in sstables::create_links for each file)
needlessly overloads the system with unbounded number of continuations.

Use max_concurrent_for_each and acquire the db sst_dir_semaphore to
limit parallelism.

Note that although snapshot is called while scylla already
loaded the sstable we use the configured initial_sstable_loading_concurrency().

As a future follow-up we may want to define yet another config
variable for on-going operations on sstable directories
if we see that it warrants a diffrent setting than the initial
loading concurrency.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-08 11:57:06 +03:00
Avi Kivity
0ef85a102f table: fix mishandled _sstable_deleted_gate exception in on_compaction_completion
on_compaction_completion tries to handle a gate_closed_exception, but
with_gate() throws rather than creating an exceptional future, so
the extra handling is lost. This is relatively benign since it will
just fail the compaction, requiring that work to be redone later.

Fix by using the safer try_with_gate().
2020-10-06 08:31:28 +03:00
Avi Kivity
a43d5079f3 table: fix on_compaction_completion corrupting _sstables_compacted_but_not_deleted during self-race
on_compaction_completion() updates _sstables_compacted_but_not_deleted
through a temporary to avoid an exception causing a partial update:

  1. copy _sstables_compacted_but_not_deleted to a temporary
  2. update temporary
  3. do dangerous stuff
  4. move temporary to _sstables_compacted_but_not_deleted

This is racy when we have parallel compactions, since step 3 yields.
We can have two invocations running in parallel, taking snapshots
of the same _sstables_compacted_but_not_deleted in step 1, each
modifying it in different ways, and only one of them winning the
race and assigning in step 4. With the right timing we can end
with extra sstables in _sstables_compacted_but_not_deleted.

Before a5369881b3, this was a benign race (only resulting in
deleted file space not being reclaimed until the service is shut
down), but afterwards, extra sstable references result in the service
refusing to shut down. This was observed in database_test in debug
mode, where the race more or less reliably happens for system.truncated.

Fix by using a different method to protect
_sstables_compacted_but_not_deleted. We unconditionally update it,
and also unconditionally fix it up (on success or failure) using
seastar::defer(). The fixup includes a call to rebuild_statistics()
which must happen every time we touch the sstable list.

Fixes #7331.
2020-10-06 08:29:34 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Avi Kivity
88ea02bfeb table: clear sstable set when stopping
Drop references to a table's sstables when stopping it, so that
the sstable_manager can start deleting it. This includes staging
sstables.

Although the table is no longer in use at this time, maintain
cache synchronity by calling row_cache::invalidate() (this also
has the benefit of avoiding a stall in row_cache's destructor).
We also refresh the cache's view of the sstable set to drop
the cache's references.
2020-09-23 20:55:05 +03:00
Avi Kivity
9932e6a899 table: prevent table::stop() race with table::query()
Take the gate in table::query() so that stop() waits for queries. The gate
is already waited for in table::stop().

This allows us to know we are no longer using the table's sstables in table::stop().
2020-09-23 20:55:05 +03:00
Avi Kivity
64c7c81bac Merge "Update log messages to {fmt} rules" from Pavel E
"
Before seastar is updated with the {fmt} engine under the
logging hood, some changes are to be made in scylla to
conform to {fmt} standards.

Compilation and tests checked against both -- old (current)
and new seastar-s.

tests: unit(dev), manual
"

* 'br-logging-update' of https://github.com/xemul/scylla:
  code: Force formatting of pointer in .debug and .trace
  code: Format { and } as {fmt} needs
  streaming: Do not reveal raw pointer in info message
  mp_row_consumer: Provide hex-formatting wrapper for bytes_view
  heat_load_balance: Include fmt/ranges.h
2020-09-03 15:10:09 +03:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printin a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Tomasz Grabiec
cf12b5e537 db: view: Refactor view_info::initialize_base_dependent_fields()
It is no longer called once for a given view_info, so the name
"initialize" is not appropriate.

This patch splits the "initialize" method into the "make" part, which
makes a new base_info object, and the "set" part, which changes the
current base_info object attached to the view.
2020-08-20 14:53:07 +02:00
Tomasz Grabiec
f8df214836 db: view: Fix incorrect schema access during view building after base table schema changes
The view building process was accessing mutation fragments using
current table's schema. This is not correct, fragments must be
accessed using the schema of the generating reader.

This could lead to undefined behavior when the column set of the base
table changes. out_of_range exceptions could be observed, or data in
the view ending up in the wrong column.

Refs #7061.

The fix has two parts. First, we always use the reader's schema to
access fragments generated by the reader.

Second, when calling populate_views() we upgrade the fragment-wrapping
reader's schema to the base table schema so that it matches the base
table schema of view_and_base snapshots passed to populate_views().
2020-08-20 14:53:07 +02:00
Tomasz Grabiec
3a6ec9933c db: views: Fix undefined behavior on base table schema changes
The view_info object, which is attached to the schema object of the
view, contains a data structure called
"base_non_pk_columns_in_view_pk". This data structure contains column
ids of the base table so is valid only for a particular version of the
base table schema. This data structure is used by materialized view
code to interpret mutations of the base table, those coming from base
table writes, or reads of the base table done as part of view updates
or view building.

The base table schema version of that data structure must match the
schema version of the mutation fragments, otherwise we hit undefined
behavior. This may include aborts, exceptions, segfaults, or data
corruption (e.g. writes landing in the wrong column in the view).

Before this patch, we could get schema version mismatch here after the
base table was altered. That's because the view schema does not change
when the base table is altered.

Part of the fix is to extract base_non_pk_columns_in_view_pk into a
third entitiy called base_dependent_view_info, which changes both on
base table schema changes and view schema changes.

It is managed by a shared pointer so that we can take immutable
snapshots of it, just like with schema_ptr. When starting the view
update, the base table schema_ptr and the corresponding
base_dependent_view_info have to match. So we must obtain them
atomically, and base_dependent_view_info cannot change during update.

Also, whenever the base table schema changes, we must update
base_dependent_view_infos of all attached views (atomically) so that
it matches the base table schema.

Refs #7061.
2020-08-20 14:53:07 +02:00
Botond Dénes
78f94ba36a table: get_sstables_by_partition_key(): don't make a copy of selected sstables
Currently we assign the reference to the vector of selected sstables to
`auto sst`. This makes a copy and we pass this local variable to
`do_for_each()`, which will result in a use-after-free if the latter
defers.
Fix by not making a copy and instead just keep the reference.

Fixes: #7060

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200818091241.2341332-1-bdenes@scylladb.com>
2020-08-18 14:20:31 +03:00
Raphael S. Carvalho
11df96718a compaction: Prevent non-regular compaction from picking compacting SSTables
After 8014c7124, cleanup can potentially pick a compacting SSTable.
Upgrade and scrub can also pick a compacting SSTable.
The problem is that table::candidates_for_compaction() was badly named.
It misleads the user into thinking that the SSTables returned are perfect
candidates for compaction, but manager still need to filter out the
compacting SSTables from the returned set. So it's being renamed.

When the same SSTable is compacted in parallel, the strategy invariant
can be broken like overlapping being introduced in LCS, and also
some deletion failures as more than one compaction process would try
to delete the same files.

Let's fix scrub, cleanup and ugprade by calling the manager function
which gets the correct candidates for compaction.

Fixes #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>
2020-08-16 17:31:03 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Nadav Har'El
8135647906 merge: Add metrics to semaphores
Merged pull request https://github.com/scylladb/scylla/pull/7018
by Piotr Sarna:

This series addresses various issues with metrics and semaphores - it mainly adds missing metrics, which makes it possible to see the length of the queues attached to the semaphores. In case of view building and view update generation, metrics was not present in these services at all, so a first, basic implementation is added.

More precise semaphore metrics would ease the testing and development of load shedding and admission control.

	view_builder: add metrics
	db, view: add view update generator metrics
	hints: track resource_manager sending queue length
	hints: add drain queue length to metrics
	table: add metrics for sstable deletion semaphore
	database: remove unused semaphore
2020-08-12 12:39:59 +03:00
Piotr Sarna
8b56b24737 table: add metrics for sstable deletion semaphore
It's now possible to read the number of tasks waiting on the
sstable deletion semaphore.
2020-08-11 17:43:53 +02:00
Benny Halevy
7cfca519cb table: filter_sstable_for_reader: allow clustering filtering md-format sstables
Now that it is safe to filter md format sstable by min/max column names
we can remove the `filtering_broken` variable that disabled filtering
in 19b76bf75b to fix #4442.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
ab67629ea6 table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
To prevent https://github.com/scylladb/scylla/issues/3552
we want to ensure that in any case that the partition exists in any
sstable, we emit partition_start/end, even when returning no rows.

In the first filtering pass, filter_sstable_for_reader_by_pk filters
the input sstables based on the partition key, and num_sstables is set the size
of the sstables list after the first filtering pass.

An empty sstables list at this stage means there are indeed no sstables
with the required partition so returning an empty result will leave the
cache in the desired state.

Otherwise, we filter again, using filter_sstable_for_reader_by_ck,
and examine the list of the remaining readers.

If num_readers != num_sstables, we know that
some sstables were filterd by clustering key, so
we append a flat_mutation_reader_from_mutations to
the list of readers and return a combined reader as before.
This will ensure that we will always have a partition_start/end
mutations for the queried partition, even if the filtered
readers emit no rows.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
a672747da3 table: filter_sstable_for_reader: adjust to md-format
With the md sstable format, min/max column names in the metadata now
track clustering rows (with or without row tombstones),
range tombstones, and partition tombstones (that are
reflected with empty min/max column names - indicating
the full range).

As such, min and max column names may be of different lengths
due to range tombstones and potentially short clustering key
prefixes with compact storage, so the current matching algorithm
must be changed to take this into account.

To determine if a slice range overlaps the min/max range
we are using position_range::overlaps.

sstable::clustering_components_ranges was renamed to position_range
as it now holds a single position_range rather than a vector of bytes_view ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:30 +03:00
Benny Halevy
90d0fea7df table: filter_sstable_for_reader: include non-scylla sstables with tombstones
Move contains_rows from table code to sstable::may_contain_rows
since its implementation now has too specific knowledge of sstable
internals.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
2a57ec8c3d table: filter_sstable_for_reader: do not filter if static column is requested
Static rows aren't reflected in the sstable min/max clustering keys metadata.
Since we don't have any indication in the metadata that the sstable stores
static rows, we must read all sstables if a static column is requested.

Refs #3553

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00