Commit Graph

2263 Commits

Author SHA1 Message Date
Avi Kivity
71398f3fb4 Merge "Cleanup sstable writer" from Benny
"
This series cleans up the legacy and common ssatble writer code.

metadata_collector::_ancestors were moved to class sstable
so that the former can be moved out of sstable into file_writer_impl.

Moved setting of replay position and sstable level via
sstable_writer_config so that compaction won't need to access
the metadata_collector via the sstable.

With that, metadata_collector could be moved from class sstable
to sstable_writer::writer_impl along with the column_stats.

That allowed moved "generic" file_writer methods that were actually
k/l format specific into sstable_writer_k_l.

Eventually `file_writer` code is moved into sstables/writer.cc
and sstable_writer_k_l into sstables/kl/writer.{hh,cc}

A bonus cleanup is the ability to get rid of
sstable::_correctly_serialize_non_compound_range_tombstones as
it's now available to the writers via the writer configuration
and not required to be stored in the sstable object.

Fixes #3012

Test: unit(dev)
"

* tag 'cleanup-sstable-writer-v2' of github.com:bhalevy/scylla:
  sstables: move writer code away to writer.cc
  sstables: move sstable_writer_k_l away to kl/writer
  sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones
  sstables: move writer methods to sstable_writer_k_l
  sstables: move compaction ancestors to sstable
  sstables: sstable_writer: optionally set sstable level via config
  sstables: sstable_writer: optionally set replay position via config
  sstables: compaction: make_sstable_writer_config
  sstables: open code update_stats_on_end_of_stream in sstable_writer::consume_end_of_stream
  sstables: fold components_writer into sstable_writer_k_l
  sstables: move sstable_writer_k_l definition upwards
  sstables: components_writer: turn _index into unique_ptr
2020-10-15 10:40:28 +03:00
Benny Halevy
279865e56c sstables: move writer code away to writer.cc
Move `file_writer` code into sstables/writer.cc

Fixes #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 23:41:47 +03:00
Benny Halevy
20adb96f62 sstables: move sstable_writer_k_l away to kl/writer
Move the sstable_writer_k_l code into sstables/kl/writer.{hh,cc}

Refs #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 23:40:56 +03:00
Benny Halevy
8cd4d53643 sstables: mx/writer: fix copy-paste error in reader_semaphore name
It was copied from sstables.cc in
6ca0464af5.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201014171651.541232-1-bhalevy@scylladb.com>
2020-10-14 22:17:49 +02:00
Benny Halevy
96cd6adc71 sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones
Now it's available to the writers via the writer configuration
and not required to be stored in the sstable object.

Refs #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:53:23 +03:00
Benny Halevy
97a446f9fa sstables: move writer methods to sstable_writer_k_l
They are called solely from the sstable_writer_k_l path.
With that, moce the metadata collector and column stats
to writer_impl.  They are now only used by the sstable
writers.

Refs #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:52:17 +03:00
Benny Halevy
e1692bec17 sstables: move compaction ancestors to sstable
Compaction needs access to the sstable's ancestors so we need to
keep the ancestors for the sstable separately from the metadata collector
as the latter is about to be moved to the sstable writer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:51:26 +03:00
Benny Halevy
a49a5f36c1 sstables: sstable_writer: optionally set sstable level via config
And use compaction::make_sstable_writer_config to pass
the compaction's `_sstable_level` to the writer
via sstable_writer_config, instead of via the sstable
metadata_collector, that is going to move from the sstable
to the write_impl.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:49:36 +03:00
Benny Halevy
ac3c33ffca sstables: sstable_writer: optionally set replay position via config
And use compaction::make_sstable_writer_config to pass
the compaction's replay_position (`_rp`) to the writer
via sstable_writer_config, instead of via the sstable
metadata_collector, that is going to move from the sstable
to the write_impl.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:39:46 +03:00
Benny Halevy
e314eb3f78 sstables: compaction: make_sstable_writer_config
Consolidate the code to make the sstable_writer_config
for sstable writers into a helper method.

Folowing patches will add the ability to set the
replay position and sstable level via that config
structure.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 18:01:46 +03:00
Benny Halevy
55d73ec2bc sstables: open code update_stats_on_end_of_stream in sstable_writer::consume_end_of_stream
In preparation to moving sstable methods to sstable_writer_k_l
as part of #3012.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 17:46:26 +03:00
Benny Halevy
27e3c03ce2 sstables: fold components_writer into sstable_writer_k_l
It serves no purpose being a different class but being called
by sstable_writer_k_l.

Refs #3012.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 17:40:47 +03:00
Benny Halevy
56a6a4ff17 sstables: move sstable_writer_k_l definition upwards
To facilitate consolidation of components_writer and
some sstable methods into sstable_writer_k_l.

Refs #3012.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 17:13:44 +03:00
Benny Halevy
8f239f8f4c sstables: components_writer: turn _index into unique_ptr
In preparation to folding components_writer into sstable_writer_k_l
in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 17:10:31 +03:00
Avi Kivity
86bbf1763d Merge "reader concurrency semaphore: dump permit diagnostics on timeout or queue overflow" from Botond
"
The reader concurrency semaphore timing out or its queue being overflown
are fairly common events both in production and in testing. At the same
time it is a hard to diagnose problem that often has a benign cause
(especially during testing), but it is equally possible that it points
to something serious. So when this error starts to appear in logs,
usually we want to investigate and the investigation is lengthy...
either involves looking at metrics or coredumps or both.

This patch intends to jumpstart this process by dumping a diagnostics on
semaphore timeout or queue overflow. The diagnostics is printed to the
log with debug level to avoid excessive spamming. It contains a
histogram of all the permits associated with the problematic semaphore
organized by table, operation and state.

Example:

DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore -
Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
Permits with state admitted, sorted by memory
memory  count   name
3499M   27      ks.test:data-query

3499M   27      total

Permits with state waiting, sorted by count
count   memory  name
1       0B      ks.test:drain
7650    0B      ks.test:data-query

7651    0B      total

Permits with state registered, sorted by count
count   memory  name

0       0B      total

Total: permits: 7678, memory: 3499M

This allows determining several things at glance:
* What are the tables involved
* What are the operations involved
* Where is the memory

This can speed up a follow-up investigation greatly, or it can even be
enough on its own to determine that the issue is benign.

Tests: unit(dev, debug)
"

* 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla:
  reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow
  utils: add to_hr_size()
  reader_concurrency_semaphore: link permits into an intrusive list
  reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line
  reader_concurrency_semaphore: move constructors out-of-line
  reader_concurrency_semaphore: add state to permits
  reader_concurrency_semaphore: name permits
  querier_cache_test: test_immediate_evict_on_insert: use two permits
  multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader()
  multishard_combining_reader: add permit parameter
  multishard_combining_reader: shard_reader: use multishard reader's permit
2020-10-13 12:44:23 +03:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Avi Kivity
3451579d81 sstables: move component_type formatter to namespace sstables
Without this, clang complains that we violate argument dependent
lookup rules:

  note: 'operator<<' should be declared prior to the call site or in namespace 'sstables'
  std::ostream& operator<<(std::ostream&, const sstables::component_type&);

we can't enforce the #include order, but we can easily move it it
to namespace sstables (where it belongs anyway), so let's do that.

gcc is happy either way.

Closes #7413
2020-10-12 21:49:25 +02:00
Avi Kivity
5065ae835f sstables: move bound_kind_m formatter to namespace sstables
Without this, clang complains that we violate argument dependent
lookup rules:

  note: 'operator<<' should be declared prior to the call site or in namespace 'sstables'
  std::ostream& operator<<(std::ostream&, const sstables::bound_kind_m&);

we can't enforce the #include order, but we can easily move it it
to namespace sstables (where it belongs anyway), so let's do that.

gcc is happy either way.
2020-10-12 20:38:11 +03:00
Avi Kivity
a00fca1a69 sstables: move bound_kind_m formatter to its natural place
Move bound_kind_m's formatter to the same header file where
is is defined. This prevents cases where the compiler decays
the type (an enum) to the underlying integral type because it
does not see the formatter declaration, resulting in the wrong
output.
2020-10-12 20:36:10 +03:00
Avi Kivity
69c3533d97 sstables: deinline bound_kind_m formatter
The formatter is by no means hot code and should not be inlined.
2020-10-12 20:35:08 +03:00
Avi Kivity
4d6739c2e6 Merge "Use max_concurrent_for_each" from Benny
"
max_concurrent_for_each was added to seastar for replacing
sstable_directory::parallel_for_each_restricted by using
more efficient concurrency control that doesn't create
unlimited number of continuations.

The series replaces the use of sstable_directory::parallel_for_each_restricted
with max_concurrent_for_each and exposes the sstable_directory::do_for_each_sstable
via a static method.

This method is used here by table::snapshot to limit concurrency
do snapshot operations that suffer from the same unbound
concurrency problem sstable_directory solved.

In addition sstable_directory::_load_semaphore that was used
across calls to do_for_each_sstable was replaced by a static per-shard
semaphore that caps concurrency across all calls to `do_for_each_sstable`
on that shard.  This makes sense since the disk is a shared resource.

In the future, we may want to have a load semaphore per device rather than
a single global one.  We should experiment with that.

Test: unit(dev)
"

* tag 'max_concurrent_for_each-v5' of github.com:bhalevy/scylla:
  table: snapshot: use max_concurrent_for_each
  sstable_directory: use a external load_semaphore
  test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory
  distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency
  sstables: sstable_directory: use max_concurrent_for_each
2020-10-12 09:43:12 +03:00
Avi Kivity
b172e4c2ce sstables: make index_bound a non-nested struct
Due to a longstanding bug in clang[1], the compiler doesn't think
that such a class is default-constructible. This causes
std::optional<index_bound>::optional() not to compile. Because it
depends on open_tt_marker, extract that too.

[1] https://stackoverflow.com/questions/47974898/clang-5-stdoptional-instantiation-screws-stdis-constructible-trait-of-the-p

Closes #7387
2020-10-11 17:40:01 +03:00
Avi Kivity
8932c4e919 compaction: allow _max_sstable_size = 0
Some test (run_based_compaction_test at least) use _max_sstable_size = 0
in order to force one partition per sstable. That triggers an overflow
when calculating the expected bloom filter size. The overflow doesn't
matter for normal operation, because the result later appears on a
divisor, but does trigger a ubsan error.

Squelch the error by bot dividing by zero here.

I tried using _max_sstable_size = 1, but the test failed for other
reasons.

Closes #7375
2020-10-11 15:43:51 +03:00
Benny Halevy
57cc5f6ae1 sstable_directory: use a external load_semaphore
Although each sstable_directory limits concurrency using
max_concurrent_for_each, there could be a large number
of calls to do_for_each_sstable running in parallel
(e.g per keyspace X per table in the distributed_loader).

To cap parallelism across sstable_directory instances and
concurrent calls to do_for_each_sstable, start a sharded<semaphore>
and pass a shared semaphore& to the sstable_directory:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-08 11:57:06 +03:00
Benny Halevy
c26c784882 sstables: sstable_directory: use max_concurrent_for_each
Use max_concurrent_for_each instead of parallel_for_each in
sstable_directory::parallel_for_each_restricted to avoid
creating potentially thousands of continuations,
one for each sstable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-07 14:45:20 +03:00
Botond Dénes
dd372c8457 flat_mutation_reader: de-virtualize buffer_size()
The main user of this method, the one which required this method to
return the collective buffer size of the entire reader tree, is now
gone. The remaining two users just use it to check the size of the
reader instance they are working with.
So de-virtualize this method and reduce its responsibility to just
returning the buffer size of the current reader instance.
2020-10-06 08:22:56 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
72a88e0257 mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
4f5ccf82cb mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
f2b9cad4c6 mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_static_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Avi Kivity
2bd264ec6a sstables: remove background_jobs(), await_background_jobs()
There are no more users for registering background jobs, so remove
the mechanism and the remaining calls.
2020-09-23 20:55:17 +03:00
Avi Kivity
5db96170a5 sstables: make sstables_manager take charge of closing sstables
Currently, closing sstables happens from the sstable destructor.
This is problematic since a destructor cannot wait for I/O, so
we launch the file close process in the background. We therefore
lose track of when the closing actually takes place.

This patch makes sstables_manager take charge of the close process.
Every sstable is linked into one of two intrusive lists in its
manager: _active or _undergoing_close. When the reference count
of the sstable drops to zero, we move it from _active to
_undergoing_close and begin closing the files. sstables_manager
remembers all closes and when sstables_manager::close() is called,
it waits for all of them to complete. Therefore,
sstables_manager::close() allows us to know that all files it
manages are closed (and deleted if necessary).

The sstables_manager also gains a destructor, which disables
move construction.
2020-09-23 20:55:17 +03:00
Avi Kivity
f9aa50dcbf test: sstables test_env: introduce manager() accessor
This returns the sstables_manager carried by the test_env. We
will soon retire the global test_sstables_manager, so we need
to provide access to one.
2020-09-23 20:55:10 +03:00
Avi Kivity
a90a511d36 sstables_manager: introduce a stub close()
sstables_manager is going to take charge of its sstables lifetimes,
so it will need a close() to wait until sstables are deleted.

This patch adds sstables_manager::close() so that the surrounding
infrastructure can be wired to call it. Once that's done, we can
make it do the waiting.
2020-09-23 20:55:04 +03:00
Avi Kivity
d19c6c0d98 sstables: size_tiered_backlog_tracker: avoid assignment of non-constexpr expression to constexpr object
std::log() is not constexpr, so it cannot be assigned to a constexpr object.

Make it non-constexpr and automatic. The optimizer still figures out that it's
constant and optimizes it.

Found by clang. Apparently gcc only checks the expression is constant, not
constexpr.
2020-09-21 16:32:53 +03:00
Avi Kivity
a155b2bced sstables: leveled_manifest: prevent benign precision loss warning
Casting from the maximum int64_t to double loses precision, because
int64_t has 64 bits of precision while double has only 53. Clang
warns about it. Since it's not a real problem here, add an explicit
cast to silence the warning.
2020-09-21 16:32:53 +03:00
Avi Kivity
aa7426bde6 sstables: index_reader: make 'index_bound' public
index_reader::index_bound must be constructible by non-friend classes
since it's used in std::optional (which isn't anyone's friend). This
now works in gcc because gcc's inter-template access checking is broken,
but clang correctly rejects it.
2020-09-21 16:32:53 +03:00
Avi Kivity
bd42bdd6b5 sstables: index_reader: disambiguate promoted_index_blocks_reader "state" type and data member
promoted_index_blocks_reader has a data member called "state", and a type member
called "state". Somehow gcc manages to disambiguate the two when used, but
clang doesn't. I believe clang is correct here, one member should subsume the other.

Change the type member to have a different name to disambiguate the two.
2020-09-21 16:32:53 +03:00
Piotr Sarna
16b4b86697 sstables: drop checks for non-compound range tombstones support
Correct non-compound range tombstones are supported for over 2 years
and upgrades are only allowed from versions which already have the
support, so the checks are hereby dropped.
2020-09-14 12:09:51 +02:00
Piotr Sarna
f8ed1b5b67 sstables: drop checks for correct counter order support
Correct counter order is supported for over 2 years and upgrades are only
allowed from versions which already have the support, so the checks
are hereby dropped.
2020-09-14 12:05:11 +02:00
Avi Kivity
64c7c81bac Merge "Update log messages to {fmt} rules" from Pavel E
"
Before seastar is updated with the {fmt} engine under the
logging hood, some changes are to be made in scylla to
conform to {fmt} standards.

Compilation and tests checked against both -- old (current)
and new seastar-s.

tests: unit(dev), manual
"

* 'br-logging-update' of https://github.com/xemul/scylla:
  code: Force formatting of pointer in .debug and .trace
  code: Format { and } as {fmt} needs
  streaming: Do not reveal raw pointer in info message
  mp_row_consumer: Provide hex-formatting wrapper for bytes_view
  heat_load_balance: Include fmt/ranges.h
2020-09-03 15:10:09 +03:00
Raphael S. Carvalho
adf576f769 compaction_manager: export method that returns if table has ongoing compaction
A compaction strategy, that supports parallel compaction, may want to know
if the table has compaction running on its behalf before making a decision.
For example, a size-tiered-like strategy may not want to trigger a behavior,
like cross-tier compaction, when there's ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200901134306.23961-1-raphaelsc@scylladb.com>
2020-09-02 16:46:49 +03:00
Raphael S. Carvalho
7f7f366cb5 compaction: add debug msg to inform the amount of expired ssts skipped by compaction
this information is useful when debugging compaction issues that involve
fully expired ssts.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200828140401.96440-1-raphaelsc@scylladb.com>
2020-08-31 17:18:47 +03:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printin a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
50e3a30dae mp_row_consumer: Provide hex-formatting wrapper for bytes_view
By default {fmt} doesn't know how to format this type (although it's a
basic_string_view instantiated), and even providing formatter/operator<<
does not help -- it anyway hits an earlier assertion in args mapper about
the disallowance of character types mixing.

The hex-wrapper with own operator<< solves the problem.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Benny Halevy
f5ffd5fc5f sstables: Fix reactor stall in sstables::seal_summary()
With relatively big summaries, reactor can be stalled for a couple
of milliseconds.

This patch:
a. allocates positions upfront to avoid excessive reallocation.
b. returns a future from seal_summary() and uses `seastar::do_for_each`
to iterate over the summary entries so the loop can yield if necessary.

Fixes #7108.

Based on 2470aad5a389dfd32621737d2c17c7e319437692 by Raphael S. Carvalho <raphaelsc@scylladb.com>

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200826091337.28530-1-bhalevy@scylladb.com>
2020-08-26 12:18:05 +03:00
Benny Halevy
78a44dda57 sstables: avoid double close in file_writer destructor
If file_writer::close() fails to close the output stream
closing will be retried in file_writer::~file_writer,
leading to:
```
include/seastar/core/future.hh:1892: seastar::future<T ...> seastar::promise<T>::get_future() [with T = {}]: Assertion `!this->_future && this->_state && !this->_task' failed.
```
as seen in https://github.com/scylladb/scylla/issues/7085

Fixes #7085

Test: unit(dev), database_test with injected error in posix_file_impl::close()
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200826062456.661708-1-bhalevy@scylladb.com>
2020-08-26 11:33:23 +03:00
Rafael Ávila de Espíndola
5fcfbd76a9 sstables: Delete duplicated code
For some reason date_tiered_compaction_strategy had its own identical
copy of get_value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200819211509.106594-1-espindola@scylladb.com>
2020-08-26 11:33:23 +03:00
Pavel Emelyanov
171822cff8 compaction: Use database from options to get local ranges
The cleanup compaction wants to keep local tokens on-board and gets
them from storage_service.get_local_ranges().

This method is the wrapper around database.get_keyspace_local_ranges()
created in previous patch, the live database reference is already
available on the descriptor's options, so we can short-cut the call.

This allows removing the last explicit call for global storage_service
instance from compaction code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-21 14:58:40 +03:00