2237 Commits

Author SHA1 Message Date
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and passed to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
72a88e0257 mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
2020-09-28 10:53:56 +03:00
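The lambda-based mutation described in 72a88e0257 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Scylla code: `memory_tracker`, `tracked_fragment` and `mutate_data` are hypothetical names, and the footprint is approximated by the object size plus the payload size.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a permit that accumulates consumed memory.
struct memory_tracker {
    std::size_t consumed = 0;
};

// Hypothetical fragment that owns a payload and keeps the tracker up to date.
class tracked_fragment {
    std::string _data;
    memory_tracker& _tracker;
    std::size_t _accounted = 0;

    // Recompute the footprint and report the difference to the tracker.
    void reset_memory() {
        _tracker.consumed -= _accounted;
        _accounted = sizeof(tracked_fragment) + _data.size();
        _tracker.consumed += _accounted;
    }

public:
    tracked_fragment(memory_tracker& tracker, std::string data)
        : _data(std::move(data)), _tracker(tracker) {
        reset_memory();
    }
    tracked_fragment(const tracked_fragment&) = delete;
    ~tracked_fragment() { _tracker.consumed -= _accounted; }

    // Read-only access cannot change the footprint, so it stays direct.
    const std::string& data() const & { return _data; }

    // Rvalue overload for callers that only move the payload away.
    std::string data() && { return std::move(_data); }

    // Mutable access goes through a callback so the footprint can be
    // recalculated right after the modification.
    template <typename Func>
    void mutate_data(Func&& fn) {
        fn(_data);
        reset_memory();
    }
};
```

A caller would write `f.mutate_data([] (std::string& s) { s += "x"; });` instead of taking a mutable reference, which leaves no way to change the payload without the tracker noticing.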
Botond Dénes
4f5ccf82cb mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
f2b9cad4c6 mutation_fragment: s/as_mutable_static_row/mutate_as_static_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it. To do that safely, we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_static_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Avi Kivity
2bd264ec6a sstables: remove background_jobs(), await_background_jobs()
There are no more users for registering background jobs, so remove
the mechanism and the remaining calls.
2020-09-23 20:55:17 +03:00
Avi Kivity
5db96170a5 sstables: make sstables_manager take charge of closing sstables
Currently, closing sstables happens from the sstable destructor.
This is problematic since a destructor cannot wait for I/O, so
we launch the file close process in the background. We therefore
lose track of when the closing actually takes place.

This patch makes sstables_manager take charge of the close process.
Every sstable is linked into one of two intrusive lists in its
manager: _active or _undergoing_close. When the reference count
of the sstable drops to zero, we move it from _active to
_undergoing_close and begin closing the files. sstables_manager
remembers all closes and when sstables_manager::close() is called,
it waits for all of them to complete. Therefore,
sstables_manager::close() allows us to know that all files it
manages are closed (and deleted if necessary).

The sstables_manager also gains a destructor, which disables
move construction.
2020-09-23 20:55:17 +03:00
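The two-list bookkeeping described in 5db96170a5 can be modelled roughly as below. This is a simplified, synchronous sketch: the real code is asynchronous and uses intrusive lists, and `file_manager`, `open`, `release` and the string payload are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>

// Hypothetical manager that tracks every object it owns in one of two lists.
class file_manager {
    std::list<std::string> _active;
    std::list<std::string> _undergoing_close;

public:
    using handle = std::list<std::string>::iterator;

    // Register a new object; it starts out in the active list.
    handle open(std::string name) {
        _active.push_back(std::move(name));
        return std::prev(_active.end());
    }

    // Called when the last reference goes away: move the object to the
    // closing list. splice() relinks the node, so the handle stays valid.
    void release(handle h) {
        _undergoing_close.splice(_undergoing_close.end(), _active, h);
    }

    // Finish every pending close and report how many were completed.
    std::size_t close() {
        std::size_t closed = _undergoing_close.size();
        _undergoing_close.clear();
        return closed;
    }

    std::size_t active() const { return _active.size(); }
    std::size_t closing() const { return _undergoing_close.size(); }
};
```

Because every object is always in exactly one list, the manager can tell at any point which closes are still outstanding, which is what a destructor-driven background close cannot provide.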
Avi Kivity
f9aa50dcbf test: sstables test_env: introduce manager() accessor
This returns the sstables_manager carried by the test_env. We
will soon retire the global test_sstables_manager, so we need
to provide access to one.
2020-09-23 20:55:10 +03:00
Avi Kivity
a90a511d36 sstables_manager: introduce a stub close()
sstables_manager is going to take charge of its sstables lifetimes,
so it will need a close() to wait until sstables are deleted.

This patch adds sstables_manager::close() so that the surrounding
infrastructure can be wired to call it. Once that's done, we can
make it do the waiting.
2020-09-23 20:55:04 +03:00
Avi Kivity
d19c6c0d98 sstables: size_tiered_backlog_tracker: avoid assignment of non-constexpr expression to constexpr object
std::log() is not constexpr, so it cannot be assigned to a constexpr object.

Make it non-constexpr and automatic. The optimizer still figures out that it's
constant and optimizes it.

Found by clang. Apparently gcc only checks the expression is constant, not
constexpr.
2020-09-21 16:32:53 +03:00
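The kind of change described in d19c6c0d98 looks roughly like this. The helper name `log4` and the constant are illustrative assumptions, not taken from the patch.

```cpp
#include <cmath>

// Hypothetical helper computing a base-4 logarithm.
inline double log4(double x) {
    // constexpr double log_4 = std::log(4.0);
    // ^ rejected by clang: std::log is not a constexpr function
    //   (in C++20 and earlier).
    const double log_4 = std::log(4.0);  // plain automatic constant
    return std::log(x) / log_4;
}
```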
Avi Kivity
a155b2bced sstables: leveled_manifest: prevent benign precision loss warning
Casting from the maximum int64_t to double loses precision, because
int64_t has 64 bits of precision while double has only 53. Clang
warns about it. Since it's not a real problem here, add an explicit
cast to silence the warning.
2020-09-21 16:32:53 +03:00
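The precision loss mentioned in a155b2bced can be reproduced with a few lines. The function name is a hypothetical one chosen for this sketch.

```cpp
#include <cstdint>
#include <limits>

// The largest int64_t is 2^63 - 1, which needs 63 significant bits.
// A double has a 53-bit significand, so the conversion rounds to 2^63.
inline double max_int64_as_double() {
    constexpr auto max = std::numeric_limits<std::int64_t>::max();
    // The explicit cast documents that the rounding is intended.
    return static_cast<double>(max);
}
```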
Avi Kivity
aa7426bde6 sstables: index_reader: make 'index_bound' public
index_reader::index_bound must be constructible by non-friend classes
since it's used in std::optional (which isn't anyone's friend). This
now works in gcc because gcc's inter-template access checking is broken,
but clang correctly rejects it.
2020-09-21 16:32:53 +03:00
Avi Kivity
bd42bdd6b5 sstables: index_reader: disambiguate promoted_index_blocks_reader "state" type and data member
promoted_index_blocks_reader has a data member called "state", and a type member
called "state". Somehow gcc manages to disambiguate the two when used, but
clang doesn't. I believe clang is correct here, one member should subsume the other.

Change the type member to have a different name to disambiguate the two.
2020-09-21 16:32:53 +03:00
Piotr Sarna
16b4b86697 sstables: drop checks for non-compound range tombstones support
Correct non-compound range tombstones have been supported for over 2 years
and upgrades are only allowed from versions which already have the
support, so the checks are hereby dropped.
2020-09-14 12:09:51 +02:00
Piotr Sarna
f8ed1b5b67 sstables: drop checks for correct counter order support
Correct counter order has been supported for over 2 years and upgrades are only
allowed from versions which already have the support, so the checks
are hereby dropped.
2020-09-14 12:05:11 +02:00
Avi Kivity
64c7c81bac Merge "Update log messages to {fmt} rules" from Pavel E
"
Before seastar is updated with the {fmt} engine under the
logging hood, some changes are to be made in scylla to
conform to {fmt} standards.

Compilation and tests checked against both -- old (current)
and new seastar-s.

tests: unit(dev), manual
"

* 'br-logging-update' of https://github.com/xemul/scylla:
  code: Force formatting of pointer in .debug and .trace
  code: Format { and } as {fmt} needs
  streaming: Do not reveal raw pointer in info message
  mp_row_consumer: Provide hex-formatting wrapper for bytes_view
  heat_load_balance: Include fmt/ranges.h
2020-09-03 15:10:09 +03:00
Raphael S. Carvalho
adf576f769 compaction_manager: export method that returns if table has ongoing compaction
A compaction strategy that supports parallel compaction may want to know
if the table has compaction running on its behalf before making a decision.
For example, a size-tiered-like strategy may not want to trigger a behavior,
like cross-tier compaction, when there's ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200901134306.23961-1-raphaelsc@scylladb.com>
2020-09-02 16:46:49 +03:00
Raphael S. Carvalho
7f7f366cb5 compaction: add debug msg to inform the amount of expired ssts skipped by compaction
this information is useful when debugging compaction issues that involve
fully expired ssts.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200828140401.96440-1-raphaelsc@scylladb.com>
2020-08-31 17:18:47 +03:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printing a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Pavel Emelyanov
50e3a30dae mp_row_consumer: Provide hex-formatting wrapper for bytes_view
By default {fmt} doesn't know how to format this type (although it's a
basic_string_view instantiated), and even providing formatter/operator<<
does not help -- it anyway hits an earlier assertion in args mapper about
the disallowance of character types mixing.

The hex-wrapper with own operator<< solves the problem.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
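A wrapper of the kind described in 50e3a30dae might look like the sketch below. It is an assumption-laden illustration: `hex_wrapper` and `to_hex_string` are hypothetical names, and a pointer plus size is used in place of the real bytes_view type.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical non-owning view over raw bytes, used only for printing.
struct hex_wrapper {
    const std::uint8_t* data;
    std::size_t size;
};

// Print each byte as two lowercase hex digits.
inline std::ostream& operator<<(std::ostream& os, const hex_wrapper& h) {
    static constexpr char digits[] = "0123456789abcdef";
    for (std::size_t i = 0; i < h.size; ++i) {
        os << digits[h.data[i] >> 4] << digits[h.data[i] & 0x0f];
    }
    return os;
}

// Convenience helper showing how the wrapper is streamed.
inline std::string to_hex_string(const std::vector<std::uint8_t>& bytes) {
    std::ostringstream os;
    os << hex_wrapper{bytes.data(), bytes.size()};
    return os.str();
}
```

Wrapping the bytes in a dedicated type means the formatter never sees a character-like string view, so the character-type check mentioned in the commit message is not triggered.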
Benny Halevy
f5ffd5fc5f sstables: Fix reactor stall in sstables::seal_summary()
With relatively big summaries, reactor can be stalled for a couple
of milliseconds.

This patch:
a. allocates positions upfront to avoid excessive reallocation.
b. returns a future from seal_summary() and uses `seastar::do_for_each`
to iterate over the summary entries so the loop can yield if necessary.

Fixes #7108.

Based on 2470aad5a389dfd32621737d2c17c7e319437692 by Raphael S. Carvalho <raphaelsc@scylladb.com>

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200826091337.28530-1-bhalevy@scylladb.com>
2020-08-26 12:18:05 +03:00
Benny Halevy
78a44dda57 sstables: avoid double close in file_writer destructor
If file_writer::close() fails to close the output stream
closing will be retried in file_writer::~file_writer,
leading to:
```
include/seastar/core/future.hh:1892: seastar::future<T ...> seastar::promise<T>::get_future() [with T = {}]: Assertion `!this->_future && this->_state && !this->_task' failed.
```
as seen in https://github.com/scylladb/scylla/issues/7085

Fixes #7085

Test: unit(dev), database_test with injected error in posix_file_impl::close()
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200826062456.661708-1-bhalevy@scylladb.com>
2020-08-26 11:33:23 +03:00
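One way to avoid the retry described in 78a44dda57 is to record the close attempt before making it. The sketch below assumes that approach; `flaky_stream` and `guarded_writer` are hypothetical types, and the actual patch may differ.

```cpp
#include <stdexcept>

// Hypothetical output stream whose close() can be made to fail.
struct flaky_stream {
    int close_calls = 0;
    bool fail = false;

    void close() {
        ++close_calls;
        if (fail) {
            throw std::runtime_error("close failed");
        }
    }
};

// Hypothetical writer that closes its stream at most once.
class guarded_writer {
    flaky_stream& _out;
    bool _closed = false;

public:
    explicit guarded_writer(flaky_stream& out) : _out(out) {}

    void close() {
        // Mark the writer closed before the attempt, so a failure here
        // is not retried from the destructor.
        _closed = true;
        _out.close();
    }

    ~guarded_writer() {
        if (!_closed) {
            try {
                _out.close();
            } catch (...) {
                // Destructors must not throw; the error is dropped here.
            }
        }
    }
};
```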
Rafael Ávila de Espíndola
5fcfbd76a9 sstables: Delete duplicated code
For some reason date_tiered_compaction_strategy had its own identical
copy of get_value.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200819211509.106594-1-espindola@scylladb.com>
2020-08-26 11:33:23 +03:00
Pavel Emelyanov
171822cff8 compaction: Use database from options to get local ranges
The cleanup compaction wants to keep local tokens on-board and gets
them from storage_service.get_local_ranges().

This method is the wrapper around database.get_keyspace_local_ranges()
created in previous patch, the live database reference is already
available on the descriptor's options, so we can short-cut the call.

This allows removing the last explicit call for global storage_service
instance from compaction code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-21 14:58:40 +03:00
Pavel Emelyanov
8333fed8aa compaction: Keep database reference on upgrade options
The only place that creates them is the API upgrade_sstables call.

The created options object doesn't outlive the returned
future, so it's safe to keep this reference there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-21 14:58:40 +03:00
Pavel Emelyanov
a6e6856e1f compaction: Keep database reference on cleanup options
The database is available at both places that create the options --
tests and API perform_cleanup call.

The options object doesn't outlive the returned future, so it's
safe to keep the reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-21 14:58:40 +03:00
Raphael S. Carvalho
a0e0195a77 sstables: Avoid excessive reallocations when creating sharding metadata
Let's reserve space for sharding metadata in advance, to avoid excessive
allocations in create_sharding_metadata().
With the default ignore_msb_bits=12, it was observed that the # of
reallocations is frequently 11-12. With ignore_msb_bits=16, the number
can easily go up to 50.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200814210250.39361-1-raphaelsc@scylladb.com>
2020-08-19 17:58:29 +03:00
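The effect of reserving space up front, as in a0e0195a77, can be observed with a small counter. This is a generic sketch on `std::vector<int>`; `count_reallocations` is a hypothetical helper and the exact number of reallocations without `reserve()` depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Append n elements and count how often the vector's capacity changes,
// i.e. how often its storage is reallocated.
inline std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```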
Avi Kivity
6f986df458 Merge "Fix TWCS compaction aggressiveness due to data segregation" from Raphael
"
After the data segregation feature, anything that causes out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
feature. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.
"

* 'fix_twcs_agressiveness_after_data_segregation_v2' of github.com:raphaelsc/scylla:
  compaction/twcs: improve further debug messages
  compaction/twcs: Improve debug log which shows all windows
  test: Check that TWCS properly performs size-tiered compaction on past windows
  compaction/twcs: Make task estimation take into account the size-tiered behavior
  compaction/stcs: Export static function that estimates pending tasks
  compaction/stcs: Make get_buckets() static
  compact/twcs: Perform size-tiered compaction on past time windows
  compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
  compaction/twcs: Make newest_bucket() non-static
  compaction/twcs: Move TWCS implementation into source file
2020-08-19 17:19:01 +03:00
Avi Kivity
f6b66456fd Update seastar submodule
Contains patch from Rafael to fix up includes.

* seastar c872c3408c...7f7cf0f232 (9):
  > future: Consider result_unavailable invalid in future_state_base::ignore()
  > future: Consider result_unavailable invalid in future_state_base::valid()
  > Merge "future-util: split header" from Benny
  > docs: corrected some text and code-examples in streaming-rpc docs
  > future: Reduce nesting in future::then
  > demos: coroutines: include std-compat.hh
  > sstring: mark str() and methods using it as noexcept
  > tls: Add an assert
  > future: fix coroutine compilation
2020-08-19 17:18:57 +03:00
Rafael Ávila de Espíndola
56724d084d sstables: Move date_tiered_compaction_strategy_options::date_tiered_compaction_strategy_options out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-6-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
07b3ead752 sstables: Move size_tiered_compaction_strategy_options::size_tiered_compaction_strategy_options out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-5-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
7b3946fa0e sstables: Move compaction_strategy_impl::compaction_strategy_impl out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-4-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
9ba765fe6f sstables: Move compaction_strategy_impl::get_value out of line
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-3-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Rafael Ávila de Espíndola
06b15aa7e3 sstables: Move time_window_compaction_strategy_options' constructors to a .cc
These are not trivial and not hot.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200812232915.442564-2-espindola@scylladb.com>
2020-08-19 11:34:13 +03:00
Raphael S. Carvalho
d601f78b4b compaction/twcs: improve further debug messages
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
086f277584 compaction/twcs: Improve debug log which shows all windows
The current log prints one log entry for each window, it doesn't print
the # of SSTs in the bucket, and the now information is copied across
all the window entries.

previously, it looked like this:

[shard 0] compaction - Key 1597331160000000, now 1597331160000000
[shard 0] compaction - Key 1597331100000000, now 1597331160000000
[shard 0] compaction - Key 1597331040000000, now 1597331160000000
[shard 0] compaction - Key 1597330980000000, now 1597331160000000

this made it harder to group all windows which reflect the state of
the strategy in a given time.

now, it looks as follows:

[shard 0] compaction - time_window_compaction_strategy::newest_bucket:
  now 1597331160000000
  buckets = {
    key=1597331160000000, size=1
    key=1597331100000000, size=2
    key=1597331040000000, size=1
    key=1597330980000000, size=1
  }

Also the level of this log is changed from debug to trace, given that
now it's compressed and only printed once.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
96436312be compaction/twcs: Make task estimation take into account the size-tiered behavior
The task estimation was not taking into account that TWCS does size-tiered
on the windows, and it only added 1 to the estimation when there
could be more tasks than that depending on the amount of SSTables in
all the existing size tiers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
d287b1c198 compaction/stcs: Export static function that estimates pending tasks
That will be useful for allowing other compaction strategies that use
STCS to properly estimate the pending tasks.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:09 -03:00
Raphael S. Carvalho
b62737fd05 compaction/stcs: Make get_buckets() static
STCS will export a static function to estimate pending tasks, and
it relies on get_buckets() being static too.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-18 15:14:07 -03:00
Dejan Mircevski
fb6c011b52 everywhere: Insert space after switch
Quoth @avikivity: "switch is not a function, and we celebrate that by
putting a space after it like other control-flow keywords."

https://github.com/scylladb/scylla/pull/7052#discussion_r471932710

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-18 14:31:04 +03:00
Raphael S. Carvalho
f9f0be9ac8 compact/twcs: Perform size-tiered compaction on past time windows
After the data segregation feature, anything that causes out-of-order writes,
like read repair, can result in small updates to past time windows.
This causes compaction to be very aggressive because whenever a past time
window is updated like that, that time window is recompacted into a
single SSTable.
Users expect that once a window is closed, it will no longer be written
to, but that has changed since the introduction of the data segregation
feature. We didn't anticipate the write amplification issues that the
feature would cause. To fix this problem, let's perform size-tiered
compaction on the windows that are no longer active and were updated
because data was segregated. The current behavior where the last active
window is merged into one file is kept. But thereafter, that same
window will only be compacted using STCS.

Fixes #6928.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
820b47e9a3 compaction/twcs: Make strategy easier to extend by removing duplicated knowledge
TWCS is hard to extend because its knowledge on what to do with a window
bucket is duplicated in two functions. Let's remove this duplication by
placing the knowledge into a single function.

This is important for the coming change that will perform size-tiered
instead of major on windows that are no longer active.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
f2b588cfc4 compaction/twcs: Make newest_bucket() non-static
To fix #6928, newest_bucket() will have to access the class fields.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
b95359314d compaction/twcs: Move TWCS implementation into source file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-08-17 12:29:34 -03:00
Raphael S. Carvalho
81ec49c82f sstables/sstable_set: rename method to retrieve sstable runs
select() is too generic for the method that retrieves sstable runs,
and it has a completely different meaning than the former select
method used to select sstables based on token range.
let's give it a more descriptive name.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811193401.22749-1-raphaelsc@scylladb.com>
2020-08-16 17:41:16 +03:00
Raphael S. Carvalho
b07920dd1f sstables: Fix remove_by_toc_name() on temporary toc
regression caused by 55cf219c97.

remove_by_toc_name() must work both for a sealed sstable with toc,
and also a partial sstable with tmp toc.
so dirname() should be called conditionally, depending on the state of
the sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200813160612.101117-1-raphaelsc@scylladb.com>
2020-08-16 17:35:55 +03:00
Raphael S. Carvalho
7d7f9e1c54 sstables/LCS: increase per-level overlapping tolerance in reshape
LCS can have its overlapping invariant broken after operations that can
proceed in parallel to regular compaction like cleanup. That's because
there could be two compactions in parallel placing data in overlapping
token ranges of a given level > 0.
After reshape, the whole table will be rewritten, on restart, if a
given level has more than (fan_out*2)=20 overlaps.
That may sound like enough, but that's not taking into account the
exponential growth in # of SSTables per level, so 20 overlaps may
sound like a lot for level 2 which can afford 100 sstables, but it's
only 2% of level 3, and 0.2% of level 4. So let's change the
overlapping tolerance from the constant of fan_out*2 to 10% of level
limit on # of SSTables, or fan_out, whichever is higher.

Refs #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200810154510.32794-1-raphaelsc@scylladb.com>
2020-08-16 17:33:48 +03:00
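The arithmetic in 7d7f9e1c54 can be checked with a short sketch. It assumes fan_out = 10 (implied by fan_out*2 = 20) and a per-level limit of fan_out^level SSTables (implied by the 100 SSTables quoted for level 2); the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed from the commit message: fan_out * 2 == 20.
constexpr std::uint64_t fan_out = 10;

// Assumed level limit: fan_out^level SSTables (100 for level 2).
constexpr std::uint64_t max_sstables_in_level(unsigned level) {
    std::uint64_t n = 1;
    for (unsigned i = 0; i < level; ++i) {
        n *= fan_out;
    }
    return n;
}

// New tolerance: 10% of the level limit, or fan_out, whichever is higher.
constexpr std::uint64_t overlap_tolerance(unsigned level) {
    return std::max(fan_out, max_sstables_in_level(level) / 10);
}
```

Under these assumptions the tolerance is 10 for levels 1 and 2, 100 for level 3 and 1000 for level 4, so it grows with the level instead of staying at a constant 20.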
Raphael S. Carvalho
11df96718a compaction: Prevent non-regular compaction from picking compacting SSTables
After 8014c7124, cleanup can potentially pick a compacting SSTable.
Upgrade and scrub can also pick a compacting SSTable.
The problem is that table::candidates_for_compaction() was badly named.
It misleads the user into thinking that the SSTables returned are perfect
candidates for compaction, but the manager still needs to filter out the
compacting SSTables from the returned set. So it's being renamed.

When the same SSTable is compacted in parallel, the strategy invariant
can be broken like overlapping being introduced in LCS, and also
some deletion failures as more than one compaction process would try
to delete the same files.

Let's fix scrub, cleanup and upgrade by calling the manager function
which gets the correct candidates for compaction.

Fixes #6938.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200811200135.25421-1-raphaelsc@scylladb.com>
2020-08-16 17:31:03 +03:00
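The filtering step that 11df96718a relies on amounts to removing already-compacting SSTables from the table's candidate list. A minimal sketch, using integer ids in place of SSTable handles and a hypothetical function name:

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Return the candidates that are not currently being compacted.
// Integer ids stand in for SSTable handles in this sketch.
inline std::vector<int> non_compacting_candidates(
        const std::vector<int>& all_sstables,
        const std::unordered_set<int>& compacting) {
    std::vector<int> candidates;
    candidates.reserve(all_sstables.size());
    std::copy_if(all_sstables.begin(), all_sstables.end(),
                 std::back_inserter(candidates),
                 [&compacting] (int id) { return compacting.count(id) == 0; });
    return candidates;
}
```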
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the `count` function was often used in various ways.

`contains` not only expresses the intent of the code better but also
does so in a more uniform way.

This commit replaces all the occurrences of `count` with
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Benny Halevy
13f437157a compaction_manager: register_compacting_sstables: allocate before registering sstables
make all required allocations in advance of merging sstables
into _compacting_sstables so it should not throw
after registering some sstables, but not all.

Test: database_test(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200811132440.416945-1-bhalevy@scylladb.com>
2020-08-11 18:14:58 +03:00
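The allocate-then-merge idea in 13f437157a can be sketched with standard containers. This is an illustration under assumptions: `compacting_registry` is a hypothetical class, integer ids replace SSTable handles, and the real patch may be structured differently.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical registry of ids that are currently being compacted.
class compacting_registry {
    std::unordered_set<int> _compacting;

public:
    void register_all(const std::vector<int>& ids) {
        // Every allocation happens in these two steps. If either throws,
        // _compacting has not gained any element yet.
        std::unordered_set<int> staged(ids.begin(), ids.end());
        _compacting.reserve(_compacting.size() + staged.size());

        // merge() relinks the already allocated nodes, so it does not
        // allocate and cannot leave the set partially updated.
        _compacting.merge(staged);
    }

    bool is_compacting(int id) const { return _compacting.count(id) > 0; }
    std::size_t size() const { return _compacting.size(); }
};
```

The registration is therefore all-or-nothing: a failed allocation surfaces before the first id is added rather than midway through the batch.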