Commit Graph

64 Commits

Author SHA1 Message Date
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
11c378a175 mutation_reader: move queue reader to readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Botond Dénes
ab440e1a07 mutation_writer: drop now unused v1 variants of bucket_writer feed_writer()
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220302145945.189607-2-bdenes@scylladb.com>
2022-03-10 15:20:07 +02:00
Botond Dénes
108d921fc9 mutation_writer: partition_based_splitting_writer: convert implementation to v2
Although its API was long converted to v2, its implementation stayed v1
because the memtable and mutation API were still v1. Now that the
memtable flush returns a v2 reader we can have a second look at
converting this. While the mutation API still uses v1, this can easily
be worked around by using going through `mutation_rebuilder_v2`.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220302145945.189607-1-bdenes@scylladb.com>
2022-03-10 15:20:07 +02:00
Avi Kivity
e1c326a5ba Merge "Convert multishard writer to v2" from Botond
"
Also convert the foreign_reader used by it in the process.

Tests: unit(dev)
"

* 'multishard-writer-v2/v1' of https://github.com/denesb/scylla:
  mutation_writer/multishard_writer: remove now unused v1 factory overloads
  test/boost/mutation_writer_test: test the v2 variant of distribute_reader_and_consume_on_shards()
  flat_mutation_reader: add v2 variant of make_generating_reader()
  mutation_reader: multishard_writer: migrate implementation to v2
  mutation_reader: convert foreign_reader to v2
  streaming/consumer: convert to v2
  mutation_writer/multishard_writer: add v2 variant of distribute_reader_and_consume_on_shards()
2022-03-09 19:28:05 +02:00
Botond Dénes
b2061688a5 mutation_writer/multishard_writer: remove now unused v1 factory overloads 2022-03-02 09:58:38 +02:00
Botond Dénes
bbf8e26a3a mutation_reader: multishard_writer: migrate implementation to v2 2022-03-02 09:56:10 +02:00
Botond Dénes
cdf7e74da8 mutation_reader: convert foreign_reader to v2 2022-03-02 09:55:38 +02:00
Michael Livshin
34ed752885 memtable::make_flush_reader(): return flat_mutation_reader_v2
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-02-28 17:11:54 +02:00
Botond Dénes
d27259ca5b mutation_writer/multishard_writer: add v2 variant of distribute_reader_and_consume_on_shards()
Just the factory function itself. The underlying machinery stays v1 for
now. Behind the scenes the v2 variant still invokes the v1 one, with the
necessary conversions.
This allows migrating users to the v2 interface, migrating the machinery
later.
2022-02-28 10:48:08 +02:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Botond Dénes
3ce526082f mutation_writer: remove v1 version segregate_by_partition() 2022-01-14 10:19:56 +02:00
Botond Dénes
e772326b10 mutation_writer: add v2 version of segregate_by_partition()
Just a facade using converters behind the scenes. The actual segregator
is not worth migrating to v2 while mutation and the flushing readers
don't have a v2 versions. Still, migrating all users to a v2 API allows
the conversion to happen at a single point where more work is necessary,
instead of scattered around all the users.
We leave the v1 version in place to aid incremental migration to the v2
one.
2022-01-14 08:54:26 +02:00
Botond Dénes
9826b5d732 mutation_writer: migrate timestamp_based_splitting_writer to v2 2022-01-07 13:51:48 +02:00
Botond Dénes
0601a465a2 mutation_writer: migrate shard_based_splitting_writer to v2 2022-01-07 13:48:53 +02:00
Botond Dénes
92244ae8ec mutation_writer: add v2 clone of feed_writer and bucket_writer
Since we have multiple writers using this that we don't want to migrate
all at once, we create a v2 version of said classes so we can migrate
them incrementally.
2022-01-07 13:48:43 +02:00
Botond Dénes
4b6c0fe592 mutation_writer/feed_writer: don't drop readers with small amount of content
Due to an error in transforming the above routine, readers who have <= a
buffer worth of content are dropped without consuming them.
This is due to the outer consume loop being conditioned on
`is_end_of_stream()`, which will be set for readers that eagerly
pre-fill their buffer and also have no more data then what is in their
buffer.
Change the condition to also check for `is_buffer_empty()` and only drop
the reader if both of these are true.

Fixes: #9594

Tests: unit(mutation_writer_test --repeat=200, dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211108092923.104504-1-bdenes@scylladb.com>
2021-11-09 09:15:44 +02:00
Botond Dénes
74f2290e49 mutation_writer: remove now unused on-disk partition segregator
Also removes related tests, including the exception safety test which
just spins forever with the memtable method.
2021-11-02 12:24:33 +02:00
Botond Dénes
18599f26fa mutation_writer/partition_based_splitting_writer: add memtable-based segregator
The current method of segregating partitions doesn't work well for huge
number of small partitions. For especially bad input, it can produce
hundreds or even thousands of buckets. This patch adds a new segregator
specialized for this use-case. This segregator uses a memtable to sort
out-of-order partitions in-memory. When the memtable size reaches the
provided max-memory limit, it is flushed to disk and a new empty one is
created. In-order partitions bypass the sorting altogether and go to the
fast-path bucket.

The new method is not used yet, this will come in the next patch.
2021-11-02 08:23:16 +02:00
Botond Dénes
2ca6552909 mutation_writer: segregate_by_partition(): make exception safe
Close reader if feed_writer() fails in the setup phase.
2021-10-21 06:50:22 +03:00
Botond Dénes
de55ab571b mutation_writer: feed_writers(): make it a coroutine
The current code leaks exceptional futures. Instead of attempting to
fix, just convert to cleaner and exception-safe coroutines.
2021-10-21 06:50:22 +03:00
Botond Dénes
40ca728a20 mutation_writer: partition_based_splitting_writer: erase old bucket if we fail to create replacement
So we don't attempt to close already closed bucket again in
`partition_based_splitting_writer::close()`.
2021-10-21 06:50:22 +03:00
Botond Dénes
970fe9a339 mutation_writer: partition_based_splitting_writer: limit number of max buckets
Recently we observed an OOM caused by the partition based splitting
writer going crazy, creating 1.7K buckets while scrubbing an especially
broken sstable. To avoid situations like that in the future, this patch
provides a max limit for the number of live buckets. When the number of
buckets reach this number, the largest bucket is closed and replaced by
a bucket. This will end up creating more output sstables during scrub
overall, but now they won't all be written at the same time causing
insane memory pressure and possibly OOM.
Scrub compaction sets this limit to 100, the same limit the TWCS's
timestamp based splitting writer uses (implemented through the
classifier -
time_window_compaction_strategy::max_data_segregation_window_count).

Fixes: #9400

Tests: unit(dev)

Closes #9401
2021-09-29 16:31:29 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
fe479aca1d reader_permit: add timeout member
To replace the timeout parameter passed
to flat_mutation_reader methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Botond Dénes
7bfa40a2f1 treewide: use make_tracking_only_permit()
For all those reads that don't (won't or can't) pass through admission
currently.
2021-07-14 17:19:02 +03:00
Botond Dénes
7a4381b491 mutation_writer: multishard_writer: migrate off the global reader concurrency semaphore
Use a local one instead, and stop it when the writer is destroyed.
2021-07-08 12:31:36 +03:00
Benny Halevy
693d5d9e6b mutation_writer: bucket_writer: consume: propagate _consume_fut if queue_reader_handle is_terminated
When the queue_reader_handle is terminated it was
either explicitly aborted or the reader was closed prematurely.

In this case _consume_fut should hold the root-cause error
(e.g. when compaction is stopped).  Return it instead
of trying to push the mutation fragment.
If no error is returned from _consume_fut, make to sure to
return either the queue_reader_handle error, if available,
or a generic error since the writer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-06-16 17:25:16 +03:00
Benny Halevy
5f31beaf97 flat_mutation_reader: unify reader_consumer declarations
Put the reader_consumer declaration in flat_mutation_reader.hh
and include it instead of declaring the same `using reader_consumer`
declaration in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210607075020.31671-1-bhalevy@scylladb.com>
2021-06-07 16:11:18 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Botond Dénes
a53e6bc6e8 mutation_writer: add segregate_by_partition
Add a new segregator which segregates a stream, potentially containing
duplicate or even out-of-order partitions, into multiple output streams,
such that each output stream has strictly monotonic partitions.
This segregator will be used by a new scrub compaction mode which is
meant to fix sstables containing duplicate or out-of-order data.
2021-05-05 12:03:42 +03:00
Benny Halevy
e453f890f2 mutation_writer: multishard_writer: close readers when done
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
64c5b7fda6 mutation_writer: feed_writer: close reader when done
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
f29732573a mutation_writer: bucket_writer: add close
bucket_writer::close waits for the _consumer_fut.
It is called both after consume_end_of_stream()
and after abort().

_consumer_fut is expected to return an exception
on the abort path.  Wait for it and drop any exception
so it won't be abandoned as seen in #7904.

With that moved to close() time, consume_end_of_stream
doesn't need to return a future and is made void
all the way in the stack.  This is ok since
queue_reader_handle::push_end_of_stream is synchronous too.

Added a unit test that aborts the reader consumer
during `segregate_by_timestamp`, reproducing the
Exceptional future ignored issue without the fix.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-01-19 19:03:58 +02:00
Benny Halevy
fc3f9a57ff mutation_writer/feed_writers: refactor bucket/shard writers
Consolidate shard_based_splitting_writer::shard_writer
and timestamp_based_splitting_writer::bucket_writer
common code into mutation_writer::bucket_writer.

This provides a common place to handle consume_end_of_stream()
and abort(), and in particular the handling of the underlying
_conmsumer_fut.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-01-19 18:48:01 +02:00
Benny Halevy
a9d91a2d09 mutation_writer: update bucket/shard writers consume_end_of_stream
After 61520a33d6
feed_writers doesn't call consume_end_of_stream
after abort() so no need to test
            if (!_handle.is_terminated()) {

and consume_end_of_stream is now called in then_wrapped
rather than `finally` so it's ok if it throws.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-01-19 18:44:40 +02:00
Gleb Natapov
61520a33d6 mutation_writer: pass exceptions through feed_writer
feed_writer() eats exception and transforms it into an end of stream
instead. Downstream validators hate when this happens.

Fixes #7482
Message-Id: <20201216090038.GB3244976@scylladb.com>
2020-12-16 13:18:19 +02:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Avi Kivity
422a7e07a3 timestamp_based_splitting_writer: supply a parameter to std::out_of_range contructor
std::out-of-range does not have a default constructor, yet gcc somehow
accepts a no-argument construction. Clang (correctly) doesn't, so add
a parameter.
2020-09-21 16:32:53 +03:00
Piotr Jastrzebski
01ea159fde codebase wide: use try_emplace when appropriate
C++17 introduced try_emplace for maps to replace a pattern:
if(element not in a map) {
    map.emplace(...)
}

try_emplace is more efficient and results in a more concise code.

This commit introduces usage of try_emplace when it's appropriate.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>
2020-08-16 14:41:09 +03:00
Rafael Ávila de Espíndola
f6e407ecd2 everywhere: Prepare for seastar api v4 (when_all_succeed return value)
The seastar api v4 changes the return type of when_all_succeed. This
patch adds discard_result when that is best solution to handle the
change.

This doesn't do the actual update to v4 since there are still a few
issues left to fix in seastar. A patch doing just the update will
follow.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200617233150.918110-1-espindola@scylladb.com>
2020-06-18 15:13:56 +03:00
Avi Kivity
a4c44cab88 treewide: update concepts language from the Concepts TS to C++20
Seastar recently lost support for the experimental Concepts Technical
Specification (TS) and gained support for C++20 concepts. Re-enable
concepts in Scylla by updating our use of concepts to the C++20
standard.

This change:
 - peels off uses of the GCC6_CONCEPT macro
 - removes inclusions of <seastar/gcc6-concepts.hh>
 - replaces function-style concepts (no longer supported) with
   equation-style concepts
 - semicolons added and removed as needed
 - deprecated std::is_pod replaced by recommended replacement
 - updates return type constraints to use concepts instead of
   type names (either std::same_as or std::convertible_to, with
   std::same_as chosen when possible)

No attempt is made to improve the concepts; this is a specification
update only.
Message-Id: <20200531110254.2555854-1-avi@scylladb.com>
2020-06-02 09:12:21 +03:00
Raphael S. Carvalho
9ebf7b442e timestamp_based_splitting_writer: fix use-after-move look-alike
rt is moved before rt.tomb.timestamp is retrieved, so there's a
something that looks like use-after-move here (but really isn't).

found it while auditting the code.

[avi: adjusted changelog to note that it's not really a use-after-move]
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200525141047.168968-1-raphaelsc@scylladb.com>
2020-05-27 08:40:05 +03:00
Glauber Costa
e44b2826ab compaction: avoid abandoned futures when using interposers
When using interposers, cancelling compactions can leave futures
that are not waited for (resharding, twcs)

The reason is when consume_end_of_stream gets called, it tries to
push end_of_stream into the queue_reader_handle. Because cancelling
a compaction is done through an exception, the queue_reader_handle
is terminated already at this time. Trying to push to it generates
another exception and prevents us from returning the future right
below it.

This patch adds a new method is_terminated() and if we detect
that the queue_reader_handle is already terminated by this point,
we don't try to push. We call it is_terminated() because the check
is to see if the queue_reader_handle has a _reader. The reader is
also set to null on successful destruction.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200430175839.8292-1-glauber@scylladb.com>
2020-05-01 16:30:23 +03:00