Commit Graph

67 Commits

Author SHA1 Message Date
Paweł Dziepak
d8dad04564 mutation_reader: drop make_restricted_reader()
make_restricted_reader() has been replaced by
make_restricted_flat_reader().
2017-12-13 11:57:22 +00:00
Paweł Dziepak
3839bc5d60 mutation_reader: convert restricted reader to flat streams 2017-12-13 10:46:41 +00:00
Botond Dénes
1ff65f41fd mutation_reader_merger: don't query the kind of moved-from fragment
Call mutation_fragment_kind() on the fragment *before* it's moved as
there are not guarantees for the state of a moved-from object (apart
from that it's in a valid one).

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c47b1e22877bb9499f1fbb9d513093c29ef1901b.1512635422.git.bdenes@scylladb.com>
2017-12-07 10:40:31 +02:00
Botond Dénes
9661769313 combined_mutation_reader: fix fast-fowarding related row-skipping bug
When fast forwarding is enabled and all readers positioned inside the
current partition return EOS, return EOS from the combined-reader
too. Instead of skipping to the next partition if there are idle readers
(positioned at some later partition) available. This will cause rows to
be skipped in some cases.

The fix is to distinguish EOS'd readers that are only halted (waiting
for a fast-forward) from thoose really out of data. To achieve this we
track the last fragment-kind the reader emitted. If that was a
partition-end then the reader is out of data, otherwise it might emit
more fragments after a fast-forward. Without this additional information
it is impossible to determine why a reader reached EOS and the code
later may make the wrong decision about whether the combined-reader as
a whole is at EOS or not.
Also when fast-forwarding between partition-ranges or calling
next_partition() we set the last fragment-kind of forwarded readers
because they should emit a partition-start, otherwise they are out of
data.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <6f0b21b1ec62e1197de6b46510d5508cdb4a6977.1512569218.git.bdenes@scylladb.com>
2017-12-06 16:09:05 +02:00
Botond Dénes
e7535f5e88 Add flat_mutation_reader overload of make_combined_reader 2017-12-04 07:57:43 +02:00
Botond Dénes
8731c1bc66 Flatten the implementation of combined_mutation_reader
In fact flatten mutation_reader_merger and adjust combined_mutation_reader
accordingly.
2017-12-04 07:57:43 +02:00
Botond Dénes
3f8110b5b6 Make combined_mutation_reader a flat_mutation_reader
For now only the interface is converted, behind the scenes the previous
implementation remains, it's output is simply converted by
flat_mutation_reader_from_mutation_reader. The implementation will be
converted in the following patches.
2017-12-04 07:57:43 +02:00
Botond Dénes
c011747c30 Move the mutation merging logic to combined_mutation_reader
This is the second step in splitting the combined readers's logic into
two parts as outlined in the previous patch.
2017-12-04 07:57:43 +02:00
Botond Dénes
3681e17555 Remove the unnecessary indirection of mutation_reader_merger::next() 2017-12-04 07:57:43 +02:00
Botond Dénes
c5e57e0961 Move the implementation of combined_mutation_reader into mutation_reader_merger
This simple code-movement and patch lays the groundwork for splitting
the logic in combined_mutation_reader into two blocks:
* one that takes care of moving the readers in lockstep and emits their
    output as a non-decreasing stream of streamed_mutations and
* one that takes care of merging the above stream into
    strictly-increasing stream of streamed_mutations.

This in turn is preparation-work to the transformation of
combined_mutation_reader into a flat_mutation_reader::impl.
2017-12-04 07:57:43 +02:00
Piotr Jastrzebski
3f70dfc939 Introduce conversion from flat_mutation_reader to streamed_mutation
Allows splitting migration into small steps.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-15 15:33:23 +01:00
Paweł Dziepak
97767963a0 mutation_reader: drop multi_range_reader 2017-11-13 16:49:52 +00:00
Piotr Jastrzebski
acfc6fef55 Simplify flat_mutation_reader wrappers
If a wrapper takes a flat_mutation_reader in a constructor
then it does not have to take schema_ptr because it can obtain
it from the inner flat_mutation_reader.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <88c3672df08d2ac465711e9138d426e43ae9c62b.1510331382.git.piotr@scylladb.com>
2017-11-13 08:53:34 +01:00
Tomasz Grabiec
92e3449d59 mutation_reader: Do not call fast_forward_to() on a reference to a capture
The range reference is supposed to be valid as long as the reader is
used, not just around fast_forward_to().

Introduced in a6b9186cab
Message-Id: <1509710642-12713-1-git-send-email-tgrabiec@scylladb.com>
2017-11-03 12:09:42 +00:00
Paweł Dziepak
8c3b7fea81 Merge "Introduce new API and converters from/to old mutation_reader" from Piotr
"This changeset is the first step to flatten mutation_reader.
Then it introduces new mutation_fragment types for partition header and end of partition.
Using those a new flat_mutation_reader is defined.
Finally it introduces converters between new flat_mutation_reader and
old mutation_reader."

* 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev:
  Add tests for flat_mutation_reader
  Introduce conversion from flat_mutation_reader to mutation_reader
  Introduce conversion from mutation_reader to flat_mutation_reader
  Introduce flat_mutation_reader
  Extract FlattenedConsumer concept using GCC6_CONCEPT
  Introduce partition_end mutation_fragment
  Introduce a position for end of partition
  Introduce partition_start mutation_fragment
  Introduce FragmentConsumer
  Introduce a position for partition start
  streamed_mutation: Extract concepts using GCC6_CONCEPT macro
2017-10-16 12:14:23 +01:00
Piotr Jastrzebski
31733a7eeb Introduce conversion from flat_mutation_reader to mutation_reader
This will be used in transition from mutation_reader
to flat_mutation_reader

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-13 16:08:59 +02:00
Botond Dénes
a43901f842 row_consumer: de-virtualize io_priority() and resource_tracker()
Fixes #2830

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <448a1f739ab8c88a7a5562bce8dce5ae6efdf934.1507302530.git.bdenes@scylladb.com>
2017-10-06 18:50:12 +01:00
Botond Dénes
fea6214a0a Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
2017-10-03 12:44:17 +03:00
Botond Dénes
47e07b787e restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-10-03 12:44:12 +03:00
Avi Kivity
78eae8bf48 Revert "Merge "Make restricting_mutation_reader more accurate" from Botond"
This reverts commit c6e5dcc556, reversing
changes made to 19b21a0ab2. Failes to build,
plus author has more changes.
2017-10-03 11:58:59 +03:00
Botond Dénes
43dba8f173 Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics.
2017-09-20 11:16:21 +03:00
Botond Dénes
33e97e7457 restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-09-20 11:14:35 +03:00
Duarte Nunes
7fb6a74302 combined_mutation_reader: Drop exhausted readers if not in FF mode
Exhausted readers can be fast forwarded, so we have to keep them
around. However, if the current reader is not fast forwardable, then
we can drop those readers and their buffers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Duarte Nunes
0b53f88a42 combined_mutation_reader: Remove superfluous mutation_readers list
The _all_readers variable can do the same job.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Duarte Nunes
08e284a07e combined_mutation_reader: Don't drop mutation readers
This patch fixes a regression introduced in a6b9186ca.

We should keep the readers around in case a subsequent call to
fast_forward() will require them.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170811160444.12795-1-duarte@scylladb.com>
2017-08-11 19:17:29 +03:00
Duarte Nunes
e7d56884c0 list_reader_selector: Prevent infinite loop
In case the readers are empty.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170811153142.8926-1-duarte@scylladb.com>
2017-08-11 18:34:55 +03:00
Botond Dénes
3e97a5cd6b Remove range_sstable_reader
range_sstable_reader is replaced with combined_mutation_reader, using
the incremental_reader_selector.
2017-08-10 12:38:10 +03:00
Botond Dénes
a6b9186cab Add reader_selector to combined_mutation_reader
combined_mutation_reader now accepts as a constructor argument a
reader_selector instance whoose task is to create new readers on
each call to operator()() if needed and possible.
This way it is possible to control how readers are created through
different specializations of reader_selector.

The previous logic is refactored into list_reader_selector which
is using a pre-provided mutation_reader list and forwards all of them to
combined_mutation_reader at once.
2017-08-10 12:37:40 +03:00
Paweł Dziepak
402799fcc0 mutation_reader: drop move_and_clear()
Since the discovery of std::exchange(x, {}) move_and_clear has become
obsolete. Beside, the name was wrong, it did not clear the vector but
recreated it meaning that any allocated memory wasn't reused (not that
it mattered in the existing usages).

Message-Id: <20170731123549.10887-1-pdziepak@scylladb.com>
2017-07-31 15:51:19 +03:00
Paweł Dziepak
2b53a560c8 combined_mutation_reader: avoid unnecessary merge_mutations()
Merging mutations is quite an expensive operation. The creation of
streamed mutation merger involves several allocations (mostly coming
from various std::vector) and then all mutation_fragments need to go
through a heap.

All this is completely unnecessary if there is only one mutation, so
let's skip a call to merge_mutations() in such cases. This also means
that we can reuse memory allocated by _current vector if merge is not
required.
2017-07-31 12:35:40 +01:00
Paweł Dziepak
f78f2b3c92 combined_mutation_reader: do not pop mutation with different key
Originally, the loop insidecombined_mutation_reader::next() so that it
was popping mutation from the heap and when it encountered one with a
different decorated key it was pushed back and the ones accumulated so
far merged and emitted. In other words, every time the reader progressed
to the next mutation it did needless pop and push operations on the
heap.

This patch rearranges the code so that the key of the next mutation is
compared before it is popped from the heap.
2017-07-31 12:35:40 +01:00
Tomasz Grabiec
58d5e1393b mutation_reader: Introduce make_combined_mutation_source() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
1e2463a382 mutation_reader: Introduce make_empty_*_source() 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
ab72241e22 mutation_reader: Accept forwarding flag in make_reader_returning()
By default make_reader_returning creates a reader that does not
support fast forwarding but the second parameter can be used to
make it support fast forwarding.

[tgrabiec: Improve title]

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
358bf88cf8 mutation_reader: Fix abort when streaming more than one range
multi_range_mutation_reader uses fast_forward_to() to skip between
ranges, so we always need to create the underlying reader with with
mutation_reader::forwarding::yes if there is more than one range,
irrespective of whether multi_range_mutation_reader itself will be
forwarded or not.

Fixes #2510.

Introduced in commit 3018df1.

Message-Id: <1497943032-18696-1-git-send-email-tgrabiec@scylladb.com>
2017-06-20 10:29:45 +03:00
Nadav Har'El
3018df11b5 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::fowarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve of anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170619152629.11703-1-nyh@scylladb.com>
2017-06-19 18:31:32 +03:00
Avi Kivity
6e2c9ef9fb Revert "Allow reading exactly desired byte ranges and fast_forward_to"
This reverts commit 317d7fc253 (and also the
related 2c57ab84b2).  It causes crashes
during range scans, reported by Gleb:

"To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s
dataset and 3 node cluster.

Backtrace:
    at /home/gleb/work/seastar/seastar/core/apply.hh:36
    rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57
    range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142
    at ./seastar/core/future.hh:890
    at /home/gleb/work/seastar/seastar/core/future-util.hh:119
    at /home/gleb/work/seastar/seastar/core/future-util.hh:142
2017-06-18 16:10:21 +03:00
Nadav Har'El
317d7fc253 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_fordward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::fowarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exception are functions which are known
   to read only a single partition, and not support fast_forward_to() a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve of anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170614072122.13473-1-nyh@scylladb.com>
2017-06-15 13:22:46 +01:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Tomasz Grabiec
892d4a2165 db: Enable creating forwardable readers via mutation_source
Right now all mutation source implementations will use
make_forwardable() wrapper.
2017-02-23 18:50:44 +01:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Paweł Dziepak
52a4e79210 mutation_reader: add multi_range_reader
So far, the only way to combine outputs of multiple readers was to use
combining reader. It is very general and, in particular, supports case
when the readers emit mutations from overlapping ranges.

However, we have cases (e.g. streaming) when we need to read from
several disjoint ranges. Combining reader is a suboptimal solution as it
requires to creating a reader for each range and ignores the fact that
they do not overlap.

This patch introduces multi_range_mutation_reader which takes a
mutation_source and a sorted set of disjoint ranges. Internally, it uses
mutation_reader::fast_forward_to() to move to the next range once the
current one is completed.
2016-12-15 13:07:31 +00:00
Paweł Dziepak
bcd374c05d mutation_reader: forward fast_forward_to() calls
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
b7b7b2bd63 combined_mutation_reader: implement fast_forward_to()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
2c0cdd55fc mutation_reader: make combinded_reader public
We want to be able to fast forward sstable readers. However, just
implementing fast_forward_to() for combined_reader is not enough as the
sstables we are reading from may need to change.

Following patches are going to introduce a combined sstable reader that
derives from combined_reader. To make that possible we first need to
make combined_reader public.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Piotr Jastrzebski
3607d99269 Remove clustering_key_filtering_context.
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-08-30 20:31:55 +02:00
Piotr Jastrzebski
b05b90b3a5 Introduce clustering_key_filter_ranges.
This fixes the problem of multiple concurrent get_ranges calls.
Previously each call was invalidating the result of the previous
call. Now they don't step on each other foot.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-08-30 19:46:38 +02:00
Avi Kivity
9ac730dcc9 mutation_reader: make restricting_mutation_reader even more restricting
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.

To limit the damage, add two limits on the queue:
 - a timeout, which is equal to the read timeout
 - a queue length limit, which is equal to 2% of the shard memory divided
   by an estimate of the queued request size (1kb)

Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>
2016-06-29 15:17:35 +02:00