Commit Graph

86 Commits

Author SHA1 Message Date
Piotr Jastrzebski
9233ee7309 Move FlattenedConsumer concept to flat_mutation_reader.hh
This concept will be used both in flat_mutation_reader.hh
and mutation_reader.hh. mutation_reader.hh includes
flat_mutation_reader.hh so we have to move the concept to
make it accessible in both files.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-08 14:14:51 +01:00
Piotr Jastrzebski
6efda10790 Add mutation_source::make_flat_mutation_reader
This will be used as an intermediate state of migration
from mutation_reader to flat_mutation_reader.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-08 12:58:31 +01:00
Piotr Jastrzebski
93e8b43e7b Add flat reader mutation source implementation
This will be used by sources that are migrated to
flat_mutation_reader.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-08 12:41:12 +01:00
Piotr Jastrzebski
1a7936561e Prepare mutation_source for more than one implementation
There will be a second implementation that will be used by
sources that are converted to flat_mutation_reader.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-08 12:41:12 +01:00
Duarte Nunes
baeec0935f Replace query::full_slice with schema::full_slice()
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.

Refs #2885

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
2017-10-17 11:25:53 +02:00
Paweł Dziepak
8c3b7fea81 Merge "Introduce new API and converters from/to old mutation_reader" from Piotr
"This changeset is the first step to flatten mutation_reader.
Then it introduces new mutation_fragment types for partition header and end of partition.
Using those a new flat_mutation_reader is defined.
Finally it introduces converters between new flat_mutation_reader and
old mutation_reader."

* 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev:
  Add tests for flat_mutation_reader
  Introduce conversion from flat_mutation_reader to mutation_reader
  Introduce conversion from mutation_reader to flat_mutation_reader
  Introduce flat_mutation_reader
  Extract FlattenedConsumer concept using GCC6_CONCEPT
  Introduce partition_end mutation_fragment
  Introduce a position for end of partition
  Introduce partition_start mutation_fragment
  Introduce FragmentConsumer
  Introduce a position for partition start
  streamed_mutation: Extract concepts using GCC6_CONCEPT macro
2017-10-16 12:14:23 +01:00
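The flattening idea in this merge (a single stream of fragments in which partition boundaries are marked by explicit partition_start and partition_end fragments) can be illustrated with a minimal sketch. The types below (`partition`, `fragment`, `fragment_kind`, `flatten`) are hypothetical simplifications made up for illustration and are not the actual flat_mutation_reader API:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model: a partition is a key plus its rows.
struct partition {
    std::string key;
    std::vector<std::string> rows;
};

// Fragment kinds; partition_start and partition_end mirror the two new
// mutation_fragment types named in the merge description.
enum class fragment_kind { partition_start, row, partition_end };

struct fragment {
    fragment_kind kind;
    std::string payload; // partition key, row contents, or empty
};

// Turns a stream of partitions into one flat stream of fragments.
// Each partition becomes: partition_start, its rows, partition_end.
std::vector<fragment> flatten(const std::vector<partition>& partitions) {
    std::vector<fragment> out;
    for (const auto& p : partitions) {
        out.push_back({fragment_kind::partition_start, p.key});
        for (const auto& r : p.rows) {
            out.push_back({fragment_kind::row, r});
        }
        out.push_back({fragment_kind::partition_end, std::string()});
    }
    return out;
}
```

A consumer of such a stream needs no nested per-partition object; it can detect partition boundaries from the fragment kind alone.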
Piotr Jastrzebski
31733a7eeb Introduce conversion from flat_mutation_reader to mutation_reader
This will be used in transition from mutation_reader
to flat_mutation_reader

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-13 16:08:59 +02:00
Piotr Jastrzebski
f325fef362 Extract FlattenedConsumer concept using GCC6_CONCEPT
This concept will be used in flat_mutation_reader::consume

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:15:59 +02:00
Piotr Jastrzebski
2516b42752 Introduce partition_start mutation_fragment
This type of mutation_fragment will be used in new mutation_reader
to signal the beginning of the next partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:15:59 +02:00
Botond Dénes
a43901f842 row_consumer: de-virtualize io_priority() and resource_tracker()
Fixes #2830

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <448a1f739ab8c88a7a5562bce8dce5ae6efdf934.1507302530.git.bdenes@scylladb.com>
2017-10-06 18:50:12 +01:00
Botond Dénes
fea6214a0a Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
2017-10-03 12:44:17 +03:00
Botond Dénes
47e07b787e restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred until there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-10-03 12:44:12 +03:00
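The admission rule described in 47e07b787e can be sketched roughly as follows. This is a simplified, synchronous model under stated assumptions: the class and method names are hypothetical, the real code uses a semaphore and defers reader creation rather than returning `false`, and only the 16k up-front cost is taken from the commit message:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the semaphore-backed memory budget.
class reader_memory_budget {
    std::int64_t _available;
public:
    // Units charged to every new reader up-front (16k per the commit message).
    static constexpr std::int64_t new_reader_cost = 16 * 1024;

    explicit reader_memory_budget(std::int64_t units) : _available(units) {
        assert(units >= 0);
    }

    // Only new readers are gated: admission fails when fewer than
    // new_reader_cost units remain. (The real code would defer instead.)
    bool try_admit_new_reader() {
        if (_available < new_reader_cost) {
            return false;
        }
        _available -= new_reader_cost;
        return true;
    }

    // Buffers of already admitted readers are always accounted, even if
    // that drives the budget negative; existing readers are never blocked.
    void track_buffer(std::int64_t bytes) { _available -= bytes; }
    void release_buffer(std::int64_t bytes) { _available += bytes; }

    // Returns the up-front cost when a reader is destroyed.
    void release_reader() { _available += new_reader_cost; }

    std::int64_t available() const { return _available; }
};
```

In this model an existing reader that allocates a large buffer can push the budget below zero, which blocks further admissions until buffers are released but never stalls the reader that is already running.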
Botond Dénes
0a07e9e7c7 mutation_reader.hh: Move restricted_reader related code
In preparation of make_restricted_reader taking a mutation_source as
its argument.
2017-10-03 12:39:22 +03:00
Avi Kivity
78eae8bf48 Revert "Merge "Make restricting_mutation_reader more accurate" from Botond"
This reverts commit c6e5dcc556, reversing
changes made to 19b21a0ab2. Fails to build,
plus author has more changes.
2017-10-03 11:58:59 +03:00
Botond Dénes
43dba8f173 Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics.
2017-09-20 11:16:21 +03:00
Botond Dénes
33e97e7457 restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred until there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-09-20 11:14:35 +03:00
Botond Dénes
e4a9e55e0d mutation_reader.hh: Move restricted_reader related code
In preparation of make_restricted_reader taking a mutation_source as
its argument.
2017-09-20 11:12:57 +03:00
Tomasz Grabiec
8a9f0f86e7 mutation_source: Introduce mutation_source::make_partition_presence_checker()
Every mutation source can have a presence checker. By default all
answer "maybe contains".

Having this on mutation_source level will be useful for simplifying
cache update flow. The cache can ask the right snapshot for a presence
checker rather than relying on database to know when and how to make
the right one which preserves all invariants.

This will be especially useful once all updates of the underlying
mutation source of cache (e.g. sstable list) will have to go through
cache for safety reasons.
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
065feb1b7b mutation_reader: Move definitions up in the header 2017-09-04 10:04:29 +02:00
Tomasz Grabiec
4e4839082b mutation_reader: Use constructor delegation to reduce code duplication 2017-09-04 10:04:29 +02:00
Duarte Nunes
7fb6a74302 combined_mutation_reader: Drop exhausted readers if not in FF mode
Exhausted readers can be fast forwarded, so we have to keep them
around. However, if the current reader is not fast forwardable, then
we can drop those readers and their buffers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Duarte Nunes
0b53f88a42 combined_mutation_reader: Remove superfluous mutation_readers list
The _all_readers variable can do the same job.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-14 14:37:27 +02:00
Botond Dénes
3e97a5cd6b Remove range_sstable_reader
range_sstable_reader is replaced with combined_mutation_reader, using
the incremental_reader_selector.
2017-08-10 12:38:10 +03:00
Botond Dénes
a6b9186cab Add reader_selector to combined_mutation_reader
combined_mutation_reader now accepts as a constructor argument a
reader_selector instance whose task is to create new readers on
each call to operator()() if needed and possible.
This way it is possible to control how readers are created through
different specializations of reader_selector.

The previous logic is refactored into list_reader_selector which
is using a pre-provided mutation_reader list and forwards all of them to
combined_mutation_reader at once.
2017-08-10 12:37:40 +03:00
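The selector arrangement described in a6b9186cab might look roughly like the sketch below. It is a toy model with assumptions stated up front: a "reader" is just a sorted vector of integers, `combine` is a hypothetical stand-in for combined_mutation_reader, and the real selector interface may take arguments that the commit message does not mention:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy model: a "reader" is a sorted vector of keys.
using reader = std::vector<int>;

// Abstract selector: each call returns newly created readers, if any.
struct reader_selector {
    virtual ~reader_selector() = default;
    virtual std::vector<reader> operator()() = 0;
};

// Mirrors the described list_reader_selector: it holds a pre-provided
// list and hands all readers over on the first call.
class list_reader_selector : public reader_selector {
    std::vector<reader> _readers;
public:
    explicit list_reader_selector(std::vector<reader> readers)
        : _readers(std::move(readers)) {}

    std::vector<reader> operator()() override {
        std::vector<reader> out = std::move(_readers);
        _readers.clear(); // leave the moved-from list in a known empty state
        return out;
    }
};

// Hypothetical stand-in for the combining reader: pulls readers from the
// selector and merges their output into one sorted sequence.
std::vector<int> combine(reader_selector& selector) {
    std::vector<int> merged;
    for (const auto& r : selector()) {
        merged.insert(merged.end(), r.begin(), r.end());
    }
    std::sort(merged.begin(), merged.end());
    return merged;
}
```

A different selector specialization could return readers lazily across several calls, which is what makes the indirection useful.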
Tomasz Grabiec
ddfcf64966 mutation_source: Make copying cheaper
Cache readers will need to take snapshots by copying the
mutation_source. That's going to happen quite often, so make copying
cheaper.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
58d5e1393b mutation_reader: Introduce make_combined_mutation_source() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
1e2463a382 mutation_reader: Introduce make_empty_*_source() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
289d01c2cc mutation_reader: Introduce concept of snapshot_source 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
9380dd1ee3 mutation_source: make sure we never ignore fast forwarding
A mutation source sometimes ignores the fast forwarding parameter, so
this change adds an assertion to check that the parameter
can be safely ignored.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
ab72241e22 mutation_reader: Accept forwarding flag in make_reader_returning()
By default make_reader_returning creates a reader that does not
support fast forwarding but the second parameter can be used to
make it support fast forwarding.

[tgrabiec: Improve title]

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Nadav Har'El
3018df11b5 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As a result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::forwarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exceptions are functions which are known
   to read only a single partition, and do not support fast_forward_to() to a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170619152629.11703-1-nyh@scylladb.com>
2017-06-19 18:31:32 +03:00
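The choice of last_end described in this commit message reduces to a small decision, sketched below. The names `byte_range`, `forwarding`, and `choose_last_end` are hypothetical and exist only for illustration; the actual data_consume_rows() signature is not shown in the commit message:

```cpp
#include <cstdint>

// Hypothetical model of the mutation_reader::forwarding flag.
enum class forwarding { no, yes };

// Byte range of the partitions to read, as offsets into the data file.
struct byte_range {
    std::uint64_t start;
    std::uint64_t end;
};

// Picks the position up to which the disk stream is opened.
// - forwarding::no:  nothing past the range is ever needed, so the stream
//   ends at the range's end and read-ahead cannot overshoot it.
// - forwarding::yes: a later fast_forward_to() may target any position,
//   so the stream stays open to the end of the file, as before the patch.
std::uint64_t choose_last_end(byte_range range, std::uint64_t file_size, forwarding fwd) {
    return fwd == forwarding::yes ? file_size : range.end;
}
```

Under this model, the repair path (which the message says uses forwarding::no) never reads beyond the range it asked for.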
Avi Kivity
6e2c9ef9fb Revert "Allow reading exactly desired byte ranges and fast_forward_to"
This reverts commit 317d7fc253 (and also the
related 2c57ab84b2).  It causes crashes
during range scans, reported by Gleb:

"To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s
dataset and 3 node cluster.

Backtrace:
    at /home/gleb/work/seastar/seastar/core/apply.hh:36
    rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57
    range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142
    at ./seastar/core/future.hh:890
    at /home/gleb/work/seastar/seastar/core/future-util.hh:119
    at /home/gleb/work/seastar/seastar/core/future-util.hh:142
2017-06-18 16:10:21 +03:00
Avi Kivity
2c57ab84b2 mutation_reader: fix typo in forwarding_tag
The typo went unnoticed since the compiler picked up the global scope's
forwarding_tag.  The bug made streamed_mutation::forwarding and
mutation_reader::forwarding the same type, but fortunately there were
no type mixups due to this.
2017-06-15 20:13:01 +03:00
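How a tag mix-up of this kind can make two flag types identical without a compile error is shown in the sketch below. It is an illustration under assumptions: `tagged_bool` is a hypothetical minimal stand-in for a tag-parameterized boolean class, the namespace names are invented, and the exact typo in the original code is not reproduced here:

```cpp
#include <type_traits>

// Minimal tag-parameterized boolean; two instantiations with different
// tags are distinct, non-interchangeable types.
template <typename Tag>
struct tagged_bool {
    bool value;
    static tagged_bool yes() { return tagged_bool{true}; }
    static tagged_bool no() { return tagged_bool{false}; }
};

// A tag at global scope that unqualified lookup can silently fall back to.
struct forwarding_tag;

// Intended setup: each namespace declares its own tag.
namespace streamed_mutation_model {
struct forwarding_tag;
using forwarding = tagged_bool<forwarding_tag>;
}
namespace mutation_reader_model {
struct forwarding_tag;
using forwarding = tagged_bool<forwarding_tag>;
}

// Buggy setup: no local tag exists, so both aliases resolve to the
// global ::forwarding_tag and become the same type.
namespace buggy_a {
using forwarding = tagged_bool<forwarding_tag>;
}
namespace buggy_b {
using forwarding = tagged_bool<forwarding_tag>;
}

static_assert(!std::is_same<streamed_mutation_model::forwarding,
                            mutation_reader_model::forwarding>::value,
              "distinct tags give distinct flag types");
static_assert(std::is_same<buggy_a::forwarding, buggy_b::forwarding>::value,
              "falling back to the global tag merges the two types");
```

With distinct types, passing one flag where the other is expected fails to compile, which is the protection the fix restores.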
Nadav Har'El
317d7fc253 Allow reading exactly desired byte ranges and fast_forward_to
In commit c63e88d556, support was added for
fast_forward_to() in data_consume_rows(). Because an input stream's end
cannot be changed after creation, that patch ignores the specified end
byte, and uses the end of file as the end position of the stream.

As a result of this, even when we want to read a specific byte range (e.g.,
in the repair code to checksum the partitions in a given range), the code
reads an entire 128K buffer around the end byte, or significantly more, with
read-ahead enabled. This causes repair to do more than 10 times the amount
of I/O it really has to do in the checksumming phase (which in the current
implementation, reads small ranges of partitions at a time).

This patch has two levels:

1. In the lower level, sstable::data_consume_rows(), which reads all
   partitions in a given disk byte range, now gets another byte position,
   "last_end". That can be the range's end, the end of the file, or anything
   in between the two. It opens the disk stream until last_end, which means
   1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is
   not allowed beyond last_end.

2. In the upper level, we add to the various layers of sstable readers,
   mutation readers, etc., a boolean flag mutation_reader::forwarding, which
   says whether fast_forward_to() is allowed on the stream of mutations to
   move the stream to a different partition range.

   Note that this flag is separate from the existing boolean flag
   streamed_mutation::forwarding - that one talks about skipping inside a
   single partition, while the flag we are adding is about switching the
   partition range being read. Most of the functions that previously
   accepted streamed_mutation::forwarding now accept *also* the option
   mutation_reader::forwarding. The exceptions are functions which are known
   to read only a single partition, and do not support fast_forward_to() to a
   different partition range.

   We note that if mutation_reader::forwarding::no is requested, and
   fast_forward_to() is forbidden, there is no point in reading anything
   beyond the range's end, so data_consume_rows() is called with last_end as
   the range's end. But if forwarding::yes is requested, we use the end of the
   file as last_end, exactly like the code before this patch did.

Importantly, we note that the repair's partition reading code,
column_family::make_streaming_reader, uses mutation_reader::forwarding::no,
while the other existing reading code will use the default forwarding::yes.

In the future, we can further optimize the amount of bytes read from disk
by replacing forwarding::yes by an actual last partition that may ever be
read, and use its byte position as the last_end passed to data_consume_rows.
But we don't do this yet, and it's not a regression from the existing code,
which also opened the file input stream until the end of the file, and not
until the end of the range query. Moreover, such an improvement will not
improve anything if the overall range is always very large, in which
case not over-reading at its end will not improve performance.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170614072122.13473-1-nyh@scylladb.com>
2017-06-15 13:22:46 +01:00
Paweł Dziepak
12135dbe21 mutation_reader: make mutation_source nothrow movable 2017-03-09 09:27:43 +00:00
Tomasz Grabiec
892d4a2165 db: Enable creating forwardable readers via mutation_source
Right now all mutation source implementations will use
make_forwardable() wrapper.
2017-02-23 18:50:44 +01:00
Tomasz Grabiec
b1d1091906 mutation_source: Document liveness requirements 2017-02-23 18:23:52 +01:00
Tomasz Grabiec
15db80188b mutation_source: Cleanup
- Combine telescopic overloads into one method with default parameters.
- Introduce func_type for a full handler to avoid some duplication.
2017-02-23 18:23:52 +01:00
Tomasz Grabiec
586dbaa8d3 db: Replace virtual_reader_type with mutation_source_opt
Virtual reader is a mutation_source.
2017-02-23 18:23:52 +01:00
Tomasz Grabiec
78844fa2e5 db: Use incremental selector in partition_presence_checker
This reduces the number of sstables we need to check to only those
whose token range overlaps with the key. Reduces cache update
time. Especially effective with leveled compaction strategy.

Refs #1943.

Incremental selector works with an immutable sstable set, so cache
updates need to be serialized. Otherwise we could mispopulate due to
stale presence information.

Presence checker interface was changed to accept decorated key in
order to gain easy access to the token, which is required by
the incremental selector.
2016-12-19 14:20:58 +01:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Paweł Dziepak
52a4e79210 mutation_reader: add multi_range_reader
So far, the only way to combine outputs of multiple readers was to use
combining reader. It is very general and, in particular, supports case
when the readers emit mutations from overlapping ranges.

However, we have cases (e.g. streaming) when we need to read from
several disjoint ranges. Combining reader is a suboptimal solution as it
requires creating a reader for each range and ignores the fact that
they do not overlap.

This patch introduces multi_range_mutation_reader which takes a
mutation_source and a sorted set of disjoint ranges. Internally, it uses
mutation_reader::fast_forward_to() to move to the next range once the
current one is completed.
2016-12-15 13:07:31 +00:00
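The approach described in 52a4e79210 (one underlying reader that is fast-forwarded through a sorted set of disjoint ranges) can be sketched as follows. This is a toy model: keys are plain integers, ranges are half-open, and `simple_reader`, `key_range`, and this `multi_range_reader` class are hypothetical simplifications, not the real multi_range_mutation_reader:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct key_range { int start; int end; }; // half-open: [start, end)

// Toy single-range reader over sorted keys. It only ever moves forward,
// which is what fast forwarding relies on.
class simple_reader {
    const std::vector<int>& _data; // must outlive the reader
    std::size_t _pos = 0;
    key_range _range;
    void seek() {
        while (_pos < _data.size() && _data[_pos] < _range.start) { ++_pos; }
    }
public:
    simple_reader(const std::vector<int>& data, key_range r) : _data(data), _range(r) { seek(); }
    void fast_forward_to(key_range r) { _range = r; seek(); }
    // Writes the next key in the current range to out; false when exhausted.
    bool next(int& out) {
        if (_pos < _data.size() && _data[_pos] < _range.end) {
            out = _data[_pos++];
            return true;
        }
        return false;
    }
};

// Reads several sorted, disjoint ranges with a single underlying reader.
class multi_range_reader {
    std::vector<key_range> _ranges;
    std::size_t _current = 0;
    simple_reader _reader;
    static key_range first(const std::vector<key_range>& rs) {
        assert(!rs.empty()); // this sketch requires at least one range
        return rs.front();
    }
public:
    multi_range_reader(const std::vector<int>& data, std::vector<key_range> ranges)
        : _ranges(std::move(ranges)), _reader(data, first(_ranges)) {}

    bool next(int& out) {
        while (true) {
            if (_reader.next(out)) { return true; }
            if (_current + 1 >= _ranges.size()) { return false; }
            // Current range is done: move the same reader to the next one.
            _reader.fast_forward_to(_ranges[++_current]);
        }
    }
};
```

Compared with a combining reader, only one underlying reader exists at any time, and no merge step is needed because the ranges are known not to overlap.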
Paweł Dziepak
bcd374c05d mutation_reader: forward fast_forward_to() calls
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
b7b7b2bd63 combined_mutation_reader: implement fast_forward_to()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
2c0cdd55fc mutation_reader: make combined_reader public
We want to be able to fast forward sstable readers. However, just
implementing fast_forward_to() for combined_reader is not enough as the
sstables we are reading from may need to change.

Following patches are going to introduce a combined sstable reader that
derives from combined_reader. To make that possible we first need to
make combined_reader public.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Paweł Dziepak
62c9492d33 mutation_reader: introduce fast_forward_to()
This patch introduces the interface for fast forwarding mutation
readers. The main user of this feature is going to be cache which, while
serving range query, may need to read multiple small ranges from the
sstables to populate itself with the missing entries.

Fast forwarding is an alternative to recreating a reader with different
range. Its main advantage is that it avoids dropping data that has
already been read.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Duarte Nunes
5fd66f00c2 mutation_reader: Accept trace_state_ptr
This patch changes the mutation_reader so it optionally accepts a
trace_state_ptr. This will allow us to trace, for example, which
sstables are accessed during a request.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:00:31 +02:00
Piotr Jastrzebski
3607d99269 Remove clustering_key_filtering_context.
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-08-30 20:31:55 +02:00