Commit Graph

91 Commits

Author SHA1 Message Date
Glauber Costa
5140aaea00 add a timeout to fast forward to
In the last patch, we enabled per-request timeouts, we enable timeouts
in fill_buffer. There are many places, though, in which we
fast_forward_to before we fill_buffer, so in order to make that
effective we need to propagate the timeouts to fast_forward_to as well.

In the same way as fill_buffer, we make the argument optional wherever
possible in the high level callers, making them mandatory in the
implementations.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-12 07:43:19 -05:00
Glauber Costa
d965af42b0 add a timeout to fill_buffer
As part of the work to enable per-request timeouts, we enable timeouts
in fill_buffer.

The argument is made optional at the main classes, but mandatory in all
the ::impl versions. This way we'll make sure we didn't forget anything.

At this point we're still mostly passing that information around and
don't have any entity that will act on those timeouts. In the next patch
we will wire that up.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-11 12:07:41 -05:00
Avi Kivity
8795238869 Merge "Fix handling of range tombstones starting at same position" from Tomasz
"When we get two range tombstones with the same lower bound from
different data sources (e.g. two sstable), which need to be combined
into a single stream, they need to be de-overlapped, because each
mutation fragment in the stream must have a different position. If we
have range tombstones [1, 10) and [1, 20), the result of that
de-overlapping will be [1, 10) and [10, 20]. The problem is that if
the stream corresponds to a clustering slice with upper bound greater
than 1, but lower than 10, the second range tombstone would appear as
being out of the query range. This is currently violating assumptions
made by some consumers, like cache populator.

One effect of this may be that a reader will miss rows which are in
the range (1, 10) (after the start of the first range tombstone, and
before the start of the second range tombstone), if the second range
tombstone happens to be the last fragment which was read for a
discontinuous range in cache and we stopped reading at that point
because of a full buffer and cache was evicted before we resumed
reading, so we went to reading from the sstable reader again. There
could be more cases in which this violation may resurface.

There is also a related bug in mutation_fragment_merger. If the reader
is in forwarding mode, and the current range is [1, 5], the reader
would still emit range_tombstone([10, 20]). If that reader is later
fast forwarded to another range, say [6, 8], it may produce fragments
with smaller positions which were emitted before, violating
monotonicity of fragment positions in the stream.

A similar bug was also present in partition_snapshot_flat_reader.

Possible solutions:

 1) relax the assumption (in cache) that streams contain only relevant
 range tombstones, and only require that they contain at least all
 relevant tombstones

 2) allow subsequent range tombstones in a stream to share the same
 starting position (position is weakly monotonic), then we don't need
 to de-overlap the tombstones in readers.

 3) teach combining readers about query restrictions so that they can drop
fragments which fall outside the range

 4) force leaf readers to trim all range tombstones to query restrictions

This patch implements solution no 2. It simplifies combining readers,
which don't need to accumulate and trim range tombstones.

I don't like solution 3, because it makes combining readers more
complicated, slower, and harder to properly construct (currently
combining readers don't need to know restrictions of the leaf
streams).

Solution 4 is confined to implementations of leaf readers, but also
has disadvantage of making those more complicated and slower.

There is only one consumer which needs the tombstones with monotonic positions, and
that is the sstable writer.

Fixes #3093."

* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev:
  tests: row_cache: Introduce test for concurrent read, population and eviction
  tests: sstables: Add test for writing combined stream with range tombstones at same position
  tests: memtable: Test that combined mutation source is a mutation source
  tests: memtable: Test that memtable with many versions is a mutation source
  tests: mutation_source: Add test for stream invariants with overlapping tombstones
  tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones
  tests: mutation_reader: Test combined reader slicing on random mutations
  tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys()
  mutation_fragment: Introduce range()
  clustering_interval_set: Introduce overlaps()
  clustering_interval_set: Extract private make_interval()
  mutation_reader: Allow range tombstones with same position in the fragment stream
  sstables: Handle consecutive range_tombstone fragments with same position
  tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone()
  streamed_mutation: Introduce peek()
  mutation_fragment: Extract mergeable_with()
  mutation_reader: Move definition of combining mutation reader to source file
  mutation_reader: Use make_combined_reader() to create combined reader
2018-01-02 18:32:09 +02:00
Duarte Nunes
1374f898b9 Merge seastar upstream
Class optimized_optional was moved into seastar, and its usage
simplified so move_and_disengage() is replaced in favour of
std::exchange(_, { }).

* seastar adaca37...b0f5591 (9):
  > Merge "core: Introduce cancellation mechanism" from Duarte
  > Fix Seastar build that no longer builds with --enable-dpdk after the recent commit fd87ea2
  > noncopyable_function: support function objects whose move constructors throw
  > Adding new hardware options to new config format, using new config format for dpdk device
  > Fix check for Boost version during pre-build configuration.
  > variant_utils: add variant_visitor constructor for C++17 mode
  > Merge "Allows json object to be stream to an" from Amnon
  > Merge 'Default to C++17' from Avi
  > Add const version of subscript operator to circular_buffer

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171228112126.18142-1-duarte@scylladb.com>
2017-12-28 13:24:18 +02:00
Tomasz Grabiec
2be3cbbb81 mutation_fragment: Introduce range() 2017-12-22 11:06:33 +01:00
Tomasz Grabiec
41ede08a1d mutation_reader: Allow range tombstones with same position in the fragment stream
When we get two range tombstones with the same lower bound from
different data sources (e.g. two sstable), which need to be combined
into a single stream, they need to be de-overlapped, because each
mutation fragment in the stream must have a different position. If we
have range tombstones [1, 10) and [1, 20), the result of that
de-overlapping will be [1, 10) and [10, 20]. The problem is that if
the stream corresponds to a clustering slice with upper bound greater
than 1, but lower than 10, the second range tombstone would appear as
being out of the query range. This is currently violating assumptions
made by some consumers, like cache populator.

One effect of this may be that a reader will miss rows which are in
the range (1, 10) (after the start of the first range tombstone, and
before the start of the second range tombstone), if the second range
tombstone happens to be the last fragment which was read for a
discontinuous range in cache and we stopped reading at that point
because of a full buffer and cache was evicted before we resumed
reading, so we went to reading from the sstable reader again. There
could be more cases in which this violation may resurface.

There is also a related bug in mutation_fragment_merger. If the reader
is in forwarding mode, and the current range is [1, 5], the reader
would still emit range_tombstone([10, 20]). If that reader is later
fast forwarded to another range, say [6, 8], it may produce fragments
with smaller positions which were emitted before, violating
monotonicity of fragment positions in the stream.

A similar bug was also present in partition_snapshot_flat_reader.

Possible solutions:

 1) relax the assumption (in cache) that streams contain only relevant
 range tombstones, and only require that they contain at least all
 relevant tombstones

 2) allow subsequent range tombstones in a stream to share the same
 starting position (position is weakly monotonic), then we don't need
 to de-overlap the tombstones in readers.

 3) teach combining readers about query restrictions so that they can drop
fragments which fall outside the range

 4) force leaf readers to trim all range tombstones to query restrictions

This patch implements solution no 2. It simplifies combining readers,
which don't need to accumulate and trim range tombstones.

I don't like solution 3, because it makes combining readers more
complicated, slower, and harder to properly construct (currently
combining readers don't need to know restrictions of the leaf
streams).

Solution 4 is confined to implementations of leaf readers, but also
has disadvantage of making those more complicated and slower.

Fixes #3093.
2017-12-22 11:06:20 +01:00
Tomasz Grabiec
815cd254e2 streamed_mutation: Introduce peek()
Will be used in assertions to merge consecutive range tombstones.
2017-12-21 22:45:35 +01:00
Tomasz Grabiec
c5f82aa5bd mutation_fragment: Extract mergeable_with() 2017-12-21 21:24:11 +01:00
Botond Dénes
e47791810b Add non-const overload of partition_start::partition_tombstone()
And make the const version return a const reference so that code
mutating the returned value won't compile if the partition_start object
is const.
2017-12-04 07:57:43 +02:00
Paweł Dziepak
9b39d3b023 streamed_mutation: drop streamed_mutation_returning() 2017-11-23 18:14:31 +00:00
Paweł Dziepak
8baf682216 streamed_mutation: drop reverse_streamed_mutation() 2017-11-21 11:37:04 +00:00
Avi Kivity
09e730f9f2 Merge "Fix bugs in cache related to handling of bad_alloc" from Tomasz
"Fixes #2944."

* tag 'tgrabiec/cache-exception-safety-fixes-v2' of github.com:scylladb/seastar-dev:
  tests: row_cache: Add test for exception safety of multi-partition scans
  tests: row_cache: Add test for exception safety of single-partition reads
  tests: mutation_source_tests: Always print the seed
  tests: Disable alloc failure injection in test assertions
  tests: Avoid needless copies
  row_cache: Fix exception safety of cache_entry::read()
  row_cache: scanning_and_populating_reader: Fix exception unsafety causing read to skip data
  row_cache: partition_range_cursor: Extract valid() and advance_to() from refresh()
  cache_streamed_mutation: Add trace-level logging to cache_streamed_mutation
  mvcc: Lift noexcept off partition_snapshot_row_weakref assignment/constructors
  cache_streamed_mutation: Make advancing to the next range exception-safe
  cache_streamed_mutation: Make add_clustering_row_to_buffer() exception-safe
  cache_streamed_mutation: Make drain_tombstones() exception-safe
  cache_streamed_mutation: Return void from start_reading_from_underlying()
  cache_streamed_mutation: Document invariants related to exception-safety
  streamed_mutation: Add reserve_one()
  lsa: Guarantee invalidated references on allocating section retry
  mvcc: partition_snapshot_row_cursor: Mark allocation points
2017-11-14 11:42:13 +02:00
Tomasz Grabiec
53f4452b47 streamed_mutation: Add reserve_one() 2017-11-13 20:55:13 +01:00
Paweł Dziepak
7866e5b4a9 streamed_mutation: drop mutation_hasher 2017-11-13 16:49:52 +00:00
Paweł Dziepak
af4fa6152b partition_start: make partition_tombstone() const 2017-11-13 16:49:52 +00:00
Piotr Jastrzebski
1c9e4ba04c Add flat_mutation_reader_assertions
This will be usefull in tests.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-11-08 12:58:31 +01:00
Piotr Jastrzebski
f325fef362 Extract FlattenedConsumer concept using GCC6_CONCEPT
This concept will be used in flat_mutation_reader::consume

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:15:59 +02:00
Piotr Jastrzebski
46727f12e0 Introduce partition_end mutation_fragment
This type of mutation_fragment will be used in new mutation_reader
to signal the end of the current partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:15:59 +02:00
Piotr Jastrzebski
2516b42752 Introduce partition_start mutation_fragment
This type of mutation_fragment will be used in new mutation_reader
to signal the beginning of the next partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:15:59 +02:00
Piotr Jastrzebski
1f4fb6dd4a Introduce FragmentConsumer
This concept helps define StreamedMutationConsumer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-10-10 16:05:44 +02:00
Piotr Jastrzebski
e1f7d1f25d streamed_mutation: Extract concepts using GCC6_CONCEPT macro
It makes it easier to actually use those concepts.

Lambdas passed to mutation_fragment::visit have to declare
return type otherwise compiler fails with:

internal compiler error: Segmentation fault

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-09-20 11:34:03 +02:00
Tomasz Grabiec
cb16b038ef streamed_mutation: Allow setting buffer capacity
Needed in tests to limit amount of prefetching done by readers, so
that it's easier to test interleaving of various events.
2017-09-13 17:47:03 +02:00
Piotr Jastrzebski
477068d2c3 Make streamed_mutation more exception safe
Make sure that push_mutation_fragment leaves
_buffer_size with a correct value if exception
is thrown from emplace_back.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <83398412aa78332d88d91336b79140aecc988602.1503474403.git.piotr@scylladb.com>
2017-08-23 09:37:04 +01:00
Paweł Dziepak
f02bef7917 streamed_mutation: do not call fill_buffer() ahead of time
consume_mutation_fragments_until() allows consuming mutation fragments
until a specified condition happens. This patch reorganises its
implementation so that we avoid situations when fill_buffer() is called
with stop condition being true.
Message-Id: <20170727122218.7703-1-pdziepak@scylladb.com>
2017-07-27 17:47:57 +02:00
Paweł Dziepak
c9ccd813ab mutation_fragment: make destructor always_inline
mutation_fragment destructor was already made inline-friendly by moving
most of the logic to a separate function. However, the compiler still is
quite reluctant to inline it in certain cases, so let's give it a
stronger hint.
2017-07-26 14:38:27 +01:00
Paweł Dziepak
2066354de3 streamed_mutation: introduce consume_mutation_fragments_until()
consume_mutation_fragments_until() is a consumer based interface that
avoids indirect calls and continuation overhead present in the naive
streamed_mutation::operator() approach.
2017-07-26 14:37:20 +01:00
Tomasz Grabiec
1f23130b07 mutation_fragment: Implement equality check 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
04aebaa2cb streamed_mutation: Introduce transform() 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
a17fa5726f Introduce streamed_mutation_from_forwarding_streamed_mutation
This will allow conversion from streamed_mutation that
supports fast forwarding to streamed_mutation that does not.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
11191b7aef streamed_mutation: Introduce make_empty_streamed_mutation() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
1594ace4d3 range_tombstone_stream: Make printable 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
5a29c70f3e mutation_fragment: make mutation_fragment copyable
This will be needed by implementation of cache_streamed_mutation

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
94c957c2ff Extract position_in_partition to separate header
This will allow it's usage in mutation_partition.hh

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
0fd4dedc6a position_in_partition: Add after_all_clustered_rows() to view
This is a position that's always in the end after any
other position. It will be used for dummy rows_entry.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-06-24 18:06:11 +02:00
Nadav Har'El
984da1d8d7 Make forwarding_tag local to streamed_mutation
As Avi noticed, the "forwarding_tag" which was meant to be local in
streamed_mutation, became global. If another class copied the same trick,
it would share the same type instead of being distinct types as intended.

The problem is that in:

	using forwarding = bool_class<class forwarding_tag>;

Apparently, the "class forwarding_tag" forward-declares a global type - it
does not create a local-scope type as intended, which the following apparently
does (even though no actual definition is given for that class):

	class forwarding_tag;
	using forwarding = bool_class<forwarding_tag>;

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170619153933.13116-1-nyh@scylladb.com>
2017-06-19 20:04:47 +03:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Tomasz Grabiec
e56711a54d sstables: mutation_reader: Avoid reading index when restrictions cover whole partition
The check for is_static_row() used to be enough, but it no longer is
after optimization made in commit 3e06065, which avoids reading the
static row.
Message-Id: <1494241164-25810-1-git-send-email-tgrabiec@scylladb.com>
2017-05-09 11:03:18 +01:00
Duarte Nunes
4e693383f7 mutation_partion: Use row_tombstone
This patch replaces the current row tombstone representation by a
row_tombstone.

The intent of the patch is thus to reify the idea of shadowable
tombstones, that up until now we considered all materialized view row
tombstones to be.

We need to distinguish shadowable from non-shadowable row tombstones
to support scenarios such as, when inserting to a table with a
materialzied view:

1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1
2. delete from base using timestamp 2 where p = 3
3. insert into base (p, v1) values (3, 1) using timestamp 3

These should yield a view row where v2 is definitely null, but with
the current implementation, v2 will pop back with its value v2=3@TS=1,
even though its dead in the base row. This is because the row
tombstone inserted at 2) is a shadowable one.

This patch only addresses the memory representation of such
row_tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-04-25 11:46:33 +02:00
Tomasz Grabiec
c85fe3183c position_range: Allow stealing of bounds 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
503c68de44 position_in_partition: Add more factory methods 2017-04-20 10:54:36 +02:00
Tomasz Grabiec
5b813898bc position_range: Introduce all_clustered_rows() factory method 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
622713be60 position_in_partition: Introduce for_key()/after_key() factory methods 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
e27fa712f5 position_in_partition: Add factory methods for positions around all rows 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
b90275f8e3 position_in_partition: Introduce for_range_start()/for_range_end() 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5c29f4dd04 position_in_partition: Fix friendship declaration 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
2ff6c1705b position_in_partition: Make comparable with composites 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
dc7b93a326 streamed_mutation: Add streamed_mutation_returning() helper 2017-03-10 14:42:22 +01:00
Tomasz Grabiec
a32cf6c4cc position_in_partition: Add is_static_row() in the view 2017-03-10 14:42:21 +01:00
Tomasz Grabiec
e4db643730 range_tombstone_stream: Add reset() 2017-03-10 14:42:21 +01:00
Tomasz Grabiec
48ad2e2d64 range_tombstone_stream: Add get_next(position_in_partition_view) 2017-03-10 14:42:21 +01:00