This type of mutation_fragment will be used in new mutation_reader
to signal the end of the current partition.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This type of mutation_fragment will be used in new mutation_reader
to signal the beginning of the next partition.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
It makes it easier to actually use those concepts.
Lambdas passed to mutation_fragment::visit have to declare
return type otherwise compiler fails with:
internal compiler error: Segmentation fault
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
If we read a partition from a single sstable (a fairly common case),
we can bypass mutation_merger and just return the input.
Message-Id: <20170918181418.14021-1-avi@scylladb.com>
It is possible that a call to fill_buffer() will return an immediately
ready future. This patch avoids uncontrolled recursion in case when all
merged streamed mutation do not defer ini fill_buffer() and also
optimises for non-deferring case by avoiding some of the logic.
This will allow conversion from streamed_mutation that
supports fast forwarding to streamed_mutation that does not.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This will allow expressing lack of information about certain ranges of
rows (including the static row), which will be used in cache to
determine if information in cache is complete or not.
Continuity is represented internally using flags on row entries. The
key range between two consecutive entries is continuous iff
rows_entry::continuous() is true for the later entry. The range
starting after the last entry is assumed to be continuous. The range
corresponding to the key of the entry is continuous iff
rows_entry::dummy() is false.
[tgrabiec:
- based on the following commits:
4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry
773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry
- documented that partition tombstone is always complete
- require specifying the partition tombstone when creating an incomplete entry
- replaced rows_entry(dummy_tag, ...) constructor with more general
rows_entry(position_in_partition, ...)
- documented continuity semantics on mutation_partition
- fixed _static_row_cached being lost by mutation_partition copy constructors
- fixed conversion to streamed_mutation to ignore dummy entries
- fixed mutation_partition serializer to drop dummy entries
- documented semantics of continuity on mutation_partition level
- dropped assumptions that dummy entries can be only at the last position
- changed equality to ignore continuity completely, rather than
partially (it was not ignoring dummy entries, but ignoring
continuity flag)
- added printout of continuity information in mutation_partition
- fixed handling of empty entries in apply_reversibly() with regards
to continuity; we no longer can remove empty entries before
merging, since that may affect continuity of the right-hand
mutation. Added _erased flag.
- fixed mutation_partition::clustered_row() with dummy==true to not ignore the key
- fixed partition_builder to not ignore continuity
- renamed dummy_tag_t to dummy_tag. _t suffix is reserved.
- standardized all APIs on is_dummy and is_continuous bool_class:es
- replaced add_dummy_entry() with ensure_last_dummy() with safer semantics
- dropped unused remove_dummy_entry()
- simplified and inlined cache_entry::add_dummy_entry()
- fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous
]
key() will not be valid for dummy entries, but position() is always
valid.
[tgrabiec: Extracted from other commits]
[tgrabiec: Added missing change to range_tombstone_stream::get_next]
Clang warns that passing a non-POD to a C-style variadic function will
result in an abort(). That happens to be exactly what we want, but to
silence the warning, use a template instead. Since templates aren't
allowed in local classes, move the containing class to namespace scope.
mutation_fragments are going to be caching their size in memory. In
order to be able to invalidate that correctly, they need to know when
that size may change (but avoid invalidation when it is not necessary).
This patch ensures the mutation_merger emits any deferred tombstones
that it still may be holding before closing the stream.
Together with the range_tombstone_list: Properly implement
difference() patch set, this fixes breakage of streamed_mutation_test
and row_cache_test.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170123195643.9876-1-duarte@scylladb.com>
After we call unlink_leftmost_without_rebalance(), we must unlink all
elements before mutatation is destroyed. We did this properly from
~reader, but it would not be called if reader construction failed,
which it may.
Message-Id: <1484572581-6537-1-git-send-email-tgrabiec@scylladb.com>
Once unlink_leftmost_without_rebalance() has been called on a bi::set no
other method can be used. This includes clear_and_disposed() used by the
mutation_partition destructor.
We like unlink_leftmost_without_rebalance() because it is efficient, so
the solution is to manually finish destroying clustering row and range
tombstone sets in the reader destructor using that function.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
mutation_fragment() constructor allocates memory. If it fails the
already unlinked parts of mutation (either rows_entry or range_tombtone)
will be leaked.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch avoids moving entries from range tombstones and clustering
rows sets in streamed_mutation_from_mutation(). Such action breaks these
sets as the entries will be left in some unknown state.
Instead, the sets are being broken in a supported and predictable way
using unlink_leftmost_without_rebalance().
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468843205-18852-1-git-send-email-pdziepak@scylladb.com>
Originally, streamed_mutations guaranteed that emitted tombstones are
disjoint. In order to achieve that two separate objects were produced
for each range tombstone: range_tombstone_begin and range_tombstone_end.
Unfortunately, this forced sstable writer to accumulate all clustering
rows between range_tombstone_begin and range_tombstone_end.
However, since there is no need to write disjoint tombstones to sstables
(see #1153 "Write range tombstones to sstables like Cassandra does") it
is also not necessary for streamed_mutations to produce disjoint range
tombstones.
This patch changes that by making streamed_mutation produce
range_tombstone objects directly.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
reverse_streamed_mutation() is an inefficient way of reversing
streamed_mutations. First, it collects all mutation_fragments and then
it emits them in the reversed orders (except static row which always is
the first element and it also flips the bounds of range tombstones).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
range_tombstone_stream encapsulates logic responsible for turning
range_tombstone_list into a stream of mutation_fragments and merging
that stream with a stream of clustering rows.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
streamed_mutation represents a mutation in a form of a stream of
mutation_fragments. streamed_mutation emits mutation fragments in the
order they should appear in the sstables, i.e. static row is always
the first one, then clustering rows and range tombstones are emitted
according to the lexicographical ordering of their clustering keys and
bounds of the range tombstones.
Range tombstones are disjoint, i.e. after emitting
range_tombstone_begin it is guaranteed that there is going to be a
single range_tombstone_end before another range_tombstone_begin is
emitted.
The ordering of mutation_fragments also guarantees that by the time
the consumer sees a clustering row it has already received all
relevant tombstones.
Partition key and partition tombstone are not streamed and is part of
the streamed_mutation itself.
streamed_mutation uses batching. The mutation implementations are
supposed to fill a buffer with mutation fragments until is_buffer_full()
or the end of stream is encountered.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This commit introduces mutation_fragment class which represents the parts
of mutation streamed by streamed_mutation.
mutation_fragment can be:
- a static row (only one in the mutation)
- a clustering row
- start of range tombstone
- end of range rombstone
There is an ordering (implemented in position_in_partition class) between
mutation_fragment objects. It reflects the order in which content of
partition appears in the sstables.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>