Once that is added, also add a method to a memtable entry to calculate
the entire size of a memtable entry. Right now we only have one method
to calculate the size minus rows.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
[tgrabiec:
- extracted from a larger commit
- fix heap comparator in apply_incomplete_target to order versions properly
- extracted partition_version detaching into
partition_entry::with_detached_versions()
- dropped unnecessary rows_iterator::_version field
- dropped unnecessary allocation of rows_entry and key copies
in rows_iterator
- dropped row_pointer
- replaced apply_reversibly() with weaker and faster apply()
- added handling of dummy entries at any position
- fixed exception safety issue in apply_to_incomplete() which may
result in data loss. We cannot move data out of applied versions
into a new synthetic row and then apply it, because if exception
happens in the middle, the data which was moved from the source
will be lost. To fix that, row_iterator::consume_row() is
introduced which allows in-place consumption of data without
construction of temporary deletable_row.
]
[tgrabiec:
- Extracted from a different patch
- Renamed concept names to more familiar Map and Reduce
- Renamed aggregate() to squashed() to match the existing nomenclature
- Uncommented the concepts
]
This will be used by partial cache in later patches.
[tgrabiec:
- changed title,
- documented meaning of the variable,
- renamed the variable,
- introduced open_version(),
- fixed continuity of the static row not being preserved in case
a new version is created]
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This will allow expressing lack of information about certain ranges of
rows (including the static row), which will be used in cache to
determine if information in cache is complete or not.
Continuity is represented internally using flags on row entries. The
key range between two consecutive entries is continuous iff
rows_entry::continuous() is true for the later entry. The range
starting after the last entry is assumed to be continuous. The range
corresponding to the key of the entry is continuous iff
rows_entry::dummy() is false.
[tgrabiec:
- based on the following commits:
4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry
773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry
- documented that partition tombstone is always complete
- require specifying the partition tombstone when creating an incomplete entry
- replaced rows_entry(dummy_tag, ...) constructor with more general
rows_entry(position_in_partition, ...)
- documented continuity semantics on mutation_partition
- fixed _static_row_cached being lost by mutation_partition copy constructors
- fixed conversion to streamed_mutation to ignore dummy entries
- fixed mutation_partition serializer to drop dummy entries
- documented semantics of continuity on mutation_partition level
- dropped assumptions that dummy entries can be only at the last position
- changed equality to ignore continuity completely, rather than
partially (it was not ignoring dummy entries, but ignoring
continuity flag)
- added printout of continuity information in mutation_partition
- fixed handling of empty entries in apply_reversibly() with regards
to continuity; we no longer can remove empty entries before
merging, since that may affect continuity of the right-hand
mutation. Added _erased flag.
- fixed mutation_partition::clustered_row() with dummy==true to not ignore the key
- fixed partition_builder to not ignore continuity
- renamed dummy_tag_t to dummy_tag. _t suffix is reserved.
- standardized all APIs on is_dummy and is_continuous bool_class:es
- replaced add_dummy_entry() with ensure_last_dummy() with safer semantics
- dropped unused remove_dummy_entry()
- simplified and inlined cache_entry::add_dummy_entry()
- fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous
]
So that streamed_mutation is created in only one of the overloads and
others delegate to that one. Later there will be common logic added to
the construction and doing this will help avoid a duplication.
mutation_fragments are going to be caching their size in memory. In
order to be able to invalidate that correctly, they need to know when
that size may change (but avoid invalidation when it is not necessary).
Snapshot destructor may free some objects managed by the LSA. That's why
partition_snapshot_reader destructor explicitly destroys the snapshot it
uses. However, it was possible that exception thrown by _read_section
prevented that from happenning making snapshot destoryed implicitly
without current allocator set to LSA.
Refs #1831.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478778570-2795-1-git-send-email-pdziepak@scylladb.com>
By default, we don't do any accounting. By specializing this class and providing
an accounter class, we can account how much memory are we reading as we read
through the elements.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This is so we can template it without worrying about declaring the
specializations in the .cc file.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This fixes the problem of multiple concurrent get_ranges calls.
Previously each call was invalidating the result of the previous
call. Now they don't step on each other foot.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Currently, partition snapshot destructor can throw which is a big no-no.
The solution is to ignore the exception and leave versions unmerged and
hope that subsequent reads will succeed at merging.
However, another problem is that the merge doesn't use allocating
sections which means that memory won't be reclaimed to satisfy its
needs. If the cache is full this may result in partition versions not
being merged for a very long time.
This patch introduces partition_snapshot::merge_partition_versions()
which contains all the version merging logic that was previously present
in the snapshot destructor. This function may throw so that it can be
used with allocating sections.
The actual merging and handling of potential erros is done from
partition_snapshot_reader destructor. It tries to merge versions under
the allocating section. Only if that fails it gives up and leaves them
unmerged.
Fixes#1578Fixes#1579.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1471265544-23579-1-git-send-email-pdziepak@scylladb.com>
To ensure isolation of operation when streaming a mutation from a
mutable source (such as cache or memtable) MVCC is used.
Each entry in memtable or cache is actually a list of used versions of
that entry. Incoming writes are either applied directly to the last
verion (if it wasn't being read by anyone) or preprended to the list
(if the former head was being read by someone). When reader finishes it
tries to squash versions together provided there is no other reader that
could prevent this.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>