Changing _ck_ranges_curr and _lower_bound should be atomic, either
both fail or both succeed. Currently it could happen that if
position_in_partition::for_range_start() fails, _ck_ranges_curr would
be advanced but _lower_bound not.
We need to maintain the following invariants:
(1) no fragment with position >= _lower_bound was pushed yet
(2) If _lower_bound > mf.position(), mf was emitted
Before this patch (1) could be violated if drain_tombstones() failed
in the middle. (2) could be violated if push_mutation_fragment()
failed.
range_tombstone_list::apply() has no exception safety guarantees about
the logical state. The target mutation_partition in cache should be
assumed to be left in unspecified state. In particular, some of the
preexisting overlapping tombstones may be removed and not reinserted,
so the cache would be missing some of the range tombstone information
in case the whole allocating section fails.
Use apply_monotonically() which provides the needed guarantees.
Fixes#2938.
Fixes the case of continuity not being populated when the row which is
the upper bound of the population range belongs to a non-latest
version. In such case we wouldn't mark the range as continuous,
because we can't modify rows of non-latest versions. To fix this,
create an empty entry in latest version which will just override the
continuity flag of the old entry.
Before this patch only ranges between returned row fragments were
marked as continuous. In the extreme case, there could be no such
fragments, in which case next read would miss as well. To avoid this,
mark whole query range as continuous by inserting dummy entries when
necessary.
Refs #2579.
Current code will not mark the range as continuous if the previous
entry does not come in the latest version. Fix that by switching
to partition_snapshot_row_pointer, which is capable of checking
in older versions as necessary.
Also, we avoid the key comparison if we know that the iterator
is still valid.
We were checking if the cursor is up_to_date(), but this is not enough
to guarantee that the cursor is valid, merely that its iterators are
valid. The cursor may be invalidated even if its iterators are valid
if there was an insertion after cursor's position.
Fixes#2834.
On bad_alloc the section is retried. If the exception happened inside
fast_forward_to() on the underlying reader, that call will be
retried. However, the reader should not be used after exception is
thrown, since it is in unspecified state. Also, calling
fast_forward_to() with cache region locked increases the chances of it
failing to allocate.
We shouldn't call fast_forward_to() with the cache region locked.
Fixes#2791.
We're in one state at a time, so it's better to express it as a single
variable rather than N independent flags.
In preparation before adding more states.
Equality comparator may be much cheaper than the fully fledged
trichotomic comparator, especially if the component types are byte order
equal but not byte order comparable.
clustering_row can stores quite a lot of data internally which makes its
move constructor not exactly cheap.
If possible it is better to move mutation_fragment around as it keeps
everything externally. This also avoids some cases when clustering row
would be extracted from mutation_fragment only to be made to create
another mutation_fragment later.
cache_streamed_mutation assumed that at least one clustering range was
specified. That was wrong since the readers are allowed to query just
for a static row (e.g. counter update that modifies only static
columns).
Fixes#2604.
Message-Id: <20170725131220.17467-1-pdziepak@scylladb.com>
This streamed mutation populates cache with
the rows requested by the read. It takes whatever
it can find in the cache and fetches the remainings
from underlying source.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
[tgrabiec:
- fixed maybe_add_to_cache_and_update_continuity() leaking entries if
the key already exists in the snapshot
- fixed a problem where population race could result in a read
missing some rows, because cache_streamed_mutation was advancing
the cursor, then deferring, and then checking continuity. We
should check continuity atomically with advancing.
- fixed rows_handle.maybe_refresh() being accessed outside of update
section in read_from_underlying() (undefined behavior)
- fixed a problem in start_reading_from_underlying() where we would
use incorrect start if lower_bound ended with a range tombstone
starting before a key.
- range tombstone trimming in add_to_buffer() could create a
tombstone which has too low start bound if last_rt.end was a
prefix and had inclusive end. invert_kind(end_kind) should be used
instead of unconditional inc_start.
- range tombstone trimming incorrectly assumed it is fine to trim
the tombstone from underlying to the previous fragment's end and
emit such tombstone. That would mean the stream can't emit any
fragments which start before previous tombstone's end. Solve with
range_tombstone_stream.
- split add_to_buffer() into overloads for clustering_row, and
range_tombstone. Better than wrapping into mutation_fragment
before the call and having add_to_buffer() rediscover the
information.
- changed maybe_add_to_cache_and_update_continuity() to not set
continuity to false for existing entries, it's not necessary
- moved range tombstone trimming to range_tombstone class
- moved range tombstone slicing code to range_tombstone_list and partition_snapshot
- can_populate::can_use_cache was unused, dropped
- dropped assumption that dummy entries are only at the end
- renamed maybe_add_to_cache_and_update_continuity() to maybe_add_to_cache()
- dropped no longer needed lower_bound class
- extracted row_handle to a seaparate patch
- made the copy-from-cache loop preemptable
- split maybe_add_next_to_buffer_and_update_continuity(bool)
- dropped cache_populator
- replaced "underlying" class with use of read_context
- replaced can_populate class with a function
- simplified lsa_manager methods to avoid moves
]