"Cache entries for wide partitions are usually smaller than other
entries and the cost of recreating them is higher so it makes sense
to keep them longer than ordinary entries."
This will allow us to see how big is an amount
of evictions of cached info about wide partitions.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Since continuity flag introduction row cache contains a single dummy
entry. cache_tracker knows nothing about it so that it doesn't appear in
any of the metrics. However, cache destructor calls
cache_tracker::on_erase() for every entry in the cache including the
dummy one. This is incorrect since the tracker wasn't informed when the
dummy entry was created.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>
Wrapping ranges are a pain, so we are moving wrap handling to the edges.
Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.
Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
Conditions in row_cache::do_find_or_create_entry() make it possible that
std::prev(it) is going to be dereferenced even if it is a begin
iterator.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
If the continuity flag was set for the first element _first_element flag
would not be cleared. This shouldn't cause any correctness problems but
properly setting the flag allows to avoid some unnecessary key
comparisons.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
In original implementation the continuity flag indicated that cache has
full information about the range the between current partition and the
one following it, hence when evicting an entry the one preceeding it
had to have its continuity flag cleared.
This was changed, however, and now the continuiy flag tells whether the
cache is continuous between the current element and the one before it.
This means that eviction code needs to clear the flag for the entry
directly following the evicted one.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This uncomfortably large patch overhauls cache range reader so that it
can take advantage of fast forwarding mutation readers.
A significant change in the cache itself is that the continuity flag now
is used to determine whether cache is contiguous between the previous
entry and the current one. This allows for a significant simplification
of the cache code and easier integration with reader fast forwarding.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Flags are easier to manage if they are in a single structure.
Especially, default initialization and move contstructors are simpler
and less error prone.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Make split_after() more generic by allowing split_point to be anywhere,
not just within the input range. If the split_point is before, the entire
range is returned; and if it is after, stdx::nullopt is returned.
"before" and "after" are not well defined for wrap-around ranges, so
but we are phasing them out and soon there will not be
wrapping_range::split_after() users.
This is a prerequisite for converting partition_range and friends to
nonwrapping_range.
Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>
In row_cache::make_reader, we update statistics inside an
allocating_section, which retries the supplied function until it can
satisfy all allocations by way of reserving LSA memory up front. Since
those updates are interleave with allocations, retries can lead to
miscounts.
This patch fixes this by updating statistics after all allocations.
Fixes#1659
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>
This reverts commit 1726b1d0cc.
Reverting this patch turns our SSTable access counter into a miss counter only.
The estimated histogram always starts its first bucket at 1, so by marking cache
accesses we will be wrongly feeding "1" into the buckets.
Notice that this is not yet ideal: nodetool is supposed to show a histogram of
all reads, and by doing this we are changing its meaning slightly. Workloads
that serve mostly from cache will be distorted towards their misses.
The real solution is to use a different histogram, but we will need to enforce
a newer version of nodetool for that: the current issue is that nodetool expects
an EstimatedHistogram in a specific format in the other side.
Conflicts:
row_cache.hh
Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scy
lladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch changes the row_cache so it accepts a trace_state_ptr,
which it is responsible of flowing to the underlying mutation_reader
if needed.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
If we have a cache hit, we still need to update our sstable histogram - notting
that we have touched 0 SSTables.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This fixes the problem of multiple concurrent get_ranges calls.
Previously each call was invalidating the result of the previous
call. Now they don't step on each other foot.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Once we encounter a wide partition store information
about this in cache entry and don't try to read it all
and cache next time it's requested.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
[Paweł: rebased, moved large partition reading logic to
cache_entry::read_wide()]
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
There were cases of use-after-free introduced by the code responsible
for creating mutation_readers for large partitions – the lifetimes
of partition ranges and the readers themselves weren't sufficiently
extended.
Another problem, was that if the partition was no longer present in the
sstable the reader would return EOS which was then returned by
range_populating_reader itself causing its users to incorrectly
interpret that as an end of stream.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Range queries need to take special care when transitioning between
ranges that are read from sstables and ranges that are already in the
cache.
Original code in such case just started a secondary reader and told it
to unconditionally mark the last entry as continuous (primary reader has
already returned an element tha immediately follows the range that is
going to be read form sstables).
However, that information may get stale. For instance, by the time
secondary reader finish reading its range the element immediately
following it may get evicted from the cache thus causing continuity flag
to be incorrectly set.
The solution is to ensure that the element immediately after the range
read from sstables is still in the cache.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468586893-15266-1-git-send-email-pdziepak@scylladb.com>
The LSA memory pressure mechanism will let us know which region is the best
candidate for eviction when under pressure. We need to somehow then translate
region -> memtable -> column family.
The easiest way to convert from region to memtable, is having memtable inherit
from region. Despite the fact that this requires multiple inheritance, which
always raise a flag a bit, the other class we inherit from is
enable_shared_from_this, which has a very simple and well defined interface. So
I think it is worthy for us to do it.
Once we have the memtable, grabing the column family is easy provided we have a
database object. We can grab it from the schema.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Add contiguity flag to cache entry and set it in scanning reader.
Partitions fetched during scanning are continuous
and we know there's nothing between them.
Clear contiguity flag on cache entries
when the succeeding entry is removed.
Use continuous flag in range queries.
Don't go do disk if we know that there's nothing
between two entries we have in cache. We know that
when continuous flag of the first one is set to true.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>
Commit daad2eb "row_cache: fix memory leak in case of schema upgrade
failure" has fixed a memory leak caused by failed upgrade_entry().
However, in case of upgrade failure memtable_entry used to create the
new cache entry was left in some invalid state. If the operation was
retried the cache would attempt again to apply that memtable_entry which
now would be in invalid state.
The solution is to either to ignore upgrade_entry() exceptions or do not
call it at all and let the cache entry be upgraded on demand. This patch
implements the latter.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>
When update() causes a new entry to be inserted to the cache the
procedure is as follows:
1. allocate and construct new entry
2. upgrade entry schema
3. add entry to lru list and cache tree
Step 2 may fail and at this point the pointer to the entry is neither
protected by RAII nor added in any of the cache containers. The solution
is to swap steps 2 and 3 so that even if the upgrade fails the entry is
already owned by the cache and won't leak.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>
"Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.
To provide such guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly like row_cache::update()
does.
Fixes #1291."
Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.
To provide such guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly like row_cache::update()
does.
As part of moving the derived statistic in to scylla, this replaces the
counter in the row_cache stats to
timed_rate_moving_average_and_histogram.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>