Commit Graph

87 Commits

Author SHA1 Message Date
Tomasz Grabiec
b224ff6ede Merge 'pdziepak/row-cache-wide-entries/v4' from seastar-dev.git
This series adds the ability for partition cache to keep information
whether partition size makes it uncacheable. During, reads these
entries save us IO operations since we already know that the partiiton
is too big to be put in the cache.

First part of the patchset makes all mutation_readers allow the
streamed_mutations they produce to outlive them, which is a guarantee
used later by the code handling reading large partitions.

(cherry picked from commit d2ed75c9ff)
2016-08-02 20:24:29 +02:00
Piotr Jastrzebski
6960fce9b2 Use continuity flag correctly with concurrent invalidations
Between reading cache entry and actually using it
invalidations can happen so we have to check if no flag was
cleared if it was we need to read the entry again.

Fixes #1464.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>
(cherry picked from commit fdfd1af694)
2016-08-02 20:24:22 +02:00
Piotr Jastrzebski
02cf5a517a Add collectd counter for uncached wide partitions.
Keep track of every read of wide partition that's
not cached.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 37a7d49676)
2016-07-27 14:09:40 +03:00
Piotr Jastrzebski
ec3d59bf13 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 636a4acfd0)
2016-07-27 14:09:34 +03:00
Piotr Jastrzebski
30c72ef3b4 Try to read whole streamed_mutation up to limit
If limit is exceeded then return the streamed_mutation
and don't cache it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 98c12dc2e2)
2016-07-27 14:09:29 +03:00
Paweł Dziepak
81e4952c78 row_cache: fix marking last entry as continuous
Range queries need to take special care when transitioning between
ranges that are read from sstables and ranges that are already in the
cache.

Original code in such case just started a secondary reader and told it
to unconditionally mark the last entry as continuous (primary reader has
already returned an element tha immediately follows the range that is
going to be read form sstables).

However, that information may get stale. For instance, by the time
secondary reader finish reading its range the element immediately
following it may get evicted from the cache thus causing continuity flag
to be incorrectly set.

The solution is to ensure that the element immediately after the range
read from sstables is still in the cache.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468586893-15266-1-git-send-email-pdziepak@scylladb.com>
2016-07-15 15:15:02 +02:00
Avi Kivity
9a8788019d row_cache: fix visitor for boost <= 1.55
Older boosts can't return a future from a visitor (likely lacking support
for move-only objects).  Supply a dirty hackaround.

Message-Id: <1467822548-25940-1-git-send-email-avi@scylladb.com>
2016-07-06 19:55:51 +03:00
Glauber Costa
d41fcd45d1 memtables: make memtable inherit from region
The LSA memory pressure mechanism will let us know which region is the best
candidate for eviction when under pressure. We need to somehow then translate
region -> memtable -> column family.

The easiest way to convert from region to memtable, is having memtable inherit
from region. Despite the fact that this requires multiple inheritance, which
always raise a flag a bit, the other class we inherit from is
enable_shared_from_this, which has a very simple and well defined interface. So
I think it is worthy for us to do it.

Once we have the memtable, grabing the column family is easy provided we have a
database object. We can grab it from the schema.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 15:05:29 -04:00
Piotr Jastrzebski
59d0d9e666 Fix cache_tracker::clear
Make sure that artificial entries for
all column families are set to non continuous.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f9e517fe40482c05f6c388faab7d6b9eca6b159e.1467103548.git.piotr@scylladb.com>
2016-06-28 11:18:23 +02:00
Piotr Jastrzebski
27575a0528 Fix previous_entry_is_continuous
Rename it to check_previous_entry.
Remove unnesessary test.
Make sure ring_position always has working relation_to_keys method.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>
2016-06-28 10:27:08 +02:00
Piotr Jastrzebski
68e5a199e9 Clean continuous flag of cache entry
preceeding invalidated decorated key even
when it's not found.

Add test.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>
2016-06-28 10:26:02 +02:00
Piotr Jastrzebski
cd9f3f94c4 Fix row_cache::update
Clear continuous flag on the last cache entry
with key smaller than a partition being dropped from
memtable on flush and not saved in cache.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>
2016-06-28 10:25:38 +02:00
Piotr Jastrzebski
eb959a8b81 Change check for artificial entry
in cache_entry destructor from
_key.has_key() to _lru_link.is_linked()

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>
2016-06-28 10:24:29 +02:00
Piotr Jastrzebski
9b011bff18 row_cache: add contiguity flag to cache entry to reduce disk IO during scans
Add contiguity flag to cache entry and set it in scanning reader.
Partitions fetched during scanning are continuous
and we know there's nothing between them.

Clear contiguity flag on cache entries
when the succeeding entry is removed.

Use continuous flag in range queries.
Don't go do disk if we know that there's nothing
between two entries we have in cache. We know that
when continuous flag of the first one is set to true.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>
2016-06-23 09:43:15 +03:00
Paweł Dziepak
b2c37429e7 row_cache: drop slicing_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
f605499aec row_cache: fully support streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
2ab1a73efa memtable: rename partition_entry to memtable_entry
partition_entry is going to be a more general object used by both
cache and memtable entries.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
737eb73499 mutation_reader: make readers return streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
5b45d46f82 row_cache: simplify slicing_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
dde87e0b0e row_cache: drop schema upgrade for new entries in update()
Commit daad2eb "row_cache: fix memory leak in case of schema upgrade
failure" has fixed a memory leak caused by failed upgrade_entry().
However, in case of upgrade failure memtable_entry used to create the
new cache entry was left in some invalid state. If the operation was
retried the cache would attempt again to apply that memtable_entry which
now would be in invalid state.

The solution is to either to ignore upgrade_entry() exceptions or do not
call it at all and let the cache entry be upgraded on demand. This patch
implements the latter.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:43:01 +02:00
Paweł Dziepak
daad2ebf81 row_cache: fix memory leak in case of schema upgrade failure
When update() causes a new entry to be inserted to the cache the
procedure is as follows:
1. allocate and construct new entry
2. upgrade entry schema
3. add entry to lru list and cache tree

Step 2 may fail and at this point the pointer to the entry is neither
protected by RAII nor added in any of the cache containers. The solution
is to swap steps 2 and 3 so that even if the upgrade fails the entry is
already owned by the cache and won't leak.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:12:01 +02:00
Avi Kivity
465c0a4ead Merge "Make stronger guarantees in row_cache's clear/invalidate" from Tomasz
"Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly like row_cache::update()
does.

Fixes #1291."
2016-06-13 09:55:29 +03:00
Tomasz Grabiec
d5a2d7a88d row_cache: Add eviciton and removal counters
Fixes #1273.

Message-Id: <1465315433-8473-1-git-send-email-tgrabiec@scylladb.com>
2016-06-08 16:08:32 -04:00
Tomasz Grabiec
170a214628 row_cache: Make stronger guarantees in clear/invalidate
Correctness of current uses of clear() and invalidate() relies on fact
that cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such guarantee, invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly like row_cache::update()
does.
2016-06-06 13:21:06 +02:00
Tomasz Grabiec
2ab18dcd2d row_cache: Implement clear() using invalidate()
Reduces code duplication.
2016-06-03 13:34:40 +02:00
Avi Kivity
9637c2232c Merge "Move the JMX timer polling logic to Scylla" from Amnon 2016-05-24 13:07:52 +03:00
Amnon Heiman
468bcfbf1f row_cache: Change counter to timed_rate_moving_average_and_histogram
As part of moving the derived statistic in to scylla, this replaces the
counter in the row_cache stats to
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Piotr Jastrzebski
dcba6f5c45 Pass clustering_row_ranges to mutation readers.
This will allow readers to reduce the amount of data read.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 14:36:57 +02:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
3997421b2c row_cache: Let the cleanup guard do invalidation of unmerged partitions 2016-02-26 16:57:31 +01:00
Tomasz Grabiec
aa15268249 row_cache: Delete the entry even if invalidation failed
Otherwise we will leak it, and region destructor will fail:

row_cache_test: utils/logalloc.cc:1211: virtual logalloc::region_impl::~region_impl(): Assertion `seg->is_empty()' failed.

Fixes regression in row_cache_test.
2016-02-26 16:57:31 +01:00
Tomasz Grabiec
be24816c8a row_cache: Clear partitions with region locked
Since invalidate() may allocate, we need to take the region lock to
keep m.partitions references valid around whole clear_and_dispose(),
which relies on that.
2016-02-26 16:57:31 +01:00
Avi Kivity
fbe6961827 row_cache: run partiton-touching operations of row_cache::update in a linearization context
To avoid scattered keys (and values, though those are already protected)
from being accessed, run the update procedure in a managed_bytes linearization
context.

Fixes #807.
2016-02-16 14:37:44 +02:00
Avi Kivity
ad58663c96 row_cache: reindent 2016-02-07 13:25:29 +02:00
Paweł Dziepak
490201fd1c row_cache: protect against stale entries
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.

In paractice that's not a problem because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should be updated but it was not possible to do so.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
2016-01-26 20:34:41 +01:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Tomasz Grabiec
d332fcaefc row_cache: Restore indentation 2016-01-15 15:33:17 +01:00
Tomasz Grabiec
6b059fd828 row_cache: Guard against wrap-around range in make_reader() 2016-01-13 17:50:55 +01:00
Tomasz Grabiec
7fb0bc4e15 row_cache: Take the reclaim lock in invalidate()
It's needed to keep the iterators valid in case eviciton is triggered
somehwere in between. It probably isn't because destructors should not
allocate, but better be safe.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
50cc0c162e row_cache: Make invalidate() handle wrap-around ranges
Currently for wrap around the "begin" iterator would not meet with the
"end" iterator, invoking undefined behavior in erase_and_dispose()
which results in a crash.

Fixes #785
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they're
always upgraded to current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Paweł Dziepak
59245e7913 row_cache: add functions for invalidating entries in cache
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:21:11 +01:00
Tomasz Grabiec
7c3e6c306b row_cache: Wait for in-flight populations on update
Before this change, populations could race with update from flushed
memtable, which might result in cache being populated with older
data. Populations started before the flush are not considering the
memtable nor its sstable.

The fix employed here is to make update wait for populations which
were started before the flushed memtable's sstable was added to the
undrelying data source. All populatinos started after that are
guaranteed to see the new data.
2015-11-29 16:25:21 +01:00
Paweł Dziepak
513ab87b47 row_cache: update hit and miss stats in scanning reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:25:02 +03:00
Paweł Dziepak
b1b830bcbb row_cache: merge cache_entry::compare and ring_position_compare
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-22 12:25:02 +03:00
Paweł Dziepak
c2b53c5282 row_cache: add scanning_and_populating_reader
This reader enables range queries on row cache. An underlying key_reader
is used to obtain information about partitions that belong to the
specified range and if any of them isn't in the cache an underlying
mutation reader is used to read the missing data.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00
Paweł Dziepak
9abefbfa28 row_cache: add just_cache_scanning_reader
This mutation reader returns mutations from cache that are in a given
range. There may be other mutations in the system (e.g. in sstables)
that won't be returned, so this reader on its own cannot really satisfy
any query.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-10-20 20:27:53 +02:00