Commit Graph

131 Commits

Author SHA1 Message Date
Tomasz Grabiec
d899ae0f02 mvcc: Encapsulate construction of evictable entries
Internal invariants of MVCC are better preserved by partition_entry
methods, so move construction of partition entries out of cache_entry
constructors.
2018-02-05 17:54:03 +01:00
Piotr Jastrzebski
39ec13133f row_cache: rename make_flat_reader to make_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-01-24 20:54:45 +01:00
Piotr Jastrzebski
0f45df96ca row_cache: Delete unused make_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-01-24 20:54:45 +01:00
Duarte Nunes
a66c8d7973 row_cache: Don't require external_updater to be copyable
No good reason to copy it around, and even less reason to impose that
constraint on callers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180118181142.15408-1-duarte@scylladb.com>
2018-01-19 13:00:49 +01:00
Piotr Jastrzebski
14d98aaa0b Rename row_cache::create_underlying_flat_reader to
create_underlying_reader

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:57 +01:00
Piotr Jastrzebski
49993e56a9 Remove unused row_cache::create_underlying_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:57 +01:00
Piotr Jastrzebski
1457a3d771 Rename cache_entry::*read_flat to cache_entry::*read
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:57 +01:00
Piotr Jastrzebski
8d37b71843 Rename autoupdating_underlying_flat_reader to autoupdating_underlying_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:57 +01:00
Piotr Jastrzebski
9789c37e9d Remove autoupdating_underlying_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:56 +01:00
Piotr Jastrzebski
df17bad13b Remove unused cache_entry::read and do_read
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 16:37:56 +01:00
Piotr Jastrzebski
a9b6551584 Add cache_entry::read_flat
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:33 +01:00
Piotr Jastrzebski
82c603069b Make cache_flat_mutation_reader a friend of row_cache and cache_tracker
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:33 +01:00
Piotr Jastrzebski
072fc2a309 Move lsa_manager to row_cache.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:33 +01:00
Piotr Jastrzebski
77b6f7c599 read_context: create a copy of autoupdating_underlying_reader
called autoupdating_underlying_flat_reader. It will be modified
in the next patch to use flat reader to underlying.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:09 +01:00
Piotr Jastrzebski
bf4e1c0c54 Add row_cache::create_underlying_flat_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:09 +01:00
Piotr Jastrzebski
16a0d306fd Turn scanning_and_populating_reader into flat reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-18 13:28:09 +01:00
Piotr Jastrzebski
ceaf0dee99 Introduce row_cache::make_flat_reader
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2017-12-14 12:49:39 +01:00
Glauber Costa
1d7617723d row cache: pin real dirty during cache updates.
Right now, once a region is moved to the cache is no longer visible to
the dirty memory system. Not as real dirty nor virtual dirty.

The problem is that until a particular partition is moved to the cache
it is not evictable. As a result we can OOM the system if we have a lot
of pending cache updates as the writes will not be throttled and memory
won't be made available.

This patch pins the memory used by the region as real dirty before the
cache update starts, and unpins it when it is over. In the mean time it
gradually releases memory of the partitions that are being moved to
cache.

I have verified in a couple of workloads that the amount of memory
accounted through this is the same amount of memory accounted through
the memtable flush procedure.

Fixes #1942

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-11-08 19:46:36 -05:00
Duarte Nunes
baeec0935f Replace query::full_slice with schema::full_slice()
query::full_slice doesn't select any regular or static columns, which
is at odds with the expectations of its users. This patch replaces it
with the schema::full_slice() version.

Refs #2885

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>
2017-10-17 11:25:53 +02:00
Tomasz Grabiec
c78047fa5b row_cache: Evict partition snapshots
If snapshots are not evicted, they may pin unbouned amount of memory
for a long time in cache, which may lead to OOM. Evict snapshots
together with the entry.

Fixes #2775.
Fixes #2730.
2017-09-13 17:47:03 +02:00
Tomasz Grabiec
adb159d51b row_cache: Reuse allocation_strategy::invalidate_references()
Modification count in the tracker is redundant, we can rely on
allocator's invalidation counter.
2017-09-13 17:38:08 +02:00
Tomasz Grabiec
27a3b4bca9 row_cache: Don't invalidate references on insertion
modification_count is currently only used to detect invalidation of
references, intended to be incremented on erasure.

Insertion into intrusive set doesn't invalidate references, so no
need to increment the counter.
2017-09-13 17:38:08 +02:00
Tomasz Grabiec
d22fdf4261 row_cache: Improve safety of cache updates
Cache imposes requirements on how updates to the on-disk mutation source
are made:
  1) each change to the on-disk muation source must be followed
     by cache synchronization reflecting that change
  2) The two must be serialized with other synchronizations
  3) must have strong failure guarantees (atomicity)

Because of that, sstable list update and cache synchronization must be
done under a lock, and cache synchronization cannot fail to synchronize.

Normally cache synchronization achieves no-failure thing by wiping the
cache (which is noexcept) in case failure is detect. There are some
setup steps hoever which cannot be skipped, e.g. taking a lock
followed by switching cache to use the new snapshot. That truly cannot
fail.  The lock inside cache synchronizers is redundant, since the
user needs to take it anyway around the combined operation.

In order to make ensuring strong exception guarantees easier, and
making the cache interface easier to use correctly, this patch moves
the control of the combined update into the cache. This is done by
having cache::update() et al accept a callback (external_updater)
which is supposed to perform modiciation of the underlying mutation
source when invoked.

This is in-line with the layering. Cache is layered on top of the
on-disk mutation source (it wraps it) and reading has to go through
cache. After the patch, modification also goes through cache. This way
more of cache's requirements can be confined to its implementation.

The failure semantics of update() and other synchronizers needed to
change due to strong exception guaratnees. Now if it fails, it means
the update was not performed, neither to the cache nor to the
underlying mutation source.

The database::_cache_update_sem goes away, serialization is done
internally by the cache.

The external_updater needs to have strong exception guarantees. This
requirement is not new. It is however currently violated in some
places. This patch marks those callbacks as noexcept and leaves a
FIXME. Those should be fixed, but that's not in the scope of this
patch. Aborting is still better than corrupting the state.

Fixes #2754.

Also fixes the following test failure:

  tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed

which started to trigger after commit 318423d50b. Thread stack
allocation may fail, in which case we did not do the necessary
invalidation.
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
b0f3efa577 row_cache: Extract invalidate_sync() 2017-09-04 10:04:29 +02:00
Tomasz Grabiec
56e3ce05db row_cache: Don't require presence checker to be supplied externally
The API is simpler and safer this way.
2017-09-04 10:04:29 +02:00
Tomasz Grabiec
bc3112a187 row_cache: Allow marking as fully continuous on construction
Will be needed in tests.
2017-09-04 10:04:29 +02:00
Raphael S. Carvalho
637f3bfa50 db: refresh row cache's underlying data source after compaction
Underlying data source in row cache holds a reference to sstable set
prior to compaction which isn't released until a memtable flush, which
means file descriptors of deleted sstables remains opened, wasting
disk space.
The fix is to refresh underlying data source in row cache.

Fixes #2570.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-07-24 15:49:11 -03:00
Piotr Jastrzebski
b950c59bbb row_cache: Fix wrong comment on continuity flag
This comment was stating exactly the opposite to the
truth. This is very misleading

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <79062a061e22ef4c4add24cbdf723cbfb5cda060.1499345295.git.piotr@scylladb.com>
2017-07-07 10:29:19 +02:00
Tomasz Grabiec
62c76abf71 row_cache: Rename num_entries() to partitions() for clarity 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
60c2a86192 row_cache: Track mispopulations also at row level 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
94547db620 row_cache: Track row insertions 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
a58f2c8640 row_cache: Track row hits and misses 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
a5fdff2ac2 row_cache: Add partition_ prefix to current counters
In preparation for adding per-row counters.
2017-07-04 13:55:06 +02:00
Tomasz Grabiec
6a22cbceaf row_cache: Add metrics for operations on underlying reader 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
5c7b6fc164 row_cache: Add reader-related metrics 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
be2e89d596 row_cache: Remove dead code 2017-07-04 13:55:06 +02:00
Tomasz Grabiec
b56232b216 row_cache: Introduce evict() 2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
c4e8effffa tests: Add cache_streamed_mutation_test
[tgrabiec:
  - extracted from a larger commit
  - removed coupling with how cache_streamed_mutation is created (the
    code went out of sync), used more stable make_reader(). it's simpler too.
  - replaced false/true literals with is_continuous/is_dummy where appropraite
  - dropped tests for cache::underlying (class is gone)
  - reused streamed_mutation_assertions, it has better error messages
  - fixed the tests to not create tombstones with missing timestamps
  - relaxed range tombstone assertions to only check information relevant for the query range
  - print cache on failure for improved debuggability
]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
6f6575f456 row_cache: Enable partial partition population 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
e792220c3a row_cache: Introduce update_invalidating() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
c29878f49f row_cache: Extract memtable walking logic from update() into do_update()
So that it can be reused in update_invalidating().
2017-06-24 18:06:11 +02:00
Piotr Jastrzebski
3755641c4b row_cache: Introduce cache_streamed_mutation
This streamed mutation populates cache with
the rows requested by the read. It takes whatever
it can find in the cache and fetches the remainings
from underlying source.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

[tgrabiec:

  - fixed maybe_add_to_cache_and_update_continuity() leaking entries if
    the key already exists in the snapshot

  - fixed a problem where population race could result in a read
    missing some rows, because cache_streamed_mutation was advancing
    the cursor, then deferring, and then checking continuity. We
    should check continuity atomically with advancing.

  - fixed rows_handle.maybe_refresh() being accessed outside of update
    section in read_from_underlying() (undefined behavior)

  - fixed a problem in start_reading_from_underlying() where we would
    use incorrect start if lower_bound ended with a range tombstone
    starting before a key.

  - range tombstone trimming in add_to_buffer() could create a
    tombstone which has too low start bound if last_rt.end was a
    prefix and had inclusive end. invert_kind(end_kind) should be used
    instead of unconditional inc_start.

  - range tombstone trimming incorrectly assumed it is fine to trim
    the tombstone from underlying to the previous fragment's end and
    emit such tombstone. That would mean the stream can't emit any
    fragments which start before previous tombstone's end. Solve with
    range_tombstone_stream.

  - split add_to_buffer() into overloads for clustering_row, and
    range_tombstone. Better than wrapping into mutation_fragment
    before the call and having add_to_buffer() rediscover the
    information.

  - changed maybe_add_to_cache_and_update_continuity() to not set
    continuity to false for existing entries, it's not necessary

  - moved range tombstone trimming to range_tombstone class
  - moved range tombstone slicing code to range_tombstone_list and partition_snapshot
  - can_populate::can_use_cache was unused, dropped
  - dropped assumption that dummy entries are only at the end
  - renamed maybe_add_to_cache_and_update_continuity() to maybe_add_to_cache()
  - dropped no longer needed lower_bound class
  - extracted row_handle to a seaparate patch
  - made the copy-from-cache loop preemptable
  - split maybe_add_next_to_buffer_and_update_continuity(bool)
  - dropped cache_populator
  - replaced "underlying" class with use of read_context
  - replaced can_populate class with a function
  - simplified lsa_manager methods to avoid moves
]
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
a2207ee9a6 row_cache: Move autoupdating_underlying_reader to read_context.hh 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
045888d5f3 row_cache: Introduce partition_range_cursor 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
f0cc86e5db row_cache: Introduce cache_entry::position() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
e3272526a1 row_cache: Allow comparing with ring_position views in row_cache::compare 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
64626b32b0 row_cache: Make printable 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
54b3da1910 row_cache: Introduce find_or_create() helper 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
f2d2c221d4 row_cache: Return cache_entry reference from do_find_or_create_entry
Will be useful when additional action needs to be done on the entry
after it was created or constructed.
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
0db6bdc916 row_cache: Introduce cache_entry constructor which constructs incomplete entry 2017-06-24 18:06:11 +02:00