scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	641bcd0b35	cache: Introduce unlink_from_lru() Will be used in row_cache_alloc_stress to unlink partitions which we don't want to get evicted, instead of repeatedly calling touch() on them after each subsequent population. After switching to row-level LRU, doing so greatly increases run time of the test due to quadratic behavior.	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	b9d22584bb	cache: Add row-level stats about cache update from memtable	2018-03-07 16:52:58 +01:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	dce9185fc9	cache: Track static row insertions separately from regular rows So that row eviction counter, which doesn't look at the static row, can be in sync with row insertion counter.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	30df3ddd7d	cache: Do not evict from cache_entry destructor We will need to propagate a cache_tracker reference to evict(). Instead of evicting from destructor, do so before cache_entry gets unlinked from the tree. Entries which are not linked don't need to be explicitly evicted.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	4efab6f6a6	cache: Use on_evicted() in cache_tracker::clear() In preparation for switching LRU to row level.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	2118bdce01	cache: Extract cache_entry::on_evicted()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	24c5949518	cache: cache_tracker: Rename on_merge() to on_partition_merge()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	d66e864310	cache: cache_tracker: Rename on_erase() to on_partition_erase()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	d9a38c1c85	mutation_partition: Add API to walk from rows_entry to cache_entry Will be needed on row eviction, to unlink containers when they become fully evicted.	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	b0b57b8143	mvcc: Do not move unevictable snapshots to cache Commit `6ccd317` introduced a bug in partition_entry::evict() where a partition entry may be partially evicted if there are non-evictable snapshots in it. Partially evicting some of the versions may violate consistency of a snapshot which includes evicted versions. For one, continuity flags are interpreted relative to the merged view, not within a version, so evicting from some of the versions may mark ranges as continuous when before they were discontinuous. Also, range tombstones of the snapshot are taken from all versions, so we can't partially evict some of them without marking all affected ranges as discontinuous. The fix is to revert to full eviction, and avoid moving non-evictable snapshots to cache. When moving a whole partition entry to cache, we first create a neutral empty partition entry and then merge the memtable entry into it just like we would if the entry already existed. Fixes #3215. Tests: unit (release) Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:07 +00:00
Tomasz Grabiec	27b114fe45	cache: Handle exceptions from make_evictable() cache_entry constructor was marked noexcept, yet make_evictable() may fail in rare cases due to allocation in add_version(). Lift the annotation and make sure that construction has strong exception guarantees for the moved-in state so that it can be retried without data loss inside allocating section.	2018-02-14 16:42:49 +01:00
Tomasz Grabiec	cce1a2bce8	Merge "Use the CPU scheduler" from Glauber & Avi In this patchset I am resubmitting Avi's enablement of the CPU scheduler on his behalf. I've done a ton of testing in the series and there are some improvements / changes that I had previously sent as a separate series. What you see here is the result of merging that work. After this patchset is applied, workloads are smoother and we are able to uphold the pre-defined shares among the various actors. We also finally have everything we need to merge the CPU and I/O controllers. Once that is done, the code is much simpler. But also, as a bonus, controllers that were previously available for I/O only (compactions) are enabled for CPU as well. * git@github.com:glommer/scylla.git cpusched-v7: Avi Kivity (4): database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler memtable, database: make memtable::clear_gently() inherit scheduling_group config: mark background_writer_scheduling_quota as Unused database: place data_query execution stage into scheduling_group Glauber Costa (9): database, main: set up scheduling_groups for our main tasks row_cache: actually use the scheduling group for update_cache allow update_cache and clear_gently to use the entire task quota. database: remove cpu_flush_quota metric controllers: retire auto_adjust_flush_quota controllers: allow memtable I/O controller to have shares statically set controllers: update control points for memtable I/O controller controllers: allow a static priority to override the controller output controllers: unify the I/O and CPU controllers	2018-02-08 15:58:40 +01:00
Glauber Costa	a3a4d0a17a	row_cache: actually use the scheduling group for update_cache We have moved clear_gently from using a seastar::thread's scheduling_group to using the CPU scheduler's. However, update_cache was forgotten. This patch fixes that and gets rid of the old group just in case. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Tomasz Grabiec	d398aa913e	cache: Fix calculation of active_reads() Message-Id: <1518023341-27855-1-git-send-email-tgrabiec@scylladb.com>	2018-02-07 17:20:00 +00:00
Tomasz Grabiec	d899ae0f02	mvcc: Encapsulate construction of evictable entries Internal invariants of MVCC are better preserved by partition_entry methods, so move construction of partition entries out of cache_entry constructors.	2018-02-05 17:54:03 +01:00
Piotr Jastrzebski	39ec13133f	row_cache: rename make_flat_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	0f45df96ca	row_cache: Delete unused make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Duarte Nunes	a66c8d7973	row_cache: Don't require external_updater to be copyable No good reason to copy it around, and even less reason to impose that constraint on callers. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180118181142.15408-1-duarte@scylladb.com>	2018-01-19 13:00:49 +01:00
Piotr Jastrzebski	14d98aaa0b	Rename row_cache::create_underlying_flat_reader to create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	49993e56a9	Remove unused row_cache::create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	1457a3d771	Rename cache_entry::read_flat to cache_entry::read Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	8d37b71843	Rename autoupdating_underlying_flat_reader to autoupdating_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	9789c37e9d	Remove autoupdating_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:56 +01:00
Piotr Jastrzebski	df17bad13b	Remove unused cache_entry::read and do_read Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:56 +01:00
Piotr Jastrzebski	a9b6551584	Add cache_entry::read_flat Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	82c603069b	Make cache_flat_mutation_reader a friend of row_cache and cache_tracker Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	072fc2a309	Move lsa_manager to row_cache.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	77b6f7c599	read_context: create a copy of autoupdating_underlying_reader called autoupdating_underlying_flat_reader. It will be modified in the next patch to use flat reader to underlying. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	bf4e1c0c54	Add row_cache::create_underlying_flat_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	16a0d306fd	Turn scanning_and_populating_reader into flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	ceaf0dee99	Introduce row_cache::make_flat_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-14 12:49:39 +01:00
Glauber Costa	1d7617723d	row cache: pin real dirty during cache updates. Right now, once a region is moved to the cache, it is no longer visible to the dirty memory system, neither as real dirty nor as virtual dirty. The problem is that until a particular partition is moved to the cache it is not evictable. As a result, we can OOM the system if we have a lot of pending cache updates, as the writes will not be throttled and memory won't be made available. This patch pins the memory used by the region as real dirty before the cache update starts, and unpins it when it is over. In the meantime it gradually releases memory of the partitions that are being moved to cache. I have verified in a couple of workloads that the amount of memory accounted through this is the same amount of memory accounted through the memtable flush procedure. Fixes #1942 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 19:46:36 -05:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Tomasz Grabiec	c78047fa5b	row_cache: Evict partition snapshots If snapshots are not evicted, they may pin an unbounded amount of memory for a long time in cache, which may lead to OOM. Evict snapshots together with the entry. Fixes #2775. Fixes #2730.	2017-09-13 17:47:03 +02:00
Tomasz Grabiec	adb159d51b	row_cache: Reuse allocation_strategy::invalidate_references() Modification count in the tracker is redundant, we can rely on allocator's invalidation counter.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	27a3b4bca9	row_cache: Don't invalidate references on insertion modification_count is currently only used to detect invalidation of references, intended to be incremented on erasure. Insertion into intrusive set doesn't invalidate references, so no need to increment the counter.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	d22fdf4261	row_cache: Improve safety of cache updates Cache imposes requirements on how updates to the on-disk mutation source are made: 1) each change to the on-disk mutation source must be followed by cache synchronization reflecting that change 2) the two must be serialized with other synchronizations 3) the combined operation must have strong failure guarantees (atomicity) Because of that, sstable list update and cache synchronization must be done under a lock, and cache synchronization cannot fail to synchronize. Normally, cache synchronization achieves the no-failure guarantee by wiping the cache (which is noexcept) in case a failure is detected. There are, however, some setup steps which cannot be skipped, e.g. taking a lock followed by switching cache to use the new snapshot. That truly cannot fail. The lock inside cache synchronizers is redundant, since the user needs to take it anyway around the combined operation. In order to make ensuring strong exception guarantees easier, and to make the cache interface easier to use correctly, this patch moves the control of the combined update into the cache. This is done by having cache::update() et al. accept a callback (external_updater) which is supposed to perform modification of the underlying mutation source when invoked. This is in line with the layering. Cache is layered on top of the on-disk mutation source (it wraps it) and reading has to go through cache. After the patch, modification also goes through cache. This way more of cache's requirements can be confined to its implementation. The failure semantics of update() and other synchronizers needed to change due to strong exception guarantees. Now if it fails, it means the update was not performed, neither to the cache nor to the underlying mutation source. The database::_cache_update_sem goes away; serialization is done internally by the cache. The external_updater needs to have strong exception guarantees. This requirement is not new. It is, however, currently violated in some places. This patch marks those callbacks as noexcept and leaves a FIXME. Those should be fixed, but that's not in the scope of this patch. Aborting is still better than corrupting the state. Fixes #2754. Also fixes the following test failure: tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed which started to trigger after commit `318423d50b`. Thread stack allocation may fail, in which case we did not do the necessary invalidation.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	b0f3efa577	row_cache: Extract invalidate_sync()	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	56e3ce05db	row_cache: Don't require presence checker to be supplied externally The API is simpler and safer this way.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	bc3112a187	row_cache: Allow marking as fully continuous on construction Will be needed in tests.	2017-09-04 10:04:29 +02:00
Raphael S. Carvalho	637f3bfa50	db: refresh row cache's underlying data source after compaction The underlying data source in the row cache holds a reference to the pre-compaction sstable set, which isn't released until a memtable flush. This means file descriptors of deleted sstables remain open, wasting disk space. The fix is to refresh the underlying data source in the row cache. Fixes #2570. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-24 15:49:11 -03:00
Piotr Jastrzebski	b950c59bbb	row_cache: Fix wrong comment on continuity flag This comment stated exactly the opposite of the truth, which is very misleading. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <79062a061e22ef4c4add24cbdf723cbfb5cda060.1499345295.git.piotr@scylladb.com>	2017-07-07 10:29:19 +02:00
Tomasz Grabiec	62c76abf71	row_cache: Rename num_entries() to partitions() for clarity	2017-07-04 13:55:06 +02:00
Tomasz Grabiec	60c2a86192	row_cache: Track mispopulations also at row level	2017-07-04 13:55:06 +02:00
Tomasz Grabiec	94547db620	row_cache: Track row insertions	2017-07-04 13:55:06 +02:00
Tomasz Grabiec	a58f2c8640	row_cache: Track row hits and misses	2017-07-04 13:55:06 +02:00
Tomasz Grabiec	a5fdff2ac2	row_cache: Add partition_ prefix to current counters In preparation for adding per-row counters.	2017-07-04 13:55:06 +02:00


148 Commits
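
Commits 641bcd0b35 (unlink_from_lru()) and 381bf02f55 (evict with row granularity) describe an LRU whose unit is a row rather than a partition. The C++ sketch below is a minimal, hypothetical model of that idea, not ScyllaDB's cache_tracker: the class name row_lru, the string row keys, and the insert()/evict_one() methods are assumptions made for illustration; only the name unlink_from_lru() comes from the commit. It shows why unlinking a row once beats calling touch() on every protected row after each population: an unlinked row is simply never an eviction candidate, so no repeated per-row work is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical row-granularity LRU (illustration only, not ScyllaDB's code).
class row_lru {
    std::list<std::string> _lru; // front = most recently used, back = next victim
    std::unordered_map<std::string, std::list<std::string>::iterator> _pos;
public:
    // Link a newly populated row as the most recently used one.
    void insert(const std::string& key) {
        if (_pos.count(key)) { touch(key); return; }
        _lru.push_front(key);
        _pos[key] = _lru.begin();
    }
    // Move a row to the MRU end in O(1); splice keeps iterators valid.
    void touch(const std::string& key) {
        auto it = _pos.find(key);
        if (it != _pos.end()) {
            _lru.splice(_lru.begin(), _lru, it->second);
        }
    }
    // Remove a row from the LRU so the reclaimer can never pick it.
    void unlink_from_lru(const std::string& key) {
        auto it = _pos.find(key);
        if (it != _pos.end()) {
            _lru.erase(it->second);
            _pos.erase(it);
        }
    }
    // Evict the least recently used linked row, if any.
    std::optional<std::string> evict_one() {
        if (_lru.empty()) return std::nullopt;
        std::string victim = _lru.back();
        _pos.erase(victim);
        _lru.pop_back();
        return victim;
    }
    std::size_t size() const { return _lru.size(); }
};
```

The map from key to list iterator is there only to keep touch() and unlink_from_lru() O(1) in a self-contained example; a production cache would more likely embed intrusive list hooks in the row entries themselves and avoid the separate map.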
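
Commit d22fdf4261 moves control of the combined "modify the underlying source, then synchronize the cache" operation into the cache: update() takes an external_updater callback that must have strong exception guarantees, and once it succeeds, cache synchronization must not fail, falling back to wiping the cache if needed. The C++ sketch below models only that failure contract, under stated assumptions: toy_cache, store, read(), and the use of std::map and std::function are hypothetical stand-ins, and locking is omitted because the model is single-threaded (the commit says the real cache serializes updates internally).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the on-disk mutation source.
using store = std::map<std::string, std::string>;

// Hypothetical cache layered on top of `store` (illustration only).
class toy_cache {
    store& _underlying;
    store _cached;
public:
    explicit toy_cache(store& underlying) : _underlying(underlying) {}

    // Reads go through the cache and populate it on a miss.
    std::string read(const std::string& key) {
        auto it = _cached.find(key);
        if (it != _cached.end()) return it->second;
        const std::string& v = _underlying.at(key); // throws std::out_of_range if absent
        _cached[key] = v;
        return v;
    }

    // external_updater must give the strong guarantee: if it throws, the
    // underlying source is unchanged, so nothing at all has been updated.
    void update(const std::function<void()>& external_updater, const store& memtable) {
        external_updater(); // may throw: neither the cache nor the source changed
        try {
            // Refresh entries that are already cached; copying strings may throw.
            for (const auto& [k, v] : memtable) {
                auto it = _cached.find(k);
                if (it != _cached.end()) it->second = v;
            }
        } catch (...) {
            // Synchronization must not fail once the source changed:
            // wiping (noexcept) guarantees the cache is never stale.
            _cached.clear();
        }
    }

    std::size_t cached() const { return _cached.size(); }
};
```

The ordering is the design point: all fallible work on the underlying source happens inside the callback before the cache is touched, so a failed update() leaves both layers exactly as they were.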
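
Commits 27a3b4bca9 and adb159d51b rely on an invalidation counter to detect when cached references may have become stale: erasure bumps the counter, insertion does not. The C++ sketch below is a hypothetical model of that pattern, not the allocator's actual API: toy_region and cursor are invented names, and erase() is treated conservatively as invalidating every reference, mirroring a compacting allocator that may move objects. The cursor keeps a raw pointer and only repeats the lookup when the counter has changed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical container whose erasures may invalidate all references.
class toy_region {
    std::map<int, std::string> _entries;
    std::uint64_t _invalidation_counter = 0;
public:
    // Insertion does not invalidate references, so the counter is untouched.
    void insert(int k, std::string v) { _entries.emplace(k, std::move(v)); }
    // Erasure is conservatively treated as invalidating every reference.
    void erase(int k) { _entries.erase(k); ++_invalidation_counter; }
    std::uint64_t invalidation_counter() const { return _invalidation_counter; }
    std::string* find(int k) {
        auto it = _entries.find(k);
        return it == _entries.end() ? nullptr : &it->second;
    }
};

// Caches a pointer and re-looks it up only after an invalidation.
class cursor {
    toy_region& _r;
    int _key;
    std::string* _ptr = nullptr;
    std::uint64_t _seen = 0;
    int _lookups = 0;
public:
    cursor(toy_region& r, int key) : _r(r), _key(key) { revalidate(); }
    std::string* get() {
        if (_seen != _r.invalidation_counter()) revalidate();
        return _ptr;
    }
    int lookups() const { return _lookups; }
private:
    void revalidate() {
        _ptr = _r.find(_key);
        _seen = _r.invalidation_counter();
        ++_lookups;
    }
};
```

Because insertions leave the counter alone, a populating reader that only adds entries never forces its cursors to redo lookups, which is the saving described in 27a3b4bca9.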
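
Commit 1d7617723d pins the memory of a flushed memtable region as real dirty before the cache update starts and releases it gradually as partitions are moved into cache, where they become evictable. The C++ sketch below illustrates only that accounting sequence; dirty_accountant and move_to_cache are hypothetical names, partitions are reduced to their byte sizes, and write throttling is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical real-dirty accountant (illustration only).
class dirty_accountant {
    std::size_t _real_dirty = 0;
public:
    void pin(std::size_t bytes) { _real_dirty += bytes; }
    void unpin(std::size_t bytes) {
        assert(bytes <= _real_dirty);
        _real_dirty -= bytes;
    }
    std::size_t real_dirty() const { return _real_dirty; }
};

// Hypothetical cache update: returns the bytes still pinned after each partition.
std::vector<std::size_t> move_to_cache(dirty_accountant& acct,
                                       const std::vector<std::size_t>& partition_sizes) {
    std::size_t total = 0;
    for (auto s : partition_sizes) total += s;
    acct.pin(total); // whole region counts as real dirty before the update starts
    std::vector<std::size_t> trace;
    for (auto s : partition_sizes) {
        // ... the partition is merged into cache here and becomes evictable ...
        acct.unpin(s); // release its memory from dirty accounting
        trace.push_back(acct.real_dirty());
    }
    return trace;
}
```

While the update is in progress, memory that is not yet evictable stays visible to the dirty-memory system, which is what lets writes be throttled instead of running the node out of memory.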