scylladb

Author	SHA1	Message	Date
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Tomasz Grabiec	074be4d4e8	memtable, cache: Run mutation_cleaner worker in its own scheduling group The worker is responsible for merging MVCC snapshots, which is similar to merging sstables, but in memory. The new scheduling group will be therefore called "memory compaction". We should run it in a separate scheduling group instead of main/memtables, so that it doesn't disrupt writes and other system activities. It's also nice for monitoring how much CPU time we spend on this.	2018-06-27 21:51:04 +02:00
Paweł Dziepak	96b0577343	row_cache: deglobalise row cache tracker Row cache tracker has numerous implicit dependencies on other objects (e.g. LSA migrators for data held by mutation_cleaner). The fact that both cache tracker and some of those dependencies are thread local objects makes it hard to guarantee correct destruction order. Let's deglobalise cache tracker and put it in the database class.	2018-06-25 09:37:43 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Tomasz Grabiec	70c72773be	cache: Defer during partition merging	2018-05-30 14:41:41 +02:00
Tomasz Grabiec	3f19f76c67	mvcc: Destroy memtable partition versions gently Now all snapshots will have a mutation_cleaner which they will use to gently destroy freed partition_version objects. Destruction of memtable entries during cache update is also using the gentle cleaner now. We need to have a separate cleaner for memtable objects even though they're owned by cache's region, because memtable versions must be cleared without a cache_tracker. Each memtable will have its own cleaner, which will be merged with the cache's cleaner when memtable is merged into cache. Fixes some sources of reactor stalls on cache update when there are large partition entries in memtables.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	f0c1edd672	cache: Destroy partition versions incrementally Instead of destroying whole partition_versions at once, we will do that gently using mutation_cleaner to avoid reactor stalls. Large deletions could happen when a large partition gets invalidated, upgraded to a new schema, or when it's abandoned by a detached snapshot. Refs #3289.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	40cc766cf2	database: Add API for incremental clearing of partition entries Partitions can get very large. Destroying them all at once can stall the reactor for a significant amount of time. We want to avoid that by doing destruction incrementally, deferring in between. A new API is added for that at various levels: `stop_iteration clear_gently() noexcept;` It returns stop_iteration::yes when the object is fully cleared and can now be destroyed quickly. So a deferring destruction can look like this: `return repeat([this] { return clear_gently(); });` The reason why clear_gently() doesn't return a future<> itself is that some contexts cannot defer, like memory reclamation.	2018-05-30 12:18:56 +02:00
Tomasz Grabiec	2f75212ca4	cache: Define trivial methods inline They have users in a different compilation unit, in partition_version.cc	2018-05-30 12:18:56 +02:00
Tomasz Grabiec	641bcd0b35	cache: Introduce unlink_from_lru() Will be used in row_cache_alloc_stress to unlink partitions which we don't want to get evicted, instead of repeatedly calling touch() on them after each subsequent population. After switching to row-level LRU, doing so greatly increases run time of the test due to quadratic behavior.	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	b9d22584bb	cache: Add row-level stats about cache update from memtable	2018-03-07 16:52:58 +01:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	dce9185fc9	cache: Track static row insertions separately from regular rows So that row eviction counter, which doesn't look at the static row, can be in sync with row insertion counter.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	30df3ddd7d	cache: Do not evict from cache_entry destructor We will need to propagate a cache_tracker reference to evict(). Instead of evicting from destructor, do so before cache_entry gets unlinked from the tree. Entries which are not linked, don't need to be explicitly evicted.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	4efab6f6a6	cache: Use on_evicted() in cache_tracker::clear() In preparation for switching LRU to row level.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	2118bdce01	cache: Extract cache_entry::on_evicted()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	24c5949518	cache: cache_tracker: Rename on_merge() to on_partition_merge()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	d66e864310	cache: cache_tracker: Rename on_erase() to on_partition_erase()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	d9a38c1c85	mutation_partition: Add API to walk from rows_entry to cache_entry Will be needed on row eviction, to unlink containers when they become fully evicted.	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	b0b57b8143	mvcc: Do not move unevictable snapshots to cache Commit `6ccd317` introduced a bug in partition_entry::evict() where a partition entry may be partially evicted if there are non-evictable snapshots in it. Partially evicting some of the versions may violate consistency of a snapshot which includes evicted versions. For one, continuity flags are interpreted relative to the merged view, not within a version, so evicting from some of the versions may mark ranges as continuous when before they were discontinuous. Also, range tombstones of the snapshot are taken from all versions, so we can't partially evict some of them without marking all affected ranges as discontinuous. The fix is to revert back to full eviction, and avoid moving non-evictable snapshots to cache. When moving whole partition entry to cache, we first create a neutral empty partition entry and then merge the memtable entry into it just like we would if the entry already existed. Fixes #3215. Tests: unit (release) Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:07 +00:00
Tomasz Grabiec	27b114fe45	cache: Handle exceptions from make_evictable() cache_entry constructor was marked noexcept, yet make_evictable() may fail in rare cases due to allocation in add_version(). Lift the annotation and make sure that construction has strong exception guarantees for the moved-in state so that it can be retried without data loss inside allocating section.	2018-02-14 16:42:49 +01:00
Tomasz Grabiec	cce1a2bce8	Merge "Use the CPU scheduler" from Glauber & Avi In this patchset I am resubmitting Avi's enablement of the CPU scheduler on his behalf. I've done a ton of testing in the series and there are some improvements / changes that I had previously sent as a separate series. What you see here is the result of merging that work. After this patchset is applied, workloads are smoother and we are able to uphold the pre-defined shares among the various actors. We also finally have everything we need to merge the CPU and I/O controllers. After that is done the code is now much simpler. But also, as a bonus, controllers that were previously available for I/O only (compactions) are enabled for CPU as well. * git@github.com:glommer/scylla.git cpusched-v7: Avi Kivity (4): database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler memtable, database: make memtable::clear_gently() inherit scheduling_group config: mark background_writer_scheduling_quota as Unused database: place data_query execution stage into scheduling_group Glauber Costa (9): database, main: set up scheduling_groups for our main tasks row_cache: actually use the scheduling group for update_cache allow update_cache and clear_gently to use the entire task quota. database: remove cpu_flush_quota metric controllers: retire auto_adjust_flush_quota controllers: allow memtable I/O controller to have shares statically set controllers: update control points for memtable I/O controller controllers: allow a static priority to override the controller output controllers: unify the I/O and CPU controllers	2018-02-08 15:58:40 +01:00
Glauber Costa	a3a4d0a17a	row_cache: actually use the scheduling group for update_cache We have moved clear_gently from using a seastar::thread's scheduling_group to using the CPU scheduler's. However, update_cache was forgotten. This patch fixes that and gets rid of the old group just in case. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Tomasz Grabiec	d398aa913e	cache: Fix calculation of active_reads() Message-Id: <1518023341-27855-1-git-send-email-tgrabiec@scylladb.com>	2018-02-07 17:20:00 +00:00
Tomasz Grabiec	d899ae0f02	mvcc: Encapsulate construction of evictable entries Internal invariants of MVCC are better preserved by partition_entry methods, so move construction of partition entries out of cache_entry constructors.	2018-02-05 17:54:03 +01:00
Piotr Jastrzebski	39ec13133f	row_cache: rename make_flat_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	0f45df96ca	row_cache: Delete unused make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Duarte Nunes	a66c8d7973	row_cache: Don't require external_updater to be copyable No good reason to copy it around, and even less reason to impose that constraint on callers. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180118181142.15408-1-duarte@scylladb.com>	2018-01-19 13:00:49 +01:00
Piotr Jastrzebski	14d98aaa0b	Rename row_cache::create_underlying_flat_reader to create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	49993e56a9	Remove unused row_cache::create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	1457a3d771	Rename cache_entry::read_flat to cache_entry::read Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	8d37b71843	Rename autoupdating_underlying_flat_reader to autoupdating_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	9789c37e9d	Remove autoupdating_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:56 +01:00
Piotr Jastrzebski	df17bad13b	Remove unused cache_entry::read and do_read Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:56 +01:00
Piotr Jastrzebski	a9b6551584	Add cache_entry::read_flat Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	82c603069b	Make cache_flat_mutation_reader a friend of row_cache and cache_tracker Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	072fc2a309	Move lsa_manager to row_cache.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:33 +01:00
Piotr Jastrzebski	77b6f7c599	read_context: create a copy of autoupdating_underlying_reader called autoupdating_underlying_flat_reader. It will be modified in the next patch to use flat reader to underlying. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	bf4e1c0c54	Add row_cache::create_underlying_flat_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	16a0d306fd	Turn scanning_and_populating_reader into flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 13:28:09 +01:00
Piotr Jastrzebski	ceaf0dee99	Introduce row_cache::make_flat_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-14 12:49:39 +01:00
Glauber Costa	1d7617723d	row cache: pin real dirty during cache updates. Right now, once a region is moved to the cache it is no longer visible to the dirty memory system, neither as real dirty nor as virtual dirty. The problem is that until a particular partition is moved to the cache it is not evictable. As a result we can OOM the system if we have a lot of pending cache updates as the writes will not be throttled and memory won't be made available. This patch pins the memory used by the region as real dirty before the cache update starts, and unpins it when it is over. In the meantime it gradually releases memory of the partitions that are being moved to cache. I have verified in a couple of workloads that the amount of memory accounted through this is the same amount of memory accounted through the memtable flush procedure. Fixes #1942 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 19:46:36 -05:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Tomasz Grabiec	c78047fa5b	row_cache: Evict partition snapshots If snapshots are not evicted, they may pin an unbounded amount of memory for a long time in cache, which may lead to OOM. Evict snapshots together with the entry. Fixes #2775. Fixes #2730.	2017-09-13 17:47:03 +02:00
Tomasz Grabiec	adb159d51b	row_cache: Reuse allocation_strategy::invalidate_references() Modification count in the tracker is redundant, we can rely on allocator's invalidation counter.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	27a3b4bca9	row_cache: Don't invalidate references on insertion modification_count is currently only used to detect invalidation of references, intended to be incremented on erasure. Insertion into intrusive set doesn't invalidate references, so no need to increment the counter.	2017-09-13 17:38:08 +02:00
Tomasz Grabiec	d22fdf4261	row_cache: Improve safety of cache updates Cache imposes requirements on how updates to the on-disk mutation source are made: 1) each change to the on-disk mutation source must be followed by cache synchronization reflecting that change 2) The two must be serialized with other synchronizations 3) must have strong failure guarantees (atomicity) Because of that, sstable list update and cache synchronization must be done under a lock, and cache synchronization cannot fail to synchronize. Normally cache synchronization achieves the no-failure guarantee by wiping the cache (which is noexcept) in case a failure is detected. There are some setup steps however which cannot be skipped, e.g. taking a lock followed by switching cache to use the new snapshot. That truly cannot fail. The lock inside cache synchronizers is redundant, since the user needs to take it anyway around the combined operation. In order to make ensuring strong exception guarantees easier, and making the cache interface easier to use correctly, this patch moves the control of the combined update into the cache. This is done by having cache::update() et al accept a callback (external_updater) which is supposed to perform modification of the underlying mutation source when invoked. This is in-line with the layering. Cache is layered on top of the on-disk mutation source (it wraps it) and reading has to go through cache. After the patch, modification also goes through cache. This way more of cache's requirements can be confined to its implementation. The failure semantics of update() and other synchronizers needed to change due to strong exception guarantees. Now if it fails, it means the update was not performed, neither to the cache nor to the underlying mutation source. The database::_cache_update_sem goes away, serialization is done internally by the cache. The external_updater needs to have strong exception guarantees. This requirement is not new. It is however currently violated in some places. This patch marks those callbacks as noexcept and leaves a FIXME. Those should be fixed, but that's not in the scope of this patch. Aborting is still better than corrupting the state. Fixes #2754. Also fixes the following test failure: tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed which started to trigger after commit `318423d50b`. Thread stack allocation may fail, in which case we did not do the necessary invalidation.	2017-09-04 10:04:29 +02:00


158 Commits