scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	32f711ce56	row_cache: Fix crash on memtable flush with LCS Presence checker is constructed and destroyed in the standard allocator context, but the presence check was invoked in the LSA context. If the presence checker allocates and caches some managed objects, there will be alloc-dealloc mismatch. That is the case with LeveledCompactionStrategy, which uses incremental_selector. Fix by invoking the presence check in the standard allocator context. Fixes #4063. Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>	2019-01-15 16:53:36 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Paweł Dziepak	df1d438fcd	row_cache: drop support for streamed_mutation::forwarding::yes entirely	2018-12-20 13:27:25 +00:00
Paweł Dziepak	adcb3ec20c	row_cache: read is not single-partition if inter-partition forwarding is enabled	2018-12-20 13:27:25 +00:00
Paweł Dziepak	7ecee197c4	row_cache: use make_forwardable() to implement streamed_mutation::forwarding Implementing intra-partition fast-forwarding adds more complexity to already very-much-not-trivial cache readers and isn't really critical in any way since it is not used outside of the tests. Let's use the generic adapter instead of natively implementing it.	2018-12-20 13:27:25 +00:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Avi Kivity	8cca3b2879	row_cache: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Paweł Dziepak	637b9a7b3b	atomic_cell_or_collection: make operator<< show cell content After the new in-memory representation of cells was introduced there was a regression in atomic_cell_or_collection::operator<< which stopped printing the content of the cell. This makes debugging more inconvenient and time-consuming. This patch fixes the problem. Schema is propagated to the atomic_cell_or_collection printer and the full content of the cell is printed. Fixes #3571. Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>	2018-10-24 13:29:51 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt-out from passing a timeout by passing `db::no_timeout` (the previous default value) but this will be now explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader, where I couldn't I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Tomasz Grabiec	567da3e063	memtable, cache: Fix exception safety of partition entry insertions boost::intrusive::set::insert() may throw if keys require linearization and that fails, in which case we will leak the entry. When this happens in cache, we will also violate the invariant for entry eviction, which assumes all tracked entries are linked, and cause a SEGFAULT. Use the non-throwing and faster insert_before() instead. Where we can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure that entry is deallocated on insert failure. Fixes #3585.	2018-07-17 16:30:01 +02:00
Tomasz Grabiec	074be4d4e8	memtable, cache: Run mutation_cleaner worker in its own scheduling group The worker is responsible for merging MVCC snapshots, which is similar to merging sstables, but in memory. The new scheduling group will be therefore called "memory compaction". We should run it in a separate scheduling group instead of main/memtables, so that it doesn't disrupt writes and other system activities. It's also nice for monitoring how much CPU time we spend on this.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	6c6ffaee71	mutation_cleaner: Make merge() redirect old instance to the new one If memtable snapshot goes away after memtable started merging to cache, it would enqueue the snapshots for cleaning on the memtable's cleaner, which will have to clean without deferring when the memtable is destroyed. That may stall the reactor. To avoid this, make merge() cause the old instance of the cleaner to redirect to the new instance (owned by cache), like we do for regions. This way the snapshots mentioned earlier can be cleaned after memtable is destroyed, gracefully.	2018-06-27 21:51:04 +02:00
Paweł Dziepak	96b0577343	row_cache: deglobalise row cache tracker Row cache tracker has numerous implicit dependencies on other objects (e.g. LSA migrators for data held by mutation_cleaner). The fact that both cache tracker and some of those dependencies are thread local objects makes it hard to guarantee correct destruction order. Let's deglobalise cache tracker and put it in the database class.	2018-06-25 09:37:43 +01:00
Tomasz Grabiec	78274276f5	row_cache: Use the memtable cleaner to create memtable snapshot during update Memtable entries should be cleaned using memtable cleaner, which unlike the cache's cleaner is not associated with the cache tracker. It's an error to clean a snapshot using tracker which doesn't own the entries. This will corrupt cache tracker's row counter. Fixes failure of test_exception_safety_of_update_from_memtable from row_cache.cc in debug mode and with allocation failure injection enabled. Introduced in "cache: Defer during partition merging" (`70c72773be`). Message-Id: <1528988256-20578-1-git-send-email-tgrabiec@scylladb.com>	2018-06-14 18:03:02 +03:00
Tomasz Grabiec	8d66f6da58	cache: real_dirty_memory_accounter: Move unpinning out of the hot path Instead of calling into real dirty memory manager per row, call it per deferring point.	2018-05-30 14:41:41 +02:00
Tomasz Grabiec	5bc201df10	cache: Release dirty memory with row granularity	2018-05-30 14:41:41 +02:00
Tomasz Grabiec	70c72773be	cache: Defer during partition merging	2018-05-30 14:41:41 +02:00
Tomasz Grabiec	1792be3697	cache: Propagate phase to apply_to_incomplete() It will be needed to create snapshots with appropriate phase markers.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	494cb3f3da	cache: Prepare for incremental apply_to_incomplete() Incremental merging will be implemented by the means of resumable functions, which return stop_iteration::no when not yet finished. We're not using futures, so that the caller can do work around preemption points as well.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	6ecda1ccd7	cache: Extract real_dirty_memory_accounter	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	3f19f76c67	mvcc: Destroy memtable partition versions gently Now all snapshots will have a mutation_cleaner which they will use to gently destroy freed partition_version objects. Destruction of memtable entries during cache update is also using the gentle cleaner now. We need to have a separate cleaner for memtable objects even though they're owned by cache's region, because memtable versions must be cleared without a cache_tracker. Each memtable will have its own cleaner, which will be merged with the cache's cleaner when memtable is merged into cache. Fixes some sources of reactor stalls on cache update when there are large partition entries in memtables.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	f0c1edd672	cache: Destroy partition versions incrementally Instead of destroying whole partition_versions at once, we will do that gently using mutation_cleaner to avoid reactor stalls. Large deletions could happen when large partition gets invalidated, upgraded to a new schema, or when it's abandoned by a detached snapshot. Refs #3289.	2018-05-30 14:41:40 +02:00
Tomasz Grabiec	2f75212ca4	cache: Define trivial methods inline They have users in a different compilation unit, in partition_version.cc	2018-05-30 12:18:56 +02:00
Botond Dénes	f488ae3917	Add buffer_size() to flat_mutation_reader buffer_size() exposes the collective size of the external memory consumed by the mutation fragments in the flat reader's buffer. This provides a basis to build basic memory accounting on. Although this is not the entire memory consumption of any given reader it is the most volatile component and usually by far the largest one too.	2018-03-13 10:34:34 +02:00
Avi Kivity	0ebfe448e3	Merge "Row-level eviction" from Tomasz " This series switches granularity of memory-pressure-induced eviction in cache from a partition to a row. Since `9b21a9b` cache can store partial partitions with row granularity but they were still evicted as a unit. This is problematic for the following reasons: - more is evicted than necessary, which decreases cache efficiency. In the worst case, whole cache gets evicted at once - evicting large amounts of memory (large partitions) at once may impact latency badly Fixes #2576. See the documentation added in patch titled "doc: Document row cache eviction" for details on how eviction works. Open issues to be fixed incrementally: - range tombstones are not evictable - cache update still has partition granularity, which causes bad latency on memtable flush with large partitions " * tag 'tgrabiec/row-level-eviction-v3' of github.com:scylladb/seastar-dev: (43 commits) doc: Document row cache eviction tests: cache: Add tests for row-level eviction tests: cache: Check that data is evictable after schema change tests: cache: Move definitions to the top tests: perf_cache_eviction: Switch eviction counter to row granularity tests: row_cache_alloc_stress: Avoid quadratic behavior cache: Introduce unlink_from_lru() cache: Add row-level stats about cache update from memtable mvcc: Propagate information if insertion happened from ensure_entry_if_complete() cache: Track number of rows and row invalidations cache: Evict with row granularity cache: Track static row insertions separately from regular rows tests: mvcc: Use apply_to_incomplete() to create versions tests: mvcc: Fix test_apply_to_incomplete() tests: cache: Do not depend on particular granularity of eviction tests: cache: Make sure readers touch rows in test_eviction() mvcc: Store complete rows in each version in evictable entries mvcc: Introduce partition_snapshot_row_cursor::ensure_entry_in_latest() tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction cache: Ensure all evictable partition_versions have a dummy after all rows ...	2018-03-07 17:57:07 +02:00
Tomasz Grabiec	641bcd0b35	cache: Introduce unlink_from_lru() Will be used in row_cache_alloc_stress to unlink partitions which we don't want to get evicted, instead of repeatedly calling touch() on them after each subsequent population. After switching to row-level LRU, doing so greatly increases run time of the test due to quadratic behavior.	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	b9d22584bb	cache: Add row-level stats about cache update from memtable	2018-03-07 16:52:58 +01:00
Tomasz Grabiec	ad7e2f7460	cache: Add back partition count argument to row_cache_update_one_batch_end probe sebug/scylla_row_cache_report.stp expects it. Removed in `c4974392b7`. Message-Id: <1520412152-10680-1-git-send-email-tgrabiec@scylladb.com>	2018-03-07 11:15:56 +02:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	dce9185fc9	cache: Track static row insertions separately from regular rows So that row eviction counter, which doesn't look at the static row, can be in sync with row insertion counter.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	30df3ddd7d	cache: Do not evict from cache_entry destructor We will need to propagate a cache_tracker reference to evict(). Instead of evicting from destructor, do so before cache_entry gets unlinked from the tree. Entries which are not linked, don't need to be explicitly evicted.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	4efab6f6a6	cache: Use on_evicted() in cache_tracker::clear() In preparation for switching LRU to row level.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	2118bdce01	cache: Extract cache_entry::on_evicted()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	24c5949518	cache: cache_tracker: Rename on_merge() to on_partition_merge()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	d66e864310	cache: cache_tracer: Rename on_erase() to on_partition_erase()	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	b0b57b8143	mvcc: Do not move unevictable snapshots to cache Commit `6ccd317` introduced a bug in partition_entry::evict() where a partition entry may be partially evicted if there are non-evictable snapshots in it. Partially evicting some of the versions may violate consistency of a snapshot which includes evicted versions. For one, continuity flags are interpreted relative to the merged view, not within a version, so evicting from some of the versions may mark ranges as continuous when before they were discontinuous. Also, range tombstones of the snapshot are taken from all versions, so we can't partially evict some of them without marking all affected ranges as discontinuous. The fix is to revert back to full eviction, and avoid moving non-evictable snapshots to cache. When moving whole partition entry to cache, we first create a neutral empty partition entry and then merge the memtable entry into it just like we would if the entry already existed. Fixes #3215. Tests: unit (release) Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:07 +00:00
Glauber Costa	c4974392b7	allow update_cache and clear_gently to use the entire task quota. We have had a quota of partitions to process in clear_gently / update_cache, so that we don't overwork. However, with those things now being in their own task group there is no harm in allowing it to run until we reach a natural preemption point. While we are at it, clear_gently did not check for need_preempt() before, so this patch fixes it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	a3a4d0a17a	row_cache: actually use the scheduling group for update_cache We have moved clear_gently from using a seastar::thread's scheduling_group to using the CPU scheduler's. However, update_cache was forgotten. This patch fixes that and gets rid of the old group just in case. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Piotr Jastrzebski	39ec13133f	row_cache: rename make_flat_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	0f45df96ca	row_cache: Delete unused make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:54:45 +01:00
Piotr Jastrzebski	d266eaa01e	mutation_source: rename make_flat_mutation_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 09:30:12 +01:00
Glauber Costa	5140aaea00	add a timeout to fast forward to In the last patch, we enabled per-request timeouts, we enable timeouts in fill_buffer. There are many places, though, in which we fast_forward_to before we fill_buffer, so in order to make that effective we need to propagate the timeouts to fast_forward_to as well. In the same way as fill_buffer, we make the argument optional wherever possible in the high level callers, making them mandatory in the implementations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:19 -05:00
Glauber Costa	d965af42b0	add a timeout to fill_buffer As part of the work to enable per-request timeouts, we enable timeouts in fill_buffer. The argument is made optional at the main classes, but mandatory in all the ::impl versions. This way we'll make sure we didn't forget anything. At this point we're still mostly passing that information around and don't have any entity that will act on those timeouts. In the next patch we will wire that up. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Duarte Nunes	2618209c2d	Remove obsolete includes and fix build move.hh was deleted, but files weren't updated to reflect that. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-12-28 12:03:44 +00:00
Tomasz Grabiec	7b36c8423c	row_cache: Fix single_partition_populating_reader not waiting on create_underlying() to resolve Results in undefined behavior. Message-Id: <1513691679-27081-1-git-send-email-tgrabiec@scylladb.com>	2017-12-19 16:12:11 +02:00
Raphael S. Carvalho	928beae242	Fix compilation of db/hints/manager.cc and row_cache.cc compiler: gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1) Problems introduced in `f6a461c7a4` and `37b19ae6ba`, respectively. They both fail to compile due to use of a method in a lambda without explicit mention of this. Some of the failures are fixed by not using auto in the lambda parameter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171218222144.12297-1-raphaelsc@scylladb.com>	2017-12-19 11:15:45 +01:00
Piotr Jastrzebski	14d98aaa0b	Rename row_cache::create_underlying_flat_reader to create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
Piotr Jastrzebski	49993e56a9	Remove unused row_cache::create_underlying_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-18 16:37:57 +01:00
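
Commit 494cb3f3da above describes incremental merging through resumable functions that return stop_iteration::no until the work is finished, without futures, so the caller can do work around preemption points. A minimal sketch of that pattern in standard C++ follows. It assumes nothing about the actual row_cache code: `stop_iteration` here is a local stand-in for the Seastar type, and `incremental_merger`, `run_to_completion` and the `budget` parameter are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in for Seastar's stop_iteration (assumption, not the real type).
enum class stop_iteration { no, yes };

// Hypothetical resumable merge: appends rows from `src` to `dst`,
// doing at most `budget` units of work per call.
class incremental_merger {
    std::vector<int>& _dst;
    std::vector<int> _src;
    std::size_t _pos = 0;  // resume point, kept between calls
public:
    incremental_merger(std::vector<int>& dst, std::vector<int> src)
        : _dst(dst), _src(std::move(src)) {}

    // Returns stop_iteration::yes once every source row has been merged,
    // stop_iteration::no if the caller has to call again.
    stop_iteration operator()(std::size_t budget) {
        assert(budget > 0);  // a zero budget would never make progress
        while (_pos < _src.size() && budget > 0) {
            _dst.push_back(_src[_pos++]);
            --budget;
        }
        return _pos == _src.size() ? stop_iteration::yes : stop_iteration::no;
    }
};

// Drives the merger to completion and returns the number of calls made.
// Between calls, a real caller could release memory or yield to the
// scheduler; this sketch only counts the resumptions.
inline std::size_t run_to_completion(incremental_merger& m, std::size_t budget) {
    std::size_t calls = 1;
    while (m(budget) == stop_iteration::no) {
        ++calls;
    }
    return calls;
}
```

Because the function returns a plain enum rather than a future, the loop in `run_to_completion` is the only place that decides what happens between steps, which is the property the commit message relies on.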

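
Commit eb357a385d above makes the timeout a mandatory parameter, so that opting out requires explicitly passing `db::no_timeout`. The sketch below illustrates that design choice in standard C++ only. `toy_reader`, `timed_out`, `timeout_point` and the local `no_timeout` constant are hypothetical names and do not reproduce the real `flat_mutation_reader` interface.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <stdexcept>
#include <type_traits>

using clock_type = std::chrono::steady_clock;
using timeout_point = clock_type::time_point;

// Hypothetical counterpart of db::no_timeout: a deadline that never expires.
inline constexpr timeout_point no_timeout = timeout_point::max();

// Hypothetical error type thrown when the deadline has already passed.
struct timed_out : std::runtime_error {
    timed_out() : std::runtime_error("timed out") {}
};

// Toy reader producing the integers [0, end).
class toy_reader {
    int _next = 0;
    int _end;
public:
    explicit toy_reader(int end) : _end(end) {}

    // The timeout has no default value, so every call site must pass one.
    std::optional<int> next(timeout_point timeout) {
        if (clock_type::now() >= timeout) {
            throw timed_out();
        }
        if (_next == _end) {
            return std::nullopt;  // end of stream
        }
        return _next++;
    }
};

// Compile-time check that next() cannot be called without a timeout.
static_assert(!std::is_invocable_v<decltype(&toy_reader::next), toy_reader&>,
              "next() must not be callable without a timeout argument");
```

A caller that does not care about deadlines writes `reader.next(no_timeout)`, which keeps the opt-out visible at the call site and in code review.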

257 Commits