The reloading flow may hold items in the underlying loading_shared_values
after they have been removed (e.g. via the remove(key) API), so
loading_shared_values.size() doesn't represent the correct size of the
loading_cache. lru_list.size(), on the other hand, does.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
(cherry picked from commit 1e56c7dd58)
Reloading may hold values in the underlying loading_shared_values while
the corresponding cache values have already been deleted.
This may create weird situations like this:
<populate cache with 10 entries>
cache.remove(key1);
for (auto& e : cache) {
std::cout << e << std::endl;
}
<all 10 entries are printed, including the one for "key1">
In order to avoid such situations, we are going to make
loading_cache::iterator a transform_iterator over lru_list::iterator
instead of loading_shared_values::iterator, because the lru_list
contains entries only for cached items.
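A minimal sketch of the resulting shape (simplified; the names and
members here are assumptions, not the actual Scylla code):

#include <boost/iterator/transform_iterator.hpp>
#include <list>

template <typename Value>
class loading_cache_sketch {
    struct lru_entry {
        Value value;
        // ... timestamps, link to the shared value, etc.
    };
    using lru_list = std::list<lru_entry>; // stand-in for the intrusive LRU list
    lru_list _lru_list;

    static Value& extract(lru_entry& e) { return e.value; }

public:
    // Iterating over the LRU list instead of loading_shared_values means
    // removed entries are never visible, and std::distance(begin(), end())
    // agrees with lru_list.size().
    using iterator = boost::transform_iterator<Value& (*)(lru_entry&),
                                               typename lru_list::iterator>;
    iterator begin() { return iterator(_lru_list.begin(), &extract); }
    iterator end() { return iterator(_lru_list.end(), &extract); }
};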
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
(cherry picked from commit 945d26e4ee)
This ensures that row::external_memory_usage() is invariant to
insertion order of cells.
It should be, so that the accounting of a clustering_row, merged from
multiple MVCC versions by the partition_snapshot_flat_reader on behalf
of a memtable flush, doesn't give a greater result than what is used
by the memtable region. Overaccounting leads to an assertion failure
in ~flush_memory_accounter.
Fixes #3625 (hopefully).
Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 4fb3f7e8eb)
"This series introduces a few improvements related to a reload flow.
From now on the callback may assume that the "key" parameter value
is kept alive till the end of its execution in the reloading flow.
It may also safely evict as many items from the cache as needed."
Fixes #3606
* 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla:
utils::loading_cache: hold a shared_value_ptr to the value when we reload
utils::loading_cache::on_timer(): remove not needed capture of "this"
utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
(cherry picked from commit f6aadd8077)
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.
For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.
Fix this by using a temporary container to hold those entries to be
reloaded.
Spotted when reading the code.
Also use if constexpr and fix the comment in the function containing
the changes.
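A hedged sketch of the temporary-container pattern (simplified; the
entry members, the clock alias and _refresh_period are assumptions):

// Collect the entries that need reloading into a temporary container
// first, so that load() removing entries from the LRU set cannot
// invalidate the iterator we are advancing.
auto now = loading_cache_clock::now();
utils::chunked_vector<entry_ptr> to_reload;
for (auto& e : _lru_list) {
    if (e.ready() && now - e.loaded_at() > _refresh_period) {
        to_reload.emplace_back(e.shared_from_this());
    }
}
return parallel_for_each(to_reload, [this] (entry_ptr& e) {
    // load() may run inline and drop entries from the LRU set; that is
    // now safe, since we no longer hold an iterator into it.
    return _load(e->key()).discard_result();
});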
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
(cherry picked from commit 63b63b0461)
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed during the load. However, if _load()
invalidates the entry, the continuation's capture list will access
invalid memory while trying to obtain the key.
To avoid this, save a copy of the key before calling _load() and pass
it to both _load() and the continuation.
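A minimal sketch of the fix (simplified; member names are assumptions):

// Take an independent copy of the key *before* calling _load(): _load()
// may invalidate 'entry', after which entry.key() would read freed memory.
auto key = entry.key();
auto loaded = _load(key);
return loaded.then([this, key = std::move(key)] {
    // Safe: 'key' is our own copy, not a reference into the (possibly
    // disposed) entry.
    if (!_loading_values.contains(key)) {
        return; // the entry was disposed during the load
    }
    // ... post-load handling ...
});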
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
(cherry picked from commit 2e7bf9c6f9)
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.
The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.
I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.
The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.
This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.
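A sketch of the idea (not the exact Scylla code; the flag name is an
assumption):

// Once a memtable has started flushing, its region stops advertising
// evictable space, so the flusher's "largest region" choice skips it.
occupancy_stats evictable_occupancy() const {
    if (_flush_in_progress) {
        return occupancy_stats{}; // total_space() == 0: picked last
    }
    return occupancy();
}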
Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 1e50f85288)
Fixes #3546
Both older origin and scylla write "known" compressor names (i.e. those
in the origin namespace) unqualified (i.e. LZ4Compressor).
This behaviour was not preserved in the virtualization change, but
probably should have been.
Message-Id: <20180627110930.1619-1-calle@scylladb.com>
(cherry picked from commit 054514a47a)
"
Cache tracker is a thread-local global object that indirectly depends
on the lifetimes of other objects. In particular, a member of
cache_tracker, mutation_cleaner, may extend the lifetime of a
mutation_partition until the cleaner is destroyed. The
mutation_partition itself depends on LSA migrators, which are
thread-local objects. Since there is no direct dependency between LSA
migrators and cache_tracker, it is not guaranteed that the former
won't be destroyed before the latter. The easiest solution (barring
some unit tests that repeat the same code several billion times) is to
stop using globals.
This series also improves the part of the LSA sanitiser that deals
with migrators.
Fixes #3526.
Tests: unit(release)
"
* tag 'deglobalise-cache-tracker/v1-rebased' of https://github.com/pdziepak/scylla:
mutation_cleaner: add disclaimer about mutation_partition lifetime
lsa: enhance sanitizer for migrators
lsa: formalise migrator id requirements
row_cache: deglobalise row cache tracker
The current LSA sanitizer performs only basic checks on migrator use,
without doing any additional reporting when an error is detected. This
patch enhances it so that when a problem is detected, the relevant
stack traces get printed.
object_descriptor uses a special encoding for migrator ids which
assumes that the valid ones are in a range smaller than uint32_t.
Let's add some static asserts that make this fact more visible.
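For instance (illustrative only; the constant name is an assumption):

// The descriptor encoding reserves some values for its own use, so valid
// migrator ids must fit in a range strictly smaller than uint32_t. Break
// the build if that ever stops holding.
static_assert(max_migrator_id < std::numeric_limits<uint32_t>::max(),
              "migrator ids must fit in a range smaller than uint32_t");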
This commit adds a helper class, reusable_buffer, which can be used to
avoid excessive memory allocations of large buffers when a
bytes_ostream needs to be linearised. The idea is that a
reusable_buffer will in most cases be thread-local, so that multiple
continuation chains can reuse the same large buffer.
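A minimal sketch of the idea (heavily simplified; the real class does
more):

#include <cstddef>
#include <memory>

// Grows to the high-water mark once, then hands out the same allocation
// for every subsequent linearisation instead of allocating a fresh large
// buffer each time.
class reusable_buffer {
    std::unique_ptr<char[]> _buf;
    size_t _capacity = 0;
public:
    char* get(size_t size) {
        if (size > _capacity) {
            _buf = std::make_unique<char[]>(size);
            _capacity = size;
        }
        return _buf.get();
    }
};

// Typically one instance per thread, shared by all continuation chains:
thread_local reusable_buffer linearisation_buffer;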
Represents a deferring operation which defers cooperatively with the caller.
The operation is started and resumed by calling run(), which returns
with stop_iteration::no whenever the operation defers and is not
completed yet. When the operation is finally complete, run() returns
with stop_iteration::yes.
This allows the caller to:
1) execute some post-defer and pre-resume actions atomically
2) have control over when the operation is resumed and in which context,
in particular the caller can cancel the operation at deferring points.
It will be used to implement deferring partition_version::apply_to_incomplete().
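Illustrative usage under this contract (assuming seastar's repeat();
the driver and operation names are hypothetical):

// Resume the operation until it completes, regaining control at every
// defer point, where the caller may run its own actions or bail out.
future<> drive(apply_to_incomplete_operation& op) {
    return repeat([&op] {
        if (op.run() == stop_iteration::yes) {
            return stop_iteration::yes; // operation complete
        }
        // post-defer / pre-resume actions and cancellation checks go here
        return stop_iteration::no;      // resume on the next iteration
    });
}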
This overload allows searching for elements by an arbitrary key, as
long as it is "hashable" to the same values as the default key and
there is a comparator for this new key.
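The requirement is the same one behind C++20 heterogeneous lookup in
the standard containers; by analogy (this is not the
loading_shared_values code):

#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// A hash that yields the same value for std::string and std::string_view,
// plus a transparent comparator, lets find() accept the alternative key.
struct transparent_hash {
    using is_transparent = void;
    size_t operator()(std::string_view v) const {
        return std::hash<std::string_view>{}(v);
    }
};

int main() {
    std::unordered_map<std::string, int, transparent_hash, std::equal_to<>> m;
    m["key1"] = 1;
    // Lookup by string_view: no temporary std::string is constructed.
    return m.find(std::string_view("key1")) != m.end() ? 0 : 1;
}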
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
remove(key): removes the entry with the given key if it exists, otherwise does nothing.
remove(iterator): removes an entry by a given iterator (returned from loading_cache::find()).
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"
The main optimization is in the patch titled "lsa: Reduce amount of
segment compactions".
I measured a 50% reduction of cache update run time in a steady state
for an append-only workload with a large partition, in the
perf_row_cache_update version from:
c3f9e6ce1f/tests/perf_row_cache_update.cc
Other workloads and other allocation sites could probably also see the
improvement.
"
* tag 'tgrabiec/reduce-lsa-segment-compactions-v1' of github.com:tgrabiec/scylla:
lsa: Expose counters for allocation and compaction throughput
lsa: Reduce amount of segment compactions
lsa: Avoid the call to segment_pool::descriptor() in compact()
lsa: Make reclamation on reserve refill more efficient
Reclaiming memory through segment compaction is expensive. At an
occupancy of 85%, each compacted segment is only 15% free, so
reclaiming one free segment requires compacting ceil(1 / 0.15) = 7
segments, which means migrating about 6 segments worth of live data.
This results in significant amplification. Compaction involves moving
objects, which in some cases is expensive in itself as well
(see https://github.com/scylladb/scylla/issues/3247).
This patch reduces the amount of segment compaction in favor of doing
more eviction. It especially helps workloads in which LRU order
matches allocation order, in which case there will be no segment
compaction at all, just eviction.
In the perf_row_cache_update test case for a large partition with lots
of rows, which simulates an appending workload, I measured that before
the patch, 2 objects needed to be migrated for each new object
allocated. After the patch, only 0.003 objects are migrated. This
reduces the run time of the cache update part by 50%.
We are slightly underestimating the amount of memory we use. Now that
the chunked vector can export its internal memory usage, we can use
that directly.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are times when we would like to estimate how much memory a
chunked_vector is using. We have two strategies for doing so:
1) multiply the size by the size of the elements. That is wrong,
because the chunked_vector can allocate larger chunks in anticipation
of more elements to come.
2) multiply the number of chunks by 128kB. That is also wrong, because
the chunked_vector will not always allocate the entire chunk if there
are only a few elements in it.
The best way to deal with this is to allow the chunked_vector to
export its current memory usage.
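A hedged sketch of that third strategy, using nested std::vector as a
stand-in for the chunk list (not the actual implementation):

#include <cstddef>
#include <vector>

// Only the vector itself knows each chunk's true allocated capacity, so
// it can report real usage where both estimates above are wrong.
template <typename T>
size_t memory_usage(const std::vector<std::vector<T>>& chunks) {
    size_t bytes = 0;
    for (const auto& chunk : chunks) {
        bytes += chunk.capacity() * sizeof(T); // allocated, not merely used
    }
    return bytes;
}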
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Currently, reserve refill allocates segments repeatedly until the
reserve threshold is met. If a single segment allocation needs to
reclaim memory, it will ask the reclaimer for one segment. The
reclaimer could make better decisions if it knew the total number of
segments we are trying to allocate. In particular, it would not
attempt to compact any segment until it had first evicted that total
amount of memory, which may reduce the total amount of segment
compaction during refill.
This patch changes refill to increase the reclamation step used by
allocate_segment() so that it matches the total amount of memory we
refill.
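A sketch of the change (simplified; the names are assumptions):

// Tell allocate_segment() how much we still intend to refill, so the
// reclaimer can satisfy the whole deficit by eviction before it
// considers compacting any segment.
while (_free_segments < _reserve_threshold) {
    size_t deficit = (_reserve_threshold - _free_segments) * segment_size;
    allocate_segment(deficit); // reclamation step == remaining refill
}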
While the migration function should have enough information to obtain
the object size itself, the LSA logic needs to compute it as well.
IMR is going to make calculating object sizes more expensive, so by
providing the information to the migrator we can avoid some needless
operations.
It is non-trivial to get the size of an IMR object. However, the
standard allocator doesn't really need it and LSA can compute it itself
by asking the migrator.
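A sketch of the resulting interface split (simplified; close to, but
not necessarily identical to, the actual declarations):

struct migrate_fn_type {
    // LSA already knows the size when it moves an object, so it passes
    // it in; the migrator doesn't have to recompute it for an IMR object.
    virtual void migrate(void* src, void* dst, size_t size) const noexcept = 0;
    // The standard allocator path doesn't need the size; when LSA does,
    // it can ask the migrator.
    virtual size_t size(const void* obj) const = 0;
    virtual ~migrate_fn_type() = default;
};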
Having migrators dynamically registered and deregistered opens a new
class of bugs. This patch adds some additional checks in the debug mode
with the hopes of catching any misuse early.
With the introduction of the new in-memory representation we will get
type- and schema-dependent migrators. Since there is no bound on how
many times they can be created and destroyed, it is better to be safe
and reuse registered migrator ids.
"Fixes a bug in partition_snapshot::merge_partition_versions(), which would not
attempt merging if the snapshot is attached to the latest version (in which
case _version is nullptr and _entry != nullptr). This would cause
partition_version objects to accumulate if there was an older snapshot and it
went away before the latest snapshot. Versions will be removed when the whole
entry goes away (flush or eviction).
May cause performance problems.
Fixes #3402."
* 'tgrabiec/fix-merge_partition_versions' of github.com:tgrabiec/scylla:
mvcc: Test version merging when snapshots go away
anchorless_list: Make ranges conform to SinglePassRange
anchorless_list: Drop deprecated use of std::iterator
mvcc: Fix partition_snapshot::merge_partition_versions() to not leave latest versions unmerged
The two hash values, base and increment, used to produce indices for
setting bits in the filter, have been swapped in SSTables 3.0.
See CASSANDRA-8413 for details.
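Illustrative only (which hash plays which role in each format is an
assumption here; the point is that reader and writer must agree):

// Kirsch-Mitzenmacher double hashing: bit index i is base + i * increment.
// SSTables 3.0 ("mc") swaps which of the two 64-bit hash outputs acts as
// the base and which as the increment.
void set_filter_bits(large_bitset& bits, uint64_t h0, uint64_t h1,
                     unsigned num_hashes, uint64_t num_bits, bool format_mc) {
    uint64_t base      = format_mc ? h1 : h0; // assumed mapping
    uint64_t increment = format_mc ? h0 : h1;
    for (unsigned i = 0; i < num_hashes; ++i) {
        bits.set((base + i * increment) % num_bits);
    }
}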
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
* seastar 70aecca...ac02df7 (5):
> Merge "Prefix preprocessor definitions" from Jesse
> cmake: Do not enable warnings transitively
> posix: prevent unused variable warning
> build: Adjust DPDK options to fix compilation
> io_scheduler: adjust property names
DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro
references are now prefixed with SEASTAR_. Some may need to become
Scylla macros.
They will be reused for collecting encoding statistics, which are
needed to write SSTables 3.0.
Part of #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
This method takes a data_source and returns another data_source
that returns data from the input source but in chunks of limited
size.
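A hedged sketch of such a wrapper (simplified; the actual helper's name
and structure may differ):

#include <algorithm>
#include <seastar/core/iostream.hh>

// Serves the wrapped source's data unchanged, but never returns more
// than max_chunk_size bytes per get().
class limiting_data_source_impl : public seastar::data_source_impl {
    seastar::data_source _src;
    seastar::temporary_buffer<char> _buf;
    size_t _max_chunk_size;

    seastar::future<seastar::temporary_buffer<char>> take_chunk() {
        auto n = std::min(_buf.size(), _max_chunk_size);
        auto chunk = _buf.share(0, n); // no copy: shares the buffer
        _buf.trim_front(n);
        return seastar::make_ready_future<seastar::temporary_buffer<char>>(std::move(chunk));
    }
public:
    limiting_data_source_impl(seastar::data_source src, size_t max_chunk_size)
        : _src(std::move(src)), _max_chunk_size(max_chunk_size) {}

    virtual seastar::future<seastar::temporary_buffer<char>> get() override {
        if (!_buf.empty()) {
            return take_chunk();
        }
        // An empty buffer from the source means EOF; it passes through.
        return _src.get().then([this] (seastar::temporary_buffer<char> b) {
            _buf = std::move(b);
            return take_chunk();
        });
    }
};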
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
_lsa_managed is always 1:1 with _region, so we can remove it, saving
some space in the segment descriptor vector.
Tests: unit (release), logalloc_test (debug)
Message-Id: <20180410122606.10671-1-avi@scylladb.com>
The save and load functions for the large_bitset were introduced by
Avi in d590e327c0.
In that commit, Avi says:
"... providing iterator-based load() and save() methods. The methods
support partial load/save so that access to very large bitmaps can be
split over multiple tasks."
The only user of this interface is SSTables, and it turns out we don't
really split the access like that. What we do instead is create a
chunked vector and then pass its begin() with position = 0 and let it
write everything.
The problem here is that this requires the chunked vector to be fully
initialized, not just reserved. If the bitmap is large enough, that in
itself can take a long time without yielding (up to 16ms seen in my
setup).
We can simplify things considerably by moving the large_bitset to use
a chunked_vector internally: it already uses a poor man's version of
one by allocating chunks internally (it predates the chunked_vector).
By doing that, we can turn save() into a simple copy operation, and do
away with load altogether by adding a new constructor that just copies
an existing chunked_vector.
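A sketch of the resulting shape (simplified; signatures are
assumptions):

class large_bitset {
    utils::chunked_vector<uint64_t> _storage; // replaces the hand-rolled chunks
public:
    // New: adopt existing storage directly. This removes the need for the
    // iterator-based load(), and with it the fully-initialized (rather
    // than merely reserved) scratch vector that could stall the reactor.
    explicit large_bitset(utils::chunked_vector<uint64_t> storage)
        : _storage(std::move(storage)) {}

    // save() degenerates to copying _storage out.
    const utils::chunked_vector<uint64_t>& storage() const { return _storage; }
};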
Fixes #3341
Tests: unit (release)
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180409234726.28219-1-glauber@scylladb.com>
Address Sanitizer has a global limit on the number of allocations
(note: not the number of allocations minus the number of frees, but
the cumulative number of allocations). Running some tests in debug
mode on a machine with sufficient memory can break that limit.
Work around that limit by restricting the amount of memory the
debug mode segment_pool can allocate. It's also nicer for running
the test on a workstation.
To segregate std and lsa allocations, we prime the segment pool
during initialization so that lsa will release lower-addressed
memory to std, rather than lsa and std competing for memory at
random addresses.
However, tests often evict all of lsa memory for their own
purposes, which defeats this priming.
Extract the functionality into a new prime_segment_pool()
function for use in tests that rely on allocation segregation.