scylladb

Author	SHA1	Message	Date
Glauber Costa	2bffa8af74	logalloc: make sure allocations in release_requests don't recurse back into the allocator Calls like later() and with_gate() may allocate memory, although that is not very common. This can create a problem in the sense that it will potentially recurse and bring us back to the allocator during free - which is the very thing we are trying to avoid with the call to later(). This patch wraps the relevant calls in the reclaimer lock. This does mean that the allocation may fail if we are under severe pressure - which includes having exhausted all reserved space - but at least we won't recurse back to the allocator. To make sure we do this as early as possible, we just fold both release_requests and do_release_requests into a single function. Thanks Tomek for the suggestion. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com> (cherry picked from commit `fe6a0d97d1`)	2016-08-04 11:17:54 +02:00
Avi Kivity	d261927fa3	logalloc: change sprint() of a pointer to use void* explicitly Otherwise, fmtlib dislikes it.	2016-07-18 19:37:16 +03:00
Glauber Costa	4e81f19ab5	LSA: fix typo in region merge There are many potentially tricky things about referring to different regions from the LSA perspective. Madness, however, is not one of them. I can only assume we meant made? Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <8eb81f35de4b208a494e43cb392eea07b87b2bf1.1466534798.git.glauber@scylladb.com>	2016-06-21 22:58:44 +03:00
Tomasz Grabiec	e783b58e3b	Merge branch 'glommer/LSA-throttler-v6' from git@github.com:glommer/scylla.gi From Glauber: This is my new take at the "Move throttler to the LSA" series, except this one doesn't actually move anything anywhere: I am leaving all memtable conversion out, and instead I am sending just the LSA bits + LSA active reclaim. This should help us see where we are going, and then we can discuss all memtable changes in a series on its own, logically separated (and hopefully already integrated with virtual dirty). [tgrabiec: trivial merge conflicts in logalloc.cc]	2016-06-21 10:22:26 +02:00
Glauber Costa	579d121db8	LSA: export largest region We now keep the regions sorted by size, and the children region groups as well. Internally, the LSA has all information it needs to make size-based reclaim decisions. However, we don't do reclaim internally, but rather warn our user that a pressure situation is mounted. The user of a region_group doesn't need to evict the largest region in case of pressure and is free to do whatever it chooses - including nothing. But more likely than not, taking into account which region is the largest makes sense. This patch puts together this last missing piece of the puzzle, and exports the information we have internally to the user. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:51:00 -04:00
Glauber Costa	35f8a2ce2c	LSA: add a backpointer to the region from its private data Region is implemented using the pimpl pattern (region_impl), and all its relevant data is present in a private structure instead of the region itself. That private structure is the one that the other parts of the LSA will refer to, the region_group being the prime example. To allow classes such as the region_group to externally export a particular region, we will introduce a backpointer region_impl -> region. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:50:59 -04:00
Glauber Costa	38a402307d	LSA: enhance region_group reclaimer We are currently just allowing the region_group to specify a throttle_threshold, that triggers throttling when a certain amount of memory is reached. We would like to notify the callers that such a condition is reached, so that the callers can do something to alleviate it - like triggering flushes of their structures. The approach we are taking here is to pass a reclaimer instance. Any user of a region_group can specialize its methods start_reclaiming and stop_reclaiming that will be called when the region_group comes under pressure or ceases to be, respectively. Now that we have such a facility, it makes more sense to move the throttle_threshold here than having it separately. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:50:59 -04:00
Glauber Costa	6404028c6a	LSA: move subgroups to a heap as well When we decide to evict from a specific region_group due to excessive memory usage, we must also consider looking at each of their children (subgroups). It could very well be that most of memory is used by one of the subgroups, and we'll have to evict from there. We also want to make sure we are evicting from the biggest region of all, and not the biggest region in the biggest region_group. To understand why this is important, consider the case in which the regions are memtables associated with dirty region groups. It could be that a very big memtable was recently flushed, and a fairly small one took its place. That region group is still quite large because the memtable hasn't finished flushing yet, but that doesn't mean we should evict from it. To allow us to efficiently pick which region is the largest, each root of each subtree will keep track of its maximal score, defined as the maximum between our largest region total_space and the maximum maximal score of subtrees. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:50:13 -04:00
Glauber Costa	e1eab5c845	LSA: store regions in a heap for regions_group Currently, the regions in a region group are organized in a simple vector. We can do better by using a binomial heap, as we do for segments, and then updating when there is change. Internally to the LSA, we are in good position to always know when change happens, so that's really the best way to do it. The end game here is to easily call for the reclaim of the largest offending region (potentially asynchronously). Because of that, we aren't really interested in the region occupancy, but in the region reclaimable occupancy instead: that's simply equal to the occupancy if the region is reclaimable, and 0 otherwise. Doing that effectively lists all non reclaimable regions in the end of the heap, in no particular order. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:50:13 -04:00
Glauber Costa	54d4d46cf7	LSA: move throttling code to LSA. The database code uses a throttling function to make sure that memory used for the dirty region never is over the limit. We track that with a region group, so it makes sense to move this as generic functionality into LSA. This patch implements the LSA-side functionality and a later patch will convert the current memtable throttler to use it. Unlike the current throttling mechanism, we'll not use a timer-based mechanism here. Aside from being more generic and friendlier towards other users, this is a good change for current memtable by itself. The constants - 10ms and 1MB chosen by the current throttler are arbitrary, and we would be better off without them. Let's discuss the merits of each separately: 1) 10ms timer: If we are throttling, we expect somebody to flush the memtables for memory to be released. Since we are in position to know exactly when a memtable was written, thus releasing memory, we can just call unthrottle at that point, instead of using a timer. 2) 1MB release threshold: we do that because we have no idea how much memory a request will use, so we put the cut somehow. However, because of 1) we don't call unthrottle through a timer anymore, and do it directly instead. This means that we can just execute the request and see how much memory it has used, with no need to guess. So we'll call unthrottle at the end of every request that was previously throttled. Writing the code this way also has the advantage that we need one less continuation in the common case of the database not being throttled. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-20 18:34:19 -04:00
Glauber Costa	7cd0c0731e	region_group: delete move constructor Tomek correctly points out that since we are now using "this" in lambda captures, we should make the region_group not movable. We currently define a move constructor, but there are no users. So we should just remove them. copy constructor is already deleted, and so are the copy and move assignment operators. So by removing the move constructor, we should be fine. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-15 22:26:50 -04:00
Tomasz Grabiec	cd9955d2ce	lsa: Reclaim 1 segment by default Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no decrease in throughput compared to the step of 16 segments in either of these modes: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274.	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	86b76171a8	lsa: Use the same step in both internal and external reclamations	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	d74d902a01	lsa: Make reclamation step configurable	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	93bb95bd0d	lsa: Log reclamation rate	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	cb18418022	lsa: Print more details before aborting	2016-06-14 15:13:14 +02:00
Piotr Jastrzebski	136b8148d2	Use idle CPU to compact LSA memory Register an idle CPU handler that compacts a single segment every time there's nothing better to execute on CPU. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>	2016-05-26 12:43:53 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	a0cba3c86f	logalloc: Introduce tracker::occupancy() Returns occupancy information for all memory allocated by LSA, including segment pools / zones.	2016-03-22 16:28:10 +01:00
Tomasz Grabiec	529c8b8858	logalloc: Rename tracker::occupancy() to region_occupancy()	2016-03-22 14:56:44 +01:00
Paweł Dziepak	338fd34770	lsa: update _closed_occupancy after freeing all segments _closed_occupancy will be used when a region is removed from its region group, make sure that it is accurate. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-17 11:12:05 +00:00
Paweł Dziepak	99b61d3944	lsa: set _active to nullptr in region destructor In the region destructor, after the active segment is freed, the pointer to it is left unchanged. This confuses the remaining parts of the destructor logic (namely, removal from region group) which may rely on the information in region_impl::_active. In this particular case the problem was that code removing from the region group called region_impl::occupancy() which was dereferencing _active if not null. Fixes #993. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>	2016-03-07 10:15:28 +01:00
Tomasz Grabiec	237819c31f	logalloc: Excluded zones' free segments in lsa/bytes-non_lsa_used_space Historically the purpose of the metric is to show how much memory is in standard allocations. After zones were introduced, this would also include free space in lsa zones, which is almost all memory, and thus the metric lost its original meaning. This change brings it back to its original meaning. Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>	2016-01-18 10:48:14 +02:00
Avi Kivity	c8b09a69a9	lsa: disable constant_time_size in binomial_heap implementation Corrupts heap on boost < 1.60, and not needed. Fixes #698.	2015-12-29 12:59:00 +01:00
Vlad Zolotarov	33552829b2	core: use steady_clock where monotonic clock is required Use steady_clock instead of high_resolution_clock where monotonic clock is required. high_resolution_clock is essentially a system_clock (Wall Clock) and therefore may not be assumed monotonic, since Wall Clock may move backwards due to time/date adjustments. Fixes issue #638 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-12-27 18:07:53 +02:00
Lucas Meneghel Rodrigues	2167173251	utils/logalloc.cc - Declare member minimum_size from segment_zone struct This fixes compile error: In function `logalloc::segment_zone::segment_zone()': /home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size' collect2: error: ld returned 1 exit status ninja: build stopped: subcommand failed. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>	2015-12-10 12:54:34 +02:00
Paweł Dziepak	0d66300d43	lsa: add more counters Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	83b004b2fb	lsa: avoid fragmenting memory Originally, lsa allocated each segment independently, which could result in high memory fragmentation. As a result many compaction and eviction passes may be needed to release a sufficiently big contiguous memory block. These problems are solved by introduction of segment zones, contiguous groups of segments. All segments are allocated from zones and the algorithm tries to keep the number of zones to a minimum. Moreover, segments can be migrated between zones or inside a zone in order to deal with fragmentation inside a zone. Segment zones can be shrunk but cannot grow. Segment pool keeps a tree containing all zones ordered by their base addresses. This tree is used only by the memory reclaimer. There is also a list of zones that have at least one free segment that is used during allocation. Segment allocation doesn't have any preferences which segment (and zone) to choose. Each zone contains a free list of unused segments. If there are no zones with free segments a new one is created. Segment reclamation migrates segments from the zones higher in memory to the ones at lower addresses. The remaining zones are shrunk until the requested number of segments is reclaimed. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	40dda261f2	lsa: maintain segment to region mapping Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2e94086a2c	lsa: use bi::list to implement segment_stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Avi Kivity	0c2fba7e0b	lsa: advertize our preferred maximum allocation size Let managed_bytes know that allocating below a tenth of the segment size is the right thing to do.	2015-12-08 15:17:09 +02:00
Paweł Dziepak	89f7f746cb	lsa: fix printing object_descriptor::_alignment object_descriptor::_alignment is of type uint8_t which is actually an unsigned char. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 20:13:29 +01:00
Paweł Dziepak	65875124b7	lsa: guarantee that segment_heap doesn't throw boost::heap::binomial_heap allocates helper object in push() and, therefore, may throw an exception. This shouldn't happen during compaction. The solution is to reserve space for this helper object in segment_descriptor and use a custom allocator with boost::heap::binomial_heap. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 19:51:22 +01:00
Paweł Dziepak	273b8daeeb	lsa: add no-op default constructor for segment Zero initialization of segment::data when segment is value initialized is undesirable. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:37:37 +01:00
Paweł Dziepak	e6cf3e915f	lsa: add counters for memory used by large objects Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:36:27 +01:00
Paweł Dziepak	6b113a9a7a	lsa: fix eviction of large blobs LSA memory reclaimer logic assumes that the amount of memory used by LSA equals: segments_in_use * segment_size. However, LSA is also responsible for eviction of large objects which do not affect the used segment count, e.g. region with no used segments may still use a lot of memory for large objects. The solution is to switch from measuring memory in used segments to used bytes count that includes also large objects. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:29:09 +01:00
Paweł Dziepak	c37afcfdee	lsa: account for size of objects too big for LSA While the objects above max_manage_object_size aren't stored in the LSA segments they are still considered to be belonging to the LSA region and are evictable using that region evictor. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-16 12:22:12 +01:00
Paweł Dziepak	64f1c2866c	lsa: free segment in trim_emergency_reserve_to_max() _emergency_reserve is an intrusive container and it doesn't care about segment lifetime. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-05 18:04:38 +02:00
Avi Kivity	16006949d0	logalloc: make migrator an object, not a function pointer The migrator tells lsa how to move an object when it is compacted. Currently it is a function pointer, which means we must know how to move the object at compile time. Making it an object allows us to build the migration function at runtime, making it suitable for runtime-defined types (such as tuples and user-defined types). In the future, we may also store the size there for fixed-size types, reducing lsa overhead. C++ variable templates would have made this patch smaller, but unfortunately they are only supported on gcc 5+.	2015-10-21 11:24:56 +02:00
Tomasz Grabiec	67d0f9c7df	lsa: Restore heap invariant before calling _segments.erase() This is certainly the right thing to do and seems to fix #403. However I didn't manage to convince myself that this would cause problems for binomial_heap, given that binomial_heap::erase() calls siftup() anyway: void erase(handle_type handle) { node_pointer n = handle.node_; siftup(n, force_inf()); top_element = n; pop(); } void increase (handle_type handle) { node_pointer n = handle.node_; siftup(n, *this); update_top_element(); sanity_check(); }	2015-10-20 15:18:05 +03:00
Avi Kivity	9c5a36efd0	logalloc: fix segment free in debug mode Must match allocation function.	2015-09-30 09:45:25 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Tomasz Grabiec	53caf5ecca	lsa: Fix segment heap corruption The segment heap is a max-heap, with sparser segments on the top. When we free from a segment its occupancy is decreased, but its position in the heap increases. This bug caused us to pick up segments for compaction in the wrong order. In extreme cases this can lead to a livelock, in some cases may just increase compaction latency.	2015-09-10 17:20:04 +03:00
Avi Kivity	6d0a2b5075	logalloc: don't invalidate merged region A region being merged can still be in use; but after merging, compaction_lock and the reclaim counter will no longer work. This can lead to use-after-compact-without-re-lookup errors. Fix by making the source region be the same as the target region; they will share compaction locks and reclaim counters, so lookup avoidance will still work correctly. Fixes #286.	2015-09-08 08:55:44 +02:00
Tomasz Grabiec	fecc87e601	lsa: stub allocation_section with default allocator memory::stats() always returns 0 as free memory which confuses guard::enter().	2015-09-07 17:23:02 +02:00
Paweł Dziepak	03f5827570	logalloc: add missing methods to DEFAULT_ALLOCATOR version Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-07 16:59:27 +02:00
Tomasz Grabiec	3b441416fa	lsa: Make segment size publicly accessible Some tests depend on segment size.	2015-09-06 21:25:44 +02:00
Tomasz Grabiec	c82325a76c	lsa: Make region evictor signal forward progress In some cases region may be in a state where it is not empty and nothing could be evicted from it. For example when creating the first entry, reclaimer may get invoked during creation before it gets linked. We therefore can't rely on emptiness as a stop condition for reclamation, the eviction function shall signal us if it made forward progress.	2015-09-06 21:25:44 +02:00
Tomasz Grabiec	94f0db933f	lsa: Fix typo in the word 'emergency'	2015-09-06 21:24:59 +02:00
Tomasz Grabiec	200562abe7	lsa: Reclaim over-max segments from segment pool reserve	2015-09-06 21:24:59 +02:00
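
As a rough illustration of the "maximal score" described in commit 6404028c6a (LSA: move subgroups to a heap as well), here is a minimal C++ sketch. It assumes a simplified tree in which every group stores the total_space of its own regions plus a list of subgroups. The name `group_sketch` and its members are hypothetical and are not taken from logalloc.cc, and the sketch recomputes the score on demand, whereas the commit message says each subtree root keeps track of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a region group; not ScyllaDB's actual class.
struct group_sketch {
    std::vector<std::size_t> region_sizes;  // total_space of each region owned directly
    std::vector<group_sketch> subgroups;    // child region groups

    // Maximal score of this subtree: the larger of our own largest region
    // and the largest maximal score among the subgroups.
    std::size_t maximal_score() const {
        std::size_t best = 0;
        for (std::size_t s : region_sizes) {
            best = std::max(best, s);
        }
        for (const group_sketch& g : subgroups) {
            best = std::max(best, g.maximal_score());
        }
        return best;
    }
};
```

With this definition, a group that holds many small regions does not outrank a sibling that holds one very large region, which matches the commit's stated goal of evicting from the biggest region overall rather than from the biggest region of the biggest group.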


85 Commits