scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 03:20:37 +00:00

Author	SHA1	Message	Date
Nadav Har'El	f55bdea364	compaction manager: avoid spurious "asked to stop" message at the end of the log This patch removes the log message about "compaction_manager - Asked to stop" at the very end of Scylla runs. This log message is confusing because it only has the "asked to stop" part, without finally a "stopped", and may lead a user to incorrectly fear that the shutdown hung - when it in fact finished just fine. The database object holds a compaction_manager and stop()s it when the database is stop()ed - and that is the very last thing our shutdown does. However, much earlier, as the first shutdown operation (i.e., the last at_exit() in main.cc), we already stop() the compaction manager. The second stop() call does nothing, but unfortunately prints the log message just before checking if it has anything to stop. So this patch just moves the log message to after the check. Fixes #4238. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190217142657.19963-1-nyh@scylladb.com>	2019-02-21 12:32:47 +01:00
Raphael S. Carvalho	f5301990fc	compaction: release reference of cleaned sstable in compaction manager Compaction manager holds reference to all cleaning sstables till the very end, and that becomes a problem because disk space of cleaned sstables cannot be reclaimed due to respective file descriptors opened. Fixes #3735. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181221000941.15024-1-raphaelsc@scylladb.com>	2019-01-08 14:14:01 +02:00
Raphael S. Carvalho	3d9566e40d	compaction: introduce notion of compaction-strategy-aware major compaction That's only the very first step which introduces the machinery for making major compaction aware of all strategies. By the time being, default implementation is used for them all which only suits size tiered. Refs #1431. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-12-06 18:22:30 -02:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, so preventing compacting B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and compactions in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
Raphael S. Carvalho	e88d1d54b9	sstables/compaction_manager: prevent partial run from being selected for compaction Filter out sstable belonging to a partial run being generated by an ongoing compaction. Otherwise, that could lead to wrong decisions by the compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:22 -02:00
Raphael S. Carvalho	fc92fb955d	sstables/compaction_manager: release reference to exhausted sstable through callback That's important for the reference to sstable to not be kept throughout the compaction procedure, which would break the goal of releasing space during compaction. Manager passes a callback to compaction which calls it whenever there's sstable replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:16 -02:00
Avi Kivity	455f00e993	sstables: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Benny Halevy	44e5c2643b	compaction_manager::maybe_stop_on_error: add stop_iteration param some call sites are stopping in any case, regardless of what maybe_stop_on_error returns. Reflect that in the log messages. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-2-bhalevy@scylladb.com>	2018-10-24 18:39:52 +03:00
Raphael S. Carvalho	dfd1e1229e	sstables/compaction_manager: fix typo in function name to reevaluate postponed compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180702185343.26682-1-raphaelsc@scylladb.com>	2018-07-05 18:54:14 +03:00
Raphael S. Carvalho	7d6af5da3a	sstables/compaction_manager: properly reevaluate postponed compactions for leveled strategy Function to reevaluate postponed compaction was called too early for strategies that don't allow parallel compaction (only leveled strategy (LCS) at this moment). Such strategies must first have the ongoing compaction deregistered before reevaluating the postponed ones. Manager uses task list of ongoing compaction to decide if there's ongoing compaction for a given column family. So compaction could stop making progress at all if and only if we stop flushing new data. So it could happen that a column family would be left with lots of pending compaction, leading the user to think all compacting is done, but after reboot, there will be lots of compaction activity. We'll both improve method to detect parallel compaction here and also add a call to reevaluate postponed compaction after compaction is done. Fixes #3534. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180702185327.26615-1-raphaelsc@scylladb.com>	2018-07-04 16:30:21 +01:00
Gleb Natapov	59da525e0d	Provide available memory size to compaction_manager object during creation	2018-06-11 15:34:14 +03:00
Glauber Costa	d4e7783188	compaction_manager: disable backlog tracker if we see an exception If we see an exception when adding or removing SSTables from the backlog tracker, the backlog tracker can be inconsistent forever. It would be best if we act before that happens and disable the backlog tracker. Once the backlog tracker is disabled it will default to returning a fixed number of shares. We can either disable the backlog tracker or remove it. But if we remove it we can end up with a backlog of zero if that's the only tracker with a backlog. We then keep it registered but mark it as disabled. This also leaves room for recovery in some situations: we can recover the backlog by doing a schema change in the column family that had the backlog disabled, for instance. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:32 -04:00
Glauber Costa	fde26ec633	backlog tracker: protect against exceptions in backlog calculation. Backlog calculations should be exception free, but there are cases in which I can see exceptions happening. One example is if some backlog tracker that uses temporary objects fails an allocation. Memory shortages can be especially pernicious: if we leave the responsibility of catching those to the individual backlog tracker, we will keep trying to make more allocations in the other backlog trackers if we have many column families. By handling it here we can stop that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:22 -04:00
Glauber Costa	9320d6f17f	compaction: make sure that user-initiated compactions always have a minimum priority We have observed the following behavior with user initiated compactions, like major compactions: - if there are no writes, the backlog doesn't increase. - as compaction progresses the backlog decreases. - at some point, the backlog is so low that compaction barely makes any progress. Going forward, we should allow one to read from the generated partial SSTables, in which case this doesn't matter that much. But for user-initiated compactions we would like to guarantee a minimum baseline. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:33:25 -04:00
Glauber Costa	c55ab93178	backlog_controller: add constants to represent a globally disabled controller There are situations in which we want the controllers to stop working altogether. Usually that's when we have an unimplemented controller or some exception. We want to return fixed shares in this case, but this is a very different situation from when we want fixed shares for one backlog tracker: we want to return fixed shares, yes, but if we disable 200 backlog trackers (because they all failed, for instance), we don't want that fixed number x 200 to be our backlog. So the mechanism to globally disable the controller is still granted, and infinity is a good way to represent that. It's a float that the controller can easily test against. But actually using infinity in the code is confusing. People reading it may interpret it as the other way around from what it means, just meaning "a very large backlog". Let's turn that into a constant instead. It will help us convey meaning. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:25:23 -04:00
Glauber Costa	d758a416f8	backlog_controller: move compaction controller to the compaction manager There was recently an attempt to add minimum shares to major compactions which ended up being harder than it should be due to all the plumbing necessary to call the compaction controller from inside the compaction manager-- since it is currently a database object. We had this problem again when trying to return fixed shares in case of an exception. Taking a step back, all of those problems stem from the fact that the compaction controller really shouldn't be a part of the database: as it deals with compactions and its consequences it is a lot more natural to have it inside the compaction manager to begin with. Once we do that, all the aforementioned problems go away. So let's move there where it belongs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:24:19 -04:00
Raphael S. Carvalho	b65bc511fe	sstables/compaction_manager: log user initiated compaction Sometimes it's hard to figure out from log whether user run major compaction. Fixes #1303. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180504181047.20277-1-raphaelsc@scylladb.com>	2018-05-04 19:15:58 +01:00
Raphael S. Carvalho	638a647b7d	sstables/compaction_manager: do not break lcs invariant by not allowing parallel compaction for it After change to serialize compaction on compaction weight (`eff62bc61e`), LCS invariant may break because parallel compaction can start, and it's not currently supported for LCS. The condition is that weight is deregistered right before last sstable for a leveled compaction is sealed, so it may happen that a new compaction starts for the same column family meanwhile that will promote a sstable to an overlapping token range. That leads to strategy restoring invariant when it finds the overlapping, and that means wasted resources. The fix is about removing a fast path check which is incorrect now because we release weight early and also fixing a check for ongoing compaction which prevented compaction from starting for LCS whenever weight tracker was not empty. Fixes #3279. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180410034538.30486-1-raphaelsc@scylladb.com>	2018-04-10 20:02:08 +03:00
Avi Kivity	55168592ad	compaction_manager: fix use-after-free of column_family Commit `cce1a2bce8` ("Use the CPU scheduler") placed some compaction manager code in a scheduling_group. Unfortunately, downstream code relied on the callers not deferring, so it can rely on the column_family's existence. That doesn't happen if the column_family is removed quickly, as with_scheduling_group() always defers. Fix applying the scheduling group after we've taken the lock and guaranteed the stability of the column_family object. Fixes #3196. Message-Id: <20180211165155.18179-1-avi@scylladb.com>	2018-02-11 17:53:35 +00:00
Glauber Costa	956af9f099	database, main: set up scheduling_groups for our main tasks Set up scheduling groups for streaming, compaction, memtable flush, query, and commitlog. The background writer scheduling group is retired; it is split into the memtable flush and compaction groups. Comments from Glauber: This patch is based on a patch from Avi with the same subject, but the differences are significant enough so that I reset authorship. In particular: 1) A bug/regression is fixed with the boundary calculations for the memtable controller sampling function. 2) A leftover is removed, where after flushing a memtable we would go back to the main group before going to the cache group again 3) As per Tomek's suggestion, now the submission of compactions themselves are run in the compaction scheduling group. Having that working is what changes this patch the most: we now store the scheduling group in the compaction manager and let the compaction manager itself enforce the scheduling group. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Duarte Nunes	cbbdfde979	sstables/compaction_backlog_tracker: Constify backlog() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180111004914.25796-1-duarte@scylladb.com>	2018-01-11 13:20:57 +02:00
Duarte Nunes	43ad5bd182	sstables/compaction_backlog_manager: Fix use-after-free If the compaction_backlog_manager's lifetime ends before the linked compaction_backlog_tracker's, the latter's _manager pointer, not being cleared, can lead to a use-after-free error when running ~compaction_backlog_tracker(), as evidenced by failing unit tests. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180111004914.25796-2-duarte@scylladb.com>	2018-01-11 13:20:55 +02:00
Glauber Costa	ca284174d0	infrastructure for backlog estimator for compaction work. This patch adds infrastructure in various points in the system to allow us to determine the amount of work present as backlog from compactions. What needs to be done can be explained in three major pieces: 1) Add hooks in the points where sstables are added or inserted to a column family (or more precisely, to a compaction_strategy object). 2) Add hooks in read and write monitors that allow a compaction backlog estimator (tracker) to become aware of bytes that are partially written and compacted away. 3) Add a per-column family class (compaction_backlog_tracker) that can be used to track work that is done and relevant to compactions (like the two above), and a compaction manager to provide a system-wide backlog based on the response of the individual trackers. The definition of how much backlog one has is strategy-specific. The Null strategy is easy, as it never really has any backlog, and so is the major strategy - since what really matters is the backlog of the underlying compaction strategy. Although backlogs are strategy-specific, they should be "compatible", in the sense that if a particular strategy has more work to do, it should yield a higher number than its counterparts. All the others are presented in this patch as unimplemented: they will always advertise a mild backlog that should yield a constant CPU-utilization if used alone. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Raphael S. Carvalho	daaadfd515	compaction_manager: remove dead sstable rewrite submission function this rewrite submission was used by old resharding, but it's no longer needed, so let's remove it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171219191052.13689-1-raphaelsc@scylladb.com>	2017-12-20 09:29:43 +02:00
Raphael S. Carvalho	38318c753a	sstables: remove column_family from compaction_weight_registration Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:42:52 -02:00
Raphael S. Carvalho	eff62bc61e	compaction_manager: serialize compaction of same size tier for different cfs Currently, compaction manager will serialize compaction of same size tier (or weight) if they belong to the same column family. However, it fails to do so if the compaction jobs belong to different column families. That can lead to an ungodly amount of running compaction which gets worse the higher the number of shards and active column families. The problem is that it may affect overall system performance due to excessive resource usage. It's easy to trigger it during bootstrapping after loading node with new sstables or repairing, or if lots of cfs are being actively written. That being said, compaction jobs of same size tier are now serialized on a given shard, such that maximum number of compaction (system wise) is now: (SHARDS) * (SIZE TIERS) instead of: (SHARDS) * (COLUMN FAMILIES) * (SIZE TIERS) We'll work hard to release a size tier (weight) for a column family waiting on it as fast as possible, given that we wouldn't like to underutilize resources available for compaction. We want one starting after the other. Compaction for a column family that cannot run now because the size tier is taken, will be postponed. There's a worker that will be sleeping on a condition variable that will be signalled whenever a compaction completes. FIFO ordering is used on postponed list for fairness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:42:48 -02:00
Raphael S. Carvalho	fa0e53f626	sstables: introduces deregister() and weight() to compaction_weight_registration Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:34:08 -02:00
Raphael S. Carvalho	20d8a2c045	sstables: move compaction_weight_registration to its own header That will be needed for using it in compaction.hh. We can't declare compaction_weight_registration in compaction_manager.hh, because compaction.hh can't include the former due to cyclic dependency, so compaction_weight_registration will be declared in its own header. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:26:51 -02:00
Raphael S. Carvalho	ef18b1162b	sstables/compaction_manager: rename and better explain reshard function submit doesn't properly describe the function and also improve explanation of the relationship between function itself and its job parameter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170912032034.23043-1-raphaelsc@scylladb.com>	2017-09-12 12:25:17 +03:00
Avi Kivity	eb62b2c00d	compaction_manager: add missing include	2017-09-12 10:43:05 +03:00
Tzach Livyatan	83eab5c8d7	Remove comment about Too high number of concurrent compactions from scylla_compaction_manager_compactions help It should never happen and its not clear what too high stands for Signed-off-by: Tzach Livyatan <tzach@scylladb.com> Message-Id: <20170911085645.21222-1-tzach@scylladb.com>	2017-09-11 13:27:35 +03:00
Raphael S. Carvalho	10eaa2339e	compaction: Make resharding go through compaction manager Two reasons for this change: 1) every compaction should be multiplexed to manager which in turn will make decision when to schedule. improvements on it will immediately benefit every existing compaction type. 2) active tasks metric will now track ongoing reshard jobs. Fixes #2671. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>	2017-08-20 11:35:14 +03:00
Avi Kivity	7c809917b6	compaction_manager: fix debug mode build (periodic_compaction_submission_interval) Turn static constexpr variable into a function.	2017-07-01 19:34:46 +03:00
Raphael S. Carvalho	0d21129cc7	compaction_manager: periodically submit cfs for compaction This is useful for a column family which isn't generating new content and will have lots of expired data later on that can be purged. Compaction submission is NO-OP if there's nothing to do, so I think it's reasonable to do it at an interval of 1 hour. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:03 -03:00
Raphael S. Carvalho	f49bdb6839	compaction_manager: dont go on with major compaction if task was stopped A column family which was truncated will remove itself from compaction manager. Any task running a compaction should be interrupted and a task waiting to run should bail out when it wakes up. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170425224350.15965-3-raphaelsc@scylladb.com>	2017-04-26 17:18:37 +03:00
Raphael S. Carvalho	c44a2319e6	prevent regular compaction from choosing shared sstables For new resharding, it's important to exclude resharding sstables from the list of candidates for regular compaction. That doesn't affect current resharding because it marks the sstables as compacting. That won't work with new resharding which will work with sstables from multiple shards. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:26 -03:00
Raphael S. Carvalho	3286f7aaa6	compaction: make major compaction go through compaction manager From now on, major compaction will go through compaction manager. Major compaction is serialized to reduce disk space requirement. Each column family will be running either minor and major compaction at a given time. The only issue is number of small sstables growing while major compaction is running, but major compaction itself will reduce the number of tables considerably. If this turns out to be an issue, we can allow minor to start in parallel to major, but not the other way around. Fixes #1156. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170417233125.14092-1-raphaelsc@scylladb.com>	2017-04-19 15:44:21 +03:00
Raphael S. Carvalho	e78db43b79	compaction_manager: fix crash when dropping a resharding column family Problem is that column family field of task wasn't being set for resharding, so column family wasn't being properly removed from compaction manager. In addition to fixing this issue, we'll also interrupt ongoing compactions when dropping a column family, exactly like we do with shutdown. Fixes #2291. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170418125807.7712-1-raphaelsc@scylladb.com>	2017-04-18 17:39:27 +03:00
Raphael S. Carvalho	6b6bb38f38	compaction_manager: stop manager after storage io error Manager will stop itself if a compaction fails due to storage io error, which unconditionally results in stop of transportation services. Fixes #2147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170316054538.23423-1-raphaelsc@scylladb.com>	2017-03-16 10:37:47 +02:00
Vlad Zolotarov	00e37c389b	sstables::compaction_manager: move collectd metrics registration to the metrics registration layer Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-01-10 16:24:54 -05:00
Glauber Costa	56df53f51e	compaction_manager: fix shutdown sequence By the time we are able to acquire this semaphore, we may be stopped already. So we need to test it before we go ahead. I can see shutdown hangs before this patch that are fixed with it applied. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <e5b378893128d086d584ffbb2acd3fb687648e5c.1481655433.git.glauber@scylladb.com>	2016-12-14 09:26:24 +01:00
Glauber Costa	5803957ab5	compaction: fix build Commit `732ee275` moved tracking of one statistics value inside a lambda without capturing this in that lambda. Compilation fails as a result. Signed-off-by: Glauber Costa <glauber@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <68860640f4533dd43e43f341f1620e25464b700b.1481313455.git.glauber@scylladb.com>	2016-12-10 09:00:20 +02:00
Raphael S. Carvalho	732ee275f8	compaction: fix running compaction counter when splitting sstables The counter was being increased before taking the semaphore, so every pending split would count as a running compaction which misleads the user as a result. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <f2050cc3599cee7af29d4579368a154708b37731.1481248048.git.raphaelsc@scylladb.com>	2016-12-09 15:01:43 +02:00
Raphael S. Carvalho	e86de40b49	compaction_manager: inform about compaction cancelled by shutdown After some changes in compaction manager, user no longer is informed that compaction was cancelled in event of shutdown. That's because we only ignore ready future when compaction manager was asked to stop. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <02ca29b5a93fe3a558896598f325b0dce069e82c.1478277317.git.raphaelsc@scylladb.com>	2016-11-14 16:37:33 +02:00
Raphael S. Carvalho	56a50784f8	compaction_manager: make registration of sstables and weight exception safe Compacting sstables and weight could be left unregistered in event of an exception. Let's make it safe by using a RAII approach. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <f2cf9d0c12f22046293bd2185ef14ede3f4d63d4.1469114161.git.raphaelsc@scylladb.com>	2016-07-22 07:02:48 +01:00
Raphael S. Carvalho	ed5e7e6842	compaction: refactor compaction manager Previously, same function was used to handle both regular compaction and cleanup requests. That's bad because a lot of conditions were added for both compaction types to live in the same function. Now, cleanup and regular compaction will live in different functions. They share a lot of code, so helper functions were introduced. This change is also important for user-initiated compaction that will go through compaction manager in the future. Code is also a lot easier to read now. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:37:53 -03:00
Raphael S. Carvalho	da6a2b429d	compaction: add functions to register and deregister compacting sstables Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:00:51 -03:00
Raphael S. Carvalho	4d6dce8ec9	compaction: add helper function to get candidates for strategy Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:06:14 -03:00
Raphael S. Carvalho	bfc5376548	compaction: remove gate from compaction manager task There is no longer a need to use gate for regular termination of fiber that runs compaction. Now, we only set task->stopping to true, ask for compaction termination, and wait for its future to resolve. Code is simplified a lot with this change. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:05:10 -03:00
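
Note on commit eff62bc61e above: its message describes serializing compaction jobs of the same size tier (weight) on a shard, postponing jobs whose weight is already taken, and retrying them in FIFO order when a compaction completes. The following is only a rough, synchronous sketch of that idea, not the code from the commit. The names `job`, `weight_serializer`, `submit`, and `complete` are hypothetical, and the condition-variable worker mentioned in the message is replaced here by a direct call.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical job descriptor: which column family wants to compact,
// and at which size tier (weight). How a weight is derived is not modeled.
struct job {
    std::string column_family;
    int weight;
};

// Hypothetical per-shard serializer: at most one running job per weight,
// regardless of which column family the job belongs to.
class weight_serializer {
public:
    // Returns true if the job may start now; otherwise it is postponed.
    bool submit(const job& j) {
        if (_busy.insert(j.weight).second) {
            return true;               // weight was free and is now taken
        }
        _postponed.push_back(j);       // FIFO order for fairness
        return false;
    }

    // Called when a job of the given weight finishes. Releases the weight
    // and starts postponed jobs, oldest first, whose weight is free.
    std::vector<job> complete(int weight) {
        _busy.erase(weight);
        std::vector<job> started;
        for (auto it = _postponed.begin(); it != _postponed.end();) {
            if (_busy.insert(it->weight).second) {
                started.push_back(*it);
                it = _postponed.erase(it);
            } else {
                ++it;
            }
        }
        return started;
    }

    std::size_t running() const { return _busy.size(); }
    std::size_t postponed() const { return _postponed.size(); }

private:
    std::unordered_set<int> _busy;     // weights with a running job
    std::deque<job> _postponed;        // jobs waiting for their weight
};
```

Under this sketch the number of running jobs on a shard can never exceed the number of distinct weights, which matches the (SHARDS) * (SIZE TIERS) bound stated in the commit message.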

69 Commits