scylladb

Author	SHA1	Message	Date
Lakshmi Narayanan Sreethar	9cb766f929	db/config: introduce new config parameter `compaction_max_shares` Add support for the new configuration parameter `compaction_max_shares`, and update the compaction manager to pass it down to the compaction controller when it changes. The shares allocated to compaction jobs will be limited by this new parameter. Fixes #9431 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-11-24 12:52:29 -03:00
Lakshmi Narayanan Sreethar	f2b0489d8c	compaction_controller: add configurable maximum shares Add a `max_shares` constructor parameter to compaction_controller to allow configuring the maximum output of the control points at construction time. The constructor now calls `set_max_shares()` with the provided max_shares value. The subsequent commits will wire this value to a new configuration option. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-11-24 11:43:24 -03:00
Lakshmi Narayanan Sreethar	853811be90	compaction_controller: introduce `set_max_shares()` Add a method to dynamically adjust the maximum output of control points in the compaction controller. This is required for supporting runtime configuration of the maximum shares allocated to the compaction process by the controller. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-24 11:43:20 -03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	db9e314965	treewide: apply codespell to the comments in source code for fewer spelling errors in comments. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16408	2023-12-20 10:25:03 +02:00
Pavel Emelyanov	5412c7947a	backlog_controller: Unwrap scheduling_group Some time ago (`997a34bf8c`) the backlog controller was generalized to maintain some scheduling group. Back then the group was the pair of seastar::scheduling_group and seastar::io_priority_class. Now the latter is gone, so the controller's notion of what sched group is can be relaxed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14266	2023-06-16 12:02:14 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks that the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Benny Halevy	774a10017c	backlog_controller: destroy _update_timer before _current_backlog The _update_timer callback calls adjust() that depends on _current_backlog and currently, _current_backlog is destroyed before _update_timer. This is benign since there are no preemption points in the destructor, but it's more correct and elegant to destroy the timer first, before other members it depends on. Fixes #14056 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14057	2023-05-29 23:03:24 +03:00
Benny Halevy	c9a9720247	backlog_controller: keep scheduling_group by value There is no need to keep a mutable reference to the scheduling_group passed at construction time since setting / updating shares is using the scheduling_group / io_priority_class id as a handle, and the id itself is never changed by the backlog_controller. Note that the class names are misleading; in hindsight, they would better be called scheduling_group_id and io_priority_class_id, respectively. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 07:38:40 +03:00
Benny Halevy	78ad1c70a2	backlog_controller: scheduling_group: keep io_priority_class by value Exactly like the cpu scheduling_group, io_priority_class contains the class id, which is a handle to the io_priority_class and so can be kept by value, rather than by reference, and be safely copied around. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 07:38:40 +03:00
Benny Halevy	450ecd60c6	backlog_controller: scheduling_group: define default member initializers To prepare for the next patch, implement default initialization of the scheduling_group and io_priority_class, to the default values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 07:38:40 +03:00
Benny Halevy	3e6622180e	backlog_controller: get rid of _interval member It isn't used outside the constructor. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 07:38:40 +03:00
Igor Ribeiro Barbosa Duarte	8dd0f4672d	compaction: Make compaction_static_shares liveupdateable This patch makes compaction_static_shares liveupdateable to avoid having to restart the cluster after updating this config. Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>	2022-07-19 10:10:46 -03:00
Igor Ribeiro Barbosa Duarte	c2ee6492e6	backlog_controller: Unify backlog_controller constructors This patch adds the _static_shares variable to the backlog_controller so that instead of having to use a separate constructor when controller is disabled, we can use a single constructor and periodically check on the adjust method if we should use the static shares or the controller. This will be useful on the next patches to make compaction_static_shares and memtable_flush_static_shares live updateable. Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>	2022-07-19 10:06:12 -03:00
Pavel Emelyanov	997a34bf8c	backlog_controller: Generalize scheduling groups Make struct scheduling_group be sub-class of the backlog controller. Its new meaning is now -- the group under controller maintenance. Both database and compaction manager derive their sched groups from this one. This makes backlog controller construction simpler, prepares the ground for sched groups unification in seastar and facilitates next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-16 17:40:19 +03:00
Pavel Emelyanov	fbb59fc920	compaction_manager: Keep compaction_sg on board This is mainly to make next patch simpler. Also this makes the backlog controller API smaller by removing its sg() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-16 17:40:19 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Avi Kivity	b0980ba7c6	compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads The workload in #3844 has these characteristics: - very small data set size (a few gigabytes per shard) - large working set size (all the data, enough for high cache miss rate) - high overwrite rate (so a compaction results in 12X data reduction) As a result, the compaction backlog controller assigns very few shares to compaction (low data set size -> low backlog), so compaction proceeds very slowly. Meanwhile, we have tons of cache misses, and each cache miss needs to read from a large number of sstables (since compaction isn't progressing). The end result is a high read amplification, and in this test, timeouts. While we could declare that the scenario is very artificial, there are other real-world scenarios that could trigger it. Consider a 100% write load (population phase) followed by 100% read. Towards the end of the last compaction, the backlog will drop more and more until compaction slows to a crawl, and until it completes, all the data (for that compaction) will have to be read from its input sstables, resulting in read amplification. We should probably have read amplification affect the backlog, but for now the simpler solution is to increase the minimum shares to 50 so that compaction always makes forward progress. This will result in higher-than-needed compaction bandwidth in some low write rate scenarios so we will see fluctuations in request rate (what the controller was designed to avoid), but these fluctuations will be limited to 5%. Since the base class backlog_controller has a fixed (0, 0) point, remove it and add it to derived classes (setting it to (0, 50) for compaction). Fixes #3844 (or at least improves it). Message-Id: <20181231162710.29410-1-avi@scylladb.com>	2019-01-04 10:58:43 +01:00
Glauber Costa	70c47eb045	controller: adjust constants for compaction controller Right now the controller adjusts its shares based on how big the backlog is in comparison to shard memory. We have seen in some tests that if the dataset becomes too big, this may cause compactions to dominate. While we may change the input altogether in future versions, I'd like to propose a quick change for the time being: move the high point from 10x memory size to 30x memory size. This will cause compactions to increase in shares more slowly. While this is as magic as the 10 before, it will allow us to err on the side of caution, with compactions not becoming aggressive enough to overly disrupt workloads. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-18 15:16:38 -04:00
Glauber Costa	c55ab93178	backlog_controller: add constants to represent a globally disabled controller There are situations in which we want the controllers to stop working altogether. Usually that's when we have an unimplemented controller or some exception. We want to return fixed shares in this case, but this is a very different situation from when we want fixed shares for one backlog tracker: we want to return fixed shares, yes, but if we disable 200 backlog trackers (because they all failed, for instance), we don't want that fixed number x 200 to be our backlog. So the mechanism to globally disable the controller is still granted, and infinity is a good way to represent that. It's a float that the controller can easily test against. But actually using infinity in the code is confusing. People reading it may interpret it as the other way around from what it means, just meaning "a very large backlog". Let's turn that into a constant instead. It will help us convey meaning. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:25:23 -04:00
Glauber Costa	d758a416f8	backlog_controller: move compaction controller to the compaction manager There was recently an attempt to add minimum shares to major compactions which ended up being harder than it should be due to all the plumbing necessary to call the compaction controller from inside the compaction manager-- since it is currently a database object. We had this problem again when trying to return fixed shares in case of an exception. Taking a step back, all of those problems stem from the fact that the compaction controller really shouldn't be a part of the database: as it deals with compactions and its consequences it is a lot more natural to have it inside the compaction manager to begin with. Once we do that, all the aforementioned problems go away. So let's move there where it belongs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:24:19 -04:00
Glauber Costa	d3f985ef46	backlog_controller: allow users to compute inverse function of shares There are some situations in which we want to force a specific amount of shares and don't have a backlog. We can provide a function to get that from the controller. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-21 19:35:07 -04:00
Avi Kivity	80651e6dcc	database: reduce idle memtable flush cpu shares to 1% Commit `1671d9c433` (not on any release branch) accidentally bumped the idle memtable flush cpu shares to 100 (representing 10%), causing flushes to be too when they don't consume too much cpu. Fixes #3243. Message-Id: <20180408104601.9607-1-avi@scylladb.com>	2018-04-08 17:12:14 +01:00
Duarte Nunes	b7bd9b8058	backlog_controller: Stop update timer On database shutdown, this timer can cause use-after-free errors if not stopped. Refs #3315 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180324140822.3743-1-duarte@scylladb.com>	2018-03-26 14:36:16 +03:00
Glauber Costa	4272279bbb	controllers: unify the I/O and CPU controllers We have had so far an I/O controller, for compactions and memtables, and a CPU controller, for memtables only -- since the scheduling was still quota-based. Now that the CPU scheduler is fully functional, it is time to do away with the differences and integrate them both into one. We now have a memtable controller and a compaction controller, and they control both CPU and I/O. In the future, we may want to control processes that don't do one of them, like cache updates. If that ever happens, we'll try to make controlling one of them optional. But for now, since the I/O and CPU controllers for our main two processes would look exactly the same we should integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:30 -05:00
Glauber Costa	7b6f188e27	controllers: allow a static priority to override the controller output We have merged the I/O controller without this, but we want to integrate the CPU and I/O controllers into one. Currently, the quota can be statically set for the CPU controller. For now, until we gain more experience with it we should allow a static value to override the controller's output as well. That is particularly important since we don't yet control some strategies like LCS and the time-based ones. Users in the field may be using one of those strategies with a static value for background quota. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	6f295a2a8a	controllers: update control points for memtable I/O controller Right now CPU and I/O controllers have slightly different control points for no good reason. Let's use the CPU controller ones as the standard, as we have been using it in the field for longer and trust it more. The end goal is to fully integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	b895d495cc	controllers: allow memtable I/O controller to have shares statically set This is so it looks more like the CPU controller. The end goal is to integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	c099c98676	controllers: retire auto_adjust_flush_quota It no longer makes sense now that we have the full scheduler + controllers. In lieu of it, we will provide an option to statically set the controller's shares as a safeguard against us getting this wrong. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	956af9f099	database, main: set up scheduling_groups for our main tasks Set up scheduling groups for streaming, compaction, memtable flush, query, and commitlog. The background writer scheduling group is retired; it is split into the memtable flush and compaction groups. Comments from Glauber: This patch is based on a patch from Avi with the same subject, but the differences are significant enough that I reset authorship. In particular: 1) A bug/regression is fixed with the boundary calculations for the memtable controller sampling function. 2) A leftover is removed, where after flushing a memtable we would go back to the main group before going to the cache group again 3) As per Tomek's suggestion, now the submission of compactions themselves is run in the compaction scheduling group. Having that working is what changes this patch the most: we now store the scheduling group in the compaction manager and let the compaction manager itself enforce the scheduling group. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initialization defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Glauber Costa	4f1b875784	database: add a controller for I/O on memtable flushes. The algorithm and principle of operation is the same as the CPU controller. It is, however, always enabled and we will operate on I/O shares. I/O-bound workloads are expected to hit the maximum once virtual dirty fills up and stay there while the load is steady. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:58:57 -05:00
Glauber Costa	244c564aac	compaction: adjust shares for compactions Compactions can be a heavy disk user and the I/O scheduler can always guarantee that it uses its fair share of disk. Such fair share can, however, be a lot more than what compaction actually needs. This patch draws on the controllers infrastructure to adjust the I/O shares that the compaction class will get so that compaction bandwidth is dynamically adjusted. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:58:57 -05:00
Glauber Costa	4b44a22236	backlog_controllers: implement generic I/O controller Like the CPU controller, but will act on I/O priorities. Shares can go from 0 to 1000. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:56:54 -05:00
Glauber Costa	1671d9c433	factor out some of the controller code The control algorithm we are using for memtables has proven itself quite successful. We will very likely use the same for other processes, like compactions. Make the code a bit more generic, so that a new controller has to only set the desired parameters. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:56:54 -05:00

37 Commits