scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 17:40:34 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	1c5934c934	sstables: fix procedure to get fully expired sstables with MC format MC format lacks ancestors metadata, so we need to work around it by using ancestors in metadata collector, which is only available for a sstable written during this instance. It works fine here because we only want to know if a sstable recently compacted has an ancestor which wasn't yet deleted. Fixes #3852. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>	2018-11-06 09:28:37 +02:00
Avi Kivity	455f00e993	sstables: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	738e713edf	sstables: fix bad format string syntax Some sprint() calls use the fmt language instead of the printf syntax. Convert them all the way to format().	2018-11-01 13:16:17 +00:00
Glauber Costa	51906f7144	compactions: log tokens that we decide not to write down to an SSTable May be important when debugging issues related to cleanups Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181015162643.7834-1-glauber@scylladb.com>	2018-10-15 19:28:00 +03:00
Avi Kivity	7c8143c3c4	Revert "compaction: demote compaction start/end messages to DEBUG level" This reverts commit `b443a9b930`. The compaction history table doesn't have enough information to be a replacement for this log message yet.	2018-10-03 13:13:37 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt out of passing a timeout by passing `db::no_timeout` (the previous default value) but this will now be explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader; where I couldn't, I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
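The effect of dropping the default argument can be sketched in C++. This is a toy model, not Scylla's code: `toy_reader`, `timeout_clock` and `no_timeout` are hypothetical stand-ins for `flat_mutation_reader`, its timeout clock and `db::no_timeout`, and `fill_buffer()` here only checks the deadline instead of reading anything.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-ins for the reader's timeout clock and db::no_timeout.
using timeout_clock = std::chrono::steady_clock;
const timeout_clock::time_point no_timeout = timeout_clock::time_point::max();

// Hypothetical reader; not Scylla's flat_mutation_reader.
class toy_reader {
public:
    // No default argument: a bare fill_buffer() call no longer compiles,
    // so every caller must pass a deadline or explicitly opt out with
    // no_timeout.
    bool fill_buffer(timeout_clock::time_point timeout) {
        // Pretend to talk to lower layers; report whether the deadline
        // had not yet passed when the read was requested.
        return timeout_clock::now() < timeout;
    }

    // peek() may need to refill the buffer, so it also takes a timeout
    // and propagates it instead of silently substituting no_timeout.
    bool peek(timeout_clock::time_point timeout) {
        return fill_buffer(timeout);
    }
};
```

With the default gone, a bare `r.peek()` fails to compile; opting out has to be spelled `r.peek(no_timeout)`, which is exactly the explicitness the commit asks for.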
Avi Kivity	b443a9b930	compaction: demote compaction start/end messages to DEBUG level Compactions start and end all the time, especially with many shards, and don't contribute much to understanding what is going on these days. Compaction throughput is available through the metrics and other information is available via the compaction history table. Demote compaction start and end messages to DEBUG level to keep the log clean. Cleaning and resharding compactions are kept as INFO, at least for now, since they are manual operations and therefore rarer. Message-Id: <20180724132859.14109-1-avi@scylladb.com>	2018-07-25 09:53:39 +01:00
Botond Dénes	a8e795a16e	sstables_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for several reasons. The sub-range of the token-space that an sstable covers is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. The usage of token for providing the next position where the intersecting sstables change [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop, as a token cannot express inclusiveness/exclusiveness and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness, e.g. [t1, t2](t2, t3]. To solve these problems, update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space among sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated key of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys so it can also include information about inclusiveness/exclusiveness of the next interval, guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
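A toy model of why a position that can sit between keys guarantees progress: positions compare by token and then by a weight (before, at, or after the keys of that token), so the interval (t2, t3] can start at a position strictly after t2. Everything here (`ring_pos`, `select_at`, `walk`) is hypothetical and far simpler than `dht::ring_position_view`, which also carries the decorated key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for dht::ring_position_view: a token plus
// a weight placing the position before (-1), at (0) or after (+1) the keys
// that carry that token. Real ring positions also carry the decorated key.
struct ring_pos {
    int64_t token;
    int weight;
    friend bool operator<(const ring_pos& a, const ring_pos& b) {
        return std::tie(a.token, a.weight) < std::tie(b.token, b.weight);
    }
};

ring_pos before(int64_t t) { return {t, -1}; }
ring_pos after(int64_t t) { return {t, +1}; }

// Hypothetical contiguous, non-overlapping intervals [start, end).
struct interval {
    ring_pos start;
    ring_pos end;
};

struct selection {
    std::optional<std::size_t> index;  // interval containing the position, if any
    ring_pos next_position;            // where the next lookup should start
};

// Find the interval containing `pos`; its exclusive end is the next position.
selection select_at(const std::vector<interval>& intervals, ring_pos pos) {
    for (std::size_t i = 0; i < intervals.size(); ++i) {
        if (!(pos < intervals[i].start) && pos < intervals[i].end) {
            return {i, intervals[i].end};
        }
    }
    return {std::nullopt, pos};
}

// Walk all intervals by feeding each next_position back into select_at().
// Each next_position is strictly greater than the position that produced it,
// so for non-overlapping intervals the loop terminates.
std::vector<std::size_t> walk(const std::vector<interval>& intervals, ring_pos from) {
    std::vector<std::size_t> visited;
    for (auto s = select_at(intervals, from); s.index; s = select_at(intervals, s.next_position)) {
        visited.push_back(*s.index);
    }
    return visited;
}
```

Here [t1, t2] becomes `{before(1), after(2)}` and (t2, t3] becomes `{after(2), after(3)}`. With bare tokens, the first interval would report `t2` as its next position, and `t2` is still inside [t1, t2], so the walk would never leave it.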
Glauber Costa	7e3093709a	backlog: add level to write progress monitor For SSTables being written, we don't know their level yet. Add that information to the write monitor. New SSTables will always be at L0. Compacted SSTables will have their level determined by the compaction process. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-31 21:09:38 -04:00
Glauber Costa	b573a2ff61	backlog: keep track of maximum timestamp in write monitor For sealed SSTables we can get the maximum timestamp from the statistics component. But for partially written SSTables, the metadata is not yet available. One way to solve this would be to make the SSTable statistics available earlier. But we would end up with a maximum timestamp that potentially changes all the time as we write more cells. A better approach is to take note of the maximum timestamp in a memtable before we start to flush, and when time comes for us to flush we will use the progress manager to inform the consumers about the maximum timestamp. For SSTables being compacted, we can't know for sure what the maximum timestamp is, as some entries could be TTLd already. But the maximum of all SSTables present in the compaction is a good enough estimate for these purposes. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 12:55:58 -04:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. Proper large_partition_handler is retrievable from config information and is based on existing compaction_large_partition_warning_threshold_mb entry. Right now CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Vladimir Krivopalov	15ef4ca73c	Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only. This fix adds functionality for writing data in 'mc' format to Data.db file according to the SSTables 3.0 data format as described at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Data-File-Format and Index.db file according to the specification at https://github.com/scylladb/scylla/wiki/SSTables-3.0-Index-File-Format The following cases are not supported yet: - writing counter cells - range tombstones In Index.db, end open markers are not written since range tombstones are not supported for data files yet. For #1969. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-04-26 14:34:20 -07:00
Raphael S. Carvalho	11940ca39e	sstables: Fix bloom filter size after resharding by properly estimating partition count We were feeding the total estimated partition count of an input shared sstable to the output unshared ones. So the sstable writer thinks, from the estimate, that each sstable created by resharding will have the same amount of data as the shared sstable it is being created from. That's a problem because the estimate is fed to bloom filter creation, which directly influences its size. So if we're resharding all sstables that belong to all shards, the disk usage taken by filter components will be multiplied by the number of shards. That becomes more of a problem with #3302. Partition count estimation for a shard S will now be done as follows: TE, the total estimated partition count for a shard S, is defined as TE = Sum(i = 0...N) { Ei / Si }, where i is an input sstable that belongs to shard S, Ei is the estimated partition count for sstable i, and Si is the total number of shards that own sstable i. Fixes #2672. Refs #3302. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com>	2018-04-23 18:11:20 +03:00
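The per-shard estimate TE = Sum(i) { Ei / Si } can be sketched as follows. `shared_input` and `estimate_partitions_for_shard` are hypothetical names, and the integer division assumes partitions are spread evenly across the shards that own an sstable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one input sstable as seen from shard S.
struct shared_input {
    uint64_t estimated_partitions;  // Ei: estimated partition count of sstable i
    unsigned owning_shards;         // Si: number of shards that own sstable i
};

// TE = Sum(i) { Ei / Si }: each shard is credited only with its share of
// every shared input, instead of the input's whole estimate.
uint64_t estimate_partitions_for_shard(const std::vector<shared_input>& inputs) {
    uint64_t total = 0;
    for (const auto& in : inputs) {
        total += in.estimated_partitions / in.owning_shards;
    }
    return total;
}
```

For two inputs with E = 1000 (owned by 4 shards) and E = 600 (owned by 2 shards), the shard is credited with 250 + 300 = 550 partitions, so its bloom filter is sized for 550 rather than for the full input estimates.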
Avi Kivity	03c22ad524	Merge "Support for Cassandra 2.2 (LA) SSTable formats" from Daniel " These patches add support for C* 2.2 file(name) format. Namely: * It forces Scylla to write files in la format. * Adds storage-service feature for them. * cf and ks are determined from directory, not from file-name (for 2.2 format). * Adds some other fixes to make dtest happy. * Unit tests work with la format or with both formats. " * 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla: tests/sstables: Tests use la format or iterate over both formats. tests/sstables: Helper functions support 2.2 format directory structure. sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits. storage_service: Support la sstable storage format as a feature. sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format. sstables: Throw more detailed exception for unknown item in reverse_map. sstables/compaction: Suppress NaN in a report of a throughput.	2018-03-19 17:49:44 +02:00
Daniel Fiala	c5eca593fc	sstables/compaction: Suppress NaN in a report of a throughput. * It causes failures in dtest. Signed-off-by: Daniel Fiala <daniel@scylladb.com>	2018-03-18 05:46:32 +01:00
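The commit does not show how the NaN was suppressed; a minimal sketch of one way, assuming throughput is computed as bytes divided by elapsed seconds, is to guard the zero-duration case, since 0.0 / 0.0 is NaN. `throughput_mb_per_s` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: throughput in MB/s (2^20 bytes per MB) that never
// yields NaN. An empty or instantaneous compaction would otherwise compute
// 0.0 / 0.0 (NaN), or a non-zero amount over 0 s (infinity).
double throughput_mb_per_s(uint64_t bytes, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}
```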
Raphael S. Carvalho	aa75684ee7	sstables: Warn when an extra-large partition is written Based on https://issues.apache.org/jira/browse/CASSANDRA-9643 For compaction_large_partition_warning_threshold_mb option set to 1, follow an example output: WARN 2018-02-22 19:52:11,029 [shard 0] sstable - Writing large row system/local:{key: pk{00056c6f63616c}, token:-7564491331177403445} (1276758 bytes) Fixes #2209. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180306175912.19259-1-raphaelsc@scylladb.com>	2018-03-07 15:49:46 +00:00
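For the example above, a threshold of 1 corresponds to 1,048,576 bytes if MB is read as 2^20 bytes (an assumption here), so the 1,276,758-byte row triggers the warning. A minimal sketch of the comparison, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: warn when a partition's written size exceeds
// compaction_large_partition_warning_threshold_mb, assuming MB = 2^20 bytes.
bool should_warn_large_partition(uint64_t partition_bytes, uint64_t threshold_mb) {
    return partition_bytes > threshold_mb * 1024 * 1024;
}
```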
Glauber Costa	956af9f099	database, main: set up scheduling_groups for our main tasks Set up scheduling groups for streaming, compaction, memtable flush, query, and commitlog. The background writer scheduling group is retired; it is split into the memtable flush and compaction groups. Comments from Glauber: This patch is based on a patch from Avi with the same subject, but the differences are significant enough that I reset authorship. In particular: 1) A bug/regression is fixed with the boundary calculations for the memtable controller sampling function. 2) A leftover is removed, where after flushing a memtable we would go back to the main group before going to the cache group again. 3) As per Tomek's suggestion, the submission of compactions itself now runs in the compaction scheduling group. Having that working is what changes this patch the most: we now store the scheduling group in the compaction manager and let the compaction manager itself enforce the scheduling group. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initialization defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Duarte Nunes	cbbdfde979	sstables/compaction_backlog_tracker: Constify backlog() Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180111004914.25796-1-duarte@scylladb.com>	2018-01-11 13:20:57 +02:00
Glauber Costa	ca284174d0	infrastructure for backlog estimator for compaction work. This patch adds infrastructure in various points in the system to allow us to determine the amount of work present as backlog from compactions. What needs to be done can be explained in three major pieces: 1) Add hooks in the points where sstables are added or inserted to a column family (or more precisely, to a compaction_strategy object). 2) Add hooks in read and write monitors that allow a compaction backlog estimator (tracker) to become aware of bytes that are partially written and compacted away. 3) Add a per-column family class (compaction_backlog_tracker) that can be used to track work that is done and relevant to compactions (like the two above), and a compaction manager to provide a system-wide backlog based on the response of the individual trackers. The definition of how much backlog one has is strategy-specific. The Null strategy is easy, as it never really has any backlog, and so is the major strategy - since what really matters is the backlog of the underlying compaction strategy. Although backlogs are strategy-specific, they should be "compatible", in the sense that if a particular strategy has more work to do, it should yield a higher number than its counterparts. All the others are presented in this patch as unimplemented: they will always advertise a mild backlog that should yield a constant CPU-utilization if used alone. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	d4109ebb80	compaction: control destruction of readers Compactions run from a seastar::thread, in run(). They will either fail or succeed, and from the point of view of ordering of destruction between the compaction object and its readers: - if compaction succeeds, we have no control over who gets destructed first since both objects will be going out of scope. - if it fails, we will forcibly destruct the compaction object, at which point the readers are still alive. From the point of view of lifetime management, it would be nice to make sure that the compaction object outlives whichever other objects it needs during compaction. This nice-to-have will become paramount when we start adding read_monitors to the compaction object, which themselves have to outlive the readers. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:06 -05:00
Raphael S. Carvalho	eff62bc61e	compaction_manager: serialize compaction of same size tier for different cfs Currently, compaction manager will serialize compaction of same size tier (or weight) if they belong to the same column family. However, it fails to do so if the compaction jobs belong to different column families. That can lead to an ungodly amount of running compactions, which gets worse the higher the number of shards and active column families. The problem is that it may affect overall system performance due to excessive resource usage. It's easy to trigger it during bootstrapping after loading a node with new sstables or repairing, or if lots of cfs are being actively written. That being said, compaction jobs of same size tier are now serialized on a given shard, such that the maximum number of compactions (system-wide) is now: (SHARDS) * (SIZE TIERS) instead of: (SHARDS) * (COLUMN FAMILIES) * (SIZE TIERS) We'll work hard to release a size tier (weight) for a column family waiting on it as fast as possible, given that we wouldn't like to underutilize resources available for compaction. We want one starting after the other. Compaction for a column family that cannot run now because the size tier is taken will be postponed. There's a worker that will be sleeping on a condition variable that will be signalled whenever a compaction completes. FIFO ordering is used on the postponed list for fairness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:42:48 -02:00
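A single-threaded sketch of per-shard serialization by weight, with a FIFO list of postponed jobs. `weight_registry` is hypothetical, and where the commit uses a worker sleeping on a condition variable, this sketch simply retries postponed jobs inside `finish()`. As an illustration of the bound, 8 shards with 100 active column families and 5 size tiers could previously run up to 8 * 100 * 5 = 4000 compactions, versus 8 * 5 = 40 afterwards.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-shard registry: at most one running compaction per
// weight (size tier), regardless of which column family it belongs to.
class weight_registry {
public:
    using job = std::pair<std::string, int>;  // (column family, weight)

    // Returns true if the job may start now; otherwise it is postponed.
    bool try_start(const std::string& cf, int weight) {
        if (running_.count(weight)) {
            postponed_.emplace_back(cf, weight);
            return false;
        }
        running_.insert(weight);
        return true;
    }

    // Called when a compaction completes: release its weight and retry the
    // postponed jobs in FIFO order. Returns the jobs that were started.
    std::vector<job> finish(int weight) {
        running_.erase(weight);
        std::vector<job> started;
        std::deque<job> still_waiting;
        while (!postponed_.empty()) {
            job j = postponed_.front();
            postponed_.pop_front();
            if (running_.count(j.second)) {
                still_waiting.push_back(j);  // tier still busy: keep its place
            } else {
                running_.insert(j.second);
                started.push_back(j);
            }
        }
        postponed_ = std::move(still_waiting);
        return started;
    }

private:
    std::set<int> running_;
    std::deque<job> postponed_;
};
```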
Raphael S. Carvalho	49f3cfe746	sstables: improve compact_sstables() interface Motivation is that a new field in the descriptor will be forwarded to compaction procedure without extending parameter list even more. Also beautifies the interface, making it concise and easier to play with. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:22:19 -02:00
Paweł Dziepak	73b3d02cc0	db: make make_range_sstable_reader() return flat reader	2017-12-13 12:01:03 +00:00
Paweł Dziepak	e12959616c	db: make column_family::make_sstable_reader() return a flat reader	2017-12-13 12:01:03 +00:00
Avi Kivity	d934ca55a7	Merge "SSTable resharding fixes" from Raphael "Didn't affect any release. Regression introduced in `301358e`. Fixes #3041" * 'resharding_fix_v4' of github.com:raphaelsc/scylla: tests: add sstable resharding test to test.py tests: fix sstable resharding test sstables: Fix resharding by not filtering out mutation that belongs to other shard db: introduce make_range_sstable_reader rename make_range_sstable_reader to make_local_shard_sstable_reader db: extract sstable reader creation from incremental_reader_selector db: reuse make_range_sstable_reader in make_sstable_reader	2017-12-07 16:42:48 +02:00
Raphael S. Carvalho	bad21ba444	sstables: Fix resharding by not filtering out mutation that belongs to other shard After `301358e`, sstable resharding stopped working because shared sstables would use a filtering reader, which excludes mutations that belong to other shards. That completely breaks resharding, which relies on compaction of mutations that belong to different shards. The fix is to use the recently introduced non-local shard reader. Fixes #3041. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:26 -02:00
Raphael S. Carvalho	d1b146baa6	rename make_range_sstable_reader to make_local_shard_sstable_reader Tomek says: "I think that the least surprising behavior for a function named like this is to read the sstables unfiltered (it just reads them), and the filtering should be indicated specially in the name or by accepting a parameter." Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:25 -02:00
Raphael S. Carvalho	809b30c4a2	sstables/compaction: do not actually compact fully expired sstables There's no need to actually compact a sstable which is fully expired and whose deletion will not resurrect older data. For that, a sstable will only be considered fully expired if it doesn't contain data newer than its overlapping counterparts. That way, there could be a false negative, but never a false positive. Currently, a fully expired sstable would unnecessarily waste disk read bandwidth. This will help time series workloads a lot, in which data for a given time window is all deleted at once using TTL. Fixes #2620. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
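A sketch of the conservative check, modeled on Cassandra's approach to fully expired sstables rather than taken from Scylla's code: the candidate must be entirely purgeable, and all of its data must be older than the oldest data in any overlapping sstable, so none of its tombstones can be shadowing something that would reappear. `sstable_meta` and `is_fully_expired` are hypothetical, and `gc_before` is a plain integer here rather than a `gc_clock::time_point`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-sstable metadata; Scylla's sstable class differs.
struct sstable_meta {
    int64_t min_timestamp;
    int64_t max_timestamp;
    int64_t max_local_deletion_time;  // latest expiry of any cell or tombstone
};

// Conservative: may miss a droppable sstable (false negative) but never
// drops one that could still shadow older data (no false positive).
bool is_fully_expired(const sstable_meta& candidate,
                      const std::vector<sstable_meta>& overlapping,
                      int64_t gc_before) {
    // 1) Everything in the candidate must already be purgeable.
    if (candidate.max_local_deletion_time >= gc_before) {
        return false;
    }
    // 2) All of its data must be older than the oldest overlapping data,
    //    so dropping it cannot resurrect anything.
    int64_t min_overlapping_ts = std::numeric_limits<int64_t>::max();
    for (const auto& s : overlapping) {
        min_overlapping_ts = std::min(min_overlapping_ts, s.min_timestamp);
    }
    return candidate.max_timestamp < min_overlapping_ts;
}
```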
Raphael S. Carvalho	d2ab154f12	sstables: switch to const ref wherever possible Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
Raphael S. Carvalho	d916c8cdad	sstables: use gc_clock::time_point for gc_before Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:33 -02:00
Raphael S. Carvalho	45c11865fa	sstables: change return value type of get_fully_expired_sstables unordered_set will allow us to quickly extract fully expired tables from a set of compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 18:45:55 -02:00
Paweł Dziepak	b64dd21751	sstables: convert compaction to flat_mutation_reader	2017-11-23 18:14:31 +00:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account for their own memory consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before, only new readers will be blocked on an exhausted semaphore; existing readers can continue to work.	2017-10-03 12:44:12 +03:00
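A synchronous sketch of memory-based admission: a new reader pays a fixed up-front cost (16k in the commit above) and is held back while the semaphore is exhausted, while readers already admitted keep charging their buffers without ever blocking. `memory_semaphore` is hypothetical; the real semaphore makes waiting readers wait asynchronously instead of returning `false`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, synchronous model of a memory-based reader semaphore.
class memory_semaphore {
public:
    memory_semaphore(int64_t total_bytes, int64_t admission_cost)
        : available_(total_bytes), admission_cost_(admission_cost) {}

    // New readers pay a fixed up-front cost and are refused (would wait)
    // while not enough memory is available.
    bool try_admit() {
        if (available_ < admission_cost_) {
            return false;
        }
        available_ -= admission_cost_;
        return true;
    }

    // Admitted readers are never blocked: buffers they keep are charged
    // even if that drives the balance negative.
    void consume(int64_t bytes) { available_ -= bytes; }
    void release(int64_t bytes) { available_ += bytes; }

    int64_t available() const { return available_; }

private:
    int64_t available_;
    int64_t admission_cost_;
};
```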
Avi Kivity	78eae8bf48	Revert "Merge "Make restricting_mutation_reader more accurate" from Botond" This reverts commit `c6e5dcc556`, reversing changes made to `19b21a0ab2`. Fails to build, plus author has more changes.	2017-10-03 11:58:59 +03:00
Avi Kivity	c6e5dcc556	Merge "Make restricting_mutation_reader more accurate" from Botond "Currently restricting_mutation_reader restricts mutation_readers on a count basis. This is inaccurate on multiple levels. The reader might be a combined_mutation_reader, which might be composed of multiple individual readers, whose number might change during the lifetime of the reader. The memory consumption of the readers can vary and may change during the lifetime of the reader as well. To remedy this, make the restriction memory-consumption based. The restricting semaphore is now configured with the amount of memory (bytes) that its readers are allowed to consume in total. New readers consume 128k units up-front to account for read-ahead buffers, and then consume additional units for any buffer (returned from input_stream<>::read()) they keep around. Like before, readers already allowed to read will not be blocked; instead, new readers will be blocked on their first read if all the units are consumed." Fixes #2692. * 'bdenes/restricting_mutation_reader-v4' of https://github.com/denesb/scylla: Update reader restriction related metrics Add restricted_reader_test unit test restricted_mutation_reader: restrict based-on memory consumption mutation_reader.hh: Move restricted_reader related code	2017-10-03 11:15:34 +03:00
Raphael S. Carvalho	e34c1db642	db: update compaction history outside the sstable write lock The reason to do that is because compaction can deadlock if refresh disables write which waits for compaction, and compaction in turn waits for dirty memory[1] that would be released by memtable write. Dirty memory manager for non-system cfs was being used for system cfs, which was useful for exposing this problem. [1]: when updating compaction history. Fixes #2769. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170918215238.9810-2-raphaelsc@scylladb.com>	2017-09-26 19:51:12 +02:00
Botond Dénes	33e97e7457	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account for their own memory consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before, only new readers will be blocked on an exhausted semaphore; existing readers can continue to work.	2017-09-20 11:14:35 +03:00
Avi Kivity	9b540eccb0	database: remove dependency on compaction.hh and compaction_manager.hh	2017-09-11 20:09:45 +03:00
Glauber Costa	db846326f8	compaction: remove dead code This code has no more users. Bury it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20170908005305.29925-1-glauber@scylladb.com>	2017-09-08 08:17:15 +02:00
Botond Dénes	611774b1d9	Use the incremental reader for compaction As leveled compaction strategy stands to gain the most from incrementally opening sstables. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <292648d3fa4ea97376c0b4360754a20132194f63.1502822066.git.bdenes@scylladb.com>	2017-08-15 21:38:04 +03:00
Duarte Nunes	7fb6a74302	combined_mutation_reader: Drop exhausted readers if not in FF mode Exhausted readers can be fast forwarded, so we have to keep them around. However, if the current reader is not fast forwardable, then we can drop those readers and their buffers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Botond Dénes	94fc550e68	sstable_set::incremental_selector: select() now returns a selection A selection contains - in addition to the list of sstables - a next_token which is a hint as to what is the next best token to call select() with. This should be the smallest token such that at the next call to select() the least number of new sstables will be returned, without skipping any.	2017-08-09 16:27:33 +03:00
Glauber Costa	4f01ec0910	restrict background writers to 50 % of CPU. In scylla, we have foreground processes, which are latency sensitive and need to be responded to as fast as possible in order to maintain good latency profiles, and background processes, which are less so. The most important background processes we have during normal write workload operations are memtable writes and sstable compactions. Those processes are quite CPU-intensive, and left unchecked will easily dominate the CPU. Lower values of task-quota usually help, as they force those processes to preempt more, but aren't enough to guarantee good isolation. We have seen boxes with good NVMe storage having their throughput reduced to less than half of the original baseline in a short dive down for the duration of a compaction. In the long run, our goal is to leverage the CPU scheduler to make sure that those processes are balanced with respect to all the others. However, the current state of affairs is causing grievances at this very moment. Thankfully, those processes live in a seastar::thread, which ships with its own rudimentary bandwidth control mechanism: the scheduling group. The goal of this patch is to wrap background processes together in a scheduling group, and assign to such group 50 % of our CPU power, the remainder being left to foreground processes. While we pride ourselves in dynamically adjusting things to the workload, we won't be able to do this properly before the CPU scheduler lands - and let's face it, letting background processes run wild is not adaptive either. Every workload would benefit most from a different value for such shares, but 50 % is as fair as it gets if we really need static partitioning in the meantime. As a defense against unforeseen consequences, we'll leave the actual value as an option, but will do our best to hide it - as this is not a tunable that we want to be part of a normal Scylla setup. The most convenient place for this tunable is still db::config, so we can easily pass it down to the database layer - but we will not document it in the yaml, and will clearly note in the help string that it is not supposed to be tuned. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-07-18 23:35:33 -04:00
Raphael S. Carvalho	4351e0a996	compaction: introduce new compaction type for reshard so that users can look at nodetool compactionstats and determine whether or not resharding is running, for example:
    $ ./bin/nodetool compactionstats
    pending tasks: 3
    id      compaction type  keyspace  table               completed  total  unit  progress
    <none>  RESHARD          system    compaction_history  11         256    keys  4.30%
    <none>  RESHARD          system    compaction_history  2          256    keys  0.78%
    <none>  RESHARD          system    compaction_history  10         256    keys  3.91%
    <none>  RESHARD          system    compaction_history  8          256    keys  3.12%
    <none>  RESHARD          system    compaction_history  7          256    keys  2.73%
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170620175733.25882-1-raphaelsc@scylladb.com>	2017-06-22 14:48:38 +03:00
Raphael S. Carvalho	41137c7fb6	compaction: use sstable::bytes_on_disk for calculating start and end size Currently, start and end size of compaction are calculated using the uncompressed size of data component. bytes_on_disk() returns size used by all components. NOTE: start and end size are written to compaction history, so users who monitor it should be aware of this change. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170525212129.6758-1-raphaelsc@scylladb.com>	2017-05-28 11:33:24 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Raphael S. Carvalho	ddc1d80c28	compaction: remove dead function declaration Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170504013046.23522-2-raphaelsc@scylladb.com>	2017-05-04 11:48:51 +03:00
Raphael S. Carvalho	61229ab88c	compaction: fix type for cleanup After compaction revamp, compaction type set by cleanup at its ctor is being overwritten at compaction::setup(). Consequently, cleanup would not be stopped by 'nodetool stop cleanup' and cleanup would be listed as regular compaction in 'nodetool compactionstats'. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170504013046.23522-1-raphaelsc@scylladb.com>	2017-05-04 11:48:50 +03:00


148 Commits