scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 04:26:48 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	050a7019b8	sstables/index_reader: fix index reader for summary entry spanning lots of keys quantity prevents index_reader from reading all index entries of a summary entry that span more than min_index_interval entries. That can happen after introduction of size-based sampling, and consequently, sstable will not be able to return a key which logical position in summary entry is beyond min_index_interval. It's ok to not use quantity because index_reader will read all indexes until either next summary entry or end of file is reached. Fixes test_sstable_conforms_to_mutation_source Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>	2017-08-12 09:44:16 +03:00
Raphael S. Carvalho	872412d31a	db/config: introduce sstable_summary_ratio option Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 01:36:21 -03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we wanna do is to also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satisfy the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Raphael S. Carvalho	da7489720b	sstables: make components_writer::offset const qualified and uint64_t Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-10 21:48:11 -03:00
Raphael S. Carvalho	881c479be8	sstables: make writer::offset const qualified and uint64_t Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-10 21:46:39 -03:00
Botond Dénes	94fc550e68	sstable_set::incremental_selector: select() now returns a selection A selection contains - in addition to the list of sstables - a next_token which is a hint as to what is the next best token to call select() with. This should be the smallest token such that at the next call to select() the least number of new sstables will be returned, without skipping any.	2017-08-09 16:27:33 +03:00
Raphael S. Carvalho	dddbd34b52	sstables: close index file when sstable writer fails index's file output stream uses write behind but it's not closed when sstable write fails and that may lead to crash. It happened before for data file (which is obviously easier to reproduce for it) and was fixed by `0977f4fdf8`. Fixes #2673. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170807171146.10243-1-raphaelsc@scylladb.com>	2017-08-08 09:53:14 +03:00
Duarte Nunes	569bbf2edd	sstables/sstables: Use per-cpu noop_write_monitor We employ a thread-per-core architecture, so don't go about sharing seastar::shared_ptrs across cpus. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170801144153.17354-1-duarte@scylladb.com>	2017-08-01 18:10:49 +03:00
Avi Kivity	db7329b1cb	Merge "Ensure correct EOC for PI block cell names" from Duarte "This series ensures we always write correct cell names to promoted index cell blocks, taking into account the eoc of range tombstones. Fixes #2333" * 'pi-cell-name/v1' of github.com:duarten/scylla: tests/sstable_mutation_test: Test promoted index blocks are monotonic sstables: Consider eoc when flushing pi block sstables: Extract out converting bound_kind to eoc	2017-08-01 18:09:07 +03:00
Avi Kivity	1e8bb972b6	compaction: fix iteration in leveled compaction droppable tombstones loop Since get_level_count() is unsigned, it will never be negative, and the loop may never terminate. Message-Id: <20170719133502.13316-1-avi@scylladb.com>	2017-08-01 13:40:36 +03:00
Avi Kivity	ba2e170e4b	compaction: fix return in leveled compaction droppable tombstones loop If the loop ever terminates, we need to return something. Message-Id: <20170719133508.13374-1-avi@scylladb.com>	2017-08-01 13:33:02 +03:00
Duarte Nunes	1a33cc6847	sstables: Release the flush permit before fsyncing This allows a queued flush to start while we fsync the current sstable, which helps reduce the overall time new writes are blocked on dirty memory. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	784a078e72	sstables: Introduce write_monitor The write_monitor provides callbacks to inform an observer of the state of the ongoing sstable write. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Avi Kivity	e855a28fae	Revert "Merge "memtable flush: Fixes and improvements" from Duarte" This reverts commit `733a64a1df`, reversing changes made to `e11e66723a`. Breaks sstable_test and perf_fast_forward.	2017-07-31 12:44:28 +03:00
Duarte Nunes	5e64839e85	sstables: Release the flush permit before fsyncing This allows a queued flush to start while we fsync the current sstable, which helps reduce the overall time new writes are blocked on dirty memory. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	a737577881	sstables: Introduce write_monitor The write_monitor provides callbacks to inform an observer of the state of the ongoing sstable write. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	06728bdfe9	sstables: Consider eoc when flushing pi block When flushing a promoted index block using a range tombstone cell name as a bound, use the right eoc value instead of always writing composite::eoc::none. Fixes #2333 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 18:23:58 +02:00
Duarte Nunes	718517ed91	sstables: Extract out converting bound_kind to eoc Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 18:23:58 +02:00
Paweł Dziepak	7b0f75c0d1	sstables: avoid indirect calls to abstract_type::is_multi_cell()	2017-07-26 14:38:27 +01:00
Paweł Dziepak	28c105e4a7	sstables: avoid copying key components	2017-07-26 14:38:27 +01:00
Paweł Dziepak	960a140880	index_reader: advance_and_check_if_present() use index_comparator	2017-07-26 14:36:37 +01:00
Paweł Dziepak	dc7bad9a50	sstables: cache token in index entries When a sstable reader is fast forwarded some index entries may be read (and compared) multiple times. This patch makes sure that once a token is computed we keep it around and reuse if the entry is accessed again.	2017-07-26 14:36:37 +01:00
Paweł Dziepak	bfb7b56c74	sstable: keep a pre-computed token in summary_entry Each sstable index lookup involves a binary search in the summary and each time a partition key of summary entry is compared with anything its token needs to be calculated. Since we keep summary in the memory all the time it is better to also keep the tokens around.	2017-07-26 14:36:36 +01:00
Paweł Dziepak	31d7cfdefb	sstables: introduce decorated_key_view	2017-07-26 14:36:36 +01:00
Paweł Dziepak	e0a04cb7fe	sstables: make sure that fill_buffer() actually fills buffer streamed_mutation::impl::fill_buffer() is supposed to either push mutation fragments to the buffer or set EOS flag. However, it was possible that mp_row_consumer would return proceed::no if a skip was needed without satisfying any of these conditions.	2017-07-26 14:36:36 +01:00
Avi Kivity	c5ee62a6a4	Merge "restrict background writers with scheduling groups" from Glauber "This patchset restricts background writers - such as compactions, streaming flushes and memtable flushes to a maximum amount of CPU usage through a seastar::thread_scheduling_group. The said maximum is recommended to be set 50 % - it is default disabled, but can be adjusted through a configuration option until we are able to auto-tune this. The second patch in this series provides a preview on how such auto-tune would look like. By implementing a simple controller we automatically adjust the quota for the memtable writer processes, so that the rate at which bytes come in is equal to the rates at which bytes are flushed. Tail latencies are greatly reduced by this series, and heavy spikes that previously appeared on CPU-bound workloads are no more." * 'memtable-controller-v5' of https://github.com/glommer/scylla: simple controller for memtable/streaming writer shares. restrict background writers to 50 % of CPU.	2017-07-20 10:58:53 +03:00
Tomasz Grabiec	a9237c1666	schema: Revert back to the 1.7 layout of static compact tables in memory We are using C* 3.x compatible layout in schema tables but want to keep using the 1.7 layout in memory for compatibility during rolling upgrade. This patch switches the schema and schema_builder classes back to the old layout. Translation of layout happens when converting to/from schema mutations. Notable changes: 1) Includes a revert of commit `6260f31e08` "thrift: Update CQL mapping of static CFs". 2) Brings back the "default_validation_class" schema attribute. In v3 it can be derived from column definitions, but in v2 it can't, so we have to store it. 3) legacy_schema_migrator and schema_builder don't have to do conversions to v3, this is now handled by the v3_columns class. schema_builder works with the same layout as schema, that is v2. 4) Includes a revert of commit `66991a7ccb` "v3 schema test fixes" Fixes #2555.	2017-07-19 09:52:15 +02:00
Raphael S. Carvalho	7ecedac222	compaction: wire up time window compaction strategy Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-19 02:58:37 -03:00
Raphael S. Carvalho	01886c23a8	compaction/twcs: override default values with options in schema Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-19 02:58:37 -03:00
Raphael S. Carvalho	206d30c52a	sstables: implement time window compaction strategy For more details, https://issues.apache.org/jira/browse/CASSANDRA-9666 Fixes #1432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-19 02:58:35 -03:00
Glauber Costa	4f01ec0910	restrict background writers to 50 % of CPU. In scylla, we have foreground processes, which are latency sensitive and need to be responded to as fast as possible in order to maintain good latency profiles, and background processes, which are less so. The most important background processes we have during normal write workload operations are memtable writes and sstable compactions. Those processes are quite CPU-intensive, and left unchecked will easily dominate the CPU. Lower values of task-quota usually help, as it will force those processes to preempt more, but aren't enough to guarantee good isolation. We have seen boxes with good NVMe storage having their throughput reduced to less than half of the original baseline in a short dive down for the duration of a compaction. In the long run, our goal is to leverage the CPU scheduler to make sure that those processes are balanced with respect to all the others. However, the current state of affairs is causing grievances at this very moment. Thankfully, those processes live in a seastar::thread, that ships with its own rudimentary bandwidth control mechanism: the scheduling group. The goal of this patch is to wrap background processes together in a scheduling group, and assign to such group 50 % of our CPU power; the remainder being left to foreground processes. While we pride ourselves in dynamically adjusting things to the workload, we won't be able to do this properly before the CPU scheduler lands - and let's face it, leaving background processes run wild is not adaptive either. Every workload would benefit most from a different value for such shares, but 50 % is as fair as it gets if we really need static partitioning in the mean time. As a defense against unforeseen consequences, we'll leave the actual value as an option, but will do our best to hide it - as this is not a tunable that we want to be part of a normal Scylla setup. The most convenient place for this tunable is still db::config, so we can easily pass it down to the database layer - but we will not document it in the yaml, and will clearly note in the help string that it is not supposed to be tuned. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-07-18 23:35:33 -04:00
Raphael S. Carvalho	2686e84792	sstables: import TimeWindowCompactionStrategy.java it will be later converted to C++. Imported from latest scylla- tools-java repository. Checked that it doesn't lack anything. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-18 18:26:17 -03:00
Raphael S. Carvalho	7dbfebb7dc	lcs: remove conditional limit for partial sort Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170711140241.11023-2-raphaelsc@scylladb.com>	2017-07-11 17:18:32 +03:00
Raphael S. Carvalho	ebb5dafef0	lcs: remove useless filter for demotion procedure there's no way a sstable from a level higher than N+1 will be in set of candidates that can be either level N or level N + 1. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170711140241.11023-1-raphaelsc@scylladb.com>	2017-07-11 17:18:31 +03:00
Raphael S. Carvalho	6aa2e5be17	lcs: only demote sstable from level higher than target one if we are compacting level 1 into level 2, we only want to demote a sstable from level 3 or higher. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:42 -03:00
Raphael S. Carvalho	53b72b473e	lcs: improve indentation for get_overlapping_starved_sstables Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:40 -03:00
Raphael S. Carvalho	3639b48d7b	lcs: improve indentation for get_compaction_candidates Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:38 -03:00
Raphael S. Carvalho	5a8b8a6ccb	lcs: partially sort candidates that will be trimmed Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:37 -03:00
Raphael S. Carvalho	8334086441	lcs: remove quadratic behavior from L0 compaction L0 compaction triggers quadratic behavior when many newly created sstables are needed for promotion due to their size being relatively low to max sstable size parameter. So until L0 is worth promoting, the strategy will compact every new sstable with all the existing ones in L0. To fix it, let's do STCS on level 0 until it becomes worth promoting. Fixes #2432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:35 -03:00
Raphael S. Carvalho	80f1dca328	lcs: introduce private interface Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:33 -03:00
Raphael S. Carvalho	bc71f97116	lcs: make some member functions static Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:32 -03:00
Raphael S. Carvalho	f4b733efe4	lcs: make some functions const qualified Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:28 -03:00
Raphael S. Carvalho	ede0ee16b2	lcs: remove add method Its code can be inlined because no one besides create() calls it Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:26 -03:00
Raphael S. Carvalho	00ef528e5b	lcs: extract code for higher levels compaction from get_candidates_for Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:25 -03:00
Raphael S. Carvalho	a46b73c401	lcs: simplify code to get candidates for higher levels get rid of unneeded loop for dealing with suspect sstables and std::advance because vector allows random access. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:19 -03:00
Raphael S. Carvalho	e954af0f0f	lcs: extract round-robin heuristic for even distribution of keys into function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:15 -03:00
Raphael S. Carvalho	3c0028d921	lcs: update outdated comments for level 0 compaction some comments are no longer relevant, especially the ones that talk about dealing with busy sstables due to parallel compaction, which isn't done by us for lcs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:07 -03:00
Raphael S. Carvalho	62607ba36a	lcs: improve worth_promoting_L0_candidates interface Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:00 -03:00
Raphael S. Carvalho	c1e42f6528	lcs: do not check if level 0 can be promoted twice can_promote flag will be used to carry info about whether or not level 0 can be promoted. That will avoid a single iteration for higher levels too which can contain tens of thousands of sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:34:49 -03:00
Raphael S. Carvalho	887aab4ae7	lcs: extract code for level 0 compaction from get_candidates_for I will split code for higher levels compaction into functions first before putting it into its own function too. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:34:41 -03:00
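
The byte-budget part of the size-based sampling described in commit 8726ee937d can be sketched roughly as follows. This is an illustration only, not the code from that commit: `sampling_state`, `summary_sketch`, `maybe_add_summary_entry` and the entry-size formula are hypothetical names and assumptions made for the example. Only the ratio of 1 summary byte per 2000 data-file bytes comes from the commit message, and the min_index_interval condition is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; not ScyllaDB's actual data structures.
struct sampling_state {
    uint64_t summary_byte_cost = 2000;  // data bytes that "pay for" one summary byte
    uint64_t next_data_offset = 0;      // data-file offset at which the next entry is allowed
};

struct summary_sketch {
    std::vector<uint64_t> index_positions;  // index-file offset of each sampled entry
    uint64_t bytes = 0;                     // approximate bytes spent on summary entries
};

// Emits a summary entry only once the data file has grown enough to keep the
// summary at roughly 1 byte per summary_byte_cost bytes of data.
// Returns true if an entry was emitted.
inline bool maybe_add_summary_entry(summary_sketch& s, sampling_state& st,
                                    const std::string& key,
                                    uint64_t data_offset, uint64_t index_offset) {
    assert(st.summary_byte_cost > 0);
    if (data_offset < st.next_data_offset) {
        return false;  // not enough data written since the last entry
    }
    // Assumed entry size: the key bytes plus one 8-byte index position.
    const uint64_t entry_size = key.size() + sizeof(uint64_t);
    st.next_data_offset += st.summary_byte_cost * entry_size;
    s.index_positions.push_back(index_offset);
    s.bytes += entry_size;
    return true;
}
```

With a 4-byte key the assumed entry costs 12 bytes, so under these assumptions the next entry becomes eligible only after another 24000 bytes of data have been written.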
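
The two leveled compaction fixes in 1e8bb972b6 and ba2e170e4b concern a general C++ pitfall: a countdown loop over an unsigned index whose `i >= 0` condition is always true, and a function that falls off the end when the loop finds nothing. A minimal sketch of the pattern and one common way around it, using a hypothetical `highest_nonempty_level` helper rather than the actual compaction code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not the function touched by the commits.
// Returns the index of the highest level holding at least one sstable, or -1.
inline int highest_nonempty_level(const std::vector<std::size_t>& sstables_per_level) {
    // Buggy shape: for (std::size_t i = n - 1; i >= 0; --i)
    // With an unsigned i, `i >= 0` never becomes false, so the loop wraps around.
    // Testing `i-- > 0` checks the old value before decrementing, which ends the
    // loop after index 0 and also handles an empty vector.
    for (std::size_t i = sstables_per_level.size(); i-- > 0; ) {
        if (sstables_per_level[i] > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;  // the loop can finish without a hit, so a value must still be returned
}
```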

1041 Commits