scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Avi Kivity	72c673fcc3	Merge "I/O Controller for memtables and compactions" from Glauber "This patchset implements the compaction controller for I/O shares. The goal is to automatically adjust compaction shares based on a strategy-specific backlog. A higher backlog will translate into higher shares. As compaction progresses, that reduces the backlog. As new data is flushed, that increases the backlog. The goal of the controller is to keep the backlog constant at a certain rate, so that we go neither too fast nor too slow. Tracking reads and writes: ========================== Tracking of reads and writes happens through the read_monitor and the write_monitor. The write monitor is an existing interface that has the purpose of releasing the write permit at particular points of the write process. We enhance it so as to get a reference to an instance that tracks the current offset inside the sstables::file_writer. This way the backlog tracker can always know for sure what the offset of the current write is. A similar thing is done for reads. The data_consumer already tracks the position of the current read, and we isolate that into a structure to which we can get a reference. A read_monitor allows us to connect the compaction to that reference. Lifetime management: ==================== In general, tracking objects will be owned by their callers and passed down as references. The compaction object will own the read monitors and the compaction write monitors, and the memtable flush write monitor will be kept alive in a do_with block around the flush itself. The backlog_{write,read}_progress_manager needs to be kept alive until the SSTable is no longer in progress. For writes, that means until we are able to add the SSTable charges in full, and for reads (compaction) that means until we are able to remove the charges in full. It is important to do that to avoid spikes in the graph. 
If we remove the progress managers in a different operation than updating the SSTable list we will be left in a temporary state where charges appear or disappear abruptly, to be fixed when the final add_sstable/remove_sstable happens. So we want those things to happen together. The compaction_backlog_tracker is kept alive until the strategy changes, for example, through ALTER TABLE. Current charges are transferred to the new strategy's compaction_backlog_tracker object when we do that. If the type of strategy changes, the current read charges are forgotten. We can do that because those running compactions will not really contribute to decreasing the backlog of the new compaction strategy. Transfer of Charges ================== When ALTER TABLE happens, we need to transfer ongoing writes to the new backlog manager. Ongoing reads will still be tracked by the backlog_manager that originated them. The rationale for that is that reads still belong to the current compaction, with the strategy that generated them. But new Tables being written will add to the backlog of the new strategy. Note that ALTER TABLE operations do not necessarily cause a change of Strategy. We can be using the same strategy but just changing properties. If that is the case, we expect no discontinuity in the backlog graph (tested). Resharding ========== Resharding compactions are more complex than normal compactions because the SSTables are created in one shard and later sent to another shard. It is better, then, to track resharding compactions separately and let them have their own backlog tracker, which will insert backlog in proportion to the amount of data to be resharded. Memtable Flush I/O Controller ============================= With the current infrastructure it becomes trivial to add a new controller, for either I/O or CPU. This patchset then adds an I/O controller for memtable flushes, using the same backlog algorithm that we already used for CPU." 
* 'compaction-controller-io-v5' of github.com:glommer/scylla: database: add a controller for I/O on memtable flushes. document the compaction controller compaction: adjust shares for compactions backlog_controllers: implement generic I/O controller factor out some of the controller code io shares: multiply all shares by 10 compaction_strategy: implement backlog manager for the SizeTiered strategy infrastructure for backlog estimator for compaction work. sstables: notify about end of data component write sstables: add read_monitor_generator sstables: add read_monitor sstables: enhance data consumer with a position tracker sstables: enhance the file_writer with an offset tracker sstables: pass references instead of pointers for write_monitor compaction: control destruction of readers	2018-01-07 15:00:10 +02:00
Glauber Costa	4f1b875784	database: add a controller for I/O on memtable flushes. The algorithm and principle of operation is the same as the CPU controller. It is, however, always enabled and we will operate on I/O shares. I/O-bound workloads are expected to hit the maximum once virtual dirty fills up and stay there while the load is steady. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:58:57 -05:00
Glauber Costa	244c564aac	compaction: adjust shares for compactions Compactions can be heavy disk users and the I/O scheduler can always guarantee that they use their fair share of disk. Such a fair share can, however, be a lot more than what compaction actually needs. This patch draws on the controllers infrastructure to adjust the I/O shares that the compaction class will get so that compaction bandwidth is dynamically adjusted. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:58:57 -05:00
Glauber Costa	4b44a22236	backlog_controllers: implement generic I/O controller Like the CPU controller, but will act on I/O priorities. Shares can go from 0 to 1000. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:56:54 -05:00
Glauber Costa	1671d9c433	factor out some of the controller code The control algorithm we are using for memtables has proven itself quite successful. We will very likely use the same for other processes, like compactions. Make the code a bit more generic, so that a new controller only has to set the desired parameters. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-03 19:56:54 -05:00
Raphael S. Carvalho	818830715f	Fix potential infinite recursion when combining mutations for leveled compaction The issue is triggered by compaction of sstables of level higher than 0. The problem happens when the interval map of the partitioned sstable set stores intervals such as the following: [-9223362900961284625 : -3695961740249769322 ] (-3695961740249769322 : -3695961103022958562 ] When the selector is called for the first interval above, the exclusive lower bound of the second interval is returned as the next token, but the inclusiveness info is not returned. So reader_selector was reporting that there were new readers when the current token was -3695961740249769322 because it was stored in the selector position field as inclusive, but it's actually exclusive. This false positive was leading to infinite recursion in the combined reader because the sstable set's incremental selector itself knew that there were actually no new readers, and therefore no progress could be made. The fix is to use ring_position in reader_selector, such that inclusiveness is respected. So reader_selector::has_new_readers() won't return a false positive under the conditions described above. Fixes #2908. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 16:23:01 -02:00
Glauber Costa	ca284174d0	infrastructure for backlog estimator for compaction work. This patch adds infrastructure in various points in the system to allow us to determine the amount of work present as backlog from compactions. What needs to be done can be explained in three major pieces: 1) Add hooks in the points where sstables are added or inserted to a column family (or more precisely, to a compaction_strategy object). 2) Add hooks in read and write monitors that allow a compaction backlog estimator (tracker) to become aware of bytes that are partially written and compacted away. 3) Add a per-column family class (compaction_backlog_tracker) that can be used to track work that is done and relevant to compactions (like the two above), and a compaction manager to provide a system-wide backlog based on the response of the individual trackers. The definition of how much backlog one has is strategy-specific. The Null strategy is easy, as it never really has any backlog, and so is the major strategy - since what really matters is the backlog of the underlying compaction strategy. Although backlogs are strategy-specific, they should be "compatible", in the sense that if a particular strategy has more work to do, it should yield a higher number than its counterparts. All the others are presented in this patch as unimplemented: they will always advertise a mild backlog that should yield a constant CPU-utilization if used alone. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	86d7c160fd	sstables: notify about end of data component write We need to notify the monitor that the offset tracker that we are using is about to be destroyed and will no longer be valid. While we could modify the file_writer interface so that we could capture the offset_tracker and take ownership of it - guaranteeing it is alive until we reach the existing on_write_completed(), this feels like a layer violation. It is also potentially useful in general to offer the monitor callers with knowledge that writing the data portion is done. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	3bd6bceaf0	sstables: add read_monitor_generator Passing the read monitor down to the sstable readers is tricky. The point of interest - like compaction - are usually very far from the interfaces that register the monitor, like read_rows. Between the two, there is usually a mutation_reader, which is and ought to be totally unaware of the read monitor: technically, a mutation_reader may not even know it is backed by sstables. The solution is to create a read_monitor_generator, that can be passed from the upper layers, like compaction, to the layers that are actually making the decision of which sstables to create readers for. Note that we don't need an equivalent piece of infrastructure for writes, because writes don't happen through hidden layers and have all the information they need to initialize their monitors. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	110b8531f4	sstables: enhance the file_writer with an offset tracker Callers, like the memtable flusher or compactions will be able to find out the current amount of bytes written at any time. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:07 -05:00
Glauber Costa	00df0a5ad3	sstables: pass references instead of pointers for write_monitor This came from Avi's review on the read_monitors. He suggests we wouldn't keep shared pointers, and would instead have the caller ensuring lifetime. That makes sense, but having the writer interface using shared_ptr and the read interface using references would lead to an inconsistent interface. For the sake of consistency we will change the write monitor to take references before we do that. From database.cc's perspective, we could now keep the monitors in a do_with() block, but we will keep the shared_ptrs to manage their lifetime in anticipation of upcoming patches in this series, where we'll have to pass them somewhere else. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-02 18:43:06 -05:00
Avi Kivity	8795238869	Merge "Fix handling of range tombstones starting at same position" from Tomasz "When we get two range tombstones with the same lower bound from different data sources (e.g. two sstables), which need to be combined into a single stream, they need to be de-overlapped, because each mutation fragment in the stream must have a different position. If we have range tombstones [1, 10) and [1, 20), the result of that de-overlapping will be [1, 10) and [10, 20]. The problem is that if the stream corresponds to a clustering slice with upper bound greater than 1, but lower than 10, the second range tombstone would appear as being out of the query range. This is currently violating assumptions made by some consumers, like cache populator. One effect of this may be that a reader will miss rows which are in the range (1, 10) (after the start of the first range tombstone, and before the start of the second range tombstone), if the second range tombstone happens to be the last fragment which was read for a discontinuous range in cache and we stopped reading at that point because of a full buffer and cache was evicted before we resumed reading, so we went to reading from the sstable reader again. There could be more cases in which this violation may resurface. There is also a related bug in mutation_fragment_merger. If the reader is in forwarding mode, and the current range is [1, 5], the reader would still emit range_tombstone([10, 20]). If that reader is later fast forwarded to another range, say [6, 8], it may produce fragments with positions smaller than those emitted before, violating monotonicity of fragment positions in the stream. A similar bug was also present in partition_snapshot_flat_reader. 
Possible solutions: 1) relax the assumption (in cache) that streams contain only relevant range tombstones, and only require that they contain at least all relevant tombstones 2) allow subsequent range tombstones in a stream to share the same starting position (position is weakly monotonic), then we don't need to de-overlap the tombstones in readers. 3) teach combining readers about query restrictions so that they can drop fragments which fall outside the range 4) force leaf readers to trim all range tombstones to query restrictions This patch implements solution no 2. It simplifies combining readers, which don't need to accumulate and trim range tombstones. I don't like solution 3, because it makes combining readers more complicated, slower, and harder to properly construct (currently combining readers don't need to know restrictions of the leaf streams). Solution 4 is confined to implementations of leaf readers, but also has disadvantage of making those more complicated and slower. There is only one consumer which needs the tombstones with monotonic positions, and that is the sstable writer. Fixes #3093." 
* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev: tests: row_cache: Introduce test for concurrent read, population and eviction tests: sstables: Add test for writing combined stream with range tombstones at same position tests: memtable: Test that combined mutation source is a mutation source tests: memtable: Test that memtable with many versions is a mutation source tests: mutation_source: Add test for stream invariants with overlapping tombstones tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones tests: mutation_reader: Test combined reader slicing on random mutations tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys() mutation_fragment: Introduce range() clustering_interval_set: Introduce overlaps() clustering_interval_set: Extract private make_interval() mutation_reader: Allow range tombstones with same position in the fragment stream sstables: Handle consecutive range_tombstone fragments with same position tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone() streamed_mutation: Introduce peek() mutation_fragment: Extract mergeable_with() mutation_reader: Move definition of combining mutation reader to source file mutation_reader: Use make_combined_reader() to create combined reader	2018-01-02 18:32:09 +02:00
Duarte Nunes	89b353cd95	Delete unused nway_merger.hh Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1514463536-7732-1-git-send-email-duarte@scylladb.com>	2017-12-28 14:21:40 +02:00
Tomasz Grabiec	52285a9e73	mutation_reader: Use make_combined_reader() to create combined reader So that we can hide the definition of combined_mutation_reader. It's also less verbose.	2017-12-21 21:24:11 +01:00
Piotr Jastrzebski	308ec43ea5	cf::for_all_partitions::iteration_state: don't store schema_ptr Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 11:47:07 +01:00
Piotr Jastrzebski	570703a169	read_mutation_from_flat_mutation_reader: don't take schema_ptr Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 11:47:07 +01:00
Paweł Dziepak	3cf46a31a6	flat_multi_range_mutation_reader: disallow streamed_mutation::forwarding Properly implementing streamed_mutation::forwarding::yes in multi range reader would noticeably increase its complexity and is not needed.	2017-12-20 14:50:11 +00:00
Avi Kivity	2137d753b3	Merge "Serialize compaction of same size tier for different cfs" from Raphael "Currently, compaction manager will serialize compaction of same size tier (or weight) if they belong to the same column family. However, it fails to do so if the compaction jobs belong to different column families. That can lead to an ungodly amount of running compactions, which gets worse the higher the number of shards and active column families. The problem is that it may affect overall system performance due to excessive resource usage. It's easy to trigger it during bootstrapping after loading a node with new sstables or repairing, or if lots of cfs are being actively written." Fixes #1295. * 'similar_sized_compaction_serialization_v4' of github.com:raphaelsc/scylla: sstables: remove column_family from compaction_weight_registration compaction_manager: serialize compaction of same size tier for different cfs sstables: introduces deregister() and weight() to compaction_weight_registration sstables: move compaction_weight_registration to its own header sstables: improve compact_sstables() interface	2017-12-19 16:32:27 +02:00
Piotr Jastrzebski	570fc5afed	Use row_cache::make_flat_reader in column_family::make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <ba1659ceed8676f45942ce6e7506158026947345.1513687259.git.piotr@scylladb.com>	2017-12-19 14:42:32 +02:00
Raphael S. Carvalho	eff62bc61e	compaction_manager: serialize compaction of same size tier for different cfs Currently, compaction manager will serialize compaction of same size tier (or weight) if they belong to the same column family. However, it fails to do so if the compaction jobs belong to different column families. That can lead to an ungodly amount of running compactions, which gets worse the higher the number of shards and active column families. The problem is that it may affect overall system performance due to excessive resource usage. It's easy to trigger it during bootstrapping after loading a node with new sstables or repairing, or if lots of cfs are being actively written. That being said, compaction jobs of the same size tier are now serialized on a given shard, such that the maximum number of compactions (system-wide) is now: (SHARDS) * (SIZE TIERS) instead of: (SHARDS) * (COLUMN FAMILIES) * (SIZE TIERS) We'll work hard to release a size tier (weight) for a column family waiting on it as fast as possible, given that we wouldn't like to underutilize resources available for compaction. We want one starting after the other. Compaction for a column family that cannot run now because the size tier is taken will be postponed. There's a worker that will be sleeping on a condition variable that will be signalled whenever a compaction completes. FIFO ordering is used on the postponed list for fairness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:42:48 -02:00
Raphael S. Carvalho	49f3cfe746	sstables: improve compact_sstables() interface Motivation is that a new field in the descriptor will be forwarded to compaction procedure without extending parameter list even more. Also beautifies the interface, making it concise and easier to play with. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-17 17:22:19 -02:00
Paweł Dziepak	8e0da776ab	db: convert single_key_sstable_reader to flat streams Before flat mutation readers, sstable::read_row() returned a future<streamed_mutation>. That required a helper reader that would wait for the streamed_mutations from all relevant sstables to be created and then construct a mutation merger. With flat mutation readers, sstable::read_row_flat() returns a flat_mutation_reader (no futures), so the code can be simplified by collecting all the relevant readers and creating a combined reader without suspension points. The unfortunate disadvantage of the flat_mutation_reader-based approach is that the combined reader now needlessly compares the partition keys even though we know that we read only a single partition, but optimising that is out of scope of this patch.	2017-12-13 12:01:03 +00:00
Paweł Dziepak	24026a0c7d	db: fully convert incremental_reader_selector to flat readers	2017-12-13 12:01:03 +00:00
Paweł Dziepak	73b3d02cc0	db: make make_range_sstable_reader() return flat reader	2017-12-13 12:01:03 +00:00
Paweł Dziepak	8b3c3fc832	db: make column_family::make_reader() return flat reader	2017-12-13 12:01:03 +00:00
Paweł Dziepak	e12959616c	db: make column_family::make_sstable_reader() return a flat reader	2017-12-13 12:01:03 +00:00
Paweł Dziepak	a0a13ceb46	filtering_reader: switch to flat mutation fragment streams	2017-12-13 12:01:03 +00:00
Paweł Dziepak	3bbb3b300d	filtering_reader: pass a const dht::decorated_key& to the callback All users of the filtering reader need only the decorated key of a partition, but currently the predicate is given a reference to streamed_mutations which are obsolete now.	2017-12-13 11:57:27 +00:00
Paweł Dziepak	f3901eb154	db: use make_restricted_flat_reader	2017-12-13 10:46:41 +00:00
Glauber Costa	1aabbc75ab	database: delete created SSTables if streaming writes fail We have had an issue recently where failed SSTable writes left the generated SSTables dangling in a potentially invalid state. If the write had, for instance, started and generated tmp TOCs but not finished, those files would be left for dead. We had fixed this in commit `b7e1575ad4`, but streaming memtables still have the same issue. Note that we can't fix this in the common function write_memtable_to_sstable because different flushers have different retry policies. Fixes #3062 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20171213011741.8156-1-glauber@scylladb.com>	2017-12-13 10:09:20 +02:00
Avi Kivity	d934ca55a7	Merge "SSTable resharding fixes" from Raphael "Didn't affect any release. Regression introduced in `301358e`. Fixes #3041" * 'resharding_fix_v4' of github.com:raphaelsc/scylla: tests: add sstable resharding test to test.py tests: fix sstable resharding test sstables: Fix resharding by not filtering out mutation that belongs to other shard db: introduce make_range_sstable_reader rename make_range_sstable_reader to make_local_shard_sstable_reader db: extract sstable reader creation from incremental_reader_selector db: reuse make_range_sstable_reader in make_sstable_reader	2017-12-07 16:42:48 +02:00
Raphael S. Carvalho	f1b65a115a	db: introduce make_range_sstable_reader introduce reader variant that will allow its caller to read a range in a given table without any filter applied. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:26 -02:00
Raphael S. Carvalho	d1b146baa6	rename make_range_sstable_reader to make_local_shard_sstable_reader Tomek says: "I think that the least surprising behavior for a function named like this is to read the sstables unfiltered (it just reads them), and the filtering should be indicated specially in the name or by accepting a parameter." Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 03:15:25 -02:00
Raphael S. Carvalho	3d725d6823	db: extract sstable reader creation from incremental_reader_selector step closer to divorcing incremental_selector from sstables Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 01:53:16 -02:00
Raphael S. Carvalho	ab82bacddd	db: reuse make_range_sstable_reader in make_sstable_reader Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-07 01:53:14 -02:00
Raphael S. Carvalho	1d0e6496ec	gc_clock: introduce operator<<(ostream&, gc_clock::time_point) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-12-06 19:52:32 -02:00
Paweł Dziepak	ce9a890940	incremental_reader_selector: do not use read_range_rows()	2017-12-05 14:53:14 +00:00
Paweł Dziepak	bccca90207	database: use read_row_flat() instead of read_row()	2017-12-05 14:52:57 +00:00
Botond Dénes	8731c1bc66	Flatten the implementation of combined_mutation_reader In fact flatten mutation_reader_merger and adjust combined_mutation_reader accordingly.	2017-12-04 07:57:43 +02:00
Botond Dénes	3f8110b5b6	Make combined_mutation_reader a flat_mutation_reader For now only the interface is converted. Behind the scenes the previous implementation remains; its output is simply converted by flat_mutation_reader_from_mutation_reader. The implementation will be converted in the following patches.	2017-12-04 07:57:43 +02:00
Tomasz Grabiec	fd7ab5fe99	database: Move operator<<() overloads to appropriate source files	2017-12-01 10:52:37 +01:00
Paweł Dziepak	32eb6437fd	memtable: make make_flush_reader() return flat_mutation_reader	2017-11-27 20:07:22 +01:00
Paweł Dziepak	11b32276e6	sstables: switch write_components() to flat_mutation_reader	2017-11-23 18:14:31 +00:00
Piotr Jastrzebski	6cd4b6b09c	Remove sstable_range_wrapping_reader The wrapper is no longer needed because read_range_rows returns ::mutation_reader instead of sstables::mutation_reader and the reader returned from it keeps the pointer to shared_sstable that was used to create the reader. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-11-15 10:40:02 +01:00
Paweł Dziepak	dca93bea23	db: convert make_streaming_reader() to flat_mutation_reader	2017-11-13 16:49:52 +00:00
Paweł Dziepak	37640f223b	db: drop single-range make_streaming_reader()	2017-11-13 16:49:52 +00:00
Glauber Costa	a6b2226562	dirty_memory_manager: block if we hit the real dirty limit Since we started accounting virtual dirty memory we no longer have a cap on real dirty memory. In most situations that is not needed, since real dirty will just be at most twice as much as virtual dirty (current flushing memtable plus new memtable). However, due to things like cache updates and component flushing we can end up having a lot of memtables that are virtually freed but not yet fully released, leading real dirty memory to explode and use all of the box's memory. This patch adds a cap on real dirty memory as well. Because of the hierarchical nature of region_group, if the parent blocks due to memory depletion, so will the child (virtual dirty region group). A next step is to add a controller that will increase the priority of the tasks involved in releasing real dirty memory if we get dangerously close to the threshold. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-11-08 16:21:44 -05:00
Avi Kivity	d6cd44a725	Revert "Merge 'Single key sstable reader optimization' from Botond" This reverts commit `5e9cd128ad`, reversing changes made to `1f4e6759a7`. Tomek found some serious issues.	2017-10-19 12:47:21 +03:00
Botond Dénes	dfe312ca3a	Add counters for the single-key reader optimization Add two counters, one to determine how many of the reads fall into the optimization, and a second one to determine its effectiveness. The first one is single_key_reader_optimization_hit_rate. It contains the rate of reads that the optimization applies to out of all the reads that go into the single_key_sstable_reader. The second one, single_key_reader_optimization_extra_read_proportion, is a histogram of the efficiency of the optimization. It contains the proportion of extra data-sources read. It's a number between 0 and 1, where 0 is the best case (only one data-source was read) and 1 is the worst case (all data-sources were read eventually). This is the same number that is used for the threshold option (see previous patch). Each of the histogram's buckets covers a chunk of 0.1 from the [0, 1] range. Note that single_key_parallel_scan_threshold effectively provides an upper bound for the proportion as the optimization is turned off as soon as it goes above that number. The counters are disabled if single_key_parallel_scan_threshold is set to 0, disabling the optimization entirely.	2017-10-18 17:24:03 +03:00
Botond Dénes	08502f2d48	Add single_key_parallel_scan_threshold option This option regulates when exactly the single-key optimization is considered ineffective and turned off. The threshold is the proportion of the extra data source candidates that can be read before the optimization is considered ineffective and disabled. The proportion is calculated as follows: (read_data_sources - 1) / (total_data_sources - 1) We subtract 1 from the read_data_sources and total_data_sources to effectively measure the rate of extra data sources we read. This makes sure that the proportion is meaningful even if e.g. we only have a total of 2 data-sources and we read only 1 (best case). Whenever this number goes above the threshold the optimization is disabled. The threshold is a number between 0 and 1; 0 forces the optimization off and 1 forces it on. Increase the threshold to favor throughput over latency for single-row reads, decrease the threshold to improve latency at the expense of throughput. If the threshold is > 0 (it's not force disabled) and the optimization is disabled due to a read crossing the threshold, we will issue "probing" reads (every 100th read) to determine if the optimization is worth re-enabling. Probing reads are allowed to run through the optimization path and if they go below the threshold the optimization is re-enabled.	2017-10-18 17:24:03 +03:00
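
The proportion described in commit 08502f2d48 can be illustrated with a small C++ sketch. This is not the code from the patch: the function names `extra_read_proportion` and `optimization_allowed` are hypothetical, and the handling of a single data source (returning 0) is an assumption, since the commit message does not say what happens when total_data_sources is 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ScyllaDB code): proportion of *extra* data
// sources read, i.e. (read_data_sources - 1) / (total_data_sources - 1).
inline double extra_read_proportion(std::size_t read_data_sources,
                                    std::size_t total_data_sources) {
    assert(read_data_sources >= 1 && read_data_sources <= total_data_sources);
    if (total_data_sources <= 1) {
        // Assumption: with a single source there is nothing "extra" to read.
        return 0.0;
    }
    return static_cast<double>(read_data_sources - 1) /
           static_cast<double>(total_data_sources - 1);
}

// Hypothetical helper: a threshold of 0 forces the optimization off;
// otherwise it stays on while the proportion does not exceed the threshold.
inline bool optimization_allowed(double proportion, double threshold) {
    if (threshold <= 0.0) {
        return false;
    }
    return proportion <= threshold;
}
```

With 2 data sources of which 1 is read, the proportion is 0 (best case); reading both gives 1 (worst case). A threshold of 1 can never be exceeded, which matches the message's statement that 1 forces the optimization on.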

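The backlog controllers in commits 72c673fcc3 and 4b44a22236 are described only in prose: a higher backlog translates into higher shares, and I/O shares range from 0 to 1000. A minimal C++ sketch of that idea follows, assuming a simple clamped linear mapping. The function name `shares_for_backlog`, the `max_backlog` parameter, and the linear shape are all assumptions made for illustration; the control curve used by the patchset is not given in the log above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch (not ScyllaDB code): map a backlog value to I/O
// shares with a linear ramp, clamped to [0, max_shares]. The commit log
// only states that shares grow with backlog and range from 0 to 1000.
inline double shares_for_backlog(double backlog, double max_backlog,
                                 double max_shares = 1000.0) {
    assert(max_shares >= 0.0);
    if (max_backlog <= 0.0) {
        // Assumption: with no meaningful scale, run at full shares.
        return max_shares;
    }
    // Clamp the backlog fraction to [0, 1] before scaling.
    double fraction = std::min(1.0, std::max(0.0, backlog / max_backlog));
    return fraction * max_shares;
}
```

Under these assumptions a backlog at half of `max_backlog` yields 500 shares, and any backlog at or beyond `max_backlog` saturates at 1000.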

966 Commits