scylladb

Author	SHA1	Message	Date
Amnon Heiman	064f5e1b63	row_cache: switch to the metrics layer registration This patch moves the row_cache metrics registration from collectd to the metric layer. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170321143812.785-3-amnon@scylladb.com>	2017-03-21 16:42:58 +02:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Glauber Costa	facb0aa6d9	row_cache: rewrite loop so that debug mode doesn't become a noop need_preempt() is always true in debug mode. Because of that, this loop will never be executed. Rewrite it as a do-while loop so we are sure that it is executed at least once - or exactly once in debug mode. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1485913079-1283-1-git-send-email-glauber@scylladb.com>	2017-02-01 10:02:13 +02:00
Glauber Costa	69dbb3e108	row_cache: yield if need_preempt(), even if there is quota left. The quota check is quite old at the moment, and dates back to a time in which the infrastructure in seastar threads was lacking a lot. It is a bad check since it will not take into consideration the size of the partition or the time it takes to merge them. A better check would at least take need_preempt() into account, so that we would respect the task quota. That check is now embedded into should_yield(), so there would no need to check anything else. Although should_yield() does the job, it is still currently quite expensive. And because we are in a seastar thread with a computationally intensive loop, it can hurt latency a lot. So as a temporary measure, let's at least check for need_preempt() - as it is hurting real users at the moment - and soon work on making should_yield() cheaper. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-01-26 22:10:54 -05:00
Glauber Costa	0e1f64b163	row_cache: add systemtap markers for the update process update is one of our biggest sources of performance issues as far as the cache is concerned. systemtap can be useful in helping tracking some of them down. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-01-26 21:56:32 -05:00
Tomasz Grabiec	d048eec254	row_cache: Fix stats handling for uncached wide partitions Report hitting wide partition dummy as a cache miss instead of a hit. Refs #2011 Message-Id: <1484302266-3828-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 09:58:04 +00:00
Tomasz Grabiec	87f15624f4	row_cache: Add counter for wide partition mispopulations Message-Id: <1484733250-14470-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 09:57:51 +00:00
Tomasz Grabiec	78844fa2e5	db: Use incremental selector in partition_presence_checker This reduces the number of sstables we need to check to only those whose token range overlaps with the key. Reduces cache update time. Especially effective with leveled compaction strategy. Refs #1943. Incremental selector works with an immutable sstable set, so cache updates need to be serialized. Otherwise we could mispopulate due to stale presence information. Presence checker interface was changed to accept decorated key in order to gain easy access to the token, which is required by the incremental selector.	2016-12-19 14:20:58 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Tomasz Grabiec	1b5f338c17	memtable: Track flushed memory in memtable object	2016-12-05 12:59:09 +01:00
Paweł Dziepak	f877be50b0	Merge "Keep wide partition cache entry longer than others" from Piotr "Cache entries for wide partitions are usually smaller than other entries and the cost of recreating them is higher so it makes sense to keep them longer than ordinary entries."	2016-11-15 20:44:52 +00:00
Paweł Dziepak	999dafbe57	row_cache: touch entries read during range queries Fixes #1847. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1479230809-27547-1-git-send-email-pdziepak@scylladb.com>	2016-11-15 18:54:11 +01:00
Piotr Jastrzebski	5ec668c9c6	Add separate LRU for wide partitions. Evict wide partitions only every 1000 normal partition evictions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-11-15 16:19:13 +01:00
Piotr Jastrzebski	9a41bfbf69	Add collectd metric for wide partition evictions. This will allow us to see how big is an amount of evictions of cached info about wide partitions. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-11-15 15:53:14 +01:00
Paweł Dziepak	a8308e2a8d	row_cache: dummy entry does not count as partition Since continuity flag introduction row cache contains a single dummy entry. cache_tracker knows nothing about it so that it doesn't appear in any of the metrics. However, cache destructor calls cache_tracker::on_erase() for every entry in the cache including the dummy one. This is incorrect since the tracker wasn't informed when the dummy entry was created. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>	2016-11-08 13:54:44 +01:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Paweł Dziepak	a7224ae46e	row_cache: avoid dereferencing invalid iterator Conditions in row_cache::do_find_or_create_entry() make it possible that std::prev(it) is going to be dereferenced even if it is a begin iterator. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-26 15:24:23 +01:00
Paweł Dziepak	654f651e0c	row_cache: set _first_element flag correctly If the continuity flag was set for the first element _first_element flag would not be cleared. This shouldn't cause any correctness problems but properly setting the flag allows to avoid some unnecessary key comparisons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-26 15:07:24 +01:00
Paweł Dziepak	567ff96f2a	row_cache: fix clearing continuity flag at eviction In original implementation the continuity flag indicated that cache has full information about the range between the current partition and the one following it, hence when evicting an entry the one preceding it had to have its continuity flag cleared. This was changed, however, and now the continuity flag tells whether the cache is continuous between the current element and the one before it. This means that eviction code needs to clear the flag for the entry directly following the evicted one. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-26 14:58:20 +01:00
Paweł Dziepak	5ff699e09f	row_cache: rework cache to use fast forwarding reader This uncomfortably large patch overhauls cache range reader so that it can take advantage of fast forwarding mutation readers. A significant change in the cache itself is that the continuity flag now is used to determine whether cache is contiguous between the previous entry and the current one. This allows for a significant simplification of the cache code and easier integration with reader fast forwarding. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	18acb0c0e6	row_cache: put cache entry flags in a struct Flags are easier to manage if they are in a single structure. Especially, default initialization and move constructors are simpler and less error prone. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f248e23db5	row_cache: add do_find_or_create_entry() to reduce code duplication Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Avi Kivity	9ac441d3b5	range: adjust split_after to allow split_point outside input range Make split_after() more generic by allowing split_point to be anywhere, not just within the input range. If the split_point is before, the entire range is returned; and if it is after, stdx::nullopt is returned. "before" and "after" are not well defined for wrap-around ranges, but we are phasing them out and soon there will not be wrapping_range::split_after() users. This is a prerequisite for converting partition_range and friends to nonwrapping_range. Message-Id: <1475765099-10657-1-git-send-email-avi@scylladb.com>	2016-10-06 17:54:44 +02:00
Duarte Nunes	f864bca773	row_cache: Deal with side-effects in allocating_section In row_cache::make_reader, we update statistics inside an allocating_section, which retries the supplied function until it can satisfy all allocations by way of reserving LSA memory up front. Since those updates are interleaved with allocations, retries can lead to miscounts. This patch fixes this by updating statistics after all allocations. Fixes #1659 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>	2016-09-14 10:46:25 +01:00
Glauber Costa	dc5d8e33af	Revert "row_cache: update sstable histograms on cache hits" This reverts commit `1726b1d0cc`. Reverting this patch turns our SSTable access counter into a miss counter only. The estimated histogram always starts its first bucket at 1, so by marking cache accesses we will be wrongly feeding "1" into the buckets. Notice that this is not yet ideal: nodetool is supposed to show a histogram of all reads, and by doing this we are changing its meaning slightly. Workloads that serve mostly from cache will be distorted towards their misses. The real solution is to use a different histogram, but we will need to enforce a newer version of nodetool for that: the current issue is that nodetool expects an EstimatedHistogram in a specific format in the other side. Conflicts: row_cache.hh Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com> Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-01 18:07:31 +03:00
Duarte Nunes	9269256246	row_cache: Accept a trace_state_ptr This patch changes the row_cache so it accepts a trace_state_ptr, which it is responsible of flowing to the underlying mutation_reader if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:00:55 +02:00
Glauber Costa	1726b1d0cc	row_cache: update sstable histograms on cache hits If we have a cache hit, we still need to update our sstable histogram - noting that we have touched 0 SSTables. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:14:22 -04:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other foot. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Avi Kivity	fbc3377ad4	row_cache: add a counter for a miss that did not result in an insertion Such misses are due to concurrent access to the same key. Add a counter to track this as it results in unnecessary I/O being performed. See #1534. Message-Id: <1470139871-14693-1-git-send-email-avi@scylladb.com>	2016-08-02 14:14:27 +02:00
Piotr Jastrzebski	ca9c29e296	Cache information about partition being wide Once we encounter a wide partition store information about this in cache entry and don't try to read it all and cache next time it's requested. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> [Paweł: rebased, moved large partition reading logic to cache_entry::read_wide()] Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 18:39:22 +01:00
Paweł Dziepak	ee1f1ee1c4	row_cache: fix creating readers for large partitions There were cases of use-after-free introduced by the code responsible for creating mutation_readers for large partitions – the lifetimes of partition ranges and the readers themselves weren't sufficiently extended. Another problem, was that if the partition was no longer present in the sstable the reader would return EOS which was then returned by range_populating_reader itself causing its users to incorrectly interpret that as an end of stream. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 17:02:17 +01:00
Piotr Jastrzebski	fdfd1af694	Use continuity flag correctly with concurrent invalidations Between reading cache entry and actually using it invalidations can happen so we have to check if no flag was cleared if it was we need to read the entry again. Fixes #1464. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>	2016-07-28 11:55:18 +01:00
Piotr Jastrzebski	37a7d49676	Add collectd counter for uncached wide partitions. Keep track of every read of wide partition that's not cached. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:47:49 +02:00
Piotr Jastrzebski	636a4acfd0	Add flag to configure max size of a cached partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:47:20 +02:00
Piotr Jastrzebski	98c12dc2e2	Try to read whole streamed_mutation up to limit If limit is exceeded then return the streamed_mutation and don't cache it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:35:35 +02:00
Paweł Dziepak	81e4952c78	row_cache: fix marking last entry as continuous Range queries need to take special care when transitioning between ranges that are read from sstables and ranges that are already in the cache. Original code in such case just started a secondary reader and told it to unconditionally mark the last entry as continuous (primary reader has already returned an element that immediately follows the range that is going to be read from sstables). However, that information may get stale. For instance, by the time secondary reader finishes reading its range the element immediately following it may get evicted from the cache thus causing continuity flag to be incorrectly set. The solution is to ensure that the element immediately after the range read from sstables is still in the cache. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1468586893-15266-1-git-send-email-pdziepak@scylladb.com>	2016-07-15 15:15:02 +02:00
Avi Kivity	9a8788019d	row_cache: fix visitor for boost <= 1.55 Older boosts can't return a future from a visitor (likely lacking support for move-only objects). Supply a dirty hackaround. Message-Id: <1467822548-25940-1-git-send-email-avi@scylladb.com>	2016-07-06 19:55:51 +03:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We need to somehow then translate region -> memtable -> column family. The easiest way to convert from region to memtable, is having memtable inherit from region. Despite the fact that this requires multiple inheritance, which always raises a flag a bit, the other class we inherit from is enable_shared_from_this, which has a very simple and well defined interface. So I think it is worthy for us to do it. Once we have the memtable, grabbing the column family is easy provided we have a database object. We can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Piotr Jastrzebski	59d0d9e666	Fix cache_tracker::clear Make sure that artificial entries for all column families are set to non continuous. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f9e517fe40482c05f6c388faab7d6b9eca6b159e.1467103548.git.piotr@scylladb.com>	2016-06-28 11:18:23 +02:00
Piotr Jastrzebski	27575a0528	Fix previous_entry_is_continuous Rename it to check_previous_entry. Remove unnecessary test. Make sure ring_position always has working relation_to_keys method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>	2016-06-28 10:27:08 +02:00
Piotr Jastrzebski	68e5a199e9	Clean continuous flag of cache entry preceding invalidated decorated key even when it's not found. Add test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>	2016-06-28 10:26:02 +02:00
Piotr Jastrzebski	cd9f3f94c4	Fix row_cache::update Clear continuous flag on the last cache entry with key smaller than a partition being dropped from memtable on flush and not saved in cache. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>	2016-06-28 10:25:38 +02:00
Piotr Jastrzebski	eb959a8b81	Change check for artificial entry in cache_entry destructor from _key.has_key() to _lru_link.is_linked() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>	2016-06-28 10:24:29 +02:00
Piotr Jastrzebski	9b011bff18	row_cache: add contiguity flag to cache entry to reduce disk IO during scans Add contiguity flag to cache entry and set it in scanning reader. Partitions fetched during scanning are continuous and we know there's nothing between them. Clear contiguity flag on cache entries when the succeeding entry is removed. Use continuous flag in range queries. Don't go to disk if we know that there's nothing between two entries we have in cache. We know that when continuous flag of the first one is set to true. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>	2016-06-23 09:43:15 +03:00
Paweł Dziepak	b2c37429e7	row_cache: drop slicing_reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	f605499aec	row_cache: fully support streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	2ab1a73efa	memtable: rename partition_entry to memtable_entry partition_entry is going to be a more general object used by both cache and memtable entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	737eb73499	mutation_reader: make readers return streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00


119 Commits