scylladb

Author	SHA1	Message	Date
Duarte Nunes	f864bca773	row_cache: Deal with side-effects in allocating_section In row_cache::make_reader, we update statistics inside an allocating_section, which retries the supplied function until it can satisfy all allocations by way of reserving LSA memory up front. Since those updates are interleaved with allocations, retries can lead to miscounts. This patch fixes this by updating statistics after all allocations. Fixes #1659 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>	2016-09-14 10:46:25 +01:00
Glauber Costa	dc5d8e33af	Revert "row_cache: update sstable histograms on cache hits" This reverts commit `1726b1d0cc`. Reverting this patch turns our SSTable access counter into a miss counter only. The estimated histogram always starts its first bucket at 1, so by marking cache accesses we will be wrongly feeding "1" into the buckets. Notice that this is not yet ideal: nodetool is supposed to show a histogram of all reads, and by doing this we are changing its meaning slightly. Workloads that serve mostly from cache will be distorted towards their misses. The real solution is to use a different histogram, but we will need to enforce a newer version of nodetool for that: the current issue is that nodetool expects an EstimatedHistogram in a specific format on the other side. Conflicts: row_cache.hh Message-Id: <a599fa9e949766e7c9697450ae34fc28e881e90a.1472742276.git.glauber@scylladb.com> Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-09-01 18:07:31 +03:00
Duarte Nunes	9269256246	row_cache: Accept a trace_state_ptr This patch changes the row_cache so it accepts a trace_state_ptr, which it is responsible for passing to the underlying mutation_reader if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:00:55 +02:00
Glauber Costa	1726b1d0cc	row_cache: update sstable histograms on cache hits If we have a cache hit, we still need to update our sstable histogram, noting that we have touched 0 SSTables. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:14:22 -04:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Avi Kivity	fbc3377ad4	row_cache: add a counter for a miss that did not result in an insertion Such misses are due to concurrent access to the same key. Add a counter to track this as it results in unnecessary I/O being performed. See #1534. Message-Id: <1470139871-14693-1-git-send-email-avi@scylladb.com>	2016-08-02 14:14:27 +02:00
Piotr Jastrzebski	ca9c29e296	Cache information about partition being wide Once we encounter a wide partition, store this information in the cache entry and don't try to read and cache the whole partition the next time it's requested. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> [Paweł: rebased, moved large partition reading logic to cache_entry::read_wide()] Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 18:39:22 +01:00
Paweł Dziepak	ee1f1ee1c4	row_cache: fix creating readers for large partitions There were cases of use-after-free introduced by the code responsible for creating mutation_readers for large partitions: the lifetimes of partition ranges and the readers themselves weren't sufficiently extended. Another problem was that if the partition was no longer present in the sstable, the reader would return EOS, which was then returned by range_populating_reader itself, causing its users to incorrectly interpret that as an end of stream. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 17:02:17 +01:00
Piotr Jastrzebski	fdfd1af694	Use continuity flag correctly with concurrent invalidations Invalidations can happen between reading a cache entry and actually using it, so we have to check whether the flag was cleared; if it was, we need to read the entry again. Fixes #1464. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <7856b0ded45e42774ccd6f402b5ee42175bd73cf.1469701026.git.piotr@scylladb.com>	2016-07-28 11:55:18 +01:00
Piotr Jastrzebski	37a7d49676	Add collectd counter for uncached wide partitions. Keep track of every read of wide partition that's not cached. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:47:49 +02:00
Piotr Jastrzebski	636a4acfd0	Add flag to configure max size of a cached partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:47:20 +02:00
Piotr Jastrzebski	98c12dc2e2	Try to read whole streamed_mutation up to limit If limit is exceeded then return the streamed_mutation and don't cache it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-07-21 09:35:35 +02:00
Paweł Dziepak	81e4952c78	row_cache: fix marking last entry as continuous Range queries need to take special care when transitioning between ranges that are read from sstables and ranges that are already in the cache. The original code in such a case just started a secondary reader and told it to unconditionally mark the last entry as continuous (the primary reader has already returned an element that immediately follows the range that is going to be read from sstables). However, that information may get stale. For instance, by the time the secondary reader finishes reading its range, the element immediately following it may get evicted from the cache, thus causing the continuity flag to be incorrectly set. The solution is to ensure that the element immediately after the range read from sstables is still in the cache. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1468586893-15266-1-git-send-email-pdziepak@scylladb.com>	2016-07-15 15:15:02 +02:00
Avi Kivity	9a8788019d	row_cache: fix visitor for boost <= 1.55 Older boosts can't return a future from a visitor (likely lacking support for move-only objects). Supply a dirty hackaround. Message-Id: <1467822548-25940-1-git-send-email-avi@scylladb.com>	2016-07-06 19:55:51 +03:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We then need to somehow translate region -> memtable -> column family. The easiest way to convert from region to memtable is having memtable inherit from region. Despite the fact that this requires multiple inheritance, which always raises a flag a bit, the other class we inherit from is enable_shared_from_this, which has a very simple and well-defined interface. So I think it is worth doing. Once we have the memtable, grabbing the column family is easy provided we have a database object. We can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Piotr Jastrzebski	59d0d9e666	Fix cache_tracker::clear Make sure that artificial entries for all column families are set to non continuous. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f9e517fe40482c05f6c388faab7d6b9eca6b159e.1467103548.git.piotr@scylladb.com>	2016-06-28 11:18:23 +02:00
Piotr Jastrzebski	27575a0528	Fix previous_entry_is_continuous Rename it to check_previous_entry. Remove unnecessary test. Make sure ring_position always has working relation_to_keys method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>	2016-06-28 10:27:08 +02:00
Piotr Jastrzebski	68e5a199e9	Clean continuous flag of cache entry preceding invalidated decorated key even when it's not found. Add test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c7b8f4df37256363bf304e0396f84b5f37921b81.1467059472.git.piotr@scylladb.com>	2016-06-28 10:26:02 +02:00
Piotr Jastrzebski	cd9f3f94c4	Fix row_cache::update Clear continuous flag on the last cache entry with key smaller than a partition being dropped from memtable on flush and not saved in cache. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>	2016-06-28 10:25:38 +02:00
Piotr Jastrzebski	eb959a8b81	Change check for artificial entry in cache_entry destructor from _key.has_key() to _lru_link.is_linked() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>	2016-06-28 10:24:29 +02:00
Piotr Jastrzebski	9b011bff18	row_cache: add contiguity flag to cache entry to reduce disk IO during scans Add contiguity flag to cache entry and set it in scanning reader. Partitions fetched during scanning are continuous and we know there's nothing between them. Clear contiguity flag on cache entries when the succeeding entry is removed. Use continuous flag in range queries. Don't go to disk if we know that there's nothing between two entries we have in cache. We know that when continuous flag of the first one is set to true. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>	2016-06-23 09:43:15 +03:00
Paweł Dziepak	b2c37429e7	row_cache: drop slicing_reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	f605499aec	row_cache: fully support streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	2ab1a73efa	memtable: rename partition_entry to memtable_entry partition_entry is going to be a more general object used by both cache and memtable entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	737eb73499	mutation_reader: make readers return streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	5b45d46f82	row_cache: simplify slicing_reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	dde87e0b0e	row_cache: drop schema upgrade for new entries in update() Commit `daad2eb` "row_cache: fix memory leak in case of schema upgrade failure" has fixed a memory leak caused by failed upgrade_entry(). However, in case of upgrade failure the memtable_entry used to create the new cache entry was left in some invalid state. If the operation was retried, the cache would attempt again to apply that memtable_entry, which would now be in an invalid state. The solution is either to ignore upgrade_entry() exceptions or not to call it at all and let the cache entry be upgraded on demand. This patch implements the latter. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:43:01 +02:00
Paweł Dziepak	daad2ebf81	row_cache: fix memory leak in case of schema upgrade failure When update() causes a new entry to be inserted to the cache the procedure is as follows: 1. allocate and construct new entry 2. upgrade entry schema 3. add entry to lru list and cache tree Step 2 may fail and at this point the pointer to the entry is neither protected by RAII nor added in any of the cache containers. The solution is to swap steps 2 and 3 so that even if the upgrade fails the entry is already owned by the cache and won't leak. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:12:01 +02:00
Avi Kivity	465c0a4ead	Merge "Make stronger guarantees in row_cache's clear/invalidate" from Tomasz "Correctness of current uses of clear() and invalidate() relies on the fact that the cache is not populated using readers created before invalidation. Sstables are first modified and then the cache is invalidated. This is not guaranteed by the current implementation though. As pointed out by Avi, a populating read may race with the call to clear(). If that read started before clear() and completed after it, the cache may be populated with data which does not correspond to the new sstable set. To provide such a guarantee, invalidate() variants were adjusted to synchronize using _populate_phaser, as row_cache::update() does. Fixes #1291."	2016-06-13 09:55:29 +03:00
Tomasz Grabiec	d5a2d7a88d	row_cache: Add eviction and removal counters Fixes #1273. Message-Id: <1465315433-8473-1-git-send-email-tgrabiec@scylladb.com>	2016-06-08 16:08:32 -04:00
Tomasz Grabiec	170a214628	row_cache: Make stronger guarantees in clear/invalidate Correctness of current uses of clear() and invalidate() relies on the fact that the cache is not populated using readers created before invalidation. Sstables are first modified and then the cache is invalidated. This is not guaranteed by the current implementation though. As pointed out by Avi, a populating read may race with the call to clear(). If that read started before clear() and completed after it, the cache may be populated with data which does not correspond to the new sstable set. To provide such a guarantee, invalidate() variants were adjusted to synchronize using _populate_phaser, as row_cache::update() does.	2016-06-06 13:21:06 +02:00
Tomasz Grabiec	2ab18dcd2d	row_cache: Implement clear() using invalidate() Reduces code duplication.	2016-06-03 13:34:40 +02:00
Avi Kivity	9637c2232c	Merge "Move the JMX timer polling logic to Scylla" from Amnon	2016-05-24 13:07:52 +03:00
Amnon Heiman	468bcfbf1f	row_cache: Change counter to timed_rate_moving_average_and_histogram As part of moving the derived statistic in to scylla, this replaces the counter in the row_cache stats to timed_rate_moving_average_and_histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-05-17 11:53:15 +03:00
Piotr Jastrzebski	dcba6f5c45	Pass clustering_row_ranges to mutation readers. This will allow readers to reduce the amount of data read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 14:36:57 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	3997421b2c	row_cache: Let the cleanup guard do invalidation of unmerged partitions	2016-02-26 16:57:31 +01:00
Tomasz Grabiec	aa15268249	row_cache: Delete the entry even if invalidation failed Otherwise we will leak it, and region destructor will fail: row_cache_test: utils/logalloc.cc:1211: virtual logalloc::region_impl::~region_impl(): Assertion `seg->is_empty()' failed. Fixes regression in row_cache_test.	2016-02-26 16:57:31 +01:00
Tomasz Grabiec	be24816c8a	row_cache: Clear partitions with region locked Since invalidate() may allocate, we need to take the region lock to keep m.partitions references valid around whole clear_and_dispose(), which relies on that.	2016-02-26 16:57:31 +01:00
Avi Kivity	fbe6961827	row_cache: run partition-touching operations of row_cache::update in a linearization context To avoid scattered keys (and values, though those are already protected) from being accessed, run the update procedure in a managed_bytes linearization context. Fixes #807.	2016-02-16 14:37:44 +02:00
Avi Kivity	ad58663c96	row_cache: reindent	2016-02-07 13:25:29 +02:00
Paweł Dziepak	490201fd1c	row_cache: protect against stale entries row_cache::update() does not explicitly invalidate the entries it failed to update in case of a failure. This could lead to inconsistency between row cache and sstables. In practice that's not a problem because before row_cache::update() fails it will cause all entries in the cache to be invalidated during memory reclaim, but it's better to be safe and explicitly remove entries that should have been updated but could not be. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>	2016-01-26 20:34:41 +01:00
Glauber Costa	f6cfb04d61	add a priority class to mutation readers SSTables already have a priority argument wired to their read path. However, most of our reads do not call that interface directly, but employ the services of a mutation reader instead. Some of those readers will be used to read through a mutation_source, and those have to be patched as well. Right now, whenever we need to pass a class, we pass Seastar's default priority class. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Tomasz Grabiec	d332fcaefc	row_cache: Restore indentation	2016-01-15 15:33:17 +01:00
Tomasz Grabiec	6b059fd828	row_cache: Guard against wrap-around range in make_reader()	2016-01-13 17:50:55 +01:00
Tomasz Grabiec	7fb0bc4e15	row_cache: Take the reclaim lock in invalidate() It's needed to keep the iterators valid in case eviction is triggered somewhere in between. It probably isn't because destructors should not allocate, but better be safe.	2016-01-13 17:50:55 +01:00
Tomasz Grabiec	50cc0c162e	row_cache: Make invalidate() handle wrap-around ranges Currently for wrap around the "begin" iterator would not meet with the "end" iterator, invoking undefined behavior in erase_and_dispose() which results in a crash. Fixes #785	2016-01-13 17:50:55 +01:00
Tomasz Grabiec	d81a46d7b5	column_family: Add schema setters There is one current schema for given column_family. Entries in memtables and cache can be at any of the previous schemas, but they're always upgraded to current schema on access.	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00

95 Commits
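
The retry hazard described in commit f864bca773 (statistics updated inside a function that an allocating section may rerun) can be sketched roughly as follows. This is an illustration only, not ScyllaDB code: `retrying_section`, `stats`, `read_buggy` and `read_fixed` are hypothetical names, and a `std::bad_alloc` thrown by the callback stands in for an allocation that the current reserve could not satisfy.

```cpp
#include <new>  // std::bad_alloc

// Hypothetical stand-in for an allocating section: it reruns `fn` from the
// start whenever `fn` signals (via std::bad_alloc) that the reserve was too
// small, growing the reserve before each retry.
struct retrying_section {
    int reserve = 0;  // units of memory reserved up front

    template <typename Func>
    auto operator()(Func&& fn) {
        for (;;) {
            try {
                return fn(reserve);
            } catch (const std::bad_alloc&) {
                ++reserve;  // reserve more, then retry the whole function
            }
        }
    }
};

// Hypothetical statistics block with a single counter.
struct stats {
    int hits = 0;
};

// Problematic shape: the counter is bumped inside the retried function, so
// every retry counts the same logical read again.
int read_buggy(stats& s, int needed) {
    retrying_section section;
    return section([&](int reserve) {
        ++s.hits;  // side effect repeated on each retry
        if (reserve < needed) {
            throw std::bad_alloc();
        }
        return reserve;
    });
}

// Shape after the fix: only allocation-dependent work runs inside the
// section; the counter is updated once, after the section has succeeded.
int read_fixed(stats& s, int needed) {
    retrying_section section;
    int result = section([&](int reserve) {
        if (reserve < needed) {
            throw std::bad_alloc();
        }
        return reserve;
    });
    ++s.hits;  // exactly one update per logical read
    return result;
}
```

In this sketch, a read that needs two retries increments `hits` three times in `read_buggy` but once in `read_fixed`, which is the kind of miscount the commit message describes.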