scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Tomasz Grabiec	2cc27f72ca	memtable: Accept all mutation_source parameters	2017-02-23 18:23:52 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Tomasz Grabiec	527ff6aa40	db: Clear memtable after flush when cache is disabled So that memory is released gradually (impacting latency less) and sooner than when memtable is destroyed. Active readers may keep the memtable alive for unbounded amount of time. Refs #1879	2016-12-05 12:59:09 +01:00
Tomasz Grabiec	1bba51319e	memtable: Maintain virtual dirty on clear() When memtable is flushing, it subtracts _flushed_memory from group's size to gradually allow more writes. Ideally _flushed_memory would be equal to region's size when flush ends, so the group's size would reach zero. When the memtable and its region are gone the group size should remain the same as after the flush. This is ensured by adding back _flushed_memory to group's size right before the region is removed from the group. Calling clear() before region is removed from the group breaks the accounting because it will shrink the region, but will not affect the amount of memory subtracted due to _flushed_memory. So group's size would decrease more than we want (twice the region's size). The fix is to change clear() so that it reverts _flushed_memory by the amount by which the region size is reduced. This will keep the group's size constant as long as _flushed_memory > 0.	2016-12-05 12:59:09 +01:00
Tomasz Grabiec	1b5f338c17	memtable: Track flushed memory in memtable object	2016-12-05 12:59:09 +01:00
Tomasz Grabiec	c3768fe4de	memtable: Pass dirty_memory_manager& to memtable constructor The implementation assumes that memtable's region group is owned by dirty_memory_manager, and tries to obtain a reference to it like this: boost::intrusive::get_parent_from_member(_region.group(), &dirty_memory_manager::_region_group)); This is undefined behavior when the region's group does not come from dirty manager. It's safer to be explicit about this dependency by taking a reference to dirty_memory_manager in the constructor.	2016-12-05 12:59:09 +01:00
Glauber Costa	0ca8c3f162	database: keep a pointer to the memtable list in a memtable We currently pass a region group to the memtable, but after so many recent changes, that is a bit too low level. This patch changes that so we pass a memtable list instead. Doing that also has a couple of advantages. Mainly, during flush we must get from a memtable to a memtable_list. Currently we do that by going from the memtable to a column family through the schema, and from there to the memtable_list. That, however, involves calling virtual functions in a derived class, because a single column family could have both streaming and normal memtables. If we pass a memtable_list to the memtable, we can keep a pointer, and when needed get the memtable_list directly. Not only does that get rid of the inheritance for aesthetic reasons, but that inheritance is not even correct anymore. Since the introduction of the big streaming memtables, we now have a plethora of lists per column family and this traversal is totally wrong. We haven't noticed before because we were flushing the memtables based on their individual sizes, but it has been wrong all along for edge cases in which we would have to resort to size-based flush. This could be the case, for instance, with various plan_ids in flight at the same time. At this point, there is no more reason to keep the derived classes for the dirty_memory_manager. I'm only keeping them around to reduce clutter, although they are useful for the specialized constructors and to communicate to the reader exactly what they are. But those can be removed in a follow up patch if we want. The old memtable constructor signature is kept around for the benefit of two tests in memtable_tests which have their own flush logic. In the future we could do something like we do for the SSTable tests, and have a proxy class that is friends with the memtable class. That too, is left for the future.
Fixes #1870 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <811ec9e8e123dc5fc26eadbda82b0bae906657a9.1479743266.git.glauber@scylladb.com>	2016-11-21 18:18:27 +02:00
Paweł Dziepak	ef57b9a26f	rename memory_usage() to external_memory_usage() where applicable Renaming the function to external_memory_usage() makes it clear that sizeof(T) is not included, something that was a source of confusion in the past. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-11-18 11:25:36 +00:00
Glauber Costa	0b337dab14	memtable: add a method to expose the region_group That is technically not needed because a memtable inherits from group. So whenever we have a memtable, we can use its group() method to obtain a group for it, and then from there go to the region_group. However, region() is a const method in the memtable, so we have to play tricks with the const_cast, or remove the constness from the region. An alternative to that, which I prefer, is to expose a method for the region_group directly from the memtable object that does the right thing and bypasses all that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-11-16 21:20:24 -05:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Glauber Costa	7146776d7c	fix sstable tests by not using the flush_reader if no region_group The latest virtual dirty patches broke the SSTable tests. The reason for this is that those tests will flush synthetic memtables that do not have a region_group attached to it. Normally in cases like this we would just give the flush_reader an empty region group. However, the memtable class constructor takes a region_group pointer and that can be null according to the interface. So we must conditionally test it. If there isn't a region_group involved, the virtual dirty accounting should be disabled: after all, we won't even have the baseline memory to begin with. One of the approaches to fix this could be to just provide null accounter classes to be used as a surrogate for the accounting classes in this case. However, since this is mostly used for tests, a much simpler way is to just revert back to the scanning reader in that case. The scanning reader is similar enough to the flush_reader, except that it can handle partial ranges, slices, and delegate accesses to an sstable post-flush. We don't need any of that, but as argued above, there is no need to remove it either. Signed-off-by: Glauber Costa <glommer@scylladb.com> Message-Id: <1475667271-60806-1-git-send-email-glommer@scylladb.com>	2016-10-05 12:44:21 +01:00
Glauber Costa	f89a67c75c	database: allow virtual dirty memory management Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Glauber Costa	eee15578fb	memtables: split scanning reader in two The code that is common will live in its own reader, the iterator_reader. All friendly private access to memtable attributes and methods happen through the iterator reader. After this patch, we are now left with the scanning_reader - same as always, but now implemented on top of the iterator_reader, and a flush_reader, which will be used by SSTable flushes only. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Glauber Costa	16886eeb96	sstables: use special reader for writing a memtable Right now the special reader doesn't do much, but the idea is that we will soon replace it with a reader that specializes in flush, and is in turn able to provide read-side on-flush functionality like virtual dirty. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We need to somehow then translate region -> memtable -> column family. The easiest way to convert from region to memtable is having memtable inherit from region. Despite the fact that this requires multiple inheritance, which always raises a flag a bit, the other class we inherit from is enable_shared_from_this, which has a very simple and well defined interface. So I think it is worthy for us to do it. Once we have the memtable, grabbing the column family is easy provided we have a database object. We can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Paweł Dziepak	6871bd5fa0	memtable: fully support streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00
Paweł Dziepak	2ab1a73efa	memtable: rename partition_entry to memtable_entry partition_entry is going to be a more general object used by both cache and memtable entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	80ab41a715	memtable reader: also include a priority class There are situations when a memtable is already flushed but the memtable reader will continue to be in place, relaying reads to the underlying table. For that reason, the "memtables don't need a priority class" argument gets obviously broken. We need to pass a priority class for its reader as well. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-02-24 18:00:34 -05:00
Tomasz Grabiec	d81a46d7b5	column_family: Add schema setters There is one current schema for given column_family. Entries in memtables and cache can be at any of the previous schemas, but they're always upgraded to current schema on access.	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	036974e19b	Make mutation interfaces support multiple versions Schema is tracked in memtable and cache per-entry. Entries are upgraded lazily on access. Incoming mutations are upgraded to table's current schema on given shard. Mutating nodes need to keep schema_ptr alive in case schema version is requested by target node.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	8a05b61d68	memtable: Read under _read_section	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	5184381a0b	memtable: Deconstify memtable in readers We want to upgrade entries on read and for that we need mutating permission.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	32ac2ccc4a	memtable: Introduce apply(memtable&)	2015-11-29 16:25:21 +01:00
Avi Kivity	a40a62d840	memtable: use allocating_section to guard allocations Without this, an allocation can fail, and we may not be able to reclaim memory.	2015-11-16 10:56:06 +02:00
Paweł Dziepak	b0edaa5bb7	memtable: add as_key_source() Needed only for tests. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-20 20:27:53 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Amnon Heiman	c102031fe2	memtable: Expose the region object This adds a const getter for the region object Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-09-10 00:20:26 +03:00
Tomasz Grabiec	a0c180ef49	memtable: Fix flush in the middle of scanning bug Fixes #309. When scanning memtable readers detect it was flushed, which means that it started to be moved to cache, they fall back to reading from memtable's sstable. Eventually what we should do is to combine memtable and cache contents so that as long as data is not evicted we won't do IO. We do not support scanning in cache yet though, so there is no point in doing this now, and it is not trivial.	2015-09-09 10:17:35 +02:00
Tomasz Grabiec	d20fae96a2	lsa: Make reclaimer run synchronously with allocations The goal is to make allocation less likely to fail. With async reclaimer there is an implicit bound on the amount of memory that can be allocated between deferring points. This bound is difficult to enforce though. Sync reclaimer lifts this limitation off. Also, allocations which could not be satisfied before because of fragmentation now will have higher chances of succeeding, although depending on how much memory is fragmented, that could involve evicting a lot of segments from cache, so we should still avoid them. Downside of sync reclaiming is that now references into regions may be invalidated not only across deferring points but at any allocation site. compaction_lock can be used to pin data, preferably just temporarily.	2015-08-31 21:50:18 +02:00
Tomasz Grabiec	ff8c81b25f	memtable: Encapsulate unsafe accessors	2015-08-31 21:50:17 +02:00
Tomasz Grabiec	d9ce307c6a	memtable: Add non-const partition_entry::key() variant Helps moving from memtable to cache.	2015-08-31 14:54:26 +03:00
Avi Kivity	1579f86503	memtable: keep the lsa region alive while partitions are being destroyed Or we get a use-after-free. Reported by Pekka.	2015-08-20 15:32:30 +03:00
Avi Kivity	c175025bb6	db: place all memtables into a single region_group We can use this to track the amount of unevictable memory in the system.	2015-08-19 19:36:41 +03:00
Tomasz Grabiec	3b92ba2857	db: Add memtable flush logging	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	cda31eccf7	db: Use LSA to allocate data inside memtable	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	fe4c75dee6	memtable: Remove unused find_partition()	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	1046ee6e80	memtable: Remove all_partitions() Preferred way to access the memtable is via reader.	2015-08-06 14:05:16 +02:00
Avi Kivity	182d5ab798	memtable: fix memory leak Since memtable::partitions is now an intrusive_set, it must be cleared explicitly, or memory is leaked.	2015-07-26 20:01:50 +03:00
Tomasz Grabiec	9c4956c5dc	memtable: Use boost::intrusive_set<> to store partition entries So that we can use heterogeneous comparators. For range queries we will need to compare keys with ring_position.	2015-07-22 13:14:33 +02:00
Tomasz Grabiec	da937897cf	memtable: Introduce as_data_source()	2015-07-09 19:46:29 +02:00
Tomasz Grabiec	51cae834e3	db: Put all sstables behind single reader This change abstracts reading from on-disk data sources behind a single reader which is then composed with memtable readers. This change also abstracts all data sources behind a single reader obtained via column_family::make_reader(). That reader is then used by algorithms like column_family::for_all_partitions() or column_family::query(). Having those abstractions will make it easier to add row cache, because it will be encapsulated in a single place.	2015-06-18 16:33:33 +02:00
Tomasz Grabiec	bc468f9a0e	memtable: Make memtable inherit from enable_lw_shared_from_this	2015-06-18 15:48:21 +02:00
Tomasz Grabiec	7f1ff0401e	db: Move mutation_reader definition to separate header	2015-06-18 15:47:40 +02:00
Nadav Har'El	78a8ac8470	Make mutation_reader usable outside database.cc The "mutation_reader" defined in database.cc is a convenient mechanism for iterating over mutations. It can be useful for more than just database.cc (I want to use it in the compaction code), so this patch moves the type's definition to mutation.hh, and the make_memtable_reader() function to memtable::make_reader() (in memtable.hh). Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2015-06-16 14:03:34 +02:00
Calle Wilund	293dbf66e3	Forward and use replay_position when applying mutation * Forward commitlog replay_position to column_family.memtable, updating highest RP if needed * When flushing memtable, signal back to commitlog that RP has been dealt with to potentially remove finished segment(s) Note: since memtable flushing right now is _not_ explicitly ordered, this does not actually work, since we need to guarantee ordering with regards to RP. I.e. if we flush N blocks, we must guarantee that: a.) We report "flushed RP" in RP order b.) For a given RP1, all RP* lower than RP1 must also have been flushed. (The latter means that it is fine to say, flush X tables at the same time, as long as we report a single RP that is the highest, and no lower RP:s exist in non-flushed tables) I am however letting someone else deal with ensuring MT->sstable flush order. Signed-off-by: Calle Wilund <calle@cloudius-systems.com>	2015-06-03 12:38:13 +03:00
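
The accounting rule described in commit 1bba51319e ("memtable: Maintain virtual dirty on clear()") can be sketched with a toy model. This is only an illustration under stated assumptions, not ScyllaDB's code: toy_group, toy_memtable, account_flushed() and demo_group_size_after_clear() are hypothetical names, and sizes are plain signed integers rather than LSA regions. The revert_flushed flag switches between the behavior the commit describes as broken and the fix.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a region group: only tracks a total size.
struct toy_group {
    std::int64_t size = 0;
};

// Hypothetical stand-in for a memtable with its region and _flushed_memory.
class toy_memtable {
    toy_group& _group;
    std::int64_t _region_size = 0;
    std::int64_t _flushed_memory = 0;
public:
    explicit toy_memtable(toy_group& g) : _group(g) {}

    // A write grows the region and therefore the group.
    void write(std::int64_t bytes) {
        _region_size += bytes;
        _group.size += bytes;
    }

    // During flush, written-out bytes are subtracted from the group's size
    // so that more writes can be admitted gradually.
    void account_flushed(std::int64_t bytes) {
        _flushed_memory += bytes;
        _group.size -= bytes;
    }

    // clear() shrinks the region. With revert_flushed == true it also gives
    // back _flushed_memory by the amount the region shrank (the fix).
    void clear(bool revert_flushed) {
        std::int64_t freed = _region_size;
        _region_size = 0;
        _group.size -= freed;
        if (revert_flushed) {
            std::int64_t reverted = std::min(freed, _flushed_memory);
            _flushed_memory -= reverted;
            _group.size += reverted;
        }
    }
};

// Group size right after clear(), with 100 bytes owned by other regions
// and a fully flushed 40-byte memtable.
std::int64_t demo_group_size_after_clear(bool revert_flushed) {
    toy_group g;
    g.size = 100;             // memory held by other regions in the group
    toy_memtable mt(g);
    mt.write(40);             // group: 140
    mt.account_flushed(40);   // group: 100
    mt.clear(revert_flushed); // fixed: stays 100; unfixed: drops to 60
    return g.size;
}
```

With the revert in place the group's size stays at the post-flush value across clear(); without it, the region's 40 bytes are subtracted a second time.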

54 Commits