scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Glauber Costa	7146776d7c	fix sstable tests by not using the flush_reader if no region_group The latest virtual dirty patches broke the SSTable tests. The reason for this is that those tests will flush synthetic memtables that do not have a region_group attached to it. Normally in cases like this we would just give the flush_reader an empty region group. However, the memtable class constructor takes a region_group pointer and that can be null according to the interface. So we must conditionally test it. If there isn't a region_group involved, the virtual dirty accounting should be disabled: after all, we won't even have the baseline memory to begin with. One of the approaches to fix this could be to just provide null accounter classes to be used as a surrogate for the accounting classes in this case. However, since this is mostly used for tests, a much simpler way is to just revert back to the scanning reader in that case. The scanning reader is similar enough to the flush_reader, except that it can handle partial ranges, slices, and delegate accesses to an sstable post-flush. We don't need any of that, but as argued above, there is no need to remove it either. Signed-off-by: Glauber Costa <glommer@scylladb.com> Message-Id: <1475667271-60806-1-git-send-email-glommer@scylladb.com>	2016-10-05 12:44:21 +01:00
Glauber Costa	f89a67c75c	database: allow virtual dirty memory management Scylla currently suffers from a brick wall behavior of the request throttler. Requests pile up until we reach the dirty memory limit, at which point we stop serving them until we have freed enough memory to allow for more requests. The problem is that freeing dirty memory means writing an SSTable to completion. That can take a long time, even if we are blessed with great disks. Those long waiting times can and will translate into timeouts. That is bad behavior. What this patch does is introduce one form of virtual dirty memory accounting. Instead of allowing 100 % of the dirty memory to be filled up until we stop accepting requests, we will do that when we reach 50 % of memory. However, instead of releasing requests only when an SSTable is fully written, we start releasing them when some memory was written. The practical effect of that is that once we reach 50 % occupancy in our dirty memory region, we will bring the system from CPU speed to disk speed, and will start accepting requests only at the rate we are able to write memory back. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Glauber Costa	eee15578fb	memtables: split scanning reader in two The code that is common will live in its own reader, the iterator_reader. All friendly private access to memtable attributes and methods happen through the iterator reader. After this patch, we are now left with the scanning_reader - same as always, but now implemented on top of the iterator_reader, and a flush_reader, which will be used by SSTable flushes only. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Glauber Costa	16886eeb96	sstables: use special reader for writing a memtable Right now the special reader doesn't do much, but the idea is that we will soon replace it with a reader that specializes in flush, and is in turn able to provide read-side on-flush functionality like virtual dirty. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We need to somehow then translate region -> memtable -> column family. The easiest way to convert from region to memtable is having memtable inherit from region. Despite the fact that this requires multiple inheritance, which always raises a flag a bit, the other class we inherit from is enable_shared_from_this, which has a very simple and well defined interface. So I think it is worth doing. Once we have the memtable, grabbing the column family is easy provided we have a database object. We can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Paweł Dziepak	6871bd5fa0	memtable: fully support streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00
Paweł Dziepak	2ab1a73efa	memtable: rename partition_entry to memtable_entry partition_entry is going to be a more general object used by both cache and memtable entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Piotr Jastrzebski	23c23abe53	Make memtable mutation_reader slice using clustering ranges. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 11:46:41 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	80ab41a715	memtable reader: also include a priority class There are situations when a memtable is already flushed but the memtable reader will continue to be in place, relaying reads to the underlying table. For that reason, the "memtables don't need a priority class" argument gets obviously broken. We need to pass a priority class for its reader as well. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-02-24 18:00:34 -05:00
Tomasz Grabiec	d81a46d7b5	column_family: Add schema setters There is one current schema for given column_family. Entries in memtables and cache can be at any of the previous schemas, but they're always upgraded to current schema on access.	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	036974e19b	Make mutation interfaces support multiple versions Schema is tracked in memtable and cache per-entry. Entries are upgraded lazily on access. Incoming mutations are upgraded to table's current schema on given shard. Mutating nodes need to keep schema_ptr alive in case schema version is requested by target node.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	8a05b61d68	memtable: Read under _read_section	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	5184381a0b	memtable: Deconstify memtable in readers We want to upgrade entries on read and for that we need mutating permission.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	32ac2ccc4a	memtable: Introduce apply(memtable&)	2015-11-29 16:25:21 +01:00
Avi Kivity	a40a62d840	memtable: use allocating_section to guard allocations Without this, an allocation can fail, and we may not be able to reclaim memory.	2015-11-16 10:56:06 +02:00
Paweł Dziepak	b0edaa5bb7	memtable: add as_key_source() Needed only for tests. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-10-20 20:27:53 +02:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Amnon Heiman	c102031fe2	memtable: Expose the region object This adds a const getter for the region object Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-09-10 00:20:26 +03:00
Tomasz Grabiec	a0c180ef49	memtable: Fix flush in the middle of scanning bug Fixes #309. When scanning memtable readers detect it was flushed, which means that it started to be moved to cache, they fall back to reading from memtable's sstable. Eventually what we should do is to combine memtable and cache contents so that as long as data is not evicted we won't do IO. We do not support scanning in cache yet though, so there is no point in doing this now, and it is not trivial.	2015-09-09 10:17:35 +02:00
Tomasz Grabiec	d20fae96a2	lsa: Make reclaimer run synchronously with allocations The goal is to make allocation less likely to fail. With async reclaimer there is an implicit bound on the amount of memory that can be allocated between deferring points. This bound is difficult to enforce though. Sync reclaimer lifts this limitation off. Also, allocations which could not be satisfied before because of fragmentation now will have higher chances of succeeding, although depending on how much memory is fragmented, that could involve evicting a lot of segments from cache, so we should still avoid them. Downside of sync reclaiming is that now references into regions may be invalidated not only across deferring points but at any allocation site. compaction_lock can be used to pin data, preferably just temporarily.	2015-08-31 21:50:18 +02:00
Tomasz Grabiec	ff8c81b25f	memtable: Encapsulate unsafe accessors	2015-08-31 21:50:17 +02:00
Tomasz Grabiec	d9ce307c6a	memtable: Add non-const partition_entry::key() variant Helps moving from memtable to cache.	2015-08-31 14:54:26 +03:00
Avi Kivity	1579f86503	memtable: keep the lsa region alive while partitions are being destroyed Or we get a use-after-free. Reported by Pekka.	2015-08-20 15:32:30 +03:00
Avi Kivity	c175025bb6	db: place all memtables into a single region_group We can use this to track the amount of unevictable memory in the system.	2015-08-19 19:36:41 +03:00
Tomasz Grabiec	3b92ba2857	db: Add memtable flush logging	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	cda31eccf7	db: Use LSA to allocate data inside memtable	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	fe4c75dee6	memtable: Remove unused find_partition()	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	1046ee6e80	memtable: Remove all_partitions() Preferred way to access the memtable is via reader.	2015-08-06 14:05:16 +02:00
Avi Kivity	182d5ab798	memtable: fix memory leak Since memtable::partitions is now an intrusive_set, it must be cleared explicitly, or memory is leaked.	2015-07-26 20:01:50 +03:00
Tomasz Grabiec	9c4956c5dc	memtable: Use boost::intrusive_set<> to store partition entries So that we can use heterogeneous comparators. For range queries we will need to compare keys with ring_position.	2015-07-22 13:14:33 +02:00
Tomasz Grabiec	da937897cf	memtable: Introduce as_data_source()	2015-07-09 19:46:29 +02:00
Tomasz Grabiec	51cae834e3	db: Put all sstables behind single reader This change abstracts reading from on-disk data sources behind a single reader which is then composed with memtable readers. This change also abstracts all data sources behind a single reader obtained via column_family::make_reader(). That reader is then used by algorithms like column_family::for_all_partitions() or column_family::query(). Having those abstractions will make it easier to add row cache, because it will be encapsulated in a single place.	2015-06-18 16:33:33 +02:00
Tomasz Grabiec	bc468f9a0e	memtable: Make memtable inherit from enable_lw_shared_from_this	2015-06-18 15:48:21 +02:00
Tomasz Grabiec	7f1ff0401e	db: Move mutation_reader definition to separate header	2015-06-18 15:47:40 +02:00
Nadav Har'El	78a8ac8470	Make mutation_reader usable outside database.cc The "mutation_reader" defined in database.cc is a convenient mechanism for iterating over mutations. It can be useful for more than just database.cc (I want to use it in the compaction code), so this patch moves the type's definition to mutation.hh, and the make_memtable_reader() function to memtable::make_reader() (in memtable.hh). Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>	2015-06-16 14:03:34 +02:00
Calle Wilund	293dbf66e3	Forward and use replay_position when applying mutation * Forward commitlog replay_position to column_family.memtable, updating highest RP if needed * When flushing memtable, signal back to commitlog that RP has been dealt with to potentially remove finished segment(s) Note: since memtable flushing right now is _not_ explicitly ordered, this does not actually work, since we need to guarantee ordering with regards to RP. I.e. if we flush N blocks, we must guarantee that: a.) We report "flushed RP" in RP order b.) For a given RP1, all RP* lower than RP1 must also have been flushed. (The latter means that it is fine to say, flush X tables at the same time, as long as we report a single RP that is the highest, and no lower RP:s exist in non-flushed tables) I am however letting someone else deal with ensuring MT->sstable flush order. Signed-off-by: Calle Wilund <calle@cloudius-systems.com>	2015-06-03 12:38:13 +03:00
Avi Kivity	bbc8dbbab5	db: abstract memtable empty test	2015-05-21 15:48:51 +03:00
Avi Kivity	51e19fc090	memtable.h: add copyright statement	2015-05-18 15:59:23 +03:00
Avi Kivity	07d7f410f3	Merge branch 'memtable' into db Conflicts: database.hh memtable changes moved to memtable.hh	2015-05-18 15:50:24 +03:00
Glauber Costa	2174285c31	db: move memtable definition to its own file Following what happened to others: we can now include memtable.hh without including database.hh Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-05-17 12:38:32 +03:00

44 Commits