scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	388315c1ff	sstables: Expose index metrics	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	1dbd2e239e	sstables: index_reader: Share index lists among other index readers Direct motivation for this is to be able to use two index readers from a single mutation reader, one for lower bound of the range and one for the upper bound of the range, without sacrificing optimization of avoiding index reads when forwarding to partition ranges which are close by. After the change, all index readers of given sstable will share index buffers, so lower bound reader can reuse the page read by the upper bound reader. The reason for using two readers will be so that we are able to skip inside the partition range, not only outside of it. This is not possible if we use the same index reader to locate the upper bound of the range, because we may only advance the cursor.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	705bd6da1a	sstables: Remove unused method	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	124dde30db	sstables: Extract writer parameters into config objects Also enables users to change the default promoted index block size.	2017-03-10 14:42:22 +01:00
Gleb Natapov	0977f4fdf8	sstable: close sstable_writer's file if writing of sstable fails. Failing to close a file properly before destroying file's object causes crashes. [tgrabiec: fixed typo] Message-Id: <20170221144858.GG11471@scylladb.com>	2017-02-21 18:17:47 +01:00
Paweł Dziepak	83c6fc1114	sstables: write counter cells	2017-02-02 10:35:14 +00:00
Tomasz Grabiec	dd0fb48564	sstables: Close _file even if random_access_reader::close() reports errors close() operation is like a destructor, it cannot fail. It just reports errors, but close itself succeeds. So we should proceed with the closing even if it fails. Message-Id: <1484245886-7269-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 12:41:55 +00:00
Tomasz Grabiec	33e1f9af6b	sstables: Close input_stream from random_access_reader Spotted by destroy-without-close detector. Message-Id: <1484072527-13058-1-git-send-email-tgrabiec@scylladb.com>	2017-01-11 09:40:00 +00:00
Raphael S. Carvalho	68dfcf5256	db: avoid excessive memory usage during resharding After resharding, sstables may be owned by all shards, which means that file descriptors and memory usage for metadata will increase by a factor equal to number of shards. That can easily lead to OOM. SSTable components are immutable, so they can be stored in one shard and shared with others that need it. We use the following formula to decide which shard will open the sstable and share it with the others: (generation % smp::count), which is the inverse of how we calculate generation for new sstables. So if no resharding is performed, everything is shard-local. With this approach, resource usage due to loaded sstables will be evenly distributed among shards. For this approach to work, we now only populate keyspaces from shard 0. It's now the sole responsible for iterating through column family dirs. In addition, most of population functions are now free and take distributed database object as parameter. Fixes #1951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 15:24:36 -02:00
Raphael S. Carvalho	eed2a7d065	sstables: group sstable components that can be shared among shards We intend to share immutable sstable components among shards to reduce excessive memory usage when resharding shared sstables. This change is about grouping those components into a structure, and using foreign ptr to make sure that the structure will be deleted by whichever shard created it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:19 -02:00
Raphael S. Carvalho	a492f8dfaf	sstables: rename sstable member Rename _components to _recognized_components because _components will be used to name a field with shareable components. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:17 -02:00
Tomasz Grabiec	f2a63270d1	sstables: Fix double close on index and data files when writing fails file output streams take the responsibility of closing the file, they will close the file as part of closing the stream. During sstable writing we create sstable object and keep file references there as well. Sstable object also has responsibility for closing the files, and does so from sstable::~sstable(). Double close was supposed to be avoided by a construct like this: writer.close().get(); _file = {}; However if close() failed, which can happen when write-ahead failed, _file would not be cleared, and both the writer and sstable would close the file. This will result in a crash in append_challenged_posix_file_impl::close(), which is not prepared to be closed twice. Another problem is that if exception happened before we reached that construct, we still should close the writer. Currently we don't, so there's no double close on the file, but that's a bug which needs to be fixed and once that's fixed double close on _file will be even more likely. The fix employed here is to not keep files inside sstable object when writing. As soon as the writer is constructed, it's the only owner of the file. Fixes #1764. Message-Id: <1482428648-22553-1-git-send-email-tgrabiec@scylladb.com>	2016-12-23 11:44:43 +02:00
Tomasz Grabiec	0e487b3499	db: Compute key hash once in partition_presence_checker I measured reduction of cache update time by 20% for 6 sstables and by 40% for 16. Refs #1943.	2016-12-19 14:20:58 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Avi Kivity	872b5ef5f0	sstables: fix probe with Unknown component Commit `53b7b7def3` ("sstables: handle unrecognized sstable component") ignores unrecognized components, but misses one code path during probe_file(). Ignore unrecognized components there too. Fixes #1922. Message-Id: <20161208131027.28939-1-avi@scylladb.com>	2016-12-08 15:24:25 +01:00
Avi Kivity	5530a61975	stables: fix build with older boost (boost::variant::get<T&>) Older boost doesn't support boost::variant::get<T&> (where the type parameter is reference qualified); remove (unneeded anyway).	2016-12-08 10:56:05 +02:00
Avi Kivity	3c3a18f222	sstables: move sharding metadata from Statistics component to a new Scylla component The Cassandra derived sstable tools (and likely Cassandra itself) object to a new sub-component in the Statistics component; create a new Scylla component instead to host this data.	2016-12-07 15:20:13 +02:00
Avi Kivity	24140ec8c6	sstables: add support for sets of discriminated union types Allow declaring discriminated unions (with an enum type as the discriminant and any sstable serializable type as a value) and sets of these unions, with the discriminant as the key. Parsers and writers are auto-generated.	2016-12-07 13:27:52 +02:00
Raphael S. Carvalho	38743c1948	sstables: provide write time of data component Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <59686148149f2159990329775e0cd8780bc54254.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:57 +02:00
Raphael S. Carvalho	4781b6eb71	sstables: use nonwrapping_range::make to avoid compilation issues GCC 5.3.1 was unable to convert bound to optional<bound>. sstables/sstables.cc:2494:123: error: no matching function for call to ‘nonwrapping_range<dht::ring_position>::nonwrapping_range(dht::ring_position, dht::ring_position)’ (dtr.right.exclusive ? dht::ring_position::starting_at : dht::ring_position::ending_at)(std::move(t2))); In file included from ./dht/i_partitioner.hh:52:0, from ./query-request.hh:28, from ./clustering_key_filter.hh:27, from sstables/sstables.hh:35, from sstables/sstables.cc:38: ./range.hh:441:14: note: candidate: nonwrapping_range<T>::nonwrapping_range( const wrapping_range<U>&) [with T = dht::ring_position] explicit nonwrapping_range(const wrapping_range<T>& r) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <95bbf984cd73a61739c8da99cf6cd5e94f1d1457.1479954360.git.raphaelsc@scylladb.com>	2016-11-24 11:26:16 +02:00
Avi Kivity	98a4544e1c	sstables: add method to get sstable owning shards from an unloaded sstable When we load an sstable, we don't know beforehand which shards it belongs to; we don't want to open it until we do. Add a method that allows us to read just the sharding data, without opening anything else.	2016-11-22 21:52:23 +02:00
Avi Kivity	bdd11648ac	sstables: add intra-node sharding metadata Add a metadata component that describes token ranges that are spanned by this sstable. With the current sharding algorithm, where each shard owns a single token range, the first/last partition key is sufficient to describing sharding information, but for multi-range algorithms, this is not sufficient.	2016-11-22 21:44:25 +02:00
Avi Kivity	316ef1d70a	sstables: automate writing statistics components Add a virtual function to metadata_base so we can loop over statistics components when writing them.	2016-11-22 21:05:06 +02:00
Avi Kivity	d05b22e502	sstables: automatically calculate offsets in statistics Instead of calculating the offset for each statistic component manually, use a loop to iterate over all components, accumulating the offset as we go along.	2016-11-22 20:35:24 +02:00
Avi Kivity	3c06ffac9d	sstables: const correctness for the write(file_writer&, T&) functions write() doesn't need to change its input; so change it to const. The only snag is that describe_type() isn't and can't be made const-correct, so cheat when it is called and const_cast the input. This helps in writing a generic serialized_size() that is const correct, in the next patch.	2016-11-22 20:04:27 +02:00
Raphael S. Carvalho	3dc9294023	db: do not leak deleted sstable when deletion triggers an exception The leakage results in deleted sstables being opened until shutdown, and disk space isn't released. That's because column_family::rebuild_sstable_list() will not remove reference to deleted sstables if an exception was triggered in sstables::delete_atomically(). A sstable only has its files closed when its object is destructed. The exception happens when a major compaction is issued in parallel to a regular one, and one of them will be unable to delete a sstable already deleted by the other. That results in remove_by_toc_name() triggering boost::filesystem ::filesystem_error because TOC and temporary TOC don't exist. We wouldn't have seen this problem if major compaction were going through compaction manager, but remove_by_toc_name() and rebuild_sstable_list() should be made resilient. Fixes #1840. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d43b2e78f9658e2c3c5bbb7f813756f18874bf92.1479390842.git.raphaelsc@scylladb.com>	2016-11-17 17:46:36 +02:00
Gleb Natapov	c052a1bc4f	sstable: use schema's min_index_interval config when generating missing summary Message-Id: <20161116181937.GA25303@scylladb.com>	2016-11-17 15:24:03 +02:00
Gleb Natapov	ae0a2935b4	sstables: fix ad-hoc summary creation If the sstable Summary is not present, Scylla does not refuse to boot but instead creates summary information on the fly. There is a bug in this code though. A Summary file is a map between keys and offsets into the Index file, but the code creates a map between keys and Data file offsets instead. Fix it by keeping the offset of an index entry in the index_entry structure and using it during Summary file creation. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20161116165421.GA22296@scylladb.com>	2016-11-17 11:05:23 +02:00
Avi Kivity	f10b9906d8	sstables: move atomic deletion code to its own files This will simplify unit testing. We move generic code that depends only on seastar, so compile time should not increase too much.	2016-11-04 15:47:35 +02:00
Avi Kivity	9e85653c33	sstables: make atomic_deletion_manager more abstract Make the shard count and method of deleting sstables abstract, in order not to require all that machinery for unit tests.	2016-11-04 15:44:09 +02:00
Avi Kivity	e527da1e3c	sstables: wrap atomic deletion code in a class This makes it easier to abstract and unit-test.	2016-11-04 15:44:07 +02:00
Avi Kivity	a05837936a	sstables: remove quadratic behavior from atomic sstable deletions In order to ensure exception safety, the atomic sstable deletion code creates a copy of the list of sstables pending deletion, modifies that copy, and then replaces the original data with the copy. This guarantees that any exception does not change the data, since the assignment does not require allocation. However, it does result in quadratic behavior. During startup, all sstables are loaded on each shard, and each shard deletes sstables that do not have any partitions served by that shard; this results in almost all sstables being deleted from all shards, with all that work going to shard 0; the list grows to O(nr sstables), and there are O((nr sstables) * (nr shards)) operations to perform. Fix by replacing the copy-modify-assign method with an in-place update, but one that is designed to only commit changes after all allocations have been made; in addition, instead of using a list, use a hash table, removing another source of quadratic behavior. Fixes #1812 (the quadratic behavior part).	2016-11-04 15:42:44 +02:00
Raphael S. Carvalho	53b7b7def3	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>	2016-11-02 12:44:53 +02:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Raphael S. Carvalho	975ce62dbc	sstables: do not swallow exception when reading TOC That caused problem when refreshing a sstable with bad permissions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <48e5322c53234209e55da05c64c99b8ec4e190a3.1477372974.git.raphaelsc@scylladb.com>	2016-10-25 12:21:32 +03:00
Paweł Dziepak	ab0eeae82d	sstables: keep separate stream history for single and range reads Single partition and partition range reads are expected to behave considerably different so it is worth to have them use separate file stream history. This also makes reads use different history for each sstable which is also a good thing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	a530762277	sstables: introduce index_reader index_reader is a helper that implements index lookups. Its goal is to avoid dropping read buffers if they still may be needed (for example to get end bound of the range or after fast forwarding the reader). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	0bc873ace5	sstables: add fast_forward_to() to continuous_data_consumer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Duarte Nunes	c36dbaf0f1	sstables: Add function to get key samples This patch implements the get_key_samples() function, on which a future patch will base an implementation of the describe_splits() thrift verb closer to Cassandra's. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:14 +02:00
Duarte Nunes	ceed09b23e	sstables: Get estimates for a particular range This patch adds the estimated_keys_for_range() function, which estimates the number of keys present between the specified range. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:52:15 +02:00
Glauber Costa	7146776d7c	fix sstable tests by not using the flush_reader if no region_group The latest virtual dirty patches broke the SSTable tests. The reason for this is that those tests will flush synthetic memtables that do not have a region_group attached to it. Normally in cases like this we would just give the flush_reader an empty region group. However, the memtable class constructor takes a region_group pointer and that can be null according to the interface. So we must conditionally test it. If there isn't a region_group involved, the virtual dirty accounting should be disabled: after all, we won't even have the baseline memory to begin with. One of the approaches to fix this could be to just provide null accounter classes to be used as a surrogate for the accounting classes in this case. However, since this is mostly used for tests, a much simpler way is to just revert back to the scanning reader in that case. The scanning reader is similar enough to the flush_reader, except that it can handle partial ranges, slices, and delegate accesses to an sstable post-flush. We don't need any of that, but as argued above, there is no need to remove it either. Signed-off-by: Glauber Costa <glommer@scylladb.com> Message-Id: <1475667271-60806-1-git-send-email-glommer@scylladb.com>	2016-10-05 12:44:21 +01:00
Glauber Costa	16886eeb96	sstables: use special reader for writing a memtable Right now the special reader doesn't do much, but the idea is that we will soon replace it with a reader that specializes in flush, and is in turn able to provide read-side on-flush functionality like virtual dirty. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Paweł Dziepak	eb59b4c4ab	keys: disable constructing from generic range stdx::optional<T> uses quite elaborate std::enable_if_t magic to decide whether the argument passed to its constructor should be used for a call T constructor or stdx::optional<T> constructor. Apparently, with GCC 6.2 having T constructor which accepts any type confuses that magic and we end up with compile errors. The solution is to have from_range() method that replaces that constructor from range. There is also constructor that creates a key from std::vector<bytes> so that code generated by IDL works as it did before. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1474550971-15309-1-git-send-email-pdziepak@scylladb.com>	2016-09-24 18:57:01 +03:00
Raphael S. Carvalho	cfe7419f0f	sstables: update or remove some outdated comments Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <74bae503447da2544a005e29b7d3aafa9f6e8c90.1474383273.git.raphaelsc@scylladb.com>	2016-09-24 18:53:19 +03:00
Raphael S. Carvalho	0eaa0f46c9	sstables: store first and last decorated keys in sstable object leveled strategy uses heavily first and last decorated keys of a sstable to get overlapping sstables in a given level. By storing first and last decorated keys in sstable object, it's expected that performance of leveled strategy (not compaction) will be improved. We will set first and last keys in sstable when either loading or sealing it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Raphael S. Carvalho	dffb41f9d8	sstables: remove schema parameter from some sstable methods schema can now be found in the sstable object itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00


363 Commits