scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 23:13:15 +00:00

Author	SHA1	Message	Date
Avi Kivity	299d1fad0b	Merge "reduce bloom filter overhead in compaction" from Raphael "Function to calculate maximum purgeable timestamp is made 10 times faster when compacting sstables overlap with 10% of all sstables. That's possible with an incremental selector that will incrementally select sstables based on key being compacted. Currently, we iterate through all non-compacting sstables and consult their bloom filter to determine max purgeable timestamp, and that will be very expensive for compactions that are frequently deciding whether or not to purge tombstones." * 'filter_overhead_fix_v4' of github.com:raphaelsc/scylla: compaction: reduce bloom filter overhead with incremental selector tests: add test for sstable set's incremental selector sstable_set: introduce incremental selector compatible_ring_position: add function to return token	2016-12-11 09:46:58 +02:00
Glauber Costa	5803957ab5	compaction: fix build Commit `732ee275` moved tracking of one statistics value inside a lambda without capturing this in that lambda. Compilation fails as a result. Signed-off-by: Glauber Costa <glauber@scylladb.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <68860640f4533dd43e43f341f1620e25464b700b.1481313455.git.glauber@scylladb.com>	2016-12-10 09:00:20 +02:00
Raphael S. Carvalho	fcfc84e836	compaction: reduce bloom filter overhead with incremental selector The procedure to calculate max purgeable timestamp is optimized by only visiting sstables that overlap with key being currently compacted. That's done using incremental sstable selector. Function to calculate maximum purgeable timestamp is made 10 times faster when compacting sstables overlap with 10% of all sstables. Fixes #1322. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:17 -02:00
Raphael S. Carvalho	02541e15c1	sstable_set: introduce incremental selector Incrementally select sstables from sstable set using token in ascending order. For leveled strategy, it returns all sstables that belong to current interval. For other strategies, it just returns all sstables from the set. Useful for compaction which needs all sstables that overlap with key being currently compacted to calculate maximum purgeable timestamp. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:16 -02:00
Raphael S. Carvalho	732ee275f8	compaction: fix running compaction counter when splitting sstables The counter was being increased before taking the semaphore, so every pending split would count as a running compaction which misleads the user as a result. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <f2050cc3599cee7af29d4579368a154708b37731.1481248048.git.raphaelsc@scylladb.com>	2016-12-09 15:01:43 +02:00
Avi Kivity	872b5ef5f0	sstables: fix probe with Unknown component Commit `53b7b7def3` ("sstables: handle unrecognized sstable component") ignores unrecognized components, but misses one code path during probe_file(). Ignore unrecognized components there too. Fixes #1922. Message-Id: <20161208131027.28939-1-avi@scylladb.com>	2016-12-08 15:24:25 +01:00
Avi Kivity	5530a61975	sstables: fix build with older boost (boost::variant::get<T&>) Older boost doesn't support boost::variant::get<T&> (where the type parameter is reference qualified); remove (unneeded anyway).	2016-12-08 10:56:05 +02:00
Avi Kivity	3c3a18f222	sstables: move sharding metadata from Statistics component to a new Scylla component The Cassandra derived sstable tools (and likely Cassandra itself) object to a new sub-component in the Statistics component; create a new Scylla component instead to host this data.	2016-12-07 15:20:13 +02:00
Avi Kivity	24140ec8c6	sstables: add support for sets of discriminated union types Allow declaring discriminated unions (with an enum type as the discriminant and any sstable serializable type as a value) and sets of these unions, with the discriminant as the key. Parsers and writers are auto-generated.	2016-12-07 13:27:52 +02:00
Raphael S. Carvalho	b30a2cb21a	lcs: generate info that preserves token distribution in higher levels The information (last compacted keys) is lost after node is restarted or schema is updated, which causes strategy to be rebuilt. We need it for strategy to guarantee uniform distribution of token range across sstables, or we could end up with 1 sstable of level L overlapping with lots of sstables of level L+1, and that results in a compaction of undesired length. That information can be generated from scratch by getting last key of newest sstable in each level > 0. Fixes #1906. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <35ebd15977d5a8418239febb160c796cdc0e98fa.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:58 +02:00
Raphael S. Carvalho	38743c1948	sstables: provide write time of data component Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <59686148149f2159990329775e0cd8780bc54254.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:57 +02:00
Raphael S. Carvalho	a16425833c	size_tiered: do not recreate bucket when it goes beyond max threshold Problem will cause size tiered to return small jobs when there are more than max_threshold sstables of similar size. For example, if max_threshold is 32, and there are 36 sstables of similar size, strategy will only return 4 sstables to be compacted. That's because we incorrectly create a new bucket when it meets the max threshold. What we should do is to allow buckets to grow beyond max threshold and trim them when selecting the most suitable one for compaction. Important to mention that estimation for size tiered will now work better when there are more than max_threshold sstables of similar size. Fixes #1901. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <080bad70d6cb86eaf52ac1bdd6765ac47aab5b03.1478316140.git.raphaelsc@scylladb.com>	2016-11-29 16:56:02 +02:00
Raphael S. Carvalho	4781b6eb71	sstables: use nonwrapping_range::make to avoid compilation issues GCC 5.3.1 was unable to convert bound to optional<bound>. sstables/sstables.cc:2494:123: error: no matching function for call to ‘nonwrapping_range<dht::ring_position>::nonwrapping_range(dht::ring_position, dht::ring_position)’ (dtr.right.exclusive ? dht::ring_position::starting_at : dht::ring_position::ending_at)(std::move(t2))); In file included from ./dht/i_partitioner.hh:52:0, from ./query-request.hh:28, from ./clustering_key_filter.hh:27, from sstables/sstables.hh:35, from sstables/sstables.cc:38: ./range.hh:441:14: note: candidate: nonwrapping_range<T>::nonwrapping_range( const wrapping_range<U>&) [with T = dht::ring_position] explicit nonwrapping_range(const wrapping_range<T>& r) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <95bbf984cd73a61739c8da99cf6cd5e94f1d1457.1479954360.git.raphaelsc@scylladb.com>	2016-11-24 11:26:16 +02:00
Duarte Nunes	cc3f26c993	lz4: Conditionally use LZ4_compress_default() Since not all distributions have a version of LZ4 with LZ4_compress_default(), we use it conditionally. This is specially important beginning with version 1.7.3 of LZ4, which deprecates the LZ4_compress() function in favour of LZ4_compress_default() and thus prevents Scylla from compiling due to the deprecated warning. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161124092339.23017-1-duarte@scylladb.com>	2016-11-24 11:25:03 +02:00
Avi Kivity	98a4544e1c	sstables: add method to get sstable owning shards from an unloaded sstable When we load an sstable, we don't know beforehand which shards it belongs to; we don't want to open it until we do. Add a method that allows us to read just the sharding data, without opening anything else.	2016-11-22 21:52:23 +02:00
Avi Kivity	bdd11648ac	sstables: add intra-node sharding metadata Add a metadata component that describes token ranges that are spanned by this sstable. With the current sharding algorithm, where each shard owns a single token range, the first/last partition key is sufficient to describing sharding information, but for multi-range algorithms, this is not sufficient.	2016-11-22 21:44:25 +02:00
Avi Kivity	316ef1d70a	sstables: automate writing statistics components Add a virtual function to metadata_base so we can loop over statistics components when writing them.	2016-11-22 21:05:06 +02:00
Avi Kivity	d05b22e502	sstables: automatically calculate offsets in statistics Instead of calculating the offset for each statistic component manually, use a loop to iterate over all components, accumulating the offset as we go along.	2016-11-22 20:35:24 +02:00
Avi Kivity	7c5e6525ef	sstables: switch statistics components to generic serialized_size() implementation	2016-11-22 20:20:38 +02:00
Avi Kivity	096ae59a5b	sstables: introduce generic serialized_size() Introduce a new function that reuses the file_writer code to compute the serialized size of an sstable object, by serializing it into memory and discarding the result.	2016-11-22 20:06:23 +02:00
Avi Kivity	3c06ffac9d	sstables: const correctness for the write(file_writer&, T&) functions write() doesn't need to change its input; so change it to const. The only snag is that describe_type() isn't and can't be made const-correct, so cheat when it is called and const_cast the input. This helps in writing a generic serialized_size() that is const correct, in the next patch.	2016-11-22 20:04:27 +02:00
Raphael S. Carvalho	3dc9294023	db: do not leak deleted sstable when deletion triggers an exception The leakage results in deleted sstables being opened until shutdown, and disk space isn't released. That's because column_family::rebuild_sstable_list() will not remove reference to deleted sstables if an exception was triggered in sstables::delete_atomically(). A sstable only has its files closed when its object is destructed. The exception happens when a major compaction is issued in parallel to a regular one, and one of them will be unable to delete a sstable already deleted by the other. That results in remove_by_toc_name() triggering boost::filesystem ::filesystem_error because TOC and temporary TOC don't exist. We wouldn't have seen this problem if major compaction were going through compaction manager, but remove_by_toc_name() and rebuild_sstable_list() should be made resilient. Fixes #1840. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d43b2e78f9658e2c3c5bbb7f813756f18874bf92.1479390842.git.raphaelsc@scylladb.com>	2016-11-17 17:46:36 +02:00
Gleb Natapov	c052a1bc4f	sstable: use schema's min_index_interval config when generating missing summary Message-Id: <20161116181937.GA25303@scylladb.com>	2016-11-17 15:24:03 +02:00
Gleb Natapov	ae0a2935b4	sstables: fix ad-hoc summary creation If sstable Summary is not present Scylla does not refuse to boot but instead creates summary information on the fly. There is a bug in this code though. The Summary file is a map between keys and offsets into Index file, but the code creates map between keys and Data file offsets instead. Fix it by keeping offset of an index entry in index_entry structure and use it during Summary file creation. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20161116165421.GA22296@scylladb.com>	2016-11-17 11:05:23 +02:00
Raphael S. Carvalho	e86de40b49	compaction_manager: inform about compaction cancelled by shutdown After some changes in compaction manager, user no longer is informed that compaction was cancelled in event of shutdown. That's because we only ignore ready future when compaction manager was asked to stop. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <02ca29b5a93fe3a558896598f325b0dce069e82c.1478277317.git.raphaelsc@scylladb.com>	2016-11-14 16:37:33 +02:00
Piotr Jastrzebski	4fe989d58e	Cleanup sstables::mutation_reader::impl Pointer to sstable seems unnecessary. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <a45e8853af2b5f896ec44144fbc26d3325a5ec0c.1479123740.git.piotr@scylladb.com>	2016-11-14 11:52:52 +00:00
Avi Kivity	f10b9906d8	sstables: move atomic deletion code to its own files This will simplify unit testing. We move generic code that depends only on seastar, so compile time should not increase too much.	2016-11-04 15:47:35 +02:00
Avi Kivity	9e85653c33	sstables: make atomic_deletion_manager more abstract Make the shard count and method of deleting sstables abstract, in order not to require all that machinery for unit tests.	2016-11-04 15:44:09 +02:00
Avi Kivity	e527da1e3c	sstables: wrap atomic deletion code in a class This makes it easier to abstract and unit-test.	2016-11-04 15:44:07 +02:00
Avi Kivity	a05837936a	sstables: remove quadratic behavior from atomic sstable deletions In order to ensure exception safety, the atomic sstable deletion code creates a copy of the list of sstables pending deletion, modifies that copy, and then replaces the original data with the copy. This guarantees that any exception does not change the data, since the assignment does not require allocation. However, it does result in quadratic behavior. During startup, all sstables are loaded on each shard, and each shard deletes sstables that do not have any partitions served by that shard; this results in almost all sstables being deleted from all shards, with all that work going to shard 0; the list grows to O(nr sstables), and there are O((nr sstables) * (nr shards)) operations to perform. Fix by replacing the copy-modify-assign method with an in-place update, but one that is designed to only commit changes after all allocations have been made; in addition, instead of using a list, use a hash table, removing another source of quadratic behavior. Fixes #1812 (the quadratic behavior part).	2016-11-04 15:42:44 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Raphael S. Carvalho	53b7b7def3	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>	2016-11-02 12:44:53 +02:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Raphael S. Carvalho	bc2d351c25	sstables: remove duplicated declaration of remove_by_toc_name Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-26 11:21:27 -02:00
Piotr Jastrzebski	27726cecff	Clean up position_in_partition. Introduce position_in_partition_view and use it in position() method in mutation_fragment, range_tombstone, static_row and clustering_row. Clean up comparators in position_in_partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c65293c71a6aa23cf930ed317fb63df1fdc34fd1.1477399763.git.piotr@scylladb.com>	2016-10-25 15:13:20 +01:00
Raphael S. Carvalho	975ce62dbc	sstables: do not swallow exception when reading TOC That caused problem when refreshing a sstable with bad permissions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <48e5322c53234209e55da05c64c99b8ec4e190a3.1477372974.git.raphaelsc@scylladb.com>	2016-10-25 12:21:32 +03:00
Paweł Dziepak	ab0eeae82d	sstables: keep separate stream history for single and range reads Single partition and partition range reads are expected to behave considerably different so it is worth to have them use separate file stream history. This also makes reads use different history for each sstable which is also a good thing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	20bfa1fa52	sstables: drop sstable::{lower, upper}_bound() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	a530762277	sstables: introduce index_reader index_reader is a helper that implements index lookups. Its goal is to avoid dropping read buffers if they still may be needed (for example to get end bound of the range or after fast forwarding the reader). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	0bc873ace5	sstables: add fast_forward_to() to continuous_data_consumer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	sstables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	2124d08b88	sstables: add skip() to compressed_file_data_source Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Duarte Nunes	c36dbaf0f1	sstables: Add function to get key samples This patch implements the get_key_samples() function, on which a future patch will base an implementation of the describe_splits() thrift verb closer to Cassandra's. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:14 +02:00
Duarte Nunes	fc07b66678	sstables/key: Add to_partition_key function Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:11 +02:00
Duarte Nunes	ceed09b23e	sstables: Get estimates for a particular range This patch adds the estimated_keys_for_range() function, which estimates the number of keys present between the specified range. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:52:15 +02:00
Duarte Nunes	8c223b31c8	sstables/key: Make key::kind public Needed to create synthetic keys without any value but with ordering properties. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:47:24 +02:00
Glauber Costa	7146776d7c	fix sstable tests by not using the flush_reader if no region_group The latest virtual dirty patches broke the SSTable tests. The reason for this is that those tests will flush synthetic memtables that do not have a region_group attached to it. Normally in cases like this we would just give the flush_reader an empty region group. However, the memtable class constructor takes a region_group pointer and that can be null according to the interface. So we must conditionally test it. If there isn't a region_group involved, the virtual dirty accounting should be disabled: after all, we won't even have the baseline memory to begin with. One of the approaches to fix this could be to just provide null accounter classes to be used as a surrogate for the accounting classes in this case. However, since this is mostly used for tests, a much simpler way is to just revert back to the scanning reader in that case. The scanning reader is similar enough to the flush_reader, except that it can handle partial ranges, slices, and delegate accesses to an sstable post-flush. We don't need any of that, but as argued above, there is no need to remove it either. Signed-off-by: Glauber Costa <glommer@scylladb.com> Message-Id: <1475667271-60806-1-git-send-email-glommer@scylladb.com>	2016-10-05 12:44:21 +01:00
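
The size-tiered fix in a16425833c (36 sstables of similar size with max_threshold 32) can be illustrated with a small sketch. This is not Scylla's code: the function names below are hypothetical, sstables are reduced to a plain count, and the sketch does not model which bucket the strategy finally picks. It only contrasts splitting a bucket at the threshold with letting one bucket grow and trimming it at selection time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of the old behaviour: a new bucket is started as soon
// as the current one holds max_threshold sstables of similar size.
// Returns the number of sstables in each resulting bucket.
std::vector<std::size_t> bucket_sizes_split_at_threshold(std::size_t n_similar,
                                                         std::size_t max_threshold) {
    std::vector<std::size_t> buckets;
    for (std::size_t i = 0; i < n_similar; ++i) {
        if (buckets.empty() || buckets.back() == max_threshold) {
            buckets.push_back(0);  // bucket is full (or none exists): open a new one
        }
        ++buckets.back();
    }
    return buckets;
}

// Hypothetical model of the fixed behaviour: the single bucket may grow past
// max_threshold, and it is trimmed only when a compaction job is selected.
std::size_t job_size_after_trim(std::size_t n_similar, std::size_t max_threshold) {
    return std::min(n_similar, max_threshold);
}
```

With 36 similar sstables and a threshold of 32, the first function yields buckets of 32 and 4 (the small leftover bucket being the kind of job the commit message describes), while the second yields a single job of 32.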


778 Commits