scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	124dde30db	sstables: Extract writer parameters into config objects Also enables users to change the default promoted index block size.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	045b9fd7c1	sstables: Calculate key hash only once during compaction Improves compaction performance.	2016-12-22 13:24:46 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Raphael S. Carvalho	fcfc84e836	compaction: reduce bloom filter overhead with incremental selector The procedure to calculate max purgeable timestamp is optimized by only visiting sstables that overlap with key being currently compacted. That's done using incremental sstable selector. Function to calculate maximum purgeable timestamp is made 10 times faster when compacting sstables overlap with 10% of all sstables. Fixes #1322. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-12-09 16:17:17 -02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Avi Kivity	02530faeb2	compaction: fix tombstones not being garbage collected during compaction `2a46410f4a` changed sstable_list from a map to a set, so it is no longer sorted by generation. The code for finding the list of sstables not being compacted relied on this sort order, and now broke, returning a longer list than needed (including some of the sstables being compacted). As a result, the compaction code preserved the tombstones, incorrectly thinking there was still live data they referenced. Fix by sorting the set explicitly. Fixes #1429. Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>	2016-07-06 10:22:31 +02:00
Raphael S. Carvalho	e9076f39be	compaction: implement function to get fully expired sstables Strongly based on org.apache.cassandra.db.compaction. CompactionController.getFullyExpiredSSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	8d38fa49d4	sstables: move code to get uncompacting sstables to a function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:33:55 -03:00
Avi Kivity	e22517bafc	Merge "Optimize reads from leveled sstables" In a leveled column family, there can be many thousands of sstables, since each sstable is limited to a relatively small size (160M by default). With the current approach of reading from all sstables in parallel, cpu quickly becomes a bottleneck as we need to check the bloom filter for each of these sstables. This patch addresses the problem by introducing a compaction-strategy-specific data structure for holding sstables. This data structure has a method to obtain the sstables used for a read. For leveled compaction strategy, this data structure is an interval map, which can be efficiently used to select the right sstables.	2016-07-04 16:00:35 +03:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Paweł Dziepak	c2f0ee9b5f	sstables: add consumer-style sstable compactor This patch moves compaction logic to a consumer that can be used with consume_flattened_in_thread(). Internally, sstable_writer is used to write individual sstables. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	599ed7f1ed	sstables: restore indentation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	e7ff20b3bb	sstables: run compaction code inside a thread Currently, each sstable write has its separate thread. However, the goal is to have compaction use consume_flattened() with a consumer that creates and writes the sstables. consume_flattened() needs to be executed inside a thread, since sstable writer may defer. This patch is a first step in preparations and it just makes whole compaction logic run inside a thread. That makes little sense now, since all sstable writes spawn their own threads but that's going to change in the following patches. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	b6f78a8e2f	sstable: make sstable reads return streamed_mutation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	737eb73499	mutation_reader: make readers return streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Vlad Zolotarov	baf3614e8f	sstables: don't backup sstables that are a result of a compaction According to incremental backup description (http://docs.datastax.com/en/cassandra_win/2.2/cassandra/operations/opsBackupIncremental.html) sstables that are a result of a compaction process should not be backed up since original sstables had already been backed up. Fixes #1308 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1466338622-7323-1-git-send-email-vladz@cloudius-systems.com>	2016-06-20 09:52:30 +03:00
Raphael S. Carvalho	29db5f5e1f	sstables: move compaction strategy code to a new source file Moving compaction strategy code from sstables/compaction.cc to sstables/compaction_strategy.cc That improves readability. Strategy code should be separated from the generic compaction code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <5af6fc8f7321351a071fc0ce03c80ffea21f8396.1460761821.git.raphaelsc@scylladb.com>	2016-04-19 08:45:43 +03:00
Pekka Enberg	3f2286d02e	Merge "Delete compacted sstables atomically" from Avi "If we compact sstables A, B into a new sstable C we must either delete both A and B, or none of them. This is because a tombstone in B may delete data in A, and during compaction, both the tombstone and the data are removed. If only B is deleted, then the data gets resurrected. Non-atomic deletion occurs because the filesystem does not support atomic deletion of multiple files; but the window for that is small and is not addressed in this patchset. Another case is when A is shared across multiple shards (as is the case when changing shard count, or migrating from existing Cassandra sstables). This case is covered by this patchset. Fixes #1181."	2016-04-14 22:04:15 +03:00
Avi Kivity	a843aea547	db: delete compacted sstables atomically If sstables A, B are compacted, A and B must be deleted atomically. Otherwise, if A has data that is covered by a tombstone in B, and that tombstone is deleted, and if B is deleted while A is not, then the data in A is resurrected. Fixes #1181.	2016-04-14 17:14:26 +03:00
Raphael S. Carvalho	c28d168619	sstables: allow user to specify max sstable size with leveled strategy This change will allow user to specify the maximum size of a new sstable created as a result of leveled compaction. Example of using this setting: ALTER TABLE ks.test5 with compaction = {'sstable_size_in_mb': '1000', 'class': 'LeveledCompactionStrategy'} Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <ebb9844401af74388bda12586c2435283f6d8db8.1460486043.git.raphaelsc@scylladb.com>	2016-04-13 09:13:33 +03:00
Raphael S. Carvalho	8fe7524e46	sstables: enable leveled strategy feature to prevent L0 from falling behind If level 0 falls behind, size tiered strategy is used on it to reduce overhead until we can catch up on the higher levels. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <17bf15b7d12cd5dc652cc92939c0c68f921662a2.1459976469.git.raphaelsc@scylladb.com>	2016-04-11 11:52:00 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Glauber Costa	d5c1366e85	compaction: be verbose about which table is causing an exception When we, for some reason, fail to compact an SSTable, we do not log the file name leaving us with cryptic messages that tell us what happened, but not where it happened. This patch adds logging in compaction so that we'll know what's going on. Please note that readers are more of a concern, because the SSTable being written technically do not exist yet. Still, better safe than sorry: if open_data fails, or we leave an unfinished SSTable, it is still good to know which one was the culprit. Some argument can be made about whether we should log this at the lower SSTable level, or at the compaction level. The reason I am logging this at the compaction level, is that we don't really know which exception will trigger, and where: it may be the case that we're seeing exceptions that are not SSTable specific, and may not have the chance to log it properly. In particular, if the exception happens inside the reader: read_rows() and friends only return a mutation reader, which doesn't really do anything until we call read(). But at that time, we don't hold any pointers to the SSTable anymore. In Summary, logging at the compaction level guarantees that we always do it no matter what. Exceptions that are part of the main SSTable path can log the file name as well if they want: if that's the case, we'll be left with the name appearing twice. That's totally harmless, and better than none. Fixes #1123 Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <c5c969fb6aeb788a037bd7a4ea69979c1042cb34.1459263847.git.glauber@scylladb.com>	2016-03-29 18:15:56 +03:00
Raphael S. Carvalho	bb48f1b06c	sstables: use system clock's epoch for timestamp in compaction history As pointed out by Tomek, the type of column used is timestamp, therefore system's clock epoch (db_clock) should be used instead. Fixes #817. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <f80f9f411d673cf2d653e193ccb8ebaa36bc891b.1456317766.git.raphaelsc@scylladb.com>	2016-02-24 14:49:21 +02:00
Raphael S. Carvalho	9cb8a43684	start using notation ks.cf everywhere Some places were using the notation ks/cf to represent a keyspace and column family pair. ks.cf is the notation used by C*, so we should use it everywhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <939449af92565b79d1823890784dc4d1dc3cdc84.1455830989.git.raphaelsc@scylladb.com>	2016-02-21 11:15:09 +02:00
Raphael S. Carvalho	ed61fe5831	sstables: make compaction stop report user-friendly When scylla stopped an ongoing compaction, the event was reported as an error. This patch introduces a specialized exception for compaction stop so that the event can be handled appropriately. Before: ERROR [shard 0] compaction_manager - compaction failed: read exception: std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.) After: INFO [shard 0] compaction_manager - compaction info: Compaction for keyspace1/standard1 was stopped due to shutdown. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>	2016-02-11 12:16:53 +02:00
Raphael S. Carvalho	a46aa47ab1	make sstables::compact_sstables return list of created sstables Now, sstables::compact_sstables() receives as input a list of sstables to be compacted, and outputs a list of sstables generated by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>	2016-01-31 12:39:20 +02:00
Raphael S. Carvalho	ee84f310d9	move deletion of sstables generated by interrupted compaction This deletion should be handled by sstables::compact_sstables, which is the responsible for creation of new sstables. It also simplifies the code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>	2016-01-31 12:39:20 +02:00
Raphael S. Carvalho	45c446d6eb	compaction: pass dht::token by reference Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-27 13:25:41 -02:00
Raphael S. Carvalho	fc541e2f08	compaction: remove code to sort local ranges storage_service::get_local_ranges returns sorted ranges, which are not overlapping nor wrap-around. As a result, there is no need for the consumer to do anything. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-27 13:15:36 -02:00
Glauber Costa	3f94070d4e	use auto&& instead of auto& for priority classes. By Avi's request, who reminds us that auto& is more suited for situations in which we are assigning to the variable in question. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>	2016-01-26 17:00:20 +02:00
Glauber Costa	b63611e148	mark I/O operations with priority classes After this patch, our I/O operations will be tagged into a specific priority class. The available classes are 5, and were defined in the previous patch: 1) memtable flush 2) commitlog writes 3) streaming mutation 4) SSTable compaction 5) CQL query Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Pekka Enberg	b5833e8002	Merge "Enable incremental backups option" from Vlad "This series moves the "backup" logic into the sstable::write_components() methods, adds a support for enabling backup for sstables flushed in the compaction flow (in addition to a regular flushing flow which had this support already) and enables the "incremental_backups" configuration option." I fixed up a merge conflict with commit `5e953b5` ("Merge "Add support to stop ongoing compaction" from Raphael").	2016-01-21 18:52:07 +02:00
Vlad Zolotarov	c2ab54e9c7	sstables flushing: enable incremental backup (if requested) Enable incremental backup when sstables are flushed if incremental backup has been requested. It has been enabled in the regular flushing flow before but wasn't in the compaction flow. This patch enables it in both places and does it using a backup capability of sstable::write_components() method(s). Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-01-21 12:13:20 +02:00
Raphael S. Carvalho	3bd240d9e8	compaction: add ability to stop an ongoing compaction That's needed for nodetool stop, which is called to stop all ongoing compaction. The implementation is about informing an ongoing compaction that it was asked to stop, so the compaction itself will trigger an exception. Compaction manager will catch this exception and re-schedule the compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Raphael S. Carvalho	ec4c73d451	compaction: rename compaction_stats to compaction_info compaction_info makes more sense because this structure doesn't only store stats about ongoing compaction. Soon, we will add information to it about whether or not an user asked to stop the respective ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Raphael S. Carvalho	0c67b1d22b	compaction: filter out mutation that doesn't belong to shard When compacting sstable, mutation that doesn't belong to current shard should be filtered out. Otherwise, mutation would be duplicated in all shards that share the sstable being compacted. sstable_test will now run with -c1 because arbitrary keys are chosen for sstables to be compacted, so test could fail because of mutations being filtered out. fixes #527. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>	2016-01-19 10:16:41 +01:00
Raphael S. Carvalho	d44a5d1e94	compaction: filter out compacting sstables The implementation is about storing generation of compacting sstables in an unordered set per column family, so before strategy is called, compaction manager will filter out compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 01:18:29 -02:00
Raphael S. Carvalho	9c13c1c738	compaction: move compaction execution from strategy to manager Currently, compaction strategy is the responsible for both getting the sstables selected for compaction and running compaction. Moving the code that runs compaction from strategy to manager is a big improvement, which will also make possible for the compaction manager to keep track of which sstables are being compacted at a moment. This change will also be needed for cleanup and concurrent compaction on the same column family. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 00:04:27 -02:00
Raphael S. Carvalho	ed80ed82ef	sstables: prepare compact_sstables to work with cleanup Cleanup is about rewriting a sstable discarding any keys that are irrelevant, i.e. keys that don't belong to current node. Parameter cleanup was added to compact_sstables. If set to true, irrelevant code such as the one that updates compaction history will be skipped. Logic was also added to discard irrelevant keys. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-11 21:43:40 -02:00
Raphael S. Carvalho	b7d36af26f	compaction: fix max_purgeable calculation max_purgeable was being incorrectly calculated because the code that creates vector of uncompacted sstables was wrong. This value is used to determine whether or not a tombstone can be purged. Operand < is supposed to be used instead in the callback passed as third parameter to boost::set_difference. This fix is a step towards closing the issue #676. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-29 09:30:08 +02:00
Vlad Zolotarov	33552829b2	core: use steady_clock where monotonic clock is required Use steady_clock instead of high_resolution_clock where monotonic clock is required. high_resolution_clock is essentially a system_clock (Wall Clock) therefore may not be assumed monotonic since Wall Clock may move backwards due to time/date adjustments. Fixes issue #638 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-12-27 18:07:53 +02:00
Pekka Enberg	40e8a9c99c	sstables/compaction: Fix compilation error with GCC 4.9.2 I am sure it's a compiler issue but I am not ready to give up and upgrade just yet: sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’ stats->start_size, stats->end_size, {});	2015-12-16 10:03:14 +02:00
Raphael S. Carvalho	193ede68f3	compaction: register and deregister compaction_stats That's important for compaction stats API that will need stats data of each ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-15 09:50:32 -02:00
Raphael S. Carvalho	1fba394dd0	sstables: store keyspace and cf in compaction_stats The reason behind this change is that we will need ks and cf for the compaction stats API. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-15 09:50:02 -02:00
Raphael S. Carvalho	ac1a67c8bc	sstables: move compaction_stats to header file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-15 09:49:45 -02:00
Raphael S. Carvalho	a2fb0ec9a3	sstables: update compaction history at the end of compaction When compaction job finishes, call function to update the system table COMPACTION_HISTORY. That's also needed for the compaction history API. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-14 14:20:03 -02:00

91 Commits