scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Nadav Har'El	c647d917e0	sstables: move to_bytes_view to header file Move the to_bytes_view(temporary_buffer<char>) function from source file to header file where it can be used in more places. This saves one use of reinterpret_cast (which we are now re-evaluating), and moreover, we want to use this function also in the promoted index code (to return a bytes_view from the promoted index which was saved as a temporary_buffer). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1468761437-27046-1-git-send-email-nyh@scylladb.com>	2016-07-17 16:29:26 +03:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Nadav Har'El	aec90a22da	sstable parsing: assert we do not lose clustering rows The sstable parsing code calls mp_row_consumer::flush() after every clustering row has been read, and this puts the now complete row in a single field "_ready". The assumption is that at this point parsing will stop, the consumer will move out this _ready (mp_row_consumer::get_mutation_fragment()) and when flush() is later called again, _ready will be empty again. This assumption is correct in our code, but is based on an intricate combination of esoteric parts of the code, such as: 1. In data_consume_row_context we stop parsing after reading the partition's header, before reading any clustering rows, giving the caller the chance to call sstable_streamed_mutation::read_next() to be prepared for the incoming mutations. 2. In mp_row_consumer::flush_if_needed(), we stop the parser after each individual clustering row. It is easy to break this assumption, and I did this in one of my code changes, and the result was silent loss of clustering rows, as "_ready" got silently overwritten before the reader had a chance to move it out. What this patch does is to add an assertion: If a clustering row is silently lost before being transferred to the mutation fragment reader, we croak. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1468389955-24600-1-git-send-email-nyh@scylladb.com>	2016-07-13 09:42:48 +01:00
Duarte Nunes	4eca7632ec	sstables: Replace composite fields with raw bytes This patch fixes a regression introduced in `f81329be60`, which made keys compound by default when using a particular ctor, in turn leading to mismatches when comparing the same key built with functions that properly consider compoundness. As a temporary fix, the sstable::key and sstable::key_view classes store raw bytes instead of a composite. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468339295-3924-1-git-send-email-duarte@scylladb.com>	2016-07-12 18:08:04 +02:00
Duarte Nunes	f81329be60	sstables: sstables::key delegates to composite The sstables::key class now delegates much of its functionality to the composite class. All existing behavior is preserved. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 23:37:33 +02:00
Duarte Nunes	ad8ff1df7e	sstables: Replace composite class This patch replaces the sstables::composite class with the one in compound_compat.hh. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-11 16:55:11 +02:00
Avi Kivity	24e3026e32	Merge "compaction manager refactoring" from Raphael	2016-07-10 17:16:23 +03:00
Tomasz Grabiec	8c4b5e4283	db: Avoid checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of a ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Raphael S. Carvalho	ed5e7e6842	compaction: refactor compaction manager Previously, same function was used to handle both regular compaction and cleanup requests. That's bad because a lot of conditions were added for both compaction types to live in the same function. Now, cleanup and regular compaction will live in different functions. They share a lot of code, so helper functions were introduced. This change is also important for user-initiated compaction that will go through compaction manager in the future. Code is also a lot easier to read now. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:37:53 -03:00
Raphael S. Carvalho	da6a2b429d	compaction: add functions to register and deregister compacting sstables Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 16:00:51 -03:00
Raphael S. Carvalho	4d6dce8ec9	compaction: add helper function to get candidates for strategy Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:06:14 -03:00
Raphael S. Carvalho	bfc5376548	compaction: remove gate from compaction manager task There is no longer a need to use gate for regular termination of fiber that runs compaction. Now, we only set task->stopping to true, ask for compaction termination, and wait for its future to resolve. Code is simplified a lot with this change. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-08 15:05:10 -03:00
Avi Kivity	8dab93a853	sstables: fix low disk utilization with compression and small chunk lengths As Nadav notes we use the chunk length as the buffer size for the compressed stream too. Fix by using it only for the outer (uncompressed) stream; the inner (compressed) stream uses the sstable buffer size, 128 kiB. Fixes #1402. Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2016-07-07 18:13:30 +01:00
Paweł Dziepak	5bc51821fe	sstables: allow writing unsealed sstables The purpose of this patch is to split the actions of writing sstable and sealing it. As long as the sstable is unsealed it is considered incomplete and is going to be removed on reboot. Such functionality is needed in order to defer visibility of sstables created during streaming until the streaming is complete. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Paweł Dziepak	a7b6c1110f	sstables: do not require seal_sstable() to be run in thread Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Raphael S. Carvalho	0772d20c60	fix compilation in debug mode build/debug/sstables/compaction_strategy.o: In function `date_tiered_manifest::date_tiered_manifest(std::map<basic_sstring<char, unsigned int, 15u>, basic_sstring<char, unsigned int, 15u>, std::less<basic_sstring<char, unsigned int, 15u> >, std::allocator<std::pair<basic_sstring<char, unsigned int, 15u> const, basic_sstring<char, unsigned int, 15u> > > > const&)': /home/centos/scylla/sstables/date_tiered_compaction_strategy.hh:67: undefined reference to `date_tiered_manifest::DEFAULT_BASE_TIME_SECONDS' That's fixed by moving definition of static constexpr outside the class. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20c16ad71f64900aa5591018bc4e976406cfebb3.1467870383.git.raphaelsc@scylladb.com>	2016-07-07 11:52:37 +03:00
Avi Kivity	02530faeb2	compaction: fix tombstones not being garbage collected during compaction `2a46410f4a` changed sstable_list from a map to a set, so it is no longer sorted by generation. The code for finding the list of sstables not being compacted relied on this sort order, and now broke, returning a longer list than needed (including some of the sstables being compacted). As a result, the compaction code preserved the tombstones, incorrectly thinking there was still live data they referenced. Fix by sorting the set explicitly. Fixes #1429. Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>	2016-07-06 10:22:31 +02:00
Raphael S. Carvalho	b699ef2de3	compaction: wire up date tiered compaction strategy After this commit, date tiered compaction strategy is supported on Scylla. To understand how it works, take a look at our wiki page: https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction Fixes #511. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	e5cc0cc6c4	compaction: implement date tiered compaction strategy This commit is basically about converting Java to C++. Date tiered compaction strategy isn't wired yet. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	e9076f39be	compaction: implement function to get fully expired sstables Strongly based on org.apache.cassandra.db.compaction.CompactionController.getFullyExpiredSSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	92848efc42	sstables: make overlapping functions static That's needed for a function that will get overlapping sstables to get fully expired ones. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:34:34 -03:00
Raphael S. Carvalho	8d38fa49d4	sstables: move code to get uncompacting sstables to a function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:33:55 -03:00
Raphael S. Carvalho	cc6c383249	sstables: properly keep track of max local deletion time We weren't updating max local deletion time for cells that contain ttl, or for tombstone cells. If there is a live cell with no ttl, then max local deletion time is supposed to store maximum value, which means that the sstable will not be fully expired later on. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:24 -03:00
Raphael S. Carvalho	1ecd9bdefc	sstables: fix type of max_local_deletion_time max_local_deletion_time was incorrectly using an unsigned type instead of a signed one. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:13 -03:00
Raphael S. Carvalho	f9ab94d266	compaction: import DateTieredCompactionStrategy.java File can be found at the following C* directory: src/java/org/apache/cassandra/db/compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:12:49 -03:00
Avi Kivity	cb59e724ee	Merge "Fix enabling sstable read ahead" from Paweł "This series contains remaining changes necessary to safely enable read ahead of sstables. Basically, it makes sure that input_streams are always properly closed (even in case of exception during read)."	2016-07-05 19:04:19 +03:00
Raphael S. Carvalho	43926026c3	compaction: introduce compaction strategy method to estimate pending compaction At the moment, it's not possible to know how many compactions are needed for compaction strategy to be satisfied. It's not possible to know exactly the number of pending compactions, but the strategy can provide an estimation. For size tiered, it's based on number of sstables in each bucket. By dividing bucket size by max threshold, we get number of compactions needed to compact that single bucket. For leveled, it's about the number of sstables that exceed the limit in each level. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>	2016-07-05 19:03:11 +03:00
Paweł Dziepak	4acf77d755	sstables: drop unused data_stream_at() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:43 +01:00
Paweł Dziepak	2cdf498bbd	sstables: close input stream in sstable::data_read() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	8931b939a1	sstables: use finally() to close input streams Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Avi Kivity	e22517bafc	Merge "Optimize reads from leveled sstables" In a leveled column family, there can be many thousands of sstables, since each sstable is limited to a relatively small size (160M by default). With the current approach of reading from all sstables in parallel, cpu quickly becomes a bottleneck as we need to check the bloom filter for each of these sstables. This patch addresses the problem by introducing a compaction-strategy-specific data structure for holding sstables. This data structure has a method to obtain the sstables used for a read. For leveled compaction strategy, this data structure is an interval map, which can be efficiently used to select the right sstables.	2016-07-04 16:00:35 +03:00
Avi Kivity	c8237fc262	compaction_strategy: introduce make_sstable_set() Allow compaction_strategy to create a container for sstables that is optimized for the strategy. Most compaction_strategies return bag_sstable_set; leveled compaction returns the specialized partitioned_sstable_set.	2016-07-03 10:27:01 +03:00
Avi Kivity	168696c558	Introduce partitioned_sstable_set partitioned_sstable_set assumes that sstables are mostly partitioned along the token range: only a few sstables will be needed to access a particular token. It is implemented as an interval_map.	2016-07-03 10:27:00 +03:00
Avi Kivity	64e4357461	Introduce bag_sstable_set bag_sstable_set is a generic sstable_set implementation: it assumes nothing about the sstables. It is implemented as a vector, and any select will return the entire sstable set.	2016-07-03 10:27:00 +03:00
Avi Kivity	85e9cf4616	Introduce sstable_set sstable_set abstracts the notion of a container of sstables, allowing different compaction strategies to supply their own implementation. The intended user is leveled compaction strategy; since it partitions sstables, it can quickly restrict the number of sstables that participate in a query by looking at the min/max partition key. sstable_set also maintains an internal lw_shared_ptr<sstable_list>, in parallel with the abstract container. This is to support column_family::get_sstable(), which returns a lw_shared_ptr<sstable_list> which must be anchored somewhere if it is not saved at the caller side, as it isn't in most current callers.	2016-07-03 10:27:00 +03:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Paweł Dziepak	b150720361	sstable: enable read ahead Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 13:18:24 +01:00
Paweł Dziepak	4513f8b52c	sstables: add compressed_file_data_source_impl::close() compressed_file_data_source_impl should close the underlying data source properly when asked to. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 13:07:07 +01:00
Paweł Dziepak	55a6911d7a	sstables: close input_stream<> properly If read ahead is going to be enabled it is important to close input_stream<> properly (and wait for completion) before destroying it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	e44e12c74a	sstables: drop no longer needed code Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	c2f0ee9b5f	sstables: add consumer-style sstable compactor This patch moves compaction logic to a consumer that can be used with consume_flattened_in_thread(). Internally, sstable_writer is used to write individual sstables. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	18a9ee105f	sstables: add consumer-style sstable writer sstable_writer encapsulates all logic related to writing sstable. Previously introduced component_writer is used to write actual mutations. sstable_writer is intended to be used with consume_flattened_in_thread(). Its purpose is to be used by higher-level consumer that needs to write possibly more than one sstable (sstable compaction is an example of such consumer). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	0e8b8463ba	sstables: introduce consumer-style components writer This patch rewrites do_write_components() so that it can use consume_flattened_in_thread(). All components-writing code is moved to a new consumer: component_writer. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	599ed7f1ed	sstables: restore indentation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	e7ff20b3bb	sstables: run compaction code inside a thread Currently, each sstable write has its separate thread. However, the goal is to have compaction use consume_flattened() with a consumer that creates and writes the sstables. consume_flattened() needs to be executed inside a thread, since sstable writer may defer. This patch is a first step in preparations and it just makes whole compaction logic run inside a thread. That makes little sense now, since all sstable writes spawn their own threads but that's going to change in the following patches. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:37:54 +01:00
Paweł Dziepak	2ee69860d2	sstables: make sstable reader produce streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	b6f78a8e2f	sstable: make sstable reads return streamed_mutation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	9e8db53c46	sstables: allow row consumer to stop at any point Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	71088b4f4a	sstables: fix partition slicing for row markers and collections Row markers and collections weren't filtered out even if they belonged to a clustering row that shouldn't be in the result. The check whether to include cell or not was done only for live and dead atomic cells. This patch adds appropriate checks for collections and row markers. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Paweł Dziepak	575daea897	sstables: make deletion_time to tombstone cast safer Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
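
The size-tiered estimate described in commit 43926026c3 above can be sketched roughly as follows. This is an illustration only, not Scylla's code: the function name, the `min_threshold` cutoff for eligible buckets, and the round-up division are all assumptions. The commit message itself only says that a bucket's size is divided by the max threshold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of Scylla's API.
// bucket_sizes[i] is the number of sstables in size-tiered bucket i.
// Assumption: buckets with fewer than min_threshold sstables are not
// candidates for compaction and contribute nothing to the estimate.
// Assumption: a partially filled batch still counts as one compaction,
// so the division rounds up.
inline std::size_t estimate_pending_size_tiered(
        const std::vector<std::size_t>& bucket_sizes,
        std::size_t min_threshold,
        std::size_t max_threshold) {
    if (max_threshold == 0) {
        return 0;  // guard against division by zero
    }
    std::size_t pending = 0;
    for (std::size_t n : bucket_sizes) {
        if (n >= min_threshold) {
            // Each compaction takes at most max_threshold sstables.
            pending += (n + max_threshold - 1) / max_threshold;
        }
    }
    return pending;
}
```

With buckets of 4, 40 and 2 sstables, a min threshold of 4 and a max threshold of 32, this sketch counts one compaction for the first bucket, two for the second, and none for the third.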

670 Commits