scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 00:50:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	aad943523a	sstables: index_reader: Add trace-level logging	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	388315c1ff	sstables: Expose index metrics	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	1dbd2e239e	sstables: index_reader: Share index lists among other index readers Direct motivation for this is to be able to use two index readers from a single mutation reader, one for lower bound of the range and one for the upper bound of the range, without sacrificing optimization of avoiding index reads when forwarding to partition ranges which are close by. After the change, all index readers of given sstable will share index buffers, so lower bound reader can reuse the page read by the upper bound reader. The reason for using two readers will be so that we are able to skip inside the partition range, not only outside of it. This is not possible if we use the same index reader to locate the upper bound of the range, because we may only advance the cursor.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	5edb427873	sstables: Remove private constructor To reduce duplication.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	705bd6da1a	sstables: Remove unused method	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	124dde30db	sstables: Extract writer parameters into config objects Also enables users to change the default promoted index block size.	2017-03-10 14:42:22 +01:00
Paweł Dziepak	5d66031b7a	sstable: make input_stream_history initializers in-class sstable has two constructors but only one of them was creating input stream history objects. Message-Id: <20170227151734.16928-1-pdziepak@scylladb.com>	2017-02-28 09:22:11 +01:00
Paweł Dziepak	0198d8e470	Merge "Introduce streamed_mutation::fast_forward_to()" from Tomasz "This introduces an API which allows forward navigation in a stream of mutation fragments. It allows one to consume only a subset of the stream by iteratively specifying sub-ranges from which fragments should be returned. API outline: When in forwarding mode, the stream does not return all fragments right away, but only those belonging to the current range. Initially current range only covers the static row. The stream can be forwarded, even before reaching end-of-stream for current range, to a later range with fast_forward_to(). Forwarding doesn't change initial restrictions of the stream, it can only be used to skip over data. Monotonicity of positions is preserved by forwarding. That is fragments emitted after forwarding will have greater positions than any fragments emitted before forwarding. For any range, all range tombstones relevant for that range which are present in the original stream will be emitted. Range tombstones emitted before forwarding which overlap with the new range are not necessarily re-emitted. When not in forwarding mode, the stream acts as if the current range was equal to the full range. This implies that fast_forward_to() cannot be used. Whether stream is in forwarding mode or not is specified when the stream is created, typically via mutation_source interface. What's left for later series: Optimization by providing specialized implementations. This series implements forwarding support in all mutation sources via generic wrapper which simply drops fragments."
* tag 'tgrabiec/clustering-fast-forward-to-v2' of github.com:scylladb/seastar-dev: tests: mutation_source_tests: Verify monotonicty of positions tests: random_mutation_generator: Spread the keys more tests: mutation_source_test: Make blobs more easily distinguishable tests: streamed_mutation: Test that merged stream passes mutation source tests tests: mutation_source_test: Add tests for forwarding of streamed_mutation tests: streamed_mutation_assertions: Add methods for navigating the stream tests: Add range generators to random_mutation_generator partition_slice_builder: Add with_ranges() query: Introduce full_clustering_range streamed_mutation: Add non-owning variant of mutation_from_streamed_mutation() db: Enable creating forwardable readers via mutation_source mutation_source: Document liveness requirements mutation_source: Cleanup db: Replace virtual_reader_type with mutation_source_opt partition_version: Refactor make_partition_snapshot_reader() overloads database: Fix mutation_source created by as_mutation_source() to not ignore trace_state_ptr memtable: Accept all mutation_source parameters streamed_mutation: Implement fast_forward_to() in stream merger streamed_mutation: Add generic implementation of forwardable streamed_mutation streamed_mutation: Add fast_forward_to() API position_in_partition: Introduce position_range position_in_partition: Introduce position constructor for right after the static row streamed_mutation: Make cast to view non-explicit streamed_mutation: Make schema() getter non-copying	2017-02-24 10:37:51 +00:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Gleb Natapov	0977f4fdf8	sstable: close sstable_writer's file if writing of sstable fails. Failing to close a file properly before destroying file's object causes crashes. [tgrabiec: fixed typo] Message-Id: <20170221144858.GG11471@scylladb.com>	2017-02-21 18:17:47 +01:00
Paweł Dziepak	83c6fc1114	sstables: write counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00
Raphael S. Carvalho	68dfcf5256	db: avoid excessive memory usage during resharding After resharding, sstables may be owned by all shards, which means that file descriptors and memory usage for metadata will increase by a factor equal to number of shards. That can easily lead to OOM. SSTable components are immutable, so they can be stored in one shard and shared with others that need it. We use the following formula to decide which shard will open the sstable and share it with the others: (generation % smp::count), which is the inverse of how we calculate generation for new sstables. So if no resharding is performed, everything is shard-local. With this approach, resource usage due to loaded sstables will be evenly distributed among shards. For this approach to work, we now only populate keyspaces from shard 0. It's now the sole responsible for iterating through column family dirs. In addition, most of population functions are now free and take distributed database object as parameter. Fixes #1951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-09 15:24:36 -02:00
Raphael S. Carvalho	eed2a7d065	sstables: group sstable components that can be shared among shards We intend to share immutable sstable components among shards to reduce excessive memory usage when resharding shared sstables. This change is about grouping those components into a structure, and using foreign ptr to make sure that the structure will be deleted by whichever shard created it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:19 -02:00
Raphael S. Carvalho	a492f8dfaf	sstables: rename sstable member Rename _components to _recognized_components because _components will be used to name a field with shareable components. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:17 -02:00
Tomasz Grabiec	0e487b3499	db: Compute key hash once in partition_presence_checker I measured reduction of cache update time by 20% for 6 sstables and by 40% for 16. Refs #1943.	2016-12-19 14:20:58 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Avi Kivity	3c3a18f222	sstables: move sharding metadata from Statistics component to a new Scylla component The Cassandra derived sstable tools (and likely Cassandra itself) object to a new sub-component in the Statistics component; create a new Scylla component instead to host this data.	2016-12-07 15:20:13 +02:00
Raphael S. Carvalho	38743c1948	sstables: provide write time of data component Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <59686148149f2159990329775e0cd8780bc54254.1480533805.git.raphaelsc@scylladb.com>	2016-12-01 11:19:57 +02:00
Avi Kivity	98a4544e1c	sstables: add method to get sstable owning shards from an unloaded sstable When we load an sstable, we don't know beforehand which shards it belongs to; we don't want to open it until we do. Add a method that allows us to read just the sharding data, without opening anything else.	2016-11-22 21:52:23 +02:00
Avi Kivity	bdd11648ac	sstables: add intra-node sharding metadata Add a metadata component that describes token ranges that are spanned by this sstable. With the current sharding algorithm, where each shard owns a single token range, the first/last partition key is sufficient to describe sharding information, but for multi-range algorithms, this is not sufficient.	2016-11-22 21:44:25 +02:00
Avi Kivity	3c06ffac9d	sstables: const correctness for the write(file_writer&, T&) functions write() doesn't need to change its input; so change it to const. The only snag is that describe_type() isn't and can't be made const-correct, so cheat when it is called and const_cast the input. This helps in writing a generic serialized_size() that is const correct, in the next patch.	2016-11-22 20:04:27 +02:00
Avi Kivity	f10b9906d8	sstables: move atomic deletion code to its own files This will simplify unit testing. We move generic code that depends only on seastar, so compile time should not increase too much.	2016-11-04 15:47:35 +02:00
Raphael S. Carvalho	53b7b7def3	sstables: handle unrecognized sstable component As in C*, unrecognized sstable components should be ignored when loading a sstable. At the moment, Scylla fails to do so and will not boot as a result. In addition, unknown components should be remembered when moving a sstable or changing its generation. Fixes #1780. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b7af0c28e5b574fd577a7a1d28fb006ac197aa0a.1478025930.git.raphaelsc@scylladb.com>	2016-11-02 12:44:53 +02:00
Raphael S. Carvalho	a3e065da9b	db: make it possible to use custom error handler with io checker By default, io checker will cause Scylla to shut down if it finds specific system errors. Right now, io checker isn't flexible enough to allow a specialized handler. For example, we don't want Scylla to shut down if there's a permission problem when uploading new files from upload dir. This desired flexibility is made possible here by allowing a handler parameter to io check functions and also changing existing code to take advantage of it. That's a step towards fixing #1709. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-27 15:54:21 -02:00
Raphael S. Carvalho	bc2d351c25	sstables: remove duplicated declaration of remove_by_toc_name Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-10-26 11:21:27 -02:00
Paweł Dziepak	ab0eeae82d	sstables: keep separate stream history for single and range reads Single partition and partition range reads are expected to behave considerably different so it is worth to have them use separate file stream history. This also makes reads use different history for each sstable which is also a good thing. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	20bfa1fa52	sstables: drop sstable::{lower, upper}_bound() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	a530762277	sstables: introduce index_reader index_reader is a helper that implements index lookups. Its goal is to avoid dropping read buffers if they still may be needed (for example to get end bound of the range or after fast forwarding the reader). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Duarte Nunes	c36dbaf0f1	sstables: Add function to get key samples This patch implements the get_key_samples() function, on which a future patch will base an implementation of the describe_splits() thrift verb closer to Cassandra's. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 19:50:14 +02:00
Duarte Nunes	ceed09b23e	sstables: Get estimates for a particular range This patch adds the estimated_keys_for_range() function, which estimates the number of keys present between the specified range. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 17:52:15 +02:00
Raphael S. Carvalho	0eaa0f46c9	sstables: store first and last decorated keys in sstable object leveled strategy uses heavily first and last decorated keys of a sstable to get overlapping sstables in a given level. By storing first and last decorated keys in sstable object, it's expected that performance of leveled strategy (not compaction) will be improved. We will set first and last keys in sstable when either loading or sealing it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Raphael S. Carvalho	dffb41f9d8	sstables: remove schema parameter from some sstable methods schema can now be found in the sstable object itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>	2016-09-19 13:25:58 +02:00
Raphael S. Carvalho	004617839d	database: check bloom filter of all sstables earlier All sstables will now have bloom filter checked in a single pass before reader iterate through all candidates. It's possible that we will need to futurize the procedure if it holds cpu for too long. This change is also a step towards the optimization that will rule out sstables based on clustering filter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:50:08 -03:00
Raphael S. Carvalho	94c8ef39c3	sstables: store components ranges in sstable object Store range for each clustering component in sstable itself to optimize sstable filtering based on clustering key. If schema defines no clustering key, this new field will be empty. Each range stores min and max value of that specific component. With this information, it's possible to know if a sstable possibly stores a given clustering component. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:32 -03:00
Raphael S. Carvalho	0a5af61176	sstables: introduce function to validate min max clustering values Scylla was generating a sstable with incorrect min max clustering values. This information is used to filter out a sstable when user asks for a range of clustering rows. So it's important to detect wrong metadata and make sure that it will not be used. The validation is fast and will only happen when loading a sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:28 -03:00
Raphael S. Carvalho	1f31223f32	sstables: store schema in sstable object That will be needed for optimization that will store decorated keys in the sstable object, and also for a subsequent work that will detect wrong metadata (min/max column names) by looking at columns in the schema. As schema is stored in sstable, there's no longer a need to store ks and cf names in it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:49:17 -03:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Nadav Har'El	0d00da7f7f	sstables: don't forget to read static row [v2: fix check for static column (don't check if the schema is not compound) and move want-static-columns flag inside the filtering context to avoid changing all the callers.] When a CQL request asks to read only a range of clustering keys inside a partition, we actually need to read not just these clustering rows, but also the static columns and add them to the response (as explained by Tomek in issue #1568). With the current code, that CQL request is translated into an sstable::read_row() with a clustering-key filter. But this currently only reads the requested clustering keys - NOT the static columns. We don't want sstable::read_row() to unconditionally read the static columns from disk because if, for example, they are already cached, we might not want to read them from disk. We don't have such partial-partition cache yet, but we are likely to have one in the future. This patch adds in the clustering key filter object a flag of whether we need to read the static columns (actually, it's a function, returning this flag per partition, to match the API for the clustering-key filtering). When sstable::read_row() sees the flag for this partition is true, it also requests to read the static columns. Currently, the code always passes "true" for this flag - because we don't have the logic to cache partially-read partitions. The current find_disk_ranges() code does not yet support returning a non-contiguous byte range, so this patch, if it notices that this partition really has static columns in addition to the range it needs to read, falls back to reading the entire partition. This is a correct solution (and fixes #1568) but not the most efficient solution. Because static columns are relatively rare, let's start with this solution (correct but less efficient when there are static columns) and providing the non-contiguous reading support is left as a FIXME.
Fixes #1568 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>	2016-08-15 12:30:19 +03:00
Nadav Har'El	0d8463aba5	sstables: promoted index write support This patch adds writing of promoted index to sstables. The promoted index is basically a sample of columns and their positions for large partitions: The promoted index appears in the sstable's index file for partitions which are larger than 64 KB, and divides the partition to 64 KB blocks (as in Cassandra, this interval is configurable through the column_index_size_in_kb config parameter). Beyond modifying the index file, having a promoted index may also modify the data file: Since each of blocks may be read independently, we need to add in the beginning of each block the list of range tombstones that are still open at that position. See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File Fixes #959 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:12 +03:00
Nadav Har'El	6cdf5684f5	sstables: introduce find_disk_ranges() Our sstable reading code is currently hard-coded to read entire partitions, even if we know that only a subset of the columns are requested. This patch introduces find_disk_ranges(), a function to find the ranges of bytes we need to read from the sstable data file to guarantee that the desired columns from the desired partition are read. The returned range may be the entire byte range of the given partition - as found using the summary and index files - but if the index contains a "promoted index" (basically a sample of column positions for each key) we may return a smaller range. The "disk_read_range" type introduced in the previous patch is extended here to support reading a partial partition - by including additional information which would be missed when reading only part of a partition (viz., the partition key and the partition's tombstone). This function isn't used in this patch - we will wire its use in the next patch, which will complete the read-side support for the promoted index. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:08 +03:00
Nadav Har'El	1975abbfd6	sstables: disk_read_range Currently, the main sstable data parsing entry point data_consume_rows() takes a contiguous range of bytes to read from disk and parse. This range is supposed to be an entire partition or contiguous group of partitions. and is self contained (can be parsed without extra information about the identity of these partitions). For the promoted index feature (which we will add in a following patch) we will want the range to span only a part of a partition, and will need the caller to provide some information not available to the parser (such as the partition's key). In the future, we will also want to support a vector of byte ranges, instead of just one. So in preparation for this, this patch simply replaces the start/end pair by a new class disk_read_range, which can be easily extended in later patches. No new functionality is introduced in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:02 +03:00
Paweł Dziepak	5ba4cd1a0b	sstables: enable_lw_shared_from_this for sstable sstable has member functions that create objects which need to extend lifetime of the sstable (for example mutation_readers), the easiest way to achieve that is to enable_lw_shared_from_this for sstable. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 15:51:12 +01:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does") it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
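
Commit 68dfcf5256 above picks the shard that opens a shared sstable with the rule (generation % smp::count), and describes that rule as the inverse of how generations are assigned to new sstables. The following is a minimal sketch of that arithmetic, not Scylla's code. The names opening_shard and next_generation are hypothetical, shard_count stands in for smp::count, and the allocation scheme in next_generation is an assumption chosen only so that the modulo rule inverts it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Scylla's API: the shard that opens an sstable and
// shares its immutable components with the other shards, following the rule
// quoted in the commit message (generation % smp::count).
inline unsigned opening_shard(std::uint64_t generation, unsigned shard_count) {
    assert(shard_count > 0);
    return static_cast<unsigned>(generation % shard_count);
}

// Assumed allocation scheme (the commit only states that the rule above is
// its inverse): each shard hands out generations congruent to its own id
// modulo shard_count, using a per-shard counter.
inline std::uint64_t next_generation(std::uint64_t local_counter,
                                     unsigned shard,
                                     unsigned shard_count) {
    assert(shard < shard_count);
    return local_counter * shard_count + shard;
}
```

Under these assumptions, an sstable written by a given shard maps back to that same shard while the shard count is unchanged, which matches the commit's remark that everything stays shard-local when no resharding is performed.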


230 Commits