scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	11b74050a1	partitioned_sstable_set: fix quadratic space complexity streaming generates lots of small sstables with large token range, which triggers O(N^2) in space in interval map. level 0 sstables will now be stored in a structure that has O(N) in space complexity and which will be included for every read. Fixes #2287. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>	2017-04-18 13:04:38 +03:00
Tomasz Grabiec	d523c60629	sstables: Push fragments from mp_row_consumer so that parser is interrupted less Currently we return proceed::no after every mutation_fragment which is to be consumed. This forces parser to save and reload its state often. This can be avoided if we pushed the fragments directly from mp_row_consumer, then we would return proceed::no only when the buffer fills up. tests/perf/perf_fast_forward shows 15% increase in throughput of a large partition scan, from 1.34M frag/s to 1.55M frag/s. Message-Id: <1490882700-22684-1-git-send-email-tgrabiec@scylladb.com>	2017-04-05 18:10:54 +03:00
Avi Kivity	5b530aa464	Merge "Use promoted index for skipping in sstable mutation readers" from Tomasz "sstable_streamed_mutation::fast_forward_to() is changed to use promoted index (via index_reader) to optimize skipping in large partitions. In addition to that, sstable mutation_reader is changed to use the index to skip to the next partition. Performance impact was evaluated using newly added tests/perf/perf_fast_forward What's beyond this series: - Using index_reader for single-partition reads as well - Using index_reader for skipping across ranges in clustering restrictions" * tag 'tgrabiec/skip-within-partition-using-index-v2' of github.com:cloudius-systems/seastar-dev: (47 commits) tests: Add performance test for fast forwarding of sstable readers tests: Allow starting cql_test_env on pre-existing data config: Allow specifying source when setting value tests: sstable: Add test for fast forwarding within partition using index sstables: sstable_streamed_mutation: use index in fast_forward_to() sstables: Store parsed promoted index in index_entry sstables: Add trace-level logging for sstable consumption sstables: Define deletion_time earlier sstables: Make parsing throw exception on malformed promoted index tests: Add tests for ordering of position_in_partition relative to composites position_range: Introduce all_clustered_rows() factory method position_in_partition: Introduce for_key()/after_key() factory methods position_in_partition: Add factory methods for positions around all rows position_in_partition: Introduce for_range_start()/for_range_end() position_in_partition: Fix friendship declaration keys: Introduce is_empty() for prefixes position_in_partition: Make comparable with composites types: Enhance lexicographical comparators compound_compat: Accept marker value in serialize_value() compound_compat: Add trichotomic comparator ...	2017-03-29 19:01:12 +03:00
Raphael S. Carvalho	023031b0c8	compaction: lcs: fix functionality to feed starved levels quick introduction to level starvation: high levels may be left uncompacted (thus starved) for a long time if the user does something that leaves them with little data, such as cleanup or a change of max sstable size (default 160M). Leveled strategy handles this problem as follows: consider we're compacting L1 to L2. If L3 is starved, we look for one of its sstables that is fully contained in the token range of candidates L1->L2, so that we won't end up with an overlap in L2. now the problem: the functionality isn't working properly now because the range of candidates is being incorrectly calculated due to an accident when converting the code to C++. It won't cause an overlap because it's actually being more restrictive about which sstable from the starved level can be used. A test case was added to confirm the problem. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170328223753.15398-1-raphaelsc@scylladb.com>	2017-03-29 18:59:46 +03:00
Tomasz Grabiec	3fbc0bed6e	sstables: sstable_streamed_mutation: use index in fast_forward_to()	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5b36976bf0	sstables: Store parsed promoted index in index_entry	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	a2a8312c78	sstables: Add trace-level logging for sstable consumption	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5af815bf20	sstables: Define deletion_time earlier	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5e34743882	sstables: Make parsing throw exception on malformed promoted index Will be easier to propagate failure to upper layers once parsing is reused in the index_reader. The old behavior of ignoring parsing failures is preserved, but the error is logged now.	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	18a057aa81	compound_compat: Return composite from serialize_value() To make the code more type-safe. Also, mark constructor from bytes explicit.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	123b102dd6	sstables: Skip to next partition using index Slicing front of a very large partition: Before: offset 0, read 1, time 0.110960 s, frags 1, frag/s 9, aio 992, KiB 126956, blocked 924, dropped 0, cpu 92.4%. After: offset 0, read 1, time 0.000784 s, frags 1, frag/s 1276, aio 3, KiB 344, blocked 2, dropped 1, cpu 37.3%.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	a9252dfc58	sstables: Use separate index readers for lower and upper bounds So that lower bound can be advanced within the range.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	aad943523a	sstables: index_reader: Add trace-level logging	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	388315c1ff	sstables: Expose index metrics	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	1dbd2e239e	sstables: index_reader: Share index lists among other index readers Direct motivation for this is to be able to use two index readers from a single mutation reader, one for lower bound of the range and one for the upper bound of the range, without sacrificing optimization of avoiding index reads when forwarding to partition ranges which are close by. After the change, all index readers of given sstable will share index buffers, so lower bound reader can reuse the page read by the upper bound reader. The reason for using two readers will be so that we are able to skip inside the partition range, not only outside of it. This is not possible if we use the same index reader to locate the upper bound of the range, because we may only advance the cursor.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	0635d74e17	sstables: Make index_entry copyable Needed to make the index_list copyable, which is going to be needed to implement legacy get_index_entries() which returns by value, after index sharing is implemented.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	e36979da47	sstables: index_reader: Use sstable's schema Makes for a simpler interface.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	e3e2f037bb	sstables: index_reader: Refactor around the concept of a cursor Index reader already can be queried only with monotonic positions, so the concept of a cursor is ingrained. Making it explicit will make it easier to define behavior for forwarding within the partition. After the change: - lower_bound() is renamed to advance_to() and doesn't return the position, only advances the cursor - data file position for partition under cursor can be obtained at any time with data_file_position()	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	27862fa8f6	sstables: index_reader: Narrow down summary range during lookup Positions passed to lower_bound() must be non-decreasing, so summary indexes must be non-decreasing as well.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	02ace99798	sstables: index_reader: Change lookup to work on ring_position_view In preparation for changing the interface to work not only with ranges.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	5edb427873	sstables: Remove private constructor To reduce duplication.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	705bd6da1a	sstables: Remove unused method	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	d5e704ca1e	sstables: Make key_view constructor from bytes_view explicit	2017-03-28 18:10:39 +02:00
Raphael S. Carvalho	6b6bb38f38	compaction_manager: stop manager after storage io error Manager will stop itself if a compaction fails due to storage io error, which unconditionally results in stop of transportation services. Fixes #2147. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170316054538.23423-1-raphaelsc@scylladb.com>	2017-03-16 10:37:47 +02:00
Tomasz Grabiec	1f1b516b31	sstables: Remove use of forwarding wrapper	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d7afab21e7	sstables: Implement sstable_streamed_mutation::fast_forward_to() Handling of forwarding is done inside mp_row_consumer, because it allows us to filter out irrelevant data sooner and thus more efficiently. Because static row can be now skipped as well, _skip_clustering_row was renamed to more generic _skip_in_progress.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	4750216387	sstables: Extract and use clustering_ranges_walker Extracted from mp_row_consumer.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	124dde30db	sstables: Extract writer parameters into config objects Also enables users to change the default promoted index block size.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	01374c41f2	sstables: Move workaround for out-of-order range tombstones to mp_row_consumer This is a preliminary step before adding support for fast-forwarding to mp_row_consumer, so that range handling can be solely in mp_row_consumer rather than split between it and sstable_streamed_mutation. This also alleviates #2080 by reading all tombstones only up to the first row, after that range tombstones are treated like other fragments.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d41a7c5eb4	sstables: Drop default mp_row_consumer constructor	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	56f1ad7841	sstables: Swap order of values in "proceed" so that "no" is assigned 0	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	084747b1ee	sstables: streamed_mutation: Stop reading when end of slice reached As part of this change, skip detection is refactored. This simplifies reasoning about mp_row_consumer's state a bit because now is_mutation() is not reset externally and only depends on current position of the reader. It will prove useful when we extend mutation reader to decide if it should skip to the next partition up front before calling _context.read(), so that we can for instance skip using index instead. Fixes #2088.	2017-03-10 14:42:19 +01:00
Tomasz Grabiec	55358cacc5	sstables: Switch is_in_range() to position_in_partition Makes it immune to #1446 and is a prerequisite for implementing forwarding in mp_row_consumer.	2017-03-09 21:15:11 +01:00
Nadav Har'El	506e074ba4	sstable decompression: fix skip() to end of file The skip() implementation for the compressed file input stream incorrectly handled the case of skipping to the end of file: In that case we just need to update the file pointer, but not skip anywhere in the compressed disk file; In particular, we must NOT call locate() to find the relevant on-disk compressed chunk, because there is none - locate() can only be called on actual positions of bytes, not on the one-past-end-of-file position. Fixes #2143 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170308100057.23316-1-nyh@scylladb.com>	2017-03-08 12:35:05 +02:00
Avi Kivity	1b5ba63676	sstable: fix unhandled exception in atomic_deletion_manager::delete_atomically() The current code is asymmetric: the first N-1 shards to delete a set receive a synthetic future to wait on, while the last deletion receives the result of the delete operation (which also broadcasts completion to the first N-1 operations). This results, in case of an error, with the Nth future being reported as an unhandled error. Fix by making everything symmetric: all N callers receive a synthetic future. Nobody waits for the deletion operation (which still broadcasts its completion to all waiters, so errors are not lost). Message-Id: <20170305151607.14264-1-avi@scylladb.com>	2017-03-07 12:41:12 +02:00
Paweł Dziepak	5d66031b7a	sstable: make input_stream_history initializers in-class sstable has two constructors but only one of them was creating input stream history objects. Message-Id: <20170227151734.16928-1-pdziepak@scylladb.com>	2017-02-28 09:22:11 +01:00
Paweł Dziepak	0198d8e470	Merge "Introduce streamed_mutation::fast_forward_to()" from Tomasz "This introduces an API which allows forward navigation in a stream of mutation fragments. It allows one to consume only a subset of the stream by iteratively specifying sub-ranges from which fragments should be returned. API outline: When in forwarding mode, the stream does not return all fragments right away, but only those belonging to the current range. Initially current range only covers the static row. The stream can be forwarded, even before reaching end-of-stream for current range, to a later range with fast_forward_to(). Forwarding doesn't change initial restrictions of the stream, it can only be used to skip over data. Monotonicity of positions is preserved by forwarding. That is fragments emitted after forwarding will have greater positions than any fragments emitted before forwarding. For any range, all range tombstones relevant for that range which are present in the original stream will be emitted. Range tombstones emitted before forwarding which overlap with the new range are not necessarily re-emitted. When not in forwarding mode, the stream acts as if the current range was equal to the full range. This implies that fast_forward_to() cannot be used. Whether stream is in forwarding mode or not is specified when the stream is created, typically via mutation_source interface. What's left for later series: Optimization by providing specialized implementations. This series implements forwarding support in all mutation sources via generic wrapper which simply drops fragments." * tag 'tgrabiec/clustering-fast-forward-to-v2' of github.com:scylladb/seastar-dev: tests: mutation_source_tests: Verify monotonicity of positions tests: random_mutation_generator: Spread the keys more tests: mutation_source_test: Make blobs more easily distinguishable tests: streamed_mutation: Test that merged stream passes mutation source tests tests: mutation_source_test: Add tests for forwarding of streamed_mutation tests: streamed_mutation_assertions: Add methods for navigating the stream tests: Add range generators to random_mutation_generator partition_slice_builder: Add with_ranges() query: Introduce full_clustering_range streamed_mutation: Add non-owning variant of mutation_from_streamed_mutation() db: Enable creating forwardable readers via mutation_source mutation_source: Document liveness requirements mutation_source: Cleanup db: Replace virtual_reader_type with mutation_source_opt partition_version: Refactor make_partition_snapshot_reader() overloads database: Fix mutation_source created by as_mutation_source() to not ignore trace_state_ptr memtable: Accept all mutation_source parameters streamed_mutation: Implement fast_forward_to() in stream merger streamed_mutation: Add generic implementation of forwardable streamed_mutation streamed_mutation: Add fast_forward_to() API position_in_partition: Introduce position_range position_in_partition: Introduce position constructor for right after the static row streamed_mutation: Make cast to view non-explicit streamed_mutation: Make schema() getter non-copying	2017-02-24 10:37:51 +00:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Gleb Natapov	0977f4fdf8	sstable: close sstable_writer's file if writing of sstable fails. Failing to close a file properly before destroying file's object causes crashes. [tgrabiec: fixed typo] Message-Id: <20170221144858.GG11471@scylladb.com>	2017-02-21 18:17:47 +01:00
Tomasz Grabiec	33457cc9a9	sstables: Fix detection of repeated tombstones The check was not catching range tombstone repeated immediately after itself. Message-Id: <1487596098-17409-1-git-send-email-tgrabiec@scylladb.com>	2017-02-20 15:35:15 +00:00
Tomasz Grabiec	cc439df542	Revert "sstables: Simplify sstable_streamed_mutation::read_next()" This reverts commit `1e2c01ff49`. We do not detect repeated tombstone if it follows an in-range tombstone following a skipped clustering row, because _in_progress will be disengaged after such tombstone is emitted. Message-Id: <1487596080-21480-1-git-send-email-tgrabiec@scylladb.com>	2017-02-20 15:34:58 +00:00
Raphael S. Carvalho	53d9008052	sstables/deletion_manager: kill dead code Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <2b24d9e622238030a737fbbe12b8439853d5d075.1487095059.git.raphaelsc@scylladb.com>	2017-02-16 18:38:54 +02:00
Tomasz Grabiec	1e2c01ff49	sstables: Simplify sstable_streamed_mutation::read_next() mp_row_consumer doesn't split row fragments on repeated range tombstones any more.	2017-02-13 16:12:16 +01:00
Tomasz Grabiec	6324876f24	sstables: Emit only relevant range tombstones	2017-02-13 16:12:16 +01:00
Paweł Dziepak	354ce0b2c7	mutation_fragment: make write access more explicit mutation_fragments are going to be caching their size in memory. In order to be able to invalidate that correctly, they need to know when that size may change (but avoid invalidation when it is not necessary).	2017-02-09 10:49:46 +00:00
Paweł Dziepak	83c6fc1114	sstables: write counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Tomasz Grabiec	6c75614d19	sstables: Fix input_stream not being closed by index_reader Fixes #2022 Message-Id: <1484912679-5729-1-git-send-email-tgrabiec@scylladb.com>	2017-01-20 11:58:33 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00

850 Commits