scylladb

Author	SHA1	Message	Date
Piotr Jastrzebski	ea449c9cce	Replace sstables::mutation_reader with ::mutation_reader This will make migration to flat_mutation_reader much easier and sstables::mutation_reader is going away with this migration anyway. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-11-15 10:40:01 +01:00
Raphael S. Carvalho	1f478d5daa	tests: enable twcs test that relied on size-tiered properties Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-11-14 13:27:27 -02:00
Raphael S. Carvalho	8165af1d08	twcs: respect stcs options by forwarding them to stcs method Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-11-14 13:27:27 -02:00
Raphael S. Carvalho	9cdc047a4c	lcs: forward stcs options to respect them Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-11-14 13:27:27 -02:00
Raphael S. Carvalho	d8ec913c34	stcs: make most_interesting_bucket respect thresholds Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-11-14 13:26:04 -02:00
Raphael S. Carvalho	cb6d060d8e	compaction: make size_tiered_most_interesting_bucket static method of stcs class Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-11-14 13:24:03 -02:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Raphael S. Carvalho	67c5c8dc67	sstables: do not recompute shards for all tables after each compaction For every finished compaction, we were calculating shards for all existing tables. With ignore_msb set to 0, it's probably not a big deal, but if ignore_msb is like 12 and LCS is used (meaning thousands of tables possibly), the operation may stall the reactor for a considerable amount of time. That's fixed by caching shards. Fixes #2875. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171011053424.22308-1-raphaelsc@scylladb.com>	2017-10-11 11:45:01 +03:00
Botond Dénes	046a1f9b05	sstables: Get rid of [[deprecated]] index_reader::get_index_entries() Change test code (the only consumers) to read index by partitions. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <b6111e92b5e0729bfa2e76fd848215804174067a.1507297154.git.bdenes@scylladb.com>	2017-10-08 12:18:52 +03:00
Raphael S. Carvalho	e34c1db642	db: update compaction history outside the sstable write lock The reason to do that is because compaction can deadlock if refresh disables write which waits for compaction, and compaction in turn waits for dirty memory[1] that would be released by memtable write. Dirty memory manager for non-system cfs was being used for system cfs, which was useful for exposing this problem. [1]: when updating compaction history. Fixes #2769. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170918215238.9810-2-raphaelsc@scylladb.com>	2017-09-26 19:51:12 +02:00
Raphael S. Carvalho	1524426deb	sstables: Fix compaction correctness of higher-level tables When incremental_reader_selector is used for compaction, it will first call incremental selector of partitioned sstable set with minimum token that will result in first interval being skipped, which means not everything being compacted. The interval is skipped because iterator is incorrectly advanced when token lies before it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170918021446.15920-1-raphaelsc@scylladb.com>	2017-09-19 09:59:30 +03:00
Avi Kivity	f7023501d6	treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable> Since shared_sstable is going to be its own type soon, we can't use the old alias.	2017-09-12 10:43:05 +03:00
Paweł Dziepak	2b614201a7	tests/sstables: add storage_service_for_tests to counter write test Writing counters to an sstable is going to require cluster feature information, which requires accessing some singletons.	2017-09-05 10:32:48 +01:00
Paweł Dziepak	5007c9290a	tests/sstables: add test for reading wrong-order counter cells	2017-09-05 10:32:48 +01:00
Raphael S. Carvalho	050a7019b8	sstables/index_reader: fix index reader for summary entry spanning lots of keys quantity prevents index_reader from reading all index entries of a summary entry that spans more than min_index_interval entries. That can happen after introduction of size-based sampling, and consequently, sstable will not be able to return a key whose logical position in summary entry is beyond min_index_interval. It's ok to not use quantity because index_reader will read all indexes until either next summary entry or end of file is reached. Fixes test_sstable_conforms_to_mutation_source Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170812045821.25269-1-raphaelsc@scylladb.com>	2017-08-12 09:44:16 +03:00
Raphael S. Carvalho	5124f94358	tests: test summary entry spanning more keys than min interval Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 01:37:06 -03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we want to do is to also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satisfy the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Botond Dénes	9ee9988097	Add combined_mutation_reader_test unit test	2017-08-10 12:38:10 +03:00
Botond Dénes	94fc550e68	sstable_set::incremental_selector: select() now returns a selection A selection contains - in addition to the list of sstables - a next_token which is a hint as to what is the next best token to call select() with. This should be the smallest token such that at the next call to select() the least number of new sstables will be returned, without skipping any.	2017-08-09 16:27:33 +03:00
Avi Kivity	c21bb5ae05	tests: fix sstable_datafile_test build with boost 1.55 Boost 1.55 accidentally removed support for "range for" on recursive_directory_iterator (previous and latter versions do support it). Use old-style iteration instead. Message-Id: <20170724080128.8824-1-avi@scylladb.com>	2017-07-24 11:20:12 +03:00
Tomasz Grabiec	a9237c1666	schema: Revert back to the 1.7 layout of static compact tables in memory We are using C* 3.x compatible layout in schema tables but want to keep using the 1.7 layout in memory for compatibility during rolling upgrade. This patch switches the schema and schema_builder classes back to the old layout. Translation of layout happens when converting to/from schema mutations. Notable changes: 1) Includes a revert of commit `6260f31e08` "thrift: Update CQL mapping of static CFs". 2) Brings back the "default_validation_class" schema attribute. In v3 it can be derived from column definitions, but in v2 it can't, so we have to store it. 3) legacy_schema_migrator and schema_builder don't have to do conversions to v3, this is now handled by the v3_columns class. schema_builder works with the same layout as schema, that is v2. 4) Includes a revert of commit `66991a7ccb` "v3 schema test fixes" Fixes #2555.	2017-07-19 09:52:15 +02:00
Raphael S. Carvalho	c55c63f213	tests: add tests for time window compaction strategy Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-19 02:58:37 -03:00
Avi Kivity	9116dd91cb	tests: copy the sstable with an unknown component to the data directory We will be creating links to those sstable's files, and those don't work if the data directory and the test sstable are on different devices. Copying the files to the same directory fixes the problem. Message-Id: <20170716090405.14307-1-avi@scylladb.com>	2017-07-16 11:55:00 +02:00
Avi Kivity	4704a78332	tests: remove bad constexpr in sstable_datafile_test std::ceil() is not constexpr. Found by clang.	2017-07-12 17:14:13 +03:00
Raphael S. Carvalho	8334086441	lcs: remove quadratic behavior from L0 compaction L0 compaction triggers quadratic behavior when many newly created sstables are needed for promotion due to their size being relatively low to max sstable size parameter. So until L0 is worth promoting, the strategy will compact every new sstable with all the existing ones in L0. To fix it, let's do STCS on level 0 until it becomes worth promoting. Fixes #2432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-11 09:35:35 -03:00
Avi Kivity	7b4412c3ce	Revert "Merge "improvements for leveled strategy manifest" from Raphael" This reverts commit `43a3e718e6`, reversing changes made to `3813e94b0a`. It contains some unrelated commits.	2017-07-11 11:12:53 +03:00
Raphael S. Carvalho	28ebe1807f	lcs: remove quadratic behavior from L0 compaction L0 compaction triggers quadratic behavior when many newly created sstables are needed for promotion due to their size being relatively low to max sstable size parameter. So until L0 is worth promoting, the strategy will compact every new sstable with all the existing ones in L0. To fix it, let's do STCS on level 0 until it becomes worth promoting. Fixes #2432. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-10 15:42:28 -03:00
Raphael S. Carvalho	7f7758fb6f	tests/sstable: make sstable_expired_data_ratio more robust this change will stress histogram ability to return a good estimation after merging keys such that it doesn't grow beyond size limit. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170708205713.5958-1-raphaelsc@scylladb.com>	2017-07-09 10:33:10 +03:00
Raphael S. Carvalho	b350352e6c	compaction: keep only one variant of size_tiered_most_interesting_bucket two variants of size_tiered_most_interesting_bucket existed to avoid copy, but subsequent work will make lcs use vector for each level of sstables, so let's only keep one variant. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-07-04 03:34:51 -03:00
Avi Kivity	5883e85da3	Merge "improve maintainability of compaction strategies" from Raphael "compaction_strategy.cc keeps the full implementation of size tiered, major, and null strategies, and partial implementation of leveled and date tiered strategies. It's a mess. In the future, we will also need space for time window strategy. The file is hard to read and maintain. My goal here is to improve maintainability of the strategies by putting each of them into its own header. NOTE: No semantic change is introduced here." * 'improve_compaction_strategy_maintainability' of github.com:raphaelsc/scylla: compaction_strategy: move dtcs to its existing header compaction_strategy: move lcs implementation to its own header compaction_strategy: move stcs implementation to its own header compaction_strategy: move compaction_strategy_impl to its own header	2017-07-03 11:39:30 +03:00
Avi Kivity	6895f6e603	sstable_datafile_test: fix sstable_expired_data_ratio failure A comment states that we want the file to be old enough, but sets a timestamp of max(), which is in the future. This may have passed because the conversion from numeric_limits<time_t>::max() to db_clock::time_point is not well defined (their dynamic range is different), so truncation may have converted the large number to a low one. Message-Id: <20170702082903.20879-1-avi@scylladb.com>	2017-07-02 20:22:51 +02:00
Raphael S. Carvalho	69a9ad468c	compaction_strategy: move dtcs to its existing header Goal is to improve maintainability. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-30 03:50:09 -03:00
Raphael S. Carvalho	ab335c8085	tests: more testing for tombstone compaction options Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	ce4dc15a20	tests: basic tombstone compaction test for date tiered Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	c400bf97b9	tests: basic test of tombstone compaction with lcs Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	138fda468f	tests: basic tombstone compaction test for size tiered Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	ad24470972	tests: add test for estimation of droppable tombstone ratio Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:43:08 -03:00
Raphael S. Carvalho	c01c659594	tests: add test for sstable with bad tombstone histogram Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	7b532867ce	tests: add sstable tombstone histogram test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:28 -03:00
Raphael S. Carvalho	fb9bc609c6	streaming_histogram: do not limit it to be used by sstables streaming histogram will later be placed in /utils, so we want it to use std::unordered_map<> instead of disk_hash<>. That also requires implementing serialization/deserialization functions for it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-27 16:51:52 -03:00
Avi Kivity	555621b537	Disentangle memtables from sstables Remove sstable::write_components(memtable), replacing it with a helper. Fixes #2354 Message-Id: <20170624142639.16662-1-avi@scylladb.com>	2017-06-26 09:37:11 +02:00
Raphael S. Carvalho	4bb27cbd6f	lcs: actually prefer oldest sstables of L0 when it falls behind Strategy prefers promoting oldest sstables in L0. Because sort procedure is incorrectly sorting elements in descending order, newest sstables will be promoted first if and only if L0 falls behind (more than 32 sstables). If L0 doesn't fall behind, we'll have all L0 sstables compacted with overlapping ones in L1. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-19 20:45:39 -03:00
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now also accept the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition and do not support fast_forward_to() a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
Nadav Har'El	317d7fc253	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now also accept the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition and do not support fast_forward_to() a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170614072122.13473-1-nyh@scylladb.com>	2017-06-15 13:22:46 +01:00
Tomasz Grabiec	f3a6d94398	sstables: Introduce sstable::as_mutation_source() Adaptors extracted from existing testing code. Message-Id: <1495729508-30081-1-git-send-email-tgrabiec@scylladb.com>	2017-05-25 19:30:20 +03:00
Calle Wilund	66991a7ccb	v3 schema test fixes	2017-05-10 16:44:48 +00:00
Duarte Nunes	65d96421da	tests/sstable_datafile_test: Fix regression This patch fixes a regression introduced in `9e88b60`, where the wrong clustering key was being specified. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170509091621.2682-1-duarte@scylladb.com>	2017-05-09 12:18:47 +03:00
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Raphael S. Carvalho	8b0e358d73	tests/sstable_test: fix release-mode compaction_manager_test in release mode, compaction task is active after submitting request because ready future may be scheduled immediately. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170502171925.9893-1-raphaelsc@scylladb.com>	2017-05-02 20:48:30 +03:00

157 Commits
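Commit d8ec913c34 makes most_interesting_bucket respect the size-tiered thresholds. A minimal C++ sketch of that idea follows. The sst and bucket types are hypothetical stand-ins, and the policy used here (skip buckets below min_threshold, trim each remaining bucket to its max_threshold smallest sstables, then prefer the bucket with the most sstables) is an assumption, not necessarily ScyllaDB's exact selection rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an sstable: only its on-disk size matters here.
struct sst {
    uint64_t size;
};

using bucket = std::vector<sst>;

// Sketch: pick the bucket to compact while honouring min/max thresholds.
bucket most_interesting_bucket(std::vector<bucket> buckets,
                               size_t min_threshold, size_t max_threshold) {
    bucket best;
    for (auto& b : buckets) {
        if (b.size() < min_threshold) {
            continue; // too few sstables to be worth compacting
        }
        std::sort(b.begin(), b.end(),
                  [](const sst& x, const sst& y) { return x.size < y.size; });
        if (b.size() > max_threshold) {
            b.resize(max_threshold); // never compact more than max_threshold at once
        }
        if (b.size() > best.size()) {
            best = b; // assumed policy: prefer the bucket with the most sstables
        }
    }
    return best; // empty when no bucket reached min_threshold
}
```

Returning an empty bucket, rather than a bucket below min_threshold, is what lets a caller treat "nothing qualifies" as "no compaction needed".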
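Commit 67c5c8dc67 stops recomputing shards for every sstable after each compaction by caching them. The caching pattern can be sketched with a lazily filled std::optional. shard_cache and its compute callback are hypothetical names; the callback stands in for the real, ignore_msb-dependent shard computation.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch: compute an sstable's owning shards once, then reuse
// the cached result on every later call (e.g. after each compaction).
class shard_cache {
public:
    using compute_fn = std::function<std::vector<unsigned>()>;

    explicit shard_cache(compute_fn compute) : _compute(std::move(compute)) {}

    const std::vector<unsigned>& shards() {
        if (!_cached) {
            _cached = _compute(); // the expensive part, paid only on first use
        }
        return *_cached;
    }

private:
    compute_fn _compute;
    std::optional<std::vector<unsigned>> _cached;
};
```

Because an sstable's key range never changes after it is written, the cached value never needs invalidation in this sketch.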
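Commits 94fc550e68 and 1524426deb concern sstable_set::incremental_selector: select() returns a selection carrying a next_token hint, and a token that lies before an interval must not cause that interval to be skipped. The sketch below models a partitioned set as sorted, non-overlapping token intervals. token_interval, selection and select_sstables are hypothetical, and the linear scan replaces whatever iterator logic the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

using token = int64_t;

// Hypothetical: one non-overlapping token interval and the sstables owning it.
struct token_interval {
    token start;
    token end; // inclusive
    std::vector<std::string> sstables;
};

// Selected sstables plus a hint for the next select() call.
struct selection {
    std::vector<std::string> sstables;
    token next_token;
};

constexpr token max_token = std::numeric_limits<token>::max();

// Sketch; `intervals` must be sorted by start and non-overlapping.
selection select_sstables(const std::vector<token_interval>& intervals, token t) {
    for (const auto& i : intervals) {
        if (t < i.start) {
            // t falls before this interval: select nothing and point the hint
            // at the interval's start instead of advancing past it.
            return {{}, i.start};
        }
        if (t <= i.end) {
            // t is inside this interval; the selection changes only after its end.
            return {i.sstables, i.end == max_token ? max_token : i.end + 1};
        }
    }
    return {{}, max_token}; // beyond the last interval: nothing left to select
}
```

Starting from the minimum token, as compaction does, the first call yields an empty selection whose hint is the first interval's start, so no interval is lost.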
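Commit 050a7019b8 stops index_reader from reading a fixed quantity of index entries per summary entry and instead reads until the next summary entry or the end of the file. Assuming each summary entry records the Index.db position where its entries begin, the byte range to read can be computed as below. index_range_for_summary_entry is a hypothetical helper, not a ScyllaDB function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch: [begin, end) byte range of the index file covered by summary entry i.
// No fixed entry count is assumed, so an entry that spans more than
// min_index_interval index entries (possible with size-based sampling) is
// still read completely.
std::pair<uint64_t, uint64_t> index_range_for_summary_entry(
        const std::vector<uint64_t>& summary_positions, size_t i,
        uint64_t index_file_size) {
    uint64_t begin = summary_positions[i];
    uint64_t end = (i + 1 < summary_positions.size())
                       ? summary_positions[i + 1]  // up to the next summary entry
                       : index_file_size;          // last entry: up to end of file
    return {begin, end};
}
```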
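Commits 28ebe1807f and 8334086441 make leveled compaction run size-tiered compaction inside L0 until L0 is worth promoting. A decision sketch follows, under the assumption (not stated in the commits) that "worth promoting" means the L0 sstables together hold at least max_sstable_size bytes. l0_action and choose_l0_action are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

enum class l0_action { size_tiered, promote_to_l1 };

// Assumed criterion: promote once L0 holds at least one max_sstable_size of
// data. Below that, promoting would recompact each small new sstable with all
// existing L0 sstables (the quadratic behavior), so STCS is used within L0.
l0_action choose_l0_action(const std::vector<uint64_t>& l0_sizes,
                           uint64_t max_sstable_size) {
    uint64_t total = std::accumulate(l0_sizes.begin(), l0_sizes.end(), uint64_t{0});
    return total >= max_sstable_size ? l0_action::promote_to_l1
                                     : l0_action::size_tiered;
}
```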
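Commit fb9bc609c6 decouples streaming_histogram from sstables, and commit 7f7758fb6f tests that the histogram stays within its size limit by merging keys. Below is a minimal sketch of a bounded streaming histogram that, when it exceeds max_bins, merges its two closest bins into their count-weighted mean (the Ben-Haim/Tom-Tov scheme). The class is hypothetical, and it uses std::map rather than the std::unordered_map the commit mentions, because sorted keys make finding the closest adjacent pair a simple scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>

class streaming_histogram {
public:
    // At least one bin is required so that merging always has two bins to pick.
    explicit streaming_histogram(size_t max_bins)
        : _max_bins(max_bins < 1 ? 1 : max_bins) {}

    void update(double point, uint64_t count = 1) {
        _bins[point] += count;
        while (_bins.size() > _max_bins) {
            merge_closest();
        }
    }

    size_t size() const { return _bins.size(); }

    uint64_t total() const {
        uint64_t sum = 0;
        for (const auto& bin : _bins) sum += bin.second;
        return sum;
    }

private:
    // Merge the adjacent pair with the smallest gap; total count is preserved.
    void merge_closest() {
        auto best = _bins.begin();
        double best_gap = std::numeric_limits<double>::infinity();
        for (auto it = _bins.begin(); std::next(it) != _bins.end(); ++it) {
            double gap = std::next(it)->first - it->first;
            if (gap < best_gap) { best_gap = gap; best = it; }
        }
        auto second = std::next(best);
        uint64_t c = best->second + second->second;
        double p = (best->first * best->second + second->first * second->second) / c;
        _bins.erase(best);
        _bins.erase(second);
        _bins[p] += c;
    }

    size_t _max_bins;
    std::map<double, uint64_t> _bins;
};
```

Keys are merged rather than dropped, so counts (and hence estimates such as droppable-tombstone ratios) stay consistent even after many updates.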
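Commit 4bb27cbd6f fixes a sort that ordered L0 sstables in descending order, so the newest rather than the oldest were promoted when L0 fell behind. A sketch of the intended ordering follows, assuming age is measured by a hypothetical max_timestamp field and that at most n candidates are taken; l0_sstable and oldest_candidates are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical L0 candidate: only its newest data timestamp matters here.
struct l0_sstable {
    int64_t max_timestamp;
};

// Oldest first means ascending order; a `>` comparator would yield newest
// first, the inverted ordering the commit describes.
std::vector<l0_sstable> oldest_candidates(std::vector<l0_sstable> l0, size_t n) {
    std::sort(l0.begin(), l0.end(), [](const l0_sstable& a, const l0_sstable& b) {
        return a.max_timestamp < b.max_timestamp;
    });
    if (l0.size() > n) {
        l0.resize(n); // keep only the n oldest
    }
    return l0;
}
```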
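Commits 317d7fc253 and 3018df11b5 choose the stream end (last_end) passed to data_consume_rows() based on mutation_reader::forwarding. The rule they describe reduces to the function below; forwarding and choose_last_end are simplified, hypothetical stand-ins rather than the real ScyllaDB types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for mutation_reader::forwarding.
enum class forwarding { no, yes };

// With forwarding::no no fast_forward_to() can follow, so the input stream
// (and its read-ahead) may stop at the range's end. With forwarding::yes the
// stream stays open to end of file so a later fast_forward_to() can move on.
uint64_t choose_last_end(uint64_t range_end, uint64_t file_size, forwarding fwd) {
    return fwd == forwarding::no ? range_end : file_size;
}
```

In the commit's terms, repair's make_streaming_reader takes the forwarding::no branch, which is where the reduced I/O comes from.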