scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Avi Kivity	672de608bf	tests: fix call to seastar::sleep() It's not in the global namespace.	2017-06-22 18:16:13 +03:00
Raphael S. Carvalho	4bb27cbd6f	lcs: actually prefer oldest sstables of L0 when it falls behind Strategy prefers promoting oldest sstables in L0. Because sort procedure is incorrectly sorting elements in descending order, newest sstables will be promoted first if and only if L0 falls behind (more than 32 sstables). If L0 doesn't fall behind, we'll have all L0 sstables compacted with overlapping ones in L1. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-19 20:45:39 -03:00
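The ordering problem described in this commit can be illustrated with a minimal sketch. The names `sstable_info`, `generation` and `pick_oldest` are hypothetical and are not Scylla's own types; the sketch assumes that a lower generation number means an older sstable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an sstable; a lower generation means an older sstable.
struct sstable_info {
    std::int64_t generation;
};

// Returns up to max_count of the oldest candidates.
std::vector<sstable_info> pick_oldest(std::vector<sstable_info> candidates, std::size_t max_count) {
    // Ascending order puts the oldest sstables first. A descending
    // comparator here would select the newest ones instead, which is
    // the kind of mistake the commit message describes.
    std::sort(candidates.begin(), candidates.end(),
              [](const sstable_info& a, const sstable_info& b) {
                  return a.generation < b.generation;
              });
    if (candidates.size() > max_count) {
        candidates.resize(max_count);
    }
    return candidates;
}
```

With candidates of generations 5, 1, 3 and 2 and `max_count` of 2, the sketch keeps generations 1 and 2.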
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition, and do not support fast_forward_to() a different partition range. 
We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
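The rule this commit message gives for choosing last_end can be sketched as below. This is only an illustration: the `forwarding` enum and `choose_last_end` are hypothetical stand-ins, not the actual signatures in the sstable reader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the mutation_reader::forwarding flag.
enum class forwarding { no, yes };

// Picks the byte position up to which the disk stream is opened.
std::uint64_t choose_last_end(forwarding fwd, std::uint64_t range_end, std::uint64_t file_end) {
    assert(range_end <= file_end);
    // Without forwarding, nothing past the range is ever needed, so the
    // stream (and its read-ahead) can stop at the range's end. With
    // forwarding, the reader may later be moved to another partition
    // range, so the stream is opened up to the end of the file.
    return fwd == forwarding::no ? range_end : file_end;
}
```

For a range ending at byte 100 in a 1000-byte file, the sketch returns 100 with `forwarding::no` and 1000 with `forwarding::yes`.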
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
Calle Wilund	3464422051	commitlog_test: Fix reader test dropping rp handles Test wants data in live segments to read from, so should not just drop the handles returned from allocate. Message-Id: <1497344532-2616-1-git-send-email-calle@scylladb.com>	2017-06-16 22:45:46 +01:00
Etienne Kruger	be0a947596	tests: perf_simple_query: Add delete perf test Add a performance test for deletion in addition to the existing update and query tests. The deletion performance test is executed using the '--delete' argument to perf_simple_query. Fixes #2417. Signed-off-by: Etienne Kruger <el@loadavg.io> Message-Id: <20170615232500.26987-1-el@loadavg.io>	2017-06-16 14:51:00 +01:00
Nadav Har'El	317d7fc253	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As a result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition, and do not support fast_forward_to() a different partition range. 
We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170614072122.13473-1-nyh@scylladb.com>	2017-06-15 13:22:46 +01:00
Duarte Nunes	5736468a71	mutation_partition_serializer: Assume range tombstone support Range tombstones were introduced in version 1.3 and there exists no direct upgrade from 1.2 to vnext, so we can retire the code enforcing backwards compatibility. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170614211654.82501-1-duarte@scylladb.com>	2017-06-15 09:54:05 +03:00
Avi Kivity	e11f1c9cc3	tests: fix partitioner_test build on gcc 5	2017-06-14 17:22:01 +03:00
Avi Kivity	419ad9d6cb	Merge "repair memory usage fix" from Asias "This series switches repair to use more stream plans to stream the mismatched sub ranges and use a range generator to produce sub ranges. Test shows no huge memory is used for repair with large data set. In addition, we now have a progress reporter in the log how many ranges are processed. Jun 06 14:18:22 [shard 0] repair - Repair 512 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942] Jun 06 14:19:55 [shard 0] repair - Repair 513 out of 529 ranges, id=1, keyspace=myks, cf=mytable, range=(8526136029525195375, 8549482295083869942] Fixes #2430." * tag 'asias/fix-repair-2430-branch-master-v1' of github.com:cloudius-systems/seastar-dev: repair: Remove unused sub_ranges_max repair: Reduce parallelism in repair_ranges repair: Tweak the log a bit repair: Use more stream_plan repair: iterator over subranges instead of list	2017-06-08 14:19:08 +03:00
Calle Wilund	2913241df1	memtable/commitlog: Change bookkeep to track individual segments Use per CF-id reference count instead, and use handles as result of add operations. These must either be explicitly released or stored (rp_set), or they will release the corresponding replay_position upon destruction. Note: this does _not_ remove the replay positioning ordering requirement for mutations. It just removes it as a means to track segment liveness.	2017-06-07 12:07:01 +00:00
Calle Wilund	0c598e5645	commitlog_test: Fix test_commitlog_delete_when_over_disk_limit Test should a.) Wait for the flush semaphore b.) Only compare segment sets between start and end, not start, end and in between. I.e. the test sort of assumed we started with < 2 (or so) segments. Not always the case (timing). Message-Id: <1496828317-14375-1-git-send-email-calle@scylladb.com>	2017-06-07 12:44:02 +03:00
Nadav Har'El	b3ff37e67f	repair: iterator over subranges instead of list When starting repair, we divided the large token ranges (vnodes) into small subranges of a desired length (around 100 partitions), and built a huge list of those subranges - to iterate over them later and compare checksums of those chunks. However, building this list up-front is completely unnecessary, and wastes a lot of memory: In a test with 1 TB of data, as much as 3 gigabytes was spent on this list. Instead, what we do in this patch is to find the next chunk in a DFS-like splitting algorithm, using only the token range midpoint() function (as before). The amount of memory needed for this is O(logN), instead of O(N) in the previous implementation. Refs #2430. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2017-06-07 08:50:56 +08:00
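The DFS-like splitting this commit describes can be sketched as a lazy generator that keeps only a stack of pending right halves. This is a simplified illustration under stated assumptions, not the repair code itself: `token_range` and `subrange_generator` are hypothetical, tokens are plain integers, and the range length in tokens stands in for the estimated partition count.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical token range (start, end] over plain integers.
struct token_range {
    std::uint64_t start;
    std::uint64_t end;
    std::uint64_t size() const { return end - start; }
    std::uint64_t midpoint() const { return start + (end - start) / 2; }
};

// Lazily yields subranges no larger than max_size, from left to right.
class subrange_generator {
    std::vector<token_range> _stack; // pending right halves; depth is O(log N)
    std::uint64_t _max_size;
public:
    subrange_generator(token_range whole, std::uint64_t max_size)
        : _stack{whole}, _max_size(max_size) {
        assert(max_size > 0);
    }

    // Returns the next subrange, or std::nullopt when the range is exhausted.
    std::optional<token_range> next() {
        if (_stack.empty()) {
            return std::nullopt;
        }
        token_range r = _stack.back();
        _stack.pop_back();
        // Descend into the left half, remembering each right half for later.
        while (r.size() > _max_size) {
            const std::uint64_t mid = r.midpoint();
            _stack.push_back({mid, r.end});
            r = {r.start, mid};
        }
        return r;
    }
};
```

Because only one path from the root of the implicit splitting tree is held at a time, the stack grows with the depth of the splits rather than with the number of subranges produced.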
Vlad Zolotarov	0619c2cb71	utils::serialization: remove not used deserialization_xxx() functions Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495556124-16672-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:20 +03:00
Tomasz Grabiec	f3a6d94398	sstables: Introduce sstable::as_mutation_source() Adaptors extracted from existing testing code. Message-Id: <1495729508-30081-1-git-send-email-tgrabiec@scylladb.com>	2017-05-25 19:30:20 +03:00
Avi Kivity	fd0e1eb1e2	Merge "Fixes for mutation algebra" from Tomasz "Enforces commutativity of addition: m1 + m2 == m2 + m1 and consistency of difference and addition with equality: m1 + (m2 - m1) == m1 + m2" * tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev: mutation: Make compare_*_for_merge() consistent with equals() tests: mutation: Improve assertion failure message tests: Use default equality in test_mutation_diff_with_random_generator mutation: Make counter cell difference consistent with apply tests: range_tombstone_list_test: Improve error message tests: range_tombstone_list: Check adjacent range merging range_tombstone_list: Merge adjacent range tombstones in apply() tests: mutation: Check commutativity of mutation addition range_tombstone_list: Avoid violating set invariant range_tombstone_list: Make tombstone merging commutative range_tombstone_list: Add erase() operation to the reverter range_tombstone_list: Make all undo operations ordered relative to each other utils: Extract to_boost_visitor() to a separate header allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-23 15:20:38 +03:00
Tomasz Grabiec	804f46f684	mutation: Make compare_*_for_merge() consistent with equals() equals() considers expiring cells to be different from non-expiring cells, but compare_row_marker_for_merge() considers them equal. Fix the latter to pick expiring cells. The choice was arbitrary.	2017-05-23 13:35:03 +02:00
Tomasz Grabiec	c1475a8eb2	tests: mutation: Improve assertion failure message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	d15880b3b7	tests: Use default equality in test_mutation_diff_with_random_generator	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	951da421db	tests: range_tombstone_list_test: Improve error message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	bee40b4628	tests: range_tombstone_list: Check adjacent range merging	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	3c509308ab	range_tombstone_list: Merge adjacent range tombstones in apply() Needed for equivalence to work correctly with difference and addition: m1 + (m2 - m1) = m1 + m2 Fixes #2158.	2017-05-23 13:16:03 +02:00
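Why merging adjacent tombstones matters for equality can be shown with a small sketch. It is heavily simplified and hypothetical: the `range_tombstone` below uses integer keys, half-open ranges and a single timestamp, and `merge_adjacent` is not the real range_tombstone_list::apply().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified tombstone covering the half-open key range [start, end).
struct range_tombstone {
    int start;
    int end;
    std::int64_t timestamp;
};

inline bool operator==(const range_tombstone& a, const range_tombstone& b) {
    return a.start == b.start && a.end == b.end && a.timestamp == b.timestamp;
}

// Normalizes a list that is sorted by start and has no overlapping ranges.
std::vector<range_tombstone> merge_adjacent(const std::vector<range_tombstone>& sorted) {
    std::vector<range_tombstone> out;
    for (const range_tombstone& rt : sorted) {
        assert(out.empty() || out.back().end <= rt.start);
        if (!out.empty() && out.back().end == rt.start && out.back().timestamp == rt.timestamp) {
            out.back().end = rt.end; // same deletion, touching ranges: extend the previous one
        } else {
            out.push_back(rt);
        }
    }
    return out;
}
```

Two lists that delete the same keys at the same timestamp but are split at different points compare equal once both are normalized this way, which is the property the equality m1 + (m2 - m1) = m1 + m2 relies on.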
Tomasz Grabiec	ef4c7c458c	tests: mutation: Check commutativity of mutation addition	2017-05-23 12:11:12 +02:00
Tomasz Grabiec	5aeb9eb70c	utils: Extract to_boost_visitor() to a separate header	2017-05-22 19:30:02 +02:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Avi Kivity	c8cb3d6ff5	Merge "Materialized views: bug fixes and unit tests" from Duarte "This series fixes bugs related to materialized views, most pertaining to column filtering in the where clause." * 'materialized-views/bug-fixes/v1' of https://github.com/duarten/scylla: tests/view_schema_test: Add more test cases tests/cql_assertions: Add assertion for row set equality single_column_relation: Correctly print IN relation statement_restrictions: Allow filtering regular columns for views statement_restrictions: Relax clustering restrictions for views statement_restrictions: Relax partition restrictions for views cql3/statements: Prevent setting default ttl on view cql3/restrictions: Complete implementation of is_satisfied_by() db/view: Re-implement clustering_prefix_matches() db/view: Re-implement partition_key_matches() db/view: Generate regular tombstone for base deletions db/view: Consider cell liveness when generating updates db/view: Don't generate view updates for static rows	2017-05-20 13:52:56 +03:00
Avi Kivity	ba31619594	tests: fix partitioner_test for g++ 5 It can't make the leap from dht::ring_position to stdx::optional<range_bound<dht::ring_position>> for some reason.	2017-05-18 13:09:41 +03:00
Avi Kivity	2aa5b3e20c	Merge "Improve perf_fast_forward test" from Tomasz "Notably: - add validation of the results (e.g. fragment count, expectations about disk activity) - add cache-specific tests" * 'tgrabiec/add-cache-tests-to-perf-fast-forward' of github.com:cloudius-systems/seastar-dev: tests: perf_fast_forward: Report cache stats row_cache: Keep counters in a struct tests: perf_fast_forward: Add cache-specific tests tests: perf_fast_forward: Extract test_reading_all() tests: perf_fast_forward: Add validation of the results tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments tests: perf_fast_forward: Allow the test to be interrupted tests: perf_fast_forward: Allow testing with cache enabled row_cache: Implement mutation_reader::fast_forward_to() for cache scanner	2017-05-17 18:06:02 +03:00
Paweł Dziepak	3ecceaee48	Merge "Fix fast_forward_to() on sstable reader being ignored in some cases" from Tomasz "When mutation reader enters the partition using index, streamed_mutation object is returned to the user before the row start fragment is processed. In that case, when we process the row start, we should ignore it and not call setup_for_partition() again. That may override user's fast_forward_to() request." * 'tgrabiec/fix-initial-fast-forward-to-for-single-key-sstable-readers' of github.com:scylladb/seastar-dev: tests: mutation_source_test: Test forwarding in single-key readers sstables: Remove unused code sstables: mutation_reader: Fix setup_for_partition() being called twice in some cases sstables: Fix verify_end_state() to tolerate ATOM_START_2 state	2017-05-17 15:35:30 +01:00
Tomasz Grabiec	777ffa3a27	tests: perf_fast_forward: Report cache stats	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	7a81f5e980	tests: perf_fast_forward: Add cache-specific tests	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	1a7b03004a	tests: perf_fast_forward: Extract test_reading_all()	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	a38fd16f89	tests: perf_fast_forward: Add validation of the results	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	3c3ea51657	tests: perf_fast_forward: Fix partition scans to read the expected amount of fragments make_pkeys() needs to be invoked with n equal to the number of keys which the table was populated with. Otherwise the extra keys, which are missing in the table, may be placed anywhere in the vector due to ring order sorting, and break the assumption that the table contains all keys from the array up to index n. This resulted in the test reading slightly fewer fragments than would follow from the desired count. Another problem is that we should not skip the fast_forward_to() call for the initial range (workaround for a bug in sstable mutation reader), otherwise we will read slightly less than expected as well.	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	49a0bc3847	tests: perf_fast_forward: Allow the test to be interrupted	2017-05-17 14:15:14 +02:00
Tomasz Grabiec	5c7f5643a6	tests: perf_fast_forward: Allow testing with cache enabled	2017-05-17 14:15:14 +02:00
Avi Kivity	44a1a51987	tests: add tests for dht::split_range_to_single_shard()	2017-05-17 13:50:30 +03:00
Avi Kivity	6eb6f12909	tests: add test for ring_position_exponential_sharder	2017-05-17 13:18:52 +03:00
Avi Kivity	025c6b45b2	dht: extend i_partitioner::next_token_for_shard() Right now, next_token_for_shard() only allows iterating linearly in shard order. Add the ability to select a specific shard to skip to (in case we're only interested in a single shard), and to select larger ranges (so that exponential increases are not implemented by iteration).	2017-05-17 12:30:03 +03:00
Duarte Nunes	ef252036ba	tests/view_schema_test: Add more test cases Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 11:21:58 +02:00
Duarte Nunes	0861a66853	tests/cql_assertions: Add assertion for row set equality For row set equality, the order of the actual rows doesn't need to match the order of the expected rows. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:19 +02:00
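An order-insensitive row comparison of the kind this commit describes can be sketched as follows. The `row` alias and `same_rows_ignoring_order` are hypothetical and not the helper added to cql_assertions; the sketch also assumes that duplicate rows are significant, which the commit message does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row representation: one string per column.
using row = std::vector<std::string>;

// True when both results hold the same rows, regardless of their order.
// Duplicate rows are counted (an assumption of this sketch).
bool same_rows_ignoring_order(std::vector<row> actual, std::vector<row> expected) {
    // Sorting both sides gives each a canonical order, so a plain
    // element-wise comparison is enough afterwards.
    std::sort(actual.begin(), actual.end());
    std::sort(expected.begin(), expected.end());
    return actual == expected;
}
```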
Duarte Nunes	f365b7f1f7	tests: Add test case for nonwrapping_range::intersection() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-17 10:33:18 +02:00
Avi Kivity	f5dae826ce	Merge "Migrate schema tables to v3 format" from Calle "Defines origin v3-format for system/schema tables, and use them for schema storage/retrieval. Includes a legacy_schema_migrator implementation/port from origin. Note that since we don't support features like triggers, functions and aggregates, it will bail if encountering such a feature used. Note also that this patch set does not convert the "hints" and "backlog" tables, even though these have changed in v3 as well. That will be a separate patch set. Tested against dtests. Note that patches for dtest + ccm will follow." * 'calle/systemtables' of github.com:cloudius-systems/seastar-dev: (36 commits) legacy_schema_migrator: Actually truncate legacy schema tables on finish database: Extract "remove" from "drop_columnfamily" v3 schema test fixes thrift: Update CQL mapping of static CFs schema_tables: Use v3 schema tables and formats type_parser: Origin expects empty string -> bytes_type cf_prop_defs: Add crc_check_chance as recognized (even if we don't use) types_test: v3 style schemas enforce explicit "frozen" in tuples/ut:s cql3_type: v3 to_string cql_types: Introduce cql3_type::empty and associate with empty data_type schema: rename column accessors to be in line with origin schema: Add "is_static_compact_table" schema_builder: Add helper to generate unique column names akin origin schema: Add utility functions for static columns schema: Use heterogeneous comparator for columns bounds cql3_type_parser: Resolve from cql3 names/expressions cql3_type: Add "prepare_internal" and "references_user_type" cql3::cql3_type: Add prepare_internal path using only "local" holders cql3_type: Add virtual destructor. database/main: encapsulate system CF dir touching ...	2017-05-17 11:25:52 +03:00
Vlad Zolotarov	494ea82a88	utils::UUID: align the UUID serialization API with the similar API of other classes in the project The standard serialization API (e.g. in data_value) includes the following methods: size_t serialized_size() const; void serialize(bytes::iterator& it) const; bytes serialize() const; Align the utils::UUID API with the pattern above. The only addition is that we are going to make an output iterator parameter of a second method above a template so that we may serialize into different output sources. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
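The three-method pattern named in this commit, with the iterator overload made a template, might look roughly like the sketch below. `uuid_like`, the `bytes` alias and the big-endian byte order are all assumptions made for illustration; this is not the real utils::UUID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the project's bytes type.
using bytes = std::vector<std::uint8_t>;

// Hypothetical 128-bit identifier, not the real utils::UUID.
class uuid_like {
    std::uint64_t _msb;
    std::uint64_t _lsb;

    template <typename OutputIterator>
    static void write_u64(OutputIterator& out, std::uint64_t v) {
        // Most significant byte first; the byte order is an assumption.
        for (int shift = 56; shift >= 0; shift -= 8) {
            *out++ = static_cast<std::uint8_t>((v >> shift) & 0xff);
        }
    }
public:
    uuid_like(std::uint64_t msb, std::uint64_t lsb) : _msb(msb), _lsb(lsb) {}

    std::size_t serialized_size() const { return 2 * sizeof(std::uint64_t); }

    // Writes into any output iterator and advances it past the written bytes.
    template <typename OutputIterator>
    void serialize(OutputIterator& out) const {
        write_u64(out, _msb);
        write_u64(out, _lsb);
    }

    // Convenience overload that returns a freshly allocated buffer.
    bytes serialize() const {
        bytes b(serialized_size());
        auto it = b.begin();
        serialize(it);
        assert(it == b.end());
        return b;
    }
};
```

Because the iterator overload is a template, the same code can fill a preallocated buffer or append to a container through `std::back_inserter`.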
Vlad Zolotarov	7706775a63	utils: serialization: unify the variety of serialize_XXX(...) Use the same templated implementation for all different serialize_XXX(...). The chosen implementation is based on the std::copy_n(char*, size, OutputIterator), which is heavily optimized and will be using memcpy/memmove where possible. This patch also removes the not needed specializations that accept signed integer values since we were casting them to unsigned value anyway. The std::ostream based specifications are also removed since they are not used anywhere except for a test-serialization.cc and adjusting the ostream to the iterator is a single-liner. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
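A single templated serializer built on std::copy_n, in the spirit of this commit, could be sketched as below. `serialize_int` is a hypothetical name, and writing the bytes in big-endian order is an assumption of the sketch rather than something the commit message states.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Writes an unsigned integer, most significant byte first (assumed order),
// into any output iterator and advances the iterator.
template <typename T, typename OutputIterator>
void serialize_int(OutputIterator& out, T value) {
    static_assert(std::is_unsigned<T>::value, "only unsigned integers are supported");
    std::array<std::uint8_t, sizeof(T)> buf;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        const std::size_t shift = 8 * (sizeof(T) - 1 - i);
        buf[i] = static_cast<std::uint8_t>((value >> shift) & 0xff);
    }
    // One copy_n call handles every integer width and iterator type.
    out = std::copy_n(buf.begin(), buf.size(), out);
}
```

Callers that hold signed values would cast them to the matching unsigned type first, which mirrors the commit's remark that the signed overloads only cast to unsigned anyway.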
Tomasz Grabiec	bdf3c536aa	tests: mutation_source_test: Test forwarding in single-key readers	2017-05-16 13:36:10 +02:00
Duarte Nunes	a69039df03	tests/batchlog_manager_test: Fix failure Since `a9f6e5f8da`, metrics can't be duplicated. This patch works around that by not creating a new batchlog_manager (one is already created by the cql_test_env). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170510191047.6154-1-duarte@scylladb.com>	2017-05-11 08:28:08 +02:00
Calle Wilund	66991a7ccb	v3 schema test fixes	2017-05-10 16:44:48 +00:00
Calle Wilund	3d90152dc5	types_test: v3 style schemas enforce explicit "frozen" in tuples/ut:s	2017-05-10 16:44:48 +00:00
Calle Wilund	0e6ae8dec2	schema: rename column accessors to be in line with origin More pointedly: Expose columns as is (currently all_columns_in_select_order), expose name->column mapping more appropriately named. Renaming like this is not strictly necessary, but there is a point to trying to keep nomenclature similar-ish with origin, esp. when select order column need to become filtered (spoiler alert).	2017-05-10 16:44:48 +00:00

1447 Commits