scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Avi Kivity	9e67bd5aac	Merge "Add partial range deletion support" from Duarte "This series introduces partial support for range deletions. This allows deletion operations such as delete from cf where p=1 and c > 0 and c <= 3. This series only adds support for single-column range restrictions. We enforce that both range bounds be specified, because we can't represent infinite bounds in the current sstable format. Such bounds are represented as a prefix with no components, with the bound_kind informing whether they are a bottom or top bound. We're currently unable to serialize an infinite bound in such a way that it would be correctly interpreted by Cassandra 2.2.x. A serialized bound is a composite with a (<length><value><EOC>)+ format. While we could technically represent the bottom bound, the top bound, if written as a single component with 0 bytes in size and some EOC, would always sort before other values. The same would happen if represented as an empty (no components) composite, because in Cassandra 2.2.x those always have EOC = NONE. This limitation should stay in place until we can properly represent range tombstones in the storage format." * 'range-deletions/v2' of https://github.com/duarten/scylla: mutation: Set cell using clustering_key_prefix mutation_partition: Harmonize apply_delete overloads prefix_compound_view_wrapper: Add is_full and is_empty functions tests/cql_query_test: Add range deletion tests cql3: Partially support ranged deletions single_column_primary_key_restrictions: Implement has_bound() modification_statement: Use statement_restrictions for where clause statement_restrictions: Expose primary key restrictions to_string: Add missing include	2017-05-07 19:27:09 +03:00
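The `(<length><value><EOC>)+` layout described in the merge message above can be sketched as follows. This is a minimal illustration, not the Scylla or Cassandra serializer: the names `eoc` and `serialize_composite` are hypothetical, and the 2-byte big-endian length and 1-byte EOC are assumptions about the legacy composite encoding. It shows why an empty composite has no byte in which to carry an EOC marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed end-of-component markers (hypothetical names for this sketch).
enum class eoc : std::int8_t { start = -1, none = 0, end = 1 };

// Serialize each component as <2-byte big-endian length><value><EOC>.
// Only the last component carries `last_eoc`; earlier ones carry eoc::none.
inline std::vector<std::uint8_t> serialize_composite(
        const std::vector<std::string>& components, eoc last_eoc) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < components.size(); ++i) {
        const std::string& c = components[i];
        assert(c.size() <= 0xffff);  // length must fit in 16 bits
        out.push_back(static_cast<std::uint8_t>(c.size() >> 8));
        out.push_back(static_cast<std::uint8_t>(c.size() & 0xff));
        out.insert(out.end(), c.begin(), c.end());
        const bool is_last = (i + 1 == components.size());
        out.push_back(static_cast<std::uint8_t>(is_last ? last_eoc : eoc::none));
    }
    return out;
}
```

With no components the loop never runs, so the output is empty regardless of `last_eoc`, which matches the message's point that an empty composite cannot express a top bound.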
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	ef138bdd2c	tests/cql_query_test: Add range deletion tests Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Tomasz Grabiec	e71771d019	tests: mutation_source_test: Add test cases for single-key out of range reads	2017-05-04 14:59:08 +02:00
Raphael S. Carvalho	8b0e358d73	tests/sstable_test: fix release-mode compaction_manager_test in release mode, compaction task is active after submitting request because ready future may be scheduled immediately. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170502171925.9893-1-raphaelsc@scylladb.com>	2017-05-02 20:48:30 +03:00
Raphael S. Carvalho	8dfb5f9c33	tests/sstable_test: fix compaction_manager_test after 'compaction: make major compaction go through compaction manager', the test fails because task is preempted in debug mode before it reaches instruction to increase stat. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170501183255.6191-1-raphaelsc@scylladb.com>	2017-05-02 09:06:41 +03:00
Raphael S. Carvalho	687a4bb0c2	dtcs: do not compact fully expired sstable which ancestor is not deleted yet Currently, fully expired sstable[1] is unconditionally chosen for compaction by DTCS, but that may lead to a compaction loop under certain conditions. Let's consider that an almost expired sstable is compacted, and it's not deleted yet, and that the new sstable becomes expired before its ancestor is deleted. Because this new sstable is expired, it will be chosen by DTCS, but it will not be purged because 'compacted undeleted' sstables are taken into account by calculation of max purgeable timestamp and prevents expired data from being purged. The problem is that this sequence of events can keep happening forever as reported by issue #2260. NOTE: This problem was easier to reproduce before improvement on compaction of expired cells, because fully expired sstable was being converted into a sstable full of tombstones, which is also considered fully expired. Fixes #2260. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com>	2017-04-30 19:35:46 +03:00
Avi Kivity	248aa4fc23	Merge "Fix update of counter in static rows" from Paweł "The logic responsible for converting counter updates to counter shards was not covered by unit tests and didn't transform counter cells inside static rows. This series fixes the problem and makes sure that the tests cover both static rows and transformation logic." * tag 'pdziepak/static-counter-updates/v1' of github.com:cloudius-systems/seastar-dev: tests/counter: test transform_counter_updates_to_shards tests/counter: test static columns counters: transform static rows from updates to shards	2017-04-30 19:13:44 +03:00
Avi Kivity	339322517e	Merge "sstables: index_reader: Fix advance_to() to include relevant range tombstones" from Tomasz "Fixes #2326." * 'tgrabiec/fix-range-tombstones-missing-when-slicing' of github.com:cloudius-systems/seastar-dev: tests: mutation_source_test: Cover single-ranged queries in test_streamed_mutation_slicing_returns_only_relevant_tombstones() tests: mutation_source_test: Add test for slicing of clustered rows tests: mutation_reader_assertions: Log expectations tests: mutation_reader_assertions: Add produces_eos_or_empty_mutation() tests: sstables: Use read_row() for single-key reads tests: sstables: Test more configurations of sstable writer in test_sstable_conforms_to_mutation_source() sstables: Improve logging sstables: index_reader: Fix advance_to() to include relevant range tombstones	2017-04-30 14:40:41 +03:00
Avi Kivity	831ee80c3c	tests: workaround older boost::apply_visitor requiring a result_type member Older versions of boost::apply_visitor require a result_type member for the visitor; supply it to make them happy. Fixes #2312.	2017-04-30 13:56:44 +03:00
Paweł Dziepak	5c302cf67b	tests/counter: test transform_counter_updates_to_shards	2017-04-28 16:29:34 +01:00
Paweł Dziepak	0473750056	tests/counter: test static columns	2017-04-28 16:29:34 +01:00
Tomasz Grabiec	d4df6e214e	tests: mutation_source_test: Cover single-ranged queries in test_streamed_mutation_slicing_returns_only_relevant_tombstones()	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	22cce52dff	tests: mutation_source_test: Add test for slicing of clustered rows	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	86b693f562	tests: mutation_reader_assertions: Log expectations	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	ece6e107cc	tests: mutation_reader_assertions: Add produces_eos_or_empty_mutation()	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	6354acc1a2	tests: sstables: Use read_row() for single-key reads So that as_mutation_reader() will create the same kind of reader which database::make_sstable_reader() does. Before this change, all readers were range readers.	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	fd5dbe04b5	tests: sstables: Test more configurations of sstable writer in test_sstable_conforms_to_mutation_source() Test different versions of the format, and different promoted index block sizes. The size of 1 is especially important: it will put each fragment in a separate block, exposing various issues with promoted index handling.	2017-04-27 18:43:49 +02:00
Duarte Nunes	d45596ae8e	sstables: Read and write shadowable tombstones This patch serializes shadowable tombstones to sstables by adding a new, incompatible atom's mask. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	4e693383f7	mutation_partition: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, which up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Tomasz Grabiec	f3609fc813	tests: log_histogram_test: Fix compilation on Ubuntu Some gcc versions incorrectly complain: tests/log_histogram_test.cc:87:22: error: ‘opts1’ is not a valid template argument for type ‘const log_histogram_options&’ because object ‘opts1’ has not external linkage size_t hist_key<node<opts1>>(const node<opts1>& n) { return n.v; } Apparently this is a bug in gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52036 Fixes #2307. Message-Id: <1493108791-11247-1-git-send-email-tgrabiec@scylladb.com>	2017-04-25 12:15:28 +03:00
Pekka Enberg	940c3f4330	Merge "Clang fixes (part 2)" from Avi "This series fixes some more errors found by clang, with the aim of enabling clang/zapcc as a supported compiler. A single issue remains, but it's probably in std::experimental::optional::swap(); not in our code." * tag 'clang/2/v1' of https://github.com/avikivity/scylla: sstable_test: avoid passing negative non-type template arguments to unsigned parameters UUID: add more comparison operators sstable_datafile_test: avoid string_view user-defined literal conversion operator mutation_source_test: avoid template function without template keyword cql_query_test: define static variable cql_query_test: add braces for single-item collection initializers storage_service: don't use typeid(temporary) logalloc: remove unused max_occupancy_for_compaction storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic storage_proxy: drop unused member access from return value storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare read_repair_decision: fix operator<<(std::ostream&, ...)	2017-04-24 20:32:16 +03:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as following: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
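The "ULEB128-like encoding" mentioned in the logalloc commit above can be illustrated with a generic base-128 varint. This is only a sketch of the general technique, not the actual descriptor format (which the commit itself says was still to be agreed on); `encode_varint` and `decode_varint` are hypothetical names. It shows why a small value, such as a small index into a migrator table, fits in a single byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `value` as a little-endian base-128 varint.
// Each byte holds 7 payload bits; the high bit is set on every byte except
// the last one.
inline void encode_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7f) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode one varint starting at `pos` and advance `pos` past it.
inline std::uint64_t decode_varint(const std::vector<std::uint8_t>& in,
                                   std::size_t& pos) {
    std::uint64_t value = 0;
    unsigned shift = 0;
    while (true) {
        assert(pos < in.size() && shift < 64);  // malformed input guard
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            return value;
        }
        shift += 7;
    }
}
```

Values below 128 take one byte and values below 16384 take two, so a header that only needs to name one of a handful of types stays at a single byte before alignment padding.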
Duarte Nunes	cddf2f4d74	tests: Fix failure virtual_reader_test This patch fixes a failure of virtual_reader_test, where both the test itself and the cql_test_env initialize the messaging_service to listen on the same address and port, triggering an assert in posix_ap_server_socket_impl::accept(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170423104240.21275-1-duarte@scylladb.com>	2017-04-23 14:06:35 +03:00
Avi Kivity	566c094764	sstable_test: avoid passing negative non-type template arguments to unsigned parameters Clang complains. The test looks somewhat bogus, but that's for another patch.	2017-04-22 22:13:55 +03:00
Avi Kivity	5424aca745	sstable_datafile_test: avoid string_view user-defined literal conversion operator Clang doesn't like it, perhaps because it isn't in the std namespace (it's still in std::experimental).	2017-04-22 22:11:30 +03:00
Avi Kivity	705ac957a2	mutation_source_test: avoid template function without template keyword This isn't (yet?) standard C++, and clang rejects it.	2017-04-22 22:10:21 +03:00
Avi Kivity	551fb03476	cql_query_test: define static variable single_node_cql_env is declared but not defined; define it to make clang happy.	2017-04-22 22:01:44 +03:00
Avi Kivity	eb700752d8	cql_query_test: add braces for single-item collection initializers Clang complains that braces are missing; I didn't verify it but I'm sure it's right. Add braces to make it happy.	2017-04-22 22:00:49 +03:00
Raphael S. Carvalho	4a86dd473d	tests: add tests/sstable_resharding_test.cc Forgot to add file after resolving conflict. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170422172053.3734-1-raphaelsc@scylladb.com>	2017-04-22 21:09:29 +03:00
Benoît Canet	f68049ef5d	tests: Fix clang auto universal reference type deduction Replace it by regular template type deduction. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170421204150.4626-2-benoit@scylladb.com>	2017-04-22 20:04:00 +03:00
Benoit Canet	b902f3b81b	tests: Remove parenthesis in variable declaration It prevents clang from compiling these tests. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170421204150.4626-1-benoit@scylladb.com>	2017-04-22 20:04:00 +03:00
Raphael S. Carvalho	8a37b279ed	tests: add test for new sstable resharding Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:34 -03:00
Raphael S. Carvalho	d82a8dfae0	lcs: restore invariant instead of sending overlapping sst to L0 A large token span sstable may find its way into high level due to resharding, which means the strategy invariant is broken. The invariant is restored by compacting first set of overlapping sstables, meaning that the restoration is done incrementally for multiple overlapping sets. Invariant is restored by regular compaction after resharding puts new unshared sstables into their original level, where level > 0. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:09 -03:00
Avi Kivity	fccbf2c51f	Merge "Reduce memory reclamation latency" from Tomasz "Currently eviction is performed until occupancy of the whole region drops below the 85% threshold. This may take a while if region had high occupancy and is large. We could improve the situation by only evicting until occupancy of the sparsest segment drops below the threshold, as is done by this change. I tested this using a c-s read workload in which the condition triggers in the cache region, with 1G per shard: lsa-timing - Reclamation cycle took 12.934 us. lsa-timing - Reclamation cycle took 47.771 us. lsa-timing - Reclamation cycle took 125.946 us. lsa-timing - Reclamation cycle took 144356 us. lsa-timing - Reclamation cycle took 655.765 us. lsa-timing - Reclamation cycle took 693.418 us. lsa-timing - Reclamation cycle took 509.869 us. lsa-timing - Reclamation cycle took 1139.15 us. The 144ms pause is when large eviction is necessary. Statistics for reclamation pauses for a read workload over larger-than-memory data set: Before: avg = 865.796362 stdev = 10253.498038 min = 93.891000 max = 264078.000000 sum = 574022.988000 samples = 663 After: avg = 513.685650 stdev = 275.270157 min = 212.286000 max = 1089.670000 sum = 340573.586000 samples = 663 Refs #1634." * tag 'tgrabiec/lsa-reduce-reclaim-latency-v3' of github.com:cloudius-systems/seastar-dev: lsa: Reduce reclamation latency tests: Add test for log_histogram log_histogram: Allow non-power-of-two minimum values lsa: Use regular compaction threshold in on-idle compaction tests: row_cache_test: Induce update failure more reliably lsa: Add getter for region's eviction function	2017-04-21 17:47:06 +03:00
Tomasz Grabiec	20f4c9bf23	lsa: Reduce reclamation latency Currently eviction is performed until occupancy of the whole region drops below the 85% threshold. This may take a while if region had high occupancy and is large. We could improve the situation by only evicting until occupancy of the sparsest segment drops below the threshold, as is done by this change. I tested this using a c-s read workload in which the condition triggers in the cache region, with 1G per shard: lsa-timing - Reclamation cycle took 12.934 us. lsa-timing - Reclamation cycle took 47.771 us. lsa-timing - Reclamation cycle took 125.946 us. lsa-timing - Reclamation cycle took 144356 us. lsa-timing - Reclamation cycle took 655.765 us. lsa-timing - Reclamation cycle took 693.418 us. lsa-timing - Reclamation cycle took 509.869 us. lsa-timing - Reclamation cycle took 1139.15 us. The 144ms pause is when large eviction is necessary. Statistics for reclamation pauses for a read workload over larger-than-memory data set: Before: avg = 865.796362 stdev = 10253.498038 min = 93.891000 max = 264078.000000 sum = 574022.988000 samples = 663 After: avg = 513.685650 stdev = 275.270157 min = 212.286000 max = 1089.670000 sum = 340573.586000 samples = 663 Refs #1634. Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>	2017-04-21 12:52:31 +02:00
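The change in stopping condition described in the commit above can be sketched as two predicates over per-segment occupancy. This is a simplified model under stated assumptions, not the LSA code: `segment_occupancy`, `region_below_threshold`, and `sparsest_segment_below_threshold` are hypothetical names, and only the 85% figure comes from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-segment accounting for this sketch.
struct segment_occupancy {
    std::size_t used_bytes;
    std::size_t total_bytes;
};

constexpr std::size_t threshold_percent = 85;  // figure quoted in the commit

// True when used/total is strictly below the threshold (integer math).
inline bool below_threshold(std::size_t used, std::size_t total) {
    return used * 100 < total * threshold_percent;
}

// Old stopping condition: the occupancy of the whole region must drop
// below the threshold before eviction stops.
inline bool region_below_threshold(const std::vector<segment_occupancy>& segs) {
    std::size_t used = 0;
    std::size_t total = 0;
    for (const segment_occupancy& s : segs) {
        used += s.used_bytes;
        total += s.total_bytes;
    }
    return total > 0 && below_threshold(used, total);
}

// New stopping condition: it is enough that the sparsest segment (that is,
// at least one segment) is below the threshold.
inline bool sparsest_segment_below_threshold(
        const std::vector<segment_occupancy>& segs) {
    return std::any_of(segs.begin(), segs.end(),
                       [](const segment_occupancy& s) {
                           return below_threshold(s.used_bytes, s.total_bytes);
                       });
}
```

For a large, densely occupied region the second predicate becomes true after far less eviction than the first, which is consistent with the lower maximum pause reported in the statistics.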
Tomasz Grabiec	4313641c03	tests: Add test for log_histogram	2017-04-21 12:52:31 +02:00
Tomasz Grabiec	e054ccc037	tests: row_cache_test: Induce update failure more reliably After changing the region evictability condition to be less strict, cache update stopped failing because reclamation was able to compact a dense region. Induce failure by installing an evictor which refuses to evict from cache beyond a few elements.	2017-04-20 14:51:47 +02:00
Tomasz Grabiec	4ed7e529db	sstables: Move binary_search() to a header There are instantiations of binary_search() used in sstables.cc, but defined in partition.cc. The instantiations are explicitly declared in partition.cc, but the types changed and they became obsolete. The thing worked because partition.cc also instantiated it with the right type. But after that code will be removed, it no longer would, and we would get a linker error. To avoid such problems, define binary_search() in a header.	2017-04-20 10:54:38 +02:00
Tomasz Grabiec	7dc3fe7d3f	tests: perf_fast_forward: Add test case for forwarding with clustering restrictions in a large partition	2017-04-20 10:54:36 +02:00
Tomasz Grabiec	eed864690b	tests: perf_fast_forward: Add test case for slicing of large partition using a single-partition reader	2017-04-20 10:54:36 +02:00
Tomasz Grabiec	81fc7977a4	tests: perf_fast_forward: Add test for selecting few rows from large partition	2017-04-20 10:54:36 +02:00
Tomasz Grabiec	02da3ba316	tests: perf_fast_forward: Fix use-after-free in scan_with_stride_partitions() partition_range must live as long as the reader is used.	2017-04-19 08:37:56 +02:00
Raphael S. Carvalho	11b74050a1	partitioned_sstable_set: fix quadratic space complexity streaming generates lots of small sstables with large token range, which triggers O(N^2) in space in interval map. level 0 sstables will now be stored in a structure that has O(N) in space complexity and which will be included for every read. Fixes #2287. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170417185509.6633-1-raphaelsc@scylladb.com>	2017-04-18 13:04:38 +03:00
Benoît Canet	8f793905a3	perf_sstable: Change busy loop to futurized loop The blocked task detector introduced in `113ed9e963` was seeing the initialization phase of perf_sstable as a blocked task. Transform this part of the code into a futurized loop to make the blocked task detector happy. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170413132506.17806-1-benoit@scylladb.com>	2017-04-13 18:17:28 +03:00
Raphael S. Carvalho	a6f8f4fe24	compaction: do not write expired cell as dead cell if it can be purged right away When compacting a fully expired sstable, we're not allowing that sstable to be purged because expired cell is unconditionally converted into a dead cell. Why not check if the expired cell can be purged instead using gc before and max purgeable timestamp? Currently, we need two compactions to get rid of a fully expired sstable which cells could have always been purged. look at this sstable with expired cell: { "partition" : { "key" : [ "2" ], "position" : 0 }, "rows" : [ { "type" : "row", "position" : 120, "liveness_info" : { "tstamp" : "2017-04-09T17:07:12.702597Z", "ttl" : 20, "expires_at" : "2017-04-09T17:07:32Z", "expired" : true }, "cells" : [ { "name" : "country", "value" : "1" }, ] now this sstable data after first compaction: [shard 0] compaction - Compacted 1 sstables to [...]. 120 bytes to 79 (~65% of original) in 229ms = 0.000328997MB/s. { ... "rows" : [ { "type" : "row", "position" : 79, "cells" : [ { "name" : "country", "deletion_info" : { "local_delete_time" : "2017-04-09T17:07:12Z" }, "tstamp" : "2017-04-09T17:07:12.702597Z" }, ] now another compaction will actually get rid of data: compaction - Compacted 1 sstables to []. 79 bytes to 0 (~0% of original) in 1ms = 0MB/s. ~2 total partitions merged to 0 NOTE: It's a waste of time to wait for second compaction because the expired cell could have been purged at first compaction because it satisfied gc_before and max purgeable timestamp. Fixes #2249, #2253 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170413001049.9663-1-raphaelsc@scylladb.com>	2017-04-13 10:59:19 +03:00
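The decision described in the commit above (purge an expired cell right away when it already satisfies gc_before and the max purgeable timestamp, instead of always writing a dead cell) can be sketched as below. This is a hedged model, not the compaction code: `ttl_cell`, `expired_cell_action`, and `compact_expired_cell` are hypothetical names, and deriving the deletion time as `expiry - ttl` is an assumption inferred from the sample output, where `local_delete_time` equals `expires_at` minus the 20-second TTL.

```cpp
#include <cstdint>

// Possible outcomes for a TTL'd cell during compaction (sketch only).
enum class expired_cell_action { keep_live, write_dead_cell, purge };

// Hypothetical cell model.
struct ttl_cell {
    std::int64_t timestamp;  // write timestamp
    std::int64_t expiry;     // expiry point, seconds
    std::int64_t ttl;        // time to live, seconds
};

inline expired_cell_action compact_expired_cell(
        const ttl_cell& cell, std::int64_t now, std::int64_t gc_before,
        std::int64_t max_purgeable_timestamp) {
    if (cell.expiry > now) {
        return expired_cell_action::keep_live;  // not expired yet
    }
    // Assumption: the dead cell's deletion time is expiry - ttl.
    const std::int64_t deletion_time = cell.expiry - cell.ttl;
    const bool can_purge = deletion_time < gc_before
                           && cell.timestamp < max_purgeable_timestamp;
    // Before the fix, the expired cell always became a dead cell and needed
    // a second compaction to disappear.
    return can_purge ? expired_cell_action::purge
                     : expired_cell_action::write_dead_cell;
}
```

When the purge conditions already hold, the cell is dropped in the first compaction, which avoids the intermediate 79-byte sstable shown in the example.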
Avi Kivity	5b530aa464	Merge "Use promoted index for skipping in sstable mutation readers" from Tomasz "sstable_streamed_mutation::fast_forward_to() is changed to use promoted index (via index_reader) to optimize skipping in large partitions. In addition to that, sstable mutation_reader is changed to use the index to skip to the next partition. Performance impact was evaluated using newly added tests/perf/perf_fast_forward What's beyond this series: - Using index_reader for single-partition reads as well - Using index_reader for skipping across ranges in clustering restrictions" * tag 'tgrabiec/skip-within-partition-using-index-v2' of github.com:cloudius-systems/seastar-dev: (47 commits) tests: Add performance test for fast forwarding of sstable readers tests: Allow starting cql_test_env on pre-existing data config: Allow specifying source when setting value tests: sstable: Add test for fast forwarding within partition using index sstables: sstable_streamed_mutation: use index in fast_forward_to() sstables: Store parsed promoted index in index_entry sstables: Add trace-level logging for sstable consumption sstables: Define deletion_time earlier sstables: Make parsing throw exception on malformed promoted index tests: Add tests for ordering of position_in_partition relative to composites position_range: Introduce all_clustered_rows() factory method position_in_partition: Introduce for_key()/after_key() factory methods position_in_partition: Add factory methods for positions around all rows position_in_partition: Introduce for_range_start()/for_range_end() position_in_partition: Fix friendship declaration keys: Introduce is_empty() for prefixes position_in_partition: Make comparable with composites types: Enhance lexicographical comparators compound_compat: Accept marker value in serialize_value() compound_compat: Add trichotomic comparator ...	2017-03-29 19:01:12 +03:00
Raphael S. Carvalho	023031b0c8	compaction: lcs: fix functionality to feed starved levels A quick introduction to level starvation: high levels may be left uncompacted (thus starved) for a long time if the user does something that leaves them with little data, such as cleanup or a change of max sstable size (default 160M). Leveled strategy handles this problem as follows: consider that we're compacting L1 to L2. If L3 is starved, we look for one of its sstables that is fully contained in the token range of the L1->L2 candidates, so that we won't end up with an overlap in L2. Now the problem: the functionality isn't working properly because the range of candidates is being incorrectly calculated due to an accident when converting the code to C++. It won't cause an overlap because it's actually being more restrictive about which sstable from the starved level can be used. A test case was added to confirm the problem. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170328223753.15398-1-raphaelsc@scylladb.com>	2017-03-29 18:59:46 +03:00
Tomasz Grabiec	7fd724821b	tests: Add performance test for fast forwarding of sstable readers	2017-03-28 18:34:55 +02:00


1396 Commits