scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 13:45:53 +00:00

Author	SHA1	Message	Date
Piotr Jastrzebski	7729bc5e7b	Remove unused mutation_reader_assertions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Avi Kivity	c743d1258d	Merge "Reverse order of version merging in MVCC" from Tomasz "Changes merging in MVCC to apply newer version to older instead of older to newer. Before (v0 = oldest): (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't revert the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we cannot simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. This fixes latency spikes seen in perf_cache_eviction. Fixes #2715." tag 'tgrabiec/reverse-order-of-mvcc-version-merging-v1' of github.com:scylladb/seastar-dev: mvcc: Reverse order of version merging anchorless_list: Introduce last() mvcc: Implement partition_entry::upgrade() using squashed() mvcc: Extract version merging functions mutation_partition: Add rows_entry::set_dummy() position_in_partition: Introduce after_key()	2018-01-21 13:56:57 +02:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Tomasz Grabiec	60d3c25c02	mvcc: Reverse order of version merging Change merging to apply newer version to older instead of older to newer. Before: (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or equivalent: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't revert the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we cannot simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. Fixes #2715.	2018-01-18 13:52:08 +01:00
Piotr Jastrzebski	dc75df6353	Stop using memtable::make_reader in mutation_test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 11:47:07 +01:00
Paweł Dziepak	8b3c3fc832	db: make column_family::make_reader() return flat reader	2017-12-13 12:01:03 +00:00
Avi Kivity	4cfcd8055e	Merge "Drop reversible apply() from mutation_partition" from Tomasz "This simplifies implementation of mutation_partition merging by relaxing exception guarantees it needs to provide. This allows reverters to be dropped. Direct motivation for this is to make it easier to implement new semantics for merging of clustering range continuity. Implementation details: We only need strong exception guarantees when applying to the memtable, which is using MVCC. Instead of calling apply() with strong exception guarantees on the latest version, we will move the incoming mutation to a new partition_version and then use monotonic apply() to merge them. If that merging fails, we attach the version with the remainder, which cannot fail. This way apply() always succeeds if the allocation of partition_version object succeeds. Results of `perf_simple_query_g -c1 -m1G --write` (high overwrite rate): Before: 101011.13 tps 102498.07 tps 103174.68 tps 102879.55 tps 103524.48 tps 102794.56 tps 103565.11 tps 103018.51 tps 103494.37 tps 102375.81 tps 103361.65 tps After: 101785.37 tps 101366.19 tps 103532.26 tps 100834.83 tps 100552.11 tps 100891.31 tps 101752.06 tps 101532.00 tps 100612.06 tps 102750.62 tps 100889.16 tps Fixes #2012." * tag 'tgrabiec/drop-reversible-apply-v1' of github.com:scylladb/seastar-dev: mutation_partition: Drop apply_reversibly() mutation_partition: Relax exception guarantees of apply() mutation_partition: Introduce apply_weak() tests: mvcc: Add test for atomicity of partition_entry::apply() tests: Move failure_injecting_allocation_strategy to a header tests: mutation_partition: Test exception guarantees of apply_monotonically() mvcc: Use apply_monotonically() where sufficient mvcc: partition_version: Use apply_monotonically() to provide atomicity mvcc: Extract partition_entry::add_version() mutation_partition: Introduce apply_monotonically() mutation_partition: Introduce row::consume_with()	2017-11-28 16:35:06 +02:00
Tomasz Grabiec	091e10fc70	mutation_partition: Relax exception guarantees of apply() The uses which needed strong or weak exception guarantees were switched to a solution involving apply_monotonically(). All remaining uses don't need any exception guarantees.	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	e5532bd644	tests: Move failure_injecting_allocation_strategy to a header	2017-11-28 12:38:28 +01:00
Tomasz Grabiec	1b5f2b0473	tests: mutation_partition: Test exception guarantees of apply_monotonically()	2017-11-28 12:38:28 +01:00
Jesse Haber-Kucharsky	fb0866ca20	Move `thread_local` declarations out of `main.cc` Since `disk-error-handler.hh` defines these global variables `extern`, it makes sense to declare them in the `disk-error-handler.cc` instead of `main.cc`. This means that test files don't have to declare them. Fixes #2735. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <1eed120bfd9bb3647e03fe05b60c871de2df2a86.1511810004.git.jhaberku@scylladb.com>	2017-11-27 20:27:42 +01:00
Duarte Nunes	922f095f22	tests: Initialize storage service for some tests These tests now require having the storage service initialized, which is needed to decide whether correct non-compound range tombstones should be emitted or not. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20171126152921.5199-1-duarte@scylladb.com>	2017-11-26 17:41:06 +02:00
Tomasz Grabiec	6bf1c6014f	mvcc: partition_snapshot_row_cursor: Mark allocation points This marks places which may allocate but not always do as allocation points to increase effectiveness of testing.	2017-11-13 20:55:13 +01:00
Tomasz Grabiec	d76b141b34	tests: Extract mvcc tests to separate file	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	2df6f356b1	mvcc: Store LSA region reference in partition_snapshot Will be useful for improving encapsulation.	2017-09-13 17:38:08 +02:00
Avi Kivity	9b540eccb0	database: remove dependency on compaction.hh and compaction_manager.hh	2017-09-11 20:09:45 +03:00
Piotr Jastrzebski	c602ffd610	Make Scylla ttl expiration behave like in Cassandra Fixes #2497 [tgrabiec: reworked the title] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2f5a99dce6ef11fe0ef135c9fa0592078fc9a056.1502886874.git.piotr@scylladb.com>	2017-08-21 14:25:45 +02:00
Tomasz Grabiec	fb62dfab02	tests: mvcc: Introduce test_schema_upgrade_preserves_continuity	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	164989a574	tests: mvcc: Add test for partition_entry::apply_to_incomplete()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	db053ef902	tests: Add test for continuity merging rules	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	804f46f684	mutation: Make compare_*_for_merge() consistent with equals() equals() considers expiring cells to be different from non-expiring cells, but compare_row_marker_for_merge() considers them equal. Fix the latter to pick expiring cells. The choice was arbitrary.	2017-05-23 13:35:03 +02:00
Tomasz Grabiec	c1475a8eb2	tests: mutation: Improve assertion failure message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	d15880b3b7	tests: Use default equality in test_mutation_diff_with_random_generator	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	ef4c7c458c	tests: mutation: Check commutativity of mutation addition	2017-05-23 12:11:12 +02:00
Duarte Nunes	4e693383f7	mutation_partition: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as follows: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
Duarte Nunes	143136647a	mutation_test: Add more test cases for difference() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-15 14:34:01 +01:00
Paweł Dziepak	04b80272f2	cell_locker: add metrics for lock acquisition	2017-03-02 09:05:12 +00:00
Paweł Dziepak	4ffe0401ee	test/mutation_source: specify whether to generate counter mutations Tests using random mutation generator should be provided with both counter and non-counter mutations to ensure that both cases are sufficiently covered. However, mixed schemas (with both counter and non-counter columns) are not allowed so the RMG has to be explicitly told whether to use counter or non-counter schema.	2017-02-07 15:17:14 +00:00
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Avi Kivity	1d9ee358f1	Revert "Merge "Reduce the size of mutation_partition" from Piotr" This reverts commit `aa392810ff`, reversing changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses boost::intrusive::detail directly, which it must not, and doesn't compile on all boost versions as a consequence.	2016-12-25 16:07:48 +02:00
Piotr Jastrzebski	2af6ff68d9	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Avi Kivity	7faf2eed2f	build: support for linking statically with boost Remove assumptions in the build system about dynamically linked boost unit tests. Includes seastar update which would have otherwise broken the build.	2016-10-26 08:51:21 +03:00
Glauber Costa	28e3f2f6ee	LSA: export information about object memory footprint We allocate objects of a certain size, but we use a bit more memory to hold them. To get a clearer picture about how much memory an object will cost us, we need help from the allocator. This patch exports an interface that allows users to query into a specific allocator to get that information. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Paweł Dziepak	6012a7e733	mutation_partition: fix iterator invalidation in trim_rows Reversed iterators are adaptors for 'normal' iterators. These underlying iterators point to different objects than the reversed iterators themselves. The consequence of this is that removing an element pointed to by a reversed iterator may invalidate a reversed iterator which points to a completely different object. This is what happens in trim_rows for reversed queries. Erasing a row can invalidate the end iterator and the loop would fail to stop. The solution is to introduce the reversal_traits::erase_dispose_and_update_end() function, which erases and disposes of the object pointed to by a given iterator but also takes a reference to an end iterator and updates it if necessary to make sure that it stays valid. Fixes #1609. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>	2016-08-25 16:52:35 +03:00
Piotr Jastrzebski	bb0c4c3c40	Fix compilation errors query::range parameter in mutation_partition::range has to be changed to nonwrapping_range. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <36e444bfe90586f8d3b08ca36d8dc13d5898ef97.1471347402.git.piotr@scylladb.com>	2016-08-16 12:49:54 +01:00
Tomasz Grabiec	8c4b5e4283	db: Avoiding checking bloom filters during compaction Checking bloom filters of sstables to compute max purgeable timestamp for compaction is expensive in terms of CPU time. We can avoid calculating it if we're not about to GC any tombstone. This patch changes compacting functions to accept a function instead of ready value for max_purgeable. I verified that bloom filter operations no longer appear on flame graphs during compaction-heavy workload (without tombstones). Refs #1322.	2016-07-10 09:54:20 +02:00
Paweł Dziepak	983321f194	tests/mutation: do not create memtable on stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	e4ae7894d4	tests/mutation: test slicing mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:51 +01:00
Paweł Dziepak	737eb73499	mutation_reader: make readers return streamed_mutations Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:50 +01:00
Duarte Nunes	dc8319ed91	keys: Remove schema argument from make_empty An empty key is independent of the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Duarte Nunes	a15ed3c60f	mutation_test: Specify tmp data dir Otherwise we attempt to create sstable files under /. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1464618602-1124-1-git-send-email-duarte@scylladb.com>	2016-05-30 20:34:47 +02:00
Avi Kivity	db03295c8a	Merge "Fix query digest mismatch" from Tomasz "Currently data query digest includes cells and tombstones which may have expired or be covered by higher-level tombstones. This causes digest mismatch between replicas if some elements are compacted on one of the nodes and not on others. This mismatch triggers read-repair which doesn't resolve because mutations received by mutation queries are not differing, they are compacted already. The fix adds compacting step before writing and digesting query results by reusing the algorithm used by mutation query. This is not the most optimal way to fix this. The compaction step could be folded with the query writing, there is redundancy in both steps. However such change carries more risk, and thus was postponed. perf_simple_query test (cassandra-stress-like partitions) shows regression from 83k to 77k (7%) ops/s. Fixes #1165."	2016-04-08 12:13:29 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	474a35ba6b	tests: Add test for query digest calculation	2016-04-07 19:57:19 +02:00
Tomasz Grabiec	5d768d0681	tests: mutation_test: Move mutation generator to mutation_source_test.hh So that it can be reused.	2016-04-07 19:57:19 +02:00
Tomasz Grabiec	30d25bc47a	tests: mutation_test: Add test case for querying of expired cells	2016-04-07 19:57:19 +02:00
Tomasz Grabiec	2fbb55929d	mutation_test: Add allocation failure stress test for apply() The test injects allocation failures at every allocation site during apply(). Only allocations through allocation_strategy are instrumented, but currently those should include all allocations in the apply() path. The target and source mutations are randomized.	2016-03-21 21:49:53 +01:00

76 Commits