scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	131a47dea3	tests/mutation: add test for changing column type With the introduction of the new in-memory representation, changing a column's type has become a more complex operation, since it needs to handle the switch from fixed-size to variable-size types. This commit adds an explicit test for such cases.	2018-05-31 15:51:11 +01:00
Paweł Dziepak	aa25f0844f	atomic_cell: introduce fragmented buffer value interface As a preparation for the switch to the new cell representation, this patch changes the type returned by atomic_cell_view::value() to one that requires explicit linearisation of the cell value. Even though the value is still implicitly linearised (and only when managed by the LSA), the new interface is the same as the target one, so no more changes to its users will be needed.	2018-05-31 15:51:11 +01:00
Paweł Dziepak	418c159057	treewide: require type to copy atomic_cell	2018-05-31 15:51:11 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Paweł Dziepak	e9d6fc48ac	treewide: require type for creating atomic_cell	2018-05-31 15:51:11 +01:00
Paweł Dziepak	93130e80fb	atomic_cell: require column_definition for creating atomic_cell views	2018-05-31 15:51:11 +01:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. Proper large_partition_handler is retrievable from config information and is based on existing compaction_large_partition_warning_threshold_mb entry. Right now CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	bbe771e28f	tests: Add more tests for continuity merging	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	9893e8e5f7	mvcc: Make each version have independent continuity This change is a preparation for introducing row-level eviction, such that entries can be evicted from older versions without having to touch other versions. Currently, continuity flags on entries are interpreted relative to the combined view merged from all entries. For example: v2: <key=2, cont=1> v1: <key=1, cont=1> In v2, the flag on entry key=2 marks the range (1, 2) as continuous. This is problematic because if the old version is evicted, continuity will change in an incorrect way: v2: <key=2, cont=1> Here, the range (-inf, 1) would be marked as continuous, which is not true. To solve this problem, we change the rules for continuity interpretation in MVCC. Each version will have its own continuity, fully specified in that version, independent of continuity of other versions. Continuity of the snapshot will be a union of continuous ranges in each version. It is assumed that continuous intervals in different versions are non-overlapping, except for points corresponding to complete rows, in which case a later version may overlap with an older version (overwrite). We make use of this assumption to make calculation of the union of intervals on merging easier. I make use of the above assumption in mutation_partition::apply_monotonically(). MVCC population of incomplete entries already almost maintains the non-overlapping invariant, because population intervals correspond to intervals which are incomplete in the old snapshot. The only change needed is to ensure that both population bounds will have entries in the latest version. Population from memtables doesn't mark any intervals as continuous, so it also conforms. The only change needed there is to not inherit continuity flags from the old snapshot, effectively making the new version internally discontinuous except for row points.
The example from the beginning will become: v2: <key=1, cont=0> <key=2, cont=1> v1: <key=1, cont=1> When marking a range as continuous with some rows present only in older versions, we need to insert entries in the latest version, so that we can mark the range as continuous. The easiest solution is to copy the entry from the old version. Another option would be to add support for incomplete rows and insert such instead. This way we would avoid duplicating row contents. This optimization is deferred.	2018-03-06 11:50:25 +01:00
Duarte Nunes	78508e8e43	tests/mutation_test: Use xxHash instead of MD5 for some tests Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	6cb0bbd978	tests/mutation_test: Test xx_hasher alongside md5_hasher Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	6b4b429883	query-result: Introduce class result_options Introduce class result_options to carry result options through the request pipeline, which at this point mean the result type and the digest algorithm. This class allows us to encapsulate the concrete digest algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Piotr Jastrzebski	7729bc5e7b	Remove unused mutation_reader_assertions Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Avi Kivity	c743d1258d	Merge "Reverse order of version merging in MVCC" from Tomasz "Changes merging in MVCC to apply newer version to older instead of older to newer. Before (v0 = oldest): (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't reverse the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we could not simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. This fixes latency spikes seen in perf_cache_eviction. Fixes #2715." tag 'tgrabiec/reverse-order-of-mvcc-version-merging-v1' of github.com:scylladb/seastar-dev: mvcc: Reverse order of version merging anchorless_list: Introduce last() mvcc: Implement partition_entry::upgrade() using squashed() mvcc: Extract version merging functions mutation_partition: Add rows_entry::set_dummy() position_in_partition: Introduce after_key()	2018-01-21 13:56:57 +02:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Tomasz Grabiec	60d3c25c02	mvcc: Reverse order of version merging Change merging to apply newer version to older instead of older to newer. Before: (((v3 + v2) + v1) + v0) After: (v0 + (v1 + (v2 + v3))) or equivalent: (((v0 + v1) + v2) + v3) There are several reasons to do this: 1) When continuity merging will change semantics to support eviction from older versions, it will be easier to implement apply() if we can assume that we merge newer to older instead of older to newer, since newer version may have entries falling into a continuous interval in older, but not the other way around. If we didn't reverse the order, apply() would have to keep track of lower bound of a continuous interval in the right-hand side argument (older version) as it is applied and update continuity flags in the left hand side by scanning all entries overlapping with it. If order is reversed, merging only needs to deal with the current entry. Also, if we were to keep the old order, we could not simply move entries from the left hand side as we merge because we need to keep track of the lower bound of a continuous interval, and we need to provide monotonic exception guarantees. So merging would be both more complicated and slower. 2) With large partitions older versions are typically larger than newer versions, and since merging is O(N_right*(1 + log(N_left))), it's better to merge newer into older. Fixes #2715.	2018-01-18 13:52:08 +01:00
Piotr Jastrzebski	dc75df6353	Stop using memtable::make_reader in mutation_test Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 11:47:07 +01:00
Paweł Dziepak	8b3c3fc832	db: make column_family::make_reader() return flat reader	2017-12-13 12:01:03 +00:00
Avi Kivity	4cfcd8055e	Merge "Drop reversible apply() from mutation_partition" from Tomasz "This simplifies implementation of mutation_partition merging by relaxing exception guarantees it needs to provide. This allows reverters to be dropped. Direct motivation for this is to make it easier to implement new semantics for merging of clustering range continuity. Implementation details: We only need strong exception guarantees when applying to the memtable, which is using MVCC. Instead of calling apply() with strong exception guarantees on the latest version, we will move the incoming mutation to a new partition_version and then use monotonic apply() to merge them. If that merging fails, we attach the version with the remainder, which cannot fail. This way apply() always succeeds if the allocation of partition_version object succeeds. Results of `perf_simple_query_g -c1 -m1G --write` (high overwrite rate): Before: 101011.13 tps 102498.07 tps 103174.68 tps 102879.55 tps 103524.48 tps 102794.56 tps 103565.11 tps 103018.51 tps 103494.37 tps 102375.81 tps 103361.65 tps After: 101785.37 tps 101366.19 tps 103532.26 tps 100834.83 tps 100552.11 tps 100891.31 tps 101752.06 tps 101532.00 tps 100612.06 tps 102750.62 tps 100889.16 tps Fixes #2012." * tag 'tgrabiec/drop-reversible-apply-v1' of github.com:scylladb/seastar-dev: mutation_partition: Drop apply_reversibly() mutation_partition: Relax exception guarantees of apply() mutation_partition: Introduce apply_weak() tests: mvcc: Add test for atomicity of partition_entry::apply() tests: Move failure_injecting_allocation_strategy to a header tests: mutation_partition: Test exception guarantees of apply_monotonically() mvcc: Use apply_monotonically() where sufficient mvcc: partition_version: Use apply_monotonically() to provide atomicity mvcc: Extract partition_entry::add_version() mutation_partition: Introduce apply_monotonically() mutation_partition: Introduce row::consume_with()	2017-11-28 16:35:06 +02:00
Tomasz Grabiec	091e10fc70	mutation_partition: Relax exception guarantees of apply() The uses which needed strong or weak exception guarantees were switched to a solution involving apply_monotonically(). All remaining uses don't need any exception guarantees.	2017-11-28 13:03:06 +01:00
Tomasz Grabiec	e5532bd644	tests: Move failure_injecting_allocation_strategy to a header	2017-11-28 12:38:28 +01:00
Tomasz Grabiec	1b5f2b0473	tests: mutation_partition: Test exception guarantees of apply_monotonically()	2017-11-28 12:38:28 +01:00
Jesse Haber-Kucharsky	fb0866ca20	Move `thread_local` declarations out of `main.cc` Since `disk-error-handler.hh` defines these global variables `extern`, it makes sense to declare them in the `disk-error-handler.cc` instead of `main.cc`. This means that test files don't have to declare them. Fixes #2735. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <1eed120bfd9bb3647e03fe05b60c871de2df2a86.1511810004.git.jhaberku@scylladb.com>	2017-11-27 20:27:42 +01:00
Duarte Nunes	922f095f22	tests: Initialize storage service for some tests These tests now require the storage service to be initialized, which is needed to decide whether correct non-compound range tombstones should be emitted or not. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20171126152921.5199-1-duarte@scylladb.com>	2017-11-26 17:41:06 +02:00
Tomasz Grabiec	6bf1c6014f	mvcc: partition_snapshot_row_cursor: Mark allocation points This marks places which may allocate, but do not always do so, as allocation points to increase the effectiveness of testing.	2017-11-13 20:55:13 +01:00
Tomasz Grabiec	d76b141b34	tests: Extract mvcc tests to separate file	2017-09-13 17:47:04 +02:00
Tomasz Grabiec	2df6f356b1	mvcc: Store LSA region reference in partition_snapshot Will be useful for improving encapsulation.	2017-09-13 17:38:08 +02:00
Avi Kivity	9b540eccb0	database: remove dependency on compaction.hh and compaction_manager.hh	2017-09-11 20:09:45 +03:00
Piotr Jastrzebski	c602ffd610	Make Scylla ttl expiration behave like in Cassandra Fixes #2497 [tgrabiec: reworked the title] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2f5a99dce6ef11fe0ef135c9fa0592078fc9a056.1502886874.git.piotr@scylladb.com>	2017-08-21 14:25:45 +02:00
Tomasz Grabiec	fb62dfab02	tests: mvcc: Introduce test_schema_upgrade_preserves_continuity	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	164989a574	tests: mvcc: Add test for partition_entry::apply_to_incomplete()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	db053ef902	tests: Add test for continuity merging rules	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	804f46f684	mutation: Make compare_*_for_merge() consistent with equals() equals() considers expiring cells to be different from non-expiring cells, but compare_row_marker_for_merge() considers them equal. Fix the latter to pick expiring cells. The choice was arbitrary.	2017-05-23 13:35:03 +02:00
Tomasz Grabiec	c1475a8eb2	tests: mutation: Improve assertion failure message	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	d15880b3b7	tests: Use default equality in test_mutation_diff_with_random_generator	2017-05-23 13:16:03 +02:00
Tomasz Grabiec	ef4c7c458c	tests: mutation: Check commutativity of mutation addition	2017-05-23 12:11:12 +02:00
Duarte Nunes	4e693383f7	mutation_partion: Use row_tombstone This patch replaces the current row tombstone representation with a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, which up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as the following, when inserting into a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as follows: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
Duarte Nunes	143136647a	mutation_test: Add more test cases for difference() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-15 14:34:01 +01:00
Paweł Dziepak	04b80272f2	cell_locker: add metrics for lock acquisition	2017-03-02 09:05:12 +00:00
Paweł Dziepak	4ffe0401ee	test/mutation_source: specify whether to generate counter mutations Tests using the random mutation generator should be provided with both counter and non-counter mutations to ensure that both cases are sufficiently covered. However, mixed schemas (with both counter and non-counter columns) are not allowed, so the RMG has to be explicitly told whether to use a counter or non-counter schema.	2017-02-07 15:17:14 +00:00
Piotr Jastrzebski	4bbe05dd47	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-01-05 11:26:03 +01:00
Avi Kivity	1d9ee358f1	Revert "Merge "Reduce the size of mutation_partition" from Piotr" This reverts commit `aa392810ff`, reversing changes made to a24ff47c637e6a5fd158099b8a65f1191fc2d023; it uses boost::intrusive::detail directly, which it must not, and doesn't compile on all boost versions as a consequence.	2016-12-25 16:07:48 +02:00
Piotr Jastrzebski	2af6ff68d9	mutation_partition: take schema in find_row and clustered_row This will allow intrusive set implementation that does not store schema. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-12-23 11:29:07 +01:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Avi Kivity	7faf2eed2f	build: support for linking statically with boost Remove assumptions in the build system about dynamically linked boost unit tests. Includes seastar update which would have otherwise broken the build.	2016-10-26 08:51:21 +03:00
Glauber Costa	28e3f2f6ee	LSA: export information about object memory footprint We allocate objects of a certain size, but we use a bit more memory to hold them. To get a clearer picture of how much memory an object will cost us, we need help from the allocator. This patch exports an interface that allows users to query a specific allocator for that information. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-10-04 10:39:10 -04:00
Paweł Dziepak	6012a7e733	mutation_partition: fix iterator invalidation in trim_rows Reversed iterators are adaptors for 'normal' iterators. These underlying iterators point to different objects than the reversed iterators themselves. The consequence of this is that removing an element pointed to by a reversed iterator may invalidate a reversed iterator which points to a completely different object. This is what happens in trim_rows for reversed queries. Erasing a row can invalidate the end iterator, and the loop would fail to stop. The solution is to introduce a reversal_traits::erase_dispose_and_update_end() function, which erases and disposes of the object pointed to by a given iterator, but also takes a reference to an end iterator and updates it if necessary to make sure that it stays valid. Fixes #1609. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>	2016-08-25 16:52:35 +03:00
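
Commit 9893e8e5f7 changes MVCC so that each version carries its own continuity, and the snapshot's continuity is the union of the continuous ranges in each version. The C++17 sketch below illustrates only that interpretation rule, under stated assumptions: `version`, `interval`, `continuous_intervals()` and `is_continuous_at()` are hypothetical names, keys are plain doubles, and row points (the keys themselves) are ignored. It is not Scylla's mutation_partition code.

```cpp
#include <limits>
#include <map>
#include <vector>

// Hypothetical toy model of one MVCC version: clustering key -> continuity flag.
// Under the per-version rule, a flag set on key k marks the open interval (p, k)
// as continuous, where p is the previous key in the SAME version (or -inf if k
// is the first key of that version).
using version = std::map<double, bool>;

// Open interval (lo, hi).
struct interval {
    double lo;
    double hi;
};

// Continuous intervals described by a single version, independent of the others.
std::vector<interval> continuous_intervals(const version& v) {
    std::vector<interval> out;
    double prev = -std::numeric_limits<double>::infinity();
    for (const auto& [key, cont] : v) {
        if (cont) {
            out.push_back({prev, key});
        }
        prev = key;
    }
    return out;
}

// Snapshot continuity: the union of the per-version intervals.
// Dropping a version (as eviction would) simply drops its intervals.
bool is_continuous_at(const std::vector<version>& versions, double x) {
    for (const auto& v : versions) {
        for (const auto& i : continuous_intervals(v)) {
            if (i.lo < x && x < i.hi) {
                return true;
            }
        }
    }
    return false;
}
```

With the commit's final example (v1: <key=1, cont=1>; v2: <key=1, cont=0> <key=2, cont=1>), the point 0.5 is continuous through v1 and 1.5 through v2. Dropping v1 leaves only (1, 2) continuous, whereas a lone old-style entry <key=2, cont=1> would also cover 0.5, which is the incorrect outcome the commit describes.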

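Commits 60d3c25c02 and c743d1258d reverse the order of MVCC version merging so that newer versions are applied into older ones, citing a cost of O(N_right*(1 + log(N_left))). A minimal C++17 sketch of newer-into-older merging, assuming a toy `version` type (a std::map from an int key to a string value) and hypothetical helpers `apply_newer_into_older()` and `squash()`; Scylla's real apply_monotonically() also merges continuity, tombstones and cells, none of which is modeled here.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a partition version: clustering key -> cell value.
using version = std::map<int, std::string>;

// Applies `newer` into `older`; on a key collision the newer value wins.
// Entries are moved one node at a time with C++17 node handles (no allocation),
// so if merging stopped between iterations, `older` and `newer` together would
// still describe the same merged data.
// Cost: O(N_newer * log N_older) map operations, which is why it pays to merge
// the (typically small) newer version into the (typically large) older one.
void apply_newer_into_older(version& older, version& newer) {
    while (!newer.empty()) {
        auto node = newer.extract(newer.begin());
        auto result = older.insert(std::move(node));
        if (!result.inserted) {
            // Key already present in the older version: the newer value wins.
            result.position->second = std::move(result.node.mapped());
        }
    }
}

// versions[0] is the oldest. Computes (((v0 + v1) + v2) + ...) by applying
// each newer version into the accumulated oldest one.
version squash(std::vector<version> versions) {
    if (versions.empty()) {
        return {};
    }
    for (std::size_t i = 1; i < versions.size(); ++i) {
        apply_newer_into_older(versions[0], versions[i]);
    }
    return std::move(versions[0]);
}
```

Folding into versions[0] corresponds to the "(((v0 + v1) + v2) + v3)" form from the commit message; the right-nested form would give the same result here because later versions always win.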
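
Commit 6d9e18fd61 shrinks the LSA object header partly through a ULEB128-like variable-length encoding, so that a live object's header typically fits in one byte, and partly by storing a small migrator-table index instead of an 8-byte pointer. The sketch below shows standard unsigned LEB128 in C++17. The commit describes its own format as "actually more like ULEB64", so this is not Scylla's descriptor layout, and `encode_uleb128()` and `decode_uleb128()` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Standard unsigned LEB128: 7 payload bits per byte, least significant group
// first; the high bit is set on every byte except the last one.
std::vector<std::uint8_t> encode_uleb128(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    do {
        auto byte = static_cast<std::uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;  // more bytes follow
        }
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Decodes one value starting at data[pos] and advances pos past it.
// Throws std::out_of_range if the input ends mid-value and
// std::overflow_error if the encoding runs past ten bytes.
std::uint64_t decode_uleb128(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    std::uint64_t value = 0;
    for (unsigned shift = 0;; shift += 7) {
        if (shift >= 64) {
            throw std::overflow_error("ULEB128 value too long");
        }
        std::uint8_t byte = data.at(pos++);
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            return value;
        }
    }
}
```

Any value below 128, such as a small migrator index, encodes to a single byte, compared with 8 bytes for a raw pointer; 300, for example, encodes to the two bytes 0xAC 0x02.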

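Commit 6012a7e733 explains why erasing through a reversed iterator can invalidate a different reversed iterator: `std::reverse_iterator` stores a base iterator that points one past the element it refers to. The C++17 sketch below uses std::list rather than Scylla's intrusive containers. `erase_and_update_end()` is a hypothetical helper modeled on the commit's `reversal_traits::erase_dispose_and_update_end()` (it erases but does not dispose), and `erase_if_reversed()` is a simplified stand-in for the trim_rows() loop.

```cpp
#include <iterator>
#include <list>

// Erases the element that reverse iterator `it` refers to, i.e.
// *std::prev(it.base()). Any OTHER reverse iterator whose base() is that same
// position (typically the loop's end iterator) is invalidated by the erase, so
// `end` is repaired when needed. Returns the reverse iterator to the next
// element in reverse order.
template <typename List>
typename List::reverse_iterator
erase_and_update_end(List& items,
                     typename List::reverse_iterator it,
                     typename List::reverse_iterator& end) {
    auto victim = std::prev(it.base());
    const bool end_invalidated = (end.base() == victim);
    auto after = items.erase(victim);  // forward iterator to the element after victim
    typename List::reverse_iterator next(after);  // refers to the element before victim
    if (end_invalidated) {
        end = next;
    }
    return next;
}

// Reverse-order removal loop in the style of trim_rows() for reversed queries.
template <typename List, typename Pred>
void erase_if_reversed(List& items, Pred pred) {
    auto it = items.rbegin();
    auto end = items.rend();
    while (it != end) {
        if (pred(*it)) {
            it = erase_and_update_end(items, it, end);
        } else {
            ++it;
        }
    }
}
```

For example, removing odd values from {1, 2, 3, 4, 5} in reverse order leaves {2, 4}. Erasing 1 invalidates `end` (rend(), whose base is begin()); without the update, the loop would compare against an invalidated iterator, which is undefined behavior.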
89 Commits