scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-30 05:07:05 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	fd5dbe04b5	tests: sstables: Test more configurations of sstable writer in test_sstable_conforms_to_mutation_source() Test different versions of the format, and different promoted index block sizes. The size of 1 is especially important, it will put each fragment in a separate block, exposing various issues with promoted index handling.	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	c5baeed6d2	sstables: Improve logging	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	b523815ac1	sstables: index_reader: Fix advance_to() to include relevant range tombstones Fixes #2326.	2017-04-27 18:43:49 +02:00
Tomasz Grabiec	92dba05f0d	sstables: Fix malformed_sstable_exception from single-key reads After `4742008b70`, _read_partial_row is never set, and we will fail here in case the consumer will exhaust the range. That would be the case if the end bound of the slice aligns with the end of the index page. Fix by assuming that if we're out of range in the middle of partition, we sliced. Message-Id: <1493121249-18847-1-git-send-email-tgrabiec@scylladb.com>	2017-04-25 14:59:08 +03:00
Avi Kivity	628b3092e4	Merge "Reify shadowable tombstones" from Duarte "This series introduces the row_tombstone class, which represents a tombstone applied to a clustering row. It distinguishes itself from a normal tombstone by the fact that it contains a regular tombstone and a shadowable one, which can be erased by a row marker. The intent of the series is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be, leading to incorrect results." * 'materialized-views/shadowable/v5' of https://github.com/duarten/scylla: sstables: Read and write shadowable tombstones mutation_partion: Use row_tombstone mutation_partion: Introduce row_tombstone mutation_partition: Introduce shadowable tombstones idl-compiler: Support optional fields in views tombstone: Extract out relational operators row_marker: Mark constructors explicit	2017-04-25 13:05:27 +03:00
Duarte Nunes	d45596ae8e	sstables: Read and write shadowable tombstones This patch serializes shadowable tombstones to sstables by adding a new, incompatible atom's mask. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	4e693383f7	mutation_partion: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	6a2bccd4ae	mutation_partion: Introduce row_tombstone This patch introduces the row_tombstone class, which represents a tombstone made up of a regular tombstone and a shadowable one. The rules for row_tombstones are as follows: - The shadowable tombstone is always >= than the regular one; - The regular tombstone works as expected; - The shadowable tombstone doesn't erase or compact away the regular row tombstone, nor dead cells; - The shadowable tombstone can erase live cells, but only provided they can be recovered (e.g., by including all cells in a MV update, both updated cells and pre-existing ones); - The shadowable tombstone can be erased or compacted away by a newer row marker. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:28 +02:00
Duarte Nunes	3d49c1da01	mutation_partition: Introduce shadowable tombstones A shadowable tombstone is a tombstone that can be replaced by a smaller one if provided a row_marker with a bigger timestamp than the shadowable tombstone. In the context of a row, it is only valid as long as no newer insert is done (thus setting a live row marker; note that if the row timestamp set is lower than the tombstone's, then the tombstone remains in effect as usual). If a row has a shadowable tombstone with timestamp Ti and that row is updated with a timestamp Tj, such that Tj > Ti (and that update sets the row marker), then the shadowable tombstone is shadowed by that update. A concrete consequence is that if the update has cells with timestamp lower than Ti, then those cells are preserved (since the deletion is removed), and this is contrary to a regular, non-shadowable row tombstone where the tombstone is preserved and such cells are removed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:22 +02:00
Duarte Nunes	8cc29f84fb	idl-compiler: Support optional fields in views When generating view code, the compiler was ignoring optional fields. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Duarte Nunes	d216c3dbd2	tombstone: Extract out relational operators This patch extracts out the relational operators in struct tombstone to a class capable of generating them from a tri-compare function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Duarte Nunes	392403b5b3	row_marker: Mark constructors explicit Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Tomasz Grabiec	f3609fc813	tests: log_histogram_test: Fix compilation on Ubuntu Some gcc versions incorrectly complain: tests/log_histogram_test.cc:87:22: error: ‘opts1’ is not a valid template argument for type ‘const log_histogram_options&’ because object ‘opts1’ has not external linkage size_t hist_key<node<opts1>>(const node<opts1>& n) { return n.v; } Apparently this is a bug in gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52036 Fixes #2307. Message-Id: <1493108791-11247-1-git-send-email-tgrabiec@scylladb.com>	2017-04-25 12:15:28 +03:00
Pekka Enberg	940c3f4330	Merge "Clang fixes (part 2)" from Avi "This series fixes some more errors found by clang, with the aim of enabling clang/zapcc as a supported compiler. A single issue remains, but it's probably in std::experimental::optional::swap(); not in our code." * tag 'clang/2/v1' of https://github.com/avikivity/scylla: sstable_test: avoid passing negative non-type template arguments to unsigned parameters UUID: add more comparison operators sstable_datafile_test: avoid string_view user-defined literal conversion operator mutation_source_test: avoid template function without template keyword cql_query_test: define static variable cql_query_test: add braces for single-item collection initializers storage_service: don't use typeid(temporary) logalloc: remove unused max_occupancy_for_compaction storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic storage_proxy: drop unused member access from return value storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare read_repair_decision: fix operator<<(std::ostream&, ...)	2017-04-24 20:32:16 +03:00
Tomasz Grabiec	dfbb9fd8f1	gdb: Workaround for gdb.Value being not accepted by %x Fixes the following error in "scylla segment-descs" and a similar one in "scylla lsa-segment": Traceback (most recent call last): File "scylla-gdb.py", line 530, in invoke gdb.write('0x%x: lsa free=%d region=0x%x zone=0x%x\n' % (addr, desc['_free_space'], desc['_region'], desc['_zone'])) TypeError: %x format: an integer is required, not gdb.Value Message-Id: <1493029465-6482-1-git-send-email-tgrabiec@scylladb.com>	2017-04-24 13:27:25 +03:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as following: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
Avi Kivity	b4e897a66d	cql3::metadata: fix undefined evaluation order in constructor We both move names_ to its destination, and call names_.size() in the same expression; this has undefined evaluation order, and fails with clang. With this patch as well as the clang build fixes, Scylla starts and is able to serve requests (light cassandra-stress load). Message-Id: <20170423121727.1948-1-avi@scylladb.com>	2017-04-24 10:40:12 +03:00
Duarte Nunes	cddf2f4d74	tests: Fix failure virtual_reader_test This patch fixes a failure of virtual_reader_test, where both the test itself and the cql_test_env initialize the messaging_service to listen on the same address and port, triggering an assert in posix_ap_server_socket_impl::accept(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170423104240.21275-1-duarte@scylladb.com>	2017-04-23 14:06:35 +03:00
Avi Kivity	566c094764	sstable_test: avoid passing negative non-type template arguments to unsigned parameters Clang complains. The test looks somewhat bogus, but that's for another patch.	2017-04-22 22:13:55 +03:00
Avi Kivity	dc6ea51ffa	UUID: add more comparison operators Clang wanted them for some unit test; not sure how gcc was able to synthesize them, but they're clearly needed.	2017-04-22 22:12:33 +03:00
Avi Kivity	5424aca745	sstable_datafile_test: avoid string_view user-defined literal conversion operator Clang doesn't like it, perhaps because it isn't in the std namespace (it's still in std::experimental).	2017-04-22 22:11:30 +03:00
Avi Kivity	705ac957a2	mutation_source_test: avoid template function without template keyword This isn't (yet?) standard C++, and clang rejects it.	2017-04-22 22:10:21 +03:00
Avi Kivity	551fb03476	cql_query_test: define static variable single_node_cql_env is declared but not defined; define it to make clang happy.	2017-04-22 22:01:44 +03:00
Avi Kivity	eb700752d8	cql_query_test: add braces for single-item collection initializers Clang complains that braces are missing; I didn't verify it but I'm sure it's right. Add braces to make it happy.	2017-04-22 22:00:49 +03:00
Avi Kivity	6bb8ae7788	storage_service: don't use typeid(temporary) Clang warns that the expression will be evaluated (doh). While the warning seems dubious, keep it and change the code to call the function outside typeid(), in case it does help someone one day.	2017-04-22 21:09:41 +03:00
Avi Kivity	9303b09a64	logalloc: remove unused max_occupancy_for_compaction Noticed by clang.	2017-04-22 21:09:41 +03:00
Avi Kivity	6d0811711f	storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic Clang's std::abs() doesn't support __int128_t, so use __int64_t instead. With this change, it's possible that a read repair 252,700 years after a write will be interpreted as a recent write and the read repair will incorrectly be skipped; hopefully by that time __int128_t will be standardized.	2017-04-22 21:09:41 +03:00
Avi Kivity	5ec1742b9a	storage_proxy: drop unused member access from return value Noticed by clang.	2017-04-22 21:09:41 +03:00
Avi Kivity	e4bae0df51	storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare Noticed by clang.	2017-04-22 21:09:41 +03:00
Avi Kivity	944047f039	read_repair_decision: fix operator<<(std::ostream&, ...) Argument-dependent lookup requires that the operator be declared in the same namespace as the class; move it there. While at it, de-static it, it only causes bloat.	2017-04-22 21:09:41 +03:00
Raphael S. Carvalho	4a86dd473d	tests: add tests/sstable_resharding_test.cc Forgot to add file after resolving conflict. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170422172053.3734-1-raphaelsc@scylladb.com>	2017-04-22 21:09:29 +03:00
Benoît Canet	f68049ef5d	tests: Fix clang auto universal reference type deduction Replace it by regular template type deduction. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170421204150.4626-2-benoit@scylladb.com>	2017-04-22 20:04:00 +03:00
Benoit Canet	b902f3b81b	tests: Remove parenthesis in variable declaration Prevent clang compilation of this tests. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170421204150.4626-1-benoit@scylladb.com>	2017-04-22 20:04:00 +03:00
Avi Kivity	54ab13eb8e	Merge "sstable resharding revamp" from Raphael "Currently, a shared sstable is rewritten at all shards it belongs to, and only after that, it's deleted. This new algorithm adds the ability to reshard a set of sstables together at a single shard and produce unshared sstable for all shards involved. That's important for the leveled compaction strategy issue, in which the number of sstables growing considerably after resharding. What happened is that every sstable was being split into N ones, so we could end up with tons of small sstables. Now, we will reshard together a set of adjacent sstables." * 'sstable_resharding_revamp_v9' of github.com:raphaelsc/scylla: tests: add test for new sstable resharding database: kill column_family::start_rewrite database: wire up new resharding algorithm database: implement new sstable resharding algorithm database: introduce function to replace new sstables by their ancestors prevent regular compaction from choosing shared sstables compaction_strategy: implement resharding strategy for compaction strategies sstables: store more info in foreign_sstable_open_info sstables: make it possible to get open info from loaded sstable database: export column family dir database: inform if column family has shared tables sstables: add method to export ancestors lcs: implement get_level_count compaction_manager: introduce method to check if manager stopped lcs: restore invariant instead of sending overlapping sst to L0 sstables: extend compaction for new resharding sstables: allow shard A to correctly create sstable for shard B compaction: rework compacting_sstable_writer to work with multiple writers compaction: prepare compacting_sstable_writer to work with writers sstables: rework compaction to make it easy to extend	2017-04-22 13:31:54 +03:00
Raphael S. Carvalho	8a37b279ed	tests: add test for new sstable resharding Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:34 -03:00
Raphael S. Carvalho	662fe77c11	database: kill column_family::start_rewrite Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:33 -03:00
Raphael S. Carvalho	43ac19eb52	database: wire up new resharding algorithm Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:31 -03:00
Raphael S. Carvalho	cf45333588	database: implement new sstable resharding algorithm NOTE: it's not wired yet. Currently, a shared sstable is rewritten at all shards it belongs to and only after that, it's deleted. With this new algorithm, a shared sstable will be read only once and N unshared sstables will be created, each of them with 1/N of the data. After it's done, each owner shard will receive its new unshared sstable replacing its ancestors. Another benefit is that we'll no longer have resharding resulting in number of sstables growing considerably after resharding. A full-sized leveled sstable is usually 160MB, so after resharding, we could have N files of 160MB/N. Now, leveled strategy will help resharding. N adjacent sstables of same level will be resharded together, so we'll end up with N files of N*160MB/N. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:30 -03:00
Raphael S. Carvalho	6513252e91	database: introduce function to replace new sstables by their ancestors When resharding, we're working with sstables from all shards. So let's say we're done with resharding of sstable A that belongs to shard 0 and 1 and sstable B that belongs to shard 1 and 2. SStables were generated for shards 0, 1, and 2. So shards 0, 1, and 2 need to load the new sstables and remove the ancestors. Shard 1 for example will remove sstables A and B (ancestors) and add the new one. Then it comes this new function. We'll forward new sstables to their target shards using foreign sstable open info. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:27 -03:00
Raphael S. Carvalho	c44a2319e6	prevent regular compaction from choosing shared sstables For new resharding, it's important to exclude resharding sstables from the list of candidates for regular compaction. That doesn't affect current resharding because it marks the sstables as compacting. That won't work with new resharding which will work with sstables from multiple shards. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:26 -03:00
Raphael S. Carvalho	13477075e2	compaction_strategy: implement resharding strategy for compaction strategies Strategies other than leveled will reshard one shared sstable at a time, and the target shard, shard at which job will run, for each job will be chosen in a round-robin fashion. For leveled strategy, we will reshard together smp::count adjacent sstables that belong to same level. The reason for that is because resharding one sstable at a time may result in creation of file for each shard, meaning after resharding we could end up with NO_SSTABLES*NO_SHARDS. These resharding strategies will be used for our new resharding algorithm. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:24 -03:00
Raphael S. Carvalho	bf930476b3	sstables: store more info in foreign_sstable_open_info We need that info for opening a sstable at different shard, unlike sstable loader which has everything in entry_descriptor, obtained from components in sstable filename. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:22 -03:00
Raphael S. Carvalho	e5e7037aa4	sstables: make it possible to get open info from loaded sstable It will be useful for resharding which will need to move a sstable across shards, and to do that without reloading the sstable at target shard, we need to be able to get the open info and move it to the target shard instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:21 -03:00
Raphael S. Carvalho	405e41e9a8	database: export column family dir Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:19 -03:00
Raphael S. Carvalho	2b774c5bc3	database: inform if column family has shared tables That's gonna be useful to quickly determine if it's worth resharding a column family. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:17 -03:00
Raphael S. Carvalho	2d119287b7	sstables: add method to export ancestors Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:16 -03:00
Raphael S. Carvalho	f2f8a2f5c7	lcs: implement get_level_count Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:14 -03:00
Raphael S. Carvalho	585596cede	compaction_manager: introduce method to check if manager stopped Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:12 -03:00
Raphael S. Carvalho	d82a8dfae0	lcs: restore invariant instead of sending overlapping sst to L0 A large token span sstable may find its way into high level due to resharding, which means the strategy invariant is broken. The invariant is restored by compacting first set of overlapping sstables, meaning that the restoration is done incrementally for multiple overlapping sets. Invariant is restored by regular compaction after resharding puts new unshared sstables into their original level, where level > 0. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:09 -03:00
Raphael S. Carvalho	0127309820	sstables: extend compaction for new resharding Extends compaction for new resharding algorithm. Not wired yet. New resharding will compact shared sstable(s) and create one sstable for each owner. It's up to the caller to open these new unshared sstables at their respective column families. This new approach will save a lot of bandwidth because we'll no longer read the entire shared sstable #smp::count times. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-04-21 17:11:08 -03:00

11807 Commits