scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 20:46:56 +00:00

Author	SHA1	Message	Date
Vladimir Krivopalov	5db6002720	Write serialization header to Statistics.db for SSTables 3.x. The serialization header is a new component in Statistics.db introduced in the SSTables 3.0 ('ma') format. It is essential for reading the data file, as it contains the base values used for delta-encoded values (timestamps, TTLs, local deletion times) and a description of the column types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:43:17 -07:00
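Commit 5db6002720 says the serialization header holds the base values for delta-encoded fields. The minimal C++ sketch below shows that idea for timestamps, assuming the base is the minimum timestamp collected while writing. The struct and function names are hypothetical and are not Scylla's API. The sketch also omits any further on-disk integer encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the base values kept in the serialization header.
struct encoding_base {
    int64_t min_timestamp;
    int32_t min_ttl;
    int32_t min_local_deletion_time;
};

// Store a timestamp as its offset from the base. If the base is the minimum,
// the offset is small and non-negative.
inline uint64_t encode_timestamp(const encoding_base& b, int64_t ts) {
    assert(ts >= b.min_timestamp);
    return static_cast<uint64_t>(ts - b.min_timestamp);
}

// A reader needs the same base to recover the original value.
inline int64_t decode_timestamp(const encoding_base& b, uint64_t delta) {
    return b.min_timestamp + static_cast<int64_t>(delta);
}
```

This dependency is why a reader cannot decode the data file without the header: the deltas alone are meaningless.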
Vladimir Krivopalov	6e4601d177	Do not pass schema to metadata_collector::update(column_stats) Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:22:32 -07:00
Vladimir Krivopalov	a10ad6b623	Collect metadata statistics when writing SSTables 3.0. Track min/max timestamps, TTLs, local deletion times and count of cells, columns and rows. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:22:30 -07:00
Vladimir Krivopalov	8342073758	Call get_metadata_collector() instead of referencing sstable::_collector directly. This is a step toward untying classes sstable_writer_m and sstable, so that eventually they no longer need to be friends. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Vladimir Krivopalov	f1816d77cc	Fix logic of writing TTLed cells in SSTable 3.0 format. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Vladimir Krivopalov	3e471116b4	Separate statistics for count of cells, columns and rows in column_stats. SSTables 3.0 format makes a distinction between count of cells and count of columns. In that sense, a column of a collection type counts as one column but every atomic cell in it counts as a separate cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
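The rule in commit 3e471116b4 is that a collection counts as one column, while each of its atomic cells counts as a separate cell. The C++ sketch below illustrates this with a hypothetical, simplified row model. It is not Scylla's column_stats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical row model: a column is either atomic (one cell) or a collection (many cells).
struct atomic_column {};
struct collection_column { std::size_t cell_count; };
using column = std::variant<atomic_column, collection_column>;

struct column_stats_sketch {
    uint64_t rows = 0;
    uint64_t columns = 0;
    uint64_t cells = 0;

    void add_row(const std::vector<column>& row) {
        ++rows;
        for (const auto& c : row) {
            ++columns; // a collection still counts as a single column...
            if (const auto* coll = std::get_if<collection_column>(&c)) {
                cells += coll->cell_count; // ...but every element is a separate cell
            } else {
                ++cells;
            }
        }
    }
};
```

For a row with one atomic column and one three-element collection, this gives 1 row, 2 columns and 4 cells.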
Vladimir Krivopalov	fdfe79e899	Deserialize collection in a way that doesn't incur shared_ptr counter increment and is generally shorter. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Vladimir Krivopalov	7039dee12b	Track both min & max values for timestamp, TTL and local deletion time in metadata_collector. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Vladimir Krivopalov	8b8c9a5d10	Add class for tracking both extremum values (min and max) on updates. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
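Commit 8b8c9a5d10 adds a class that tracks both extremes on every update. A minimal generic sketch of that idea follows. The class name is hypothetical. The sentinel start values are an assumption of this sketch: until the first update(), min() is greater than max().

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical sketch: remembers the smallest and largest values passed to update().
template <typename T>
class min_max_tracker_sketch {
    T _min = std::numeric_limits<T>::max();
    T _max = std::numeric_limits<T>::lowest();
public:
    void update(T value) {
        _min = std::min(_min, value);
        _max = std::max(_max, value);
    }
    T min() const { return _min; }
    T max() const { return _max; }
};
```

The metadata collector would keep one such tracker each for timestamps, TTLs and local deletion times.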
Tomasz Grabiec	5e985192b2	db: Log table id and schema version on boot Message-Id: <1524585689-12458-1-git-send-email-tgrabiec@scylladb.com>	2018-05-03 10:50:31 +03:00
Botond Dénes	5d5bc0e1ab	mutation_reader_test: fix multishard-reader test with smp > 3 test_multishard_combining_reader_destroyed_with_pending_create_reader was failing because it relied on smp == 3, and thus on the shard on which reader creation is blocked being shard 2. Since the test must be run with smp >= 3 anyway, we can hardcode this shard to be 2: if the test runs at all, shard 2 is guaranteed to exist. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <38883a1f4c18ca0cd065aa13826a4f1858353289.1525328233.git.bdenes@scylladb.com>	2018-05-03 10:30:21 +03:00
Botond Dénes	efa08f623a	mutation_reader_test: add description to multishard-tests These tests are quite complicated and require intimate knowledge of how foreign_reader and multishard_combining_reader operate. Knowing these two objects is still required to understand the tests, but explaining how the tests were designed to test what they test makes them that much easier to follow. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <8de580131a8652924de920c2bc68a98e579398ee.1525328226.git.bdenes@scylladb.com>	2018-05-03 10:30:20 +03:00
Paweł Dziepak	bfc017daa8	tests/mutation_reader: do not capture on-stack variable by reference 'shard' is a short-lived on-stack variable that gets captured by reference by a continuation that gets executed on another shard. This fixes a race condition that leads to a heap-use-after-free. Message-Id: <20180502150507.2776-1-pdziepak@scylladb.com>	2018-05-02 18:07:37 +03:00
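The bug fixed in commit bfc017daa8 is a general lambda-lifetime pitfall. In the C++ sketch below, a std::function stands in for the continuation that runs later (in the real test, it runs on another shard). The function name is hypothetical.

```cpp
#include <cassert>
#include <functional>

// `shard` is a short-lived local, but the returned closure outlives this function.
std::function<int()> make_continuation() {
    int shard = 2;
    // Capturing [&shard] would leave the closure holding a dangling reference
    // once this function returns. [shard] copies the value into the closure instead.
    return [shard] { return shard * 10; };
}
```

Capturing by value costs a copy of a small integer. In exchange, the continuation never depends on the lifetime of the enclosing stack frame.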
Botond Dénes	d80e586ccb	mutation_reader_test: remove leftover comments Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <580dcf664fc4fc84f3a29137fba5c982f57d7601.1525269726.git.bdenes@scylladb.com>	2018-05-02 17:03:50 +03:00
Botond Dénes	e14b0ca13e	mutation_reader_test: fix possible use-after-free The test_foreign_reader_destroyed_with_pending_read_ahead test currently doesn't ensure that the objects in its scope are destroyed in the correct order. This is necessary as there are several foreign pointers to objects that live on remote shards and use each other. Since foreign pointers destroy their managed object in the background, we cannot rely on them to destroy objects in order, nor can we be sure when the object they manage is actually destroyed. To work around that, ensure that the puppet_reader is destroyed before the remote_control it references even has a chance of being destroyed. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <232eaa899878b03fb2a765c2916e4f05841472a3.1525269726.git.bdenes@scylladb.com>	2018-05-02 17:03:49 +03:00
Nadav Har'El	68b5eafcc6	secondary index: test index naming Test for Scylla's default choice of secondary index name (we found one small problem, see issue #3403, and left it commented out). Also test the ability to give indices non-default names. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180501153439.26619-1-nyh@scylladb.com>	2018-05-02 08:12:14 +03:00
Nadav Har'El	311b25948c	secondary index: test indexing of partition-key column Add a test that adding a secondary index on the only partition-key column is not allowed (it would be redundant), but indexing one of several partition-key columns is allowed. This reproduced issue #3404, and verifies that it was fixed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180501121544.22869-2-nyh@scylladb.com>	2018-05-02 08:11:04 +03:00
Nadav Har'El	79c6bb642f	secondary index: fix indexing of partition-key column Indexing the only partition key component is not allowed (because it would be redundant), but indexing one of several partition key components should be allowed. We had a bug in that case: the underlying materialized view we created had the same column as both a partition key and a clustering key, which resulted in an assertion failure. This patch fixes that. Fixes #3404. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180501121544.22869-1-nyh@scylladb.com>	2018-05-02 08:06:38 +03:00
Nadav Har'El	21d7507b74	secondary index: move stuff out of db/index directory The db/index directory contains just a few lines of code that exist there for historical reasons. It's confusing that we have both a db/index and an index/ directory related to secondary indexing. This patch moves what little is still in db/index/ to index/. In the future we should probably get rid of the "secondary_index" class we had there, but for now, let's at least not have a whole separate directory for it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180501101246.21143-1-nyh@scylladb.com>	2018-05-01 13:21:24 +03:00
Avi Kivity	25545590a4	Merge "Read-ahead related fixes for multishard readers" from Botond " Both multishard_combining_reader and foreign_reader use read-ahead in the background to avoid blocking consumers. These read-aheads can still be pending when the reader is destroyed, and hence extra attention is needed to avoid memory errors. Recent manual testing, done in the context of testing code that uses the multishard reader, proved that these cases were not handled correctly in the initial series introducing it (`2d126a79b`). This series introduces fixes and comprehensive tests for all problematic scenarios: 1) multishard_combining_reader is destroyed with pending reader creation on a remote shard. 2) foreign_reader is destroyed with pending read-ahead. 3) multishard_combining_reader is destroyed with pending read-ahead. " * 'multishard-reader-read-ahead-fixes/v2' of https://github.com/denesb/scylla: test.py: add custom seastar flags for mutation_reader_test test.py: move custom seastar flags for tests declarative mutation_reader_test: add read-ahead related multishard reader tests tests/mutation_reader_test: change recommended smp to 3 mutation_reader_test: fix name of existing multishard reader tests simple_schema: add global_simple_schema simple_schema.hh: remove unused include multishard_combining_reader: prepare for read-ahead outliving the reader foreign_reader: prepare for read-ahead outliving the reader multishard_combining_reader: avoid creating the shard reader twice multishard_combining_reader: read_ahead: don't assume reader is created multishard_combining_reader: move read-ahead related methods multishard_combining_reader: avoid looking up the shard reader twice multishard_combining_reader: use optional for maybe created reader	2018-04-30 17:41:50 +03:00
Botond Dénes	f96084d38e	test.py: add custom seastar flags for mutation_reader_test Use -c3 if possible (if the machine has at least 3 cores).	2018-04-30 17:17:45 +03:00
Botond Dénes	52f0bb0481	test.py: move custom seastar flags for tests declarative	2018-04-30 17:17:45 +03:00
Botond Dénes	79684eff8e	mutation_reader_test: add read-ahead related multishard reader tests Add tests for foreign_reader and multishard_combining_reader that check that readers destroyed while there is a pending read-ahead will not result in use-after-free. Specifically, check that: * multishard_combining_reader destroyed with pending reader creation * foreign_reader destroyed with pending read-ahead * multishard_combining_reader destroyed with pending read-ahead does not result in use-after-free or SEGFAULT. These tests do their best to check for correct behaviour with various BOOST_REQUIRE* checks, but they still rely heavily on ASAN to detect any use-after-free, SEGFAULT or similar errors.	2018-04-30 17:17:45 +03:00
Botond Dénes	cb25afa8bf	tests/mutation_reader_test: change recommended smp to 3 This applies to the test_multishard_combining_reader_reading_empty_table test. Running this test with smp=3 instead of smp=2 helps detect additional read-ahead related memory problems.	2018-04-30 17:17:45 +03:00
Botond Dénes	78266f11c4	mutation_reader_test: fix name of existing multishard reader tests s/multishard_combined_reader/multishard_combining_reader/	2018-04-30 17:17:44 +03:00
Botond Dénes	783f0f09bf	simple_schema: add global_simple_schema This allows a simple_schema instance to be transferred to another shard. In fact, a new simple_schema instance will be created on the remote shard, but it will use the same schema instance as the original one.	2018-04-30 17:17:44 +03:00
Botond Dénes	ed7bde99bc	simple_schema.hh: remove unused include	2018-04-30 17:17:44 +03:00
Botond Dénes	04643fb223	multishard_combining_reader: prepare for read-ahead outliving the reader When the multishard reader is destroyed, there might be several pending read-aheads running in the background. These read-aheads need their associated reader to stay alive until after the read-ahead completes. To solve this, move the flat_mutation_reader into a struct and manage this struct's lifetime through a shared pointer. Fibers associated with read-aheads that might outlive the multishard reader will hold on to a copy of the shared pointer, keeping the underlying reader alive until they complete. To avoid doing any extra work, a flag is added to this state which is set when the multishard reader is destroyed. When this flag is set, pending continuations will return early. All this is encapsulated in multishard_combining_reader::shard_reader, so the multishard reader code itself need not be changed.	2018-04-30 17:16:21 +03:00
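Commit 04643fb223 describes a pattern with three parts: shared ownership of the reader, a copy of the shared pointer held by each pending read-ahead, and a "stopped" flag that lets late continuations bail out. The single-threaded C++ sketch below illustrates it. It uses std::shared_ptr and std::function in place of Seastar futures, and all names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical shared state: the reader plus a flag set when the owner goes away.
struct reader_state {
    int reads_done = 0;   // stand-in for the flat_mutation_reader
    bool stopped = false; // set when the owning multishard reader is destroyed
};

class shard_reader_sketch {
    std::shared_ptr<reader_state> _state = std::make_shared<reader_state>();
public:
    // Returns a "pending continuation" that holds its own copy of the shared pointer,
    // so the state stays alive even if this object is destroyed first.
    std::function<bool()> start_read_ahead() {
        return [state = _state] {
            if (state->stopped) {
                return false; // owner is gone: return early, do no extra work
            }
            ++state->reads_done;
            return true;
        };
    }
    ~shard_reader_sketch() { _state->stopped = true; }
};
```

Because each continuation owns a reference, destroying the shard reader first never leaves a continuation pointing at freed memory.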
Botond Dénes	a05d398be7	foreign_reader: prepare for read-ahead outliving the reader The foreign reader keeps track of ongoing read-aheads via a foreign_ptr to the read-ahead's future on the remote shard. This pointer is overwritten after each "remote call" to the remote reader with a pointer to the new read-ahead's future. There are several problems with the current implementation: 1) A new read-ahead is launched unconditionally after each "remote call", even if the remote reader is at EOS. This starts an unnecessary read-ahead when the reader is already finished and may soon be destroyed (legally) by the client. 2) The pointer to the remote read-ahead future is not reset to nullptr when a remote call is issued. Thus in the destructor, where we attach a continuation to the read-ahead's future to extend the reader's lifetime until after the read-ahead finishes, we might attach a continuation to a future that already has one and run into a failed assert(). To fix these issues, reset the read-ahead pointer to nullptr each time a remote call is issued, and don't start a new read-ahead if the remote reader is at EOS. This way we can ensure that when the reader is destroyed we have either a valid, non-stale read-ahead future or none at all, and can reliably decide whether we need to extend the lifetime of the remote reader.	2018-04-30 14:34:43 +03:00
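Commit a05d398be7 enforces two invariants: the read-ahead handle is cleared on every remote call, and no read-ahead is started at EOS. The C++ sketch below shows only that bookkeeping. An std::optional holding a hypothetical handle stands in for the foreign_ptr to the remote future.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the foreign_ptr to the remote read-ahead's future.
struct read_ahead_handle { int id; };

class foreign_reader_sketch {
    std::optional<read_ahead_handle> _read_ahead; // empty means "no read-ahead pending"
    int _next_id = 0;
public:
    // Models one "remote call"; `remote_at_eos` is what the remote reader reports afterwards.
    void remote_call(bool remote_at_eos) {
        _read_ahead.reset(); // the previous read-ahead was consumed: never keep a stale handle
        if (!remote_at_eos) { // no pointless read-ahead once the remote reader is finished
            _read_ahead = read_ahead_handle{++_next_id};
        }
    }
    // What a destructor would consult to decide whether to extend the remote reader's lifetime.
    bool has_pending_read_ahead() const { return _read_ahead.has_value(); }
};
```

With both invariants in place, the destructor only ever sees a fresh handle or none. That is exactly the condition that avoids attaching a second continuation to the same future.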
Botond Dénes	704d3d8421	multishard_combining_reader: avoid creating the shard reader twice The multishard reader creates its shard readers on demand, when they are first used. However, at that point the reader might already be in the process of being created, initiated by a previous read-ahead. To avoid creating the shard reader twice, check whether there are any read-aheads in progress before creating the reader. If there is one, it has already created (is creating, or will create) the reader, so synchronise with that read-ahead. Synchronisation happens via a promise: the read-ahead creates a promise which will be fulfilled when the reader is created. A concurrent create_reader() call will wait on this promise instead of attempting to create a new reader.	2018-04-30 14:34:43 +03:00
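Commit 704d3d8421 describes a "create once, let concurrent callers wait" scheme. The single-threaded C++ sketch below mimics it. A list of callbacks plays the role of waiting on the promise, instead of Seastar's promise and future. All names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

class lazy_reader_sketch {
    std::optional<int> _reader; // the created reader (stand-in)
    bool _creating = false;     // a read-ahead has started creating it
    std::vector<std::function<void(int)>> _waiters; // stand-in for waiting on the promise
    int _creations = 0;
public:
    // A read-ahead begins creation; it completes later via finish_creation().
    void begin_creation() { _creating = true; }
    void finish_creation() {
        _reader = ++_creations;
        _creating = false;
        auto waiters = std::move(_waiters); // "fulfil the promise" for everyone waiting
        _waiters.clear();
        for (auto& w : waiters) {
            w(*_reader);
        }
    }
    // Reuse an existing reader, wait for one being created, or create it: never twice.
    void create_reader(std::function<void(int)> on_ready) {
        if (_reader) { on_ready(*_reader); return; }
        if (_creating) { _waiters.push_back(std::move(on_ready)); return; }
        begin_creation();
        finish_creation();
        on_ready(*_reader);
    }
    int creations() const { return _creations; }
};
```

A create_reader() call that arrives while a read-ahead is mid-creation simply queues up. It receives the same reader once creation finishes.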
Botond Dénes	f9464cfcd7	multishard_combining_reader: read_ahead: don't assume reader is created Currently it is assumed that when read_ahead is called the reader is already created. Under most circumstances this will not be true. It was blind (bad) luck that we didn't hit this before (during testing).	2018-04-30 14:34:43 +03:00
Botond Dénes	d9fceb398a	multishard_combining_reader: move read-ahead related methods To the group of methods that do not assume the reader is already created. A patch will follow that will update read_ahead() to not assume that the reader is created.	2018-04-30 14:34:43 +03:00
Botond Dénes	5dcfaa68f6	multishard_combining_reader: avoid looking up the shard reader twice	2018-04-30 14:34:43 +03:00
Botond Dénes	79504a7d28	multishard_combining_reader: use optional for maybe created reader After a little "research" [1], it turns out my initial fears were completely groundless: std::optional::operator->() and std::optional::operator*() don't involve an unnecessary branch, and thus there is no need to hand-roll an optional with a separate bool. [1] http://en.cppreference.com/w/cpp/utility/optional/operator	2018-04-30 14:34:37 +03:00
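The point of commit 79504a7d28 is that std::optional already provides the "maybe created" state. Its operator->() and operator*() do not check whether a value is present, so calling them on an empty optional is undefined behavior; value() is the checked accessor. The C++ sketch below uses hypothetical type names.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct reader_stub { std::string name; }; // hypothetical stand-in for a shard reader

struct shard_slot {
    // Replaces a hand-rolled pair of "bool created" plus separately managed storage.
    std::optional<reader_stub> reader;
    bool created() const { return reader.has_value(); }
};
```

Callers that have already established the reader exists can use reader-> directly without paying for a check.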
Tomasz Grabiec	423712f1fe	storage_proxy: Request schema from the coordinator in the original DC The mutation forwarding intermediary (src_addr) may not always know about the schema which was used by the original coordinator. I think this may be the cause of the "Schema version ... not found" error seen in one of the clusters which entered some pathological state: storage_proxy - Failed to apply mutation from 1.1.1.1#5: std::_Nested_exception<schema_version_loading_failed> (Failed to load schema version 32893223-a911-3a01-ad70-df1eb2a15db1): std::runtime_error (Schema version 32893223-a911-3a01-ad70-df1eb2a15db1 not found) Fixes #3393. Message-Id: <1524639030-1696-1-git-send-email-tgrabiec@scylladb.com>	2018-04-30 12:51:09 +03:00
Nadav Har'El	1bbf7ba78c	secondary index: add tests for IF NOT EXISTS, IF EXISTS Confirm that issue #2991 is indeed fixed: creating a secondary index with IF NOT EXISTS ignores an already existing index, and dropping with IF EXISTS ignores a non-existent index. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180430071714.10154-1-nyh@scylladb.com>	2018-04-30 10:36:50 +02:00
Nadav Har'El	6e3a53fab0	secondary index: improve testing of case-sensitive column names The existing test_secondary_index_case_sensitive only tested the case-sensitive case of the column being indexed, and only in some scenarios. Further testing exposed more bugs - issue #3388, issue #3391, issue #3401. This patch adds tests which reproduced those bugs, and now verifies their fix. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-9-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	a556b2b367	materialized views: fix test_case_sensitivity test test_case_sensitivity from tests/view_schema_test.cc was well-intentioned, aiming to test from different angles the issue of non-lowercase (quoted) column names and their interaction with materialized views. But unfortunately, it didn't test anything! This is because the quotation marks were forgotten, so all the identifiers in this test were folded to lowercase, and the test didn't test non-lowercase identifiers as it intended. So this patch adds the missing quotes, to make this test great again. After the patches for issues #3388 and #3391 which I sent earlier, the test passes (before those patches, the fixed test did not pass, while the unfixed test passed trivially). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-8-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	46d4f6f352	secondary index: fix yet another case sensitivity bug When the secondary index code builds a "%s IS NOT NULL" clause for a CQL statement, it needs to quote the column name if the name requires quoting (i.e., it contains anything other than lowercase letters, digits and _). Fixes #3401. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-7-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	8012f231ca	materialized views: fix another case-sensitivity bug We had another case-sensitivity bug in materialized views, where if a case-sensitive (quoted) column name was listed explicitly on "SELECT" (instead of implicitly, e.g., in "SELECT *") the column name was incorrectly folded to lower-case and inserts would fail. This patch fixes the code, where a "SELECT" statement was built using the desired column names, but column names that needed quoting were not being quoted. The bug was in a helper function build_select_statement() which took column name strings and failed to quote them. We clean up this function to take column definitions instead of strings - and take care of the quoting itself. It also needs to quote the table's name in the select statement being built. Fixes #3391. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-6-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	e2b2506cb1	materialized views - fix case-sensitive IS NOT NULL Before this patch, if a materialized view is defined with the restriction IS NOT NULL on a case-sensitive (quoted) column name, inserts fail with a "restriction 'foobar IS NOT null' unknown column foobar" error, where foobar is the lowercased version of the case-sensitive column name. The problem is that the code uses single_column_relation::to_string() to convert the relation into a CQL where clause. This method does generate a CQL expression, but it calls column_identifier::raw::to_string() to print identifiers. This is the wrong function: it doesn't quote identifiers that need quoting because they are not lowercase. So this patch uses column_identifier::raw::to_cql_string() (a method we added in the previous patch) to generate the properly quoted CQL relation. Fixes #3388 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-5-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	b8ee50e6b9	Implement column_identifier::raw::to_cql_string() Implement a method column_identifier::raw::to_cql_string(). Exactly like the one without "raw", this method quotes the identifier name as needed for CQL. We'll need this method in a later patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-4-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	993c4441e5	column_identifier::to_cql_string() using maybe_quote() There is no reason for to_cql_string() and maybe_quote() to both implement the same quoting algorithm. Use the latter to implement the former. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-3-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	f4178f9582	Fix cql3::util::maybe_quote() The utility function maybe_quote() is supposed to quote identifier names (names of keyspaces, tables, or columns) according to CQL rules, e.g., if the name has any uppercase or non-alphanumeric characters, it needs to be quoted. Unfortunately, it didn't quite do the right thing, so this patch fixes that. This patch also adds a comment explaining what maybe_quote() is supposed to do (until now, users could only guess). Fixes #3400. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-2-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
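The C++ sketch below implements the quoting rule described in commit f4178f9582. Under that rule, an identifier can stay unquoted only if it starts with a lowercase letter and contains only lowercase letters, digits and '_'. Otherwise it is wrapped in double quotes, and any embedded double quote is doubled. This is an illustration, not Scylla's maybe_quote(). It also ignores reserved keywords, which would need quoting as well.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of CQL identifier quoting (reserved keywords not handled).
std::string maybe_quote_sketch(const std::string& name) {
    bool plain = !name.empty() && name[0] >= 'a' && name[0] <= 'z';
    for (char c : name) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        plain = plain && ok;
    }
    if (plain) {
        return name; // safe to emit as-is: CQL would not fold or misparse it
    }
    std::string quoted = "\"";
    for (char c : name) {
        if (c == '"') {
            quoted += '"'; // escape an embedded double quote by doubling it
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```

For example, foo_1 is returned unchanged, while FooBar becomes "FooBar". Leaving FooBar unquoted would make CQL fold it to foobar, which is the class of bug behind the case-sensitivity fixes above.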
Nadav Har'El	ecc85297a4	secondary index: clean up dead unquoting code In commit `d674b6f672`, I fixed a case-sensitive column name bug by avoiding CQL quoting of a column name in create_index_statement.cc when building a "targets" option string. However, there is also matching code in target_parser.hh to unquote that option string. So this unquoting code is no longer necessary, and should be dropped. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-1-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Nadav Har'El	a0bc0d2d11	secondary index: fix support for compound partition key In the current code, if the base table has a compound partition key (i.e., multiple partition-key columns), searching its secondary indexes didn't work. There was no real reason for this; it was just a bug in preparing the second query: Every SI query is converted to two queries. The first queries the associated materialized view to find a list of primary keys. These keys are then used in a second query, of the base table, which needs to list them as restrictions. When a partition key is compound, its components build one key and one restriction. But in the buggy code, we incorrectly used each component as a separate (improperly formatted) key and restriction, and obviously this didn't work. This patch also adds a test that reproduces this problem and confirms its fix. In the fixed code I also found another incorrect use of to_cql_string() (which could break case-sensitive primary key column names) and changed it to to_string(). Fixes #3210. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429124138.24406-1-nyh@scylladb.com>	2018-04-29 14:40:13 +01:00
Duarte Nunes	b1dd1876e5	gms/gossiper: Prevent duplicate processing of EchoMessage reply We make multiple attempts to mark a node as alive. We do that by sending an EchoMessage and marking the node as alive upon receiving a successful answer. In case there's a network partition and the nodes can't reach each other, multiple messages may be delivered and processed. We can avoid processing duplicate EchoMessage replies by checking whether we had already marked the node as alive. Fixes #1184 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180428191942.31990-1-duarte@scylladb.com>	2018-04-29 14:20:01 +03:00
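The check in commit b1dd1876e5 amounts to making the "mark alive" step idempotent. The minimal C++ sketch below uses a std::set of endpoints. The class is hypothetical and is not the gossiper's API. It also ignores nodes later going down, which in reality would clear their alive state.

```cpp
#include <cassert>
#include <set>
#include <string>

class echo_reply_handler_sketch {
    std::set<std::string> _alive;
    int _marked_alive = 0;
public:
    void on_echo_reply(const std::string& endpoint) {
        // insert() reports in .second whether the endpoint was newly added.
        if (!_alive.insert(endpoint).second) {
            return; // already alive: a duplicate reply from an earlier attempt
        }
        ++_marked_alive; // stand-in for the real mark-alive processing
    }
    int marked_alive_count() const { return _marked_alive; }
};
```

Replies from earlier, retried attempts then fall through the early return instead of repeating the state transition.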
Avi Kivity	51b235aa7e	compress: adjust HAVE_LZ4_COMPRESS_DEFAULT macro for new name Seastar changed the name of this macro.	2018-04-29 12:57:27 +03:00
Avi Kivity	0530653da9	Merge "adapt scylla_io_setup to recent I/O Scheduler changes" from Glauber " Recently many changes have landed in seastar for the I/O Scheduler. We can now describe the I/O storage of a machine by its visible properties like throughput and bandwidth instead of relying on an indirect calculation. For the instances we support, we can just measure those properties and start using them right away. A version of iotune that computes those properties is not yet ready, but while developing it I noticed that we aren't really setting the nomerges and scheduler properties of the disks under testing. We definitely should, since that can influence the results. So this patchset also starts doing that. The command line for iotune v2 shouldn't change much. When it is ready we will just adjust this script once more. " * 'scylla_io_setup' of github.com:glommer/scylla: scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler properties scylla_lib: drop support for m3 and c3 AWS instance types io_setup: call blocktune before tuning I/O blocktune: allow it to be called as a library. scripts: move scylla-blocktune to scripts location	2018-04-29 11:44:06 +03:00
Avi Kivity	7161244130	Merge seastar upstream * seastar 70aecca...ac02df7 (5): > Merge "Prefix preprocessor definitions" from Jesse > cmake: Do not enable warnings transitively > posix: prevent unused variable warning > build: Adjust DPDK options to fix compilation > io_scheduler: adjust property names DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro references prefixed with SEASTAR_. Some may need to become Scylla macros.	2018-04-29 11:03:21 +03:00


15239 Commits