scylladb

Author	SHA1	Message	Date
Botond Dénes	450b9ac9bf	multishard_combining_reader: shard reader: don't stop on non-full prefixes This patch is a backport of the fix for #4733 (merged to master as `0cf4fab`). As the shard reader code has been substantially refactored since the 3.0 branch was cut, that fix cannot be backported at all; instead, this is a separate fix developed specially for 3.0. To quickly reiterate, the problem at hand is that when recreating a previously evicted shard reader of a multishard reader, the position of the last fragment seen by that reader is used as the position after which the read resumes. For this we just created a clustering range starting from after the key (open bound). This works well in most cases, but when that last key is a non-full prefix this will also ignore any still unread clustering rows that fall into that prefix. This patch doesn't attempt to fix the problem in a systematic way like the fix in master does, making sure reader recreation works properly with prefixes as well; instead, for the sake of minimizing the impact, we simply avoid ending the buffer on a prefix key. This fix is more naive and can cause over-read when the stream contains lots of successive range tombstones with prefix positions. On the other hand, this leads to a much simpler fix, and anyway, as reader eviction is much rarer in 3.0 this should have a lesser impact. A unit test is also added to make sure the problem is fixed. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190819120748.28168-1-bdenes@scylladb.com>	2019-08-19 15:09:47 +03:00
Kamil Braun	a690e20966	Fix infinite looping when performing a range query on system.size_estimates. Queries to the system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. This commit fixes the issue and closes #4689.	2019-08-14 12:51:33 +02:00
Kamil Braun	7172009a0d	Fix segmentation fault when querying system.size_estimates for an empty keyspace.	2019-08-14 12:51:33 +02:00
Kamil Braun	cb688ef62e	Refactor size_estimates_virtual_reader Move the implementation of size_estimates_mutation_reader to a separate compilation unit to speed up compilation times and increase readability. Refactor tests to use seastar::thread.	2019-08-14 12:51:27 +02:00
Avi Kivity	094a2a4263	Merge "Catch unclosed partition sstable write #4794 " from Tomasz " Not emitting partition_end for a partition is incorrect. SStable writer assumes that it is emitted. If it's not, the sstable will not be written correctly. The partition index entry for the last partition will be left partially written, which will result in errors during reads. Also, statistics and sstable key ranges will not include the last partition. It's better to catch this problem at the time of writing, and not generate bad sstables. Another way of handling this would be to implicitly generate a partition_end, but I don't think that we should do this. We cannot trust the mutation stream when invariants are violated, we don't know if this was really the last partition which was supposed to be written. So it's safer to fail the write. Enabled for both mc and la/ka. Passing --abort-on-internal-error on the command line will switch to aborting instead of throwing an exception. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down. " * 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla: sstables: writer: Validate that partition is closed when the input mutation stream ends config, exceptions: Add helper for handling internal errors utils: config_file: Introduce named_value::observe() (cherry picked from commit `95c0804731`) (cherry picked from commit `cf4c238b28`)	2019-08-08 16:47:26 +03:00
Raphael S. Carvalho	3172cc6bac	sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction A fully expired sstable is not added to the compacting set, meaning it's not actually compacted, but it's kept in a list of sstables which incremental compaction uses to check if any sstable can be replaced. Incremental compaction was unconditionally removing the expired sstable from the compacting set, which led to a segfault because the end iterator was given. The fix changes sstable_set::erase() behavior to follow the standard one for erase functions, which works even if the target element is not present. Fixes #4085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com> (cherry picked from commit `930f8caff9`)	2019-07-22 15:07:00 +03:00
Kamil Braun	e30c289835	Fix timestamp_type_impl::timestamp_from_string. Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00. Fixes #4641. Signed-off-by: Kamil Braun <kbraun@scylladb.com> (cherry picked from commit `4417e78125`)	2019-07-17 21:56:03 +03:00
kbr-	7d743563bf	Implement tuple_type_impl::to_string_impl. (#4645 ) Resolves #4633. Signed-off-by: Kamil Braun <kbraun@scylladb.com> (cherry picked from commit `8995945052`)	2019-07-08 11:11:30 +03:00
Avi Kivity	bdcbf4aa4e	Merge "Backport fixing infinite paging for indexed queries" from Piotr " This series backports fixing infinite paging for indexed queries to branch-3.0. Tests: unit(dev) " Fixes #4569 * 'fix_infinite_paging_for_indexed_queries_for_3.0' of https://github.com/psarna/scylla: tests: add test case for finishing index paging cql3: fix infinite paging for indexed queries	2019-06-25 11:56:11 +03:00
Avi Kivity	e80cd9dfed	Merge "Backport fixing ignoring ck restrictions in filtering" from Piotr " Tests: unit(dev) Refs #4541 " * 'fix_ignoring_ck_restrictions_in_filtering_for_3.0_2' of https://github.com/psarna/scylla: tests: add a test case for filtering clustering key cql3: fix qualifying clustering key restrictions for filtering cql3: fix fetching clustering key columns for filtering	2019-06-25 11:56:11 +03:00
Piotr Sarna	87fd298a6e	tests: add a test case for filtering clustering key The test case makes sure that clustering key restriction columns are fetched for filtering if they form a clustering key prefix, but not a primary key prefix (partition key columns are missing). Ref #4541	2019-06-25 10:05:34 +02:00
Piotr Sarna	fcab0d1392	tests: add test case for finishing index paging The test case makes sure that paging indexes does not result in an infinite loop. Refs #4569 (cherry picked from commit `b8cadc928c`)	2019-06-24 10:14:35 +02:00
Piotr Sarna	7b8e570e6c	tests: add case for partition key index and filtering The test ensures that partition key index does not influence filtering decisions for regular columns. Ref #4539	2019-06-18 13:35:32 +02:00
Paweł Dziepak	f2d2a9f5b8	Merge "Fix empty counters handling in MC" from Piotr " Before this patchset empty counters were incorrectly persisted for MC format. No value was written to disk for them. The correct way is to still write a header that informs the counter is empty. We also need to make sure that reading wrongly persisted empty counters works because customers may have sstables with wrongly persisted empty counters. Fixes #4363 " * 'haaawk/4363/v3' of github.com:scylladb/seastar-dev: sstables: add test for empty counters docs: add CorrectEmptyCounters to sstable-scylla-format sstables: Add a feature for empty counters in Scylla.db. sstables: Write header for empty counters sstables: Remove unused variables in make_counter_cell sstables: Handle empty counter value in read path (cherry picked from commit `899ebe483a`)	2019-05-24 06:23:38 +03:00
Botond Dénes	b3cbc2e58a	tests: add unit test for multishard reader correctly handling non-strictly monotonous positions (cherry picked from commit `aa18bb33b9`)	2019-05-06 11:19:04 +03:00
Botond Dénes	e4c1c4f052	flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice To be able to support this new overload, the reader is made partition-range aware. It will now correctly only return fragments that fall into the partition-range it was created with. For completeness' sake and to be able to test it, also implement `fast_forward_to(const dht::partition_range)`. Slicing is done by filtering out non-overlapping fragments from the initial list of fragments. Also add a unit test that runs it through the mutation_source test suite. (cherry picked from commit `51e81cf027`)	2019-05-06 11:19:04 +03:00
Botond Dénes	6a4bc5bd71	flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice To be able to run the mutation-source test suite with this reader. In the next patch, this reader will be used in testing another reader, so it is important to make sure it works correctly first. (cherry picked from commit `bc08f8fd07`)	2019-05-06 09:17:48 +03:00
Avi Kivity	e32e682911	Merge "SI: Add virtual columns to underlying MV" from Duarte " Virtual columns are MV-specific columns that contribute to the liveness of view rows. However, we were not adding those columns when creating an index's underlying MV, causing indexes to miss base rows. Fixes #4144 Branches: master, branch-3.0 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'sec-index/virtual-columns/v1' of https://github.com/duarten/scylla: tests/secondary_index_test: Add reproducer for #4144 index/secondary_index_manager: Add virtual columns to MV (cherry picked from commit `ebf179318c`)	2019-05-01 12:58:35 +01:00
Duarte Nunes	394afae3a8	service/migration_manager: Validate duplicate ID in time We allow tables to be created with the ID property, mostly for advanced recovery cases. However, we need to validate that the ID doesn't match an existing one. We currently do this in database::add_column_family(), but this is already too late in the normal workflow: if we allow the schema change to go through, then it is applied to the system tables and loaded the next time the node boots, regardless of us throwing from database::add_column_family(). To fix this, we perform this validation when announcing a new table. Note that the check wasn't removed from database::add_column_family(); it's there since 2015 and there might have been other reasons to add it that are not related to the ID property. Refs #2059 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181001230142.46743-1-duarte@scylladb.com> (cherry picked from commit `7ba944a243`)	2019-04-17 18:01:48 +01:00
Avi Kivity	403f66ecad	Revert "Revert "compaction: fix use-after-free when calculating backlog after schema change"" This reverts commit `841ceac4f9`. It was reverted because the test failed; this turned out due to a miscompile of the test. With this additional fix, the test compiles correctly: --- a/tests/sstable_datafile_test.cc +++ b/tests/sstable_datafile_test.cc @@ -4785,11 +4785,11 @@ SEASTAR_TEST_CASE(backlog_tracker_correctness_after_stop_tracking_compaction) { auto fut = sstables::compact_sstables(sstables::compaction_descriptor(ssts), cf, sst_gen); bool stopped_tracking = false; for (auto& info : cf._data->cm.get_compactions()) { - if (info->cf == &cf) { + if (info->cf == cf->schema()->cf_name()) { info->stop_tracking(); stopped_tracking = true; } } BOOST_REQUIRE(stopped_tracking); info->cf is an sstring, and &cf is a table. It's not clear how the compiler was able to compare an sstring and a pointer.	2019-04-12 21:45:59 +03:00
Shlomi Livne	841ceac4f9	Revert "compaction: fix use-after-free when calculating backlog after schema change" This reverts commit `2b326fc7fa`.	2019-04-12 09:55:06 +03:00
Raphael S. Carvalho	2b326fc7fa	compaction: fix use-after-free when calculating backlog after schema change The problem happens after a schema change because we fail to properly remove ongoing compaction, which stopped being tracked, from list that is used to calculate backlog, so it may happen that a compaction read monitor (ceases to exist after compaction ends) is used after freed. Fixes #4410. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com> (cherry picked from commit `8a117c338a`)	2019-04-12 00:19:58 +03:00
Nadav Har'El	83a8f779bb	view_complex_test: fix another ttl In a previous patch I fixed most TTLs in the view_complex_test.cc tests from low numbers to 100 seconds. I missed one. This one never caused problems in practice, but for good form, let's fix it too. Ref #3918. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115160234.26478-1-nyh@scylladb.com> (cherry picked from commit `45f05b06d2`)	2019-03-31 12:28:37 +03:00
Nadav Har'El	6066968e33	view_complex_test: increase low ttl which may fail test on busy machine Several of the tests in tests/view_complex_test.cc set a cell with a TTL, and then skip time ahead artificially with forward_jump_clocks(), to go past the TTL time and check the cell disappeared as expected. The TTLs chosen for these tests were arbitrary numbers - some had 3 seconds, some 5 seconds, and some 60 seconds. The actual number doesn't matter - it is completely artificial (we move the clock with forward_jump_clocks() and never really wait for that amount of time) and could very well be a million seconds. But low numbers, like the 3 seconds, present a problem on extremely overcommitted test machines. Our eventually() function already allows for the possibility that things can hang for up to 8 seconds, but with a 3 second TTL, we can find ourselves with data being expired and the test failing just after 3 seconds of wall time have passed - while the test intended that the data will expire only when we explicitly call forward_jump_clocks(). So this patch changes all the TTLs in this test to be the same high number - 100 seconds. This hopefully fixes #3918. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20181115125607.22647-1-nyh@scylladb.com> (cherry picked from commit `4108458b8e`)	2019-03-31 12:27:48 +03:00
Tomasz Grabiec	d3d877b9db	Merge "db/view: Apply tracked tombstones for new updates" from Duarte When generating view updates for base mutations when no pre-existing data exists, we were forgetting to apply the tracked tombstones. Fixes #4321 Tests: unit(dev) * https://github.com/duarten/scylla materialized-views/4321/v1.1: db/view: Apply tracked tombstones for new updates tests/view_schema_test: Add reproducer for #4321 (cherry picked from commit `2b8bf0dbf8`)	2019-03-27 21:56:21 +00:00
Avi Kivity	97357a7321	Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz " Static compact tables are tables with compact storage and no clustering columns. Before this patch, Scylla was writing rows of static compact tables as clustered rows instead of as static rows. That's because in our in-memory model such tables have regular rows and no static row. In Cassandra's schema (since 3.x), those tables have columns which are marked as static and there are no regular columns. This worked fine as long as Scylla was writing and reading those sstables. But when importing sstables from Cassandra, our reader was skipping the static row, since it's not present in our schema, and returning no rows as a result. Also, Cassandra, and Scylla tools, would have problems reading those sstables. Fix this by writing rows for such tables the same way as Cassandra does. In order to support rolling downgrade, we do that only when all nodes are upgraded. Fixes #4139. Tests: - unit (dev) " * tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla: tests: sstables: Test reading of static compact sstable generated by Cassandra tests: sstables: Add test for writing and reading of static compact tables sstables: mc: Write static compact tables the same way as Cassandra sstable: mc: writer: Set _static_row_written inside write_static_row() sstables: Add sstable::features() sstables: mc: writer: Prepare write_static_row() for working with any column_kind storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag sstables: mc: writer: Build indexed_columns together with serialization_header sstables: mc: writer: De-optimize make_serialization_header() sstable: mc: writer: Move attaching of mc-specific components out of generic code (cherry picked from commit `eddb98e8c6`)	2019-03-24 16:34:42 +02:00
Tomasz Grabiec	089e41999a	tests: sstables: Extract make_sstable_mutation_source() Message-Id: <1540459849-27612-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `46d0c157ae`)	2019-03-24 13:48:41 +02:00
Tomasz Grabiec	ea0f1c039d	row_cache: Fix abort in cache populating read concurrent with memtable flush When we're populating a partition range and the population range ends with a partition key (not a token) which is present in sstables and there was a concurrent memtable flush, we would abort on the following assert in cache::autoupdating_underlying_reader: utils::phased_barrier::phase_type creation_phase() const { assert(_reader); return _reader_creation_phase; } That's because autoupdating_underlying_reader::move_to_next_partition() clears the _reader field when it tries to recreate a reader but it finds the new range to be empty: if (!_reader || _reader_creation_phase != phase) { if (_last_key) { auto cmp = dht::ring_position_comparator(_cache._schema); auto&& new_range = _range.split_after(_last_key, cmp); if (!new_range) { _reader = {}; return make_ready_future<mutation_fragment_opt>(); } Fix by not asserting on _reader. creation_phase() will now be meaningful even after we clear the _reader. The meaning of creation_phase() is now "the phase in which the reader was last created or 0", which makes it valid in more cases than before. If the reader was never created we will return 0, which is smaller than any phase returned by cache::phase_of(), since cache starts from phase 1. This shouldn't affect current behavior, since we'd abort() if called for this case, it just makes the value more appropriate for the new semantics. Tests: - unit.row_cache_test (debug) Fixes #4236 Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `69775c5721`)	2019-03-22 09:31:59 -03:00
Tomasz Grabiec	751fdc9f6c	Merge "Fix window during init where waiting for a feature can be ignored" from Avi storage_service keeps a bunch of "feature" variables, indicating cluster-wide supported features, and has the ability to wait until the entire cluster supports a given feature. The propagation of features depends on gossip, but gossip is initialized after storage_service, so the current code late-initializes the features. However, that means that whoever waits on a feature between storage_service initialization and gossip initialization loses their wait entry. In #3952, we have proof that this in fact happens. Fix this by removing the circular dependency. We now store features in a new service, feature_service, that is started before both gossip and storage_service. Gossip updates feature_service while storage_service reads from it. Fixes #3953. * https://github.com/avikivity/3953/v4.1: storage_service: deinline enable_all_features() gossiper: keep features registered tests/gossip: switch to seastar::thread storage_service: deinline init/deinit functions gossiper: split feature storage into a new feature_service gossiper: maybe enable features after start_gossiping() storage_service: fix gap when feature::when_enabled() doesn't work (cherry picked from commit `6012a63660`)	2019-03-20 17:25:35 +08:00
Avi Kivity	3869b5ab51	Merge "Fix commitlog chunks overwriting each other" from Paweł " This series fixes a problem in the commitlog cycle() function that confused the in-memory and on-disk sizes of the chunks it wrote to disk. The former was used to decide how much data needs to be actually written, and the latter was used to compute the offset of the next chunk. If two chunk writes happened concurrently, the one positioned earlier in the file could corrupt the header of the next one. Fixes #4231. Tests: unit(dev), dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup,test_commitlog_replay_with_alter_table) " * tag 'fix-commitlog-cycle/v1' of https://github.com/pdziepak/scylla: commitlog: write the correct buffer size utils/fragmented_temporary_buffer_view: add remove suffix (cherry picked from commit `d95dec22d9`)	2019-03-04 17:58:46 +02:00
Avi Kivity	8e657e5685	Merge " Fix INSERT JSON with null values" from Piotr " Fixes #4256 This miniseries fixes a problem with inserting NULL values through INSERT JSON interface. Tests: unit (dev) " * 'fix_insert_json_with_null' of https://github.com/psarna/scylla: tests: add test for INSERT JSON with null values cql3: add missing value erasing to json parser (cherry picked from commit `5520fc37ba`)	2019-02-22 15:52:46 +02:00
Avi Kivity	4fde670abf	Merge "Add DEFAULT UNSET support to JSON" from Piotr " This series adds DEFAULT UNSET and DEFAULT NULL keyword support to INSERT JSON statement, as stated in #3909. Tests: unit (release) " * 'add_json_default_unset_2' of https://github.com/psarna/scylla: tests: add DEFAULT UNSET case to JSON cql tests tests: split JSON part of cql query test cql3: add DEFAULT UNSET to INSERT JSON (cherry picked from commit `447f953a2c`)	2019-02-22 15:52:16 +02:00
Avi Kivity	fae11c0d6b	fragmented_temporary_buffer: fix read_exactly() during premature end-of-stream read_exactly(), when given a stream that does not contain the amount of data requested, will loop endlessly, allocating more and more memory as it does, until it fails with an exception (at which point it will release the memory). Fix by returning an empty result, like input_stream::read_exactly() (which it replaces). Add a test case that fails without a fix. Affected callers are the native transport, commitlog replay, and internal deserialization. Fixes #4233. Branches: master, branch-3.0 Tests: unit(dev) Message-Id: <20190216150825.14841-1-avi@scylladb.com> (cherry picked from commit `03531c2443`)	2019-02-20 11:03:11 +02:00
Nadav Har'El	82016c07f2	Materialized views: limit size of row batching during bulk view building The bulk materialized-view building processes (when adding a materialized view to a table with existing data) currently reads the base table in batches of 128 (view_builder::batch_size) rows. This is clearly better than reading entire partitions (which may be huge), but still, 128 rows may grow pretty large when we have rows with large strings or blobs, and there is no real reason to buffer 128 rows when they are large. Instead, when the rows we read so far exceed some size threshold (in this patch, 1MB), we can operate on them immediately instead of waiting for 128. As a side-effect, this patch also solves another bug: At worst case, all the base rows of one batch may be written into one output view partition, in one mutation. But there is a hard limit on the size of one mutation (commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the batch size to exceed this limit. By not batching further after 1MB, we avoid reaching this limit when individual rows do not reach it but 128 of them did. Fixes #4213. This patch also includes a unit test reproducing #4213, and demonstrating that it is now solved. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190214093424.7172-1-nyh@scylladb.com> (cherry picked from commit `fec562ec8f`)	2019-02-16 21:54:41 +02:00
Duarte Nunes	0685c8f5bc	Merge 'Fix misdetection of remote counter shards' from Paweł " The code reading counter cells from sstables verifies that there are no unsupported local or remote shards. The latter are detected by checking if all shards are present in the counter cell header (only remote shards do not have entries there). However, the logic responsible for doing that was incorrectly computing the total number of counter shards in a cell if the header was larger than a single counter shard. This resulted in incorrect complaints that remote shards are present. Fixes #4206 Tests: unit(release) " * tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla: tests/sstables: test counter cell header with large number of shards sstables/counters: fix remote counter shard detection (cherry picked from commit `d2d885fb93`)	2019-02-11 14:09:55 +02:00
Nadav Har'El	9ba608cae4	cql3: really ensure retrieval of columns for filtering Commit `fd422c954e` aimed to fix issue #3803. In that issue, if a query SELECTed only certain columns but did filtering (ALLOW FILTERING) over other unselected columns, the filtering didn't work. The fix involved adding the columns being filtered to the set of columns we read from disk, so they can be filtered. But that commit included an optimization: If you have clustering keys c1 and c2, and the query asks for a specific partition key and c1 < 3 and c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already done as a slice (a contiguous read from disk). The committed code erroneously concluded that both c1 and c2 don't need to be filtered, which was wrong (c2 does need to be read and filtered). In this patch, we fix this optimization. Previously, we used the "prefix length", which in the above example was 2 (both c1 and c2 were filtered) but we need a new and more elaborate function, num_prefix_columns_that_need_not_be_filtered(), to determine we can only skip filtering of 1 (c1) and cannot skip the second. Fixes #4121. This patch also adds a unit test to confirm this. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190123131212.6269-1-nyh@scylladb.com> (cherry picked from commit `76f1fcc346`)	2019-01-23 21:11:05 +02:00
Duarte Nunes	22a085fbd3	Merge 'Fix filtering with LIMIT and paging' from Piotr " Before this series the limit was applied per page instead of globally, which might have resulted in returning too many rows. To fix that: 1. restrictions filter now has a 'remaining' parameter in order to stop accepting rows after enough of them have already been accepted 2. pager passes its row limit to restrictions filter, so no more rows than necessary will be served to the client 3. results no longer need to be trimmed on select_statement level Tests: unit (release) " Fixes #4100 * 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla: tests: add filtering+limit+paging test case tests: allow null paging state in filtering tests cql3: fix filtering with LIMIT with regard to paging (cherry picked from commit `7505815013`)	2019-01-17 18:07:41 +02:00
Tomasz Grabiec	2d181da656	row_cache: Fix crash on memtable flush with LCS Presence checker is constructed and destroyed in the standard allocator context, but the presence check was invoked in the LSA context. If the presence checker allocates and caches some managed objects, there will be alloc-dealloc mismatch. That is the case with LeveledCompactionStrategy, which uses incremental_selector. Fix by invoking the presence check in the standard allocator context. Fixes #4063. Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `32f711ce56`)	2019-01-15 21:16:13 +02:00
Avi Kivity	8168d13887	Merge "Fix UDTs representation in serialization header" from Piotr " Tests: unit(release) " Fixes #4073. * commit 'FETCH_HEAD~1': Add test for serialization header with UDT Fix UDT names in serialization header (cherry picked from commit `4a6aeced59`)	2019-01-11 07:48:23 +02:00
Avi Kivity	2fcae36d96	tests: mutation_source_test: generate valid utf-8 data test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized string to populate an sstable, but this can be invalid utf-8 so that sstable cannot be sstabledumped. Make it valid by using make_random_string(). Fixes #4040. Message-Id: <20190107193240.14409-1-avi@scylladb.com> (cherry picked from commit `d8adbeda11`)	2019-01-08 14:53:55 +02:00
Nadav Har'El	515399ce17	materialized views: move hints to top-level directory While we keep ordinary hints in a directory parallel to the data directory, we decided to keep the materialized view hints in a subdirectory of the data directory, named "view_pending_updates". But during boot, we expect all subdirectories of data/ to be keyspace names, and when we notice this one, we print a warning: WARN: database - Skipping undefined keyspace: view_pending_updates This spurious warning annoyed users. But moreover, we could have bigger problems if the user actually tries to create a keyspace with that name. So in this patch, we move the view hints to a separate top-level directory, which defaults to /var/lib/scylla/view_hints, but as usual can be configured. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190107142257.16342-1-nyh@scylladb.com> (cherry picked from commit `da090a5458`)	2019-01-07 22:01:56 +02:00
Tomasz Grabiec	f818d6ee3f	tests: cql_test_env: Start the compaction manager Broken in `fee4d2e` Not doing this results in compaction requests being ignored. One effect of this is that perf_fast_forward produces many sstables instead of one. Refs #3984 Refs #3983 Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `245a0d953a`)	2019-01-03 14:56:42 +01:00
Avi Kivity	f58e592345	Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz " partition_snapshots created in the memtable will keep a reference to the memtable (as region) and to memtable::_cleaner. As long as the reader is alive, the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043` (in >= 3.0-rc1) Fixes #4030. Tests: - mvcc_test (debug) " tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla: tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed tests: mvcc: Introduce mvcc_container::migrate() tests: mvcc: Make mvcc_partition move-constructible tests: mvcc: Introduce mvcc_container::make_not_evictable() tests: mvcc: Allow constructing mvcc_container without a cache_tracker mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup mvcc: partition_snapshot: Introduce migrate() mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner (cherry picked from commit `8e2f6d0513`)	2018-12-28 13:37:29 +02:00
Avi Kivity	b94997be0d	Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz " The motivation is to keep code related to each format separate, to make it easier to comprehend and reduce incremental compilation times. Also reduces dependency on sstable writer code by removing writer bits from sstables.hh. The ka/la format writers are still left in sstables.cc, they could be also extracted. " * 'extract-sstable-writer-code' of github.com:tgrabiec/scylla: sstables: Make variadic write() not picked on substitution error sstables: Extract MC format writer to mc/writer.cc sstables: Extract maybe_add_summary_entry() out of components_writer sstables: Publish functions used by writers in writer.hh sstables: Move common write functions to writer.hh sstables: Extract sstable_writer_impl to a header sstables: Do not include writer.hh from sstables.hh sstables: mc: Extract bound_kind_m related stuff into mc/types.hh sstables: types: Extract sstable_enabled_features::all() sstables: Move components_writer to .cc tests: sstable_datafile_test: Avoid dependency on components_writer (cherry picked from commit `b023e8b45d`)	2018-12-21 20:40:35 +02:00
Avi Kivity	dbe347811c	Merge "materialized views: Apply backpressure from view replicas" from Duarte " As the amount of pending view updates increases we know that there’s a mismatch between the rate at which the base receives writes and the rate at which the view retires them. We react by applying backpressure to decrease the rate of incoming base writes, allowing the slow view replicas to catch up. We want to delay the client’s next writes to a base replica and we use the base’s backlog of view updates to derive this delay. To validate this approach we tested a 3 node Scylla cluster on GCE, using n1-standard-4 instances with NVMEs. A loader running on an n1-standard-8 instance ran cassandra-stress with 100 threads. With the delay function d(x) set to 1s, we see no base write timeouts. With the delay function as defined in the series, we see that backlogs stabilize at some (arbitrary) point, as predicted, but this stabilization co-exists with base write timeouts. However, the system overall behaves better than the current version, with the 100 view update limit, and also better than the version without such limit or any backpressure. More work is necessary to further stabilize the system. Namely, we want to keep delaying until we see the backlog is decreasing. This will require us to add more delay beyond the stabilization point, which in turn should minimize the base write timeouts, and will also minimize the amount of memory the backlog takes at each base replica. Design document: https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo Fixes #2538 " Reviewed-by: Nadav Har'El <nyh@scylladb.com> * 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits) service/storage_proxy: Release mutation as early as possible service/storage_proxy: Delay replica writes based on view update backlog service/storage_proxy: Get the backlog of a particular base replica service/storage_proxy: Add counters for delayed base writes main: Start and stop the view_update_backlog_broker service: Distribute a node's view update backlog service: Advertise view update backlog over gossip service/storage_proxy: Send view update backlog from replicas service/storage_proxy: Prepare to receive replica view update backlog service/storage_proxy: Expose local view update backlog tests/view_schema_test: Add simple test for db::view::node_update_backlog db/view: Introduce node_update_backlog class db/hints: Initialize current backlog database: Add counter for current view backlog database: Expose current memory view update backlog idl: Add db::view::update_backlog db/view: Add view_update_backlog database: Wait on view update semaphore for view building service/storage_proxy: Use near-infinite timeouts for view updates database: generate_and_propagate_view_updates no longer needs a timeout ... (cherry picked from commit `b66f59aa3d`)	2018-12-20 19:11:56 +02:00
Avi Kivity	0b09008cde	Merge "Make sstable reader fail on unknown column names in MC format" from Piotr " Before, the reader was just ignoring such columns, but this creates a risk of data loss. Refs #2598 " * 'haaawk/2598/v3' of github.com:scylladb/seastar-dev: sstables: Add test_sstable_reader_on_unknown_column sstables: Exception on sstable's column not present in schema sstables: store column name in column_translation::column_info sstables: Make test_dropped_column_handling test dropped columns (cherry picked from commit `b0cb69ec25`)	2018-12-18 16:23:51 +00:00
Paweł Dziepak	7b6841f947	Merge "Check for schema mismatch after dropping dead cells" from Piotr " Previously we were checking for schema incompatibility between current schema and sstable serialization header before reading any data. This isn't the best approach because data in sstable may be already irrelevant due to column drop for example. This patchset moves the check after actual data is read and verified that it has a timestamp new enough to classify it as nonobsolete. Fixes #3924 " * 'haaawk/3924/v3' of github.com:scylladb/seastar-dev: sstables: Enable test_schema_change for MC format sstables3: Throw error on schema mismatch only for live cells sstables: Pass column_info to consume_*_column sstables: Add schema_mismatch to column_info sstables: Store column data type in column_info sstables: Remove code duplication in column_translation (cherry picked from commit `62ea153629`)	2018-12-18 15:27:53 +00:00
Tomasz Grabiec	f124b7026f	Merge 'Add tests for schema changes' from Paweł This series adds a generic test for schema changes that generates various schema and data before and after an ALTER TABLE operation. It is then used to check correctness of mutation::upgrade() and sstable readers and led to the discovery of #3924 and #3925. Fixes #3925. * https://github.com/pdziepak/scylla.git schema-change-test/v3.1 schema_builder: make member function names less confusing converting_mutation_partition_applier: fix collection type changes converting_mutation_partition_applier: do not emit empty collections sstable: use format() instead of sprint() tests/random-utils: make functions and variables inline tests: add models for schemas and data tests: generate schema changes tests/mutation: add test for schema changes tests/sstable: add test for schema changes (cherry picked from commit `564b328b2e`)	2018-12-18 14:57:50 +00:00
Avi Kivity	28cca751d1	Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr " Compression is not deterministic so instead of binary comparing the sstable files we just read data back and make sure everything that was written down is still present. Tests: unit(release) " * 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev: sstables: Remove compressed parameter from get_write_test_path sstables: Remove unused sstable test files sstables: Ensure compare_sstables isn't used for compressed files sstables: Don't binary compare compressed sstables sstables: Remove debug printout from test_write_many_partitions (cherry picked from commit `1ff6b8fb96`)	2018-12-18 14:53:52 +00:00
Botond Dénes	afc9f0e177	querier_cache: check that the query wasn't evicted during registering The reader concurrency semaphore can evict the querier when it is registered as an inactive read. Make the `querier_cache` aware of this so that it doesn't continue to process the inserted querier when this happens. Also add a unit test for this. (cherry picked from commit `5780f2ce7a`)	2018-12-18 14:34:33 +02:00

2668 Commits