scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Yaron Kaikov	6c0825e2a6	release: prepare for 4.6.11 scylla-4.6.11	2022-11-28 15:45:26 +02:00
Nadav Har'El	db3dd3bdf6	Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek When filtering with multi column restriction present all other restrictions were ignored. So a query like: `SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;` would ignore the restriction `regular_col = 0`. This was caused by a bug in the filtering code: `2779a171fc/cql3/selection/selection.cc (L433-L449)` When multi column restrictions were detected, the code checked if they are satisfied and returned immediately. This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied. This code was introduced back in 2019, when fixing #3574. Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct. Fixes: #6200 Fixes: #12014 Closes #12031 * github.com:scylladb/scylladb: cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works boost/restrictions-test: uncomment part of the test that passes now cql-pytest: enable test for filtering combined multi column and regular column restrictions cql3: don't ignore other restrictions when a multi column restriction is present during filtering (cherry picked from commit `2d2034ea28`) Closes #12086	2022-11-27 00:15:04 +02:00
Pavel Emelyanov	4ad24180f5	Merge '[branch-4.6] multishard_mutation_query: don't unpop partition header of spent partition ' from Botond Dénes When stopping the read, the multishard reader will dismantle the compaction state, pushing back (unpopping) the currently processed partition's header to its originating reader. This ensures that if the reader stops in the middle of a partition, on the next page the partition-header is re-emitted as the compactor (and everything downstream from it) expects. It can happen however that there is nothing more for the current partition in the reader and the next fragment is another partition. Since we only push back the partition header (without a partition-end) this can result in two partitions being emitted without being separated by a partition end. We could just add the missing partition-end when needed but it is pointless, if the partition has no more data, just drop the header, we won't need it on the next page. The missing partition-end can generate an "IDL frame truncated" message as it ends up causing the query result writer to create a corrupt partition entry. Fixes: https://github.com/scylladb/scylladb/issues/9482 Closes #11914 * github.com:scylladb/scylladb: test/cql-pytest: add regression test for "IDL frame truncated" error mutation_compactor: detach_state(): make it no-op if partition was exhausted treewide: fix headers	2022-11-16 11:52:51 +03:00
Anna Mikhlin	755c7eeb6a	release: prepare for 4.6.10 scylla-4.6.10	2022-11-14 10:30:20 +02:00
Eliran Sinvani	8914ca8c58	cql: Fix crash upon use of the word empty for service level name Wrong access to an uninitialized token instead of the actual generated string caused the parser to crash. This wasn't detected by the ANTLR3 compiler because all the temporary variables defined in the ANTLR3 statements are global in the generated code. This essentially caused a null dereference. Tests: 1. The fixed issue scenario from github. 2. Unit tests in release mode. Fixes #11774 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190612133151.20609-1-eliransin@scylladb.com> Closes #11777 (cherry picked from commit `ab7429b77d`)	2022-11-10 20:43:44 +02:00
Botond Dénes	e82e4bbed3	test/cql-pytest: add regression test for "IDL frame truncated" error (cherry picked from commit `11af489e84`)	2022-11-07 16:51:14 +02:00
Botond Dénes	f9c457778e	mutation_compactor: detach_state(): make it no-op if partition was exhausted detach_state() allows the user to resume a compaction process later, without having to keep the compactor object alive. This happens by generating and returning the mutation fragments the user has to re-feed to a newly constructed compactor to bring it into the exact same state the current compactor was at the point of stopping the compaction. This state includes the partition-header (partition-start and static-row if any) and the currently active range tombstone. Detaching the state is pointless however when the compaction was stopped such that the currently compacted partition was completely exhausted. Allowing the state to be detached in this case seems benign but it caused a subtle bug in the main user of this feature: the partition range scan algorithm, where the fragments included in the detached state were pushed back into the reader which produced them. If the partition happened to be exhausted -- meaning the next fragment in the reader was a partition-start or EOS -- this resulted in the partition being re-emitted later without a partition-end, resulting in corrupt query-result being generated, in turn resulting in an obscure "IDL frame truncated" error. This patch solves this seemingly benign but sinister bug by making the return value of `detach_state()` an std::optional and returning a disengaged optional when the partition was exhausted. (cherry picked from commit `70b4158ce0`)	2022-11-07 16:51:14 +02:00
Botond Dénes	8315a7b164	treewide: fix headers To fix CI.	2022-11-07 16:51:14 +02:00
Nadav Har'El	291ca8db60	cql3: fix cql3::util::maybe_quote() for keywords cql3::util::maybe_quote() is a utility function formatting an identifier name (table name, column name, etc.) that needs to be embedded in a CQL statement - and might require quoting if it contains non-alphanumeric characters, uppercase characters, or a CQL keyword. maybe_quote() made an effort to only quote the identifier name if necessary, e.g., a lowercase name usually does not need quoting. But lowercase names that are CQL keywords - e.g., to or where - cannot be used as identifiers without quoting. This can cause problems for code that wants to generate CQL statements, such as the materialized-view problem in issue #9450 - where a user had a column called "to" and wanted to create a materialized view for it. So in this patch we fix maybe_quote() to recognize invalid identifiers by using the CQL parser, and quote them. This will quote reserved keywords, but not so-called unreserved keywords, which are allowed as identifiers and don't need quoting. This addition slows down maybe_quote(), but maybe_quote() is anyway only used in heavy operations which need to generate CQL. This patch also adds two tests that reproduce the bug and verify its fix: 1. Extend the low-level maybe_quote() test (a C++ unit test) to check that maybe_quote() quotes reserved keywords like "to", but doesn't quote unreserved keywords like "int". 2. Add a test reproducing issue #9450 - creating a materialized view whose key column is a keyword. This new test passes on Cassandra, failed on Scylla before this patch, and passes after this patch. It is worth noting that maybe_quote() now has a "forward compatibility" problem: If we save CQL statements generated by maybe_quote(), and a future version introduces a new reserved keyword, the parser of the future version may not be able to parse the saved CQL statement that was generated with the old maybe_quote() and didn't quote what is now a keyword. 
This problem can be solved in two ways: 1. Try hard not to introduce new reserved keywords. Instead, introduce unreserved keywords. We've been doing this even before recognizing this maybe_quote() future-compatibility problem. 2. In the next patch we will introduce quote() - which unconditionally quotes identifier names, even if lowercase. These quoted names will be uglier for lowercase names - but will be safe from future introduction of new keywords. So we can consider switching some or all uses of maybe_quote() to quote(). Fixes #9450 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220118161217.231811-1-nyh@scylladb.com> (cherry picked from commit `5d2f694a90`)	2022-11-07 10:38:10 +02:00
Jadw1	4da5fbaa24	CQL3: fromJson accepts string as bool The problem was an incompatibility with Cassandra, which accepts a bool as a string in the `fromJson()` UDF. The remaining difference between Cassandra and Scylla is that Scylla accepts whitespace around the word in the string, while Cassandra doesn't. Both are case-insensitive. Fixes: #7915 (cherry picked from commit `1902dbc9ff`)	2022-11-07 10:38:10 +02:00
Takuya ASADA	fc16664d81	locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service EC2 instance metadata service can be busy, so let's retry connecting at an interval, just like we do in scylla-machine-image. Fixes #10250 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #11688 (cherry picked from commit `6b246dc119`) (cherry picked from commit `e2809674d2`)	2022-11-06 15:43:58 +02:00
Botond Dénes	80bea5341e	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator, when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. In this series we fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry() (cherry picked from commit `e981bd4f21`)	2022-11-01 13:31:51 +02:00
Botond Dénes	6ecc772b56	mutation_partition: deletable_row::apply(shadowable_tombstone): remove redundant maybe_shadow() Shadowing is already checked by the underlying row_tombstone::apply(). This redundant check was introduced by a previous fix to #9483 (`6a76e12768`). The rest of that patch is good. Refs: #9483 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211115091513.181233-1-bdenes@scylladb.com> (cherry picked from commit `b136746040`)	2022-10-16 11:53:04 +03:00
Benny Halevy	0b2e951954	range_tombstone_list: insert_from: correct rev.update range_tombstone in not overlapping case 2nd std::move(start) looks like a typo in `fe2fa3f20d`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220404124741.1775076-1-bhalevy@scylladb.com> (cherry picked from commit `2d80057617`)	2022-10-14 12:29:56 +02:00
Pavel Emelyanov	f2a738497f	compaction_manager: Swallow ENOSPCs in ::stop() When being stopped, the compaction manager may hit ENOSPC. This is not a reason to fail the stopping process with an abort; it is better to log a warning and proceed as if nothing happened. refs: #11245 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 16:02:33 +03:00
Pavel Emelyanov	badf7c816f	exceptions: Mark storage_io_error::code() with noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 16:02:32 +03:00
Pavel Emelyanov	bfb86f2c78	compaction_manager: Shuffle really_do_stop() Make it the future-returning method and setup the _stop_future in its only caller. Makes next patch much simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 16:02:31 +03:00
Beni Peled	18e7a46038	release: prepare for 4.6.9 scylla-4.6.9	2022-10-09 08:54:33 +03:00
Nadav Har'El	cbcfa31e51	cql: validate bloom_filter_fp_chance up-front Scylla's Bloom filter implementation has a minimal false-positive rate that it can support (6.71e-5). When setting bloom_filter_fp_chance any lower than that, the compute_bloom_spec() function, which writes the bloom filter, throws an exception. However, this is too late - it only happens while flushing the memtable to disk, and a failure at that point causes Scylla to crash. Instead, we should refuse the table creation with the unsupported bloom_filter_fp_chance. This is also what Cassandra did six years ago - see CASSANDRA-11920. This patch also includes a regression test, which crashes Scylla before this patch but passes after the patch (and also passes on Cassandra). Fixes #11524. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11576 (cherry picked from commit `4c93a694b7`)	2022-10-04 16:23:25 +03:00
Nadav Har'El	5ee69ff3a9	alternator: return ProvisionedThroughput in DescribeTable DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable operation must return a ProvisionedThroughput structure, listing both ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not stated in some DynamoDB documentation but is explicitly mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html Also, empirically, DynamoDB returns ProvisionedThroughput with zeros even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this. The ProvisionedThroughput structure being missing was a problem for applications like DynamoDB connectors for Spark, if they implicitly assume that ProvisionedThroughput is returned by DescribeTable, and fail (as described in issue #11222) if it's outright missing. So this patch adds the missing ProvisionedThroughput structure, and the xfailing test starts to pass. Note that this patch doesn't change the fact that attempting to set a table to PROVISIONED billing mode is ignored: DescribeTable continues to always return PAY_PER_REQUEST as the billing mode and zero as the provisioned capacities. Fixes #11222 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11298 (cherry picked from commit `941c719a23`)	2022-10-03 14:29:22 +03:00
Tomasz Grabiec	949103d22a	test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone The generator was first setting the marker then applied tombstones. The marker was set like this: row.marker() = random_row_marker(); Later, when shadowable tombstones were applied, they were compacted with the marker as expected. However, the key for the row was chosen randomly in each iteration and there are multiple keys set, so there was a possibility of a key clash with an earlier row. This could override the marker without applying any tombstones, which is conditional on random choice. This could generate rows with markers uncompacted with shadowable tombstones. This broke row_cache_test::test_concurrent_reads_and_eviction on comparison between expected and read mutations. The latter was compacted because it went through an extra merge path, which compacts the row. Fix by making sure there are no key clashes. Closes #11663 (cherry picked from commit `5268f0f837`)	2022-10-03 09:00:28 +03:00
Botond Dénes	549cb60f4c	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422 (cherry picked from commit `be9d1c4df4`)	2022-09-30 17:56:58 +03:00
Botond Dénes	37633c5576	test/lib/random_schema: add a simpler overload for fixed partition count Some tests want to generate a fixed amount of random partitions, make their life easier. (cherry picked from commit `98f3d516a2`) Ref #11421 (prerequisite)	2022-09-30 17:56:10 +03:00
Michael Livshin	abd9f43fa7	batchlog_manager: warn when a batch fails to replay Only for reasons other than "no such KS", i.e. when the failure is presumed transient and the batch in question is not deleted from batchlog and will be retried in the future. (Would info be more appropriate here than warning?) Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Closes #10556 Fixes #10636 (cherry picked from commit `00ed4ac74c`)	2022-09-29 12:13:21 +03:00
Raphael S. Carvalho	d41d4db5c0	compaction: Make cleanup withstand better disk pressure scenario It's not uncommon for cleanup to be issued against an entire keyspace, which may be composed of tons of tables. To increase chances of success if low on space, cleanup will now start from smaller tables first, such that bigger tables will have more space available, once they're reached, to satisfy their space requirement. parallel_for_each() is dropped and wasn't needed given that manager performs per-shard serialization of cleanup jobs. Refs #9504. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211130133712.64517-1-raphaelsc@scylladb.com> (cherry picked from commit `0d5ac845e1`)	2022-09-29 10:15:29 +03:00
Michał Radwański	c500043a78	flat_mutation_reader: allow destructing readers which are not closed and didn't initiate any IO. In functions such as upgrade_to_v2 (excerpt below), if the constructor of transforming_reader throws, r needs to be destroyed, however it hasn't been closed. However, if a reader didn't start any operations, it is safe to destruct such a reader. This issue can potentially manifest itself in many more readers and might be hard to track down. This commit adds a bool indicating whether a close is anticipated, thus avoiding errors in the destructor. Code excerpt: flat_mutation_reader_v2 upgrade_to_v2(flat_mutation_reader r) { class transforming_reader : public flat_mutation_reader_v2::impl { // ... }; return make_flat_mutation_reader_v2<transforming_reader>(std::move(r)); } Fixes #9065. (cherry picked from commit `9ada63a9cb`)	2022-09-29 09:40:07 +03:00
Pavel Emelyanov	af4752a526	messaging_service: Fix gossiper verb group When configuring tcp-nodelay unconditionally, messaging service thinks gossiper uses group index 1, though it had changed some time ago and now those verbs belong to group 0. fixes: #11465 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `2c74062962`)	2022-09-19 10:32:49 +03:00
Anna Mikhlin	0aa9a8c266	release: prepare for 4.6.8 scylla-4.6.8	2022-09-19 09:30:09 +03:00
Michał Chojnowski	85fd6ab377	sstables: add a flag for disabling long-term index caching Long-term index caching in the global cache, as introduced in 4.6, is a major pessimization for workloads where accesses to the index are (spatially) sparse. We want to have a way to disable it for the affected workloads. There is already infrastructure in place for disabling it for BYPASS CACHE queries. One way of solving the issue is hijacking that infrastructure. This patch adds a global flag (and a corresponding CLI option) which controls index caching. Setting the flag to `false` causes all index reads to behave like they would in BYPASS CACHE queries. Consequences of this choice: - The per-SSTable partition_index_cache is unused. Every index_reader has its own, and they die together. Independent reads can no longer reuse the work of other reads which hit the same index pages. This is not crucial, since partition accesses have no (natural) spatial locality. Note that the original reason for partition_index_cache -- the ability to share reads for the lower and upper bound of the query -- is unaffected. - The per-SSTable cached_file is unused. Every index_reader has its own (uncached) input stream from the index file, and every bsearch_clustered_cursor has its own cached_file, which dies together with the cursor. Note that the cursor still can perform its binary search with caching. However, it won't be able to reuse the file pages read by index_reader. In particular, if the promoted index is small, and fits inside the same file page as its index_entry, that page will be re-read. It can also happen that index_reader will read the same index file page multiple times. When the summary is so dense that multiple index pages fit in one index file page, advancing the upper bound, which reads the next index page, will read the same index file page. Since summary:disk ratio is 1:2000, this is expected to happen for partitions with size greater than 2000 partition keys. 
Fixes #11202 (cherry picked from commit `cdb3e71045`)	2022-09-18 13:30:28 +03:00
Beni Peled	7c79c513d1	release: prepare for 4.6.7 scylla-4.6.7	2022-09-07 11:17:55 +03:00
Karol Baryła	9a8e73f0c3	transport/server.cc: Return correct size of decompressed lz4 buffer An incorrect size is returned from the function, which could lead to crashes or undefined behavior. Fix by erroring out in these cases. Fixes #11476 (cherry picked from commit `1c2eef384d`)	2022-09-07 10:58:54 +03:00
Benny Halevy	fac0443200	snapshot-ctl: run_snapshot_modify_operation: reject views and secondary index using the schema Detecting a secondary index by checking for a dot in the table name is wrong as tables generated by Alternator may contain a dot in their name. Instead detect both materialized views and secondary indexes using the schema()->is_view() method. Fixes #10526 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `aa127a2dbb`)	2022-09-06 17:56:30 +03:00
Piotr Sarna	6bcfef2cfa	cql3: fix misleading error message for service level timeouts The error message incorrectly stated that the timeout value cannot be longer than 24h, but it can - the actual restriction is that the value cannot be expressed in units like days or months, which was done in order to significantly simplify the parsing routines (and the fact that timeouts counted in days are not expected to be common). Fixes #10286 Closes #10294 (cherry picked from commit `85e95a8cc3`)	2022-09-01 20:34:22 +03:00
Juliusz Stasiewicz	d2c67a2429	cdc/check_and_repair_cdc_streams: ignore LEFT endpoints When `check_and_repair_cdc_streams` encountered a node with status LEFT, Scylla would throw. This behavior is fixed so that LEFT nodes are simply ignored. Fixes #9771 Closes #9778 (cherry picked from commit `351f142791`)	2022-09-01 15:44:35 +03:00
Avi Kivity	d6c2f228e7	Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec Scenario: cache = [ row(pos=2, continuous=false), row(pos=after(2), dummy=true) ] Scanning read starts, starts populating [-inf, before(2)] from sstables. row(pos=2) is evicted. cache = [ row(pos=after(2), dummy=true) ] Scanning read finishes reading from sstables. Refreshes cache cursor via partition_snapshot_row_cursor::maybe_refresh(), which calls partition_snapshot_row_cursor::advance_to() because iterators are invalidated. This advances the cursor to after(2). no_clustering_row_between(2, after(2)) returns true, so advance_to() returns true, and maybe_refresh() returns true. This is interpreted by the cache reader as "the cursor has not moved forward", so it marks the range as complete, without emitting the row with pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads will also miss the row. The bug is in advance_to(), which is using no_clustering_row_between(a, b) to determine its result, which by definition excludes the starting key. Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction with reduced key range in the random_mutation_generator (1024 -> 16). Fixes #11239 Closes #11240 * github.com:scylladb/scylladb: test: mvcc: Fix illegal use of maybe_refresh() tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range() tests: row_cache_test: Introduce one_shot mode to throttle row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy	2022-08-11 19:19:30 +02:00
Yaron Kaikov	a1b1df2074	release: prepare for 4.6.6 scylla-4.6.6	2022-08-07 16:24:51 +03:00
Avi Kivity	14e13ecbd4	Merge 'Backport: Fix map subscript crashes when map or subscript is null' from Nadav Har'El This is a backport of https://github.com/scylladb/scylla/pull/10420 to branch 5.0. Branch 5.0 had somewhat different code in this expression area, so the backport was not automatic, but nevertheless was fairly straightforward - just copy the exact same checking code to its right place, and keep the exact same tests to see we indeed fixed the bug. Refs #10535. The original cover letter from https://github.com/scylladb/scylla/pull/10420: In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues https://github.com/scylladb/scylla/issues/10361, https://github.com/scylladb/scylla/issues/10399, https://github.com/scylladb/scylla/pull/10401. The existing test test_null.py::test_map_subscript_null reproduced all these bugs sporadically. In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then define both m[NULL] and NULL[2] to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., NULL < 2 is also defined as resulting in NULL. However, this decision differs from Cassandra, where m[NULL] is considered an error but NULL[2] is allowed. We believe that making m[NULL] be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicated expressions like m[a], where the column a can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan. 
Fixes https://github.com/scylladb/scylla/issues/10361 Fixes https://github.com/scylladb/scylla/issues/10399 Fixes https://github.com/scylladb/scylla/pull/10401 Closes #11142 * github.com:scylladb/scylla: test/cql-pytest: reproducer for CONTAINS NULL bug expressions: don't dereference invalid map subscript in filter expressions: fix invalid dereference in map subscript evaluation test/cql-pytest: improve tests for map subscripts and nulls (cherry picked from commit `23a34d7e42`)	2022-07-31 15:44:00 +03:00
Benny Halevy	b8740bde6e	multishard_mutation_query: do_query: stop ctx if lookup_readers fails lookup_readers might fail after populating some readers and those better be closed before returning the exception. Fixes #10351 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10425 (cherry picked from commit `055141fc2e`)	2022-07-25 14:52:58 +03:00
Benny Halevy	1b23f8d038	sstables: time_series_sstable_set: insert: make exception safe Need to erase the shared sstable from _sstables if insertion to _sstables_reversed fails. Fixes #10787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `cd68b04fbf`)	2022-07-25 14:22:08 +03:00
Tomasz Grabiec	05a228e4c5	memtable: Fix missing range tombstones during reads under certain rare conditions There is a bug introduced in `e74c3c8` (4.6.0) which makes memtable reader skip a range tombstone for a certain pattern of deletions and under certain sequence of events. _rt_stream contains the result of deoverlapping range tombstones which had the same position, which were sipped from all the versions. The result of deoverlapping may produce a range tombstone which starts later, at the same position as a more recent tombstone which has not been sipped from the partition version yet. If we consume the old range tombstone from _rt_stream and then refresh the iterators, the refresh will skip over the newer tombstone. The fix is to drop the logic which drains _rt_stream so that _rt_stream is always merged with partition versions. For the problem to trigger, there have to be multiple MVCC versions (at least 2) which contain deletions of the following form: [a, c] @ t0 [a, b) @ t1, [b, d] @ t2 c > b The proper sequence for such versions is (assuming d > c): [a, b) @ t1, [b, d] @ t2 Due to the bug, the reader will produce: [a, b) @ t1, [b, c] @ t0 The reader also needs to be preempted right before processing [b, d] @ t2 and iterators need to get invalidated so that lsa_partition_reader::do_refresh_state() is called and it skips over [b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it does emit the proper range tombstone, it's possible that it will violate fragment order in the stream if _rt_stream accumulated remainders (possible with 3 MVCC versions). The problem goes away once MVCC versions merge. Fixes #10913 Fixes #10830 Closes #10914 (cherry picked from commit `a6aef60b93`) [avi: backport prerequisite position_range_to_clustering_range() too]	2022-07-19 19:27:15 +03:00
Yaron Kaikov	2ec293ab0e	release: prepare for 4.6.5 scylla-4.6.5	2022-07-19 16:02:46 +03:00
Pavel Emelyanov	b60f14601e	azure_snitch: Do nothing on non-io-cpu All snitch drivers are supposed to snitch info on some shard and replicate the dc/rack info across others. All but azure really do so. The azure one gets dc/rack on all shards, which is excessive but not terrible, but when all shards start to replicate their data to all the others, this may lead to use-after-frees. fixes: #10494 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `c6d0bc87d0`)	2022-07-17 14:22:29 +03:00
Raphael S. Carvalho	284dd21ef7	compaction_manager: Fix race when selecting sstables for rewrite operations Rewrite operations are scrub, cleanup and upgrade. Race can happen because 'selection of sstables' and 'mark sstables as compacting' are decoupled. So any deferring point in between can lead to a parallel compaction picking the same files. After commit `2cf0c4bbf`, files are marked as compacting before rewrite starts, but it didn't take into account the commit `c84217ad` which moved retrieval of candidates to a deferring thread, before rewrite_sstables() is even called. Scrub isn't affected by this because it uses a coarse grained approach where whole operation is run with compaction disabled, which isn't good because regular compaction cannot run until its completion. From now on, selection of files and marking them as compacting will be serialized by running them with compaction disabled. Now cleanup will also retrieve sstables with compaction disabled, meaning it will no longer leave uncleaned files behind, which is important to avoid data resurrection if node regains ownership of data in uncleaned files. Fixes #8168. Refs #8155. [backport notes: - minor conflict around run_with_compaction_disabled() - bumped into our old friend https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95111, so I had to use std::ref() on local copy of lambda - with the yielding part of candidate retrieval now happening in rewrite_sstables(), task registration is moved to after run_with_compaction_disabled() call, so the latter won't incorrectly try to stop the task that called it, which triggers an assert in debug mode. ] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com> (cherry picked from commit `80a1ebf0f3`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #10963	2022-07-13 18:45:36 +03:00
Pavel Emelyanov	8b52f1d6e7	view: Fix trace-state pointer use after move It's moved into .mutate_locally() but it is still captured and used in its continuation. It works well just because the moved-from pointer looks like nullptr and all the tracing code checks for it to be non-null. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1266/ (CI job failed on post-actions thus it's red) Fixes #11015 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220711134152.30346-1-xemul@scylladb.com> (cherry picked from commit `5526738794`)	2022-07-12 14:21:11 +03:00
Piotr Sarna	157951f756	view: exclude using static columns in the view filter The code which applied view filtering (i.e. a condition placed on a view column, e.g. "WHERE v = 42") erroneously used a wildcard selection, which also assumes that static columns are needed, if the base table contains any such columns. The filtering code currently assumes that no such columns are fetched, so the selection is amended to only ask for regular columns (primary key columns are sent anyway, because they are enabled via slice options, so no need to ask for them explicitly). Fixes #10851 Closes #10855 (cherry picked from commit `bc3a635c42`)	2022-07-11 17:07:22 +03:00
Juliusz Stasiewicz	4f643ed4a5	cdc: `check_and_repair_cdc_streams`: regenerate if too many streams are present If the number of streams exceeds the number of token ranges it indicates that some spurious streams from decommissioned nodes are present. In such a situation - simply regenerate. Fixes #9772 Closes #9780 (cherry picked from commit `ea46439858`)	2022-07-07 18:53:14 +02:00
Avi Kivity	b598629b7f	messaging: do isolate default tenants In `10dd08c9` ("messaging_service: supply and interpret rpc isolation_cookies", 4.2), we added a mechanism to perform rpc calls in remote scheduling groups based on the connection identity (rather than the verb), so that connection processing itself can run in the correct group (not just verb processing), and so that one verb can run in different groups according to need. In `16d8cdadc` ("messaging_service: introduce the tenant concept", 4.2), we changed the way isolation cookies are sent: scheduling_group messaging_service::scheduling_group_for_verb(messaging_verb verb) const { return _scheduling_info_for_connection_index[get_rpc_client_idx(verb)].sched_group; @@ -665,11 +694,14 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge if (must_compress) { opts.compressor_factory = &compressor_factory; } opts.tcp_nodelay = must_tcp_nodelay; opts.reuseaddr = true; - opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie; + // We send cookies only for non-default statement tenant clients. + if (idx > 3) { + opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie; + } This effectively disables the mechanism for the default tenant. As a result some verbs will be executed in whatever group the messaging service listener was started in. This used to be the main group, but in `554ab03` ("main: Run init_server and join_cluster inside maintenance scheduling group", 4.5), this was changed to the maintenance group. As a result normal reads/writes now compete with maintenance operations, raising their latency significantly. Fix by sending the isolation cookie for all connections. With this, a 2-node cassandra-stress load has 99th percentile increase by just 3ms during repair, compared to 10ms+ before. Fixes #9505. Closes #10673 (cherry picked from commit `c83393e819`)	2022-07-05 13:42:10 +03:00
Nadav Har'El	43f82047b9	Merge 'types: fix is_string for reversed types' from Piotr Sarna Checking if the type is string is subtly broken for reversed types, and these types will not be recognized as strings, even though they are. As a result, if somebody creates a column with DESC order and then tries to use operator LIKE on it, it will fail because the type would not be recognized as a string. Fixes #10183 Closes #10181 * github.com:scylladb/scylla: test: add a case for LIKE operator on a descending order column types: fix is_string for reversed types (cherry picked from commit `733672fc54`)	2022-07-03 17:59:56 +03:00
Benny Halevy	ec3c07de6e	compaction_manager: perform_offstrategy: run_offstrategy_compaction in maintenance scheduling group It was assumed that offstrategy compaction is always triggered by streaming/repair where it would inherit the caller's scheduling group. However, offstrategy is triggered by a timer via table::_off_strategy_trigger so I don't see how the expiration of this timer will inherit anything from streaming/repair. Also, since `d309a86`, offstrategy compaction may be triggered by the api where it will run in the default scheduling group. The bottom line is that the compaction manager needs to explicitly perform offstrategy compaction in the maintenance scheduling group similar to `perform_sstable_scrub_validate_mode`. Fixes #10151 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220302084821.2239706-1-bhalevy@scylladb.com> (cherry picked from commit `0764e511bb`)	2022-07-03 14:30:54 +03:00
Takuya ASADA	82572e8cfe	scylla_coredump_setup: support new format of Storage field The Storage field of "coredumpctl info" changed in systemd v248: "(present)" is now appended to the end of the line when the coredump file is available. Fixes #10669 Closes #10714 (cherry picked from commit `ad2344a864`)	2022-07-03 13:55:25 +03:00
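
The Storage field change described in `82572e8cfe` can be illustrated with a minimal sketch. This is not the code from scylla_coredump_setup: the helper name `parse_storage_path`, the regular expression, and the sample path are assumptions made for illustration, and the sketch assumes the coredump path contains no spaces. It accepts both the older line format and the newer one with a trailing parenthesized annotation such as "(present)".

```python
import re
from typing import Optional

# Hypothetical pattern: matches a "Storage:" line from `coredumpctl info`,
# with an optional trailing annotation in parentheses, e.g. "(present)".
_STORAGE_RE = re.compile(
    r"^\s*Storage:\s*(?P<path>\S+)(?:\s+\((?P<state>[^)]*)\))?\s*$"
)


def parse_storage_path(info_output: str) -> Optional[str]:
    """Return the path from the first Storage line, or None if there is none.

    The trailing annotation, if any, is ignored so that the same path is
    returned for both the old and the new output format.
    """
    for line in info_output.splitlines():
        match = _STORAGE_RE.match(line)
        if match:
            return match.group("path")
    return None


if __name__ == "__main__":
    # Sample path is made up for the example.
    old_style = "       Storage: /var/lib/systemd/coredump/core.scylla.zst"
    new_style = old_style + " (present)"
    print(parse_storage_path(old_style))
    print(parse_storage_path(new_style))
```

Both calls print the same path, which is the behavior a caller needs if it goes on to open the coredump file.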


28942 Commits