scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 05:53:13 +00:00

Author	SHA1	Message	Date
Nadav Har'El	07c20bdfea	materialized view: fix bug in some large modifications to base partitions Sometimes a single modification to a base partition requires updates to a large number of view rows. A common example is deletion of a base partition containing many rows. A large BATCH is also possible. To avoid large allocations, we split the large amount of work into batches of 100 (max_rows_for_view_updates) rows each. The existing code assumed an empty result from one of these batches meant that we are done. But this assumption was incorrect: There are several cases when a base-table update may not need a view update to be generated (see can_skip_view_updates()) so if all 100 rows in a batch were skipped, the view update stopped prematurely. This patch includes two tests showing when this bug can happen - one test using a partition deletion with a USING TIMESTAMP causing the deletion to not affect the first 100 rows, and a second test using a specially-crafted large BATCH. These use cases are fairly esoteric, but in fact hit a user in the wild, which led to the discovery of this bug. The fix is fairly simple: To detect when build_some() is done it is no longer enough to check if it returned zero view-update rows; Rather, it explicitly returns whether or not it is done as an std::optional. The patch includes several tests for this bug, which pass on Cassandra, failed on Scylla before this patch, and pass with this patch. Fixes #12297. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12305 (cherry picked from commit `92d03be37b`)	2023-01-04 11:36:39 +02:00
Botond Dénes	8a36c4be54	evictable_reader: avoid preemption pitfall around waiting for readmission Permits have to wait for re-admission after having been evicted. This happens via `reader_permit::maybe_wait_readmission()`. The user of this method -- the evictable reader -- uses it to re-wait admission when the underlying reader was evicted. There is one tricky scenario however, when the underlying reader is created for the first time. When the evictable reader is part of a multishard query stack, the created reader might in fact be a resumed, saved one. These readers are kept in an inactive state until actually resumed. The evictable reader shares its permit with the to-be-resumed reader so it can check whether it has been evicted while saved and needs to wait readmission before being resumed. In this flow it is critical that there is no preemption point between this check and actually resuming the reader, because if there is, the reader might end up actually recreated, without having waited for readmission first. To help avoid this situation, the existing `maybe_wait_readmission()` is split into two methods: * `bool reader_permit::needs_readmission()` * `future<> reader_permit::wait_for_readmission()` The evictable reader can now ensure there is no preemption point between `needs_readmission()` and resuming the reader. Fixes: #10187 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220315105851.170364-1-bdenes@scylladb.com> (cherry picked from commit `61028ad718`)	2023-01-04 11:20:28 +02:00
Botond Dénes	0e388d2140	reader_concurrency_semaphore: unify admission logic across all paths The semaphore currently has two admission paths: the obtain_permit()/with_permit() methods which admits permits on user request (the front door) and the maybe_admit_waiters() which admits permits based on internal events like memory resource being returned (the back door). The two paths used their own admission conditions and naturally this means that they diverged in time. Notably, maybe_admit_waiters() did not look at inactive readers assuming that if there are waiters there cannot be inactive readers. This is not true however since we merged the execution-stage into the semaphore. Waiters can queue up even when there are inactive reads and thus maybe_admit_waiters() has to consider evicting some of them to see if this would allow for admitting new reads. To avoid such divergence in the future, the admission logic was moved into a new method can_admit_read() which is now shared between the two method families. This method now checks for the possibility of evicting inactive readers as well. The admission logic was tuned slightly to only consider evicting inactive readers if there is a real possibility that this will result in admissions: notably, before this patch, resource availability was checked before stalls were (used permits == blocked permits), so we could evict readers even if this couldn't help. Because now eviction can be started from maybe_admit_waiters(), which is also downstream from eviction, we added a flag to avoid recursive evict -> maybe admit -> evict ... loops. Fixes: #11770 Closes #11784 (cherry picked from commit `7fbad8de87`)	2023-01-03 16:46:30 +02:00
Petr Gusev	e03e9b1abe	cql: batch statement, inserting a row with a null key column should be forbidden Regular INSERT statements with null values for primary key components are rejected by Scylla since #9286 and #9314. Batch statements missed a similar check, this patch fixes it. Fixes: #12060 (cherry picked from commit `7730c4718e`)	2022-12-28 18:15:54 +02:00
Piotr Grabowski	25508705a8	type_json: fix wrong blob JSON validation Fixes wrong condition for validating whether a JSON string representing blob value is valid. Previously, strings such as "6" or "0392fa" would pass the validation, even though they are too short or don't start with "0x". Add those test cases to json_cql_query_test.cc. Fixes #10114 (cherry picked from commit `f8b67c9bd1`)	2022-12-28 15:17:31 +02:00
Botond Dénes	347da028e9	mutation_compactor: reset stop flag on page start When the mutation compactor has all the rows it needs for a page, it saves the decision to stop in a member flag: _stop. For single partition queries, the mutation compactor is kept alive across pages and so it has a method, start_new_page() to reset its state for the next page. This method didn't clear the _stop flag. This meant that the value set at the end of the previous page could cause the new page and subsequently the entire query to be stopped prematurely. This can happen if the new page starts with a row that is covered by a higher level tombstone and is completely empty after compaction. Reset the _stop flag in start_new_page() to prevent this. This commit also adds a unit test which reproduces the bug. Fixes: #12361 Closes #12384 (cherry picked from commit `b0d95948e1`)	2022-12-25 09:45:50 +02:00
Benny Halevy	e0777f1112	utils: uuid: add null_uuid and respective bool predicate and operator and unit test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220215113438.473400-1-bhalevy@scylladb.com>	2022-12-15 18:48:45 +03:00
Nadav Har'El	2750d2e94b	Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz Contains fixes requested in the issue (and some tiny extras), together with analysis why they don't affect the users (see commit messages). Fixes [ #11800](https://github.com/scylladb/scylladb/issues/11800) Closes #11926 * github.com:scylladb/scylladb: alternator: add maybe_quote to secondary indexes 'where' condition test/alternator: correct xfail reason for test_gsi_backfill_empty_string test/alternator: correct indentation in test_lsi_describe alternator: fix wrong 'where' condition for GSI range key (cherry picked from commit `ce7c1a6c52`)	2022-12-05 20:53:19 +02:00
Nadav Har'El	f667c5923a	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542 (cherry picked from commit `2f2f01b045`)	2022-12-05 20:09:36 +02:00
Botond Dénes	e4ba0c56df	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these to be resurrected and the view builder to generate view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671 (cherry picked from commit `5621cdd7f9`)	2022-12-05 15:01:21 +02:00
Petr Gusev	b956293f47	modification_statement: fix LWT insert crash if clustering key is null PR #9314 fixed a similar issue with regular insert statements but missed the LWT code path. It's expected behaviour of modification_statement::create_clustering_ranges to return an empty range in this case, since possible_lhs_values it uses explicitly returns empty_value_set if it evaluates rhs to null, and it has a comment about it (All NULL comparisons fail; no column values match.) On the other hand, all components of the primary key are required to be set, this is checked at the prepare phase, in modification_statement::process_where_clause. So the only problem was modification_statement::execute_with_condition was not expecting an empty clustering_range in case of a null clustering key. Fixes: #11954 (cherry picked from commit `0d443dfd16`)	2022-12-04 15:00:27 +02:00
Nadav Har'El	6a8c2d3f56	Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek When filtering with multi column restriction present all other restrictions were ignored. So a query like: `SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;` would ignore the restriction `regular_col = 0`. This was caused by a bug in the filtering code: `2779a171fc/cql3/selection/selection.cc (L433-L449)` When multi column restrictions were detected, the code checked if they are satisfied and returned immediately. This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied. This code was introduced back in 2019, when fixing #3574. Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct. Fixes: #6200 Fixes: #12014 Closes #12031 * github.com:scylladb/scylladb: cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works boost/restrictions-test: uncomment part of the test that passes now cql-pytest: enable test for filtering combined multi column and regular column restrictions cql3: don't ignore other restrictions when a multi column restriction is present during filtering (cherry picked from commit `2d2034ea28`) Closes #12086	2022-11-26 14:24:08 +02:00
Pavel Emelyanov	d83134a245	Merge '[branch-5.0] multishard_mutation_query: don't unpop partition header of spent partition' from Botond Dénes When stopping the read, the multishard reader will dismantle the compaction state, pushing back (unpopping) the currently processed partition's header to its originating reader. This ensures that if the reader stops in the middle of a partition, on the next page the partition-header is re-emitted as the compactor (and everything downstream from it) expects. It can happen however that there is nothing more for the current partition in the reader and the next fragment is another partition. Since we only push back the partition header (without a partition-end) this can result in two partitions being emitted without being separated by a partition end. We could just add the missing partition-end when needed but it is pointless, if the partition has no more data, just drop the header, we won't need it on the next page. The missing partition-end can generate an "IDL frame truncated" message as it ends up causing the query result writer to create a corrupt partition entry. Fixes: https://github.com/scylladb/scylladb/issues/9482 Closes #11912 * github.com:scylladb/scylladb: test/cql-pytest: add regression test for "IDL frame truncated" error mutation_compactor: detach_state(): make it no-op if partition was exhausted	2022-11-16 11:50:50 +03:00
Nadav Har'El	1b550dd301	cql3: fix cql3::util::maybe_quote() for keywords cql3::util::maybe_quote() is a utility function formatting an identifier name (table name, column name, etc.) that needs to be embedded in a CQL statement - and might require quoting if it contains non-alphanumeric characters, uppercase characters, or a CQL keyword. maybe_quote() made an effort to only quote the identifier name if necessary, e.g., a lowercase name usually does not need quoting. But lowercase names that are CQL keywords - e.g., to or where - cannot be used as identifiers without quoting. This can cause problems for code that wants to generate CQL statements, such as the materialized-view problem in issue #9450 - where a user had a column called "to" and wanted to create a materialized view for it. So in this patch we fix maybe_quote() to recognize invalid identifiers by using the CQL parser, and quote them. This will quote reserved keywords, but not so-called unreserved keywords, which are allowed as identifiers and don't need quoting. This addition slows down maybe_quote(), but maybe_quote() is anyway only used in heavy operations which need to generate CQL. This patch also adds two tests that reproduce the bug and verify its fix: 1. Add to the low-level maybe_quote() test (a C++ unit test) also tests that maybe_quote() quotes reserved keywords like "to", but doesn't quote unreserved keywords like "int". 2. Add a test reproducing issue #9450 - creating a materialized view whose key column is a keyword. This new test passes on Cassandra, failed on Scylla before this patch, and passes after this patch. It is worth noting that maybe_quote() now has a "forward compatibility" problem: If we save CQL statements generated by maybe_quote(), and a future version introduces a new reserved keyword, the parser of the future version may not be able to parse the saved CQL statement that was generated with the old maybe_quote() and didn't quote what is now a keyword.
This problem can be solved in two ways: 1. Try hard not to introduce new reserved keywords. Instead, introduce unreserved keywords. We've been doing this even before recognizing this maybe_quote() future-compatibility problem. 2. In the next patch we will introduce quote() - which unconditionally quotes identifier names, even if lowercase. These quoted names will be uglier for lowercase names - but will be safe from future introduction of new keywords. So we can consider switching some or all uses of maybe_quote() to quote(). Fixes #9450 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220118161217.231811-1-nyh@scylladb.com> (cherry picked from commit `5d2f694a90`)	2022-11-07 17:01:32 +02:00
Alexander Turetskiy	01ce53d7fb	Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs. The return from DescribeTable which describes GSIs and LSIs is missing the Projection field. We do not yet support all the settings Projection (see #5036), but the default which we support is ALL, and DescribeTable should return that in its description. Fixes #11470 Closes #11693 (cherry picked from commit `636e14cc77`)	2022-11-07 17:01:32 +02:00
Botond Dénes	e54ae9efd9	test/cql-pytest: add regression test for "IDL frame truncated" error (cherry picked from commit `11af489e84`)	2022-11-07 13:43:53 +02:00
Botond Dénes	8c56b0b268	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. In this series fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry() (cherry picked from commit `e981bd4f21`)	2022-11-01 13:25:22 +02:00
Nadav Har'El	50c2c1b1d4	alternator: return ProvisionedThroughput in DescribeTable DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable operation must return a ProvisionedThroughput structure, listing both ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not stated in some DynamoDB documentation but is explicitly mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html Also, empirically, DynamoDB returns ProvisionedThroughput with zeros even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this. The ProvisionedThroughput structure being missing was a problem for applications like DynamoDB connectors for Spark, if they implicitly assume that ProvisionedThroughput is returned by DescribeTable, and fail (as described in issue #11222) if it's outright missing. So this patch adds the missing ProvisionedThroughput structure, and the xfailing test starts to pass. Note that this patch doesn't change the fact that attempting to set a table to PROVISIONED billing mode is ignored: DescribeTable continues to always return PAY_PER_REQUEST as the billing mode and zero as the provisioned capacities. Fixes #11222 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11298 (cherry picked from commit `941c719a23`)	2022-10-03 14:28:16 +03:00
Tomasz Grabiec	aa647a637a	test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone The generator was first setting the marker then applied tombstones. The marker was set like this: row.marker() = random_row_marker(); Later, when shadowable tombstones were applied, they were compacted with the marker as expected. However, the key for the row was chosen randomly in each iteration and there are multiple keys set, so there was a possibility of a key clash with an earlier row. This could override the marker without applying any tombstones, which is conditional on random choice. This could generate rows with markers uncompacted with shadowable tombstones. This broke row_cache_test::test_concurrent_reads_and_eviction on comparison between expected and read mutations. The latter was compacted because it went through an extra merge path, which compacts the row. Fix by making sure there are no key clashes. Closes #11663 (cherry picked from commit `5268f0f837`)	2022-10-02 16:45:07 +03:00
Michael Livshin	2c0040fcb3	allow pre-scrub snapshots of materialized views and secondary indices Previously, any attempt to take a materialized view or secondary index snapshot was considered a mistake and caused the snapshot operation to abort, with a suggestion to snapshot the base table instead. But an automatic pre-scrub snapshot of a view cannot be attributed to user error, so the operation should not be aborted in that case. (It is an open question whether the more correct thing to do during pre-scrub snapshot would be to silently ignore views. Or perhaps they should be ignored in all cases except when the user explicitly asks to snapshot them, by name) Closes #10760. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> (cherry picked from commit `aab4cd850c`) Fixes #10760.	2022-10-02 14:04:11 +03:00
Nadav Har'El	54564adb7c	alternator: forbid duplicate index (LSI and GSI) names Adding an LSI and GSI with the same name to the same Alternator table should be forbidden - because if both exist only one of them (the GSI) would actually be usable. DynamoDB also forbids such duplicate names. So in this patch we add a test for this issue, and fix it. Since the patch involves a few more uses of the IndexName string, we also clean up its handling a bit, to use std::string_view instead of the old-style std::string&. Fixes #10789 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `8866c326de`)	2022-10-02 13:00:03 +03:00
Tomasz Grabiec	839876e8f2	db: range_tombstone_list: Avoid quadratic behavior when applying Range tombstones are kept in memory (cache/memtable) in range_tombstone_list. It keeps them deoverlapped, so applying a range tombstone which covers many range tombstones will erase existing range tombstones from the list. This operation needs to be exception-safe, so range_tombstone_list maintains an undo log. This undo log will receive a record for each range tombstone which is removed. For exception safety reasons, before pushing an undo log entry, we reserve space in the log by calling std::vector::reserve(size() + 1). This is O(N) where N is the number of undo log entries. Therefore, the whole application is O(N^2). This can cause reactor stalls and availability issues when replicas apply such deletions. This patch avoids the problem by reserving exponentially increasing amount of space. Also, to avoid large allocations, switches the container to chunked_vector. Fixes #11211 Closes #11215 (cherry picked from commit `7f80602b01`)	2022-09-30 17:55:23 +03:00
Botond Dénes	36002e2b7c	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422 (cherry picked from commit `be9d1c4df4`)	2022-09-30 17:55:14 +03:00
Botond Dénes	91a8f9e09b	test/lib/random_schema: add a simpler overload for fixed partition count Some tests want to generate a fixed amount of random partitions, make their life easier. (cherry picked from commit `98f3d516a2`) Ref #11421 (prerequisite)	2022-09-30 17:54:55 +03:00
Michał Radwański	ebf38eaead	flat_mutation_reader: allow destructing readers which are not closed and didn't initiate any IO. In functions such as upgrade_to_v2 (excerpt below), if the constructor of transforming_reader throws, r needs to be destroyed, however it hasn't been closed. However, if a reader didn't start any operations, it is safe to destruct such a reader. This issue can potentially manifest itself in many more readers and might be hard to track down. This commit adds a bool indicating whether a close is anticipated, thus avoiding errors in the destructor. Code excerpt: flat_mutation_reader_v2 upgrade_to_v2(flat_mutation_reader r) { class transforming_reader : public flat_mutation_reader_v2::impl { // ... }; return make_flat_mutation_reader_v2<transforming_reader>(std::move(r)); } Fixes #9065. Fixes #11491 (cherry picked from commit `9ada63a9cb`)	2022-09-21 10:25:18 +03:00
Piotr Sarna	e1f78c33b4	Merge 'Fix mutation commutativity with shadowable tombstone' from Tomasz Grabiec This series fixes lack of mutation associativity which manifests as sporadic failures in row_cache_test.cc::test_concurrent_reads_and_eviction due to differences in mutations applied and read. No known production impact. Refs https://github.com/scylladb/scylladb/issues/11307 Closes #11312 * github.com:scylladb/scylladb: test: mutation_test: Add explicit test for mutation commutativity test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones db: mutation_partition: Drop unnecessary maybe_shadow() db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone mutation_partition: row: make row marker shadowing symmetric (cherry picked from commit `484004e766`)	2022-09-20 23:21:06 +02:00
Tomasz Grabiec	0634b5f734	test: row_cache: Use more narrow key range to stress overlapping reads more This makes catching issues related to concurrent access of same or adjacent entries more likely. For example, catches #11239. Closes #11260 (cherry picked from commit `8ee5b69f80`)	2022-09-20 23:20:43 +02:00
Botond Dénes	82d1446ca9	test/boost/mutation_reader_test: add v2 specific evictable reader tests One is a reincarnation of the recently removed test_multishard_combining_reader_non_strictly_monotonic_positions. The latter was actually targeting the evictable reader but through the multishard reader, probably for historic reasons (evictable reader was part of the multishard reader family). The other one checks that active range tombstones changes are properly terminated when the partition ends abruptly after recreating the reader. (cherry picked from commit `014a23bf2a`)	2022-09-15 13:51:13 +03:00
Nadav Har'El	da6a126d79	cross-tree: fix header file self-sufficiency Scylla's coding standard requires that each header is self-sufficient, i.e., it includes whatever other headers it needs - so it can be included without having to include any other header before it. We have a test for this, "ninja dev-headers", but it isn't run very frequently, and it turns out our code deviated from this requirement in a few places. This patch fixes those places, and after it "ninja dev-headers" succeeds again. This is needed because our CI runs "ninja dev-headers". Fixes #10995 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11457	2022-09-06 15:45:34 +03:00
Avi Kivity	d07e902983	Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later. Fixes: https://github.com/scylladb/scylladb/issues/11264 Closes #11273 * github.com:scylladb/scylladb: querier: querier_cache: remove now unused evict_all_for_table() database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() reader_concurrency_semaphore: add evict_inactive_reads_for_table() (cherry picked from commit `afa7960926`)	2022-09-02 11:39:43 +03:00
Piotr Grabowski	964ccf9192	type_json: support integers in scientific format Add support for specifying integers in scientific format (for example 1.234e8) in INSERT JSON statement: INSERT INTO table JSON '{"int_column": 1e7}'; Inserting a floating-point number ending with .0 is allowed, as the fractional part is zero. Non-zero fractional part (for example 12.34) is disallowed. A new test is added to test all those behaviors. Before the JSON parsing library was switched to RapidJSON from JsonCpp, this statement used to work correctly, because JsonCpp transparently casts double to integer value. This behavior differs from Cassandra, which disallows those types of numbers (1e7, 123.0 and 12.34). Fix typo in if condition: "if (value.GetUint64())" to "if (value.IsUint64())". Fixes #10100 (cherry picked from commit `efe7456f0a`)	2022-09-01 16:03:49 +03:00
Avi Kivity	dfdc128faf	Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec Scenario: cache = [ row(pos=2, continuous=false), row(pos=after(2), dummy=true) ] Scanning read starts, starts populating [-inf, before(2)] from sstables. row(pos=2) is evicted. cache = [ row(pos=after(2), dummy=true) ] Scanning read finishes reading from sstables. Refreshes cache cursor via partition_snapshot_row_cursor::maybe_refresh(), which calls partition_snapshot_row_cursor::advance_to() because iterators are invalidated. This advances the cursor to after(2). no_clustering_row_between(2, after(2)) returns true, so advance_to() returns true, and maybe_refresh() returns true. This is interpreted by the cache reader as "the cursor has not moved forward", so it marks the range as complete, without emitting the row with pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads will also miss the row. The bug is in advance_to(), which is using no_clustering_row_between(a, b) to determine its result, which by definition excludes the starting key. Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction with reduced key range in the random_mutation_generator (1024 -> 16). Fixes #11239 Closes #11240 * github.com:scylladb/scylladb: test: mvcc: Fix illegal use of maybe_refresh() tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range() tests: row_cache_test: Introduce one_shot mode to throttle row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy	2022-08-11 18:36:44 +02:00
Nadav Har'El	67a2f3aa67	test/cql-pytest: reproducer for CONTAINS NULL bug This is a reproducer for issue #10359 that "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions should not match any set, but currently do match non-empty or all sets. The tests currently fail on Scylla, so marked xfail. They also fail on Cassandra because Cassandra considers such a request an error, which we consider a mistake (see #4776) - so the tests are marked "cassandra_bug". Refs #10359. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220412130914.823646-1-nyh@scylladb.com> (cherry picked from commit `ae0e1574dc`)	2022-07-27 20:03:30 +03:00
Nadav Har'El	66e8cf8cea	expressions: don't dereference invalid map subscript in filter If we have the filter expression "WHERE m[?] = 2", the existing code simply assumed that the subscript is an object of the right type. However, while it should indeed be the right type (we already have code that verifies that), there are two more options: It can also be a NULL, or an UNSET_VALUE. Either of these cases causes the existing code to dereference a non-object as an object, leading to bizarre errors (as in issue #10361) or even crashes (as in issue #10399). Cassandra returns an invalid request error in these cases: "Unsupported unset map key for column m" or "Unsupported null map key for column m". We decided to do things differently: * For NULL, we consider m[NULL] to result in NULL - instead of an error. This behavior is more consistent with other expressions that contain null - for example NULL[2] and NULL<2 both result in NULL as well. Moreover, if in the future we allow more complex expressions, such as m[a] (where a is a column), we can find the subscript to be null for some rows and non-null for other rows - and throwing an "invalid query" in the middle of the filtering doesn't make sense. * For UNSET_VALUE, we do consider this an error like Cassandra, and use the same error message as Cassandra. However, the current implementation checks for this error only when the expression is evaluated - not before. It means that if the scan is empty before the filtering, the error will not be reported and we'll silently return an empty result set. We currently consider this ok, but we can also change this in the future by binding the expression only once (today we do it on every evaluation) and validating it once after this binding. Fixes #10361 Fixes #10399 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `fbb2a41246`)	2022-07-27 19:56:17 +03:00
Nadav Har'El	35b66c844c	expressions: fix invalid dereference in map subscript evaluation When we have a filter such as "WHERE m[2] = 3" (where m is a map column), if a row had a null value for m, our expression evaluation code incorrectly dereferenced an unset optional, and continued processing the result of this dereference which resulted in undefined behavior - sometimes we were lucky enough to get "marshaling error" but other times Scylla crashed. The fix is trivial - just check before dereferencing the optional value of the map. We return null in that case, which means that we consider the result of null[2] to be null. I think this is a reasonable approach and fits our overall approach of making null dominate expressions (e.g., the value of "null < 2" is also null). The test test_filtering.py::test_filtering_null_map_with_subscript, which used to frequently fail with marshaling errors or crashes, now passes every time so its "xfail" mark is removed. Fixes #10417 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `808a93d29b`)	2022-07-27 19:50:24 +03:00
Nadav Har'El	9e7a1340b9	test/cql-pytest: improve tests for map subscripts and nulls The test test_null.py::test_map_subscript_null turned out to reproduce multiple bugs related to using map subscripts in filtering expressions. One was issue #10361 (m[null] resulted in a bizarre error) or #10399 (m[null] resulted in a crash), and a different issue was #10401 (m[2] resulted in a bizarre error or a crash if m itself was null). Moreover, the same test uncovered different bugs depending on how it was run - alone or with other tests - because it was using a shared table. In this patch we introduce two separate tests in test_filtering.py which are designed to reproduce these separate bugs instead of mixing them into one test. The new tests also cover a few more corners which the previous test (which focused on nulls) missed - such as UNSET_VALUE. The two new tests (and the old test_map_subscript_null) pass on Cassandra so they still assume that the Cassandra behavior - that m[null] should be an error - is the correct behavior. We may want to change the desired behavior (e.g., to decide that m[null] be null, not an error), and change the tests accordingly later - but for now the tests follow Cassandra's behavior exactly, and pass on Cassandra and fail on Scylla (so are marked xfail). The bugs reproduced by these tests involve randomness or reading uninitialized memory, so these tests sometimes pass, sometimes fail, and sometimes even crash (as reported in #10399 and #10401). So to reproduce these bugs run the tests multiple times. For example: test/cql-pytest/run --count 100 --runxfail test_filtering.py::test_filtering_null_map_with_subscript Refs #10361 Refs #10399 Refs #10401 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `189b8845fe`)	2022-07-27 19:28:17 +03:00
Tomasz Grabiec	f10fd1bc12	test: memtable: Make failed_flush_prevents_writes() immune to background merging Before the change, the test artificially set the soft pressure condition hoping that the background flusher will flush the memtable. It won't happen if by the time the background flusher runs the LSA region is updated and soft pressure (which is not really there) is lifted. Once apply() becomes preemptible, background partition version merging can lift the soft pressure, making the memtable flush not occur and making the test fail. Fix by triggering soft pressure on retries. Fixes #10801 Refs #10793 (cherry picked from commit `0e78ad50ea`) Closes #10802 (cherry picked from commit `3bec1cc19f`)	2022-07-25 14:19:48 +03:00
Tomasz Grabiec	1891f10141	memtable: Fix missing range tombstones during reads under certain rare conditions There is a bug introduced in `e74c3c8` (4.6.0) which makes memtable reader skip a range tombstone for a certain pattern of deletions and under certain sequence of events. _rt_stream contains the result of deoverlapping range tombstones which had the same position, which were sipped from all the versions. The result of deoverlapping may produce a range tombstone which starts later, at the same position as a more recent tombstone which has not been sipped from the partition version yet. If we consume the old range tombstone from _rt_stream and then refresh the iterators, the refresh will skip over the newer tombstone. The fix is to drop the logic which drains _rt_stream so that _rt_stream is always merged with partition versions. For the problem to trigger, there have to be multiple MVCC versions (at least 2) which contain deletions of the following form: [a, c] @ t0 [a, b) @ t1, [b, d] @ t2 c > b The proper sequence for such versions is (assuming d > c): [a, b) @ t1, [b, d] @ t2 Due to the bug, the reader will produce: [a, b) @ t1, [b, c] @ t0 The reader also needs to be preempted right before processing [b, d] @ t2 and iterators need to get invalidated so that lsa_partition_reader::do_refresh_state() is called and it skips over [b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it does emit the proper range tombstone, it's possible that it will violate fragment order in the stream if _rt_stream accumulated remainders (possible with 3 MVCC versions). The problem goes away once MVCC versions merge. Fixes #10913 Fixes #10830 Closes #10914 (cherry picked from commit `a6aef60b93`)	2022-07-19 19:33:51 +03:00
Pavel Emelyanov	cd13911db4	Merge 'Scrub compaction: prevent mishandling of range tombstone changes' from Botond With v2 having individual bounds of range tombstone as separate fragments, out-of-order fragments become more difficult to handle, especially in the presence of active range tombstone. Scrub in both SKIP and SEGREGATE mode closes the partition on seeing the first invalid fragment (SEGREGATE re-opens it immediately). If there is an active range tombstone, scrub now also has to take care of closing said tombstone when closing the partition. In a normal stream it could just use the last position-in-partition to create a closing bound. But when out-of-order fragments are on the table this is not possible: the closing bound may be found later in the stream, with a position smaller than that of the current position-in-partition. To prevent extending range tombstone changes like that, Scrub now aborts the compaction on the first invalid fragment seen inside an active range tombstone. Fixing a v2 stream with range tombstone changes is definitely possible, but non-trivial, so we defer it until there is demand for it. This series also makes the mutation fragment stream validator check for open range tombstones on partition-end and adds a comprehensive test-suite for the validator. Fixes: #10168 Tests: unit(dev) * scrub-rtc-handling-fix/v2 of github.com/denesb/scylla.git: compaction/compaction: abort scrub when attempting to rectify stream with active tombstone test/boost/mutation_test: add test for mutation_fragment_stream_validator mutation_fragment_stream_validator: validate range tombstone changes (cherry picked from commit `edd0481b38`)	2022-07-14 18:49:13 +03:00
Nadav Har'El	32423ebc38	Merge 'Handle errors during snapshot' from Benny Halevy This series refactors `table::snapshot` and moves the responsibility to flush the table before taking the snapshot to the caller. `flush_on_all` and `snapshot_on_all` helpers are added to replica::database (by making it a peering_sharded_service) and upper layers, including api and snapshot-ctl now call it instead of calling cf.snapshot directly. With that, errors are handled in table::snapshot and propagated back to the callers. Failure to allocate the `snapshot_manager` object is fatal, similar to failure to allocate a continuation, since we can't coordinate across the shards without it. Test: unit(dev), rest_api(debug) * github.com:scylladb/scylla: table: snapshot: handle errors table: snapshot: get rid of skip_flush param database: truncate: skip flush when taking snapshot test: rest_api: storage_service: verify_snapshot_details: add truncate database: snapshot_on_all: flush before snapshot if needed table: make snapshot method private database: add snapshot_on_all snapshot-ctl: run_snapshot_modify_operation: reject views and secondary index using the schema snapshot-ctl: refactor and coroutinize take_snapshot / take_column_family_snapshot api: storage_service: increase visibility of snapshot ops in the log api: storage_service: coroutinize take_snapshot and del_snapshot api: storage_service: take_snapshot: improve api help messages test: rest_api: storage_service: add test_storage_service_snapshot database: add flush_on_all variants test: rest_api: add test_storage_service_flush (cherry picked from commit `2c39c4c284`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10975	2022-07-12 15:24:24 +03:00
Piotr Sarna	34085c364f	view: exclude using static columns in the view filter The code which applied view filtering (i.e. a condition placed on a view column, e.g. "WHERE v = 42") erroneously used a wildcard selection, which also assumes that static columns are needed, if the base table contains any such columns. The filtering code currently assumes that no such columns are fetched, so the selection is amended to only ask for regular columns (primary key columns are sent anyway, because they are enabled via slice options, so no need to ask for them explicitly). Fixes #10851 Closes #10855 (cherry picked from commit `bc3a635c42`)	2022-07-11 17:06:55 +03:00
Nadav Har'El	d3045df9c9	Merge 'types: fix is_string for reversed types' from Piotr Sarna Checking if the type is string is subtly broken for reversed types, and these types will not be recognized as strings, even though they are. As a result, if somebody creates a column with DESC order and then tries to use operator LIKE on it, it will fail because the type would not be recognized as a string. Fixes #10183 Closes #10181 * github.com:scylladb/scylla: test: add a case for LIKE operator on a descending order column types: fix is_string for reversed types (cherry picked from commit `733672fc54`)	2022-07-03 17:59:33 +03:00
Nadav Har'El	cc22021876	alternator: forbid empty AttributesToGet In DynamoDB one can retrieve only a subset of the attributes using the AttributesToGet or ProjectionExpression parameters to read requests. Neither allows an empty list of attributes - if you don't want any attributes, you should use Select=COUNT instead. Currently we correctly refuse an empty ProjectionExpression - and have a test for it: test_projection_expression.py::test_projection_expression_toplevel_syntax However, Alternator is missing the same empty-forbidding logic for AttributesToGet. An empty AttributesToGet is currently allowed, and basically says "retrieve everything", which is sort of unexpected. So this patch adds the missing logic, and the missing test (actually two tests for the same thing - one using GetItem and the other Query). Fixes #10332 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220405113700.9768-1-nyh@scylladb.com> (cherry picked from commit `9c1ebdceea`)	2022-07-03 13:35:50 +03:00
Tomasz Grabiec	89a540d54a	sstable: partition_index_cache: Fix abort on bad_alloc during page loading When entry loading fails and there is another request blocked on the same page, attempt to erase the failed entry will abort because that would violate entry_ptr guarantees, which is supposed to keep the entry alive. The fix in `92727ac36c` was incomplete. It only helped for the case of a single loader. This patch makes a more general approach by relaxing the assert. The assert manifested like this: scylla: ./sstables/partition_index_cache.hh:71: sstables::partition_index_cache::entry::~entry(): Assertion `!is_referenced()' failed. Fixes #10617 Closes #10653 (cherry picked from commit `f87274f66a`)	2022-05-27 09:50:32 +03:00
Calle Wilund	b0233cb7c5	cdc: Ensure columns removed from log table are registered as dropped If we are redefining the log table, we need to ensure any dropped columns are registered in "dropped_columns" table, otherwise clients will not be able to read data older than now. Includes unit test. Should probably be backported to all CDC enabled versions. Fixes #10473 Closes #10474 (cherry picked from commit `78350a7e1b`)	2022-05-05 11:38:18 +02:00
Avi Kivity	e480c5bf4d	Merge 'loading_cache: force minimum size of unprivileged ' from Piotr Grabowski This series enforces a minimum size of the unprivileged section when performing `shrink()` operation. When the cache is shrunk, we still drop entries first from unprivileged section (as before this commit), however, if this section is already small (smaller than `max_size / 2`), we will drop entries from the privileged section. This is necessary, as before this change the unprivileged section could be starved. For example if the cache could store at most 50 entries and there are 49 entries in privileged section, after adding 5 entries (that would go to unprivileged section) 4 of them would get evicted and only the 5th one would stay. This caused problems with BATCH statements where all prepared statements in the batch have to stay in cache at the same time for the batch to correctly execute. To correctly check if the unprivileged section might get too small after dropping an entry, `_current_size` variable, which tracked the overall size of cache, is changed to two variables: `_unprivileged_section_size` and `_privileged_section_size`, tracking section sizes separately. New tests are added to check this new behavior and bookkeeping of the section sizes. A test is added, that sets up a CQL environment with a very small prepared statement cache, reproduces issue in #10440 and stresses the cache. Fixes #10440. Closes #10456 * github.com:scylladb/scylla: loading_cache_test: test prepared stmts cache loading_cache: force minimum size of unprivileged loading_cache: extract dropping entries to lambdas loading_cache: separately track size of sections loading_cache: fix typo in 'privileged' (cherry picked from commit `5169ce40ef`)	2022-05-04 14:35:53 +03:00
Tomasz Grabiec	7d90f7e93f	loading_cache: Make invalidation take immediate effect There are two issues with current implementation of remove/remove_if: 1) If it happens concurrently with get_ptr(), the latter may still populate the cache using value obtained from before remove() was called. remove() is used to invalidate caches, e.g. the prepared statements cache, and the expected semantic is that values calculated from before remove() should not be present in the cache after invalidation. 2) As long as there is any active pointer to the cached value (obtained by get_ptr()), the old value from before remove() will be still accessible and returned by get_ptr(). This can make remove() have no effect indefinitely if there is persistent use of the cache. One of the user-perceived effects of this bug is that some prepared statements may not get invalidated after a schema change and still use the old schema (until next invalidation). If the schema change was modifying UDT, this can cause statement execution failures. CQL coordinator will try to interpret bound values using old set of fields. If the driver uses the new schema, the coordinator will fail to process the value with the following exception: User Defined Type value contained too many fields (expected 5, got 6) The patch fixes the problem by making remove()/remove_if() erase old entries from _loading_values immediately. The predicate-based remove_if() variant has to also invalidate values which are concurrently loading to be safe. The predicate cannot be evaluated on values which are not ready. This may invalidate some values unnecessarily, but I think it's fine. Fixes #10117 Message-Id: <20220309135902.261734-1-tgrabiec@scylladb.com> (cherry picked from commit `8fa704972f`)	2022-05-04 14:35:37 +03:00
Avi Kivity	3e98e17d18	Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop. Fixes: #10450 Closes #10451 * github.com:scylladb/scylla: replica/database: drop_column_family(): drop querier cache entries after waiting for ops replica/database: finish coroutinizing drop_column_family() replica/database: make remove(const column_family&) private (cherry picked from commit `7f1e368e92`)	2022-05-01 17:22:57 +03:00
Nadav Har'El	fa479c84ac	config: fix some types in system.config virtual table The system.config virtual table prints each configuration variable of type T based on the JSON printer specified in the config_type_for<T> in db/config.cc. For two variable types - experimental_features and tri_mode_restriction, the specified converter was wrong: We used value_to_json<string> or value_to_json<vector<string>> on something which was not a string. Unfortunately, value_to_json silently cast the given objects into strings, and the result was garbage: For example as noted in #10047, for experimental_features instead of printing a list of features names, e.g., "raft", we got a bizarre list of one-byte strings with each feature's number (which isn't documented or even guaranteed to not change) as well as carriage-return characters (!?). So the solution is a new printable_to_json<T> which works on a type T that can be printed with operator<< - as in fact the above two types can - and the type is converted into a string or vector of strings using this operator<<, not a cast. Also added a cql-pytest test for reading system.config and in particular options of the above two types - checking that they contain sensible strings and not "garbage" like before this patch. Fixes #10047. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220209090421.298849-1-nyh@scylladb.com> (cherry picked from commit `fef7934a2d`)	2022-04-14 19:29:08 +03:00
Tomasz Grabiec	40c26dd2c5	utils/chunked_managed_vector: Fix sigsegv during reserve() Fixes the case of make_room() invoked with last_chunk_capacity_deficit but _size not in the last reserved chunk. Found during code review, no user impact. Fixes #10364. Message-Id: <20220411224741.644113-1-tgrabiec@scylladb.com> (cherry picked from commit `0c365818c3`)	2022-04-13 09:48:34 +03:00
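
The eviction rule described in commit `e480c5bf4d` (loading_cache: force minimum size of unprivileged) can be illustrated with a toy model. This is only a sketch of the idea, not ScyllaDB's C++ implementation: the class `TwoSectionCache` and its methods are hypothetical names, every entry is assumed to have size 1, and the exact point at which the real `shrink()` checks the `max_size / 2` threshold is an assumption.

```python
from collections import deque


class TwoSectionCache:
    """Toy two-section cache; hypothetical names, not the loading_cache API."""

    def __init__(self, max_size, privileged_entries=0):
        self.max_size = max_size
        # Oldest entries sit at the left end of each deque.
        self.privileged = deque(f"p{i}" for i in range(privileged_entries))
        self.unprivileged = deque()

    def size(self):
        # Sections are tracked separately, as the commit message describes.
        return len(self.privileged) + len(self.unprivileged)

    def add_unprivileged(self, key):
        self.unprivileged.append(key)
        self.shrink()

    def shrink(self):
        while self.size() > self.max_size:
            # Assumed rule: if the unprivileged section is already smaller
            # than max_size / 2, evict from the privileged section instead.
            if self.privileged and len(self.unprivileged) < self.max_size / 2:
                self.privileged.popleft()
            else:
                self.unprivileged.popleft()


# Scenario from the commit message: room for 50 entries, 49 of them privileged.
cache = TwoSectionCache(max_size=50, privileged_entries=49)
for i in range(5):
    cache.add_unprivileged(f"stmt{i}")
```

In this model all five newly added entries stay in the unprivileged section and the overflow is taken from the privileged section, which is the property a BATCH of prepared statements needs according to the commit message.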


2762 Commits