scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Nadav Har'El	14315fcbc3	mv: fix missing view deletions in some cases of range tombstones For efficiency, if a base-table update generates many view updates that go to the same partition, they are collected as one mutation. If this mutation grows too big it can lead to memory exhaustion, so since commit `7d214800d0` we split the output mutation to mutations no longer than 100 rows (max_rows_for_view_updates) each. This patch fixes a bug where this split was done incorrectly when the update involved range tombstones, a bug which was discovered by a user in a real use case (#17117). Range tombstones are read in two parts, a beginning and an end, and the code could split the processing between these two parts, with the result that some of the range tombstones in the update could be missed - and the view could miss some deletions that happened in the base table. This patch fixes the code in two places to avoid breaking up the processing between range tombstones: 1. The counter "_op_count" that decides where to break the output mutation should only be incremented when adding rows to this output mutation. The existing code strangely incremented it on every read (!?) which resulted in the counter being incremented on every input fragment, and in particular could reach the limit 100 between two range tombstone pieces. 2. Moreover, the length of output was checked in the wrong place... The existing code could get to 100 rows, not check at that point, read the next input - half a range tombstone - and only then check that we reached 100 rows and stop. The fix is to calculate the number of rows in the right place - exactly when it's needed, not before the step. The first change needs more justification: The old code, that incremented _op_count on every input fragment and not just output fragments did not fit the stated goal of its introduction - to avoid large allocations. In one test it resulted in breaking up the output mutation to chunks of 25 rows instead of the intended 100 rows. 
But, maybe there was another goal, to stop the iteration after 100 input rows and avoid the possibility of stalls if there are no output rows? It turns out the answer is no - we don't need this _op_count increment to avoid stalls: The function build_some() uses `co_await on_results()` to run one step of processing one input fragment - and `co_await` always checks for preemption. I verified that indeed no stalls happen by using the existing test test_long_skipped_view_update_delete_with_timestamp. It generates a very long base update where all the view updates go to the same partition, but all but the last few updates don't generate any view updates. I confirmed that the fixed code loops over all these input rows without increasing _op_count and without generating any view update yet, but it does NOT stall. This patch also includes two tests reproducing this bug and confirming it's fixed, and also two additional tests for breaking up long deletions that I wanted to make sure don't fail after this patch (they don't). By the way, this fix would have also fixed issue #12297 - which we fixed a year ago in a different way. That issue happened when the code went through 100 input rows without generating any output rows, and incorrectly concluded that there's no view update to send. With this fix, the code no longer stops generating the view update just because it saw 100 input rows - it would have waited until it generated 100 output rows in the view update (or the input is really done). Fixes #17117 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17164	2024-02-06 14:57:33 +02:00
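The counting rule from commit 14315fcbc3 can be illustrated with a small Python model. This is only a sketch under stated assumptions, not ScyllaDB's C++ code: `split_view_updates` and its fragment representation are hypothetical, and range tombstones are not modeled at all. It shows only that counting output rows (rather than input fragments) and checking the size before each step yields full-sized batches.

```python
from typing import Iterable, Iterator, List, Optional

MAX_ROWS = 100  # mirrors max_rows_for_view_updates from the commit message


def split_view_updates(
    fragments: Iterable[Optional[str]], max_rows: int = MAX_ROWS
) -> Iterator[List[str]]:
    """Yield batches of output rows, each holding at most max_rows rows.

    Hypothetical model: each input fragment is either an output row
    (a string) or None, meaning it produced no view update. Only output
    rows count toward the limit, so skipped inputs never shrink a batch.
    """
    batch: List[str] = []
    for row in fragments:
        # Check the size right before taking the next step, not after it.
        if len(batch) >= max_rows:
            yield batch
            batch = []
        if row is not None:
            batch.append(row)
    if batch:
        yield batch


# 300 inputs without output, then 250 inputs that each produce one row.
skipped_then_rows = [None] * 300 + [f"row{i}" for i in range(250)]
sizes = [len(b) for b in split_view_updates(skipped_then_rows)]

# Only one input in four produces an output row.
sparse = [f"row{i}" if i % 4 == 0 else None for i in range(400)]
sparse_sizes = [len(b) for b in split_view_updates(sparse)]
```

In this model the sparse input yields one batch of 100 rows; a counter that advanced on every input fragment would instead cut it into four batches of 25, which matches the chunk size the commit message reports for the old code.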
Eliran Sinvani	0e5a8cad62	Add test for mv prepared statements invalidation on base alter Issue #16392 describes a bug where, when a base table is altered, its materialized views' prepared statements are not invalidated, which in turn causes them to return missing data. This test reproduces this bug and serves as a regression test for this problem. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-21 15:44:06 +02:00
Yaniv Kaul	ae2ab6000a	Typos: fix typos in code Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255	2023-12-05 15:18:11 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Nadav Har'El	92f591dc38	test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra The test function test_mv_synchronous_updates checks the synchronous_updates feature, which is a ScyllaDB extension and doesn't exist in Cassandra. So it should be marked with "scylla_only" so that it doesn't fail when running the tests on Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-11-14 21:27:12 +02:00
Jan Ciolek	50943e825b	cql-pytest: enable test_is_not_null_forbidden_in_filter IS NOT NULL is now allowed only on the view's primary key columns, so the xfail marker can be removed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-06-07 02:30:11 +02:00
Jan Ciolek	7f0c64a69d	test: remove invalid IS NOT NULL restrictions from tests The IS NOT NULL restriction is currently supported only in the CREATE MATERIALIZED VIEW statements. These restrictions work correctly for columns that are part of the view's primary key, but they're silently ignored on other columns. The following commits will forbid placing the IS NOT NULL restriction on columns that aren't a part of the view's primary key. The tests have to be modified in order to pass, because some of them have a useless IS NOT NULL restriction on regular columns that don't belong to the view's primary key. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-05-17 15:38:03 +02:00
Nadav Har'El	363f326d49	test/cql-pytest: test for CLUSTERING ORDER BY verification in MV Since commit `73e258fc34`, Scylla has partial verification for the CLUSTERING ORDER BY clause in CREATE MATERIALIZED VIEW. Specifically, invalid column names are rejected. But for reasons explained in issue #12936 and in the test in this patch, Cassandra demands that if CLUSTERING ORDER BY appears it must list all the clustering columns, with no duplicates, and do so in the right order. This patch replaces an existing test which suggested it is fine (an extension over Cassandra) to accept a partial list of clustering columns, by a test that verifies that such a partial list, or an incorrectly-ordered list, or list with duplicates, should be rejected. The new test fails on Scylla, and passes on Cassandra, so it is marked as xfail. Refs #12936. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12938	2023-03-01 08:02:39 +02:00
Nadav Har'El	73e258fc34	materialized views: verify CLUSTERING ORDER BY clause Cassandra is very strict in the CLUSTERING ORDER BY clause which it allows when creating a materialized view - if it appears, it must list all the clustering columns of the view. Scylla is less strict - a subset of the clustering columns may be specified. But Scylla was too lenient - a user could specify non-clustering columns and even non-existent columns and Scylla would not fail the MV creation. This patch fixes that - with it MV creation fails if anything besides clustering columns is listed in CLUSTERING ORDER BY. An xfailing test we had for this case no longer fails after this patch so its xfail mark is removed. We also add a few more corner cases to the tests. This patch also fixes one C++ test which had exactly the error that this patch detects - the test author tried to use the partition key, instead of the clustering key, in CLUSTERING ORDER BY (this error had no effect because the specified order, "asc", was the default anyway). Fixes #10767 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12885	2023-02-27 15:09:42 +02:00
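The two levels of strictness described for CLUSTERING ORDER BY can be sketched as two small validators. This is an illustrative Python model with hypothetical names (`check_subset_of_clustering`, `check_full_ordered_list`), not the actual C++ validation code. The first check corresponds to what commit 73e258fc34 says Scylla enforces; the second to the stricter rule that commit 363f326d49 attributes to Cassandra (all clustering columns, no duplicates, in the right order).

```python
from typing import Sequence


def check_subset_of_clustering(
    clustering_cols: Sequence[str], order_by: Sequence[str]
) -> None:
    """Reject any column in ORDER BY that is not a clustering column."""
    unknown = [c for c in order_by if c not in clustering_cols]
    if unknown:
        raise ValueError(f"not clustering columns: {unknown}")


def check_full_ordered_list(
    clustering_cols: Sequence[str], order_by: Sequence[str]
) -> None:
    """Require ORDER BY to list every clustering column exactly once,
    in the same order as the view's clustering key."""
    if list(order_by) != list(clustering_cols):
        raise ValueError(
            f"expected exactly {list(clustering_cols)}, got {list(order_by)}"
        )


def is_rejected(check, clustering_cols, order_by) -> bool:
    """Helper for the examples: True if the check raises ValueError."""
    try:
        check(clustering_cols, order_by)
    except ValueError:
        return True
    return False


view_clustering = ["c1", "c2"]  # hypothetical view clustering key
partial_allowed = not is_rejected(check_subset_of_clustering, view_clustering, ["c2"])
partition_key_rejected = is_rejected(check_subset_of_clustering, view_clustering, ["p"])
partial_rejected_strict = is_rejected(check_full_ordered_list, view_clustering, ["c2"])
```

A partial list such as `["c2"]` passes the first check but fails the second, which is the difference the xfailing test in 363f326d49 is described as capturing.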
Avi Kivity	561f4ca057	test: materialized view: add test exercising synthetic empty-type columns Materialized views inject synthetic empty-type columns in some conditions. Since we just touched empty-type serialization/deserialization, add a test to exercise it and make sure it still works.	2023-01-18 10:38:24 +02:00
Nadav Har'El	ef2e5675ed	materialized views, test: add tests for CLUSTERING ORDER BY In issue #10767, concerns were raised that the CLUSTERING ORDER BY clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition. The tests in this patch try to explore the different ways in which CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allow us to compare Scylla's behavior to Cassandra, and to common sense. The tests discover that the CLUSTERING ORDER BY feature in materialized views generally works as expected, but there are three differences between Scylla and Cassandra in this feature. We consider two differences to be bugs (and hence the test is marked xfail) and one a Scylla extension: 1. When a base table has a reverse-order clustering column and this clustering column is used in the materialized view, in Cassandra the view's clustering order inherits the reversed order. In Scylla, the view's clustering order reverts to the default order. Arguably, both behaviors can be justified, but usually when in doubt we should implement Cassandra's behavior - not pick a different behavior, even if the different behavior is also reasonable. So this test (test_mv_inherit_clustering_order()) is marked "xfail", and a new issue was created about this difference: #12308. If we want to fix this behavior to match Cassandra's we should also consider backward compatibility - what happens if we change this behavior in Scylla now, after we had the opposite behavior in previous releases? We may choose to enshrine Scylla's Cassandra-incompatible behavior here - and document this difference. 2. The CLUSTERING ORDER BY should, as its name suggests, only list clustering columns. In Scylla, specifying other things, like regular columns, partition-key columns, or non-existent columns, is silently ignored, whereas it should result in an Invalid Request error (as it does in Cassandra). So test_mv_override_clustering_order_error() is marked "xfail". 
This is the difference already discovered in #10767. 3. When a materialized view has several clustering columns, Cassandra requires that a CLUSTERING ORDER BY clause, if present, must specify the order of all clustering columns. Scylla, in contrast, allows the user to override the order of only some of these columns - and the rest get the default order. I consider this to be a legitimate Scylla extension, and not a compatibility bug, so marked the test with "scylla_only", and no issue was opened about it. Refs #10767 Refs #12308 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12307	2022-12-22 09:48:16 +02:00
Nadav Har'El	92d03be37b	materialized view: fix bug in some large modifications to base partitions Sometimes a single modification to a base partition requires updates to a large number of view rows. A common example is deletion of a base partition containing many rows. A large BATCH is also possible. To avoid large allocations, we split the large amount of work into batches of 100 (max_rows_for_view_updates) rows each. The existing code assumed an empty result from one of these batches meant that we are done. But this assumption was incorrect: There are several cases when a base-table update may not need a view update to be generated (see can_skip_view_updates()) so if all 100 rows in a batch were skipped, the view update stopped prematurely. This patch includes two tests showing when this bug can happen - one test using a partition deletion with a USING TIMESTAMP causing the deletion to not affect the first 100 rows, and a second test using a specially-crafted large BATCH. These use cases are fairly esoteric, but in fact hit a user in the wild, which led to the discovery of this bug. The fix is fairly simple: To detect when build_some() is done it is no longer enough to check if it returned zero view-update rows; Rather, it explicitly returns whether or not it is done as an std::optional. The patch includes several tests for this bug, which pass on Cassandra, failed on Scylla before this patch, and pass with this patch. Fixes #12297. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12305	2022-12-14 14:50:38 +02:00
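The distinction between "this batch produced no view updates" and "there is no more input" can be modeled in a few lines of Python. This is a hedged sketch, not the real `build_some()`: the row dictionaries, the `skipped` flag, and both driver loops are hypothetical, and Python's `Optional` merely stands in for the `std::optional` mentioned in the commit message.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def build_some(
    rows: Sequence[Row], start: int, batch_size: int
) -> Optional[List[Row]]:
    """Process one batch of base rows.

    Returns None only when the input is exhausted. An empty list means
    the batch was processed but every row in it was skipped.
    """
    if start >= len(rows):
        return None
    chunk = rows[start:start + batch_size]
    return [r for r in chunk if not r["skipped"]]


def build_all(rows: Sequence[Row], batch_size: int = 100) -> List[Row]:
    """Fixed loop: stop only on the explicit 'done' signal (None)."""
    updates: List[Row] = []
    start = 0
    while True:
        batch = build_some(rows, start, batch_size)
        if batch is None:
            return updates
        updates.extend(batch)
        start += batch_size


def build_all_old(rows: Sequence[Row], batch_size: int = 100) -> List[Row]:
    """Model of the old loop: treats an empty batch as 'done'."""
    updates: List[Row] = []
    start = 0
    while True:
        batch = build_some(rows, start, batch_size)
        if not batch:  # empty list and None are conflated here
            return updates
        updates.extend(batch)
        start += batch_size


# The first 100 rows need no view update; the last 5 do.
base_rows = [{"key": i, "skipped": i < 100} for i in range(105)]
fixed_count = len(build_all(base_rows))
old_count = len(build_all_old(base_rows))
```

With this input the fixed loop collects the 5 trailing updates, while the old-style loop stops after the first all-skipped batch and collects none.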
Piotr Dulikowski	4883e43677	test_materialized_view: verify that static columns are not allowed Adds a test which verifies that static columns are not allowed in materialized views. Although we added support for static columns in secondary indexes, which share a lot of code with materialized views, static columns in materialized views are not yet ready to use.	2022-12-08 07:41:33 +01:00
Nadav Har'El	2f2f01b045	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542	2022-11-16 13:58:21 +02:00
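The caching pattern described in commit 2f2f01b045 (a lazily built statement that must be forgotten when the base schema changes) can be sketched as follows. This Python class is a hypothetical model: only the names `view_info` and `set_base_info()` come from the commit message, while the fields, the generated SELECT text, and the table name `base` are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


class ViewInfo:
    """Toy model of a view's link to its base table."""

    def __init__(self, base_columns: Sequence[str], view_filter: str) -> None:
        self._base_columns: List[str] = list(base_columns)
        self._view_filter = view_filter
        self._select: Optional[str] = None  # lazily computed cache

    def select_statement(self) -> str:
        """Build the SELECT on first use and reuse it afterwards."""
        if self._select is None:
            cols = ", ".join(self._base_columns)
            self._select = f"SELECT {cols} FROM base WHERE {self._view_filter}"
        return self._select

    def set_base_info(self, base_columns: Sequence[str]) -> None:
        """Record the new base schema and drop the cached statement.

        Without the reset below, select_statement() would keep returning
        a statement built against the old schema.
        """
        self._base_columns = list(base_columns)
        self._select = None


info = ViewInfo(["p", "c", "v"], "v = 42")
before = info.select_statement()
info.set_base_info(["p", "c", "v", "w"])  # base table gained a column
after = info.select_statement()
```

The key design point is that invalidation lives inside the setter, so no caller can update the base schema without also discarding the stale cache.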
Nadav Har'El	e4dba6a830	test/cql-pytest: add test for when MV requires IS NOT NULL As noted in issue #11979, Scylla inconsistently (and unlike Cassandra) requires "IS NOT NULL" on some but not all materialized-view key columns. Specifically, Scylla does not require "IS NOT NULL" on the base's partition key, while Cassandra does. This patch is a test which demonstrates this inconsistency. It currently passes on Cassandra and fails on Scylla, so is marked xfail. Refs #11979 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11980	2022-11-15 14:21:48 +01:00
Botond Dénes	5621cdd7f9	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these being resurrected and the view builder generating view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671	2022-10-03 11:28:22 +03:00
Nadav Har'El	868a884b79	test/cql-pytest: add reproducer for ignored IS NOT NULL This test reproduces issue #10365: It shows that although "IS NOT NULL" is not allowed in regular SELECT filters, in a materialized view it is allowed, even for non-key columns - but then outright ignored and does not actually filter out anything - a fact which already surprised several users. The test also fails on Cassandra - it also wrongly allows IS NOT NULL on the non-key columns but then ignores this in the filter. So the test is marked with both xfail (known to fail on Scylla) and cassandra_bug (fails on Cassandra because of what we consider to be a Cassandra bug). Refs #10365 Refs #11606 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11615	2022-09-26 09:02:08 +03:00
Nadav Har'El	aa86f808a6	test/cql-pytest: failing tests for oversized key values in MV and SI In issue #9013, we noticed that if a value larger than 64 KB is indexed, the write fails in a bad way, and we fixed it. But the test we wrote when fixing that issue already suggested that something was still wrong: Cassandra failed the write cleanly, with an InvalidRequest, while Scylla failed with a mysterious WriteFailure (with a relevant error message only in the log). This patch adds several xfailing tests which demonstrate what's still wrong. This is also summarized in issue #8627: 1. A write of an oversized value to an indexed column returns the wrong error message. 2. The same problem also exists when indexing a collection, and the indexed key or value is oversized. 3. The situation is even less pleasant when adding an index to a table with pre-existing data and an oversized value. In this case, the view building will fail on the bad row, and never finish. 4. We have exactly the same bugs not just with indexes but also with materialized views. Interestingly, Cassandra has similar bugs in materialized views as well (but not in the secondary index case, where Cassandra does behave as expected). Refs #8627. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Piotr Sarna	277aa30965	cql-pytest: extend synchronous mv test with new cases The new cases cover: - a materialized view created with synchronous updates from the start - a materialized view created with synchronous updates, but then altered to not have synchronous updates anymore	2022-07-25 10:00:28 +02:00
Michał Sala	2993bbc33b	test: cql-pytest: add a test for synchronous mode materialized views The test verifies if a synchronous updates code path was triggered in a view that had synchronous_updates property set to true. Done by inspecting query traces.	2022-07-25 09:53:33 +02:00
Jan Ciolek	012f7d5b1a	cql-pytest: Test that IS NOT only accepts NULL The IS_NOT operator can only be used during materialized view creation and it can only be used to express IS NOT NULL. Trying to write something like IS NOT 42 should cause an error. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-11 15:47:16 +02:00
Piotr Sarna	01d281442e	test: extend view filtering test case In order to cover more code paths, the test case now places filtering on various combinations of base columns, including both primary keys and regular columns. It also makes the test scylla_only, as filtering is an extension not supported in Cassandra right now. Closes #10860	2022-06-23 14:19:41 +03:00
Piotr Sarna	bc3a635c42	view: exclude using static columns in the view filter The code which applied view filtering (i.e. a condition placed on a view column, e.g. "WHERE v = 42") erroneously used a wildcard selection, which also assumes that static columns are needed, if the base table contains any such columns. The filtering code currently assumes that no such columns are fetched, so the selection is amended to only ask for regular columns (primary key columns are sent anyway, because they are enabled via slice options, so no need to ask for them explicitly). Fixes #10851 Closes #10855	2022-06-22 15:55:45 +03:00
Nadav Har'El	ef43531fb6	materialized views: allow empty strings in views and indexes Although Cassandra generally does not allow empty strings as partition keys (note they are allowed as clustering keys!), it does allow empty strings in regular columns to be indexed by a secondary index, or to become an empty partition-key column in a materialized view. As noted in issues #9375 and #9364 and verified in a few xfailing cql-pytest tests, Scylla didn't allow these cases - and this patch fixes that. The patch mostly removes unnecessary code: In one place, code prevented an sstable with an empty partition key from being written. Another piece of removed code was a function is_partition_key_empty() which the materialized-view code used to check whether the view's row will end up with an empty partition key, which was supposedly forbidden. But in fact, such rows should have been allowed, as they are allowed in Cassandra and required for the secondary-index implementation, and the entire function wasn't necessary. Note that the removed function is_partition_key_empty() was NOT required for the "IS NOT NULL" feature of materialized views - this continues to work as expected after this patch, and we add another test to confirm it. Being null and being an empty string are two different things. This patch also removes a part of a unit test which enshrined the wrong behavior. After this patch we are left with one interesting difference from Cassandra: Though Cassandra allows a user to create a view row with an empty-string partition key, and this row is fully visible when scanning the view, this row can not be queried individually because "WHERE v=''" is forbidden when v is the partition key (of the view). Scylla does not reproduce this anomaly - and such a point query does work in Scylla after this patch. We add a new test to check this case, and mark it "cassandra_bug", i.e., it's a Cassandra behavior which we consider wrong and don't want to emulate. 
This patch relies on #9352 and #10178 having been fixed in previous patches, otherwise the WHERE v='' does not work when reading from sstables. We add to the already existing tests we had for empty materialized-views keys a lookup with WHERE v='' which failed before fixing those two issues. Fixes #9364 Fixes #9375 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-03-08 15:34:26 +02:00
Nadav Har'El	5d2f694a90	cql3: fix cql3::util::maybe_quote() for keywords cql3::util::maybe_quote() is a utility function formatting an identifier name (table name, column name, etc.) that needs to be embedded in a CQL statement - and might require quoting if it contains non-alphanumeric characters, uppercase characters, or a CQL keyword. maybe_quote() made an effort to only quote the identifier name if necessary, e.g., a lowercase name usually does not need quoting. But lowercase names that are CQL keywords - e.g., to or where - cannot be used as identifiers without quoting. This can cause problems for code that wants to generate CQL statements, such as the materialized-view problem in issue #9450 - where a user had a column called "to" and wanted to create a materialized view for it. So in this patch we fix maybe_quote() to recognize invalid identifiers by using the CQL parser, and quote them. This will quote reserved keywords, but not so-called unreserved keywords, which are allowed as identifiers and don't need quoting. This addition slows down maybe_quote(), but maybe_quote() is anyway only used in heavy operations which need to generate CQL. This patch also adds two tests that reproduce the bug and verify its fix: 1. Add to the low-level maybe_quote() test (a C++ unit test) also tests that maybe_quote() quotes reserved keywords like "to", but doesn't quote unreserved keywords like "int". 2. Add a test reproducing issue #9450 - creating a materialized view whose key column is a keyword. This new test passes on Cassandra, failed on Scylla before this patch, and passes after this patch. It is worth noting that maybe_quote() now has a "forward compatibility" problem: If we save CQL statements generated by maybe_quote(), and a future version introduces a new reserved keyword, the parser of the future version may not be able to parse the saved CQL statement that was generated with the old maybe_quote() and didn't quote what is now a keyword. 
This problem can be solved in two ways: 1. Try hard not to introduce new reserved keywords. Instead, introduce unreserved keywords. We've been doing this even before recognizing this maybe_quote() future-compatibility problem. 2. In the next patch we will introduce quote() - which unconditionally quotes identifier names, even if lowercase. These quoted names will be uglier for lowercase names - but will be safe from future introduction of new keywords. So we can consider switching some or all uses of maybe_quote() to quote(). Fixes #9450 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220118161217.231811-1-nyh@scylladb.com>	2022-02-07 11:33:56 +02:00
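The quoting behavior described in commit 5d2f694a90 can be approximated in Python. This is a simplified sketch, not the C++ `cql3::util::maybe_quote()`: per the commit message the real fix asks the CQL parser whether a name is a valid identifier, whereas this model substitutes a tiny hard-coded keyword set (`RESERVED`, containing only the two keywords the message names) and a regular expression. Both are assumptions made for illustration.

```python
import re

# Stand-in for the parser check; only the two reserved keywords named in
# the commit message are listed. The real reserved set is larger.
RESERVED = frozenset({"to", "where"})

# A name that can be written without quotes: lowercase letter first,
# then lowercase letters, digits, or underscores.
_PLAIN = re.compile(r"[a-z][a-z0-9_]*\Z")


def quote(name: str) -> str:
    """Unconditionally quote an identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def maybe_quote(name: str) -> str:
    """Quote only when the bare name would not parse as itself."""
    if _PLAIN.match(name) and name not in RESERVED:
        return name
    return quote(name)


keyword_column = maybe_quote("to")      # reserved keyword: gets quoted
type_name = maybe_quote("int")          # unreserved keyword: left bare
mixed_case = maybe_quote("MyColumn")    # uppercase: gets quoted
create_view = (
    f"CREATE MATERIALIZED VIEW mv AS SELECT * FROM tbl "
    f"WHERE {maybe_quote('to')} IS NOT NULL"
)
```

The forward-compatibility concern in the message is visible here: a name left bare today would stop parsing if it were later added to `RESERVED`, whereas output from `quote()` would not be affected.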
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Nadav Har'El	e8493e20cb	cql-pytest: test for empty-string as partition key in materialized view Scylla and Cassandra do not allow an empty string as a partition key, but a materialized view might "convert" a regular string column into a partition key, and an empty string is a perfectly valid value for this column. This can result in a view row which has an empty string as a partition key. This case works in Cassandra, but doesn't in Scylla (the row with the empty string as a partition key doesn't appear). The following test demonstrates this difference between Scylla and Cassandra (it passes on Cassandra, fails on Scylla, and is accordingly marked "xfail"). Refs #9375. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210922115000.290387-1-nyh@scylladb.com>	2021-09-22 18:55:25 +03:00
Pavel Solodovnikov	b1a3b59a08	test: test_materialized_view: test_mv_select_stmt_bound_values: improve error handling Restrict expected exception message to filter only relevant exception, matching both for scylla and cassandra. For example, the former has this message: Cannot use query parameters in CREATE MATERIALIZED VIEW statements While the latter throws this: Bind variables are not allowed in CREATE MATERIALIZED VIEW statements Also, place cleanup code in try-finally clause. Tests: cql-pytest:test_materialized_view.py(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210802083912.229886-1-pa.solodovnikov@scylladb.com>	2021-08-02 11:49:50 +03:00
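One way to restrict the expected exception message so that it matches both servers is a single regular expression covering the two wordings quoted in commit b1a3b59a08. The sketch below is an assumption about how such a pattern could look; the actual pattern used in test_materialized_view.py is not shown in this log, and `BIND_MARKER_ERROR` and `is_expected_error` are hypothetical names.

```python
import re

# Matches the Scylla wording and the Cassandra wording quoted in the
# commit message, and nothing unrelated.
BIND_MARKER_ERROR = re.compile(
    r"(Cannot use query parameters|Bind variables are not allowed)"
    r" in CREATE MATERIALIZED VIEW statements"
)


def is_expected_error(message: str) -> bool:
    """True if the message is one of the two known wordings."""
    return BIND_MARKER_ERROR.search(message) is not None


scylla_msg = "Cannot use query parameters in CREATE MATERIALIZED VIEW statements"
cassandra_msg = "Bind variables are not allowed in CREATE MATERIALIZED VIEW statements"
unrelated_msg = "Unknown keyspace ks"

results = [is_expected_error(m) for m in (scylla_msg, cassandra_msg, unrelated_msg)]
```

In a pytest-based test, a pattern like this could be passed as the `match` argument of `pytest.raises`, which applies `re.search` to the string form of the raised exception.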
Pavel Solodovnikov	1ca7825cf6	test: add a test checking that bind markers within MVs SELECT statement don't lead to a crash The request should fail with `InvalidRequest` exception and shouldn't crash the database. Don't check for actual error messages, because they are different between Scylla and Cassandra. The former has this message: Cannot use query parameters in CREATE MATERIALIZED VIEW statements While the latter throws this: Bind variables are not allowed in CREATE MATERIALIZED VIEW statements Tests: cql-pytest/test_materialized_view.py(scylla dev, cassandra trunk) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-07-30 17:57:24 +03:00
Piotr Sarna	c05340c4bf	cql-pytest: add a materialized views suite with first cases cql-pytest did not have a suite for materialized views, so one is created. At the same time, test cases for building/updating a view on a base table with large cells are added as a regression test for #9047.	2021-07-15 15:40:38 +02:00

30 Commits