scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Botond Dénes	aa523141f9	Merge 'Backport Alternator TTL tests' from Nadav Har'El This series backports several patches which add or enable tests for Alternator TTL. The series does not touch the code - just tests. The goal of backporting more tests is to get the code - which is already in branch 5.1 - tested. It wasn't a good idea to backport code without backporting the tests for it. Closes #12200 Fixes #11374 * github.com:scylladb/scylladb: test/alternator: increase timeout on TTL tests test/alternator: fix timeout in flaky test test_ttl_stats test/alternator: test Alternator TTL metrics test/alternator: skip fewer Alternator TTL tests	2022-12-22 09:51:46 +02:00
Nadav Har'El	0debb419f7	Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz Contains fixes requested in the issue (and some tiny extras), together with analysis why they don't affect the users (see commit messages). Fixes [ #11800](https://github.com/scylladb/scylladb/issues/11800) Closes #11926 * github.com:scylladb/scylladb: alternator: add maybe_quote to secondary indexes 'where' condition test/alternator: correct xfail reason for test_gsi_backfill_empty_string test/alternator: correct indentation in test_lsi_describe alternator: fix wrong 'where' condition for GSI range key (cherry picked from commit `ce7c1a6c52`)	2022-12-05 20:18:39 +02:00
Nadav Har'El	aa206a6b6a	test/alternator: increase timeout on TTL tests Some of the tests in test/alternator/test_ttl.py need an expiration scan pass to complete and expire items. In development builds on developer machines, this usually takes less than a second (our scanning period is set to half a second). However, in debug builds on Jenkins each scan often takes up to 100 (!) seconds (this is the record we've seen so far). This is why we set the tests' timeout to 120 seconds. But recently we saw another test run failing. I think the problem is that in some cases, we need not one, but two scanning passes to complete before the timeout: It is possible that the test writes an item right after the current scan passed it, so it doesn't get expired, and then we start a second scan at a random position, possibly making the item in question one of the last items to be considered - so in total we need to wait for two scanning periods, not one, for the item to expire. So this patch increases the timeout from 120 seconds to 240 seconds - more than twice the highest scanning time we ever saw (100 seconds). Note that this timeout is just a timeout, it's not the typical test run time: The test can finish much more quickly, in as little as one second, if items expire quickly on a fast build and machine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12106 (cherry picked from commit `6bc3075bbd`)	2022-12-05 14:21:22 +02:00
Nadav Har'El	9baf72b049	test/alternator: fix timeout in flaky test test_ttl_stats The test `test_metrics.py::test_ttl_stats` tests the metrics associated with Alternator TTL expiration events. It normally finishes in less than a second (the TTL scanning is configured to run every 0.5 seconds), so we arbitrarily set a 60 second timeout for this test to allow for extremely slow test machines. But in some extreme cases even this was not enough - in one case we measured the TTL scan to take 63 seconds. So in this patch we increase the timeout in this test from 60 seconds to 120 seconds. We already did the same change in other Alternator TTL tests in the past - in commit `746c4bd`. Fixes #11695 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11696 (cherry picked from commit `3a30fbd56c`)	2022-12-05 14:21:22 +02:00
Nadav Har'El	8e62405117	test/alternator: test Alternator TTL metrics This patch adds a test for the metrics generated by the background expiration thread run for Alternator's TTL feature. We test three of the four metrics: scylla_expiration_scan_passes, scylla_expiration_scan_table and scylla_expiration_items_deleted. The fourth metric, scylla_expiration_secondary_ranges_scanned, counts the number of times that this node took over another node's expiration duty, so it requires a multi-node cluster to test, and we can't test it in the single-node cluster test framework. To see TTL expiration in action this test may need to wait up to the setting of alternator_ttl_period_in_seconds. For a setting of 1 second (the default set by test/alternator/run), this means this test can take up to 1 second to run. If alternator_ttl_period_in_seconds is set higher, the test is skipped unless --runveryslow is requested. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `297109f6ee`)	2022-12-05 14:21:16 +02:00
Nadav Har'El	15421e45a0	test/alternator: skip fewer Alternator TTL tests Most of the Alternator TTL tests are extremely slow on DynamoDB because item expiration may be delayed up to 24 hours (!), and in practice for 10 to 30 minutes. Because of this, we marked most of these tests with the "veryslow" mark, causing them to be skipped by default - unless pytest is given the "--runveryslow" option. The result was that the TTL tests were not run in the normal test runs, which can allow regressions to be introduced (luckily, this hasn't happened). However, this "veryslow" mark was excessive. Many of the tests are very slow only on DynamoDB, but aren't very slow on Scylla. In particular, many of the tests involve waiting for an item to expire, something that happens after the configurable alternator_ttl_period_in_seconds, which is just one second in our tests. So in this patch, we remove the "veryslow" mark from 6 of the Alternator TTL tests, and instead use two new fixtures - waits_for_expiration and veryslow_on_aws - to only skip the test when running on DynamoDB or when alternator_ttl_period_in_seconds is high - but in our usual test environment they will not get skipped. Because 5 of these 6 tests wait for an item to expire, they take one second each and this patch adds 5 seconds to the Alternator test runtime. This is unfortunate (it's more than 25% of the total Alternator test runtime!) but not a disaster, and we plan to reduce this 5 second time further in the following patch, by decreasing the TTL scanning period even further. This patch also increases the timeout of several of these tests, to 120 seconds from the previous 10 seconds. As mentioned above, normally, these tests should always finish in alternator_ttl_period_in_seconds (1 second) with a single scan taking less than 0.2 seconds, but in extreme cases of debug builds on overloaded test machines, we saw even 60 seconds being passed, so let's increase the maximum. 
I also needed to make the sleep time between retries smaller, not a function of the new (unrealistic) timeout. 4 more tests remain "veryslow" (and won't run by default) because they take 5-10 seconds each (e.g., a test which waits to see that an item does not get expired, and a test involving writing a lot of data). We should reconsider this in the future - to perhaps run these tests in our normal test runs - but even for now, the 6 extra tests that we start running are a much better protection against regressions than what we had until now. Fixes #11374 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `746c4bd9eb`)	2022-12-05 13:07:16 +02:00
Alexander Turetskiy	2f78df92ab	Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs. The return from DescribeTable which describes GSIs and LSIs is missing the Projection field. We do not yet support all the settings of Projection (see #5036), but the default which we support is ALL, and DescribeTable should return that in its description. Fixes #11470 Closes #11693 (cherry picked from commit `636e14cc77`)	2022-11-07 10:36:04 +02:00
Botond Dénes	fa94222662	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator, when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. In this series we fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry() (cherry picked from commit `e981bd4f21`)	2022-11-01 13:14:21 +02:00
Nadav Har'El	d3fd090429	alternator: return ProvisionedThroughput in DescribeTable DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable operation must return a ProvisionedThroughput structure, listing both ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not stated in some DynamoDB documentation but is explicitly mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html Also, empirically, DynamoDB returns ProvisionedThroughput with zeros even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this. The ProvisionedThroughput structure being missing was a problem for applications like DynamoDB connectors for Spark, if they implicitly assume that ProvisionedThroughput is returned by DescribeTable, and fail (as described in issue #11222) if it's outright missing. So this patch adds the missing ProvisionedThroughput structure, and the xfailing test starts to pass. Note that this patch doesn't change the fact that attempting to set a table to PROVISIONED billing mode is ignored: DescribeTable continues to always return PAY_PER_REQUEST as the billing mode and zero as the provisioned capacities. Fixes #11222 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11298 (cherry picked from commit `941c719a23`)	2022-10-03 14:26:55 +03:00
Aleksandra Martyniuk	8e892426e2	test: move scylla_inject_error from alternator/ to cql-pytest/ Move scylla_inject_error from alternator/ to cql-pytest/ so it can be reached from various tests dirs. alternator/util.py is renamed to alternator/alternator_util.py to avoid name shadowing.	2022-07-29 09:35:20 +02:00
Nadav Har'El	eaf3579c15	test/alternator: several more simple tests for UpdateItem This patch adds several more tests for Alternator's UpdateItem operation. These tests verify a few simple cases that, surprisingly, never had test coverage. The new tests pass (on both DynamoDB and Alternator) so did not expose any bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11025	2022-07-12 21:48:33 +02:00
Nadav Har'El	2581b54ea0	test/{alternator,redis}: stop using deprecated "distutils" package Python has deprecated the distutils package. In several places in the Alternator and Redis test suites, we used distutils.version to check if the library is new enough for running the test (and skip the test if it's too old). On new versions of Python, we started getting deprecation warnings such as: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives PEP 632 recommends using packaging.version instead of distutils.version, and indeed it works well. After applying this patch, Alternator and Redis test runs no longer end in silly deprecation warnings. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11007	2022-07-11 08:00:45 +03:00
David Garcia	b85843b9cc	Fix broken links Fix broken links	2022-06-28 15:19:36 +01:00
David Garcia	bb21c3c869	Move dev docs to docs/dev	2022-06-24 18:07:08 +01:00
Nadav Har'El	3aca1ca572	alternator: make BatchGetItem group reads by partition DynamoDB API's BatchGetItem invokes a number (up to 25) of read requests in parallel, returning when all results are available. Alternator naively implemented this by sending all read requests in parallel, no matter which requests these were. That implementation was inefficient when all the requests are to different items (clustering rows) of the same partition. In a multi-node setup this will end up sending 25 separate requests to the same remote node(s). Even on a single-node setup, this may result in reading from disk more than once, and even if the partition is cached - doing an O(logN) search in it multiple times. What we do in this patch, instead, is to group all the BatchGetItem requests aimed at the same partition into a single read request asking for a (sorted) list of clustering keys. This is similar to an "IN" request in CQL. As an example of the performance benefit of this patch, I tried a BatchGetItem request asking for 20 random items from a 10-million item partition. I measured the latency of this request on a single-node Scylla. Before this patch, I saw a latency of 17-21 ms (the lower number is when the request is retried and the requested items are already in the cache). After this patch, the latency is 10-14 ms. The performance improvement on multi-node clusters is expected to be even higher. Unfortunately the patch is less trivial than I hoped it would be, because some of the old code was organized under the assumption that each read request only returned one item (and if it failed, it means only one item failed), so this part of the code had to be reorganized (and, for making the code more readable, coroutinized). An unintended benefit of the code reorganization is that it also gave me an opportunity to fail an attempt to ask BatchGetItem for the same item more than once (issue #10757). 
The patch also adds a few more corner cases in the tests, to be even more sure that the code reorganization doesn't introduce a regression in BatchGetItem. Fixes #10753 Fixes #10757 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-06-19 14:47:57 +03:00
Nadav Har'El	0be06e0bdf	test/alternator: additional test for BatchGetItem Our simple test for BatchGetItem on a table with sort keys still has requests with just one sort key per partition, so if BatchGetItem has a bug with requesting multiple sort keys from the same partition, such a bug won't be caught by the simple tests. So in this patch we add a test that does. This will be useful for the next patch, where we are planning to refactor BatchGetItem's handling of multiple sort keys in the same partition - so it will be useful to have more regression tests. The tests test_batch_get_item_large and test_batch_get_item_partial would actually also catch such bugs, but they are more elaborate tests and it's nice to have smaller tests more focused on checking specific features. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-06-16 18:19:20 +03:00
Nadav Har'El	e20233dab1	alternator: improve error handling when trying to tag a GSI or LSI In issue #10786, we raised the idea of maybe allowing to tag (with TagResource) GSIs and LSIs, not just base tables. However, currently, neither DynamoDB nor Scylla allows it. So in this patch we add a test that confirms this. And while at it, we fix Alternator to return the same error message as DynamoDB in this case. Refs #10786. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-06-13 18:14:42 +03:00
Nadav Har'El	8866c326de	alternator: forbid duplicate index (LSI and GSI) names Adding an LSI and GSI with the same name to the same Alternator table should be forbidden - because if both exist only one of them (the GSI) would actually be usable. DynamoDB also forbids such duplicate names. So in this patch we add a test for this issue, and fix it. Since the patch involves a few more uses of the IndexName string, we also clean up its handling a bit, to use std::string_view instead of the old-style std::string&. Fixes #10789 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-06-13 18:14:42 +03:00
Nadav Har'El	00866a75d8	alternator: add ARN for indexes (LSI and GSI) DynamoDB gives an ARN ("Amazon Resource Name") to LSIs and GSIs. These look like BASEARN/index/INDEXNAME, where BASEARN is the ARN of the base table, and INDEXNAME is the name of the LSI or the GSI. These ARNs should be returned by DescribeTable as part of its description of each index, and this patch adds that missing IndexArn field. The ARN we're adding here is hardly useful (e.g., as explained in issue #10786, it can't be used to add tags to the index table), but nevertheless should exist for compatibility with DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-06-13 18:14:42 +03:00
Nadav Har'El	75c2bd78ae	test/alternator: reproducer for BatchGetItem duplicate keys It turns out that DynamoDB forbids requesting the same item more than once in a BatchGetItem request. Trying to do it would obviously be a waste, but DynamoDB outright refuses it - and Alternator currently doesn't (refs #10757). The test currently passes on DynamoDB and fails on Alternator, so it is marked xfail. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10758	2022-06-09 07:04:50 +02:00
Nadav Har'El	d0ca09a925	alternator: implement DescribeContinuousBackups operation Although we don't yet support the DynamoDB API's backup features (see issue #5063), we can already implement the DescribeContinuousBackups operation. It should just say that continuous backups, and point-in-time restores, are disabled. This will be useful for client code which tries to inquire about continuous backups, even if not planning to use them in practice (e.g., see issue #10660). Refs #5063 Refs #10660 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-05-26 15:13:50 +03:00
Nadav Har'El	f6ce7891a5	test/alternator: add test for key length limits DynamoDB limits partition-key length to 2048 bytes and sort-key length to 1024 bytes. Alternator currently has no such limits officially, but if a user tries a key length of over 64 KB, the result will be an "internal server error" as Alternator runs into Scylla's low-level key length limit of 64 KB. In this patch we add (mostly xfailing) tests confirming all the above observations. The tests include extensive comments on what they are testing and why. Some of these tests (specifically, the ones checking what happens above 64 KB) should pass once Alternator is fixed. Other tests - requiring that the limits be exactly what they are in DynamoDB - may either not pass or change in the future, depending on what we decide the limits should be in Alternator. Refs #10347 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10438	2022-04-26 18:09:19 +02:00
Nadav Har'El	84143c2ee5	alternator: implement Select option of Query and Scan This patch implements the previously-unimplemented Select option of the Query and Scan operators. The most interesting use case of this option is Select=COUNT which means we should only count the items, without returning their actual content. But there are actually four different Select settings: COUNT, ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES. Five previously-failing tests now pass, and their xfail mark is removed: * test_query.py::test_query_select * test_scan.py::test_scan_select * test_query_filter.py::test_query_filter_and_select_count * test_filter_expression.py::test_filter_expression_and_select_count * test_gsi.py::test_gsi_query_select_1 These tests cover many different cases of successes and errors, including combination of Select and other options. E.g., combining Select=COUNT with filtering requires us to get the parts of the items needed for the filtering function - even if we don't need to return them to the user at the end. Because we do not yet support GSI/LSI projection (issue #5036), the support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need to be in the future, but we can only finish that after #5036 is done. Fixes #5058. The most intrusive part of this patch is a change from attrs_to_get - a map of top-level attributes that a read needs to fetch - to an optional<attrs_to_get>. This change is needed because we also need to support the case that we want to read no attributes (Select=COUNT), and attrs_to_get.empty() used to mean that we want to read all attributes, not no attributes. After this patch, an unset optional<attrs_to_get> means read all attributes, a set but empty attrs_to_get means read no attributes, and a set and non-empty attrs_to_get means read those specific attributes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220405113700.9768-2-nyh@scylladb.com>	2022-04-11 10:04:32 +02:00
Nadav Har'El	9c1ebdceea	alternator: forbid empty AttributesToGet In DynamoDB one can retrieve only a subset of the attributes using the AttributesToGet or ProjectionExpression parameters to read requests. Neither allows an empty list of attributes - if you don't want any attributes, you should use Select=COUNT instead. Currently we correctly refuse an empty ProjectionExpression - and have a test for it: test_projection_expression.py::test_projection_expression_toplevel_syntax However, Alternator is missing the same empty-forbidding logic for AttributesToGet. An empty AttributesToGet is currently allowed, and basically says "retrieve everything", which is sort of unexpected. So this patch adds the missing logic, and the missing test (actually two tests for the same thing - one using GetItem and the other Query). Fixes #10332 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220405113700.9768-1-nyh@scylladb.com>	2022-04-11 10:21:02 +03:00
Nadav Har'El	86d01542de	test/alternator: test another example of nested function calls In the existing test we noticed that list_append(if_not_exists(...)) is allowed, but list_append(list_append(...)) is not. I wasn't sure whether if_not_exists(if_not_exists(..)) will be allowed - and this test verifies that it is - it works on both Scylla and DynamoDB, and gives the same results on both. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220407122729.155648-1-nyh@scylladb.com>	2022-04-11 09:56:02 +03:00
Nadav Har'El	67e0590bbc	alternator: remove old TODO (with test verifying it) We had an old TODO in the Alternator "Scan" operation code which suggested that we may need to do something to limit the size of pages when a row limit ("Limit") isn't given. But we do already have a built-in limit on page sizes (1 MB), so this TODO isn't needed and can be removed. But I also wanted to make sure we have a test that this limit works: We already had a test that this 1 MB limit works for a single-partition Query (test_query.py::test_query_reverse_longish - tested both forward and reversed queries). In this patch I add a similar test for a whole-table Scan. It turns out that although page size is limited in this case as well, it's not exactly 1 MB... For small tables it can even reach 3 MB. I consider this "good enough" and that we can drop the TODO, but also opened issue #10327 to document this surprising (for me) finding. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220404145240.354198-1-nyh@scylladb.com>	2022-04-05 09:23:23 +03:00
Nadav Har'El	56936d3c16	test/alternator: add reproducers for scan of long string of tombstones This patch adds two xfailing tests for issue #7933. That issue is about what Scan or Query paging does when encountering a very long string of consecutive tombstones (partition or row tombstones). Ideally, in that case the scan could stop on one of these tombstones after already processing too many. But as these two tests demonstrate, the scan can't stop in the middle of a long string of tombstones - and as a result retrieving a single page can take an unbounded amount of time, which is wrong. Currently the tests are marked `@veryslow` (they each take more than a minute) because they each create a huge number of tombstones to demonstrate a huge amount of work for a single page. When we fix issue #7933 and have a much smaller limit on the number of tombstones processed in a single page, we can hopefully make these tests much shorter and remove the `@veryslow` tag. The `@veryslow` tag means that although these tests can be used manually (with `--runveryslow`) they will not yet be run as part of the usual regression tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220403070706.250147-1-nyh@scylladb.com>	2022-04-05 09:11:38 +03:00
Nadav Har'El	758f8f01d7	test/alternator: turn REST API finding into a fixture In test_tracing.py and util.py, we already have three duplicates of code which looks for the Scylla REST API. We'll soon want to add even more uses of this REST API, so it's a good time to add a single fixture, "rest_api", which can be used in all tests that need the Scylla REST API instead of duplicating the same code. A test using the "rest_api" fixture will be skipped if the server isn't Scylla, or its port 10000 is not available or not responsive. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220331195337.64352-1-nyh@scylladb.com>	2022-04-01 10:51:59 +03:00
Nadav Har'El	d8c0680585	test/alternator: add regression test for old ALL_NEW bug In commit `964500e47a`, in the middle of a larger series, I fixed a small Alternator bug that I found while working on that series. The bug was that the ReturnValues=ALL_NEW feature moved out the read previous_item, which breaks operations that need previous_item, e.g., an ADD operation. Unfortunately, we never had a regression test for this fixed bug, so in this patch I add one. This bug was re-discovered on an old branch by a user, at which point I noticed that we don't have a test for it - so I want to add it now, even though the bug itself is long gone from Scylla master. I verified that the new test indeed fails on old versions of Scylla before the aforementioned commit, and passes when backporting only that commit. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220327074928.3608576-1-nyh@scylladb.com>	2022-03-28 08:40:28 +02:00
Nadav Har'El	653f2df28f	alternator: fix JSON escaping of error responses In the DynamoDB API, error responses are in JSON format with specific fields ("__type" and "message" in the x-amz-json-1.0 format currently used). Alternator tried to be clever and build the string representation of this JSON itself, instead of using RapidJSON. But this optimization was a mistake - if the error message contains characters that need escaping (such as double quotes and newlines), they weren't escaped, and the resulting JSON was malformed. When the client library boto3 read this malformed JSON it got confused and considered the entire error response to be a string, which resulted in an ugly error message. The fix is easy - just build the JSON output as usual with RapidJSON instead of trying to optimize using string operations. The patch also includes two tests reproducing this bug and checking its fix. The first test uses boto3 and shows it got confused on the type of error (not understanding that it is a ValidationException). The second test bypasses boto3 and shows exactly where the bug happens - the response is an unparsable JSON. Fixes #10278 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220327132705.3707979-1-nyh@scylladb.com>	2022-03-27 16:32:36 +03:00
Nadav Har'El	49a8164fb7	alternator: add configurable scan period to TTL expiration Before this patch, the experimental TTL (expiration time) feature in Alternator scans tables for expiration in a tight loop - starting the next scan one second after the previous one completed. In this patch we introduce a new configuration option, alternator_ttl_period_in_seconds, which determines how frequently to start the scan. The default is 24 hours - meaning that the next scan is started 24 hours after the previous one started. The tests (test/alternator/run) change this configuration back to one second, so that expiration tests finish as quickly as possible. Please note that the scan is not slowed down to fill this 24 hours - if it finishes in one hour, it will then sleep for 23 hours. Additional work would be needed to slow down the scan to not finish too quickly. One idea not yet implemented is to move the expiration service from the "maintenance" scheduling group which it uses today to a new scheduling group, and modifying the number of shares that this group gets. Another thing worth noting about the configurable period (which defaults to 24 hours) is that when TTL is enabled on an Alternator table, it can take that amount of time until its scan starts and items start expiring from it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Nadav Har'El	4349514064	test/alternator: add smaller reproducer for Limit-less reverse query The regression test we have for Alternator's issue #9487 (where a reverse query without a Limit given was broken into 100MB pages instead of the expected 1MB) is test_query.py::test_query_reverse_long. But this is a very long test requiring a 100MB partition, and because of its slowness isn't run by default. This patch adds another version of that test, test_query_reverse_longish, which reproduces the same issue #9487 with a partition 50 times shorter (2MB) so it only takes a fraction of a second and can be enabled by default. It also requires much less network traffic which is important when running these tests non-locally. We leave the original test test_query_reverse_long behind, it can be still useful to stress Scylla even beyond the 100MB boundary, but it remains in @veryslow mode so won't run in default test runs. Refs #9487 Refs #7586 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220220161905.852994-1-nyh@scylladb.com>	2022-02-21 09:12:16 +01:00
Nadav Har'El	f292d3d679	alternator: make schema modifications in CreateTable atomic The Alternator CreateTable operation currently performs several schema-changing operations separately - one by one: It creates a keyspace, a table in that keyspace and possibly also multiple views, and it sets tags on the table. A consequence of this is that concurrent CreateTable and DeleteTable operations (for example) can result in unexpected errors or inconsistent states - for example CreateTable wants to create the table in the keyspace it just created, but a concurrent DeleteTable deleted it. We have two issues about this problem (#6391 and #9868) and three tests (test_table.py::test_concurrent_create_and_delete_table) reproducing it. In this patch we fix these problems by switching to the modern Scylla schema-changing API: Instead of doing several schema-changing operations one by one, we create a vector of schema mutations performing all these operations - and then perform all these mutations together. When the experimental Raft-based schema modifications feature is enabled, this completely solves the races, and the tests begin to pass. However, if the experimental Raft mode is not enabled, these tests continue to fail because there is still no locking while applying the different schema mutations (not even on a single node). So I put a special fixture "fails_without_raft" on these tests - which means that the tests xfail if run without Raft, and are expected to pass when run with Raft. Indeed, after this patch test/alternator/run --raft test_table.py::test_concurrent_create_and_delete_table shows three passing tests (they also pass if we drastically increase the number of iterations), while test/alternator/run test_table.py::test_concurrent_create_and_delete_table shows three xfailing tests. All other Alternator tests pass as before with this patch, verifying that the handling of new tables, new views, tags, and CDC log tables, all happen correctly even after this patch. 
A note about the implementation: Before this patch, the CreateTable code used high-level functions like prepare_new_column_family_announcement(). These high-level functions become unusable if we write multiple schema operations to one list of mutations, because for example this function validates that the keyspace had already been created - when it hasn't and that's the whole point. So instead we had to use lower-level functions like add_table_or_view_to_schema_mutation() and before_create_column_family(). However, despite being lower level, these functions were public so I think it's reasonable to use them, and we probably have no other alternative. Fixes #6391 Fixes #9868 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-02-18 09:03:52 +02:00
Nadav Har'El	212c321c55	test/alternator: add reproducers for non-atomic table creation We add reproducing tests for two known Alternator issues, #6391 and #9868, which involve the non-atomicity of table creation. Creating a table currently involves multiple steps - creating a keyspace, a table, materialized views, and tags. If some of these steps succeed and some fail, we get an InternalServerError and potentially leave behind some half-built table. Both issues will be solved by making better use of the new Raft-based capabilities of making multiple modifications to the schema atomically, but this patch doesn't fix the problem - it just proves it exists. The new tests involve two threads - one repeatedly trying to create a table with a GSI or with tags - and the other thread repeatedly trying to delete the same table under its feet. Both bugs are reproduced almost immediately. Note that like all test/alternator tests, the new tests are usually run on just one node. So when we fix the bug and these tests begin to pass, it will not be a proof that concurrent schema modification works safely on different nodes. To prove that, we will also need a multi-node test. However, this test can prove that we used Raft-based schema modification correctly - and if we assume that the Raft-based schema modification feature is itself correct, then we can be sure that CreateTable will be correct also across multiple nodes. Although it won't hurt to check it directly. Refs #6391 Refs #9868 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220207223100.207074-1-nyh@scylladb.com>	2022-02-14 18:21:21 +02:00
Nadav Har'El	4937270803	test/alternator: add option to run with Raft-based schema changes This patch adds a "--raft" option to test/alternator/run to enable the experimental Raft-based schema changes ("--experimental-features=raft") when running Scylla for the tests. This is the same option we added to test/cql-pytest/run in a previous patch. Note that we still don't have any Alternator tests that pass or fail differently in these two modes - these will probably come later as we fix issues #9868 and #6391. But in order to work on fixing those issues we need to be able to run the tests in Raft mode. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220209123144.321344-1-nyh@scylladb.com>	2022-02-10 09:43:10 +02:00
Nadav Har'El	9982a28007	alternator: allow REMOVE of non-existent nested attribute DynamoDB allows an UpdateItem operation "REMOVE x.y" when a map x exists in the item, but x.y doesn't - the removal silently does nothing. Alternator incorrectly generated an error in this case, and unfortunately we didn't have a test for this case. So in this patch we add the missing test (which fails on Alternator before this patch - and passes on DynamoDB) and then fix the behavior. After this patch, "REMOVE x.y" will remain an error if "x" doesn't exist (saying "document paths not valid for this item"), but if "x" exists and is a map, but "x.y" doesn't, the removal will silently do nothing and will not be an error. Fixes #10043. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220207133652.181994-1-nyh@scylladb.com>	2022-02-07 18:40:48 +02:00
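The REMOVE behavior described in the entry above can be illustrated with a toy model on plain Python dicts. This is only a sketch of the described semantics, not Alternator's implementation: `remove_nested` and `PathNotValid` are hypothetical names, and treating a non-map parent as an error is an assumption that the commit message does not spell out.

```python
class PathNotValid(Exception):
    """Stands in for the "document paths not valid for this item" error."""


def remove_nested(item, path):
    """Toy model of UpdateItem "REMOVE a.b" on a plain dict (hypothetical helper).

    - If a parent element of the path is missing, raise PathNotValid.
    - If the parent exists and is a map but the last element is missing,
      silently do nothing.
    """
    *parents, leaf = path.split(".")
    node = item
    for name in parents:
        if not isinstance(node, dict) or name not in node:
            raise PathNotValid("document paths not valid for this item")
        node = node[name]
    if not isinstance(node, dict):
        # Assumption: a parent that exists but is not a map is also an error.
        raise PathNotValid("document paths not valid for this item")
    node.pop(leaf, None)  # a missing leaf is a silent no-op
    return item
```

With this model, removing `x.y` from `{"x": {"a": 1}}` leaves the item unchanged, while removing `x.y` from an item with no `x` raises `PathNotValid`.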
Nadav Har'El	8a745593a2	Merge 'alternator: fill UnprocessedKeys for failed batch reads' from Piotr Sarna DynamoDB protocol specifies that when getting items in a batch failed only partially, unprocessed keys can be returned so that the user can perform a retry. Alternator used to fail the whole request if any of the reads failed, but right now it instead produces the list of unprocessed keys and returns them to the user, as long as at least 1 read was successful. This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes which come with error injection compiled in. In release mode, expect to see the following message: SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode) Fixes #9984 Closes #9986 * github.com:scylladb/scylla: test: add total failure case for GetBatchItem test: add error injection case for GetBatchItem test: add a context manager for error injection to alternator alternator: add error injection to BatchGetItem alternator: fill UnprocessedKeys for failed batch reads	2022-01-31 15:28:24 +02:00
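From the client's side, the retry that UnprocessedKeys makes possible might look like the following sketch. It assumes a callable shaped like boto3's `client.batch_get_item(RequestItems=...)`, returning a dict with `Responses` and, optionally, `UnprocessedKeys`. The names `batch_get_all` and `fake_batch_get_item` are hypothetical, and the fake client exists only so the example runs without a server.

```python
def batch_get_all(batch_get_item, request_items, max_attempts=5):
    """Call batch_get_item until no UnprocessedKeys remain (hypothetical helper)."""
    results = {}
    pending = request_items
    for _ in range(max_attempts):
        response = batch_get_item(RequestItems=pending)
        # Collect whatever was read successfully in this round.
        for table, items in response.get("Responses", {}).items():
            results.setdefault(table, []).extend(items)
        # Retry only the keys the server reported as unprocessed.
        pending = response.get("UnprocessedKeys") or {}
        if not pending:
            return results
    raise RuntimeError(f"keys still unprocessed after {max_attempts} attempts")


def fake_batch_get_item(RequestItems):
    """Hypothetical stand-in for a client: serves one key per call."""
    table, spec = next(iter(RequestItems.items()))
    first, rest = spec["Keys"][0], spec["Keys"][1:]
    response = {"Responses": {table: [first]}}
    if rest:
        response["UnprocessedKeys"] = {table: {"Keys": rest}}
    return response
```

A production loop would normally also back off between attempts; that is left out here to keep the sketch short.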
Piotr Sarna	c87126198d	test: add total failure case for GetBatchItem The test verifies that if all reads from a batch operation failed, the result is an error, and not a success response with UnprocessedKeys parameter set to all keys.	2022-01-31 14:21:55 +01:00
Piotr Sarna	e79c2943fc	test: add error injection case for GetBatchItem The new test case is based on Scylla error injection mechanism and forces a partial read by failing some requests from the batch.	2022-01-31 14:21:55 +01:00
Piotr Sarna	99c5bec0e2	test: add a context manager for error injection to alternator With the new context manager it's now easier to request an error to be injected via REST API. Note that error injection is only enabled in certain build modes (dev, debug, sanitize) and the test case will be skipped if it's not possible to use this mechanism.	2022-01-31 14:21:55 +01:00
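The general shape of such a context manager can be sketched as follows. This is not the code from the commit: the real helper talks to Scylla's REST API and skips the test when injection is unavailable, whereas this sketch hides both behind two hypothetical callables, `enable` and `disable`, and only shows how the injection is guaranteed to be switched off again.

```python
from contextlib import contextmanager


@contextmanager
def injected_error(enable, disable, name):
    """Enable the injection called `name` for the duration of a with-block.

    `enable` and `disable` are hypothetical callables taking the injection
    name; in a real test they would issue the REST requests.
    """
    enable(name)
    try:
        yield name
    finally:
        # Runs even if the test body raises, so no injection leaks
        # into later test cases.
        disable(name)
```

A test would wrap only the requests that are supposed to fail in the `with` block, keeping the rest of the test free of injected errors.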
Nadav Har'El	a25e265373	test/alternator: improve comment on why we need "global_random" Improve the comment that explains why we needed to use an explicitly shared random sequence instead of the usual "random". We now understand that we need this workaround to undo what the pytest-randomly plugin does. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>	2022-01-31 10:07:56 +01:00
Piotr Sarna	471205bdcf	test/alternator: use a global random generator for all test cases It was observed (perhaps it depends on the Python implementation) that an identical seed was used for multiple test cases, which violated the assumption that generated values are in fact unique. Using a global generator instead makes sure that it was only seeded once. Tests: unit(dev) # alternator tests used to fail for me locally before this patch was applied Message-Id: <315d372b4363f449d04b57f7a7d701dcb9a6160a.1643365856.git.sarna@scylladb.com>	2022-01-30 16:40:20 +02:00
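The idea behind the shared generator can be sketched as below. The name `global_random` comes from the commit titles above; `random_string` and `names_with_reseed` are hypothetical helpers, and the reseeding loop is only an assumed imitation of a plugin that reseeds Python's module-level `random` before every test case.

```python
import random

# Seeded once, at import time. Calls to random.seed() elsewhere do not
# affect this private generator.
global_random = random.Random()


def random_string(length=10, chars="abcdefghijklmnopqrstuvwxyz"):
    """Return a random name drawn from the shared generator."""
    return "".join(global_random.choice(chars) for _ in range(length))


def names_with_reseed(make_name, count=3):
    """Imitate a plugin that reseeds the module-level generator per test."""
    names = []
    for _ in range(count):
        random.seed(1234)  # the same seed before every "test case"
        names.append(make_name())
    return names
```

Under this reseeding, a name built from the module-level `random.random()` repeats in every round, while `random_string` keeps producing fresh values because its generator is never reseeded.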
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Nadav Har'El	a30e71e27a	alternator: doc, test: fix mentions of reverse queries Now that issues #7586 and #9487 were fixed and reverse queries - even in long partitions - work well, we can drop the claim in alternator/docs/compatibility.md that reverse queries are buggy for large partitions. We can also remove the "xfail" mark from the test that checks this feature, as it now passes. Refs #7586 Refs #9487 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #9831	2022-01-16 17:46:26 +02:00
Nadav Har'El	e7e9001808	test/alternator: add more tests for GSI "Projection" We already have multiple tests for the unimplemented "Projection" feature of GSI and LSI (see issue #5036). This patch adds seven more test cases, focusing on various types of error conditions (e.g., trying to project the same attribute twice), esoteric corner cases (it's fine to list a key in NonKeyAttributes!), and corner cases that I expect we will have in our implementation (e.g., a projected attribute may either be a real Scylla column or just an element in a map column). All new tests pass on DynamoDB and fail on Alternator (due to #5036), so marked with "xfail". Refs #5036. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211228193748.688060-1-nyh@scylladb.com>	2022-01-05 10:35:36 +02:00
Nadav Har'El	31eeb44d28	alternator: fix error on UpdateTable for non-existent table When the UpdateTable operation is called for a non-existent table, the appropriate error is ResourceNotFoundException, but before this patch we ran into an exception, which resulted in an ugly "internal server error". In this patch we use the existing get_table() function which most other operations use, and which does all the appropriate verifications and generates the appropriate Alternator api_error instead of letting internal Scylla exceptions escape to the user. This patch also includes a test for UpdateTable on a non-existent table, which used to fail before this patch and pass afterwards. We also add a test for DeleteTable in the same scenario, and see it didn't have this bug. As usual, both tests pass on DynamoDB, which confirms we generate the right error codes. Fixes #9747. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211206181605.1182431-1-nyh@scylladb.com>	2021-12-14 13:09:27 +01:00
Nadav Har'El	815324713e	test/alternator: add more tests for ADD operand mismatch The "ADD" operator in UpdateItem's AttributeUpdates supports a number of types (numbers, sets and strings), and should result in a ValidationException if the attribute's existing type is different from the type of the operand - e.g., trying to ADD a number to an attribute which has a set as a value. So far we only had partial testing for this (we tested the case where both operands are sets, but of different types) so this patch adds the missing tests. The new tests pass (on both Alternator and DynamoDB) - we don't have a bug there. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211213195023.1415248-1-nyh@scylladb.com>	2021-12-14 11:15:23 +02:00
Nadav Har'El	03d67440ef	alternator: test additional metrics and fix another broken counter In issue #9406 we noticed that a counter for BatchGetItem operations was missing. When we fixed it, we added a test which checked this counter - but only this counter. It was left as a TODO to test the rest of the Alternator metrics, and this is what this patch does. Here we add a comprehensive test for all of the operations supported by Scylla and how they increase the appropriate operation counter. With this test we discovered a new bug: the DescribeTimeToLive operation incremented the UpdateTimeToLiveCounter :-( So in this patch we also include a fix for that bug, and the new test verifies that it is fixed. In addition to the operation counters, Alternator also has additional metrics, and we also added tests for some of them - but not all. The remaining untested metrics are listed in a TODO comment. Message-Id: <20211206154727.1170112-1-nyh@scylladb.com>	2021-12-10 08:08:54 +02:00
Piotr Sarna	26288c1a86	test,alternator: make TTL tests less prone to false negatives On my local machine, a 3 second deadline proved to cause flakiness of test_ttl_expiration case, because its execution time is just around 3 seconds. This patch addresses the problem by bumping the local timeout to 10 (and 15 for test_ttl_expiration_long, since it's dangerously near the 10 second deadline on my machine as well). Moreover, some test cases short-circuited once they detected that all needed items expired, but other ones lacked it and always used their full time slot. Since 10 seconds is a little too long for a single test case, even one marked with --veryslow, this patch also adds a couple of other short-circuits. One exception is test_ttl_expiration_hash_wrong_type, which actually depends on the fact that we should wait for the whole loop to finish. Since this case was never flaky for me with the 3 second timeout, it's left as is. Theoretically, test_ttl_expiration also kind of depends on checking the condition more than once (because the TTL of one of the values is bumped on each iteration), but empirical evidence shows that multiple iterations always occur in this test case anyway - for me, it always spun at least 3 times. Tests: unit(release) Message-Id: <a0a479929dac37daace744e0a970567a8aa3b518.1638431933.git.sarna@scylladb.com>	2021-12-08 16:02:45 +02:00
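The short-circuit pattern described in the entry above (a generous deadline, with an early exit as soon as the condition holds) can be sketched as a small polling helper. `wait_until` is a hypothetical name, not the helper used in test_ttl.py, and the polling interval is an arbitrary assumption.

```python
import time


def wait_until(condition, timeout, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns as soon as the condition holds, so a generous timeout only
    costs time when something is actually slow or broken.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True  # short-circuit: no need to use the full time slot
        time.sleep(interval)
    return condition()  # one last check at the deadline
```

A test that must observe the whole waiting period, like the test_ttl_expiration_hash_wrong_type case mentioned above, would not use such an early exit.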
Nadav Har'El	92e7fbe657	test/alternator: check correct error for unknown operation Add a short test verifying that Alternator responds with the correct error code (UnknownOperationException) when receiving an unknown or unsupported operation. The test passes on both AWS and Alternator, confirming that the behavior is the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20211206125710.1153008-1-nyh@scylladb.com>	2021-12-08 13:56:38 +02:00

238 Commits