scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 05:35:48 +00:00

Author	SHA1	Message	Date
Marcin Maliszkiewicz	8b06684a8c	docs: dev: document pytest run convenience script Closes #13995	2023-06-07 12:37:52 +03:00
Nadav Har'El	8a1334cf6f	Merge 'alternator: eliminate duplicated rjson::find() of ExpressionAttributeNames and ExpressionAttributeValues' from Marcin Maliszkiewicz Summary of the patch set: - eliminates not needed calls to rjson::find (~1% tps improvement in `perf-simple-query --write`) - adds some very specific test in this area (more general cases were covered already) - fixes some minor validation bug Fixes https://github.com/scylladb/scylladb/issues/13251 Closes #12675 * github.com:scylladb/scylladb: alternator: fix unused ExpressionAttributeNames validation when used as a part of BatchGetItem alternator: eliminate duplicated rjson::find() of ExpressionAttributeNames and ExpressionAttributeValues	2023-06-04 23:10:12 +03:00
Alexey Novikov	ffd4fcceec	Alternator: return full table description on return of DeleteTable The DeleteTable operation in Alternator shoudl return a TableDescription object describing the table which has just been deleted, similar to what DescribeTable returns Fixes scylladb#11472 Closes #11628	2023-06-04 21:00:26 +03:00
Marcin Maliszkiewicz	9ce65270d5	alternator: fix unused ExpressionAttributeNames validation when used as a part of BatchGetItem BatchGetItem request is a map of table names and 'sub-requests', ExpressionAttributeNames is defined on 'sub-request' level but the code was instead checking the top level, obtaining nullptr every time which effectively disables unused names check. Fixes #13251	2023-05-26 15:03:15 +02:00
Pavel Emelyanov	d2f5a44e3b	test/alternator: Don't use empty AWS secret key There's a test case that checks in valid credentials (wrong key). However, some boto3 libraries don't like empty secret key values: request = <FixtureRequest for <Function test_wrong_key_access>> dynamodb = dynamodb.ServiceResource() def test_wrong_key_access(request, dynamodb): print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true") url = dynamodb.meta.client._endpoint.host with pytest.raises(ClientError, match='UnrecognizedClientException'): if url.endswith('.amazonaws.com'): boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints() else: verify = not url.startswith('https') > boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints() test_authorization.py:23: ... cls = <class 'awscrt.auth.AwsCredentialsProvider'>, access_key_id = 'whatever' secret_access_key = '', session_token = None @classmethod def new_static(cls, access_key_id, secret_access_key, session_token=None): """ Create a simple provider that just returns a fixed set of credentials. Args: access_key_id (str): Access key ID secret_access_key (str): Secret access key session_token (Optional[str]): Optional session token Returns: AwsCredentialsProvider: """ assert isinstance(access_key_id, str) assert isinstance(secret_access_key, str) assert isinstance(session_token, str) or session_token is None > binding = _awscrt.credentials_provider_new_static(access_key_id, secret_access_key, session_token) E RuntimeError: 34 (AWS_ERROR_INVALID_ARGUMENT): An invalid argument was passed to a function. $ pip3 show boto3 Name: boto3 Version: 1.26.139 Summary: The AWS SDK for Python Home-page: https://github.com/boto/boto3 Author: Amazon Web Services Author-email: License: Apache License 2.0 Location: /home/xemul/.local/lib/python3.11/site-packages Requires: botocore, jmespath, s3transfer Required-by: Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14022	2023-05-24 19:46:16 +03:00
Nadav Har'El	02d31786ff	test/alternator: better README.md on how to run and write tests Improve test/alternator/README.md by adding better and more beginner- friendly introduction to how to run the Alternator tests, as well as a section about the philosophy of the Alternator test suite, and some guideliness on how to write good tests in that framework. Much of this text was copied from test/cql-pytest/README.md. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13999	2023-05-24 09:58:12 +03:00
Nadav Har'El	3c0603558c	alternator: add validation of numbers' magnitude and precision DynamoDB limits the allowed magnitude and precision of numbers - valid decimal exponents are between -130 and 125 and up to 38 significant decimal digitst are allowed. In contrast, Scylla uses the CQL "decimal" type which offers unlimited precision. This can cause two problems: 1. Users might get used to this "unofficial" feature and start relying on it, not allowing us to switch to a more efficient limited-precision implementation later. 2. If huge exponents are allowed, e.g., 1e-1000000, summing such a number with 1.0 will result in a huge number, huge allocations and stalls. This is highly undesirable. After this patch, all tests in test/alternator/test_number.py now pass. The various failing tests which verify magnitude and precision limitations in different places (key attributes, non-key attributes, and arithmetic expressions) now pass - so their "xfail" tags are removed. Fixes #6794 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-05-02 11:04:05 +03:00
Nadav Har'El	0eccc49308	test/alternator: more tests for limits on number precision and magnitude We already have xfailing tests for issue #6794 - the missing checks on precision and magnitudes of numbers in Alternator - but this patch adds checks for additional corner cases. In particular we check the case that numbers are used in a key column, which goes to a different code path than numbers used in non-key columns, so it's worth testing as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-05-02 11:04:05 +03:00
Nadav Har'El	56b8b9d670	test/alternator: reproducer for DoS in unlimited-precision addition As already noted in issue #6794, whereas DynamoDB limits the magnitude of numbers to between 10^-130 and 10^125, Scylla does not. In this patch we add yet another test for this problem, but unlike previous tests which just shown too much magnitude being allowed which always sounded like a benign problem - the test in this patch shows that this "feature" can be used to DoS Scylla - a user user can send a short request that causes arbitrarily-large allocations, stalls and CPU usage. The test is currently marked "skip" because it cause cause Scylla to take a very long time and/or run out of memory. It passes on DynamoDB because the excessive magnitude is simply not allowed there. Refs #6794 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-05-02 11:03:51 +03:00
Botond Dénes	dba1d36aa6	Merge 'alternator: fix isolation of concurrent modifications to tags' from Nadav Har'El Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed. The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()` which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected. Fixes #6389. Closes #13150 * github.com:scylladb/scylladb: test/alternator: test concurrent TagResource / UntagResource db/tags: drop unsafe update_tags() utility function alternator: isolate concurrent modification to tags db/tags: add safe modify_tags() utility functions migration_manager: expose access to storage_proxy	2023-04-11 11:17:23 +03:00
Calle Wilund	6525209983	alternator/rest api tests: Remove name assumption and rely on actual scylla info Fixes #13332 The tests user the discriminator "system" as prefix to assume keyspaces are marked "internal" inside scylla. This is not true in enterprise universe (replicated key provider). It maybe/probably should, but that train is sailing right now. Fix by removing one assert (not correct) and use actual API info in the alternator test. Closes #13333	2023-03-28 15:41:23 +03:00
Nadav Har'El	c41b2d35ed	test/alternator: test concurrent TagResource / UntagResource This patch adds an Alternator test reproducing issue #6389 - that concurrent TagResource and/or UntagResource operations was broken and some of the concurrent modifications were lost. The test has two threads, one loops adds and removes a tag A, the other adds and removes a tag B. After we add tag A, we expect tag A to be there - but due to issue #6389 this modification was sometimes lost when it raced with an operation on B. This test consistently failed before issue #6389 was fixed, and passes now after the issue was fixed by the previous patches. The bug reproduces by chance, so it requires a fairly long loop (a few seconds) to be sure it reproduces - so is marked a "veryslow" test and will not run in CI, but can be used to manually reproduce this issue with: test/alternator/run --runveryslow test_tag.py::test_concurrent_tag Refs #6389. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 13:38:15 +02:00
Nadav Har'El	7dc54771e1	test/cql-pytest: allow "run-cassandra" without building Scylla Before this patch, all scripts which use test/cql-pytest/run.py looked for the Scylla executable as their first step. This is usually the right thing to do, except in two cases where Scylla is not needed: 1. The script test/cql-pytest/run-cassandra. 2. The script test/alternator/run with the "--aws" option. So in this patch we change run.py to only look for Scylla when actually needed (the find_scylla() function is called). In both cases mentioned above, find_scylla() will never get called and the script can work even if Scylla was never built. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13010	2023-03-01 07:54:19 +02:00
Nadav Har'El	14cdd034ee	test/alternator: fix flaky test for partition-tombstone scan The test test_scan.py::test_scan_long_partition_tombstone_string checks that a full-table Scan operation ends a page in the middle of a very long string of partition tombstones, and does NOT scan the entire table in one page (if we did that, getting a single page could take an unbounded amount of time). The test is currently flaky, having failed in CI runs three times in the past two months. The reason for the flakiness is that we don't know exactly how long we need to make the sequence of partition tombstones in the test before we can be absolutely sure a single page will not read this entire sequence. For single-partition scans we have the "query_tombstone_page_limit" configuration parameter, which tells us exactly how long we need to make the sequence of row tombstones. But for a full-table scan of partition tombstones, the situation is more complicated - because the scan is done in parallel on several vnodes in parallel and each of them needs to read query_tombstone_page_limit before it stops. In my experiments, using query_tombstone_limit * 4 consecutive tombstones was always enough - I ran this test hundreds of times and it didn't fail once. But since it did fail on Jenkins very rarely (3 times in the last two months), maybe the multiplier 4 isn't enough. So this patch doubles it to 8. Hopefully this would be enough for anyone (TM). This makes this test even bigger and slower than it was. To make it faster, I changed this test's write isolation mode from the default always_use_lwt to forbid_rmw (not use LWT). This leaves the test's total run time to be similar to what it was before this patch - around 0.5 seconds in dev build mode on my laptop. Fixes #12817 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12819	2023-02-14 08:09:44 +02:00
Nadav Har'El	621c49b621	test/alternator: more tests for listing streams In issue #12601, a dtest involving paging of ListStreams showed incorrect results - the paged results had one duplicate stream and one missing stream. We believe that the cause of this bug was that the unsorted map of tables can change order between pages. In this patch we add a test test_list_streams_paged_with_new_table which can demonstrate this bug - by adding a lot of tables in mid-paging, we cause the unsorted map to be reshufled and the paging to break. This is not the same situation as in #12601 (which did not involve new tables) but we believe it demonstrates the same bug - and check its fix. Indeed this passes with the fix in pull request #12614 and fails without it. This patch also adds a second test, test_stream_arn_unchanging: That test eliminates a guess we had for the cause of #12601. We thought that maybe stream ARN changing on a table if its schema version changes, but the new test confirms that it actually behaves as expected (the stream ARN doesn't change). Refs #12601 Refs #12614 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12616	2023-02-13 16:30:24 +02:00
Botond Dénes	9efbcfa190	Merge 'test/alternator: tests for Limit parameter of ListStreams operation' from Nadav Har'El The first patch in this series enables a previously-skipped test for what happens with Limit=0. The test passes. The second patch adds an xfailng test for very large Limit. Closes #12625 * github.com:scylladb/scylladb: test/alternator: xfailing test for huge Limit in ListStreams alternator/test: un-skip test of zero Limit in ListStreams	2023-02-02 07:02:28 +02:00
Nadav Har'El	f873884b50	test/alternator: unskip test which works on modern Scylla We had one test test_gsi.py::test_gsi_identical that didn't work on KA/LA sstables due to #6157, so it was skipped. Today, Scylla no longer supports writing these old sstable formats, so the test can never find itself running on these versions, so should pass. And indeed it does, and the "skip" marker can be removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12651	2023-01-27 14:10:07 +02:00
Nadav Har'El	55558e1bd7	test/alternator: check operation on invalid TableName Issue #12538 suggested that maybe Alternator shouldn't bother reporting an invalid table name in item operations like PutItem, and that it's enough to report that the table doesn't exist. But the test added in this patch shows that DynamoDB, like Alternator, reports the invalid table name in this case - not just that the table doesn't exist. That should make us think twice before acting on issue #12538. If we do what this issue recommended, this test will need to be fixed (e.g., to accept as correct both types of errors). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12608	2023-01-24 14:14:39 +02:00
Nadav Har'El	158be3604d	test/alternator: xfailing test for huge Limit in ListStreams DynamoDB Streams limits the "Limit" parameter of ListStreams to 100 - anything larger will result in an error. Scylla doesn't necessarily need to uphold the same limit, but we should uphold some limit, as not having any limit can result (in the theoretical case of a huge number of tables with streams enabled) in an unbounded response size. So here we add a test to check that a Limit of 100,000 is not allowed. It passes on DynamoDB (in fact, any number higher than 100 will be enough threre) but fails on Alternator, so is marked "xfail". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-01-24 12:38:18 +02:00
Nadav Har'El	3beafd8441	alternator/test: un-skip test of zero Limit in ListStreams We had a skipped test on how Alternator handles Limit=0 for ListStreams which should be reported as an error. We had to skip it because boto3 did us a "favor" of discovering this parameter error before ever sending it to the server. We discovered long ago how to avoid this client-side checking in boto3, but only used it for the "dynamodb" fixture and forgot to copy the same trick to the "dynamodbstreams" fixture - and in this patch we do, and can run this test successfully. While at it, also copy the extented timeout configuration we had in the dynamodb fixture also to the dynamodbstreams fixture. There is no reason why it should be different. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-01-24 12:38:18 +02:00
Marcin Maliszkiewicz	6f055ca5f9	alternator: evaluate expressions as false for stored malformed binary data We'll try to distinguish the case when data comes from the storage rather than user reuqest. Such attribute can be used in expressions and when it can't be decoded it should make expression evaluate as false to simply exclude the row during filter query or scan. Note that this change focuses on binary type, for other types we may have some inconsistencies in the implementation.	2023-01-16 15:15:27 +01:00
Marcin Maliszkiewicz	86dc1bfdb1	utils: throw error on malformed input in base64 decode We already fixed the case of missing padding but there is also more generic one where input for decode function contains non base64 characters. This is mostly done for alternator purpose, it should discard the request containing such data and return 400 http error. Addionally some harmless integer overflow during integer casting was fixed here. This was attempted to be fixed by `2d33a3f` but since we also implicitly cast to uint8_t the problem persisted.	2023-01-16 14:36:23 +01:00
Marcin Maliszkiewicz	f53c0fd0fc	utils: throw error on missing padding in base64 decode This is done to make alternator behavior more on a pair with dynamodb. Decode function is used there when processing user requests containing binary item values. We will now discard improperly formed user input with 400 http error. It also makes it more consistent as some of our other base64 functions may have assumed padding is present. The patch should not break other usages of base64 functions as the only one is in db/hints where the code already throws std::runtime_error. Fixes #6487	2023-01-16 14:36:23 +01:00
Gleb Natapov	1688163233	raft: replace experimental raft option with dedicated flag Unlike other experimental feature we want to raft to be optional even after it leaves experimental mode. For that we need to have a separate option to enable it. The patch adds the binary option "consistent-cluster-management" for that.	2023-01-03 11:15:11 +02:00
Nadav Har'El	08c8e0d282	test/alternator: enable tests for long strings of consecutive tombstones In the past we had issue #7933 where very long strings of consecutive tombstones caused Alternator's paging to take an unbounded amount of time and/or memory for a single page. This issue was fixed (by commit `e9cbc9ee85`) but the two tests we had reproducing that issue were left with the "xfail" mark. They were also marked "veryslow" - each taking about 100 seconds - so they didn't run by default so nobody noticed they started to pass. In this patch I make these tests much faster (taking less than a second together), confirm that they pass - and remove the "xfail" mark and improve their descriptions. The trick to making these tests faster is to not create a million tombstones like we used to: We now know that after string of just 10,000 tombstones ('query_tombstone_page_limit') the page should end, so we can check specifically this number. The story is more complicated for partition tombstones, but there too it should be a multiple of query_tombstone_page_limit. To make the tests even faster, we change run.py to lower the query_tombstone_page_limit from the default 10,000 to 1000. The tests work correctly even without this change, but they are ten times faster with it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12350	2022-12-20 07:08:36 +02:00
Nadav Har'El	6fcb5302a6	alternator-test: xfail a flaky test exposing a known bug In a recent commit `757d2a4`, we removed the "xfail" mark from the test test_manual_requests.py::test_too_large_request_content_length because it started to pass on more modern versions of Python, with a urllib3 bug fixed. Unfortunately, the celebration was premature: It turns out that although the test now usually passes, it sometimes fails. This is caused by a Seastar bug scylladb/seastar#1325, which I opened #12166 to track in this project. So unfortunately we need to add the "xfail" mark back to this test. Note that although the test will now be marked "xfail", it will actually pass most of the time, so will appear as "xpass" to people run it. I put a note in the xfail reason string as a reminder why this is happening. Fixes #12143 Refs #12166 Refs scylladb/seastar#1325 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12169	2022-12-01 20:00:46 +02:00
Nadav Har'El	6bc3075bbd	test/alternator: increase timeout on TTL tests Some of the tests in test/alternator/test_ttl.py need an expiration scan pass to complete and expire items. In development builds on developer machines, this usually takes less than a second (our scanning period is set to half a second). However, in debug builds on Jenkins each scan often takes up to 100 (!) seconds (this is the record we've seen so far). This is why we set the tests' timeout to 120. But recently we saw another test run failing. I think the problem is that in some case, we need not one, but two scanning passes to complete before the timeout: It is possible that the test writes an item right after the current scan passed it, so it doesn't get expired, and then we a second scan at a random position, possibly making that item we mention one of the last items to be considered - so in total we need to wait for two scanning periods, not one, for the item to expire. So this patch increases the timeout from 120 seconds to 240 seconds - more than twice the highest scanning time we ever saw (100 seconds). Note that this timeout is just a timeout, it's not the typical test run time: The test can finish much more quickly, as little as one second, if items expire quickly on a fast build and machine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12106	2022-11-29 16:37:54 +03:00
Nadav Har'El	2dedb5ea75	alternator: make Alternator TTL feature no longer "experimental" Until now, the Alternator TTL feature was considered "experimental", and had to be manually enabled on all nodes of the cluster to be usable. This patch removes this requirement and in essence GAs this feature. Even after this patch, Alternator TTL is still a "cluster feature", i.e., for this feature to be usable every node in the cluster needs to support it. If any of the nodes is old and does not yet support this feature, the UpdateTimeToLive request will not be accepted, so although the expiration-scanning threads may exist on the newer nodes, they will not do anything because none of the tables can be marked as having expiration enabled. This patch does not contain documentation fixes - the documentation still suggests that the Alternator TTL feature is experimental. The documentation patch will come separately. Fixes #12037 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12049	2022-11-24 17:21:39 +02:00
Marcin Maliszkiewicz	4389baf0d9	test/alternator: correct xfail reason for test_gsi_backfill_empty_string Previously cited issue is closed already.	2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz	59eca20af1	test/alternator: correct indentation in test_lsi_describe Otherwise I think assert is not executed in a loop. And I am not sure why lsi variable can be bound to anything. As I tested it was pointing to the last element in lsis...	2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz	d6d20134de	alternator: fix wrong 'where' condition for GSI range key This bug doesn't manifest in a visible way to the user. Adding the index to an existing table via GlobalSecondaryIndexUpdates is not supported so we don't need to consider what could happen for empty values of index range key. After the index is added the only interesting value user can set is omitting the value (null or empty are not allowed, see test_gsi_empty_value and test_gsi_null_value). In practice no matter of 'where' condition the underlaying materialized view code is skipping row updates with missing keys as per this comment: 'If one of the key columns is missing, set has_new_row = false meaning that after the update there will be no view row'. Thats why the added test passes both before and after the patch. But it's still usefull to include it to exercise those code paths. Fixes #11800	2022-11-22 11:08:23 +01:00
Nadav Har'El	757d2a4c02	test/alternator: un-xfail a test which passes on modern Python We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in https://github.com/urllib3/urllib3/pull/1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes #8195 Closes #12038	2022-11-21 08:10:10 +02:00
Botond Dénes	e981bd4f21	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. In this series fix this bug (it was a bug in our materialized views implementation) and add a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry()	2022-10-21 10:49:28 +03:00
Alexander Turetskiy	636e14cc77	Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs. The return from DescribeTable which describes GSIs and LSIs is missing the Projection field. We do not yet support all the settings Projection (see #5036), but the default which we support is ALL, and DescribeTable should return that in its description. Fixes #11470 Closes #11693	2022-10-19 19:01:08 +03:00
Nadav Har'El	2e439c9471	test/alternator: add test for issue 11801 This patch adds a test reproducing issue #11801, and confirming that the previous patch fixed it. Before the previous patch, the test passed on DynamoDB but failed on Alternator. The patch also adds four more passing tests which demonstrate that issue #11801 only happened in the very specific case where: 1. A GSI has two key attributes which weren't key attributes in the base, and 2. An update sets the second of those attributes to the same value which it already had. This bug was originally discovered and explained by @fee-mendes. Refs #11801. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 14:36:48 +03:00
Nadav Har'El	3a30fbd56c	test/alternator: fix timeout in flaky test test_ttl_stats The test `test_metrics.py::test_ttl_stats` tests the metrics associated with Alternator TTL expiration events. It normally finishes in less than a second (the TTL scanning is configured to run every 0.5 seconds), so we arbitrarily set a 60 second timeout for this test to allow for extremely slow test machines. But in some extreme cases even this was not enough - in one case we measured the TTL scan to take 63 seconds. So in this patch we increase the timeout in this test from 60 seconds to 120 seconds. We already did the same change in other Alternator TTL tests in the past - in commit `746c4bd`. Fixes #11695 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11696	2022-10-04 08:50:51 +03:00
Piotr Sarna	481240b8b4	Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El We had quite a few tests for Alternator TTL in test/alternator, but most of them did not run as part of the usual Jenkins test suite, because they were considered "very slow" (and require a special "--runveryslow" flag to run). In this series we enable six tests which run quickly enough to run by default, without an additional flag. We also make them even quicker - the six tests now take around 2.5 seconds. I also noticed that we don't have a test for the Alternator TTL metrics - and added one. Fixes #11374. Refs https://github.com/scylladb/scylla-monitoring/issues/1783 Closes #11384 * github.com:scylladb/scylladb: test/alternator: insert test names into Scylla logs rest api: add a new /system/log operation alternator ttl: log warning if scan took too long. alternator,ttl: allow sub-second TTL scanning period, for tests test/alternator: skip fewer Alternator TTL tests test/alternator: test Alternator TTL metrics	2022-09-22 09:47:50 +02:00
Nadav Har'El	999ca2d588	alternator: fix crashes an errors when using ":attrs" attribute Alternator uses a single column, a map, with the deliberately strange name ":attrs", to hold all the schema-less attributes of an item. The existing code is buggy when the user tries to write to an attribute with this strange name ":attrs". Although it is extremely unlikely that any user would happen to choose such a name, it is nevertheless a legal attribute name in DynamoDB, and should definitely not cause Scylla to crash as it does in some cases today. The bug was caused by the code assuming that to check whether an attribute is stored in its own column in the schema, we just need to check whether a column with that name exists. This is almost true, except for the name ":attrs" - a column with this name exists, but it is a map - the attribute with that name should be stored in the map, not as the map. The fix is to modify that check to special-case ":attrs". This fix makes the relevant tests, which used to crash or fail, now pass. This fix solves most of #5009, but one point is not yet solved (and perhaps we don't need to solve): It is still not allowed to use the name ":attrs" for a key attribute. But trying to do that fails cleanly (during the table creation) with an appropriate error message, so is only a very minor compatibility issue. Refs #5009 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-19 10:30:11 +03:00
Nadav Har'El	6f8dca3760	alternator: improve tests for reserved attribute name ":attrs" As explained in issue #5009, Alternator currently forbids the special attribute name ":attrs", whereas DynamoDB allows any string of approriate length (including the specific string ":attrs") to be used. We had only a partial test for this incompatibility, and this patch improves the testing of this issue. In particular, we were missing a test for the case that the name ":attrs" was used for a non-key attribute (we only tested the case it was used as a sort key). It turns out that Alternator crashes on the new test, when the test tries to write to a non-key attribute called ":attrs", so we needed to mark the new test with "skip". Moreover, it turns out that different code paths handle the attribute name ":attrs" differently, and also crash or fail in other ways - so we added more than one xfailing and skipped tests that each fails in a different place (and also a few tests that do pass). As usual, the new tests we checked to pass on DynamoDB. Refs #5009 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-19 10:30:06 +03:00
Nadav Har'El	b0371b6bf8	test/alternator: insert test names into Scylla logs The output of test/alternator/run ends in Scylla's full log file, where it is hard to understand which log messages are related to which test. In this patch, we add a log message (using the new /system/log REST API) every time a test is started and ends. The messages look like this: INFO 2022-08-29 18:07:15,926 [shard 0] api - /system/log: test/alternator: Starting test_ttl.py::test_describe_ttl_without_ttl ... INFO 2022-08-29 18:07:15,930 [shard 0] api - /system/log: test/alternator: Ended test_ttl.py::test_describe_ttl_without_ttl Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	e7e9adc519	alternator,ttl: allow sub-second TTL scanning period, for tests Alternator has the "alternator_ttl_period_in_seconds" parameter for controlling how often the expiration thread looks for expired items to delete. It is usually a very large number of seconds, but for tests to finish quickly, we set it to 1 second. With 1 second expiration latency, test/alternator/test_ttl.py took 5 seconds to run. In this patch, we change the parameter to allow a floating-point number of seconds instead of just an integer. Then, this allows us to halve the TTL period used by tests to 0.5 seconds, and as a result, the run time of test_ttl.py halves to 2.5 seconds. I think this is fast enough for now. I verified that even if I change the period to 0.1, there is no noticable slowdown to other Alternator tests, so 0.5 is definitely safe. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	746c4bd9eb	test/alternator: skip fewer Alternator TTL tests Most of the Alternator TTL tests are extremely slow on DynamoDB because item expiration may be delayed up to 24 hours (!), and in practice for 10 to 30 minutes. Because of this, we marked most of these tests with the "veryslow" mark, causing them to be skipped by default - unless pytest is given the "--runveryslow" option. The result was that the TTL tests were not run in the normal test runs, which can allow regressions to be introduced (luckily, this hasn't happened). However, this "veryslow" mark was excessive. Many of the tests are very slow only on DynamoDB, but aren't very slow on Scylla. In particular, many of the tests involve waiting for an item to expire, something that happens after the configurable alternator_ttl_period_in_seconds, which is just one second in our tests. So in this patch, we remove the "veryslow" mark from 6 tests of Alternator TTL tests, and instead use two new fixtures - waits_for_expiration and veryslow_on_aws - to only skip the test when running on DynamoDB or when alternator_ttl_period_in_seconds is high - but in our usual test environment they will not get skipped. Because 5 of these 6 tests wait for an item to expire, they take one second each and this patch adds 5 seconds to the Alternator test runtime. This is unfortunate (it's more than 25% of the total Alternator test runtime!) but not a disaster, and we plan to reduce this 5 second time futher in the following patch, but decreasing the TTL scanning period even further. This patch also increases the timeout of several of these tests, to 120 seconds from the previous 10 seconds. As mentioned above, normally, these tests should always finish in alternator_ttl_period_in_seconds (1 second) with a single scan taking less than 0.2 seconds, but in extreme cases of debug builds on overloaded test machines, we saw even 60 seconds being passed, so let's increase the maximum. I also needed to make the sleep time between retries smaller, not a function of the new (unrealistic) timeout. 4 more tests remain "veryslow" (and won't run by default) because they are take 5-10 seconds each (e.g., a test which waits to see that an item does not get expired, and a test involving writing a lot of data). We should reconsider this in the future - to perhaps run these tests in our normal test runs - but even for now, the 6 extra tests that we start running are a much better protection against regressions than what we had until now. Fixes #11374 Signed-off-by: Nadav Har'El <nyh@scylladb.com> x Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	297109f6ee	test/alternator: test Alternator TTL metrics This patch adds a test for the metrics generated by the background expiration thread run for Alternator's TTL feature. We test three of the four metrics: scylla_expiration_scan_passes, scylla_expiration_scan_table and scylla_expiration_items_deleted. The fourth metric, scylla_expiration_secondary_ranges_scanned, counts the number of times that this node took over another node's expiration duty. so requires a multi-node cluster to test, and we can't test it in the single-node cluster test framework. To see TTL expiration in action this test may need to wait up to the setting of alternator_ttl_period_in_seconds. For a setting of 1 second (the default set by test/alternator/run), this means this test can take up to 1 second to run. If alternator_ttl_period_in_seconds is set higher, the test is skipped unless --runveryslow is requested. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	e5f6adf46c	test/alternator: improve tests for DescribeTable for indexes I created new issues for each missing field in DescribeTable's response for GSIs and LSIs, so in this patch we edit the xfail messages in the test to refer to these issues. Additionally, we only had a test for these fields for GSIs, so this patch also adds a similar test for LSIs. I turns out there is a difference between the two tests - the two fields IndexStatus and ProvisionedThroughput are returned for GSIs, but not for LSIs. Refs #7750 Refs #11466 Refs #11470 Refs #11471 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11473	2022-09-07 09:50:16 +02:00
Nadav Har'El	941c719a23	alternator: return ProvisionedThroughput in DescribeTable DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable operation must return a ProvisionedThroughput structure, listing both ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not stated in some DynamoDB documentation but is explictly mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html Also in empirically, DynamoDB returns ProvisionedThroughput with zeros even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this. The ProvisionedThroughput structure being missing was a problem for applications like DynamoDB connectors for Spark, if they implicitly assume that ProvisionedThroughput is returned by DescribeTable, and fail (as described in issue #11222) if it's outright missing. So this patch adds the missing ProvisionedThroughput structure, and the xfailing test starts to pass. Note that this patch doesn't change the fact that attempting to set a table to PROVISIONED billing mode is ignored: DescribeTable continues to always return PAY_PER_REQUEST as the billing mode and zero as the provisioned capacities. Fixes #11222 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11298	2022-08-22 09:58:09 +02:00
Nadav Har'El	c27f431580	test/alternator: fix a flaky test for full-table scan page size This patch fixes the test test_scan.py::test_scan_paging_missing_limit which failed in a Jenkins run once (that we know of). That test verifies that an Alternator Scan operation without an explicit "Limit" is nevertheless paged: DynamoDB (and also Scylla) wanted this page size to be 1 MB, but it turns out (see #10327) that because of the details of how Scylla's scan works, the page size can be larger than 1 MB. How much larger? I ran this test hundreds of times and never saw it exceed a 3 MB page - so the test asserted the page must be smaller than 4 MB. But now in one run - we got to this 4 MB and failed the test. So in this patch we increase the table to be scanned from 4 MB to 6 MB, and assert the page size isn't the full 6 MB. The chance that this size will eventually fail as well should be (famous last words...) very small for two reasons: First because 6 MB is even higher than I the maximum I saw in practice, and second because empirically I noticed that adding more data to the table reduces the variance of the page size, so it should become closer to 1 MB and reduce the chance of it reaching 6 MB. Refs #10327 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11280	2022-08-12 06:57:45 +03:00
Nadav Har'El	d03bd82222	Revert "test: move scylla_inject_error from alternator/ to cql-pytest/" This reverts commit `8e892426e2` and fixes the code in a different way: That commit moved the scylla_inject_error function from test/alternator/util.py to test/cql-pytest/util.py and renamed test/alternator/util.py. I found the rename confusing and unnecessary. Moreover, the moved function isn't even usable today by the test suite that includes it, cql-pytest, because it lacks the "rest_api" fixture :-) so test/cql-pytest/util.py wasn't the right place for it anyway. test/rest_api/rest_util.py could have been a good place for this function, but there is another complication: Although the Alternator and rest_api tests both had a "rest_api" fixture, it has a different type, which led to the code in rest_api which used the moved function to have to jump through hoops to call it instead of just passing "rest_api". I think the best solution is to revert the above commit, and duplicate the short scylla_inject_error() function. The duplication isn't an exact copy - the test/rest_api/rest_util.py version now accepts the "rest_api" fixture instead of the URL that the Alternator version used. In the future we can remove some of this duplication by having some shared "library" code but we should do it carefully and starting with agreeing on the basic fixtures like "rest_api" and "cql", without that it's not useful to share small functions that operate on them. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11275	2022-08-11 06:43:26 +03:00
Aleksandra Martyniuk	8e892426e2	test: move scylla_inject_error from alternator/ to cql-pytest/ Move scylla_inject_error from alternator/ to cql-pytest/ so it can be reached from various tests dirs. alternator/util.py is renamed to alternator/alternator_util.py to avoid name shadowing.	2022-07-29 09:35:20 +02:00
Nadav Har'El	eaf3579c15	test/alternator: several more simple tests for UpdateItem This patch adds several more tests for Alternator's UpdateItem operation. These tests verify a few simple cases that, surprisingly, never had test coverage. The new tests pass (on both DynamoDB and Alternator) so did not expose any bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11025	2022-07-12 21:48:33 +02:00
Nadav Har'El	2581b54ea0	test/{alternator,redis}: stop using deprecated "disutils" package Python has deprecated the distutils package. In several places in the Alternator and Redis test suites, we used distutils.version to check if the library is new enough for running the test (and skip the test if it's too old). On new versions of Python, we started getting deprecation warnings such as: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives PEP 632 recommends using package.version instead of distutils.version, and indeed it works well. After applying this patch, Alternator and Redis test runs no long end in silly deprecation warnings. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11007	2022-07-11 08:00:45 +03:00

1 2 3 4 5 ...

276 Commits