Commit Graph

233 Commits

Author SHA1 Message Date
Nadav Har'El
e5f6adf46c test/alternator: improve tests for DescribeTable for indexes
I created new issues for each missing field in DescribeTable's
response for GSIs and LSIs, so in this patch we edit the xfail
messages in the test to refer to these issues.

Additionally, we only had a test for these fields for GSIs, so this
patch also adds a similar test for LSIs. It turns out there is a
difference between the two tests - the two fields IndexStatus and
ProvisionedThroughput are returned for GSIs, but not for LSIs.

Refs #7750
Refs #11466
Refs #11470
Refs #11471

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11473
2022-09-07 09:50:16 +02:00
Nadav Har'El
941c719a23 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some DynamoDB documentation but is explicitly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Also, empirically, DynamoDB returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The ProvisionedThroughput structure being missing was a problem for
applications like DynamoDB connectors for Spark, if they implicitly
assume that ProvisionedThroughput is returned by DescribeTable, and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.
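
As a rough illustration of the response fragment described above, here is a
minimal Python sketch. It is not Alternator's C++ implementation, and the
function names are made up for this example; only the field names come from
the DynamoDB API.

```python
def describe_table_billing_fields():
    """Billing-related fields for a table that always reports on-demand mode."""
    return {
        "BillingModeSummary": {"BillingMode": "PAY_PER_REQUEST"},
        # Present even in on-demand mode, with both capacities set to zero.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0,
        },
    }


def read_capacity(table_description):
    # A client that indexes the structure directly, as some connectors do,
    # raises KeyError if ProvisionedThroughput is missing altogether.
    return table_description["ProvisionedThroughput"]["ReadCapacityUnits"]
```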

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298
2022-08-22 09:58:09 +02:00
Nadav Har'El
c27f431580 test/alternator: fix a flaky test for full-table scan page size
This patch fixes the test test_scan.py::test_scan_paging_missing_limit
which failed in a Jenkins run once (that we know of).

That test verifies that an Alternator Scan operation *without* an explicit
"Limit" is nevertheless paged: DynamoDB (and also Scylla) wanted this page
size to be 1 MB, but it turns out (see #10327) that because of the details
of how Scylla's scan works, the page size can be larger than 1 MB. How much
larger? I ran this test hundreds of times and never saw it exceed a 3 MB
page - so the test asserted the page must be smaller than 4 MB. But now
in one run - we got to this 4 MB and failed the test.

So in this patch we increase the table to be scanned from 4 MB to 6 MB,
and assert the page size isn't the full 6 MB. The chance that this size will
eventually fail as well should be (famous last words...) very small for
two reasons: First because 6 MB is even higher than the maximum I saw
in practice, and second because empirically I noticed that adding more
data to the table reduces the variance of the page size, so it should
become closer to 1 MB and reduce the chance of it reaching 6 MB.

Refs #10327

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11280
2022-08-12 06:57:45 +03:00
Nadav Har'El
d03bd82222 Revert "test: move scylla_inject_error from alternator/ to cql-pytest/"
This reverts commit 8e892426e2 and fixes
the code in a different way:

That commit moved the scylla_inject_error function from
test/alternator/util.py to test/cql-pytest/util.py and renamed
test/alternator/util.py. I found the rename confusing and unnecessary.
Moreover, the moved function isn't even usable today by the test suite
that includes it, cql-pytest, because it lacks the "rest_api" fixture :-)
so test/cql-pytest/util.py wasn't the right place for it anyway.
test/rest_api/rest_util.py could have been a good place for this function,
but there is another complication: Although the Alternator and rest_api
tests both had a "rest_api" fixture, it has a different type, which led
to the code in rest_api which used the moved function to have to jump
through hoops to call it instead of just passing "rest_api".

I think the best solution is to revert the above commit, and duplicate
the short scylla_inject_error() function. The duplication isn't an
exact copy - the test/rest_api/rest_util.py version now accepts the
"rest_api" fixture instead of the URL that the Alternator version used.

In the future we can remove some of this duplication by having some
shared "library" code but we should do it carefully and starting with
agreeing on the basic fixtures like "rest_api" and "cql", without that
it's not useful to share small functions that operate on them.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11275
2022-08-11 06:43:26 +03:00
Aleksandra Martyniuk
8e892426e2 test: move scylla_inject_error from alternator/ to cql-pytest/
Move scylla_inject_error from alternator/ to cql-pytest/ so it
can be reached from various tests dirs. alternator/util.py is
renamed to alternator/alternator_util.py to avoid name shadowing.
2022-07-29 09:35:20 +02:00
Nadav Har'El
eaf3579c15 test/alternator: several more simple tests for UpdateItem
This patch adds several more tests for Alternator's UpdateItem operation.
These tests verify a few simple cases that, surprisingly, never had test
coverage. The new tests pass (on both DynamoDB and Alternator) so did not
expose any bug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11025
2022-07-12 21:48:33 +02:00
Nadav Har'El
2581b54ea0 test/{alternator,redis}: stop using deprecated "distutils" package
Python has deprecated the distutils package. In several places in the
Alternator and Redis test suites, we used distutils.version to check if
the library is new enough for running the test (and skip the test if
it's too old). On new versions of Python, we started getting deprecation
warnings such as:

    DeprecationWarning: The distutils package is deprecated and slated for
    removal in Python 3.12. Use setuptools or check PEP 632 for potential
    alternatives

PEP 632 recommends using packaging.version instead of distutils.version,
and indeed it works well. After applying this patch, Alternator and
Redis test runs no longer end in silly deprecation warnings.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11007
2022-07-11 08:00:45 +03:00
David Garcia
b85843b9cc Fix broken links
Fix broken links
2022-06-28 15:19:36 +01:00
David Garcia
bb21c3c869 Move dev docs to docs/dev
2022-06-24 18:07:08 +01:00
Nadav Har'El
3aca1ca572 alternator: make BatchGetItem group reads by partition
DynamoDB API's BatchGetItem invokes a number (up to 25) of read requests
in parallel, returning when all results are available. Alternator naively
implemented this by sending all read requests in parallel, no matter which
requests these were.

That implementation was inefficient when all the requests are to different
items (clustering rows) of the same partition. In a multi-node setup this
will end up sending 25 separate requests to the same remote node(s). Even
on a single-node setup, this may result in reading from disk more than
once, and even if the partition is cached - doing an O(logN) search in
it multiple times.

What we do in this patch, instead, is to group all the BatchGetItem
requests aimed at the same partition into a single read request
asking for a (sorted) list of clustering keys. This is similar to an
"IN" request in CQL.
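
The grouping idea can be sketched in a few lines of Python. This is only an
illustration of the technique, not the coroutinized C++ code in the patch; the
function name is hypothetical and the keys use plain values rather than
DynamoDB's typed attribute values.

```python
from collections import defaultdict


def group_by_partition(keys, hash_key, sort_key):
    """Group requested keys by partition; sort clustering keys in each group."""
    groups = defaultdict(list)
    for key in keys:
        groups[key[hash_key]].append(key[sort_key])
    # One read per partition, each asking for a sorted list of clustering keys.
    return {partition: sorted(rows) for partition, rows in groups.items()}


requested = [
    {"p": "a", "c": 3},
    {"p": "b", "c": 1},
    {"p": "a", "c": 1},
]
reads = group_by_partition(requested, "p", "c")
```

With the three requested keys above, two reads are issued instead of three:
one for partition "a" and one for partition "b".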

As an example of the performance benefit of this patch, I tried a
BatchGetItem request asking for 20 random items from a 10-million item
partition. I measured the latency of this request on a single-node
Scylla. Before this patch, I saw a latency of 17-21 ms (the lower number
is when the request is retried and the requested items are already in
the cache). After this patch, the latency is 10-14 ms. The performance
improvement on multi-node clusters is expected to be even higher.

Unfortunately the patch is less trivial than I hoped it would be,
because some of the old code was organized under the assumption that
each read request only returned one item (and if it failed, it means
only one item failed), so this part of the code had to be reorganized
(and, for making the code more readable, coroutinized).

An unintended benefit of the code reorganization is that it also gave
me an opportunity to fail an attempt to ask BatchGetItem for the same
item more than once (issue #10757).

The patch also adds a few more corner cases in the tests, to be even
more sure that the code reorganization doesn't introduce a regression
in BatchGetItem.

Fixes #10753
Fixes #10757

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-19 14:47:57 +03:00
Nadav Har'El
0be06e0bdf test/alternator: additional test for BatchGetItem
Our simple test for BatchGetItem on a table with sort keys still has
requests with just one sort key per partition, so if BatchGetItem has
a bug with requesting multiple sort keys from the same partition,
such a bug won't be caught by the simple tests. So in this patch we add a
test that does. This will be useful for the next patch, where we are planning
to refactor BatchGetItem's handling of multiple sort keys in the same
partition - so it will be useful to have more regression tests.

The tests test_batch_get_item_large and test_batch_get_item_partial
would actually also catch such bugs, but they are more elaborate tests
and it's nice to have smaller tests more focused on checking specific
features.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-16 18:19:20 +03:00
Nadav Har'El
e20233dab1 alternator: improve error handling when trying to tag a GSI or LSI
In issue #10786, we raised the idea of maybe allowing to tag (with
TagResource) GSIs and LSIs, not just base tables. However, currently,
neither DynamoDB nor Scylla allows it. So in this patch we add a
test that confirms this. And while at it, we fix Alternator to
return the same error message as DynamoDB in this case.

Refs #10786.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
8866c326de alternator: forbid duplicate index (LSI and GSI) names
Adding an LSI and GSI with the same name to the same Alternator table
should be forbidden - because if both exist only one of them (the GSI)
would actually be usable. DynamoDB also forbids such duplicate names.

So in this patch we add a test for this issue, and fix it.

Since the patch involves a few more uses of the IndexName string,
we also clean up its handling a bit, to use std::string_view instead
of the old-style std::string&.

Fixes #10789

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
00866a75d8 alternator: add ARN for indexes (LSI and GSI)
DynamoDB gives an ARN ("Amazon Resource Name") to LSIs and GSIs. These
look like BASEARN/index/INDEXNAME, where BASEARN is the ARN of the base
table, and INDEXNAME is the name of the LSI or the GSI.
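
A one-line Python sketch of this format, with a hypothetical helper name:

```python
def index_arn(base_table_arn, index_name):
    """Build an index ARN of the form BASEARN/index/INDEXNAME."""
    return f"{base_table_arn}/index/{index_name}"
```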

These ARNs should be returned by DescribeTable as part of its
description of each index, and this patch adds that missing IndexArn
field.

The ARN we're adding here is hardly useful (e.g., as explained in
issue #10786, it can't be used to add tags to the index table),
but nevertheless should exist for compatibility with DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
75c2bd78ae test/alternator: reproducer for GetBatchItem duplicate keys
It turns out that DynamoDB forbids requesting the same item more than
once in a BatchGetItem request. Trying to do it would obviously be a
waste, but DynamoDB outright refuses it - and Alternator currently
doesn't (refs #10757).
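
A minimal Python sketch of such a duplicate check follows. The exception
class, the function name, and the error text are assumptions made for
illustration, and keys are plain dicts with hashable values rather than
DynamoDB's typed attribute values.

```python
class ValidationException(Exception):
    """Stand-in for the API's validation error (hypothetical class)."""


def check_no_duplicate_keys(keys):
    """Raise if the same item key is requested more than once."""
    seen = set()
    for key in keys:
        # Dicts are not hashable, so compare a canonical, sorted tuple form.
        frozen = tuple(sorted(key.items()))
        if frozen in seen:
            raise ValidationException("list of item keys contains duplicates")
        seen.add(frozen)
```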

The test currently passes on DynamoDB and fails on Alternator, so it
is marked xfail.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10758
2022-06-09 07:04:50 +02:00
Nadav Har'El
d0ca09a925 alternator: implement DescribeContinuousBackups operation
Although we don't yet support the DynamoDB API's backup features (see
issue #5063), we can already implement the DescribeContinuousBackups
operation. It should just say that continuous backups, and point-in-time
restores, are disabled.

This will be useful for client code which tries to inquire about
continuous backups, even if not planning to use them in practice
(e.g., see issue #10660).
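
For illustration, a response saying that both features are disabled could
look like the Python sketch below. The field names follow the DynamoDB API
reference as far as I know; the function name is hypothetical, and the exact
response Alternator produces may differ.

```python
def describe_continuous_backups():
    """Report continuous backups and point-in-time recovery as disabled."""
    return {
        "ContinuousBackupsDescription": {
            "ContinuousBackupsStatus": "DISABLED",
            "PointInTimeRecoveryDescription": {
                "PointInTimeRecoveryStatus": "DISABLED",
            },
        }
    }
```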

Refs #5063
Refs #10660

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-26 15:13:50 +03:00
Nadav Har'El
f6ce7891a5 test/alternator: add test for key length limits
DynamoDB limits partition-key length to 2048 bytes and sort-key length
to 1024 bytes. Alternator currently has no such limits officially, but
if a user tries a key length of over 64 KB, the result will be an
"internal server error" as Alternator runs into Scylla's low-level key
length limit of 64 KB.

In this patch we add (mostly xfailing) tests confirming all the above
observations. The tests include extensive comments on what they are
testing and why. Some of these tests (specifically, the ones checking
what happens above 64 KB) should pass once Alternator is fixed. Other
tests - requiring that the limits be exactly what they are in DynamoDB -
may either not pass or change in the future, depending on what we decide
the limits should be in Alternator.

Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10438
2022-04-26 18:09:19 +02:00
Nadav Har'El
84143c2ee5 alternator: implement Select option of Query and Scan
This patch implements the previously-unimplemented Select option of the
Query and Scan operations.

The most interesting use case of this option is Select=COUNT which means
we should only count the items, without returning their actual content.
But there are actually four different Select settings: COUNT,
ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES.

Five previously-failing tests now pass, and their xfail mark is removed:

 *  test_query.py::test_query_select
 *  test_scan.py::test_scan_select
 *  test_query_filter.py::test_query_filter_and_select_count
 *  test_filter_expression.py::test_filter_expression_and_select_count
 *  test_gsi.py::test_gsi_query_select_1

These tests cover many different cases of successes and errors, including
combination of Select and other options. E.g., combining Select=COUNT
with filtering requires us to get the parts of the items needed for the
filtering function - even if we don't need to return them to the user
at the end.

Because we do not yet support GSI/LSI projection (issue #5036), the
support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need
to be in the future, but we can only finish that after #5036 is done.

Fixes #5058.

The most intrusive part of this patch is a change from attrs_to_get -
a map of top-level attributes that a read needs to fetch - to an
optional<attrs_to_get>. This change is needed because we also need
to support the case that we want to read no attributes (Select=COUNT),
and attrs_to_get.empty() used to mean that we want to read *all*
attributes, not no attributes. After this patch, an unset
optional<attrs_to_get> means read *all* attributes, a set but empty
attrs_to_get means read *no* attributes, and a set and non-empty
attrs_to_get means read those specific attributes.
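
The three states can be mirrored in Python with an Optional set. This is a
sketch of the semantics only: the real attrs_to_get is a C++ map, and the
function name here is hypothetical.

```python
from typing import Optional, Set


def attributes_to_read(attrs_to_get: Optional[Set[str]], item: dict) -> dict:
    """Apply the three-state meaning of an optional attrs_to_get."""
    if attrs_to_get is None:
        # Unset: read *all* attributes.
        return dict(item)
    # Set: read only the listed attributes. An empty set reads *no*
    # attributes, which is what Select=COUNT needs.
    return {name: value for name, value in item.items() if name in attrs_to_get}
```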

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-2-nyh@scylladb.com>
2022-04-11 10:04:32 +02:00
Nadav Har'El
9c1ebdceea alternator: forbid empty AttributesToGet
In DynamoDB one can retrieve only a subset of the attributes using the
AttributesToGet or ProjectionExpression parameters to read requests.
Neither allows an empty list of attributes - if you don't want any
attributes, you should use Select=COUNT instead.

Currently we correctly refuse an empty ProjectionExpression - and have
a test for it:
test_projection_expression.py::test_projection_expression_toplevel_syntax

However, Alternator is missing the same empty-forbidding logic for
AttributesToGet. An empty AttributesToGet is currently allowed, and
basically says "retrieve everything", which is sort of unexpected.

So this patch adds the missing logic, and the missing test (actually
two tests for the same thing - one using GetItem and the other Query).

Fixes #10332

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-1-nyh@scylladb.com>
2022-04-11 10:21:02 +03:00
Nadav Har'El
86d01542de test/alternator: test another example of nested function calls
In the existing test we noticed that list_append(if_not_exists(...))
is allowed, but list_append(list_append(...)) is not. I wasn't sure
whether if_not_exists(if_not_exists(..)) will be allowed - and this
test verifies that it is - it works on both Scylla and DynamoDB, and
gives the same results on both.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220407122729.155648-1-nyh@scylladb.com>
2022-04-11 09:56:02 +03:00
Nadav Har'El
67e0590bbc alternator: remove old TODO (with test verifying it)
We had an old TODO in the Alternator "Scan" operation code which
suggested that we may need to do something to limit the size of pages
when a row limit ("Limit") isn't given.

But we do already have a built-in limit on page sizes (1 MB),
so this TODO isn't needed and can be removed.

But I also wanted to make sure we have a test that this limit works:

We already had a test that this 1 MB limit works for a single-partition
Query (test_query.py::test_query_reverse_longish - tested both forward
and reversed queries). In this patch I add a similar test for a whole-
table Scan. It turns out that although page size is limited in this case
as well, it's not exactly 1 MB... For small tables it can even reach 3 MB.
I consider this "good enough" and that we can drop the TODO, but also
opened issue #10327 to document this surprising (for me) finding.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220404145240.354198-1-nyh@scylladb.com>
2022-04-05 09:23:23 +03:00
Nadav Har'El
56936d3c16 test/alternator: add reproducers for scan of long string of tombstones
This patch adds two xfailing tests for issue #7933. That issue is about
what Scan or Query paging does when encountering a very long string of
consecutive tombstones (partition or row tombstones). Ideally, in that
case the scan could stop on one of these tombstones after already
processing too many. But as these two tests demonstrate, the scan can't
stop in the middle of a long string of tombstones - and as a result
retrieving a single page can take an unbounded amount of time, which is
wrong.

Currently the tests are marked `@veryslow` (they each take more than a
minute) because they each create a huge number of tombstones to
demonstrate a huge amount of work for a single page. When we fix
issue #7933 and have a much smaller limit on the number of tombstones
processed in a single page, we can hopefully make these tests much
shorter and remove the `@veryslow` tag. The `@veryslow` tag means
that although these tests can be used manually (with `--runveryslow`)
they will not yet be run as part of the usual regression tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220403070706.250147-1-nyh@scylladb.com>
2022-04-05 09:11:38 +03:00
Nadav Har'El
758f8f01d7 test/alternator: turn REST API finding into a fixture
In test_tracing.py and util.py, we already have three duplicates of code
which looks for the Scylla REST API. We'll soon want to add even more uses
of this REST API, so it's a good time to add a single fixture, "rest_api",
which can be used in all tests that need the Scylla REST API instead of
duplicating the same code.

A test using the "rest_api" fixture will be skipped if the server isn't
Scylla, or its port 10000 is not available or not responsive.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220331195337.64352-1-nyh@scylladb.com>
2022-04-01 10:51:59 +03:00
Nadav Har'El
d8c0680585 test/alternator: add regression test for old ALL_NEW bug
In commit 964500e47a, in the middle of
a larger series, I fixed a small Alternator bug that I found while working
on that series. The bug was that the ReturnValues=ALL_NEW feature moved out
the read previous_item, which breaks operations that need previous_item,
e.g., an ADD operation. Unfortunately, we never had a regression test for
this fixed bug, so in this patch I add one.

This bug was re-discovered on an old branch by a user, at which point
I noticed that we don't have a test for it - so I want to add it now,
even though the bug itself is long gone from Scylla master.

I verified that the new test indeed fails on old versions of Scylla
before the aforementioned commit, and passes when backporting only that
commit.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220327074928.3608576-1-nyh@scylladb.com>
2022-03-28 08:40:28 +02:00
Nadav Har'El
653f2df28f alternator: fix JSON escaping of error responses
In the DynamoDB API, error responses are in JSON format with specific
fields ("__type" and "message" in the x-amz-json-1.0 format currently
used). Alternator tried to be clever and build the string representation
of this JSON itself, instead of using RapidJSON. But this optimization
was a mistake - if the error message contains characters that need
escaping (such as double quotes and newlines), they weren't escaped,
and the resulting JSON was malformed. When the client library boto3
read this malformed JSON it got confused, considered the entire error
response to be a string, which resulted in an ugly error message.

The fix is easy - just build the JSON output as usual with RapidJSON
instead of trying to optimize using string operations.

The patch also includes two tests reproducing this bug and checking its
fix. The first test uses boto3 and shows it got confused on the type
of error (not understanding that it is a ValidationException). The
second test bypasses boto3 and shows exactly where the bug happens -
the response is an unparsable JSON.
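
The bug is easy to reproduce outside Scylla. The Python sketch below is not
the Alternator code (which is C++ and uses RapidJSON); it only contrasts
hand-built JSON with a proper serializer. The function names and the
shortened "__type" value are assumptions for illustration.

```python
import json


def naive_error_body(error_type, message):
    # Hand-built JSON: quotes and newlines in the message are not escaped.
    return '{"__type":"' + error_type + '","message":"' + message + '"}'


def escaped_error_body(error_type, message):
    # Let the JSON library do the escaping.
    return json.dumps({"__type": error_type, "message": message})


msg = 'bad value "x"\nsecond line'

try:
    json.loads(naive_error_body("ValidationException", msg))
    naive_parses = True
except json.JSONDecodeError:
    naive_parses = False

parsed = json.loads(escaped_error_body("ValidationException", msg))
```

Here the hand-built body fails to parse, while the serialized one
round-trips the message unchanged.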

Fixes #10278

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220327132705.3707979-1-nyh@scylladb.com>
2022-03-27 16:32:36 +03:00
Nadav Har'El
49a8164fb7 alternator: add configurable scan period to TTL expiration
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.

In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.

The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.

Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.
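
The timing rule can be written down as a small Python sketch. The function
name is hypothetical, and clamping to zero when a scan takes longer than the
period is my assumption; the text above does not say what happens in that
case.

```python
def sleep_before_next_scan(period_seconds, scan_duration_seconds):
    """Seconds to sleep so the next scan starts one period after the last start."""
    # Assumption: if the scan overran the period, start the next one at once.
    return max(0, period_seconds - scan_duration_seconds)
```

For example, with the default 24-hour period, a scan that took one hour is
followed by 23 hours of sleep.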

Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
4349514064 test/alternator: add smaller reproducer for Limit-less reverse query
The regression test we have for Alternator's issue #9487 (where a reverse
query without a Limit given was broken into 100MB pages instead of the
expected 1MB) is test_query.py::test_query_reverse_long. But this is a
very long test requiring a 100MB partition, and because of its slowness
isn't run by default.

This patch adds another version of that test, test_query_reverse_longish,
which reproduces the same issue #9487 with a partition 50 times shorter
(2MB) so it only takes a fraction of a second and can be enabled by
default. It also requires much less network traffic which is important
when running these tests non-locally.

We leave the original test test_query_reverse_long behind, it can be
still useful to stress Scylla even beyond the 100MB boundary, but it
remains in @veryslow mode so won't run in default test runs.

Refs #9487
Refs #7586

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220220161905.852994-1-nyh@scylladb.com>
2022-02-21 09:12:16 +01:00
Nadav Har'El
f292d3d679 alternator: make schema modifications in CreateTable atomic
The Alternator CreateTable operation currently performs several schema-
changing operations separately - one by one: It creates a keyspace,
a table in that keyspace and possibly also multiple views, and it sets
tags on the table. A consequence of this is that concurrent CreateTable
and DeleteTable operations (for example) can result in unexpected errors
or inconsistent states - for example CreateTable wants to create the
table in the keyspace it just created, but a concurrent DeleteTable
deleted it. We have two issues about this problem (#6391 and #9868)
and three tests (test_table.py::test_concurrent_create_and_delete_table)
reproducing it.

In this patch we fix these problems by switching to the modern Scylla
schema-changing API: Instead of doing several schema-changing
operations one by one, we create a vector of schema mutations performing
all these operations - and then perform all these mutations together.

When the experimental Raft-based schema modification feature is enabled, this
completely solves the races, and the tests begin to pass. However, if
the experimental Raft mode is not enabled, these tests continue to fail
because there is still no locking while applying the different schema
mutations (not even on a single node). So I put a special fixture
"fails_without_raft" on these tests - which means that the tests
xfail if run without Raft, and are expected to pass when run with Raft.

Indeed, after this patch
test/alternator/run --raft test_table.py::test_concurrent_create_and_delete_table

shows three passing tests (they also pass if we drastically increase the
number of iterations), while
test/alternator/run test_table.py::test_concurrent_create_and_delete_table

shows three xfailing tests.

All other Alternator tests pass as before with this patch, verifying
that the handling of new tables, new views, tags, and CDC log tables,
all happen correctly even after this patch.

A note about the implementation: Before this patch, the CreateTable code
used high-level functions like prepare_new_column_family_announcement().
These high-level functions become unusable if we write multiple schema
operations to one list of mutations, because for example this function
validates that the keyspace had already been created - when it hasn't
and that's the whole point. So instead we had to use lower-level
functions like add_table_or_view_to_schema_mutation() and
before_create_column_family(). However, despite being lower level,
these functions were public so I think it's reasonable to use them,
and we probably have no other alternative.

Fixes #6391
Fixes #9868

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-18 09:03:52 +02:00
Nadav Har'El
212c321c55 test/alternator: add reproducers for non-atomic table creation
We add reproducing tests for two known Alternator issues, #6391 and #9868,
which involve the non-atomicity of table creation. Creating a table
currently involves multiple steps - creating a keyspace, a table,
materialized views, and tags. If some of these steps succeed and some
fail, we get an InternalServerError and potentially leave behind some
half-built table.

Both issues will be solved by making better use of the new Raft-based
capabilities of making multiple modifications to the schema atomically,
but this patch doesn't fix the problem - it just proves they exist.

The new tests involve two threads - one repeatedly trying to create a
table with a GSI or with tags - and the other thread repeatedly trying
to delete the same table under its feet. Both bugs are reproduced almost
immediately.

Note that like all test/alternator tests, the new tests are usually run on
just one node. So when we fix the bug and these tests begin to pass,
it will not be a proof that concurrent schema modification works safely
on *different* nodes. To prove that, we will also need a multi-node test.
However, this test can prove that we used Raft-based schema modification
correctly - and if we assume that the Raft-based schema modification
feature is itself correct, then we can be sure that CreateTable will be
correct also across multiple nodes. Although it won't hurt to check it
directly.

Refs #6391
Refs #9868

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220207223100.207074-1-nyh@scylladb.com>
2022-02-14 18:21:21 +02:00
Nadav Har'El
4937270803 test/alternator: add option to run with Raft-based schema changes
This patch adds a "--raft" option to test/alternator/run to enable the
experimental Raft-based schema changes ("--experimental-features=raft")
when running Scylla for the tests. This is the same option we added to
test/cql-pytest/run in a previous patch.

Note that we still don't have any Alternator tests that pass or fail
differently in these two modes - these will probably come later as we
fix issues #9868 and #6391. But in order to work on fixing those issues
we need to be able to run the tests in Raft mode.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209123144.321344-1-nyh@scylladb.com>
2022-02-10 09:43:10 +02:00
Nadav Har'El
9982a28007 alternator: allow REMOVE of non-existent nested attribute
DynamoDB allows an UpdateItem operation "REMOVE x.y" when a map x
exists in the item, but x.y doesn't - the removal silently does
nothing. Alternator incorrectly generated an error in this case,
and unfortunately we didn't have a test for this case.

So in this patch we add the missing test (which fails on Alternator
before this patch - and passes on DynamoDB) and then fix the behavior.
After this patch, "REMOVE x.y" will remain an error if "x" doesn't
exist (saying "document paths not valid for this item"), but if "x"
exists and is a map, but "x.y" doesn't, the removal will silently
do nothing and will not be an error.
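
A Python sketch of these rules on a plain dict, for illustration only. The
exception class and function name are hypothetical, and treating an "x" that
exists but is not a map as an error is my assumption.

```python
class ValidationException(Exception):
    """Stand-in for the API's validation error (hypothetical class)."""


def remove_nested(item, top, nested):
    """Apply "REMOVE top.nested" to an item held as a plain dict."""
    # Missing "x" stays an error. Assumption: a non-map "x" is one as well.
    if top not in item or not isinstance(item[top], dict):
        raise ValidationException("document paths not valid for this item")
    # "x" is a map: removing a missing "x.y" silently does nothing.
    item[top].pop(nested, None)
    return item
```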

Fixes #10043.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220207133652.181994-1-nyh@scylladb.com>
2022-02-07 18:40:48 +02:00
Nadav Har'El
8a745593a2 Merge 'alternator: fill UnprocessedKeys for failed batch reads' from Piotr Sarna
The DynamoDB protocol specifies that when getting items in a batch
fails only partially, unprocessed keys can be returned so that
the user can perform a retry.
Alternator used to fail the whole request if any of the reads failed,
but right now it instead produces the list of unprocessed keys
and returns them to the user, as long as at least 1 read was
successful.
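
The rule can be sketched in Python as follows. This is a simplification for
illustration: the real response groups items and keys per table, the reads
run in parallel, and the names batch_get and read_one are hypothetical.

```python
def batch_get(keys, read_one):
    """Read each key; report failed keys instead of failing the whole batch."""
    responses, unprocessed = [], []
    first_error = None
    for key in keys:
        try:
            responses.append(read_one(key))
        except Exception as error:
            unprocessed.append(key)
            if first_error is None:
                first_error = error
    # If every read failed, the whole request is an error.
    if keys and not responses:
        raise first_error
    return {"Responses": responses, "UnprocessedKeys": unprocessed}
```

A caller is expected to retry the keys listed under UnprocessedKeys.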

This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes which come with error injection compiled in. In release mode, expect to see the following message:
SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode)

Fixes #9984

Closes #9986

* github.com:scylladb/scylla:
  test: add total failure case for GetBatchItem
  test: add error injection case for GetBatchItem
  test: add a context manager for error injection to alternator
  alternator: add error injection to BatchGetItem
  alternator: fill UnprocessedKeys for failed batch reads
2022-01-31 15:28:24 +02:00
Piotr Sarna
c87126198d test: add total failure case for GetBatchItem
The test verifies that if all reads from a batch operation
failed, the result is an error, and not a success response
with UnprocessedKeys parameter set to all keys.
2022-01-31 14:21:55 +01:00
Piotr Sarna
e79c2943fc test: add error injection case for GetBatchItem
The new test case is based on Scylla error injection mechanism
and forces a partial read by failing some requests from the batch.
2022-01-31 14:21:55 +01:00
Piotr Sarna
99c5bec0e2 test: add a context manager for error injection to alternator
With the new context manager it's now easier to request an error
to be injected via REST API. Note that error injection is only
enabled in certain build modes (dev, debug, sanitize)
and the test case will be skipped if it's not possible to use
this mechanism.
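
The general shape of such a context manager is sketched below. It is not the helper added by this commit: `injected_error` is a hypothetical name, and the `enable` and `disable` callables stand in for whatever REST requests the real helper sends.

```python
from contextlib import contextmanager


@contextmanager
def injected_error(enable, disable, name):
    """Keep the error injection `name` enabled for the with-block only.

    `enable` and `disable` are caller-supplied callables (hypothetical
    here) that turn the injection on and off.
    """
    enable(name)
    try:
        yield name
    finally:
        # Always switch the injection off, even if the test body raised,
        # so that later tests are not affected.
        disable(name)
```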
2022-01-31 14:21:55 +01:00
Nadav Har'El
a25e265373 test/alternator: improve comment on why we need "global_random"
Improve the comment that explains why we needed to use an explicitly
shared random sequence instead of the usual "random". We now understand
that we need this workaround to undo what the pytest-randomly plugin does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>
2022-01-31 10:07:56 +01:00
Piotr Sarna
471205bdcf test/alternator: use a global random generator for all test cases
It was observed (perhaps it depends on the Python implementation)
that an identical seed was used for multiple test cases,
which violated the assumption that generated values are in fact
unique. Using a global generator instead makes sure that it is
seeded only once.
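
The difference can be illustrated with the standard "random" module. This is a sketch, not the test suite's code: the seed value and the helper functions are hypothetical, and only the name global_random echoes the tests.

```python
import random
import string

# One generator for the whole session, seeded exactly once.
global_random = random.Random(12345)  # hypothetical fixed seed


def random_name(rng, length=10):
    """Draw a lowercase name of `length` characters from `rng`."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def names_with_reseeding(seed, count):
    # Models the failure mode: every "test case" starts over from the
    # same seed, so every case generates the same name.
    return [random_name(random.Random(seed)) for _ in range(count)]


def names_with_shared_generator(rng, count):
    # Every "test case" continues the same sequence instead.
    return [random_name(rng) for _ in range(count)]
```

Reseeding yields the same name for every case, whereas the shared generator keeps advancing and so yields a different name each time.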

Tests: unit(dev) # alternator tests used to fail for me locally
  before this patch was applied
Message-Id: <315d372b4363f449d04b57f7a7d701dcb9a6160a.1643365856.git.sarna@scylladb.com>
2022-01-30 16:40:20 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except for
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Nadav Har'El
a30e71e27a alternator: doc, test: fix mentions of reverse queries
Now that issues #7586 and #9487 are fixed and reverse queries - even in
long partitions - work well, we can drop the claim in
alternator/docs/compatibility.md that reverse queries are buggy for
large partitions.

We can also remove the "xfail" mark from the test that checks this
feature, as it now passes.

Refs #7586
Refs #9487

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #9831
2022-01-16 17:46:26 +02:00
Nadav Har'El
e7e9001808 test/alternator: add more tests for GSI "Projection"
We already have multiple tests for the unimplemented "Projection" feature
of GSI and LSI (see issue #5036). This patch adds seven more test cases,
focusing on various types of error conditions (e.g., trying to project
the same attribute twice), esoteric corner cases (it's fine to list a key in
NonKeyAttributes!), and corner cases that I expect we will have in our
implementation (e.g., a projected attribute may either be a real Scylla
column or just an element in a map column).

All new tests pass on DynamoDB and fail on Alternator (due to #5036), so
marked with "xfail".

Refs #5036.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211228193748.688060-1-nyh@scylladb.com>
2022-01-05 10:35:36 +02:00
Nadav Har'El
31eeb44d28 alternator: fix error on UpdateTable for non-existent table
When the UpdateTable operation is called for a non-existent table, the
appropriate error is ResourceNotFoundException, but before this patch
we ran into an exception, which resulted in an ugly "internal server
error".

In this patch we use the existing get_table() function which most other
operations use, and which does all the appropriate verifications and
generates the appropriate Alternator api_error instead of letting
internal Scylla exceptions escape to the user.

This patch also includes a test for UpdateTable on a non-existent table,
which used to fail before this patch and pass afterwards. We also add a
test for DeleteTable in the same scenario, and see it didn't have this
bug. As usual, both tests pass on DynamoDB, which confirms we generate
the right error codes.

Fixes #9747.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211206181605.1182431-1-nyh@scylladb.com>
2021-12-14 13:09:27 +01:00
Nadav Har'El
815324713e test/alternator: add more tests for ADD operand mismatch
The "ADD" operator in UpdateItem's AttributeUpdates supports a number of
types (numbers, sets and strings), and should result in a
ValidationException if the attribute's existing type is different from
the type of the operand - e.g., trying to ADD a number to an attribute
which has a set as a value.

So far we only had partial testing for this (we tested the case where
both operands are sets, but of different types) so this patch adds the
missing tests. The new tests pass (on both Alternator and DynamoDB) -
we don't have a bug there.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211213195023.1415248-1-nyh@scylladb.com>
2021-12-14 11:15:23 +02:00
Nadav Har'El
03d67440ef alternator: test additional metrics and fix another broken counter
In issue #9406 we noticed that a counter for BatchGetItem operations
was missing. When we fixed it, we added a test which checked this
counter - but only this counter. It was left as a TODO to test the rest
of the Alternator metrics, and this is what this patch does.

Here we add a comprehensive test for *all* of the operations supported
by Scylla and how they increase the appropriate operation counter.

With this test we discovered a new bug: the DescribeTimeToLive operation
incremented the UpdateTimeToLiveCounter :-( So in this patch we also
include a fix for that bug, and the new test verifies that it is fixed.

In addition to the operation counters, Alternator also has additional
metrics, and we also added tests for some of them - but not all. The
remaining untested metrics are listed in a TODO comment.
Message-Id: <20211206154727.1170112-1-nyh@scylladb.com>
2021-12-10 08:08:54 +02:00
Piotr Sarna
26288c1a86 test,alternator: make TTL tests less prone to false negatives
On my local machine, a 3 second deadline proved to cause flakiness
of test_ttl_expiration case, because its execution time is just
around 3 seconds.
This patch addresses the problem by bumping the local timeout to 10
(and 15 for test_ttl_expiration_long, since it's dangerously near
the 10 second deadline on my machine as well).

Moreover, some test cases short-circuited once they detected that
all needed items expired, but other ones lacked it and always used
their full time slot. Since 10 seconds is a little too long for
a single test case, even one marked with --veryslow, this patch
also adds a couple of other short-circuits.
One exception is test_ttl_expiration_hash_wrong_type, which actually
depends on the fact that we should wait for the whole loop to finish.
Since this case was never flaky for me with the 3 second timeout,
it's left as is.
Theoretically, test_ttl_expiration also kind of depends on checking
the condition more than once (because the TTL of one of the values
is bumped on each iteration), but empirical evidence shows that
multiple iterations always occur in this test case anyway - for
me, it always spun at least 3 times.
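
The short-circuit pattern discussed above can be sketched as a small polling helper. This is an illustration and not the code in test_ttl.py: `wait_until` and its parameters are hypothetical names, and the injectable `clock` and `sleep` exist only to make the sketch easy to exercise.

```python
import time


def wait_until(condition, timeout, interval=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it is true or `timeout` seconds have passed.

    Returns True as soon as the condition holds (the short-circuit), or
    False once the deadline is reached.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True  # stop early instead of using the full time slot
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test such as test_ttl_expiration_hash_wrong_type, which needs the whole loop to run, would not use the early return.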

Tests: unit(release)

Message-Id: <a0a479929dac37daace744e0a970567a8aa3b518.1638431933.git.sarna@scylladb.com>
2021-12-08 16:02:45 +02:00
Nadav Har'El
92e7fbe657 test/alternator: check correct error for unknown operation
Add a short test verifying that Alternator responds with the correct
error code (UnknownOperationException) when receiving an unknown or
unsupported operation.

The test passes on both AWS and Alternator, confirming that the behavior
is the same.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211206125710.1153008-1-nyh@scylladb.com>
2021-12-08 13:56:38 +02:00
Nadav Har'El
d3abff9ea1 test/alternator: validate that TagResource needs a Tags parameter
A short new test to verify that in the TagResource operation, the
Tags parameter - specifying which tags to set - is required.

The test passes on both AWS and Alternator - they both produce a
ValidationException in this case (the specific human-readable error
message is different, though, so we don't check it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211206140541.1157574-1-nyh@scylladb.com>
2021-12-06 15:08:16 +01:00
Calle Wilund
3e21fea2b6 test_streamts: test_streams_starting_sequence_number fix 'LastEvaluatedShardId' usage
It is not part of the raw response, but of the 'StreamDescription' object.
The test fails intermittently depending on PK randomization.

Closes #9710
2021-12-01 11:05:40 +02:00
Nadav Har'El
d9c5c4eab6 test/alternator: tests for Select parameter in GSI and LSI
We already have tests for the behavior of the "Select" parameter when
querying a base table, but this patch adds additional tests for its
behavior when querying a GSI or a LSI. There are some differences:
Select=ALL_PROJECTED_ATTRIBUTES is not allowed for base tables, but is
allowed - and in fact is the default - for GSI and LSI. Also, GSI may
not allow ALL_ATTRIBUTES (which is the default for base tables) if
only a subset of the attributes were projected.

The new tests xfail because the Select and Projection features have
not yet been implemented in Alternator. They pass in DynamoDB.
After this patch we have (hopefully) complete test coverage of the
Select feature, which will be helpful when we start implementing it.

Refs #5058 (Select)
Refs #5036 (Projection)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211125100443.746917-1-nyh@scylladb.com>
2021-11-29 20:28:43 +01:00
Nadav Har'El
1c279118f4 test/alternator: more test cases for Select parameter
Add to the existing tests for the Select parameter of the Query and Scan
operations another check: That when Select is ALL_ATTRIBUTES or COUNT,
specifying AttributesToGet or ProjectionExpression is forbidden -
because the combination doesn't make sense.
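
The rule being tested can be written down as a small validator. This is a sketch of the rule only, not of Alternator or of the test: `check_select` is a hypothetical name, and ValueError stands in for the ValidationException that the real service returns.

```python
def check_select(select, attributes_to_get=None, projection_expression=None):
    """Reject Select values that conflict with an explicit attribute list."""
    asks_for_attributes = bool(attributes_to_get) or bool(projection_expression)
    if select in ("ALL_ATTRIBUTES", "COUNT") and asks_for_attributes:
        # ValueError is a stand-in for ValidationException in this sketch.
        raise ValueError(
            "Select=%s cannot be combined with AttributesToGet or "
            "ProjectionExpression" % select)
    return select
```

Only the combination named above is covered; other rules for the Select parameter are left out of this sketch.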

The expanded test continues to xfail on Alternator (because the Select
parameter isn't yet implemented), and passes on DynamoDB. Strengthening
the tests for this feature will be helpful when we decide to implement it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211125074128.741677-1-nyh@scylladb.com>
2021-11-29 20:28:25 +01:00
Piotr Sarna
ecd122a1b0 Merge 'alternator: rudimentary implementation of TTL expiration service' from Nadav Har'El
In this patch series we add an implementation of an
expiration service to Alternator, which periodically scans the data in
the table, looking for expired items and deleting them.

We also continue to improve the TTL test suite to cover additional
corner cases discovered during the development of the code.

This implementation is good enough to make all existing tests but one,
plus a few new ones, pass, but is still a very partial and inefficient
implementation littered with FIXMEs throughout the code. Among other
things, this initial implementation doesn't do anything reasonable about pacing of
the scan or about multiple tables, it scans entire items instead of only the
needed parts, and because each shard "owns" a different subset of the
token ranges, if a node goes down, partitions which it "owns" will not
get expired.

The current tests cannot expose these problems, so we will need to develop
additional tests for them.

Because this implementation is very partial, the Alternator TTL continues
to remain "experimental", cannot be used without explicitly enabling this
experimental feature, and must not be used for any important deployment.

Refs #5060 but doesn't close the issue (let's not close it until we have a
reasonably complete implementation - not this partial one).

Closes #9624

* github.com:scylladb/scylla:
  alternator: fix TTL expiration scanner's handling of floating point
  test/alternator: add TTL test for more data
  test/alternator: remove "xfail" tag from passing tests in test_ttl.py
  test/alternator: make test_ttl.py tests fast on Alternator
  alternator: initial implmentation of TTL expiration service
  alternator: add another unwrap_number() variant
  alternator: add find_tag() function
  test/alternator: test another corner case of TTL setting
  test/alternator: test TTL expiration for table with sort key
  test/alternator: improve basic test for TTL expiration
  test/alternator: extract is_aws() function
2021-11-28 22:12:52 +02:00