Commit Graph

173 Commits

Author SHA1 Message Date
Nadav Har'El
6e1344eb4f alternator: better error handling for wrongly-encoded numbers
In the DynamoDB API, a number is encoded in JSON requests as something
like: {"N": "123"} - the type is "N" and the value "123". Note that the
value of the number is encoded as a string, because the floating-point
range and accuracy of DynamoDB differs from what various JSON libraries
may support.

We have a function unwrap_number() which supported the value of the
number being encoded as an actual number, not a string. But we should
NOT support this case - DynamoDB doesn't. In this patch we add a test
that confirms that DynamoDB doesn't, and remove the unnecessary case
from unwrap_number(). The unnecessary case also had a FIXME, so it's
a good opportunity to get rid of a FIXME.

When writing the test, I noticed that the error which DynamoDB returns
in this case is SerializionException instead of the more usual
ValidationException. I don't know why, but let's also change the error
type in this patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211115125738.197099-1-nyh@scylladb.com>
2021-11-15 14:47:49 +01:00
Nadav Har'El
b95e431228 alternator: fix bug in ReturnValues=ALL_NEW
This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in
some cases returned the OLD (pre-modification) value of some of the
attributes, instead of its NEW value.

The bug was caused by a confusion in our JSON utility function,
rjson::set(), which sounds like it can set any member of a map, but in
fact may only be used to add a *new* member - if a member with the same
name (key) already existed, the result is undefined (two values for the
same key). In ReturnValues=ALL_NEW we did exactly this: we started with
a copy of the original item, and then used set() to override some of the
members. This is not allowed.

So in this patch, we introduce a new function, rjson::replace(), which
does what we previously thought that rjson::set() does - i.e., replace a
member if it exists, or if not, add it. We call this function in
the ReturnValues=ALL_NEW code.

This patch also adds a test case that reproduces the incorrect ALL_NEW
results - and gets fixed by this patch.

In an upcoming patch, we should rename the confusingly-named set()
functions and audit all their uses. But we don't do this in this patch
yet. We just add some comments to clarify what set() does - but don't
change it, and just add one new function for replace().

Fixes #9542

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104134937.40797-1-nyh@scylladb.com>
2021-11-04 16:34:58 +01:00
Piotr Sarna
f36bbe05b4 Merge 'alternator: add support for AttributeUpdates Add operation' from Nadav Har'El
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).

This two-patch series add this missing feature. The first patch just moves
an existing function to where we can reuse it, and the second patch is
the actual implementation of the feature (and enabling its test).

Fixes #5893

Closes #9574

* github.com:scylladb/scylla:
  alternator: add support for AttributeUpdates ADD operation
  alternator: move list_concatenate() function
2021-11-03 09:33:50 +01:00
Nadav Har'El
00335b1901 alternator: add support for AttributeUpdates ADD operation
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).

This patch adds this feature, and the test for it begins to pass so its
"xfail" marker is removed.

Fixes #5893

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-11-03 10:19:26 +02:00
Nadav Har'El
56eb994d8f alternator: allow Authorization header to be without spaces
The "Authorization" HTTP header is used in DynamoDB API to sign
requests. Our parser for this header, in server::verify_signature(),
required the different components of this header to be separated by
a comma followed by a whitespace - but it turns out that in DynamoDB
both spaces and commas are optional - one of them is enough.

At least one DynamoDB client library - the old "boto" (which predated
boto3) - builds this header without spaces.

In this patch we add a test that shows that an Authorization header
with spaces removed works fine in DynamoDB but didn't work in
Alternator, and after this patch modifies the parsing code for this
header, the test begins to pass (and the other tests show that the
previously-working cases didn't break).

Fixes #9568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211101214114.35693-1-nyh@scylladb.com>
2021-11-03 06:38:28 +02:00
Nadav Har'El
6ae0ea0c48 alternator: return the correct Content-Type header
Although the DynamoDB API responses are JSON, additional conventions apply
to these responses - such as how error codes are encoded in JSON. For this
reason, DynamoDB uses the content type `application/x-amz-json-1.0` instead
of the standard `application/json` in its responses.

Until this patch, Scylla used `application/json` in its responses. This
unexpected content-type didn't bother any of the AWS libraries which we
tested, but it does bother the aiodynamo library (see HENNGE/aiodynamo#27).

Moreover, we should return the x-amz-json-1.0 content type for future
proofing: It turns out that AWS already defined x-amz-json-1.1 - see:
https://awslabs.github.io/smithy/1.0/spec/aws/aws-json-1_1-protocol.html
The 1.1 content type differs (only) in how it encodes error replies.
If one day DynamoDB starts to use this new reply format (it doesn't yet)
and if DynamoDB libraries will need to differenciate between the two
reply formats, Alternator better return the right one.

This patch also includes a new test that the Content-Type header is
returned with the expected value. The test passes on DynamoDB, and
after this patch it starts to pass on Alternator as well.

Fixes #9554.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211031094621.1193387-1-nyh@scylladb.com>
2021-10-31 10:50:25 +01:00
Nadav Har'El
1d751491a3 test/alternator: recognize when Scylla crashes
Before this patch, if Scylla crashes during some test in test/alternator,
all tests after it will fail because they can't connect to Scylla - and we
can get a report on hundreds of failures without a clear sign of where the
real problem was.

This patch introduces an autouse fixture (i.e., a fixture automatically
used by every test) which tries to run a do-nothing health-check request
after each test. If this health-check request fails, we conclude that
Scylla crashed and report the test in which this happened - and exit
pytest instead of failing a hundred more tests.

The failure report looks something like this:
```
! _pytest.outcomes.Exit: Scylla appears to have crashed in test test_batch.py::test_batch_get_item !
```
And the entire test run fails.

These extra health checks are not free, but they come fairly close to
being free: In my tests I measured less than 0.1 seconds slowdown of
the entire test suite (which has 618 tests) caused by the extra health
checks.

Fixes #9489

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211017123222.217559-1-nyh@scylladb.com>
2021-10-17 20:45:30 +03:00
Nadav Har'El
86e8979ff2 test/alternator, test/cql-pytest: enable specific experimental features
Issue #9467 deprecated the blanket "--experimental" option which we
used to enable all experimental Scylla features for testing, and
suggests that individual experimental features should be enabled
instead.

So this is what we do in this patch for the Scylla-running scripts
in test/alternator and test/cql-pytest: We need to enable UDF for
the CQL tests, and to enable Alternator Streams and Alternator TTL
for the Alternator tests.

Refs #9467

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-2-nyh@scylladb.com>
2021-10-15 16:36:35 +03:00
Nadav Har'El
77bd4afda7 test/alternator: avoid client-side validation
Ever since we started testing Alternator with tests written in Python
and using Amazon's "boto3" library, one limitation kept annoying us:

Boto3 verifies the validity of the request parameters before passing
them on to the server. It verifies that mandatory parameters are not
missing, that parameters have the right types, and sometimes even the
right ranges - all in the library before ever sending the request.
This meant that in many cases, we couldn't get good test coverage for
Alternator's server-side handling of *wrong* parameters.

As it turns out, it is trivial to tell boto3 to *not* do its client-side
request validation, with the `parameter_validation=False` config flag.
We just never noticed that such a flag existed :-)

So this patch adds this flag. It then fixes a few tests which expected
ParameterValidationError - this error is the client-side validation
failure, but should now be replaced by checking the server-side error.

The patch also adds a couple of invalid parameter checks that we
couldn't do before because of boto3's eagerness to check them on the
client side. We can add a lot more of these error tests in the future,
now that we got rid of client-side validation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211005095514.537226-1-nyh@scylladb.com>
2021-10-05 13:26:51 +02:00
Nadav Har'El
6dee86eade test/alternator: another test for adding a GSI to an existing table
This patch adds yet another test for Alternator's unimplemented feature
of adding a GSI to an already existing table (issue #5022), but this
test is for a very specific corner case - tables which contain string
attributes with an empty value - the corner case described in
issue #9424:

DynamoDB used to forbid any string attributes from being set to an empty
string, but this changed in May 2020, and since then empty strings are
allowed - but NOT as keys. So although it is legal to set a string
attribute to an empty string, if this table has a GSI whose key is that
specific attribute, the update command is refused. We already had a
test for this - test_gsi_empty_value.

However, the case in this patch is the case where a GSI is added to a
table *after* the table already has data. In this case (as this test
demonstrates), we are supposed to drop the items which have the empty
string key from the GSI.

Even when #5022 (the ability to add GSIs to existing tables) will be done,
this test will continue to fail. The unique problem of this test is that
Scylla's materialized views *do* allow empty strings as clustering keys
(right now) and even partition keys (after #9375 will be solved), while
we don't want them to enter the GSI. We will probably need to add to the
view's filter, which right now contains (as required) "x IS NOT NULL"
also the filter "x != ''" (when x's type is a string or binary) so that
items with empty-string keys will be dropped.

Refs #5022
Refs #9375
Refs #9424

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211003170636.477582-1-nyh@scylladb.com>
2021-10-05 13:26:43 +02:00
Nadav Har'El
b136104298 alternator/test: test for invalid numeric values
DynamoDB has a rather baroque definition of numbers, and in particular
it does *not* allow numeric attributes to be set to infinity or NaN.

Although I did check invalid numbers in the past, manually, I was never
able to write a unit test for this in the past - because the boto3
library catches such errors on the client side, and prevents the test from
sending broken requests to the server. So in this patch, I finally came up
with a solution - a context manager client_no_transform() which
yields a client which does NOT do any transformation or validation
on the request's parameters, allowing us to use boto3 to create
improper requests - and test the server's handling of them.

The test in this patch passes - it did not discover a new bug, but
it is a useful regression test and the client_no_transform() trick
can be used in more error-case tests which until now we were unable
to write.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211004161809.520236-1-nyh@scylladb.com>
2021-10-05 13:13:45 +02:00
Nadav Har'El
414b672e22 test/alternator: verify that empty-string keys are NOT allowed
Since May 2020 empty strings are allowed in DynamoDB as attribute values
(see announcment in [1]). However, they are still not allowed as keys.

We had tests that they are not allowed in keys of LSI or GSI, but missed
tests that they are not allowed as keys (partition or sort key) of base
tables. This patch add these missing tests.

These tests pass - we already had code that checked for empty keys and
generated an appropriate error.

Note that for compatibility with DynamoDB, Alternator will forbid empty
strings as keys even though Scylla *does* support this possibility
(Scylla always supported empty strings as clustering key, and empty
partition keys will become possible with issue #9352).

[1] https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dynamodb-now-supports-empty-values-for-non-key-string-and-binary-attributes-in-dynamodb-tables/

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211003122842.471001-1-nyh@scylladb.com>
2021-10-04 08:40:43 +02:00
Nadav Har'El
1edcc3a218 test/alternator: add test for reverse queries
This patch adds a reproducer for issue #7586 - that Alternator queries
(Query) operating in reverse order (ScanIndexForward = false) are
artificially limited to 100 MB partitions because of their memory use.

This test generates a partition over 100 MB in size and then tries various
reverse queries on it - with or without Limit, starting at the end or
the middle of the partition. The test currently fails when a reverse query
refuses to operate on such a large partition - the log reports this:

  ERROR ... Memory usage of reversed read exceeds hard limit of 104857600
  (configured via max_memory_for_unlimited_query_hard_limit), while reading
  partition K1H6ON3A1C

With yet-uncommitted reverse-scan improvements, the test proceeds further,
but still fails where we test that a reverse query with Limit not
explicitly specified should still be limited to a certain size (e.g. 1MB)
and cannot return the entire 100 MB partition in one response.

Please note that this is not a comprehensive test for Scylla's reverse
scan implementation: In particular we do not have separate tests for
reverse scan's implementation on different sources - memtables, sstables,
or the cache. Nor do we check all sorts of edge cases. We assume that
Scylla's reverse scan implementation will have its own unit tests
elsewhere that will check these things - and this test can focus on the
Alternator use case.

This test is marked "xfail" because it still fails on Alternator. It is
marked "veryslow" because it's a (relatively) slow test, taking multiple
seconds to set up the 100 MB partition. So run the test with the
pytest options "--runxfail --runveryslow" to see how it fails.

Refs #7586

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210930063700.407511-1-nyh@scylladb.com>
2021-09-30 09:34:39 +02:00
Nadav Har'El
5cbe9178fd alternator: add missing BatchGetItem metric
Unfortunately, defining metrics in Scylla requires some code
duplication, with the metrics declared in one place but exported in a
different place in the code. When we duplicated this code in Alternator,
we accidentally dropped the first metric - for BatchGetItem. The metric
was accounted in the code, but not exported to Prometheus.

In addition to fixing the missing metric, this patch also adds a test
that confirms that the BatchGetItem metric increases when the
BatchGetItem operation is used. This test failed before this patch, and
passes with it. The test only currently tests this for BatchGetItem
(and BatchWriteItem) but it can be later expanded to cover all the other
operations as well.

Fixes #9406

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210929121611.373074-1-nyh@scylladb.com>
2021-09-29 14:16:54 +02:00
Nadav Har'El
88177d7be7 test/alternator: add test for too many items in BatchWriteItem
DynamoDB limits the number of items that a BatchWriteItem call can write
to 25. As noted in issue #5057, in Alternator we don't have this limit
or any limit on the number of items in a BatchWriteItem - which probably
isn't wise.

This patch adds a simple xfailing test for this.

Refs #5057

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210912140736.76995-1-nyh@scylladb.com>
2021-09-29 10:48:58 +02:00
Nadav Har'El
58078a3f84 test/alternator: rename utility function test_table_name()
We have a utility function test_table_name() to create a unique name for
a test table. The funny thing is, that because this function starts with
the string "test_", pytest believes it's a test. This doesn't cause any
problems (it's consider a *passing* test), but it's nevertheless strange
to see it listed on the list of tests.

So in this page, we trivially rename this function to unique_table_name(),
a name why pytest doesn't think is the name of test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-09-19 21:05:21 +03:00
Nadav Har'El
4ffd8c1f2b alternator: stub TTL operations
This patch adds stubs for the UpdateTimeToLive and DescribeTimeToLive
operations to Alternator. These operations can enable, disable, or inquire
about, the chosen expiration-time attribute.

Currently, the information about the chosen attribute is only saved, with
no actual expiration of any items taking place.

Some of the tests for the TTL feature start to pass, so their xfail tag
is removed.

Because this this new feature is incomplete, it is not enabled unless
the "alternator-ttl" experimental feature is enabled. Moreover, for
these operations to be allowed, the entire cluster needs to support
this experimental feature, because all nodes need to participate in the
data expiration - if some old nodes don't support Alternator TTL, some
of the data they hold won't get expired... So we don't allow enabling
TTL until all the nodes in the cluster support this feature.

The implementation is in a new source file, alternator/ttl.cc. This
source file will continue to grow as we implement the expiration feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-09-19 21:05:21 +03:00
Nadav Har'El
82d2942ac8 test/alternator: test another corner case of TTL
Usually the TTL feature's expiration-time attribute is a schema-less
attribute, implemented in Alternator as a JSON-serialized item in a
bigger map column. However, key attributes are a special case because
they are implemented as separate columns. We already had test cases
showing that this case works too - for the case of hash and range keys.

In this test we test another possibility of an attribute that is
implemented as a schema column - the case of an LSI key.

As the other TTL tests, this test too passes on DynamoDB but xfails on
Alternator because the TTL feature is not yet implemented.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-09-19 21:05:21 +03:00
Nadav Har'El
1d4474d543 test/alternator/run: don't run Scylla if "--aws" option
The test/alternator/run script runs Scylla and then runs pytest against
it. But when passing the "--aws" option, the intention is that these
tests be run against AWS DynamoDB, not a local Scylla, so there is no
point in starting Scylla at all - so this is what we do in this patch.

This doesn't really add a new feature - "test/alternator/run --aws"
will now be nothing more than "cd test/alternator; pytest --aws".
But it adds the convenience that you can run the same tests on Scylla
and AWS with exactly the same "run" command, just adding the "--aws"
option, and don't need to sometimes use "run" and sometimes "pytest".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210912133239.75463-1-nyh@scylladb.com>
2021-09-12 16:50:38 +03:00
Nadav Har'El
b3f4a37a75 test/alternator: verify that nulls are valid inside string and bytes
The tests in this patch verify that null characters are valid characters
inside string and bytes (blob) attributes in Alternator. The tests
verify this for both key attributes and non-key attributes (since those
are serialized differently, it's important to check both cases).

The tests pass on both DynamoDB and Alternator - confirming that we
don't have a bug in this area.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210824163442.186881-1-nyh@scylladb.com>
2021-09-03 08:49:06 +02:00
Liu Lan
a5c54867f8 alternator: Exclusive start key must lie within the segment
...when using Segment/TotalSegment option.

The requirement is not specified in DynamoDB documents, but found
in DynamoDB Local:

{"__type":"com.amazon.coral.validate#ValidationException",
"message":"Exclusive start key must lie within the segment"}

Fixes #9272

Signed-off-by: Liu Lan <liulan_yewu@cmss.chinamobile.com>

Closes #9270
2021-09-01 11:05:45 +03:00
Nadav Har'El
cf06b7cd40 test/alternator: correct some typos in comments
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210729125317.1610573-1-nyh@scylladb.com>
2021-08-24 19:43:29 +03:00
Nadav Har'El
65381bd155 test/alternator: add tests for expression length limits
The DynamoDB documentation
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
describes several hard limits on the size of the size of expressions
(ProjectionExpression, ConditionExpression, UpdateExpression,
FilterExpression) and various elements they contain.

In this patch we begin testing those limits with a comprehensive test for
the *length* of each of these four expressions: we test that lengths up to
(and including) 4096 bytes are allowed but longer expressions are rejected.
We also add TODOs for additional documented limits that should be tested
in the future.

Currently, this test passes on DynamoDB but xfails on Alternator because
Alternator does *not* enforce any limits on the expression length. I don't
think this is a real problem, and we may consider keeping it this way,
but we should at least be aware that this difference exists and an
xfailing test will remind us.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210810081948.2012120-2-nyh@scylladb.com>
2021-08-10 12:06:21 +02:00
Nadav Har'El
9d49a32486 test/alternator: add tests for attribute name limits
DynamoDB limits attribute names in items to lengths of up 65535 bytes,
but in some cases (such as key attributes) the limit is lower - 255.
This patch adds tests for many of these cases.

All the new tests pass on DynamoDB, but some still xfail on Alternator
because Alternator is too lenient - sometimes allowing longer attribute
names than DynamoDB allows. While this may sound great, it also has
downsides: The oversized attribute names perform badly, and as they
grow, Alternator's internal limits will be reached as well, and result
in an unsightly "internal server error" being reported instead of the
expected user-friendly error.

Refs #9169.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210810081948.2012120-1-nyh@scylladb.com>
2021-08-10 12:06:13 +02:00
Nadav Har'El
dba184039a test/alternator: another test for Query's ExclusiveStartKey
We already have tests for Query's ExclusiveStartKey option, but we
only exercised it as a way for paging linearly through all the results.
Now we add a test that confirms that ExclusiveStartKey can be used not
just for paging through all the result - but also for jumping directly to
the middle of a partition after any clustering key (existing or non-
existing clustering key). The new test also for the first time verifies
that ExclusiveStartKey with a specific format works (previous tests just
copied LastEvaluatedKey to ExclusiveStartKey, so any opaque cookie could
have worked).

The test passes on both DynamoDB and Alternator so it did not find a new
bug. But it's useful to have as a regression test, in case in the future
we want to improve paging performance (see #6278) - and need to keep in
mind that ExclusiveStartKey is not just for paging.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210729114703.1609058-1-nyh@scylladb.com>
2021-08-04 15:24:47 +03:00
Nadav Har'El
2acfee8118 test/alternator: add tests for the Alternator TTL feature
This patch adds a comprehensive test suite for the DynamoDB API's TTL
(item expiration) feature.

The tests check the two new API commands added by this feature
(UpdateTimeToLive and DescribeTimeToLive), and also how items are
expired in practice, and how item expiration interacts with other
features such as GSI, LSI and DynamoDB Streams.

Because DynamoDB has extremely long delays until items are expired, or
until expiration configuration may be changed, several of these tests
take up to 30 minutes to complete. We mark these tests with the
 "verylong" marker, so they are skipped in ordinary test runs - use the
"--runverylong" option to run them.

All these tests currently pass on DynamoDB, but xfail on Alternator
because the two commands UpdateTimeToLive and DescribeTimeToLive are
currently rejected by Alternator.

Refs #5060

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-07-14 14:08:55 +03:00
Nadav Har'El
9d07ce3cb6 test/alternator: add marker for "veryslow" tests
Until now, Alternator test have all been very fast, taking milliseconds
or at worst seconds each - or a bit longer on DynamoDB. However,
sometimes we need to write tests which take a huge amount of time - for
example, tests for the TTL feature may take 10 minutes because the item
expiration is delayed by that much. Because a 10 minute test is
ridiculous (all 500 Alternator tests together take just one minute
today!), we would normally run such test once, and then mark it "skip"
so will never run again.

One annoying thing about skipped tests is that there is no way to
temporarily "unskip" them when we want to run such a test anyway.
So in this patch, we introduce a better option for these very slow tests
instead of the simple "skip":

The patch introduces a marker "@pytest.mark.veryslow". By default, a
test with this marker is skipped. However, an command-line option
"--runveryslow" is introduced which causes tests with the veryslow
mark to be run anyway, and not skipped.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-07-14 00:26:21 +03:00
Nadav Har'El
2fb379bb94 test/alternator: add new_test_table() utility function
This patch adds a convenient function new_test_table() that Alternator tests
can use to safely create a temporary table, and be sure it is deleted in any
case. This function is used in a "with", as follows:

    with new_test_table(dynamodb, ...) as table:
        do_something(table)
    # at this point table has already been deleted.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-07-14 00:26:21 +03:00
Nadav Har'El
c174eeae06 alternator: do not allow LSI on base table with no sort key
The purpose of an LSI (local secondary index) in Alternator is to allow
a different sort key for the existing partitions, keeping the same
division into partititions. So it doesn't make sense to create an LSI on
a table that did not originally have a sort key (i.e., single-item partitions).

DynamoDB indeed doesn't allow this case, and Alternator forgot to forbid
it - so this patch adds the missing check to the CreateTable operation.

This patch also adds a test case for this, test_lsi_wrong_no_sort_key,
which failed before the patch and passes after it (and also passes on
DynamoDB).

Also, the existing test_lsi_wrong tests for bad LSI creation attempts
by mistake used a base table without a sort key - so while they
encountered an error as expected, it was not the right error! So we fix
that test (and split it into two tests), adding the missing sort key
and exposing the actual errors that the tests were meant to expose.
That test passed before this patch and also afterwards - but at least
after the patch it is actually testing what it was meant to be testing.

Fixes #9018.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210713123747.1012954-1-nyh@scylladb.com>
2021-07-13 15:12:01 +02:00
Nadav Har'El
775a64b003 test/alternator: test for change in CDC preimage
In pull request #8568, the CDC API changed slightly, with preimage data
gaining extra "delete$k" values for columns whose preimage was missing.
In this new test, we verify that this change did not break Alternator.
We didn't expect it to break Alternator, because it just outputs the known
base-table columns and ignores the columns which weren't a real base-table
column - like this "delete$k".

In the test we set up a stream with preimages, ensure that a real column
(note that an LSI key is a real column instead of a map element) has a
null preimage - and see that the preimage is returned as expected,
without fake columns like "delete$k".

The test passes, showing that PR #8568 was ok.
The test also passes, as expected, on DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210504120121.915829-1-nyh@scylladb.com>
2021-07-06 14:53:42 +02:00
Nadav Har'El
b26fcf5567 test/alternator: increase timeouts in test_tracing.py
The query tracing tests in test/alternator's test_tracing.py had one
timeout of 30 seconds to find the trace, and one unclearly-coded timeout
for finding the right content for the trace. We recently saw both
timeouts exceeded in tests, but only rarely and only in debug mode,
in a run 100 times slower than normal.

This patch increases both timeouts to 100 seconds. Whatever happens then,
we win: If the test stops failing, we know the new timeout was enough.
If the test continues to fail, we will be able to conclude that we have a
real bug - e.g., perhaps one of the LWT operations has a bug causing it
to hang indefinitely.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210608205026.1600037-1-nyh@scylladb.com>
2021-06-10 09:19:01 +03:00
Piotr Sarna
cb17aa1e53 Merge 'test/alternator: rewrite run script to share code with cql-pytest's run script' from Nadav Har'El
In this small series, I rewrite test/alternator/run to Python using the utility
functions developed for test/cql-pytest. In the future, we should do the same to
test/redis/run and test/scylla-gdb/run.

The benefit of this rewrite is less code duplication (all run scripts start with
the same duplicate code to deal with temporary directories, to run Scylla IP
addresses, etc.), but most importantly - in the future fixes we do to cql-pytest
(e.g., parameters needed to start Scylla efficiently, how to shut down Scylla,
etc.) will appear automatically in alternator test without needing to remember
to change both.

Another benefit is that test/alternator/run will now be Python, not a shell
script. This should make it easier to integrate it into test.py (refs #6212) in
the future - if we want to.

Closes #8792

* github.com:scylladb/scylla:
  test/alternator: rewrite test/alternator/run script in Python
  test/cql-pytest: make test run code more general
2021-06-06 19:18:49 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Nadav Har'El
f22ed3ff5c test/alternator: reduce very high timeout in one tracing test
In test_tracing.py::test_slow_query_log, the was what looked like
an innocent 30-second timeout, but this was in fact a 8 minute
timeout - because it started with sleeping 1 second, then 2 seconds,
then 3, ... until 30 seconds. Such a high timeout is frustrating when
trying to debug failures in the test - which is only expected to take
2 seconds (and all of it because of an artificial timeout).

So fix the loop to stop iterating after 60 seconds (a compromise
between 30 seconds and 8 minutes...), sleeping a constant amount
between iterations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210601150631.1037158-1-nyh@scylladb.com>
2021-06-06 09:21:23 +03:00
Nadav Har'El
0bb2e010f5 test/alternator: rewrite test/alternator/run script in Python
We already wrote the test/cql-pytest/run script in Python in a way
it can be reusable for the other test/*/run scripts.

So this patch replaces the test/alternator/run shell script with Python
code which does the same thing (safely runs Scylla with Alternator and
pytest on it in a temporary directory and IP address), but sharing most
of the code that cql-pytest uses.

The benefit of reusing the test/cql-pytest/run.py library goes beyond
shorter code - the main benefit will be that we can't forget to fix one
of the test/*/run scripts (e.g., add more command line options or fix a
bug) when fixing another one.

To make the test/cql-pytest/run.py library reusable for running
Alternator, I needed to generalize a few things in this patch (e.g.,
the way we check and wait for Scylla to boot with the different APIs we
intend to check). There is also one bug-fix on how interrupts are
handled (they are now better guaranteed to kill pytest) - and now fixing
this bug benefits all runners using run.py (cql-pytest/run,
cql-pytest/run-cassandra and alternator/run).

In the future, we can port the runners which are still duplicate shell
scripts - test/redis/run and test/scylla-gdb/run - to Python in a
similar manner to what we did here for test/alternator/run.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-06-03 11:23:00 +03:00
Nadav Har'El
a2379b96b1 alternator test: test for large BatchGetItem
This patch adds an Alternator test, test_batch_get_item_large,
which checks a BatchGetItem with a moderately large (1.5 MB) response.
The test passes - we do not have a bug in BatchGetItem - but it
does reproduce issue #8522 - the long response is stored in memory as
one long contiguous string and causes a warning about an over-sized
allocation:

  WARN ... seastar_memory - oversized allocation: 2281472 bytes.

Incidentally, this test also reproduces a second contiguous
allocation problem - issue #8183 (in BatchWriteItem which we use
in this test to set up the item to read).

Refs #8522
Refs #8183

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210520161619.110941-1-nyh@scylladb.com>
2021-05-21 08:38:53 +02:00
Nadav Har'El
7d2df8a9bc test/alternator,cql-pytest: fix resource leak on failure
In the alternator and cql-pytest test frameworks, we have some convenient
contextmanager-based functions that allows us to create a temporary
resource (e.g., a table) that will be automatically deleted, for
example:

    with create_stream_test_table(...) as table:
        test_something(table)

However, our implementation of these functions wasn't safe. We had
code looking like:

    table = ...
    yield table
    table.delete()

The thinking was that the cleanup part (the table.delete()) will be
called after the user's code. However, if the user's code threw
(i.e., a failed assertion), the cleanup wasn't called... When the user's
code throws, it looks as if the "yield" throws. So the correct code
should look like:

    table = ...
    try:
        yield table
    finally:
        table.delete()

Python's contextmanager documentation indeed gives this idiom in its
example.

This patch fixes all contextmanager implementations in our tests to do
the cleanup even if the user's "with" block throws.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210428083748.552203-1-nyh@scylladb.com>
2021-04-28 10:51:02 +02:00
Piotr Sarna
0779fa8428 test: add username verification to alternator tracing tests
The test case now additionally checks if the username entry from
found tracing events matches the username used by the test suite.
2021-04-26 11:54:02 +02:00
Nadav Har'El
50f3201ee2 alternator: fix inequality check of two sets
In issue #5021 we noted that Alternator's equality operator needs to be
fixed for the case of comparing two sets, because the equality check needs
to take into account the possibility of different element order.

Unfortunately, we fixed only the equality check operator, but forgot there
is also an inequality operator!

So in this patch we fix the inequality operator, and also add a test for
it that was previously missing.

The implementation of the inequality operator is trivial - it's just the
negation of the equality test. Our pre-existing tests verify that this is
the correct implementation (e.g., if attribute x doesn't exist, then "x = 3"
is false but "x <> 3" is true).

Refs #5021
Fixes #8513

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419141450.464968-1-nyh@scylladb.com>
2021-04-20 13:14:19 +02:00
Nadav Har'El
dae7528fe5 alternator: fix equality check of nested document containing a set
In issue #5021 we noticed that the equality check in Alternator's condition
expressions needs to handle sets differently - we need to compare the set's
elements ignoring their order. But the implementation we added to fix that
issue was only correct when the entire attribute was a set... In the
general case, an attribute can be a nested document, with only some
inner set. The equality-checking function needs to tranverse this nested
document, and compare the sets inside it as appropriate. This is what
we do in this patch.

This patch also adds a new test comparing equality of a nested document with
some inner sets. This test passes on DynamoDB, failed on Alternator before
this patch, and passes with this patch.

Refs #5021
Fixes #8514

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419184840.471858-1-nyh@scylladb.com>
2021-04-20 13:14:10 +02:00
Nadav Har'El
46448b0983 alternator: fix equality check of two unset attributes
When a condition expression (ConditionExpression, FilterExpression, etc.)
checks for equality of two item attributes, i.e., "x = y", and when one of
these attributes was missing we correctly returned false.
However, we also need to return false when *both* attributes are missing in
the item, because this is what DynamoDB does in this case. In other words
an unset attribute is never equal to anything - not even to another unset
attribute. This was not happening before this patch:

When x and y were both missing attributes, Alternator incorrectly returned
true for "x = y", and this patch fixes this case. It also fixes "x <> y"
which should to be true when both x and y are unset (but was false
before this patch).

The other comparison operators - <, <=, >, >=, BETWEEN, were all
implemented correctly even before this patch.

This patch also includes tests for all the two-unset-attribute cases of
all the operators listed above. As usual, we check that these tests pass
on both DynamoDB and Alternator to confirm our new behavior is the correct
one - before this patch, two of the new tests failed on Alternator and
passed on DynamoDB.

Fixes #8511

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210419123911.462579-1-nyh@scylladb.com>
2021-04-20 13:14:00 +02:00
Nadav Har'El
989589b570 test/cql-pytest,alternator,redis: avoid an annoying warning
This patch avoids an annoying warning

    Warning: Unknown config ini key: flake8-ignore

when running one of the pytest-based test projects (cql-pytest,
alternator and redis) on recent versions of pytest.

In commit 2022da2405, we added to the
toplevel Scylla directory a "tox.ini" file with some intention to
configure Python syntax checking. One of the configurations in this
tox.ini is:

    [pytest]
    flake8-ignore =
        E501

It turns out that pytest, if a certain test directory does not have its
own pytest.ini file, looks up in ancestor directory for various
configuration files (the configuration file precedence is described in
https://docs.pytest.org/en/stable/customize.html), and this includes
this tox.ini configuration section. Recent versions of pytest complain
about the "flake8-ignore" configuration parameter, which they don't
recognize. This parameter may be ok (?) if you install a flake8 pytest
plugin, but we do not require users to do this for running these tests.

Moreover, whatever noble intentions this commit and its tox.ini had,
nobody ever followed up on it. The three pytest-based test directories
never adhered to flake8's recommended syntax, and never intended to do
so. None of the developers of these tests use flake8, or seem to wish
to do so. If this ever changes, we can change the pytest.ini or undo this
commit and go back to a top-level tox.ini, but I don't see this happening
anytime soon.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210411085708.300851-1-nyh@scylladb.com>
2021-04-12 08:04:06 +02:00
Nadav Har'El
0d0db05cf3 test/alternator: speed up two slow xfailing tests
By far the two slowest Alternator tests when running a development build on
my laptop are
	test_gsi.py::test_gsi_projection_include
and
	test_gsi.py::test_gsi_projection_keys_only
Each of those takes around 3.2, and the sum of just these two tests is as
much as 10% (!) of all other 600 tests.

The reason why these tests are slow is that they check scanning a GSI
with *projection*. Scylla currently ignores the projection, so the scan
returns the wrong value. Because this is a GSI, which supports only
eventually- consistent reads, we need to retry the read - and did it for
up to 3 seconds!

But this retry only makes sense if the GSI read did not *yet* return
the expected data. But in these xfailing test, we read a *wrong* item
(with too many attributes) almost immediately, and this should indicate
an immediate failure - no amount of retry would help. So in this patch
we detect this case and fail the test immediately instead of wasting
3 seconds in retries.

On my laptop with dev build, this patch reduces the time to run the
entire Alternator test suite from 70 seconds to 63 seconds.

Also, now that we never just waste time until the timeout, we can
increase it to any number, and in this patch we increase it from 3
seconds to 5.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317183918.1775383-1-nyh@scylladb.com>
2021-04-06 14:49:15 +02:00
Nadav Har'El
15cab90f7b test/alternator: switch some fixture scopes from "session" to "module"
In conftest.py we have several fixtures creating shared tables which many
test files can share, so they are marked with the "session" scope - all
the tests in the testing session may share the same instance. This is fine.

Some of test files have additional fixtures for creating special tables
needed only in those files. Those were also, unnecessarily, marked
"session" scope as well. This means that these temporary tables are
only deleted at the very end of test suite, event though they can be
deleted at the end of the test file which needed them. This is exactly
what the "module" fixture scope is, so this patch changes all the
fixtures private to one test file to be "module".

After this patch, the teardown of the last test in the suite goes down
from 4 seconds to just 1.5 seconds (it's still long because there are
still plenty of session-scoped fixtures in conftest.py).

Another small benefit is that the peak disk usage of the test suite is
lower, because some of the temporary tables are deleted sooner.

This patch does not change any test functionality, and also does not
make any test faster - it just changes the order of the fixture
teardowns.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317175036.1773774-1-nyh@scylladb.com>
2021-04-06 14:43:36 +02:00
Avi Kivity
8785dd62cb tests: use kernel page cache
Tests are short-lived and use a small amount of data. They
are also often run repeatly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.

Acknowledge this by using the new --kernel-page-cache option for
tests.

This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.

Closes #8347
2021-03-30 12:04:55 +02:00
Nadav Har'El
0b2cf21932 alternator-test: increase read timeout and avoid retries
By default the boto3 library waits up to 60 second for a response,
and if got no response, it sends the same request again, multiple
times. We already noticed in the past that it retries too many times
thus slowing down failures, so in our test configuration lowered the
number of retries to 3, but the setting of 60-second-timeout plus
3 retries still causes two problems:

  1. When the test machine and the build are extremely slow, and the
     operation is long (usually, CreateTable or DeleteTable involving
     multiple views), the 60 second timeout might not be enough.

  2. If the timeout is reached, boto3 silently retries the same operation.
     This retry may fail because the previous one really succeeded at
     least partially! The symptom is tests which report an error when
     creating a table which already exists, or deleting a table which
     dooesn't exist.

The solution in this patch is first of all to never do retries - if
a query fails on internal server error, or times out, just report this
failure immediately. We don't expect to see transient errors during
local tests, so this is exactly the right behavior.
The second thing we do is to increase the default timeout. If 1 minute
was not enough, let's raise it to 5 minutes. 5 minutes should be enough
for every operation (famous last words...).

Even if 5 minutes is not enough for something, at least we'll now see
the timeout errors instead of some wierd errors caused by retrying an
operation which was already almost done.

Fixes #8135

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210222125630.1325011-1-nyh@scylladb.com>
2021-03-18 18:58:08 +02:00
Nadav Har'El
4a7d3175e9 test/alternator: make another test faster
The slowest test in test_streams.py is test_list_streams_paged. It is meant
to test the ListStreams operation with paging. The existing test repeated
its test four times, for four different stream types. However, there is
no reason to suspect that the ListStreams operation might somehow be
different for the four stream types... We already have other tests which
create streams of the four types, and uses these streams - we don't
need the test for ListStreams to also test creating the four types.

By doing this test just once, not four times, we can save around 1.5
seconds of test time.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210318073755.1784349-1-nyh@scylladb.com>
2021-03-18 11:24:18 +01:00
Nadav Har'El
79af728335 test/alternator: make tracing test a bit faster
In the test test_tracing.py::test_tracing_all, we do some operations and
then need to wait until they appear in the tracing table.
The current code used an exponentially-increasing delay during this wait,
starting with 0.1 seconds and then doubling the delay until we find what
we're looking for.

However, it turns out that the delay until the data appears in the table
is deliberately chosen by Scylla - and is always around 2 seconds.
In this case, an exponential delay is really bad - we will usually wait
for around 1 seconds too long after the needed wait of 2 seconds.

So in this patch we replace the exponential delay by a constant delay -
we wait 0.3 seconds between each retry.

This change makes the test test_tracing.py::test_tracing_all finish
in a little over 2 seconds, instead of a little over 3 seconds
before this patch. We cannot reduce this 2 second time any further
unless we make the 2-second tracing delay configurable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210318000040.1782933-1-nyh@scylladb.com>
2021-03-18 11:24:18 +01:00
Nadav Har'El
4e87f95b42 test/alternator: remove slow and unhelpful test
The test test_table.py::test_table_streams_on creates tables with various
stream types, and then immediately deletes them without testing anything.
This is a slow test (taking almost a full second on my laptop), and is
redundant because in test_streams.py we have tests which create tables
with streams in the same way - but then actually test that things work
with these streams. So this test might as well be removed, and this is
what we do in this patch.

Removing this test shaves another second from the Alternator test suite's
run time.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317230530.1780849-1-nyh@scylladb.com>
2021-03-18 11:24:18 +01:00
Nadav Har'El
879656e3e0 test/alternator: make a test faster, safer and more correct
The test
test_condition_expression.py::test_condition_expression_with_forbidden_rmw
takes half a second to run (dev build, on my laptop), one of the slowest
tests in Alternator's test suite. Part of the reason was that it needlessly
set the same table to forbidden_rmw, multiple times.

Instead of doing that, we switch to using the test_table_s_forbid_rmw
fixture, which is a table like test_table_s but created just once in
forbid_rmw mode.

The result is a faster test (0.05 seconds instead of 0.5 seconds), but
also safer if we ever want to run tests in parallel. It also fixes a
bug in the test: At the end of the test, we intended to double-check
that although the forbid_rmw table forbids read-modify-write operations,
it does allow pure writes. Yet the test did this after clearing the
forbid_rmw mode... So after this patch the test verifies this on the
forbid_rmw table, as intended.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317222703.1779992-1-nyh@scylladb.com>
2021-03-18 11:24:18 +01:00