Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.
So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.
This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.
Refs #16567
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17306
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:
Refs #16317 - Support Alternator Streams with tablets (CDC)
Refs #16567 - Support Alternator TTL with tablets
This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.
We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.
This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
The Alternator test test_ttl.py::test_ttl_expiration_gsi_lsi was flaky.
The test incorrectly assumes that when we write an already expired item,
it will be visible for a short time until being deleted by the TTL thread.
But this doesn't need to be true - if the test is slow enough, it may go
look or the item after it was already expired!
So we fix this test by splitting it into two parts - in the first part
we write a non-expiring item, and notice it eventually appears in the
GSI, LSI, and base-table. Then we write the same item again, with an
expiration time - and now it should eventually disappear from the GSI,
LSI and base-table.
This patch also fixes a small bug which prevented this test from running
on DynamoDB.
Fixes#14495
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#14496
Some of the tests in test/alternator/test_ttl.py need an expiration scan
pass to complete and expire items. In development builds on developer
machines, this usually takes less than a second (our scanning period is
set to half a second). However, in debug builds on Jenkins each scan
often takes up to 100 (!) seconds (this is the record we've seen so far).
This is why we set the tests' timeout to 120.
But recently we saw another test run failing. I think the problem is
that in some case, we need not one, but *two* scanning passes to
complete before the timeout: It is possible that the test writes an
item right after the current scan passed it, so it doesn't get expired,
and then we a second scan at a random position, possibly making that
item we mention one of the last items to be considered - so in total
we need to wait for two scanning periods, not one, for the item to
expire.
So this patch increases the timeout from 120 seconds to 240 seconds -
more than twice the highest scanning time we ever saw (100 seconds).
Note that this timeout is just a timeout, it's not the typical test
run time: The test can finish much more quickly, as little as one
second, if items expire quickly on a fast build and machine.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12106
Most of the Alternator TTL tests are extremely slow on DynamoDB because
item expiration may be delayed up to 24 hours (!), and in practice for
10 to 30 minutes. Because of this, we marked most of these tests
with the "veryslow" mark, causing them to be skipped by default - unless
pytest is given the "--runveryslow" option.
The result was that the TTL tests were not run in the normal test runs,
which can allow regressions to be introduced (luckily, this hasn't happened).
However, this "veryslow" mark was excessive. Many of the tests are very
slow only on DynamoDB, but aren't very slow on Scylla. In particular,
many of the tests involve waiting for an item to expire, something that
happens after the configurable alternator_ttl_period_in_seconds, which
is just one second in our tests.
So in this patch, we remove the "veryslow" mark from 6 tests of Alternator TTL
tests, and instead use two new fixtures - waits_for_expiration and
veryslow_on_aws - to only skip the test when running on DynamoDB or
when alternator_ttl_period_in_seconds is high - but in our usual test
environment they will not get skipped.
Because 5 of these 6 tests wait for an item to expire, they take one
second each and this patch adds 5 seconds to the Alternator test
runtime. This is unfortunate (it's more than 25% of the total Alternator
test runtime!) but not a disaster, and we plan to reduce this 5 second
time futher in the following patch, but decreasing the TTL scanning
period even further.
This patch also increases the timeout of several of these tests, to 120
seconds from the previous 10 seconds. As mentioned above, normally,
these tests should always finish in alternator_ttl_period_in_seconds
(1 second) with a single scan taking less than 0.2 seconds, but in
extreme cases of debug builds on overloaded test machines, we saw even
60 seconds being passed, so let's increase the maximum. I also needed
to make the sleep time between retries smaller, not a function of the
new (unrealistic) timeout.
4 more tests remain "veryslow" (and won't run by default) because they
are take 5-10 seconds each (e.g., a test which waits to see that an item
does *not* get expired, and a test involving writing a lot of data).
We should reconsider this in the future - to perhaps run these tests in
our normal test runs - but even for now, the 6 extra tests that we
start running are a much better protection against regressions than what
we had until now.
Fixes#11374
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
x
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This reverts commit 8e892426e2 and fixes
the code in a different way:
That commit moved the scylla_inject_error function from
test/alternator/util.py to test/cql-pytest/util.py and renamed
test/alternator/util.py. I found the rename confusing and unnecessary.
Moreover, the moved function isn't even usable today by the test suite
that includes it, cql-pytest, because it lacks the "rest_api" fixture :-)
so test/cql-pytest/util.py wasn't the right place for it anyway.
test/rest_api/rest_util.py could have been a good place for this function,
but there is another complication: Although the Alternator and rest_api
tests both had a "rest_api" fixture, it has a different type, which led
to the code in rest_api which used the moved function to have to jump
through hoops to call it instead of just passing "rest_api".
I think the best solution is to revert the above commit, and duplicate
the short scylla_inject_error() function. The duplication isn't an
exact copy - the test/rest_api/rest_util.py version now accepts the
"rest_api" fixture instead of the URL that the Alternator version used.
In the future we can remove some of this duplication by having some
shared "library" code but we should do it carefully and starting with
agreeing on the basic fixtures like "rest_api" and "cql", without that
it's not useful to share small functions that operate on them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11275
Move scylla_inject_error from alternator/ to cql-pytest/ so it
can be reached from various tests dirs. alternator/util.py is
renamed to alternator/alternator_util.py to avoid name shadowing.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
On my local machine, a 3 second deadline proved to cause flakiness
of test_ttl_expiration case, because its execution time is just
around 3 seconds.
This patch addresse the problem by bumping the local timeout to 10
(and 15 for test_ttl_expiration_long, since it's dangerously near
the 10 second deadline on my machine as well).
Moreover, some test cases short-circuited once they detected that
all needed items expired, but other ones lacked it and always used
their full time slot. Since 10 seconds is a little too long for
a single test case, even one marked with --veryslow, this patch
also adds a couple of other short-circuits.
One exception is test_ttl_expiration_hash_wrong_type, which actually
depends on the fact that we should wait for the whole loop to finish.
Since this case was never flaky for me with the 3 second timeout,
it's left as is.
Theoretically, test_ttl_expiration also kind of depends on checking
the condition more than once (because the TTL of one of the values
is bumped on each iteration), but empirical evidence shows that
multiple iterations always occur in this test case anyway - for
me, it always spinned at least 3 times.
Tests: unit(release)
Message-Id: <a0a479929dac37daace744e0a970567a8aa3b518.1638431933.git.sarna@scylladb.com>
The expiration-time attribute used by Alternator's TTL feature has a
numeric type, meaning that it may be a floating point number - not just
an integer, and implemented as big_decimal which has a separate integer
mantissa and exponent. Our code which checked expiration incorrectly
looked only at the mantissa - resulting in incorrect handling of
expiration times which have a fractional part - 123.4 was treated as
1234 instead of 123.
This patch fixes the big_decimal handling in the expiration checking,
and also adds to the test test_ttl.py::test_ttl_expiration check also
for non-integer floating point as well as one with an exponent. The
new tests pass on DynamoDB, and failed on Alternator before this patch -
and pass with it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing TTL tests use only tiny tables, so don't exercise the
expiration-time scanner's use of paging. So in this patch we add
another test with a much larger table (with 40,000 items).
To verify that this test indeed checks paging, I stopped the scanner's
iteration after one page, and saw that this test starts failing (but
the smaller tests all pass).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Most tests in test_ttl.py now pass, so remove their "xfail" tag.
The only remaining failing test is test_ttl_expiration_streams -
which cannot yet pass because the expiration event is not yet marked.
Note that the fact that almost all tests for Alternator's TTL feature
now pass does not mean the feature is complete. The current implementation
is very partial and inefficient, and only works reasonably in tests on
a single node. The current tests cannot expose these problems, so
we will need to develop additional tests for them. The tests will of
course remain useful to see that as the implementation continues to
improve, none of the tests that already work will break.
The Alternator TTL continues to remain "experimental", cannot be used
without explicitly enabling this experimental feature, and must not be
used for any important deployment.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The tests for the TTL feature in test/alternator/test_ttl.py takes huge
amount of time on DynamoDB - 10 to 30 minutes (!) - because it delays
expiration of items a long time after their intended expiration times.
We intend Scylla's implementation to have a configurable delay for the
expiration scanner, which we will be able to configure to very short
delays for tests. So These tests can be made much faster on Scylla.
So in this patch we change all of the tests to finish much more quickly
on Scylla.
Many of the tests still fail, because the TTL feature is not implemented
yet.
Although after this change all the tests in test_ttl.py complete in
a reasonable amount of time (around 3 seconds each), we still mark
them as "veryslow" and the "--runveryslow" flag is needed to run them.
We should consider changing this in the future, so that these tests will
run as part of our default test suite.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Although it isn't terribly useful, an Alternator user can enable TTL
with an expiration-time attribute set to a *key* attribute. Because
expiration times should be numeric - not other types like strings -
DynamoDB could warn the user when a chosen key attribute hs a non-
numeric type (since key attributes do have fixed types!). But DynamoDB
doesn't warn about this - it simply expires nothing. This test
verifies this that it indeed does this.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The basic test for TTL expiration, test_ttl.py::test_ttl_expiration,
uses a table with only a partition key. Most of the item expiration
logic is exactly the same for tables that also have a sort key, but
the step of *deleting* the item is different, so let's add a test
that verifies that also in this case, the expired item is properly
deleted.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch improves test_ttl.py::test_ttl_expiration in two ways:
First, it checks yet another case - that items that have the wrong type
for the expiration-time column (e.g., a string) never get expired - even
if that string happens to contain a number that looks like an expiration
time.
Second, instead of the huge 15-minute duration for this test, the
test now has a configurable duration; We still need to use a very long
duration on AWS, but in Scylla we expect to be able to configure the
TTL scan frequency, and can finish this test in just a few seconds!
We already have experimental code which makes this test pass in just
3 seconds.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Ever since we started testing Alternator with tests written in Python
and using Amazon's "boto3" library, one limitation kept annoying us:
Boto3 verifies the validity of the request parameters before passing
them on to the server. It verifies that mandatory parameters are not
missing, that parameters have the right types, and sometimes even the
right ranges - all in the library before ever sending the request.
This meant that in many cases, we couldn't get good test coverage for
Alternator's server-side handling of *wrong* parameters.
As it turns out, it is trivial to tell boto3 to *not* do its client-side
request validation, with the `parameter_validation=False` config flag.
We just never noticed that such a flag existed :-)
So this patch adds this flag. It then fixes a few tests which expected
ParameterValidationError - this error is the client-side validation
failure, but should now be replaced by checking the server-side error.
The patch also adds a couple of invalid parameter checks that we
couldn't do before because of boto3's eagerness to check them on the
client side. We can add a lot more of these error tests in the future,
now that we got rid of client-side validation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211005095514.537226-1-nyh@scylladb.com>
We have a utility function test_table_name() to create a unique name for
a test table. The funny thing is, that because this function starts with
the string "test_", pytest believes it's a test. This doesn't cause any
problems (it's consider a *passing* test), but it's nevertheless strange
to see it listed on the list of tests.
So in this page, we trivially rename this function to unique_table_name(),
a name why pytest doesn't think is the name of test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds stubs for the UpdateTimeToLive and DescribeTimeToLive
operations to Alternator. These operations can enable, disable, or inquire
about, the chosen expiration-time attribute.
Currently, the information about the chosen attribute is only saved, with
no actual expiration of any items taking place.
Some of the tests for the TTL feature start to pass, so their xfail tag
is removed.
Because this this new feature is incomplete, it is not enabled unless
the "alternator-ttl" experimental feature is enabled. Moreover, for
these operations to be allowed, the entire cluster needs to support
this experimental feature, because all nodes need to participate in the
data expiration - if some old nodes don't support Alternator TTL, some
of the data they hold won't get expired... So we don't allow enabling
TTL until all the nodes in the cluster support this feature.
The implementation is in a new source file, alternator/ttl.cc. This
source file will continue to grow as we implement the expiration feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Usually the TTL feature's expiration-time attribute is a schema-less
attribute, implemented in Alternator as a JSON-serialized item in a
bigger map column. However, key attributes are a special case because
they are implemented as separate columns. We already had test cases
showing that this case works too - for the case of hash and range keys.
In this test we test another possibility of an attribute that is
implemented as a schema column - the case of an LSI key.
As the other TTL tests, this test too passes on DynamoDB but xfails on
Alternator because the TTL feature is not yet implemented.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a comprehensive test suite for the DynamoDB API's TTL
(item expiration) feature.
The tests check the two new API commands added by this feature
(UpdateTimeToLive and DescribeTimeToLive), and also how items are
expired in practice, and how item expiration interacts with other
features such as GSI, LSI and DynamoDB Streams.
Because DynamoDB has extremely long delays until items are expired, or
until expiration configuration may be changed, several of these tests
take up to 30 minutes to complete. We mark these tests with the
"verylong" marker, so they are skipped in ordinary test runs - use the
"--runverylong" option to run them.
All these tests currently pass on DynamoDB, but xfail on Alternator
because the two commands UpdateTimeToLive and DescribeTimeToLive are
currently rejected by Alternator.
Refs #5060
Signed-off-by: Nadav Har'El <nyh@scylladb.com>