420 Commits

Author SHA1 Message Date
Nadav Har'El
7ccf77b84f test/alternator: another test for UpdateExpression's SET
I found on StackOverflow an interesting discussion about the fact that
DynamoDB's UpdateExpression documentation "recommends" to use SET
instead of ADD, and the rather convoluted expression that is actually
needed to emulate ADD using SET:
```
SET #count = if_not_exists(#count, :zero) + :one
```

https://stackoverflow.com/questions/14077414/dynamodb-increment-a-key-value

Although we do have separate tests for the different pieces of that
idiom - a SET with missing attribute or item, the if_not_exists()
function, etc. - I thought it would be nice to have a dedicated test
that verifies that this idiom actually works, and moreover that the more
naive "SET #count = #count + :one" does NOT work if the item or the
attribute are missing.

Unsurprisingly, the new test passes on both Alternator and DynamoDB.
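
As an illustration only (this is not the test added by the patch), the idiom can be wrapped in a small helper that builds the request parameters. The helper name `increment_kwargs` and the key and attribute names are hypothetical; the returned dictionary has the shape that boto3's `Table.update_item()` accepts.

```python
def increment_kwargs(key, attr="count", step=1):
    """Build UpdateItem parameters that add `step` to `attr`,
    treating a missing attribute (or a missing item) as zero."""
    return {
        "Key": key,
        # The naive "SET #c = #c + :step" fails when #c does not exist yet.
        "UpdateExpression": "SET #c = if_not_exists(#c, :zero) + :step",
        "ExpressionAttributeNames": {"#c": attr},
        "ExpressionAttributeValues": {":zero": 0, ":step": step},
    }


kwargs = increment_kwargs({"p": "some-key"})
# With a boto3 Table resource this would be sent as: table.update_item(**kwargs)
```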

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23963
2025-05-07 13:57:50 +03:00
Nadav Har'El
b4a9fe9928 test/alternator: another test for expression with a lot of ORs
We already have a test, test_limits.py::test_deeply_nested_expression_2,
which checks that in the long condition expression

        a<b or (a<b or (a<b or (a<b or (....))))

with more than MAX_DEPTH (=400) repeats is rejected by Alternator,
as part of commit 04e5082d52 which
restricted the depth of the recursive parser to prevent crashing Scylla.

However, I got curious what will happen without the parentheses:

        a<b or a<b or a<b or a<b or ...

It turns out that our parser actually parses this syntax without
recursion - it's just a loop (a "*" in the Antlr alternator/expressions.g
allows reading more and more ORs in a loop). So Alternator doesn't limit
the length of this expression more than the length limit of 4096 bytes
which we also have. We can fit 584 repeats in the above expression in
4096 bytes, and it will not be rejected even though 584 > 400.
This test confirms that this is indeed the case.
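
A rough sketch of the two expression shapes, assuming each term is spelled exactly `a<b` (the real test may spell it differently, so the exact number of terms that fit in 4096 bytes may differ). The helper names are made up for this illustration.

```python
MAX_DEPTH = 400       # nesting limit quoted above
MAX_EXPR_LEN = 4096   # expression length limit quoted above


def flat_or(n, term="a<b"):
    """a<b or a<b or ... (n terms, no parentheses)."""
    return " or ".join([term] * n)


def nested_or(n, term="a<b"):
    """a<b or (a<b or (a<b ...)) with n terms and n-1 nesting levels."""
    return f"{term} or (" * (n - 1) + term + ")" * (n - 1)


def paren_depth(expr):
    """Deepest level of parenthesis nesting in expr."""
    depth = deepest = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


flat = flat_or(584)
nested = nested_or(MAX_DEPTH + 1)
```

The flat form has more than MAX_DEPTH terms but a nesting depth of zero, which is why only the length limit applies to it, while the nested form reaches the depth limit long before the length limit.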

The test is Scylla-only because on DynamoDB, this expression is rejected
because it has more than 300 "OR" operators. Scylla doesn't have this
specific limit - we believe the other limitations (on total expression
length, and on depth) are better for protecting Scylla. Remember that
in an expression like "(((((((((((((" there is a very high recursion
depth of the parser but zero operators, so counting the operators does
nothing to protect Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23973
2025-05-07 13:57:18 +03:00
Nadav Har'El
252c5b5c9d Merge 'Alternator batch_write_item wcu' from Amnon Heiman
This series adds support for WCU tracking in batch_write_item and tests it.

The patches include:

Switch the RCU and WCU metrics from counting half-units to counting whole units, to make the metrics clearer for users.

Adding a public static get_half_units function to wcu_consumed_capacity_counter for use by batch write item, which cannot directly use the counter object.

Adding WCU calculation support to batch_write_item, based on item size for puts and a fixed 1 WCU for deletes. WCU metrics are updated, and consumed capacity is returned per table when requested.

The return handling was refactored to be coroutine-like for easier management of the consumed capacity array.

Adding tests that validate WCU calculation for batch put requests on a single table and across multiple tables, ensuring delete operations are counted correctly.

Adding a test that validates that WCU metrics are updated correctly during batch write item operations, ensuring the WCU of each item is calculated independently.
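
The accounting described above can be sketched roughly as follows. This is not the executor code: the function names are hypothetical, the 1 KB write block is DynamoDB's documented WCU size, and how an item's size in bytes is measured is left out.

```python
import math
from collections import defaultdict

WCU_BLOCK_BYTES = 1024  # DynamoDB bills writes in 1 KB blocks


def put_wcu(item_size_bytes):
    """WCU for one put: item size rounded up to whole 1 KB blocks, at least 1."""
    return max(1, math.ceil(item_size_bytes / WCU_BLOCK_BYTES))


def batch_write_wcu(requests):
    """requests: iterable of (table_name, op, item_size_bytes), op is "put" or "delete".
    Returns the consumed WCU per table, computing each item independently."""
    per_table = defaultdict(int)
    for table, op, size in requests:
        per_table[table] += 1 if op == "delete" else put_wcu(size)
    return dict(per_table)


usage = batch_write_wcu([
    ("t1", "put", 300),     # 1 WCU
    ("t1", "put", 1500),    # 2 WCU
    ("t2", "delete", 0),    # deletes are a fixed 1 WCU
])
```

Note that the two puts to `t1` cost 3 WCU in total, not the 2 WCU that rounding their combined size would give.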

**Need backport, WCU is partially supported, and is missing from batch_write_item**

Fixes #23940

Closes scylladb/scylladb#23941

* github.com:scylladb/scylladb:
  alternator/test_metrics.py: batch_write validate WCU
  alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU
  alternator/executor: add WCU for batch_write_items
  alternator/consumed_capacity: make wcu get_units public
  Alternator: Change the WCU/RCU to use units
2025-05-06 13:31:53 +03:00
Amnon Heiman
2ab99d7a07 alternator/test_metrics.py: batch_write validate WCU
This patch adds a test that verifies the WCU metrics are updated
correctly during a batch_write_item operation.
It ensures that the WCU of each item is calculated independently.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-05-05 13:20:24 +03:00
Amnon Heiman
14570f1bb5 alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU
This patch adds two tests:
A test that validates WCU calculation for batch put requests on a single table.

A test that validates WCU calculation for batch requests across multiple
tables, including ensuring that delete operations are counted as 1 WCU.

Both tests verify that the consumed capacity is reported correctly
according to the WCU rules.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-05-05 13:20:23 +03:00
Amnon Heiman
5ae11746fa Alternator: Change the WCU/RCU to use units
This patch changes the RCU/WCU Alternator metrics to use whole units
instead of half units. The change includes the following:

Change the metrics documentation. Keep the RCU counter internally in
half units, but return the actual (whole unit) value.
Change the RCU name to be rcu_half_units_total to indicate that it
counts half units.
Change the WCU to count in whole units instead of half units.

Update the tests accordingly.
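
A minimal sketch of the half-unit bookkeeping, assuming DynamoDB's documented read pricing (one RCU per 4 KB for a consistent read, half of that for an eventually consistent one). The function names are hypothetical and are not taken from the Alternator source.

```python
import math

RCU_BLOCK_BYTES = 4096  # DynamoDB bills reads in 4 KB blocks


def read_half_units(size_bytes, consistent):
    """Cost of one read in half units, so the counter can stay an integer."""
    blocks = max(1, math.ceil(size_bytes / RCU_BLOCK_BYTES))
    return blocks * (2 if consistent else 1)


def half_units_to_rcu(half_units):
    """Convert the internal half-unit counter to the value reported to users."""
    return half_units / 2


total_half_units = (read_half_units(100, consistent=False)
                    + read_half_units(5000, consistent=True))
reported_rcu = half_units_to_rcu(total_half_units)
```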

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-05-05 13:18:09 +03:00
Nadav Har'El
834107ae97 test/cqlpy,alternator: fix reporting of Scylla crash during test
The cqlpy and alternator test frameworks use a single Scylla node started
once for all tests to run on. In the distant past, we had a problem where
if one test caused Scylla to crash, the result was a confusing report of
hundreds of failed tests - all tests after the crash "failed" and it wasn't
easy to find which test really caused the crash.

Our old solution to this problem was to have an autouse fixture (called
cql_test_connection or dynamodb_test_connection) which tested the
connection at the end of each test, and if it detected Scylla has
crashed - it used pytest.exit() to report the error and have pytest
exit and therefore stop running any further tests (which would have
led to all of them failing).

This approach had two problems:

1. The pytest.exit() caused the entire cqlpy suite to report a failure,
   but not the individual test - the individual test might have
   failed as well, but that isn't guaranteed and in any case this test's
   output is missing the informative message that Scylla crashed during
   the test. This was fine when for each cqlpy failure we had two separate
   error logs in Jenkins - the specific failed function, and the failed
   file - but when we recently got rid of the duplication by removing the
   second one, we no longer see the "Scylla crashed" messages any more.

2. Exiting pytest will be the wrong thing to do if the same pytest
   run could run tests from different test suites. We don't do this
   today, but we plan to support this approach soon.

This patch fixes both problems by replacing the pytest.exit() call by
setting a "scylla_crashed" flag and using pytest.fail(). The pytest.fail()
causes the current test - the one which caused Scylla to crash - to be
reported as an "ERROR" and the "Scylla crashed" message will correctly
appear in this test's log. The flag will cause all other tests in the
same test suite to be skip()ed. But other tests in other directories,
depending on different fixtures, might continue to run normally.
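
The flag logic can be sketched without pytest as a tiny state machine. This is an illustration under assumptions, not the conftest.py code: the class name `CrashGuard` is made up, and in a real autouse fixture the "skip" outcome would be a call to pytest.skip() before the test and the "error" outcome a call to pytest.fail() after it.

```python
class CrashGuard:
    """Decides the outcome of each test in one suite that shares a server."""

    def __init__(self, server_is_alive):
        self.server_is_alive = server_is_alive  # callable returning bool
        self.crashed = False                    # plays the role of "scylla_crashed"

    def before_test(self):
        # After a crash, every remaining test in this suite is skipped.
        return "skip" if self.crashed else "run"

    def after_test(self):
        # The test during which the server died is the one that gets the error.
        if not self.server_is_alive():
            self.crashed = True
            return "error: Scylla crashed"
        return "ok"


# Simulate a server that survives the first test and dies during the second.
alive = iter([True, False])
guard = CrashGuard(lambda: next(alive))
outcomes = []
for _ in range(3):
    if guard.before_test() == "skip":
        outcomes.append("skipped")
    else:
        outcomes.append(guard.after_test())
```

Because the flag lives in one suite's guard, tests of another suite that use a different guard are unaffected.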

Fixes #23287

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23307
2025-05-05 10:15:56 +03:00
Piotr Szymaniak
e588c8667f alternator: Limit attribute name lengths
Attribute names are now checked against DynamoDB-compatible length
limits. When exceeded, Alternator emits an exception identical or similar
to the DDB one. It might be worth noting that DDB emits more than a
single kind of an exception string for some exceptions. The tests'
catch clauses handle all the observed kinds of messages from DynamoDB.
The validation differentiates between key and non-key attributes and
applies the limit accordingly.

AWS DDB raises exceptions with somewhat different contents when the
get request contains ProjectionExpression, so this case needed separate
treatment to emit the corresponding exception string. The
length-validating function was declared and defined in
expressions.hh/.cc respectively, because that's where the relevant
parsing happens.

** Tests

The following tests were validated when handling this issue:
test_limit_attribute_length_nonkey_good,
test_limit_attribute_length_nonkey_bad,
test_limit_attribute_length_key_good,
test_limit_attribute_length_key_bad,
test_limit_attribute_length_gsi_lsi_good,
test_limit_attribute_length_gsi_lsi_bad,
test_limit_attribute_length_gsi_lsi_projection_bad.

Some of the tests were expanded into being more granular. Namely, there
is a new test function
`test_limit_attribute_length_key_bad_incoherent_names`
which groups tests with too long attribute names in the case of
incorrect (incoherent) user requests.
Similarly, there is a new test function
`test_limit_attribute_length_gsi_lsi_bad_incoherent_names`.
All the tests now cover each combination of the key/keys being too long.
Both the new functions contain tests that verify that ScyllaDB throws
length-related exceptions (instead of the coherency-related), similar
to what DynamoDB does.

The new test test_limit_gsiu_key_len_bad covers the case of too long
attribute name inside GlobalSecondaryIndexUpdates.
The new test test_limit_gsiu_key_len_bad_incoherent_names covers the
case of incorrect (incoherent) user requests containing too long
attribute names and GlobalSecondaryIndexUpdates.

test_limit_attribute_length_key_bad was found to have contained an
illegal KeySchema structure.

Some of the tests had their match clause corrected.

All the tests are stripped of the xfail flag except
test_limit_attribute_length_key_bad, which has it changed since it
still fails due to Projection in GSI and LSI not being implemented in Alternator.
The xfail now points to #5036.

Fixes scylladb/scylladb#9169

Closes scylladb/scylladb#23097
2025-04-27 18:39:20 +03:00
Amnon Heiman
3acde5f904 test_returnconsumedcapacity.py: test RCU for batch get item
This patch adds tests for consumed capacity in batch get item.  It tests
both the simple case and the multi-item, multi-table case that combines
consistent and non-consistent reads.
2025-04-16 17:05:32 +03:00
Nadav Har'El
258213f73b Merge 'Alternator batch count histograms' from Amnon Heiman
This series adds a histogram for get and write batch sizes.
It uses the estimated_histogram implementation, which starts from 1 and grows by an exponential factor of 1.2, so its buckets are
very fine-grained up to 20 but still cover all the way to 100.
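
To see what such buckets look like, here is a sketch that assumes each bound is the previous one times 1.2, rounded to the nearest integer and forced to grow by at least 1 (the scheme used by Cassandra-style estimated histograms; the exact rule in Scylla's estimated_histogram is not shown in this series). The function name is hypothetical.

```python
def bucket_bounds(limit):
    """Bucket bounds starting at 1 with a 1.2 growth factor, up to `limit`."""
    bounds = [1]
    while bounds[-1] < limit:
        last = bounds[-1]
        nxt = (last * 12 + 5) // 10          # last * 1.2 rounded to nearest integer
        bounds.append(max(nxt, last + 1))    # always advance by at least 1
    return bounds


bounds = bucket_bounds(100)
```

Under this assumption, every batch size from 1 to 8 gets its own bucket and there are 13 buckets up to 20, while only 9 more buckets are needed to pass 100.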

Histograms will be reported per node.

**Backport to 2025.1 so we'll have information about user batch size limitation**

Closes scylladb/scylladb#23379

* github.com:scylladb/scylladb:
  alternator: Add tests for the batch items histograms
  alternator: Add histogram for batch item count
2025-04-09 22:41:14 +03:00
Nadav Har'El
84fd52315f alternator: in GetRecords, enforce Limit to be <= 1000
Alternator Streams' "GetRecords" operation has a "Limit" parameter on
how many records to return. The DynamoDB documentation says that the
upper limit on this Limit parameter is 1000 - but Alternator didn't
enforce this. In this patch we begin enforcing this highest Limit, and
also add a test for verifying this enforcement. As usual, the new test
passes on DynamoDB, and after this patch - also on Alternator.

The reason why it's useful to have *some* upper limit on Limit is that
the existing executor::get_records() implementation does not really have
preemption points in all the necessary places. In particular, we have a
loop on all returned records without preemption points. We also store
the returned records in a RapidJson vector, which requires a contiguous
allocation.

Even before this patch, GetRecords had a hard limit of 1 MB of results.
But still, in some cases 1 MB of results may be a lot of results, and we
can see stalls in the aforementioned places being O(number of results).
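
The check itself is simple; a sketch in Python follows. The function name is hypothetical, ValueError stands in for the ValidationException a DynamoDB-compatible server would return, and the lower bound of 1 is an assumption taken from the DynamoDB API reference rather than from this patch.

```python
MAX_GET_RECORDS_LIMIT = 1000  # upper bound documented by DynamoDB


def validate_get_records_limit(limit):
    """Reject a GetRecords Limit outside the range 1..1000."""
    if not 1 <= limit <= MAX_GET_RECORDS_LIMIT:
        raise ValueError(
            f"Limit must be between 1 and {MAX_GET_RECORDS_LIMIT}, got {limit}")
    return limit
```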

Fixes #23534

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23547
2025-04-07 12:52:03 +03:00
Amnon Heiman
b55f24c14d alternator: Add tests for the batch items histograms
This patch adds a test for the batch‑items histogram for both get and
write operations.

It updates the check_increases_metric_exact helper function so that it
accepts a list of expected values and labels (labels can be None).
This makes it easy to test multiple buckets in a histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-04-06 18:22:23 +03:00
Nadav Har'El
431de48df9 test/alternator: test for item with many attributes
A user complained that he couldn't read or write an item with more than
16 attributes (!) in Alternator. This isn't true, but I realized that we
don't have a simple test for this case - all tests use just a few attributes.
So let's add such a test, doing PutItem, UpdateItem and GetItem with 400
attributes. Unsurprisingly, the test passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23568
2025-04-03 22:35:49 +03:00
Nadav Har'El
a9a6f9eecc test/alternator: increase timeout in Alternator RBAC test
On our testing infrastructure, tests often run a hundred times (!)
slower than usual, for various reasons that we can't always avoid.
This is why all our test frameworks drastically increase the default
timeouts.

We forgot to increase the timeout in one place - where Alternator tests
use CQL. This is needed for the Alternator role-based access control
(RBAC) tests, which is configured via CQL and therefore the Alternator
test unusually uses CQL.

So in this patch we increase the timeout of CQL driver used by
Alternator tests to the same high timeouts (60-120 seconds) used by
the regular CQL tests. As the famous saying goes, these timeouts should
be enough for anyone.

Fixes #23569.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23578
2025-04-03 22:31:08 +03:00
Botond Dénes
fcdae20fd1 Merge 'Add tablet enforcing option' from Benny Halevy
This series adds a new config option: `tablets_mode_for_new_keyspaces` that replaces the existing
`enable_tablets` option. It can be set to the following values:
    disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option
    enabled:  New keyspaces use tablets by default, unless disabled by the tablets={'enabled':false} option
    enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option

`tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` control whether
tablets are disabled or enabled by default for new keyspaces, respectively.
In either case, tablets can be opted-in or out using the `tablets={'enabled':...}`
keyspace option, when the keyspace is created.

`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces,
like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow opting out when creating
new keyspaces by setting `tablets = {'enabled': false}`
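
The three modes can be summarized by a small decision function. This is a sketch of the rules as described in this message, not Scylla's implementation; the function name and the use of ValueError are assumptions.

```python
def keyspace_uses_tablets(mode, tablets_enabled=None):
    """Decide whether a new keyspace uses tablets.

    mode: value of tablets_mode_for_new_keyspaces
          ("disabled", "enabled" or "enforced").
    tablets_enabled: value of the keyspace option tablets={'enabled': ...},
          or None when CREATE KEYSPACE does not mention it.
    """
    if mode not in ("disabled", "enabled", "enforced"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "enforced":
        if tablets_enabled is False:
            raise ValueError("tablets cannot be disabled in enforced mode")
        return True
    if tablets_enabled is not None:
        return tablets_enabled       # explicit opt-in or opt-out wins
    return mode == "enabled"         # otherwise the configured default applies
```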

Refs scylladb/scylla-enterprise#4355

* Requires backport to 2025.1

Closes scylladb/scylladb#22273

* github.com:scylladb/scylladb:
  boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
  tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
  db/config: add tablets_mode_for_new_keyspaces option
2025-04-03 16:32:19 +03:00
Radosław Cybulski
c36614e16d alternator: add size check to BatchItemWrite
Add a size check for the BatchWriteItem command: if the item count is
bigger than the configuration value `alternator_maximum_batch_write_size`,
an error is raised and no modification happens.

This is done to align with DynamoDB, where the maximum size of a
BatchWriteItem is 25. To avoid complaints from clients who rely on
our BatchWriteItem being limitless, we set the default value
to 100.
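
A sketch of such a check, assuming the count is taken over all tables in the request (as DynamoDB does for its limit of 25). The function name is hypothetical and ValueError stands in for the error returned to the client; `request_items` has the shape of the RequestItems parameter, a mapping from table name to a list of write requests.

```python
DEFAULT_MAX_BATCH_WRITE_SIZE = 100  # default quoted above


def check_batch_write_size(request_items, max_items=DEFAULT_MAX_BATCH_WRITE_SIZE):
    """Count the writes in a batch and reject it before anything is modified."""
    count = sum(len(writes) for writes in request_items.values())
    if count > max_items:
        raise ValueError(f"batch has {count} items, the maximum is {max_items}")
    return count
```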

Fixes #5057

Closes scylladb/scylladb#23232
2025-04-02 14:48:00 +03:00
Benny Halevy
c62865df90 db/config: add tablets_mode_for_new_keyspaces option
The new option deprecates the existing `enable_tablets` option.
It will be extended in the next patch with a 3rd value: "enforced"
which will enable tablets by default for new keyspaces but
without the possibility to opt out using the `tablets = {'enabled':
false}` keyspace schema option.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-24 14:54:45 +02:00
Nadav Har'El
317de64281 test/alternator: enable debugging output during Python crashes
For a long time now, we've been seeing (see #17564), once in a while,
Alternator tests crashing with the Python process getting killed on
SIGSEGV after the tests have already finished successfully and all
pytest had to do is exit. We have not been able to figure out where the
bug is. Unfortunately, we've never been able to reproduce this bug
locally - and only rarely we see it in CI runs, and when it happens
we don't have any information on why it happened.

So the goal of this patch is to print more information that might
hopefully help us next time we see this problem in CI (this patch
does NOT fix the bug). This patch adds to test/alternator's conftest.py
a call to faulthandler.enable(). This traps SIGSEGV and prints a stack
trace (for each thread, if there are several) showing what Python was
trying to do while it is crashing. Hopefully we'll see in this output
some specific cleanup function belonging to boto3 or urllib or whatever,
and be able to figure out where the bug is and how to avoid it.
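
For readers unfamiliar with it, faulthandler is part of the Python standard library. A minimal sketch of enabling it follows; the wrapper function and its `stream` parameter are only for illustration, since the patch simply calls faulthandler.enable().

```python
import faulthandler
import sys


def enable_crash_traces(stream=None):
    """Dump a Python traceback for every thread on fatal signals such as SIGSEGV.

    `stream` must be a real file with a file descriptor and must stay open
    for as long as the handler is enabled."""
    faulthandler.enable(file=stream if stream is not None else sys.stderr,
                        all_threads=True)
```

Calling `enable_crash_traces()` once at import time of a conftest.py is enough to cover the whole pytest run.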

We could have added this faulthandler.enable() call to the top-level
conftest.py or to test.py, but since we only ever had this Python
crash in Alternator tests, I think it is more suitable that we limit
this desperate debugging attempt only to Alternator tests.

Refs #17564

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23340
2025-03-19 18:18:51 +03:00
Nadav Har'El
c0821842de alternator: document the state of tablet support in Alternator
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows overriding this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablets.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes #21629

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22462
2025-03-14 14:03:15 +03:00
Piotr Szymaniak
f887466c3f alternator: Clean error handling on CreateTable without AttributeDefinitions
If user fails to supply the AttributeDefinitions parameter when creating
a table, Scylla used to fail on RAPIDJSON_ASSERT. Now it throws a polite
exception, which is fully in-line with what DynamoDB does.

The commit also supplies a new, relevant test routine.

Fixes #23043

Closes scylladb/scylladb#23041
2025-02-26 14:24:57 +02:00
Piotr Szymaniak
c1f186c98a alternator: re-enabling/changing existing stream's StreamViewType as well as disabling the nonexistent stream
Table updates that try to enable stream (while changing or not the
StreamViewType) on a table that already has the stream enabled
will result in ValidationError.

Table updates that try to disable stream on a table that does not
have the stream enabled will result in ValidationError.

Add two tests to verify the above.

Mark the test for changing the existing stream's StreamViewType
not to xfail.

Fixes scylladb/scylladb#6939

Closes scylladb/scylladb#22827
2025-02-16 09:57:49 +02:00
Nadav Har'El
cae8a7222e alternator: fix view build on oversized GSI key attribute
Before this patch, the regular_column_transformation constructor, which
we used in Alternator GSIs to generate a view key from a regular-column
cell, accepted a cell of any size. As a reviewer (Avi) noticed, very
long cells are possible, well beyond what Scylla allows for keys (64KB),
and because regular_column_transformation stores such values in a
contiguous "bytes" object it can cause stalls.

But allowing oversized attributes creates an even more acute problem:
While view building (backfilling in DynamoDB jargon), if we encounter
an oversized (>64KB) key, the view building step will fail and the
entire view building will hang forever.

This patch fixes both problems by adding to regular_column_transformation's
constructor the check that if the cell is 64KB or larger, an empty value
is returned for the key. This causes the backfilling to silently skip
this item, which is what we expect to happen (backfilling cannot do
anything to fix or reject the pre-existing items in the base table).
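
The effect of the fix can be modeled in a few lines of Python. This is a sketch of the behavior only, not the C++ code in regular_column_transformation: the names are hypothetical, and None models the "empty value" that makes the view skip the item.

```python
MAX_KEY_BYTES = 64 * 1024  # cells of this size or larger cannot become keys


def view_key_from_cell(cell):
    """Return the cell as a view key, or None if it is too large to be one."""
    return cell if len(cell) < MAX_KEY_BYTES else None


def backfill(base_items, key_attr):
    """Build the view rows, silently skipping items without a usable key."""
    view = []
    for item in base_items:
        cell = item.get(key_attr)
        key = view_key_from_cell(cell) if cell is not None else None
        if key is not None:
            view.append(item)
    return view


base = [
    {"p": b"1", "x": b"small"},
    {"p": b"2", "x": b"y" * (65 * 1024)},  # oversized: left out of the view
    {"p": b"3"},                            # no "x" attribute: left out too
]
view = backfill(base, "x")
```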

A test test_gsi_updatetable.py::test_gsi_backfill_oversized_key is
introduced to reproduce this problem and its fix. The test adds a 65KB
attribute to a base table, and then adds GSIs to this table with this
attribute as its partition key or its sort key. Before this patch, the
backfilling process for the new GSIs hangs, and never completes.
After this patch, the backfilling completes and as expected contains
other base-table items but not the item with the oversized attribute.
The new test also passes on DynamoDB.

However, while implementing this fix I realized that issue #10347 also
exists for GSIs. Issue #10347 is about the fact that DynamoDB limits
partition key and sort key attributes to 2048 and 1024 bytes,
respectively. In the fix described above we only handled the acute case
of lengths above 64 KB, but we should actually skip items whose GSI
keys are over 2048 or 1024 bytes - not 64KB. This extra checking is
not handled in this patch, and is part of a wider existing issue:
Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:50 +01:00
Nadav Har'El
67d2ea4c4b test/alternator: unflake test for IndexStatus
The test for IndexStatus verifies that on a newly created table and GSI,
the IndexStatus is "ACTIVE". However, in Alternator, this doesn't strictly
need to happen *immediately* - view building, even for an empty table -
can take a short while in debug mode. This makes the test
test_gsi_describe_indexstatus flaky in debug mode.

The fix is to wait for the GSI to become active with wait_for_gsi()
before checking it is active. This is sort of silly and redundant,
but the important point is that if the IndexStatus is incorrect this test
will fail, it doesn't really matter whether the wait_for_gsi() or
the DescribeTable assertion is what fails.

Now that wait_for_gsi() is used in two test files, this patch moves it
(and its friend, wait_for_gsi_gone()) to util.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
4ba17387e6 test/alternator: work around unrelated bug causing test flakiness
The alternator test test_gsi_updatetable.py::test_gsi_delete_with_lsi
creates a GSI together with a table, and then deletes it. We have a
bug unrelated to the purpose of this test - #9059 - that causes view
building to sometimes crash Scylla if the view is deleted while the
view build is starting. We see specifically in debug builds that even
view building of an *empty* table might not finish before the test
deletes the view - so this bug happens.

Work around that bug by waiting for the GSI to build after creating
the table with the GSI. This shouldn't be necessary (in DynamoDB,
a GSI created with the table always begins ready with the table),
but doesn't hurt either.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
ac648950f1 test/alternator: remove xfail from all tests for issue 11567
The previous patches fully implemented issue 11567 - supporting
UpdateTable to add or delete a GSI on an existing Alternator table.
All 14 tests that were marked xfail because of this issue now pass,
so this patch removes their xfail. There are no more xfailing tests
referring to this issue.

These 14 tests, most of them in test/alternator/test_gsi_updatetable.py,
cover all aspects of this feature, including adding a GSI, deleting a
GSI, interactions between GSI and LSI, RBAC when adding or deleting a GSI,
data type limitation on an attribute that becomes a GSI key or stops
being one, GSI backfill, DescribeTable and backfill, various error
conditions, and more.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
cea7aacc52 alternator: add IndexStatus/Backfilling in DescribeTable
This patch adds the missing IndexStatus and Backfilling fields for the
GSIs listed by a DescribeTable request. These fields allow an application
to check whether a GSI has been fully built (IndexStatus=ACTIVE) or
is currently being built (IndexStatus=CREATING, Backfilling=true).

This feature is necessary when a GSI can be added to an existing table
so its backfilling might take time - and the application might want to
wait for it.
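
An application could wait for a GSI roughly as sketched below. The function names are hypothetical (this is not the test suite's wait_for_gsi()), and `describe` is assumed to be any callable returning a DescribeTable response, for example `lambda: client.describe_table(TableName="mytable")` with a boto3 client.

```python
import time


def gsi_status(describe_table_response, index_name):
    """Return (IndexStatus, Backfilling) for a GSI, or (None, False) if absent."""
    table = describe_table_response["Table"]
    for gsi in table.get("GlobalSecondaryIndexes", []):
        if gsi["IndexName"] == index_name:
            return gsi.get("IndexStatus"), gsi.get("Backfilling", False)
    return None, False


def wait_until_gsi_active(describe, index_name, timeout=60.0, interval=0.1):
    """Poll DescribeTable until the GSI is ACTIVE; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, _backfilling = gsi_status(describe(), index_name)
        if status == "ACTIVE":
            return True
        time.sleep(interval)
    return False
```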

One test - test_gsi.py::test_gsi_describe_indexstatus - begins to pass
with this fix, so the xfail tag is removed from it.

Fixes #11471.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
bfdd805f15 test/alternator: fix running against installation blocking CQL
One of the design goals of the Alternator test suite (test/alternator)
is that developers should be able to run the tests against some already
running installation by running `cd test/alternator; pytest [--url ...]`.

Some of our presentations and documents recommend running Alternator
via docker as:

    docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest
         --alternator-port=8000 --alternator-write-isolation=always

This only makes port 8000 available to the host - the CQL port is
blocked. We had a bug in conftest.py's get_valid_alternator_role()
which caused it to fail (and fail every single test) when CQL is
not available. What we really want is that when CQL is not available
and we can't figure out a correct secret key to connect to Alternator,
we just try a connect with a fake key - and hope that the option
alternator-enforce-authorization is turned off. In fact, this is what
the code comments claim was already happening - but we failed to
handle the case that CQL is not available at all.

After this patch, one can run Alternator with the above docker
command, and then run tests against it.

By the way, this provides another way for running any old release of
Scylla and running Alternator tests against it. We already supported
a similar feature via test/alternator/run's "--release" option, but
its implementation doesn't use docker.

Fixes #22591

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22592
2025-02-05 19:01:31 +03:00
Nadav Har'El
698a63e14b test/alternator: test for invalid B value in UpdateItem
This patch adds an Alternator test for the case of UpdateItem attempting
to insert an invalid B (bytes) value into an item. Values of type B
use base64 encoding, and an attempt to insert a value which isn't
valid base64 should be rejected, and this is what this test verifies.
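
For reference, strict base64 validation looks like this in Python. It only illustrates what "valid base64" means here; the helper name is hypothetical and Alternator's own decoder is written in C++.

```python
import base64
import binascii


def is_valid_b_value(encoded):
    """True if `encoded` is strictly valid base64, as a B value must be."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(encoded, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```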

The new tests reproduce issue #17539, which claimed we have a bug in
this area. However, test/alternator/run with the "--release" option
shows that this bug existed in Scylla 5.2, but was fixed long ago, in
5.3 and doesn't exist in master. But we never had a regression test for this
issue, so now we do.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22029
2025-01-30 11:33:03 +02:00
Nadav Har'El
98a8ae0552 test/alternator: functional tests for Alternator multi-item transactions
This patch adds extensive functional tests for the DynamoDB multi-item
transactions feature - the TransactWriteItems and TransactGetItems
requests. We add 43 test functions, spanning more than 1000 lines of code,
covering the different parameters and corner cases of these requests.

Because we don't support the transaction feature in Alternator yet (this
is issue #5064), all of these tests fail on Alternator but all of them
were tested to pass on DynamoDB. So all new tests are marked "xfail".

These tests will be handy for whoever will implement this feature as
an acceptance test, and can also be useful for whoever will just want to
understand this feature better - the tests are short and simple and
heavily commented.

Note that these tests only check the correct functionality of individual
calls of these requests - these tests cannot and do not check the
consistency or isolation guarantees of concurrent invocations of
several requests. Such tests would require a different test framework,
such as the one requested in issue #6350, and are therefore not part of
this patch.

Note that this patch includes ONLY tests, and does not mean that an
implementation of the feature will soon follow. In fact, nobody is
currently working on implementing this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22239
2025-01-30 11:22:05 +02:00
Nadav Har'El
955ac1b7b7 test/alternator: close boto3 client before shutting down
For several years now, we have seen a strange, and very rare, flakiness
in Alternator tests described in issue #17564: We see all the tests pass,
pytest declares them to have passed, and while Python is exiting, it
crashes with a signal 11 (SIGSEGV). Because this happens exclusively in
test/alternator and never in the test/cqlpy, we suspect that something
that the test/alternator leaves behind but test/cqlpy does not, causes
some race and crashes during shutdown.

The immediate suspect is the boto3 library, or rather, the urllib3 library
which it uses. This is more-or-less the only thing that test/alternator
does which test/cqlpy doesn't. The urllib3 library keeps around pools of
reusable connections, and it's possible (although I don't actually have any
proof for it) that these open connections may cause a crash during shutdown.

So in this patch I add to the "dynamodb" and "dynamodbstreams" fixtures
(which all Alternator tests use to connect to the server), a teardown which
calls close() for the boto3 client object. This close() call percolates
down to calling clear() on urllib3's PoolManager. Hopefully, this will
make some difference in the chance to crash during shutdown - and if it
doesn't, it won't hurt.

Refs #17564

Closes scylladb/scylladb#22341
2025-01-16 19:21:00 -05:00
Nadav Har'El
321d0fd3b1 Merge 'Alternator: Add WCU support for update item' from Amnon Heiman
This series adds WCU support for the Alternator update item.
The motivation behind it is to have a rough estimate of what a similar operation would have cost, from a WCU perspective, if run on DynamoDB.

The calculation treats minimal overhead as the prime objective, so the resulting values are less than or equal to what they would have been in DynamoDB.

** New feature, no need to backport. **

Closes scylladb/scylladb#21999

* github.com:scylladb/scylladb:
  alternator/test_returnconsumedcapacity.py: update item
  alternator/executor.cc: Add WCU for update_item
2025-01-13 14:35:46 +02:00
Amnon Heiman
7390116620 alternator/test_returnconsumedcapacity.py: update item
This patch adds tests for return consumed capacity for update_item.

The tests cover: a simple update for a small object, a missing item, an
update with a very large attribute (where the attribute itself is more
than 1KB), and an update of a big item that uses read-before-write.
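
For orientation, DynamoDB's documented rule for a standard
(non-transactional) write is one WCU per 1 KB of item size, rounded up,
with a minimum of one. A small sketch of that arithmetic follows; the
function name is hypothetical, and how Alternator measures the item size is
not stated here, so this shows only the DynamoDB-side upper bound.

```python
import math

def dynamodb_wcu(item_size_bytes: int) -> int:
    """WCUs DynamoDB charges for a standard write of the given item size."""
    # One unit per started 1 KB block; even a tiny or empty write costs 1.
    return max(1, math.ceil(item_size_bytes / 1024))

small = dynamodb_wcu(100)      # fits in one 1 KB block
boundary = dynamodb_wcu(1024)  # exactly 1 KB still costs one unit
large = dynamodb_wcu(1025)     # one byte over spills into a second block
```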
2025-01-06 09:55:17 +02:00
Nadav Har'El
e919794db8 test/alternator: fix mistakes introduced with test_service_levels.py
This patch undoes multiple mistakes done when introducing the test
for service levels in pull request #22031:

1. The PR introduced in test/alternator/run and test/alternator/suite.yaml
   a permanent role and service level that the service-level test is
   supposed to use. This was a mistake - the test can create the service
   level for its own use, using CQL, it does not need to assume such a
   service level already exists.
   It's important to fix this to allow the service level test to run
   against an installation of Scylla not set up by our own scripts.
   Moreover, while the code in suite.yaml was correct, the code in
   "run" was incorrect (used an outdated keyspace name). This patch
   removes that incorrect code.

2. The PR introduced a duplicate "cql" fixture, copied verbatim
   from test_cql_rbac.py (including a comment that was correct only
   in the latter file :-)). Let's de-duplicate it, using the fixture
   that I moved to conftest.py in the previous patch.

3. The PR used temporary_grant(). This needlessly complicated the test
   and added even more duplicate code, and this patch removes all that
   stuff. This test is about service levels, not RBAC and "grant".
   This test should just use a superuser role that has the permissions
   to do everything, and doesn't need to be granted specific permissions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-01-05 19:40:14 +02:00
Nadav Har'El
879c0a3bd6 test/alternator: move "cql" fixture to test/alternator/conftest.py
Most Alternator tests use only the DynamoDB API, not CQL. Tests in
test_cql_rbac.py did need CQL to set up roles and RBAC, so this file
introduced a "cql" fixture to make CQL requests.

A recently-introduced test/alternator/test_service_levels.py also
needs access to CQL - it currently uses it for misguided reasons but
the next patch will need it for creating a role and a service level.
So instead of duplicating this fixture, let's move this fixture into
test/alternator/conftest.py that all Alternator tests can share.

The next patch will clean up this duplication in test_service_levels.py
and the other mistakes it introduced.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-01-05 19:33:55 +02:00
Piotr Dulikowski
b23bc3a5d5 alternator: execute under scheduling group for service level
Now, the Alternator API requests are executed under the correct
scheduling group of the service level assigned to the currently logged
in user.
2025-01-02 07:13:34 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Nadav Har'El
d9af154772 test/alternator: more tests for UpdateTable create and delete GSI
We already have in test_gsi_updatetable.py several functional tests for
the Alternator feature of adding or deleting a GSI on an existing table,
through the UpdateTable operation.
This patch adds many more tests for various corner cases of this feature -
tests developed in parallel with actually implementing that feature.

All tests in test_gsi_updatetable.py pass on Amazon DynamoDB but currently
xfail on Alternator, due to the following issues:

 * #11567: Alternator: allow adding a GSI to a pre-existing table
 * #9424: Alternator GSIs should exclude items with empty-string key components

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
5c7b8c8e4d test/alternator: make UpdateTable tests wait less
The UpdateTable tests for creating and deleting a GSI need to wait for
the asynchronous operation of the view's building and deletion, using
two utility functions wait_for_gsi() and wait_for_gsi_gone().

Because I originally wrote these tests for DynamoDB and its extremely
high latency for these operations, these functions waited a whole second
before checking for the end of the wait. This whole-second sleep is
absurd in Alternator where building a small view takes just a fraction of
a second. So let's lower the sleep time from 1 second to 0.1 seconds,
and allow these tests to pass much faster on Alternator (once this
feature is implemented in Alternator, of course - until then all these
tests still fail immediately on an unimplemented operation).
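
The polling loop behind such wait helpers can be sketched roughly as below.
This is an illustration only: wait_until and its parameters are hypothetical,
and the real wait_for_gsi() and wait_for_gsi_gone() may be structured
differently. The point is that the sleep interval bounds how long a test
lingers after the operation has already finished.

```python
import time

def wait_until(predicate, interval=0.1, timeout=60.0):
    """Poll predicate() every `interval` seconds until it returns true.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Usage with a stand-in condition that becomes true on the third check.
checks = []
def ready():
    checks.append(1)
    return len(checks) >= 3

result = wait_until(ready, interval=0.01, timeout=5.0)
```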

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
b1bd5cdf0f test/alternator: move UpdateTable tests to a separate file
The source file test/alternator/test_gsi.py has already grown very
large, so this patch moves all the existing tests related to using
UpdateTable to add or delete a GSI to a separate file:
test_gsi_updatetable.py.

We just move tests here - no new tests or functional changes to the
tests - but did use the opportunity for some small improvements in
the comments.

In the next patch we'll add more tests to this new file.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
cc308bd0cc test/alternator: add another test for elaborate GSI updates
We have a test, test/alternator/test_gsi.py::test_update_gsi_pk which
created a GSI whose *partition key* was a regular column in the base
table, and exercised various elaborate updates requiring adding,
updating and deleting of rows from the materialized view.

In this patch, we add another similar test case, just for a *clustering
key*.

Both these tests are important regression tests - when we later
reimplement GSI we'll want to verify that none of the complex update
scenarios got broken (and indeed, some broken code did break these
tests).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
9094fe1608 test/alternator: test that DescribeTable returns IndexStatus for GSI
This patch adds a test reproducing issue #11471 - where DescribeTable
on a table that has an already built GSI (created with the table itself)
must return IndexStatus == "ACTIVE".

This test passes on DynamoDB, but xfails on Alternator because of
issue #11471.

We actually had this check earlier, but it was part of a bigger xfailing
test that checked multiple features. It's better to have it as a
separate test just for this feature, as we'll soon fix this issue and
make this test pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
1b120e3c7e test/alternator: fix wrong test for UpdateTable metrics
The test we had for counting Alternator operations metrics ran the
UpdateTable request without any parameters, which isn't actually a
valid call - Amazon DynamoDB rejects such a call, saying one of the
different parameters must be present, and we'll want to do that
later too.

So let's fix the test to use a valid UpdateTable request, one that
does the silly BillingMode='PAY_PER_REQUEST'. This is already the
current setting, so nothing is really changed, but it's still counted
as an operation in the metric.
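
As a rough sketch, such a request looks like this through a boto3-style
client. The helper name touch_table and the RecordingClient stand-in are
hypothetical and exist only so the example runs without a server;
update_table() with TableName and BillingMode is the regular boto3 client
call.

```python
def touch_table(client, table_name):
    """Send a valid UpdateTable request that changes nothing in practice."""
    return client.update_table(TableName=table_name,
                               BillingMode="PAY_PER_REQUEST")


class RecordingClient:
    """Stand-in for a boto3 DynamoDB client; records the requests it gets."""
    def __init__(self):
        self.calls = []

    def update_table(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
touch_table(client, "some_table")
```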

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
85088516b2 test/alternator: add test for missing attribute in item in LSI
Test that when a table has an LSI, then if the indexed attribute is
missing, the item is added to the base table but not the index.

We already have exactly the same test for GSI in test_gsi.py, but forgot
to write the same test for LSI. It's important to test this scenario
separately for GSIs and LSIs because in an upcoming GSI reimplementation
we plan to make the GSI and LSI implementation slightly different, and
they can have separate bugs (and in fact, we had such an LSI-specific
bug in one broken implementation).

We also have the same scenario that is tested here in the test
test_streams.py::test_streams_updateitem_old_image_lsi_missing_column
but that was an Alternator Streams test and we should have a more basic
test for this scenario in test_lsi.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
b00f5a6070 test/alternator: test that DescribeTable doesn't return IndexStatus for LSI
Whereas GSIs have an IndexStatus when described by DescribeTable,
LSIs do not. The purpose of IndexStatus is to tell when the index is live,
and this is not needed for LSIs because they cannot be added to a base
table that already exists.

We already had a test for this, but it was hidden in an xfailing test
for many different DescribeTable attributes - so let's move it into its
own, *passing*, test. The new test passes on both Alternator and
Amazon DynamoDB.

This test is an important regression test for when we later add
IndexStatus support to GSI, and this test will ensure that we don't
accidentally introduce IndexStatus to LSIs as well - DynamoDB doesn't
generate it for LSIs so neither should Alternator.
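
The distinction can be illustrated on a DescribeTable response. The sketch
below is not the test's code: index_statuses is a hypothetical helper and the
response is a hand-written sample that follows the DescribeTable response
layout (Table, GlobalSecondaryIndexes, LocalSecondaryIndexes).

```python
def index_statuses(describe_table_response):
    """Map each index name to its IndexStatus, or None if the field is absent."""
    table = describe_table_response["Table"]
    gsi = {i["IndexName"]: i.get("IndexStatus")
           for i in table.get("GlobalSecondaryIndexes", [])}
    lsi = {i["IndexName"]: i.get("IndexStatus")
           for i in table.get("LocalSecondaryIndexes", [])}
    return gsi, lsi


# Hand-written sample response: the GSI carries a status, the LSI does not.
sample = {
    "Table": {
        "TableName": "t",
        "GlobalSecondaryIndexes": [{"IndexName": "g", "IndexStatus": "ACTIVE"}],
        "LocalSecondaryIndexes": [{"IndexName": "l"}],
    }
}
gsi, lsi = index_statuses(sample)
```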

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
373b37b5da test/alternator: add tests for RBAC for create and delete GSI
In later patches we will implement (as requested in issue #11567) the
UpdateTable operation for creating a new GSI or removing a GSI on an
existing table. In this patch we add to test/alternator/test_cql_rbac.py
tests to exhaustively check that the new operations will behave as expected
with respect to role-based access control (RBAC):

1. UpdateTable requires the ALTER permissions on the affected table -
   as was already the case before (and was documented in compatibility.md).
   This should also be true for the newly-implemented UpdateTable
   operations that create a GSI and delete a GSI, and we test that.

   The above statement may sound counter-intuitive - why does creating
   or deleting a GSI require ALTER permissions (on the base table), not
   CREATE or DROP permissions? But this makes sense when you consider
   that CREATE permissions should allow you to create new independent tables,
   not to change the behavior or performance of existing tables (which
   adding a GSI does).

2. When a role has permissions to create a GSI, it should be able to
   read the new GSI (SELECT permissions). This is known as "auto-grant".

3. When a GSI is deleted, whatever permissions were set on it are revoked,
   so that if it's later recreated, the old permissions don't resurface.
   This is known as "auto-revoke".

Because the UpdateTable feature for creating and deleting a GSI is not
yet enabled, the new tests are all marked "xfail".

The new tests, like all tests in the file test/alternator/test_cql_rbac.py
are Scylla-only and are skipped on Amazon DynamoDB - because they test
the Scylla-only CQL-based role-based access control API.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:55:28 +02:00
Amnon Heiman
d2ca1ebfa0 test_returnconsumedcapacity.py: Add delete Item tests
This patch adds three basic tests for delete item. The first
validates that a simple, short delete item returns 1 WCU.
The second tries to delete a missing item.
The third stores a bigger item and uses ReturnValues='ALL_OLD' to
make the API fetch the previously stored item, and checks that the WCU
is as expected.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-12-03 15:55:41 +02:00
Amnon Heiman
f4c79d7728 test_metrics validate split wcu_total to ops
This patch modifies the post_item WCU test to validate that it uses the
right ops. Note that the test will pass even before this change but we
want to validate the extra label.
2024-12-03 15:55:41 +02:00
Nadav Har'El
6d37b53653 test/alternator: move comment next to bizarre code that it explains
In commit 9ff9cd37c3 we added in
test/alternator/test_number.py a workaround for a boto3 bug that
prevented us (and still prevents us) from testing numbers with high
precision. Because the workaround was so bizarre, the three lines it
requires - two imports and an assignment - were preceded by a 5-line
comment explaining it.

Unfortunately, a later commit 93b9b85c12
went and arbitrarily moved import lines around to satisfy some PEP-8
"requirements", resulting in the comment being separated from the lines
it was supposed to explain.

This patch moves the comment in front of the main line it explains.
The two imports that are needed just for this line and aren't used
elsewhere remain in their current place (where the PEP8 police demands
they stay), but this is less important for the understanding of this
trick so it's fine.

No functionality of the test was changed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21635
2024-12-02 10:56:09 +01:00
Andrei Chekun
8bf62a086f test.py: Create central conftest.
A central conftest reduces code duplication and allows executing all tests
with one pytest command.

Closes scylladb/scylladb#21454
2024-11-24 20:09:48 +02:00
Nadav Har'El
7014aec452 Merge 'Alternator measuring RCU and WCU' from Amnon Heiman
Read and Write Consumed Capacity units are an abstract way of measuring Alternator actions. In general, they correspond to the amount of data read or written.

In the long run, the RCU/WCU adds a way of charging an operation and limiting usage.

This series addresses two issues: consume capacity request API and metering.

The Alternator (and DynamoDB) API has an optional parameter allowing users to check the number of units an operation consumes. When a user adds that parameter, the response will contain the number of units used for the operation.
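
As a hedged illustration of that parameter, the sketch below asks for the consumed capacity of a GetItem through a boto3-style client. The helper name get_item_capacity and the FakeClient stand-in are hypothetical and exist only so the example runs without a server; ReturnConsumedCapacity and the ConsumedCapacity.CapacityUnits response field are part of the DynamoDB API.

```python
def get_item_capacity(client, table_name, key):
    """Read one item and return the capacity units the response reports."""
    response = client.get_item(TableName=table_name, Key=key,
                               ReturnConsumedCapacity="TOTAL")
    return response["ConsumedCapacity"]["CapacityUnits"]


class FakeClient:
    """Stand-in for a boto3 DynamoDB client with a canned response."""
    def get_item(self, **kwargs):
        self.last_request = kwargs
        return {
            "Item": {"p": {"S": "key"}},
            "ConsumedCapacity": {"TableName": kwargs["TableName"],
                                 "CapacityUnits": 0.5},
        }


client = FakeClient()
units = get_item_capacity(client, "some_table", {"p": {"S": "key"}})
```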

This series adds the consume capacity support to the get_item and put_item, adds a metric to collect the overall RCU and WCU used, and adds a test for the new functionality.

Follow-up PRs will add support for more operations and GSI.

Replaces #19811
Partially implement: #5027

Closes scylladb/scylladb#21543

* github.com:scylladb/scylladb:
  alternator/test_metrics: Add tests for table consumption units
  test_returnconsumedcapacity.py: Add putItem tests
  Alternator: add WCU support
  Add test/alternator/test_returnconsumedcapacity.py
  alternator/executor: Add consume capacity for get_item
  alsternator/stats: Add rcu and wcu metrics to stats
  alternator/executor.hh: white-space cleanup
  Add the consume_capacity helper class
2024-11-24 19:27:03 +02:00