scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Piotr Wieczorek	445e58bbc5	alternator: Correct RCU undercount in BatchGetItem The `describe_multi_item` function treated the last reference-captured argument as the number of used RCU half units. The caller `batch_get_item`, however, expected this parameter to hold an item size. This RCU value was then passed to `rcu_consumed_capacity_counter::get_half_units`, treating the already-calculated RCU integer as if it were a size in bytes. This caused a second conversion that undercounted the true RCU. During conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH` (=4KB), so the double conversion divided the number of bytes by 16 MB. The fix removes the second conversion in `describe_multi_item` and changes the API of `describe_multi_item`. Fixes: https://github.com/scylladb/scylladb/pull/25847 Closes scylladb/scylladb#25842 (cherry picked from commit `a55c5e9ec7`) Closes scylladb/scylladb#26538	2025-10-14 11:49:30 +03:00
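The double conversion described in the commit above can be illustrated with a minimal Python sketch. The function below is a hypothetical stand-in for `rcu_consumed_capacity_counter::get_half_units`; the ceiling division and the doubling for consistent reads are assumptions, and only the 4 KB block size comes from the commit message.

```python
RCU_BLOCK_SIZE_LENGTH = 4 * 1024  # 4 KB, as stated in the commit message


def get_half_units(size_bytes: int, consistent_read: bool = False) -> int:
    """Convert a size in bytes to RCU half units (hypothetical stand-in)."""
    blocks = -(-size_bytes // RCU_BLOCK_SIZE_LENGTH)  # ceiling division
    # Assumption: a consistent read costs twice as many half units.
    return blocks * 2 if consistent_read else blocks


total_bytes = 10 * 1024 * 1024          # 10 MiB read by one batch
correct = get_half_units(total_bytes)   # single conversion: bytes -> half units
buggy = get_half_units(correct)         # result fed back in as if it were bytes
print(correct, buggy)
```

Under these assumptions the second conversion collapses 2560 half units to 1, which is the kind of undercount the fix removes.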
Nadav Har'El	d61bce8685	alternator: fix bug in combination of AttributeUpdates + ReturnValues In test/alternator/test_returnvalues.py we had tests for the ReturnValues feature on UpdateItem requests - but we only tested UpdateItem requests with the "modern" UpdateExpression, and forgot to test the combination of ReturnValues with the old AttributeUpdates API. It turns out this combination is buggy: when both ReturnValues=ALL_OLD and AttributeUpdates need the previous value of the item, we may wrongly std::move() the value out, and the operation will fail with a strange error: An error occurred (ValidationException) when calling the UpdateItem operation: JSON assert failed on condition 'IsObject()' The fix in this patch is trivial - just move the std::move() to the correct place, after both UpdateExpression and AttributeUpdates handling is done. This patch also includes a reproducing test, which fails before this patch and passes with it - and of course passes on DynamoDB. This test reproduces two cases where the bug happened, as well as one case where it didn't (to make sure we don't regress in what already worked). Fixes #25894 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25900 (cherry picked from commit `3c0032deb4`) Closes scylladb/scylladb#26096	2025-09-19 19:25:15 +03:00
Nadav Har'El	5d6aa6e8c2	utils, alternator: fix detection of invalid base-64 This patch fixes an error-path bug in the base-64 decoding code in utils/base64.cc, which among other things is used in Alternator to decode blobs in JSON requests. The base-64 decoding code has a lookup table, which was wrongly sized 255 bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF) was included in an invalid base-64 string, instead of detecting that this is an invalid byte (since the only valid bytes in a base-64 string are A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a nonsense 6-bit part, or even crash on an out-of-bounds read. Besides the trivial fix, this patch also includes a reproducing test, which tries to write a blob as a supposedly base-64 encoded string with a 0xFF byte in it. The test fails before this patch (the write succeeds, unexpectedly), and passes after this patch (the write fails as expected). The test also passes on DynamoDB. Fixes #25701 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25705 (cherry picked from commit `ff91027eac`) Closes scylladb/scylladb#25767	2025-09-04 11:38:55 +03:00
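A rough Python sketch of why the lookup table needs 256 entries: a byte can take 256 values (0 to 255), so a table indexed by byte value must cover index 255. The names `DECODE_TABLE` and `is_base64_alphabet` are hypothetical and do not mirror the code in utils/base64.cc; padding placement is deliberately not checked here.

```python
import string

INVALID = -1
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# One entry per possible byte value; a 255-entry table would have no slot for 0xFF.
DECODE_TABLE = [INVALID] * 256
for value, ch in enumerate(ALPHABET):
    DECODE_TABLE[ord(ch)] = value


def is_base64_alphabet(data: bytes) -> bool:
    """Return True if every byte is a base-64 character or '=' padding."""
    return all(b == ord("=") or DECODE_TABLE[b] != INVALID for b in data)


print(is_base64_alphabet(b"QUJD"), is_base64_alphabet(b"QU\xffD"))
```

In Python an undersized table would raise `IndexError` on the 0xFF byte; in C++ the equivalent access is an out-of-bounds read, which matches the two failure modes the commit describes.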
Szymon Malewski	4c375b257b	test/alternator: enable more relevant logs in CI. This patch sets, for alternator test suite, all 'alternator-*' loggers and 'paxos' logger to trace level. This should significantly ease debugging of failed tests, while it has no effect on test time and increases log size only by 7%. This affects running alternator tests only with `test.py`, not with `test/alternator/run`. Closes #24645 Closes scylladb/scylladb#25327 (cherry picked from commit `eb11485969`) Closes scylladb/scylladb#25383	2025-08-11 06:51:23 +03:00
Nadav Har'El	b7da50d781	alternator: avoid oversized allocation in Query/Scan This patch fixes one cause of oversized allocations - and therefore potentially stalls and increased tail latencies - in Alternator. Alternator's Scan and Query operations return a page of results. When the number of items is not limited by a "Limit" parameter, the default is to return a 1 MB page. If items are short, a large number of them can fit in that 1 MB. The test test_query.py::test_query_large_page_small_rows has 30,000 items returned in a single page. In the response JSON, all these items are returned in a single array "Items". Before this patch, we build the full response as a RapidJSON object before sending it. The problem is that unfortunately, RapidJSON stores arrays as contiguous allocations. This results in large contiguous allocations in workloads that scan many small items, and large contiguous allocations can also cause stalls and high tail latencies. For example, before this patch, running test/alternator/run --runveryslow \ test_query.py::test_query_large_page_small_rows reports in the log: oversized allocation: 573440 bytes. After this patch, this warning no longer appears. The patch solves the problem by collecting the scanned items not in a RapidJSON array, but rather in a chunked_vector<rjson::value>, i.e., a chunked (non-contiguous) array of items (each a JSON value). After collecting this array separately from the response object, we need to print its content without actually inserting it into the object - we add a new function print_with_extra_array() to do that. The new separate-chunked-vector technique is used when a large number (currently, >256) of items were scanned. When there is a smaller number of items in a page (this is typical when each item is longer), we just insert those items in the object and print it as before. 
Beyond the original slow test that demonstrated the oversized allocation (which is now gone), this patch also includes a new test which exercises the new code with a scan of 700 (>256) items in a page - but this new test is fast enough to be permanently in our test suite and not a manual "veryslow" test as the other test. Fixes #23535 (cherry picked from commit `2385fba4b6`)	2025-07-27 07:42:01 +00:00
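The idea of keeping the items outside the response object and printing them separately can be sketched in Python as follows. This only illustrates the shape of the technique: `ChunkedList` and this `print_with_extra_array` are hypothetical stand-ins, not the C++ `chunked_vector` or the function added by the patch, and Python's memory model does not reproduce the allocation behaviour itself.

```python
import io
import json


class ChunkedList:
    """Append-only sequence stored as fixed-size chunks instead of one big array."""

    def __init__(self, chunk_size: int = 256):
        self._chunk_size = chunk_size
        self._chunks = [[]]

    def append(self, item) -> None:
        if len(self._chunks[-1]) == self._chunk_size:
            self._chunks.append([])  # start a new chunk instead of growing the old one
        self._chunks[-1].append(item)

    def __iter__(self):
        for chunk in self._chunks:
            yield from chunk


def print_with_extra_array(response: dict, name: str, items, out) -> None:
    """Write `response` as JSON, adding `items` as array `name` without storing it in `response`."""
    out.write("{")
    for key, value in response.items():
        out.write(json.dumps(key) + ":" + json.dumps(value) + ",")
    out.write(json.dumps(name) + ":[")
    for i, item in enumerate(items):
        if i:
            out.write(",")
        out.write(json.dumps(item))
    out.write("]}")


items = ChunkedList()
for n in range(700):
    items.append({"p": {"N": str(n)}})

buf = io.StringIO()
print_with_extra_array({"Count": 700, "ScannedCount": 700}, "Items", items, buf)
parsed = json.loads(buf.getvalue())
```

The output is ordinary JSON, so a client cannot tell whether the array was ever held in one piece on the server.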
Nadav Har'El	50d370f06e	test/alternator: reproducer for streams bug with long table name The two tests in this patch reproduce issue #24598: When enabling Alternator streams on an Alternator table with a very long name, such as the maximum allowed name length 222, the result is an I/O error and a Scylla shutdown. The two tests are currently marked "skip", otherwise they would crash the Scylla being tested. Refs #24598 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-06-29 11:40:55 +03:00
Nadav Har'El	0ce0b2934f	alternator: improve, document and test table/index name lengths Whereas DynamoDB limits the names of tables, LSIs and GSIs to 255 characters each, Alternator currently has different (and lower) limitations: 1. A table name must be up to 222 characters. 2. For a GSI, the sum of the table's and GSI's name length, plus 1, must be up to 222 characters. 3. For an LSI, the sum of the table's and LSI's name length, plus 2, must be up to 222 characters. These specific limitations were never documented, so in this patch we add this information to docs/alternator/compatibility.md. Moreover, these limitations were only partially tested, so in this patch we add testing for more cases that we forgot to check - such as length of LSI names (only GSI were checked before this patch), or adding a GSI to an existing table. It is important to check all these corner cases because there is a risk that if we attempt to create a table without checking its length, we can end up with an I/O error that brings down Scylla. In one case - UpdateTable adding a GSI to an existing table - the new test exposed a trivial bug: Because UpdateTable wants to verify the new GSI doesn't have the same name as an existing LSI, it mistakenly applied the LSI's name length limit instead of the GSI's name length limit, which is one byte less than it should be. So this patch fixes this trivial bug as well. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-06-29 11:40:55 +03:00
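The three length rules listed in the commit above can be written down as a small Python sketch. The function names are hypothetical, and the sketch checks only the upper bounds given in the commit message, not minimum lengths or allowed characters.

```python
MAX_NAME_LENGTH = 222  # limit quoted in the commit message


def table_name_fits(table: str) -> bool:
    return len(table) <= MAX_NAME_LENGTH


def gsi_name_fits(table: str, gsi: str) -> bool:
    # Table name + GSI name + 1 must not exceed the limit.
    return len(table) + len(gsi) + 1 <= MAX_NAME_LENGTH


def lsi_name_fits(table: str, lsi: str) -> bool:
    # Table name + LSI name + 2 must not exceed the limit.
    return len(table) + len(lsi) + 2 <= MAX_NAME_LENGTH


table = "t" * 100
index = "i" * 121
print(gsi_name_fits(table, index), lsi_name_fits(table, index))
```

The same 121-character index name is accepted as a GSI but rejected as an LSI on a 100-character table, which is the one-byte difference behind the UpdateTable bug.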
Wojciech Mitros	5eb4466789	Return correct creation date time in describe table Add a system:table_creation_time tag whose value is the table's creation timestamp in milliseconds. If the tag is present, it is used to fill the creation timestamp value (when CreateTable or DescribeTable is called). If the tag is missing, a timestamp of 0 is substituted (in other words, the table is reported as created on January 1, 1970). Update the test to change how we make sure the timestamp is actually used - we create two tables one after another and make sure their creation timestamps are in the correct order. Update tests that work with tags to filter system tags out. Fixes #5013 Closes scylladb/scylladb#24007	2025-06-10 15:25:57 +03:00
Nadav Har'El	a714079a62	Merge 'Add Support for Per-Table Metrics in Alternator' from Amnon Heiman This series introduces per-table metrics support for Alternator. It includes the following commits: Add optional per-table metrics for Alternator Introduces a shared_ptr-based mechanism that allows Alternator to register per-table metrics. These metrics follow the table's lifecycle, similar to how CQL metrics are handled. The use of shared_ptr ensures no direct dependency between table stats and Alternator. Enable registration of stats objects per table Adds support for registering a stats object using a keyspace and table name. Per-table metrics are prefixed with alternator_table to differentiate them from per-shard metrics. Metrics are reported once per node, and those not meaningful at the table level (e.g. create/delete) are excluded. All metrics use the skip_when_empty flag. Update per-table metrics handling Adds a helper function to retrieve the stats object from a table schema. Updates both per-shard and per-table metrics, resulting in some code duplication. Add tests for per-table metrics Extends existing tests to also validate the per-table metrics. These tests ensure that the new metrics are correctly registered and updated. This series improves observability in Alternator by enabling fine-grained per-table metrics without disrupting existing per-shard metrics. No need to backport Fixes #19824 Closes scylladb/scylladb#24046 * github.com:scylladb/scylladb: alternator/test_metrics.py: Test the per-table metrics alternator/executor.cc: Update per-table metrics alternator/stats: Add per-table metrics replica/database.hh: Add alternator per-table metrics alternator/stats.hh: Introduce a per-table stats container	2025-06-08 10:42:05 +03:00
Pavel Emelyanov	f5743c6afc	Merge 'test/alternator: make tests runnable on DynamoDB Local' from Nadav Har'El The Alternator tests should pass on Alternator (of course), and almost always also on DynamoDB to verify that the tests themselves are correct and don't just enshrine Alternator's incorrect behavior. Although much less important, it is sometimes useful to be able to check if the test also pass on other DynamoDB clones, especially "DynamoDB Local" - Amazon's DynamoDB mock written in Java. In issue https://github.com/scylladb/scylladb/issues/7775 we noted that some of our tests don't actually pass on DynamoDB Local, for different reasons, but at the time that issue was created most of the tests did work. However, checking now on a newer version of DynamoDB Local (2.6.1), I notice that _all_ tests failed because of some silly reasons that are easy to fix - and this is what the two patches in this series fix. After these fixes, most of the Alternator tests pass on DynamoDB Local. But not all of them - #7775 is still open. No backport needed - these are just test framework improvements for developers. Closes scylladb/scylladb#24361 * github.com:scylladb/scylladb: test/alternator: any response from healthcheck means server is alive test/alternator: fall back to legal-looking access key id	2025-06-06 08:50:58 +03:00
Piotr Szymaniak	de96c28625	alternator: Add support for TTL when using tablets Support for TTL-based data removal when using tablets. The essence of this commit is a separate code path for finding token ranges owned by the current shard for the cases when tablets are used and not vnodes. At the same time, the vnodes case is not touched, so as not to cause any regressions. The TTL-caused data removal is normally performed by the primary replica (both when using vnodes and tablets). For the tablets case, the already-existing method tablet_map::get_primary_replica(tablet_id) is used to know if a shard executing the TTL-related data removal is the primary replica for each tablet. A new method tablet_map::get_secondary_replica(tablet_id) has been added. It is needed by the data invalidation procedure to remove data when the primary replica node is down - the data is then removed by the secondary replica node. The mechanism is the same as in the vnodes case. Since alternator now supports TTL with tablets, the test `test_ttl_enable_error_with_tablets` has been removed. Also, tests in test_ttl.py have been made to run twice, once with vnodes and once with tablets. When run with tablets, due to the lack of support for LWT with tablets (#18068), the tests use 'system:write_isolation' of 'unsafe_rmw'. This approach allows early regression testing with tablets and is meant only as a tentative solution. Fixes scylladb/scylladb#16567 Closes scylladb/scylladb#23662	2025-06-05 17:39:29 +03:00
Amnon Heiman	760c8c3333	alternator/test_metrics.py: Test the per-table metrics This patch adds tests for the newly added per-table metrics. It mainly redoes existing tests, but verifies that the per-table metrics are updated correctly. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-06-05 15:12:19 +03:00
Nadav Har'El	6cbcabd100	alternator: hide internal tags from users The "tags" mechanism in Alternator is a convenient way to attach metadata to Alternator tables. Recently we have started using it more and more for internal metadata storage: * UpdateTimeToLive stores the attribute in a tag system:ttl_attribute * CreateTable stores provisioned throughput in tags system:provisioned_rcu and system:provisioned_wcu * CreateTable stores the table's creation time in a tag called system:table_creation_time. We do not want any of these internal tags to be visible to a ListTagsOfResource request, because if they are visible (as before this patch), systems such as Terraform can get confused when they suddenly see a tag which they didn't set - and may even attempt to delete it (as reported in issue #24098). Moreover, we don't want any of these internal tags to be writable with TagResource or UntagResource: If a user wants to change the TTL setting they should do it via UpdateTimeToLive - not by writing directly to tags. So in this patch we forbid read or write to any tag that begins with the "system:" prefix, except one: "system:write_isolation". That tag is deliberately intended to be writable by the user, as a configuration mechanism, and is never created internally by Scylla. We should have perhaps chosen a different prefix for configurable vs. internal tags, or chosen more unique prefixes - but let's not change these historic names now. This patch also adds regression tests for the internal tags features, failing before this patch and passing after: 1. internal tags, specifically system:ttl_attribute, are not visible in ListTagsOfResource, and cannot be modified by TagResource or UntagResource. 2. system:write_isolation is not internal, and can be written by either TagResource or UntagResource, and read with ListTagsOfResource. This patch also fixes a bug in the test where we added more checks for system:write_isolation - test_tag_resource_write_isolation_values. 
This test forgot to remove the system:write_isolation tags from test_table when it ended, which would lead to other tests that run later to run with a non-default write isolation - something which we never intended. Fixes #24098. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#24299	2025-06-03 20:40:50 +03:00
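The visibility rule described in the commit above (hide every "system:" tag except "system:write_isolation") can be sketched in Python. The helper names are hypothetical and are not taken from the Alternator source.

```python
INTERNAL_PREFIX = "system:"
USER_CONFIGURABLE = {"system:write_isolation"}  # the single exception named in the commit


def is_internal_tag(key: str) -> bool:
    """True for tags that users may neither read nor write."""
    return key.startswith(INTERNAL_PREFIX) and key not in USER_CONFIGURABLE


def visible_tags(tags: dict) -> dict:
    """Tags as a ListTagsOfResource-style listing would show them."""
    return {k: v for k, v in tags.items() if not is_internal_tag(k)}


stored = {
    "system:ttl_attribute": "expiration",
    "system:table_creation_time": "1748970000000",
    "system:write_isolation": "always_use_lwt",
    "team": "storage",
}
print(visible_tags(stored))
```

A write path would apply the same `is_internal_tag` check and reject the request rather than filter it.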
Nadav Har'El	ac70e34de9	test/alternator: verify that DeleteItem returns an empty object A user on StackOverflow (https://stackoverflow.com/questions/79650278) reported that DeleteItem returns the appropriate response (an empty object) on DynamoDB, but doesn't on "DynamoDB Local" (Amazon's local mock of DynamoDB). I wrote the test in this patch to make sure that Alternator doesn't have this bug, and indeed it doesn't: When DeleteItem is used without any option that asks for additional output, its response is, as expected, an empty object. As usual, the new test passes on both Alternator and AWS DynamoDB. (I didn't actually test on DynamoDB Local, I have some problems with running that, but it doesn't matter, we have no intention of testing DynamoDB Local). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#24359	2025-06-03 18:47:34 +03:00
Nadav Har'El	e32559758a	test/alternator: any response from healthcheck means server is alive In the Alternator tests we check (in dynamodb_test_connect()) after every test that the server is still alive, so we can blame the test that just ran if it crashes the server. We check the server's health using a simple GET response, which works on both DynamoDB and Alternator, e.g., ``` $ curl http://dynamodb.us-east-2.amazonaws.com/ healthy: dynamodb.us-east-2.amazonaws.com ``` However, it turns out that new versions of DynamoDB Local - Amazon's local mock of DynamoDB - for some reason insist that all requests, including this health check, must be signed, so our unsigned health request is rejected with error 400, saying the request must be signed. So the current code, which insists that the response have status code 200, fails and the test incorrectly thinks that DynamoDB Local crashed during the test. The fix is trivial: Just don't check that the status code is 200. Any HTTP response from the server means it is still alive! If the server is not alive, we will get an exception, not any HTTP response, and this will lead the code to the "server has crashed" case. Refs #7775 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-06-03 12:25:51 +03:00
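The "any HTTP response means alive" rule from the commit above might look roughly like this with Python's standard `urllib`. This is a sketch under assumptions: the real test code may use a different HTTP client, and the `server_is_alive` name and the injectable `opener` parameter are hypothetical, added here so the logic can be exercised without a network.

```python
import urllib.error
import urllib.request


def server_is_alive(url: str, opener=urllib.request.urlopen, timeout: float = 5) -> bool:
    """Return True if the server sent any HTTP response, whatever its status code."""
    try:
        with opener(url, timeout=timeout):
            pass
    except urllib.error.HTTPError:
        # A 4xx/5xx status (such as 400 for an unsigned request) is still a response.
        return True
    except (urllib.error.URLError, OSError):
        # No response at all: connection refused, timeout, DNS failure, ...
        return False
    return True


def fake_unsigned_rejected(url, timeout):
    raise urllib.error.HTTPError(url, 400, "Bad Request", None, None)


def fake_server_down(url, timeout):
    raise urllib.error.URLError("connection refused")


print(server_is_alive("http://localhost:8000/", opener=fake_unsigned_rejected))
print(server_is_alive("http://localhost:8000/", opener=fake_server_down))
```

The order of the `except` clauses matters because `HTTPError` is a subclass of `URLError`.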
Nadav Har'El	9732545958	test/alternator: fall back to legal-looking access key id When the Alternator tests run against Scylla, they figure out (using CQL) the correct username and password needed to connect. When they can't, we fell back to some silly pair 'unknown_user', 'unknown_secret', assuming that the server won't check it anyway. It turns out that if we want to run tests against a new version of DynamoDB Local (Amazon's local mock of DynamoDB), it indeed doesn't check authentication, but starting in DynamoDB Local 2.0, it does check that the access key ID (the username) itself is valid, and considers "unknown_user" to be invalid because it contains an underscore - AWS_ACCESS_KEY_ID must only contain letters and numbers. See https://repost.aws/articles/ARc4hEkF9CRgOrw8kSMe6CwQ/ for Amazon's explanation for this change in DynamoDB Local 2. The trivial fix is to remove the underscore from the silly username. After this patch, Alternator tests can connect to DynamoDB Local. They still can't complete correctly - this will be fixed in the next patch. Refs #7775 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-06-03 12:25:51 +03:00
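The "letters and numbers only" rule quoted in the commit above can be checked with a one-line Python sketch. The function name is hypothetical, and "unknownuser" is only an illustrative replacement; the commit message does not state the exact fallback string chosen.

```python
def looks_like_valid_access_key_id(key: str) -> bool:
    """True if the key is non-empty and contains only ASCII letters and digits."""
    return key.isascii() and key.isalnum()


print(looks_like_valid_access_key_id("unknown_user"))  # contains an underscore
print(looks_like_valid_access_key_id("unknownuser"))   # illustrative replacement
```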
Pavel Emelyanov	086777e5de	Merge 'test.py: python: run tests using bare pytest command' from Evgeniy Naydanov Main change is splitting logic of `PythonTest.run()` method into `PythonTest.run_ctx()` context manager and `PythonTest.run()` method itself and add the `host` fixture which uses `PythonTest.run_ctx()` context manager to setup and teardown ScyllaDB node if `--test-py-init` argument is used. Otherwise, this fixture returns a value of `--host` CLI argument. Use dynamic scope provided by `testpy_test_fixture_scope()` function instead of `session` to maintain compatibility with `test.py` and `./run` scripts. Other related changes: * Add utility `get_testpy_test()` function to `pylib.suite.base` which combines all required steps to create an instance of `Test` class and rework `testpy_test` fixture to use it. * Switch to use dynamic fixture scope controlled by `--test-py-init` CLI argument to improve compatibility with test.py. And because in test.py mode the scope is `session`, also change default event loop scope to `session`. * Convert `get_valid_alternator_role()` to fixture to have more control on the scope of the cache used. Additionally, function `new_dynamodb_session()` was also converted to a fixture, because it uses `get_valid_alternator_role()`. * Replace dups of `cql` and `this_dc` fixtures in `rest_api` and `pylib/cql_repl` with imports from `cqlpy`. * Change `build_mode` fixture to return "unknown" if no --mode arguments provided (this is mainly for alternator and cqlpy tests) * Create a parent directory for a test log file just before opening this file in `run_test()` function instead of having this as a side effect in `Test.__init__()`. 
And changes that remove pytest CLI argument duplicates to be able to run tests from different test suites in one pytest session: * Add 3 supplementary functions to `test.pylib.suite.python`: `add_host_option()` (which adds `--host` options to pytest session), `add_cql_connection_options()` (which adds `--port`, and `--ssl`), and `add_s3_options()` (which adds options related to S3 connection.) Each function is decorated with the `@cache` decorator to be executed once per pytest session and avoid CLI options duplication for runs which execute `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables` in one pytest session. * Move `--auth_username` and `--auth_password` options from `cluster/conftest.py` to add_scylla_cql_connection_options() and slightly rework `cql` fixture to support these options. * Remove `--input`, `--output`, and `--keep-tmp` pytest CLI options from `cluster/object_store/conftest.py` because they are not used in this suite. * Remove `--omit-scylla-output` CLI option from pytest argparser. Instead, remove it from `sys.argv` in `cqlpy/run.py`. Also, no need to check this option in `alternator/run`. Closes scylladb/scylladb#23849 * github.com:scylladb/scylladb: test.py: python: run tests using bare pytest command test.py: rework testpy_test fixture test.py: alternator: convert get_valid_alternator_role() to fixture test.py: python: split logic of PythonTest.run() test.py: add credentials options to add_cql_connection_options() test.py: python: remove dups of cql and this_dc fixtures test.py: remove duplication of pytest CLI options test.py: remove unused CLI options test.py: remove `--omit-scylla-output` from pytest argparser test.py: set build_mode to "unknown" if no --mode argument test.py: create directory for test log in run_test()	2025-05-30 08:48:43 +03:00
Szymon Malewski	18d237a393	alternator/executor: Added checks in `batch_write_item` This patch adds checks validating 'BatchWriteItem' requests mostly to avoid ugly fallback message. It changes request's behaviour in case of an empty array of WriteRequests - previously such an array was ignored and whole request might succeed, now it raises ValidationException, following the documentation and behaviour of DynamoDB. Patch includes tests in test_manual_requests (`test_batch_write_item_invalid_payload`, `test_batch_write_item_empty_request_list`) testing with several offending cases. Fixes #23233 Closes scylladb/scylladb#23878	2025-05-29 20:33:57 +03:00
Evgeniy Naydanov	0ee0e3f14d	test.py: python: run tests using bare pytest command Add the `host` fixture which uses `PythonTest.run_ctx()` context manager to setup and teardown ScyllaDB node if `--test-py-init` argument is used. Otherwise, this fixture returns a value of `--host` CLI argument. Use dynamic scope provided by `testpy_test_fixture_scope()` function instead of `session` to maintain compatibility with test.py and ./run scripts.	2025-05-29 12:33:41 +00:00
Evgeniy Naydanov	b65cb517b8	test.py: alternator: convert get_valid_alternator_role() to fixture Convert `get_valid_alternator_role()` to fixture to have more control on the scope of the cache used. Additionally, function `new_dynamodb_session()` was also converted to a fixture, because it uses `get_valid_alternator_role()`.	2025-05-29 12:15:28 +00:00
Evgeniy Naydanov	6780461df8	test.py: remove duplication of pytest CLI options Add 3 supplementary functions to `test.pylib.suite.python`: `add_host_option()` (which adds `--host` options to pytest session), `add_cql_connection_options()` (which adds `--port`, and `--ssl`), and `add_s3_options()` (which adds options related to S3 connection.) Each function is decorated with the `@cache` decorator to be executed once per pytest session and avoid CLI options duplication for runs which execute `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables` in one pytest session.	2025-05-29 12:15:28 +00:00
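How a `@cache` decorator keeps an option from being registered twice can be sketched with the standard library alone. This uses `argparse` as a stand-in for pytest's option parser, which is an assumption made to keep the example self-contained; the body of `add_host_option()` here is hypothetical and not copied from `test.pylib.suite.python`.

```python
import argparse
from functools import cache


@cache
def add_host_option(parser: argparse.ArgumentParser) -> None:
    """Register --host once per parser; repeated calls are cache hits and do nothing."""
    parser.add_argument("--host", default="localhost")


parser = argparse.ArgumentParser()
add_host_option(parser)  # called by one suite's conftest
add_host_option(parser)  # called again by another suite: skipped by the cache

args = parser.parse_args(["--host", "10.0.0.1"])
print(args.host)
```

Without the decorator the second call would raise `argparse.ArgumentError` for the conflicting option string, which is the kind of duplication failure the commit avoids when several suites share one session.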
Evgeniy Naydanov	b7b68355ef	test.py: remove `--omit-scylla-output` from pytest argparser Remove `--omit-scylla-output` CLI option from pytest argparser. Instead, remove it from `sys.argv` in `cqlpy/run.py`. Also, no need to check this option in `alternator/run`.	2025-05-29 12:15:28 +00:00
Nadav Har'El	7c24e09b0d	test/alternator: add some Alternator-over-HTTPS tests This patch adds a few tests for Alternator over HTTPS (encrypted HTTP, a.k.a. TLS or SSL). The tests are skipped unless run with "--https", so they will not be run in CI. Nevertheless, they are useful to improve our understanding of how DynamoDB works over HTTPS and can be a basis for adding more tests for HTTPS support. The included tests pass on both Alternator and AWS DynamoDB. One test checks that both TLS 1.2 and TLS 1.3 are properly supported, and if chosen by the client, are actually honored. The same test also checks that TLS 1.1 is not supported, and results in a proper error if attempted. Both AWS DynamoDB and Alternator support the same protocols. Another test verifies that HTTP (unencrypted) requests cannot be sent over an HTTPS port. This is important for security - an installation that chooses to allow only HTTPS wants users to only use encrypted connections, and would not want users to continue sending unencrypted requests to the HTTPS port. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23493	2025-05-12 15:38:33 +03:00
Nadav Har'El	7ccf77b84f	test/alternator: another test for UpdateExpression's SET I found on StackOverflow an interesting discussion about the fact that DynamoDB's UpdateExpression documentation "recommends" to use SET instead of ADD, and the rather convoluted expression that is actually needed to emulate ADD using SET: ``` SET #count = if_not_exists(#count, :zero) + :one ``` https://stackoverflow.com/questions/14077414/dynamodb-increment-a-key-value Although we do have separate tests for the different pieces of that idiom - a SET with missing attribute or item, the if_not_exists() function, etc. - I thought it would be nice to have a dedicated test that verifies that this idiom actually works, and moreover that the more naive "SET #count = #count + :one" does NOT work if the item or the attribute are missing. Unsurprisingly, the new test passes on both Alternator and DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23963	2025-05-07 13:57:50 +03:00
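The difference between the two SET forms in the commit above can be emulated on plain Python dictionaries. This is a toy model, not the DynamoDB or Alternator expression evaluator: the function names are hypothetical, and `ValueError` merely stands in for the validation error a real server would return.

```python
def set_naive(item, attr, one):
    """Emulates `SET #count = #count + :one`: fails if the item or attribute is missing."""
    if item is None or attr not in item:
        raise ValueError(f"attribute {attr!r} does not exist")
    return {**item, attr: item[attr] + one}


def set_with_if_not_exists(item, attr, zero, one):
    """Emulates `SET #count = if_not_exists(#count, :zero) + :one`."""
    updated = dict(item or {})
    updated[attr] = updated.get(attr, zero) + one
    return updated


first = set_with_if_not_exists(None, "count", 0, 1)    # item did not exist yet
second = set_with_if_not_exists(first, "count", 0, 1)  # attribute now present
print(first, second)
```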
Nadav Har'El	b4a9fe9928	test/alternator: another test for expression with a lot of ORs We already have a test, test_limits.py::test_deeply_nested_expression_2, which checks that in the long condition expression a<b or (a<b or (a<b or (a<b or (....)))) with more than MAX_DEPTH (=400) repeats is rejected by Alternator, as part of commit `04e5082d52` which restricted the depth of the recursive parser to prevent crashing Scylla. However, I got curious what will happen without the parentheses: a<b or a<b or a<b or a<b or ... It turns out that our parser actually parses this syntax without recursion - it's just a loop (a "*" in the Antlr alternator/expressions.g allows reading more and more ORs in a loop). So Alternator doesn't limit the length of this expression more than the length limit of 4096 bytes which we also have. We can fit 584 repeats in the above expression in 4096 bytes, and it will not be rejected even though 584 > 400. This test confirms that this is indeed the case. The test is Scylla-only because on DynamoDB, this expression is rejected because it has more than 300 "OR" operators. Scylla doesn't have this specific limit - we believe the other limitations (on total expression length, and on depth) are better for protecting Scylla. Remember that in an expression like "(((((((((((((" there is a very high recursion depth of the parser but zero operators, so counting the operators does nothing to protect Scylla. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23973	2025-05-07 13:57:18 +03:00
Nadav Har'El	252c5b5c9d	Merge 'Alternator batch_write_item wcu' from Amnon Heiman This series adds support for WCU tracking in batch_write_item and tests it. The patches include: Switch the metrics (RCU and WCU) to count units vs half-units as they were, to make the metrics clearer for users. Adding a public static get_half_units function to wcu_consumed_capacity_counter for use by batch write item, which cannot directly use the counter object. Adding WCU calculation support to batch_write_item, based on item size for puts and a fixed 1 WCU for deletes. WCU metrics are updated, and consumed capacity is returned per table when requested. The return handling was refactored to be coroutine-like for easier management of the consumed capacity array. Adding tests that validate WCU calculation for batch put requests on a single table and across multiple tables, ensuring delete operations are counted correctly. Adding a test that validates that WCU metrics are updated correctly during batch write item operations, ensuring the WCU of each item is calculated independently. Need backport, WCU is partially supported, and is missing from batch_write_item Fixes #23940 Closes scylladb/scylladb#23941 * github.com:scylladb/scylladb: alternator/test_metrics.py: batch_write validate WCU alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU alternator/executor: add WCU for batch_write_items alternator/consumed_capacity: make wcu get_units public Alternator: Change the WCU/RCU to use units	2025-05-06 13:31:53 +03:00
Amnon Heiman	2ab99d7a07	alternator/test_metrics.py: batch_write validate WCU This patch adds a test that verifies the WCU metrics are updated correctly during a batch_write_item operation. It ensures that the WCU of each item is calculated independently. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:20:24 +03:00
Amnon Heiman	14570f1bb5	alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU This patch adds two tests: A test that validates WCU calculation for batch put requests on a single table. A test that validates WCU calculation for batch requests across multiple tables, including ensuring that delete operations are counted as 1 WCU. Both tests verify that the consumed capacity is reported correctly according to the WCU rules. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:20:23 +03:00
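The WCU rules these tests check (puts charged by item size, deletes charged a fixed 1 WCU, each item counted independently, totals reported per table) can be sketched as below. The 1 KB block size is an assumption taken from DynamoDB's documented write capacity model rather than from the commit text, and the function names and tuple layout are hypothetical.

```python
import math
from collections import defaultdict

WCU_BLOCK_SIZE = 1024  # assumption: one WCU per started 1 KB of item size


def put_wcu(item_size_bytes: int) -> int:
    """WCU for a single put, rounded up per item."""
    return max(1, math.ceil(item_size_bytes / WCU_BLOCK_SIZE))


def batch_write_wcu(requests) -> dict:
    """Sum WCU per table for (table, operation, item_size_bytes) tuples."""
    consumed = defaultdict(int)
    for table, operation, size in requests:
        consumed[table] += put_wcu(size) if operation == "put" else 1
    return dict(consumed)


batch = [
    ("table1", "put", 1536),   # 1.5 KB -> 2 WCU
    ("table1", "put", 1536),   # 1.5 KB -> 2 WCU, not merged with the previous item
    ("table2", "delete", 0),   # deletes count as 1 WCU
]
consumed = batch_write_wcu(batch)
print(consumed)
```

Rounding per item gives 4 WCU for table1, whereas rounding the combined 3 KB would give 3, which is the distinction behind calculating each item independently.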
Amnon Heiman	5ae11746fa	Alternator: Change the WCU/RCU to use units This patch changes the RCU/WCU Alternator metrics to use whole units instead of half units. The change includes the following: Change the metrics documentation. Keep the RCU counter internally in half units, but return the actual (whole unit) value. Change the RCU name to be rcu_half_units_total to indicate that it counts half units. Change the WCU to count in whole units instead of half units. Update the tests accordingly. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:18:09 +03:00
Nadav Har'El	834107ae97	test/cqlpy,alternator: fix reporting of Scylla crash during test The cqlpy and alternator test frameworks use a single Scylla node started once for all tests to run on. In the distant past, we had a problem where if one test caused Scylla to crash, the result was a confusing report of hundreds of failed tests - all tests after the crash "failed" and it wasn't easy to find which test really caused the crash. Our old solution to this problem was to have an autouse fixture (called cql_test_connection or dynamodb_test_connection) which tested the connection at the end of each test, and if it detected Scylla has crashed - it used pytest.exit() to report the error and have pytest exit and therefore stop running any further tests (which would have led to all of them failing). This approach had two problems: 1. The pytest.exit() caused the entire cqlpy suite to report a failure, but not the individual test - the individual test might have failed as well, but that isn't guaranteed and in any case this test's output is missing the informative message that Scylla crashed during the test. This was fine when for each cqlpy failure we had two separate error logs in Jenkins - the specific failed function, and the failed file - but when we recently got rid of the duplication by removing the second one, we no longer see the "Scylla crashed" messages any more. 2. Exiting pytest will be the wrong thing to do if the same pytest run could run tests from different test suites. We don't do this today, but we plan to support this approach soon. This patch fixes both problems by replacing the pytest.exit() call by setting a "scylla_crashed" flag and using pytest.fail(). The pytest.fail() causes the current test - the one which caused Scylla to crash - to be reported as an "ERROR" and the "Scylla crashed" message will correctly appear in this test's log. The flag will cause all other tests in the same test suite to be skip()ed. But other tests in other directories, depending on different fixtures, might continue to run normally. Fixes #23287 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23307	2025-05-05 10:15:56 +03:00
Piotr Szymaniak	e588c8667f	alternator: Limit attribute name lengths Attribute names are now checked against DynamoDB-compatible length limits. When exceeded, Alternator emits an exception identical or similar to the DDB one. It might be worth noting that DDB emits more than a single kind of an exception string for some exceptions. The tests' catch clauses handle all the observed kinds of messages from DynamoDB. The validation differentiates between key and non-key attributes and applies the limit accordingly. AWS DDB raises exceptions with somewhat different contents when the get request contains ProjectionExpression, so this case needed separate treatment to emit the corresponding exception string. The length-validating function was declared and defined in expressions.hh/.cc respectively, because that's where the relevant parsing happens. ** Tests The following tests were validated when handling this issue: test_limit_attribute_length_nonkey_good, test_limit_attribute_length_nonkey_bad, test_limit_attribute_length_key_good, test_limit_attribute_length_key_bad, test_limit_attribute_length_gsi_lsi_good, test_limit_attribute_length_gsi_lsi_bad, test_limit_attribute_length_gsi_lsi_projection_bad. Some of the tests were expanded into being more granular. Namely, there is a new test function `test_limit_attribute_length_key_bad_incoherent_names` which groups tests with too long attribute names in the case of incorrect (incoherent) user requests. Similarly, there is a new test function `test_limit_attribute_length_gsi_lsi_bad_incoherent_names` All the tests now cover each combination of the key/keys being too long. Both new functions contain tests that verify that ScyllaDB throws length-related exceptions (instead of the coherency-related), similar to what DynamoDB does. The new test test_limit_gsiu_key_len_bad covers the case of too long attribute name inside GlobalSecondaryIndexUpdates. The new test test_limit_gsiu_key_len_bad_incoherent_names covers the case of incorrect (incoherent) user requests containing too long attribute names and GlobalSecondaryIndexUpdates. test_limit_attribute_length_key_bad was found to have contained an illegal KeySchema structure. Some of the tests had their match clause corrected. All the tests are stripped of the xfail flag except test_limit_attribute_length_key_bad, which has it changed since it still fails due to Projection in GSI and LSI not implemented in Alternator. The xfail now points to #5036. Fixes scylladb/scylladb#9169 Closes scylladb/scylladb#23097	2025-04-27 18:39:20 +03:00
Amnon Heiman	3acde5f904	test_returnconsumedcapacity.py: test RCU for batch get item This patch adds tests for consumed capacity in batch get item. It tests both the simple case and the multi-item, multi-table case that combines consistent and non-consistent reads.	2025-04-16 17:05:32 +03:00
Nadav Har'El	258213f73b	Merge 'Alternator batch count histograms' from Amnon Heiman This series adds a histogram for get and write batch sizes. It uses the estimated_histogram implementation which starts from 1 with 1.2 exponential factor, which works extremely tight to 20 but still covers all the way to 100. Histograms will be reported per node. Backport to 2025.1 so we'll have information about user batch size limitation Closes scylladb/scylladb#23379 * github.com:scylladb/scylladb: alternator: Add tests for the batch items histograms alternator: Add histogram for batch item count	2025-04-09 22:41:14 +03:00
Nadav Har'El	84fd52315f	alternator: in GetRecords, enforce Limit to be <= 1000 Alternator Streams' "GetRecords" operation has a "Limit" parameter on how many records to return. The DynamoDB documentation says that the upper limit on this Limit parameter is 1000 - but Alternator didn't enforce this. In this patch we begin enforcing this highest Limit, and also add a test for verifying this enforcement. As usual, the new test passes on DynamoDB, and after this patch - also on Alternator. The reason why it's useful to have some upper limit on Limit is that the existing executor::get_records() implementation does not really have preemption points in all the necessary places. In particular, we have a loop on all returned records without preemption points. We also store the returned records in a RapidJson vector, which requires a contiguous allocation. Even before this patch, GetRecords had a hard limit of 1 MB of results. But still, in some cases 1 MB of results may be a lot of results, and we can see stalls in the aforementioned places being O(number of results). Fixes #23534 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23547	2025-04-07 12:52:03 +03:00
Amnon Heiman	b55f24c14d	alternator: Add tests for the batch items histograms This patch adds a test for the batch‑items histogram for both get and write operations. It updates the check_increases_metric_exact helper function so that it gets a list of expected values and labels (labels can be None). This makes it easy to test multiple buckets in a histogram. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-04-06 18:22:23 +03:00
Nadav Har'El	431de48df9	test/alternator: test for item with many attributes A user complained that he couldn't read or write an item with more than 16 attributes (!) in Alternator. This isn't true, but I realized that we don't have a simple test for this case - all tests use just a few attributes. So let's add such a test, doing PutItem, UpdateItem and GetItem with 400 attributes. Unsurprisingly, the test passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23568	2025-04-03 22:35:49 +03:00
Nadav Har'El	a9a6f9eecc	test/alternator: increase timeout in Alternator RBAC test On our testing infrastructure, tests often run a hundred times (!) slower than usual, for various reasons that we can't always avoid. This is why all our test frameworks drastically increase the default timeouts. We forgot to increase the timeout in one place - where Alternator tests use CQL. This is needed for the Alternator role-based access control (RBAC) tests, which is configured via CQL and therefore the Alternator test unusually uses CQL. So in this patch we increase the timeout of CQL driver used by Alternator tests to the same high timeouts (60-120 seconds) used by the regular CQL tests. As the famous saying goes, these timeouts should be enough for anyone. Fixes #23569. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23578	2025-04-03 22:31:08 +03:00
Botond Dénes	fcdae20fd1	Merge 'Add tablet enforcing option' from Benny Halevy This series adds a new config option: `tablets_mode_for_new_keyspaces` that replaces the existing `enable_tablets` option. It can be set to the following values: disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option enabled: New keyspaces use tablets by default, unless disabled by the tablets={'disabled':true} option enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option. `tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` control whether tablets are disabled or enabled by default for new keyspaces, respectively. In either case, tablets can be opted-in or out using the `tablets={'enabled':...}` keyspace option, when the keyspace is created. `tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`. However, it does not allow opting out when creating new keyspaces by setting `tablets = {'enabled': false}` Refs scylladb/scylla-enterprise#4355 * Requires backport to 2025.1 Closes scylladb/scylladb#22273 * github.com:scylladb/scylladb: boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option db/config: add tablets_mode_for_new_keyspaces option	2025-04-03 16:32:19 +03:00
Radosław Cybulski	c36614e16d	alternator: add size check to BatchItemWrite Add a size check for BatchItemWrite command - if the item count is bigger than configuration value `alternator_maximum_batch_write_size`, an error will be raised and no modification will happen. This is done to synchronize with DynamoDB, where maximum size of BatchItemWrite is 25. To avoid complaints from clients, who use our feature of BatchWriteItem being limitless we set default value to 100. Fixes #5057 Closes scylladb/scylladb#23232	2025-04-02 14:48:00 +03:00
Benny Halevy	c62865df90	db/config: add tablets_mode_for_new_keyspaces option The new option deprecates the existing `enable_tablets` option. It will be extended in the next patch with a 3rd value: "enforced" which will enable tablets by default for new keyspaces but without the possibility to opt out using the `tablets = {'enabled': false}` keyspace schema option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-24 14:54:45 +02:00
Nadav Har'El	317de64281	test/alternator: enable debugging output during Python crashes For a long time now, we've been seeing (see #17564), once in a while, Alternator tests crashing with the Python process getting killed on SIGSEGV after the tests have already finished successfully and all pytest had to do is exit. We have not been able to figure out where the bug is. Unfortunately, we've never been able to reproduce this bug locally - and only rarely we see it in CI runs, and when it happens we don't have any information on why it happened. So the goal of this patch is to print more information that might hopefully help us next time we see this problem in CI (this patch does NOT fix the bug). This patch adds to test/alternator's conftest.py a call to faulthandler.enable(). This traps SIGSEGV and prints a stack trace (for each thread, if there are several) showing what Python was trying to do while it is crashing. Hopefully we'll see in this output some specific cleanup function belonging to boto3 or urllib or whatever, and be able to figure out where the bug is and how to avoid it. We could have added this faulthandler.enable() call to the top-level conftest.py or to test.py, but since we only ever had this Python crash in Alternator tests, I think it is more suitable that we limit this desperate debugging attempt only to Alternator tests. Refs #17564 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23340	2025-03-19 18:18:51 +03:00
Nadav Har'El	c0821842de	alternator: document the state of tablet support in Alternator In commit `c24bc3b` we decided that creating a new table in Alternator will by default use vnodes - not tablets - because of all the missing features in our tablets implementation that are important for Alternator, namely - LWT, CDC and Alternator TTL. We never documented this, or the fact that we support a tag `experimental:initial_tablets` which allows overriding this decision and creating an Alternator table using tablets. We also never documented what exactly doesn't work when Alternator uses tablets. This patch adds the missing documentation in docs/alternator/new-apis.md (which is a good place for describing the `experimental:initial_tablets` tag). The patch also adds a new test file, test_tablets.py, which includes tests for all the statements made in the document regarding how `experimental:initial_tablets` works and what works or doesn't work when tablets are enabled. Two existing tests - for TTL and Streams non-support with tablets - are moved to the new test file. When the tablets feature is finally completed, both the document and the tests will need to be modified (some of the tests should be outright deleted). But it seems this will not happen for at least several months, and that is too long to wait without accurate documentation. Fixes #21629 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22462	2025-03-14 14:03:15 +03:00
Piotr Szymaniak	f887466c3f	alternator: Clean error handling on CreateTable without AttributeDefinitions If user fails to supply the AttributeDefinitions parameter when creating a table, Scylla used to fail on RAPIDJSON_ASSERT. Now it calls a polite exception, which is fully in-line with what DynamoDB does. The commit supplies also a new, relevant test routine. Fixes #23043 Closes scylladb/scylladb#23041	2025-02-26 14:24:57 +02:00
Piotr Szymaniak	c1f186c98a	alternator: re-enabling/changing existing stream's StreamViewType as well as disabling the nonexistent stream Table updates that try to enable stream (while changing or not the StreamViewType) on a table that already has the stream enabled will result in ValidationError. Table updates that try to disable stream on a table that does not have the stream enabled will result in ValidationError. Add two tests to verify the above. Mark the test for changing the existing stream's StreamViewType not to xfail. Fixes scylladb/scylladb#6939 Closes scylladb/scylladb#22827	2025-02-16 09:57:49 +02:00
Nadav Har'El	cae8a7222e	alternator: fix view build on oversized GSI key attribute Before this patch, the regular_column_transformation constructor, which we used in Alternator GSIs to generate a view key from a regular-column cell, accepted a cell of any size. As a reviewer (Avi) noticed, very long cells are possible, well beyond what Scylla allows for keys (64KB), and because regular_column_transformation stores such values in a contiguous "bytes" object it can cause stalls. But allowing oversized attributes creates an even more acute problem: While view building (backfilling in DynamoDB jargon), if we encounter an oversized (>64KB) key, the view building step will fail and the entire view building will hang forever. This patch fixes both problems by adding to regular_column_transformation's constructor the check that if the cell is 64KB or larger, an empty value is returned for the key. This causes the backfilling to silently skip this item, which is what we expect to happen (backfilling cannot do anything to fix or reject the pre-existing items in the base table). A test test_gsi_updatetable.py::test_gsi_backfill_oversized_key is introduced to reproduce this problem and its fix. The test adds a 65KB attribute to a base table, and then adds GSIs to this table with this attribute as its partition key or its sort key. Before this patch, the backfilling process for the new GSIs hangs, and never completes. After this patch, the backfilling completes and as expected contains other base-table items but not the item with the oversized attribute. The new test also passes on DynamoDB. However, while implementing this fix I realized that issue #10347 also exists for GSIs. Issue #10347 is about the fact that DynamoDB limits partition key and sort key attributes to 2048 and 1024 bytes, respectively. In the fix described above we only handled the acute case of lengths above 64 KB, but we should actually skip items whose GSI keys are over 2048 or 1024 bytes - not 64KB. This extra checking is not handled in this patch, and is part of a wider existing issue: Refs #10347 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-02-06 09:59:50 +01:00
Nadav Har'El	67d2ea4c4b	test/alternator: unflake test for IndexStatus The test for IndexStatus verifies that on a newly created table and GSI, the IndexStatus is "ACTIVE". However, in Alternator, this doesn't strictly need to happen immediately - view building, even for an empty table - can take a short while in debug mode. This makes the test test_gsi_describe_indexstatus flaky in debug mode. The fix is to wait for the GSI to become active with wait_for_gsi() before checking it is active. This is sort of silly and redundant, but the important point is that if the IndexStatus is incorrect this test will fail; it doesn't really matter whether the wait_for_gsi() or the DescribeTable assertion is what fails. Now that wait_for_gsi() is used in two test files, this patch moves it (and its friend, wait_for_gsi_gone()) to util.py. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-02-06 09:59:49 +01:00
Nadav Har'El	4ba17387e6	test/alternator: work around unrelated bug causing test flakiness The alternator test test_gsi_updatetable.py::test_gsi_delete_with_lsi creates a GSI together with a table, and then deletes it. We have a bug unrelated to the purpose of this test - #9059 - that causes view building to sometimes crash Scylla if the view is deleted while the view build is starting. We see specifically in debug builds that even view building of an empty table might not finish before the test deletes the view - so this bug happens. Work around that bug by waiting for the GSI to build after creating the table with the GSI. This shouldn't be necessary (in DynamoDB, a GSI created with the table always begins ready with the table), but doesn't hurt either. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-02-06 09:59:49 +01:00
Nadav Har'El	ac648950f1	test/alternator: remove xfail from all tests for issue 11567 The previous patches fully implemented issue 11567 - supporting UpdateTable to add or delete a GSI on an existing Alternator table. All 14 tests that were marked xfail because of this issue now pass, so this patch removes their xfail. There are no more xfailing tests referring to this issue. These 14 tests, most of them in test/alternator/test_gsi_updatetable.py, cover all aspects of this feature, including adding a GSI, deleting a GSI, interactions between GSI and LSI, RBAC when adding or deleting a GSI, data type limitation on an attribute that becomes a GSI key or stops being one, GSI backfill, DescribeTable and backfill, various error conditions, and more. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-02-06 09:59:49 +01:00
Nadav Har'El	cea7aacc52	alternator: add IndexStatus/Backfilling in DescribeTable This patch adds the missing IndexStatus and Backfilling fields for the GSIs listed by a DescribeTable request. These fields allow an application to check whether a GSI has been fully built (IndexStatus=ACTIVE) or currently being built (IndexStatus=CREATING, Backfilling=true). This feature is necessary when a GSI can be added to an existing table so its backfilling might take time - and the application might want to wait for it. One test - test_gsi.py::test_gsi_describe_indexstatus - begins to pass with this fix, so the xfail tag is removed from it. Fixes #11471. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-02-06 09:59:48 +01:00
Nadav Har'El	bfdd805f15	test/alternator: fix running against installation blocking CQL One of the design goals of the Alternator test suite (test/alternator) is that developers should be able to run the tests against some already running installation by running `cd test/alternator; pytest [--url ...]`. Some of our presentations and documents recommend running Alternator via docker as: docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest --alternator-port=8000 --alternator-write-isolation=always This only makes port 8000 available to the host - the CQL port is blocked. We had a bug in conftest.py's get_valid_alternator_role() which caused it to fail (and fail every single test) when CQL is not available. What we really want is that when CQL is not available and we can't figure out a correct secret key to connect to Alternator, we just try a connect with a fake key - and hope that the option alternator-enforce-authorization is turned off. In fact, this is what the code comments claim was already happening - but we failed to handle the case that CQL is not available at all. After this patch, one can run Alternator with the above docker command, and then run tests against it. By the way, this provides another way for running any old release of Scylla and running Alternator tests against it. We already supported a similar feature via test/alternator/run's "--release" option, but its implementation doesn't use docker. Fixes #22591 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22592	2025-02-05 19:01:31 +03:00
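
The capacity-unit rules mentioned in the WCU/RCU commits above (a 4 KB read block counted in half units, item-size-based WCU for puts, a fixed 1 WCU for deletes, each item counted independently) can be illustrated with a minimal Python sketch. This is not Alternator's code: the function names are hypothetical, the 1 KB write block is DynamoDB's documented value rather than something stated in the log, and charging at least one block for an empty item is an assumption.

```python
# Illustrative sketch only; names below are hypothetical, not Alternator's API.
RCU_BLOCK_SIZE = 4 * 1024  # 4 KB read block, as stated in the BatchGetItem fix
WCU_BLOCK_SIZE = 1024      # assumption: DynamoDB's documented 1 KB write block
DELETE_WCU = 1             # deletes are counted as a fixed 1 WCU


def blocks(size_bytes, block_size):
    """Round a size up to whole blocks (assumption: at least one block)."""
    return max(1, (size_bytes + block_size - 1) // block_size)


def rcu_half_units(size_bytes, consistent):
    """Half units for one read: 2 per block if consistent, otherwise 1."""
    n = blocks(size_bytes, RCU_BLOCK_SIZE)
    return 2 * n if consistent else n


def batch_write_wcu(requests):
    """Sum WCU over (kind, size_bytes) requests, each counted independently."""
    total = 0
    for kind, size_bytes in requests:
        if kind == "put":
            total += blocks(size_bytes, WCU_BLOCK_SIZE)
        else:  # "delete"
            total += DELETE_WCU
    return total


if __name__ == "__main__":
    batch = [("put", 1500), ("put", 100), ("delete", None)]
    print(batch_write_wcu(batch))
    print(rcu_half_units(4097, consistent=False))
```

Counting each item independently matters: the two puts above cost 2 + 1 WCU, whereas rounding their combined 1600 bytes once would give only 2. The sketch also shows why the double conversion described in the BatchGetItem fix undercounts: feeding an already-computed unit count back into a bytes-to-blocks conversion divides by the block size a second time.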

443 Commits