Commit Graph

48024 Commits

Author SHA1 Message Date
Gleb Natapov
bb29591daf topology coordinator: Do not cancel global requests in cancel_all_requests
This was mistakenly added by fbd75c5c06.
The function is called after checking that no topology request can
proceed, so it cancels them, but this has nothing to do with global
request. Also, for some reason, the cancellation was added in the loop
over topology requests.
2025-06-09 13:38:49 +03:00
Gleb Natapov
be0b328b19 topology coordinator: store request type for each global command 2025-06-09 13:38:49 +03:00
Gleb Natapov
00fd427be0 topology request: make it possible to hold global request types in request_type field
topology_request table has a filed to hold a request type, but
currently it can hold only per node requests. This patch makes it
possible to store global request types there as well.
2025-06-09 13:38:49 +03:00
Gleb Natapov
3a496067c6 topology coordinator: move alter table global request parameters into topology_request table
Currently parameters to alter table global topology command are stored
in static column in the topology table, but this way there can be only one
outstanding alter table request. This patch moves the parameters to
the topology_request table where parameters are stored per request.
2025-06-09 13:38:49 +03:00
Gleb Natapov
a9244bf037 topology coordinator: move cleanup global command to report completion through topology_request table
We want to unify all command to report completion through the
topology_requests table.
2025-06-09 13:38:49 +03:00
Gleb Natapov
6a52ba2251 topology coordinator: no need to create updates vector explicitly 2025-06-09 13:38:49 +03:00
Gleb Natapov
69dacb5894 topology coordinator: use topology_request_tracking_mutation_builder::done() instead of open code it 2025-06-09 13:38:49 +03:00
Gleb Natapov
7257391c8f topology coordinator: handle error during new_cdc_generation command processing
Currently if there is an error during new_cdc_generation command it is
retried in a loop.  Since the status of the command executing is now
reported through the topology request table we can fail the command
instead,
2025-06-09 13:38:48 +03:00
Gleb Natapov
389f0f6280 topology coordinator: remove unneeded semicolon 2025-06-09 13:38:48 +03:00
Gleb Natapov
ba371c09fc topology coordinator: fix indentation after the last commit 2025-06-09 13:38:48 +03:00
Gleb Natapov
b8c11f330a topology coordinator: move new_cdc_generation topology request to use topology_request table for completion
Currently it checks the completion by waiting for new generation to
appear, but we want to unify all commands to check for completion in
topology_request table.
2025-06-09 13:38:48 +03:00
Gleb Natapov
6d09c76a12 gms/feature_service: add TOPOLOGY_GLOBAL_REQUEST_QUEUE feature flag
Will be needed to coordinate between old and new nodes during upgrade.
2025-06-09 13:38:48 +03:00
Raphael S. Carvalho
2d716f3ffe replica: Fix truncate assert failure
Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed
a preexisting fragility which I missed.

1) truncate gets RP mark X, truncated_at = second T
2) new sstable written during snapshot or later, also at second T (difference of MS)
3) discard_sstables() get RP Y > saved RP X, since creation time of sstable
with RP Y is equal to truncated_at = second T.

So the problem is that truncate is using a clock of second granularity for
filtering out sstables written later, and after we got low mark and truncate time,
it can happen that a sstable is flushed later within the same second, but at a
different millisecond.
By switching to a millisecond clock (db_clock), we allow sstables written later
within the same second from being filtered out. It's not perfect but
extremely unlikely a new write lands and get flushed in the same
millisecond we recorded truncated_at timepoint. In practice, truncate
will not be used concurrently to writes, so this should be enough for
our tests performing such concurrent actions.
We're moving away from gc_clock which is our cheap lowres_clock, but
time is only retrieved when creating sstable objects, which frequency of
creation is low enough for not having significant consequences, and also
db_clock should be cheap enough since it's usually syscall-less.

Fixes #23771.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24426
2025-06-08 15:59:15 +03:00
Nadav Har'El
a714079a62 Merge 'Add Support for Per-Table Metrics in Alternator' from Amnon Heiman
This series introduces per-table metrics support for Alternator. It includes the following commits:

Add optional per-table metrics for Alternator
Introduces a shared_ptr-based mechanism that allows Alternator to register per-table metrics. These metrics follow the table's lifecycle, similar to how CQL metrics are handled. The use of shared_ptr ensures no direct dependency between table stats and Alternator.

Enable registration of stats objects per table
Adds support for registering a stats object using a keyspace and table name. Per-table metrics are prefixed with alternator_table to differentiate them from per-shard metrics. Metrics are reported once per node, and those not meaningful at the table level (e.g. create/delete) are excluded. All metrics use the skip_when_empty flag.

Update per-table metrics handling
Adds a helper function to retrieve the stats object from a table schema. Updates both per-shard and per-table metrics, resulting in some code duplication.

Add tests for per-table metrics
Extends existing tests to also validate the per-table metrics. These tests ensure that the new metrics are correctly registered and updated.

This series improves observability in Alternator by enabling fine-grained per-table metrics without disrupting existing per-shard metrics.
**No need to backport**

Fixes #19824

Closes scylladb/scylladb#24046

* github.com:scylladb/scylladb:
  alternator/test_metrics.py: Test the per-table metrics
  alternator/executor.cc: Update per-table metrics
  alternator/stats: Add per-table metrics
  replica/database.hh: Add alternator per-table metrics
  alternator/stats.hh: Introduce a per-table stats container
2025-06-08 10:42:05 +03:00
Botond Dénes
8498bd6376 Merge 'Replace container_to_vec with std::ranges' from Pavel Emelyanov
The helper in question converts an iterable collection to a vector of fmt::to_string()-s of the collection elements.
Patch the caller to use standard library and remove the helper.

Closes scylladb/scylladb#24357

* github.com:scylladb/scylladb:
  api: Drop no longer used container_to_vec helper
  api: Use std::ranges to stringify collections
  api: Use std::ranges to convert std::set<sstring> to std::vector<string>
  api: Use db::config::data_file_directories()' vector directly
  api: Coroutinize get_live_endpoint()
2025-06-06 10:57:06 +03:00
Pavel Emelyanov
12420dc644 api: Shorten get_host_to_id_map() handler
The handler does

- gets host IDs from local token metadata
- for each ID gets the host IP and generates IP:ID std::pair
- converts the sequence of generated pairs into std::unordered_map
- converts the unordered map into vector of jsonable key:value objects

This patch removes the 3rd step and makes the needed jsonable object in
step 2 directly, thus eliminating the interposing unordered_map
creation.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24354
2025-06-06 10:54:23 +03:00
Pavel Emelyanov
428edd41f5 api: Make us of datablse::get_all_keyspaces()
There are two places in the API that want to get the list of keyspace
names. For that they call database::get_keyspaces() and then extract
keys from the returned name to class keyspace map.

There's a database::get_all_keyspaces() method that does exactly that.

Remove the map_keys helper from the api/api.hh that becomes unused.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24353
2025-06-06 10:53:09 +03:00
Pavel Emelyanov
f5743c6afc Merge 'test/alternator: make tests runnable on DynamoDB Local' from Nadav Har'El
The Alternator tests should pass on Alternator (of course), and almost always also on DynamoDB to verify that the tests themselves are correct and don't just enshrine Alternator's incorrect behavior. Although much less important, it is sometimes useful to be able to check if the test also pass on other DynamoDB clones, especially "DynamoDB Local" - Amazon's DynamoDB mock written in Java.

In issue https://github.com/scylladb/scylladb/issues/7775 we noted that some of our tests don't actually pass on DynamoDB Local, for different reasons, but at the time that issue was created most of the tests did work. However, checking now on a newer version of DynamoDB Local (2.6.1), I notice that _all_ tests failed because of some silly reasons that are easy to fix - and this is what the two patches in this series fix. After these fixes, most of the Alternator tests pass on DynamoDB Local. But not all of them - #7775 is still open.

No backport needed - these are just test framework improvements for developers.

Closes scylladb/scylladb#24361

* github.com:scylladb/scylladb:
  test/alternator: any response from healthcheck means server is alive
  test/alternator: fall back to legal-looking access key id
2025-06-06 08:50:58 +03:00
Nadav Har'El
b0f98f7d4b mv: test that view's SELECT automatically includes primary key
Both ScyllaDB's and Datastax's documentation suggest that when creating a
view with CREATE MATERIALIZED VIEW, its SELECT clause doesn't need to list
the view's primary key columns because those are selected automatically.
For example, our documentation has an example in
https://docs.scylladb.com/manual/stable/features/materialized-views.html

```
CREATE MATERIALIZED VIEW building_by_city2 AS
        SELECT meters FROM buildings
        WHERE city IS NOT NULL
        PRIMARY KEY(city, name);
```

Note how the primary key columns - city and name - are not explicitly
SELECTed.

I just discovered that while this behavior was indeed true in Cassandra
3 (and still true in ScyllaDB), it actually got broken in Cassandra 4 and 5.
I reported this apprent regression to Cassandra (CASSANDRA-20701), and
proposing the regression test in this patch to ensure that Scylla can't
suffer a similar regression in the future.

The new test passes on ScyllaDB and Cassandra 3, but fails on Cassandra
4 and 5 (and therefore tagged with "cassandra_bug").

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24399
2025-06-05 16:52:49 +02:00
Piotr Szymaniak
de96c28625 alternator: Add support for TTL when using tablets
Support for TTL-based data removal when using tablets.
The essence of this commit is a separate code path for finding token
ranges owned by the current shard for the cases when tablets are used
and not vnodes. At the same time, the vnodes-case is not touched not to
cause any regressions.

The TTL-caused data removal is normally performed by the primary
replica (both when using vnodes and tablets). For the tablets case,
the already-existing method tablet_map::get_primary_replica(tablet_id)
is used to know if a shard execuring the TTL-related data removal is
the primary replica for each tablet.

A new method tablet_map::get_secondary_replica(tablet_id) has been
added. It is needed by the data invalidation procedure to remove data
when the primary replica node is down - the data is then removed by the
secondary replica node. The mechanism is the same as in the vnodes case.

Since alternator now supports TTL, the test
`test_ttl_enable_error_with_tablets` has been removed.
Also, tests in the test_ttl.py have been made to run twice, once with
vnodes and once with tablets. When run with tablets, the due to lack of
support for LWT with tablets (#18068), tests use
'system:write_isolation' of 'unsafe_rmw'. This approach allows early
regression testing with tablets and is meant only as a tentative
solution.

Fixes scylladb/scylladb#16567

Closes scylladb/scylladb#23662
2025-06-05 17:39:29 +03:00
Amnon Heiman
760c8c3333 alternator/test_metrics.py: Test the per-table metrics
This patch adds tests for the newly added per-table metrics. It mainly
redoes existing tests, but verifies that the per-table metrics are
updated correctly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 15:12:19 +03:00
Amnon Heiman
3ad7a24eee alternator/executor.cc: Update per-table metrics
This patch adds support for updating per-table metrics. It introduces a
helper function that retrieves the stats object from a table schema.
The code uses a lw_shared_ptr for the stats object to ensure safe updates
even if the table holding it has been deleted.

There is some duplication in the updated code, as both per-shard and
per-table metrics are updated.

The rmw_operation::execute function now accepts two stats objects: one
for the global metrics and one for the per-table metrics.  The use of
execute was also modified—rather than modifying the WCU directly, a
parameter is used so both global and per-table stats can be updated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 15:12:13 +03:00
Amnon Heiman
d6afd42342 alternator/stats: Add per-table metrics
This patch allows registering a stats object per table. The per-table
stats object needs its metrics registry to be part of the table's
lifecycle, but there could be a scenario in which a table is already
deleted while some Alternator operations are still in progress.  To
handle this, the patch separates the registry from the metrics holder.
It is safe to modify a parameter that is not registered.

Metrics registration is performed via functions instead of the
constructor.

The registration accepts a keyspace and table name as parameters.

The per-table metrics use an alternator_table prefix to distinguish them
from their per-shard equivalents.

The metrics are aggregated and reported once per node.  Metrics that do
not make sense to report per table (such as create and delete) are not
registered. All metrics are marked with skip_when_empty.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 14:44:03 +03:00
Amnon Heiman
005df0c5c4 replica/database.hh: Add alternator per-table metrics
This patch adds optional per-table metrics for Alternator.

Like CQL, some of Alternator's statistics should be per-table. The
shared_ptr allows Alternator to register such metrics in a way that
makes them part of the table's lifecycle.

Using a shared_ptr does not create dependencies between the table_stats
and Alternator.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 14:38:14 +03:00
Amnon Heiman
af262317b5 alternator/stats.hh: Introduce a per-table stats container
A per-table stats container will be used to safely hold alternator
per-table stats.

It is build in a way that even if the metrics it holds are no longer
registered, it is still safe to use.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 14:38:14 +03:00
Ernest Zaslavsky
a39b773d36 encryption_test: Catch exact exception
Apparently `test_kms_network_error` will succeed at any circumstances since most of our exceptions derive from `std::exception`, so whatever happens to the test, for whatever reason it will throw, the test will be marked as passed.

Start catching the exact exception that we expect to be thrown.

Maybe somewhat related to https://github.com/scylladb/scylladb/issues/22628

Fixes: https://github.com/scylladb/scylladb/issues/24145

reapplies reverted: https://github.com/scylladb/scylladb/pull/24065

Should be backported to 2025.2.

Closes scylladb/scylladb#24242
2025-06-05 08:32:51 +03:00
Benny Halevy
8b387109fc disk_space_monitor: add space_source_registration
Register the current space_source_fn in an RAII
object that resets monitor._space_source to the
previous function when the RAII object is destroyed.

Use space_source_registration in database_test::
mutation_dump_generated_schema_deterministic_id_version
to prevent use-after-stack-return in the test.

Fixes #24314

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24342
2025-06-04 16:25:24 +03:00
Ernest Zaslavsky
1446f57635 minio: update CLI usage, remove deprecated mc options
Replace phased-out `mc` command options with supported alternatives.
Ensures compatibility with the latest MinIO version.

Closes scylladb/scylladb#24363
2025-06-04 16:22:48 +03:00
Anna Stuchlik
8b989d7fb1 doc: add the upgrade guide from 2025.1 to 2025.2
This commit adds the upgrade guide from version 2025.1 to 2025.2.
Also, it removes the upgrade guides existing for the previous version
that are irrelevant in 2025.2 (upgrade from OSS 6.2 and Enterprise 2024.x).

Note that the new guide does not include the "Enable Consistent Topology Updates" page,
as users upgrading to 2025.2 have consistent topology updates already enabled.

Fixes https://github.com/scylladb/scylladb/issues/24133

Fixes https://github.com/scylladb/scylladb/issues/24265

Closes scylladb/scylladb#24266
2025-06-04 14:00:05 +03:00
Szymon Malewski
5969809607 mapreduce_service: Prevent race condition
In parallelized aggregation functions super-coordinator (node performing final merging step) receives and merges each partial result in parallel coroutines (`parallel_for_each`).
Usually responses are spread over time and actual merging is atomic.
However sometimes partial results are received at the similar time and if an aggregate function (e.g. lua script) yields, two coroutines can try to overwrite the same accumulator one after another,
which leads to losing some of the results.
To prevent this, in this patch each coroutine stores merging results in its own context and overwrites accumulator atomically, only after it was fully merged.
Comparing to the previous implementation order of operands in merging function is swapped, but the order of aggregation is not guaranteed anyway.

Fixes #20662

Closes scylladb/scylladb#24106
2025-06-04 13:47:11 +03:00
Nadav Har'El
6cbcabd100 alternator: hide internal tags from users
The "tags" mechanism in Alternator is a convenient way to attach metadata
to Alternator tables. Recently we have started using it more and more for
internal metadata storage:

  * UpdateTimeToLive stores the attribute in a tag system:ttl_attribute
  * CreateTable stores provisioned throughput in tags
    system:provisioned_rcu and system:provisioned_wcu
  * CreateTable stores the table's creation time in a tag called
    system:table_creation_time.

We do not want any of these internal tags to be visible to a
ListTagsOfResource request, because if they are visible (as before this
patch), systems such as Terraform can get confused when they suddenly
see a tag which they didn't set - and may even attempt to delete it
(as reported in issue #24098).

Moreover, we don't want any of these internal tags to be writable
with TagResource or UntagResource: If a user wants to change the TTL
setting they should do it via UpdateTimeToLive - not by writing
directly to tags.

So in this patch we forbid read or write to *any* tag that begins
with the "system:" prefix, except one: "system:write_isolation".
That tag is deliberately intended to be writable by the user, as
a configuration mechanism, and is never created internally by
Scylla. We should have perhaps chosen a different prefix for
configurable vs. internal tags, or chosen more unique prefixes -
but let's not change these historic names now.

This patch also adds regression tests for the internal tags features,
failing before this patch and passing after:
1. internal tags, specifically system:ttl_attribute, are not visible
   in ListTagsOfResource, and cannot be modified by TagResource or
   UntagResource.
2. system:write_isolation is not internal, and be written by either
   TagResource or UntagResource, and read with ListTagsOfResource.

This patch also fixes a bug in the test where we added more checks
for system:write_isolation - test_tag_resource_write_isolation_values.
This test forgot to remove the system:write_isolation tags from
test_table when it ended, which would lead to other tests that run
later to run with a non-default write isolation - something which we
never intended.

Fixes #24098.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24299
2025-06-03 20:40:50 +03:00
Pavel Emelyanov
37e6ff1a3c Merge 'test.py: cql: run tests using bare pytest command' from Evgeniy Naydanov
Create a custom pytest test collector for .cql files and move CQL test execution logic from `CQLApprovalTest` class and `pylib/cql_repl/cql_repl.py` file to `CqlTest.runtest()` method.

In result, the only difference between CQLApproval and Python suite types is suffixes of test files.

Also there is a separate commit to remove dead code:

There is `write_junit_failure_report()` method in Test class which was used to generate a JUnitXML report.  But it became a dead code after removal of `write_junit_report()` function in 1e1d213592 to avoid duplication of error reporting in Jenkins (see https://github.com/scylladb/scylladb/issues/23220.)  This commit removes this method and all its implementations in subclasses.

Closes scylladb/scylladb#24301

* github.com:scylladb/scylladb:
  test.py: cql: don't exit from pytest session on failed CQL
  test.py: cql: run tests using bare pytest command
  test.py: python: set test.id according to --run_id argument
  test.py: python: pass --tmpdir from test.py to all Python tests
  test.py: remove dead code after removing of write_junit_report()
2025-06-03 19:32:06 +03:00
Pavel Emelyanov
24f430c6d2 Merge 'test.py: dtest: port next_gating tests from auth_roles_test.py' from Evgeniy Naydanov
Copy `auth_roles_test.py` from scylla-dtest test suite, remove all not next_gating tests from it, and make it works with `test.py`

As a part of the porting process, copy missed utility functions from scylla-dtest, remove unused imports and markers.

Enable the test in `suite.yaml` (run in dev mode only.)

Closes scylladb/scylladb#24343

* github.com:scylladb/scylladb:
  test.py: dtest: make auth_roles_test.py run using test.py
  test.py: dtest: add wait_for_any_log() to tools/log_utils.py
  test.py: dtest: add part of tools/assertions.py
  test.py: dtest: pickup latest code for retrying.py from dtest
  test.py: dtest: copy unmodified auth_roles_test.py
2025-06-03 18:54:47 +03:00
Patryk Jędrzejczak
8756c233e0 test: test_raft_recovery_user_data: disable hinted handoff
The test is currently flaky, writes can fail with "Too many in flight
hints: 10485936". See scylladb/scylladb#23565 for more details.

We suspect that scylladb/scylladb#23565 is caused by an infrastructure
issue - slow disks on some machines we run CI jobs on.

Since the test fails often and investigation doesn't seem to be easy,
we first deflake the test in this patch by disabling hinted handoff.

For replacing nodes, we provide `cfg` because there should have been
`cfg` in the first place. The test was correct anyway because:
- `tablets_mode_for_new_keyspaces` is set to `true` by default in
  test/cluster/suite.yaml,
- `endpoint_snitch` is set to `GossipingPropertyFileSnitch` by default
  if the property file is provided in `ScyllaServer.__init__`.

Ref scylladb/scylladb#23565

We should backport this patch to 2025.2 because this test is also flaky
on CI jobs using 2025.2. Older branches don't have this test.

Closes scylladb/scylladb#24364
2025-06-03 17:48:42 +02:00
Nadav Har'El
ac70e34de9 test/alternator: verify that DeleteItem returns an empty object
A user on StackOverflow (https://stackoverflow.com/questions/79650278)
reported that DeleteItem returns the apropriate response (an empty
object) on DynamoDB, but doesn't on "DynamoDB Local" (Amazon's local
mock of DynamoDB). I wrote the test in this patch to make sure that
Alternator doesn't have this bug, and indeed it doesn't: When DeleteItem
is used without any option that asks for additional output, its reponse
is, as expected, an empty object.

As usual, the new test passes on both Alternator and AWS DynamoDB.
(I didn't actually test on DynamoDB Local, I have some problems with
running that, but it doesn't matter, we have no intention of testing
DynamoDB Local).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24359
2025-06-03 18:47:34 +03:00
Avi Kivity
744015cf26 test.py: allow cmake configuration and ./configure.py configuration to coexist
Cmake emits its build.ninja into build/, while configure.py emits
build.ninja into ./. test.py uses this difference to choose the directory
structure to test.

The problem is that vscode will randomly call cmake to understand the
directory structure, so we end up with both build.ninja set up.

Invert the logic to look for ./build.ninja to determine the mode (instead
of build/build.ninja which can exist even if the user uses traditional
configuration).

It can still happen that a stray ./build.ninja exists (for example due
to switching branches), but that is rarer than having vscode auto-create
it.

Closes scylladb/scylladb#24269
2025-06-03 16:46:41 +03:00
Piotr Dulikowski
f6669422e1 Merge 'test.py: refactor test facades for better error handling' from Andrei Chekun
Switching to f-string formatting to simplify the code and to unify it with a general approach for formatting strings.
If the log file absent or empty test fails with an error regarding a missing boost log file, however, it's not helpful since it's not a root cause of the fail. Adding logic to log this issue as a warning in a pytest's log file and continue with providing results to the pytest itself.

Closes scylladb/scylladb#24307

* github.com:scylladb/scylladb:
  test.py: enhance boost_facade missing log file handling
  test.py: switch using f-string instead format in facades
2025-06-03 14:03:07 +02:00
Pavel Emelyanov
96029c7c93 Update seastar submodule
* seastar d7ff58f2...26badcb1 (22):
  > http/client: Skip HEAD reply body processing
  > httpd: Remove unused connection::_req member
  > httpd: Don't write body for HEAD replies
  > http: Move trailing chunk write into reply.cc
  > http_client: Add ECONNRESET to retryable errors
  > stall_detector: no backtrace if exception
  > http: Add test for "aborted" client
  > http: in the client, fix malforming of requests with zero-sized bodies
  > http: Track bytes read from a response
  > http: Add test for improper client handling of aborted requests
  > aio_storage_context: Rename iocb_pool::_iocb_pool to _all_iocbs
  > resource: Add some debug-level logging to memory allocation
  > resource: Rework sysconf memory fallback
  > resource: Indentation fix after previous patch
  > resource: Calculate available memory from NUMA nodes
  > resource: Move NUMA nodes vector evaluation up
  > reactor: Drop _reuseport boolean
  > reactor: Simplify network stack creation and initialization
  > reactor: Remove write-only _thread_id
  > reactor: Keep task-queues in std::array instead of static_vector
  > reactor: Mark _id and task_queue::_id const
  > memory: Report oversized alloc count as metric

scylla-gdb update included:

The reactor::_task_queues can be std::array or unique ptrs. Also check
the tq_ptr for being nullptr, as array doesn't have "size" only
"capacity" and can have non-registered groups.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24294
2025-06-03 13:47:05 +03:00
Nadav Har'El
e32559758a test/alternator: any response from healthcheck means server is alive
In the Alternator tests we check (in dynamodb_test_connect()) after
every test that the server is still alive, so we can blaim the test
that just ran if it crashes the server. We check the server's health
using a simple GET response, which works on both DynamoDB and
Alternator, e.g.,
```
$ curl http://dynamodb.us-east-2.amazonaws.com/
healthy: dynamodb.us-east-2.amazonaws.com
```

However, it turns out that new versions of DynamoDB Local - Amazon's
local mock of DynamoDB, for some reason insists that all requests -
including this health check - must be signed, so our unsigned health
request is rejected with error 400, saying the request must be signed.
So the current code which insists that the response have error code
200, fails and the test incorrectly things that DynamoDB Local crashed
during the test.

The fix is trivial: Just don't check that the error code is 200.
Any HTTP response from the server means it is still alive! If the
server is not alive, we will get an exception, not any HTTP response,
and this will lead the code to the "server has crashed" case.

Refs #7775

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-03 12:25:51 +03:00
Nadav Har'El
9732545958 test/alternator: fall back to legal-looking access key id
When the Alternator tests run against Scylla, they figure out (using
CQL) the correct username and password needed to connect. When it can't,
we fell back to some silly pair 'unknown_user', 'unknown_secret',
assuming that the server won't check it anyway.

It turns out that if we want to run tests against new version of
DynamoDB Local (Amazon's local mock of DynamoDB), it indeed doesn't
authentication, but starting in DynamoDB Local 2.0, it does check that
the access key ID (the username) itself is valid, and considers
"unknown_user" to be invalid because it contains an underscore -
AWS_ACCESS_KEY_ID must only contains letters and numbers.
See https://repost.aws/articles/ARc4hEkF9CRgOrw8kSMe6CwQ/ for Amazon's
explanation for this change in DynamoDB Local 2.

The trivial fix is to remove the underscore from the silly username.
After this patch, Alternator tests can connect to DynamoDB Local.
They still can't complete correctly - this will be fixed in the next
patch.

Refs #7775

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-03 12:25:51 +03:00
Evgeniy Naydanov
f0d283afd7 test.py: cql: don't exit from pytest session on failed CQL
There is the fixture in `test/cql/conftest.py` which checks
CQL connection after each test and exit from pytest session if
the connection was failed.  For CQL tests it's simply no
difference what to use: pytest.exit() or pytest.fail() because
tests are executing one-by-one in separate pytest sessions.

Change it to pytest.fail() for future integration into a single
pytest session.
2025-06-03 07:54:51 +00:00
Evgeniy Naydanov
cdc4b520da test.py: cql: run tests using bare pytest command
Create a custom pytest test collector for .cql files and
move CQL test execution logic from `CQLApprovalTest` class
and `pylib/cql_repl/cql_repl.py` file to `CqlTest.runtest()`
method.

In result, the only difference between CQLApproval and Python
suite types is suffixes of test files.
2025-06-03 07:54:51 +00:00
Evgeniy Naydanov
0fba0df4f6 test.py: python: set test.id according to --run_id argument
test.py uses `Test.id` attribute to distinguish repeated tests
in one run and pass it as `--run_id` CLI argument to pytest.
Use this argument to set the test's `id` attribute inside pytest
session to fix problem with paths to some test artifacts.
2025-06-03 07:54:51 +00:00
Michał Chojnowski
ea4d251ad2 compress: fix a use-after-free in dictionary_holder::get_recommended_dict()
The function calls copy() on a foreign_ptr
(stored in a map) which can be destroyed
(erased from the map) before the copy() completes.
This is illegal.

One way to fix this would be to apply an rwlock
to the map. Another way is to wrap the `foreign_ptr`
in a `lw_shared_ptr` and extend its lifetime over
the `copy()` call. This patch does the latter.

Fixes scylladb/scylladb#24165
Fixes scylladb/scylladb#24174

Closes scylladb/scylladb#24175
2025-06-03 10:42:38 +03:00
Piotr Dulikowski
f5b18d275b Merge 'test/boost: Adjust tests to RF-rack-valid keyspaces' from Dawid Mędrek
This PR adjusts existing Boost tests so they respect the invariant
introduced by enabling `rf_rack_valid_keyspaces` configuration option.
We disable it explicitly in more problematic tests. After that, we
enable the option by default in the whole test suite.

Fixes scylladb/scylladb#23958

Backport: backporting to 2025.1 and 2025.2 to be able to test the implementation there too.

Closes scylladb/scylladb#23802

* github.com:scylladb/scylladb:
  test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default
  test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests
  test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load
  test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity
2025-06-03 08:43:34 +02:00
Evgeniy Naydanov
ac972231fa test.py: python: pass --tmpdir from test.py to all Python tests
`--tmpdir` CLI argument is used to point to the directory with logs
and other test artifacts.  It has default values both in test.py
and pytest (`test/conftest.py`). These values are the same.  But for
non-default values it's required to pass it from test.py to pytest
explicitly.  This done for Topology tests, but not for all Python test
suites.  The commit fixes the problem by adding the argument in
`_prepare_pytest_command()` method of the base `PythonTest` class.
2025-06-03 05:45:05 +00:00
Evgeniy Naydanov
17401aaf31 test.py: remove dead code after removing of write_junit_report()
There is `write_junit_failure_report()` method in Test class which
was used to generate a JUnitXML report.  But it became a dead code
after removal of `write_junit_report()` function in
1e1d213592 to avoid duplication of
error reporting in Jenkins (see #23220.)  This commit removes this
method and all its implementations in subclasses.
2025-06-03 02:28:41 +00:00
Pavel Emelyanov
eb5160cb4d api: Drop no longer used container_to_vec helper
All callers are patched to use std::ranges.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-06-02 20:09:58 +03:00
Pavel Emelyanov
f6afc02951 api: Use std::ranges to stringify collections
There are several endpoints that have collection of objects at hand and
want a vector of corresponding strings. Use std::ranges library for
conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-06-02 20:09:56 +03:00
Pavel Emelyanov
b943902ff7 api: Use std::ranges to convert std::set<sstring> to std::vector<string>
The column_family/get_sstables_for_key endpoint collects a set of
sstable names and converts it to vector of strings using homebrew
helper. The std::ranges convertor works just as nice.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-06-02 20:09:28 +03:00