Commit Graph

8929 Commits

Author SHA1 Message Date
Raphael S. Carvalho
2d716f3ffe replica: Fix truncate assert failure
Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed
a preexisting fragility which I missed.

1) truncate gets RP mark X, truncated_at = second T
2) new sstable written during snapshot or later, also at second T (difference of MS)
3) discard_sstables() get RP Y > saved RP X, since creation time of sstable
with RP Y is equal to truncated_at = second T.

So the problem is that truncate is using a clock of second granularity for
filtering out sstables written later, and after we got low mark and truncate time,
it can happen that a sstable is flushed later within the same second, but at a
different millisecond.
By switching to a millisecond clock (db_clock), we allow sstables written later
within the same second from being filtered out. It's not perfect but
extremely unlikely a new write lands and get flushed in the same
millisecond we recorded truncated_at timepoint. In practice, truncate
will not be used concurrently to writes, so this should be enough for
our tests performing such concurrent actions.
We're moving away from gc_clock which is our cheap lowres_clock, but
time is only retrieved when creating sstable objects, which frequency of
creation is low enough for not having significant consequences, and also
db_clock should be cheap enough since it's usually syscall-less.

Fixes #23771.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24426
2025-06-08 15:59:15 +03:00
Nadav Har'El
a714079a62 Merge 'Add Support for Per-Table Metrics in Alternator' from Amnon Heiman
This series introduces per-table metrics support for Alternator. It includes the following commits:

Add optional per-table metrics for Alternator
Introduces a shared_ptr-based mechanism that allows Alternator to register per-table metrics. These metrics follow the table's lifecycle, similar to how CQL metrics are handled. The use of shared_ptr ensures no direct dependency between table stats and Alternator.

Enable registration of stats objects per table
Adds support for registering a stats object using a keyspace and table name. Per-table metrics are prefixed with alternator_table to differentiate them from per-shard metrics. Metrics are reported once per node, and those not meaningful at the table level (e.g. create/delete) are excluded. All metrics use the skip_when_empty flag.

Update per-table metrics handling
Adds a helper function to retrieve the stats object from a table schema. Updates both per-shard and per-table metrics, resulting in some code duplication.

Add tests for per-table metrics
Extends existing tests to also validate the per-table metrics. These tests ensure that the new metrics are correctly registered and updated.

This series improves observability in Alternator by enabling fine-grained per-table metrics without disrupting existing per-shard metrics.
**No need to backport**

Fixes #19824

Closes scylladb/scylladb#24046

* github.com:scylladb/scylladb:
  alternator/test_metrics.py: Test the per-table metrics
  alternator/executor.cc: Update per-table metrics
  alternator/stats: Add per-table metrics
  replica/database.hh: Add alternator per-table metrics
  alternator/stats.hh: Introduce a per-table stats container
2025-06-08 10:42:05 +03:00
Pavel Emelyanov
f5743c6afc Merge 'test/alternator: make tests runnable on DynamoDB Local' from Nadav Har'El
The Alternator tests should pass on Alternator (of course), and almost always also on DynamoDB to verify that the tests themselves are correct and don't just enshrine Alternator's incorrect behavior. Although much less important, it is sometimes useful to be able to check if the test also pass on other DynamoDB clones, especially "DynamoDB Local" - Amazon's DynamoDB mock written in Java.

In issue https://github.com/scylladb/scylladb/issues/7775 we noted that some of our tests don't actually pass on DynamoDB Local, for different reasons, but at the time that issue was created most of the tests did work. However, checking now on a newer version of DynamoDB Local (2.6.1), I notice that _all_ tests failed because of some silly reasons that are easy to fix - and this is what the two patches in this series fix. After these fixes, most of the Alternator tests pass on DynamoDB Local. But not all of them - #7775 is still open.

No backport needed - these are just test framework improvements for developers.

Closes scylladb/scylladb#24361

* github.com:scylladb/scylladb:
  test/alternator: any response from healthcheck means server is alive
  test/alternator: fall back to legal-looking access key id
2025-06-06 08:50:58 +03:00
Nadav Har'El
b0f98f7d4b mv: test that view's SELECT automatically includes primary key
Both ScyllaDB's and Datastax's documentation suggest that when creating a
view with CREATE MATERIALIZED VIEW, its SELECT clause doesn't need to list
the view's primary key columns because those are selected automatically.
For example, our documentation has an example in
https://docs.scylladb.com/manual/stable/features/materialized-views.html

```
CREATE MATERIALIZED VIEW building_by_city2 AS
        SELECT meters FROM buildings
        WHERE city IS NOT NULL
        PRIMARY KEY(city, name);
```

Note how the primary key columns - city and name - are not explicitly
SELECTed.

I just discovered that while this behavior was indeed true in Cassandra
3 (and still true in ScyllaDB), it actually got broken in Cassandra 4 and 5.
I reported this apprent regression to Cassandra (CASSANDRA-20701), and
proposing the regression test in this patch to ensure that Scylla can't
suffer a similar regression in the future.

The new test passes on ScyllaDB and Cassandra 3, but fails on Cassandra
4 and 5 (and therefore tagged with "cassandra_bug").

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24399
2025-06-05 16:52:49 +02:00
Piotr Szymaniak
de96c28625 alternator: Add support for TTL when using tablets
Support for TTL-based data removal when using tablets.
The essence of this commit is a separate code path for finding token
ranges owned by the current shard for the cases when tablets are used
and not vnodes. At the same time, the vnodes-case is not touched not to
cause any regressions.

The TTL-caused data removal is normally performed by the primary
replica (both when using vnodes and tablets). For the tablets case,
the already-existing method tablet_map::get_primary_replica(tablet_id)
is used to know if a shard execuring the TTL-related data removal is
the primary replica for each tablet.

A new method tablet_map::get_secondary_replica(tablet_id) has been
added. It is needed by the data invalidation procedure to remove data
when the primary replica node is down - the data is then removed by the
secondary replica node. The mechanism is the same as in the vnodes case.

Since alternator now supports TTL, the test
`test_ttl_enable_error_with_tablets` has been removed.
Also, tests in the test_ttl.py have been made to run twice, once with
vnodes and once with tablets. When run with tablets, the due to lack of
support for LWT with tablets (#18068), tests use
'system:write_isolation' of 'unsafe_rmw'. This approach allows early
regression testing with tablets and is meant only as a tentative
solution.

Fixes scylladb/scylladb#16567

Closes scylladb/scylladb#23662
2025-06-05 17:39:29 +03:00
Amnon Heiman
760c8c3333 alternator/test_metrics.py: Test the per-table metrics
This patch adds tests for the newly added per-table metrics. It mainly
redoes existing tests, but verifies that the per-table metrics are
updated correctly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-06-05 15:12:19 +03:00
Ernest Zaslavsky
a39b773d36 encryption_test: Catch exact exception
Apparently `test_kms_network_error` will succeed at any circumstances since most of our exceptions derive from `std::exception`, so whatever happens to the test, for whatever reason it will throw, the test will be marked as passed.

Start catching the exact exception that we expect to be thrown.

Maybe somewhat related to https://github.com/scylladb/scylladb/issues/22628

Fixes: https://github.com/scylladb/scylladb/issues/24145

reapplies reverted: https://github.com/scylladb/scylladb/pull/24065

Should be backported to 2025.2.

Closes scylladb/scylladb#24242
2025-06-05 08:32:51 +03:00
Benny Halevy
8b387109fc disk_space_monitor: add space_source_registration
Register the current space_source_fn in an RAII
object that resets monitor._space_source to the
previous function when the RAII object is destroyed.

Use space_source_registration in database_test::
mutation_dump_generated_schema_deterministic_id_version
to prevent use-after-stack-return in the test.

Fixes #24314

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24342
2025-06-04 16:25:24 +03:00
Ernest Zaslavsky
1446f57635 minio: update CLI usage, remove deprecated mc options
Replace phased-out `mc` command options with supported alternatives.
Ensures compatibility with the latest MinIO version.

Closes scylladb/scylladb#24363
2025-06-04 16:22:48 +03:00
Nadav Har'El
6cbcabd100 alternator: hide internal tags from users
The "tags" mechanism in Alternator is a convenient way to attach metadata
to Alternator tables. Recently we have started using it more and more for
internal metadata storage:

  * UpdateTimeToLive stores the attribute in a tag system:ttl_attribute
  * CreateTable stores provisioned throughput in tags
    system:provisioned_rcu and system:provisioned_wcu
  * CreateTable stores the table's creation time in a tag called
    system:table_creation_time.

We do not want any of these internal tags to be visible to a
ListTagsOfResource request, because if they are visible (as before this
patch), systems such as Terraform can get confused when they suddenly
see a tag which they didn't set - and may even attempt to delete it
(as reported in issue #24098).

Moreover, we don't want any of these internal tags to be writable
with TagResource or UntagResource: If a user wants to change the TTL
setting they should do it via UpdateTimeToLive - not by writing
directly to tags.

So in this patch we forbid read or write to *any* tag that begins
with the "system:" prefix, except one: "system:write_isolation".
That tag is deliberately intended to be writable by the user, as
a configuration mechanism, and is never created internally by
Scylla. We should have perhaps chosen a different prefix for
configurable vs. internal tags, or chosen more unique prefixes -
but let's not change these historic names now.

This patch also adds regression tests for the internal tags features,
failing before this patch and passing after:
1. internal tags, specifically system:ttl_attribute, are not visible
   in ListTagsOfResource, and cannot be modified by TagResource or
   UntagResource.
2. system:write_isolation is not internal, and be written by either
   TagResource or UntagResource, and read with ListTagsOfResource.

This patch also fixes a bug in the test where we added more checks
for system:write_isolation - test_tag_resource_write_isolation_values.
This test forgot to remove the system:write_isolation tags from
test_table when it ended, which would lead to other tests that run
later to run with a non-default write isolation - something which we
never intended.

Fixes #24098.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24299
2025-06-03 20:40:50 +03:00
Pavel Emelyanov
37e6ff1a3c Merge 'test.py: cql: run tests using bare pytest command' from Evgeniy Naydanov
Create a custom pytest test collector for .cql files and move CQL test execution logic from `CQLApprovalTest` class and `pylib/cql_repl/cql_repl.py` file to `CqlTest.runtest()` method.

In result, the only difference between CQLApproval and Python suite types is suffixes of test files.

Also there is a separate commit to remove dead code:

There is `write_junit_failure_report()` method in Test class which was used to generate a JUnitXML report.  But it became a dead code after removal of `write_junit_report()` function in 1e1d213592 to avoid duplication of error reporting in Jenkins (see https://github.com/scylladb/scylladb/issues/23220.)  This commit removes this method and all its implementations in subclasses.

Closes scylladb/scylladb#24301

* github.com:scylladb/scylladb:
  test.py: cql: don't exit from pytest session on failed CQL
  test.py: cql: run tests using bare pytest command
  test.py: python: set test.id according to --run_id argument
  test.py: python: pass --tmpdir from test.py to all Python tests
  test.py: remove dead code after removing of write_junit_report()
2025-06-03 19:32:06 +03:00
Pavel Emelyanov
24f430c6d2 Merge 'test.py: dtest: port next_gating tests from auth_roles_test.py' from Evgeniy Naydanov
Copy `auth_roles_test.py` from scylla-dtest test suite, remove all not next_gating tests from it, and make it works with `test.py`

As a part of the porting process, copy missed utility functions from scylla-dtest, remove unused imports and markers.

Enable the test in `suite.yaml` (run in dev mode only.)

Closes scylladb/scylladb#24343

* github.com:scylladb/scylladb:
  test.py: dtest: make auth_roles_test.py run using test.py
  test.py: dtest: add wait_for_any_log() to tools/log_utils.py
  test.py: dtest: add part of tools/assertions.py
  test.py: dtest: pickup latest code for retrying.py from dtest
  test.py: dtest: copy unmodified auth_roles_test.py
2025-06-03 18:54:47 +03:00
Patryk Jędrzejczak
8756c233e0 test: test_raft_recovery_user_data: disable hinted handoff
The test is currently flaky, writes can fail with "Too many in flight
hints: 10485936". See scylladb/scylladb#23565 for more details.

We suspect that scylladb/scylladb#23565 is caused by an infrastructure
issue - slow disks on some machines we run CI jobs on.

Since the test fails often and investigation doesn't seem to be easy,
we first deflake the test in this patch by disabling hinted handoff.

For replacing nodes, we provide `cfg` because there should have been
`cfg` in the first place. The test was correct anyway because:
- `tablets_mode_for_new_keyspaces` is set to `true` by default in
  test/cluster/suite.yaml,
- `endpoint_snitch` is set to `GossipingPropertyFileSnitch` by default
  if the property file is provided in `ScyllaServer.__init__`.

Ref scylladb/scylladb#23565

We should backport this patch to 2025.2 because this test is also flaky
on CI jobs using 2025.2. Older branches don't have this test.

Closes scylladb/scylladb#24364
2025-06-03 17:48:42 +02:00
Nadav Har'El
ac70e34de9 test/alternator: verify that DeleteItem returns an empty object
A user on StackOverflow (https://stackoverflow.com/questions/79650278)
reported that DeleteItem returns the apropriate response (an empty
object) on DynamoDB, but doesn't on "DynamoDB Local" (Amazon's local
mock of DynamoDB). I wrote the test in this patch to make sure that
Alternator doesn't have this bug, and indeed it doesn't: When DeleteItem
is used without any option that asks for additional output, its reponse
is, as expected, an empty object.

As usual, the new test passes on both Alternator and AWS DynamoDB.
(I didn't actually test on DynamoDB Local, I have some problems with
running that, but it doesn't matter, we have no intention of testing
DynamoDB Local).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#24359
2025-06-03 18:47:34 +03:00
Avi Kivity
744015cf26 test.py: allow cmake configuration and ./configure.py configuration to coexist
Cmake emits its build.ninja into build/, while configure.py emits
build.ninja into ./. test.py uses this difference to choose the directory
structure to test.

The problem is that vscode will randomly call cmake to understand the
directory structure, so we end up with both build.ninja set up.

Invert the logic to look for ./build.ninja to determine the mode (instead
of build/build.ninja which can exist even if the user uses traditional
configuration).

It can still happen that a stray ./build.ninja exists (for example due
to switching branches), but that is rarer than having vscode auto-create
it.

Closes scylladb/scylladb#24269
2025-06-03 16:46:41 +03:00
Piotr Dulikowski
f6669422e1 Merge 'test.py: refactor test facades for better error handling' from Andrei Chekun
Switching to f-string formatting to simplify the code and to unify it with a general approach for formatting strings.
If the log file absent or empty test fails with an error regarding a missing boost log file, however, it's not helpful since it's not a root cause of the fail. Adding logic to log this issue as a warning in a pytest's log file and continue with providing results to the pytest itself.

Closes scylladb/scylladb#24307

* github.com:scylladb/scylladb:
  test.py: enhance boost_facade missing log file handling
  test.py: switch using f-string instead format in facades
2025-06-03 14:03:07 +02:00
Nadav Har'El
e32559758a test/alternator: any response from healthcheck means server is alive
In the Alternator tests we check (in dynamodb_test_connect()) after
every test that the server is still alive, so we can blaim the test
that just ran if it crashes the server. We check the server's health
using a simple GET response, which works on both DynamoDB and
Alternator, e.g.,
```
$ curl http://dynamodb.us-east-2.amazonaws.com/
healthy: dynamodb.us-east-2.amazonaws.com
```

However, it turns out that new versions of DynamoDB Local - Amazon's
local mock of DynamoDB, for some reason insists that all requests -
including this health check - must be signed, so our unsigned health
request is rejected with error 400, saying the request must be signed.
So the current code which insists that the response have error code
200, fails and the test incorrectly things that DynamoDB Local crashed
during the test.

The fix is trivial: Just don't check that the error code is 200.
Any HTTP response from the server means it is still alive! If the
server is not alive, we will get an exception, not any HTTP response,
and this will lead the code to the "server has crashed" case.

Refs #7775

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-03 12:25:51 +03:00
Nadav Har'El
9732545958 test/alternator: fall back to legal-looking access key id
When the Alternator tests run against Scylla, they figure out (using
CQL) the correct username and password needed to connect. When it can't,
we fell back to some silly pair 'unknown_user', 'unknown_secret',
assuming that the server won't check it anyway.

It turns out that if we want to run tests against new version of
DynamoDB Local (Amazon's local mock of DynamoDB), it indeed doesn't
authentication, but starting in DynamoDB Local 2.0, it does check that
the access key ID (the username) itself is valid, and considers
"unknown_user" to be invalid because it contains an underscore -
AWS_ACCESS_KEY_ID must only contains letters and numbers.
See https://repost.aws/articles/ARc4hEkF9CRgOrw8kSMe6CwQ/ for Amazon's
explanation for this change in DynamoDB Local 2.

The trivial fix is to remove the underscore from the silly username.
After this patch, Alternator tests can connect to DynamoDB Local.
They still can't complete correctly - this will be fixed in the next
patch.

Refs #7775

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-06-03 12:25:51 +03:00
Evgeniy Naydanov
f0d283afd7 test.py: cql: don't exit from pytest session on failed CQL
There is the fixture in `test/cql/conftest.py` which checks
CQL connection after each test and exit from pytest session if
the connection was failed.  For CQL tests it's simply no
difference what to use: pytest.exit() or pytest.fail() because
tests are executing one-by-one in separate pytest sessions.

Change it to pytest.fail() for future integration into a single
pytest session.
2025-06-03 07:54:51 +00:00
Evgeniy Naydanov
cdc4b520da test.py: cql: run tests using bare pytest command
Create a custom pytest test collector for .cql files and
move CQL test execution logic from `CQLApprovalTest` class
and `pylib/cql_repl/cql_repl.py` file to `CqlTest.runtest()`
method.

In result, the only difference between CQLApproval and Python
suite types is suffixes of test files.
2025-06-03 07:54:51 +00:00
Evgeniy Naydanov
0fba0df4f6 test.py: python: set test.id according to --run_id argument
test.py uses `Test.id` attribute to distinguish repeated tests
in one run and pass it as `--run_id` CLI argument to pytest.
Use this argument to set the test's `id` attribute inside pytest
session to fix problem with paths to some test artifacts.
2025-06-03 07:54:51 +00:00
Piotr Dulikowski
f5b18d275b Merge 'test/boost: Adjust tests to RF-rack-valid keyspaces' from Dawid Mędrek
This PR adjusts existing Boost tests so they respect the invariant
introduced by enabling `rf_rack_valid_keyspaces` configuration option.
We disable it explicitly in more problematic tests. After that, we
enable the option by default in the whole test suite.

Fixes scylladb/scylladb#23958

Backport: backporting to 2025.1 and 2025.2 to be able to test the implementation there too.

Closes scylladb/scylladb#23802

* github.com:scylladb/scylladb:
  test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default
  test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests
  test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load
  test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity
  test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity
2025-06-03 08:43:34 +02:00
Evgeniy Naydanov
ac972231fa test.py: python: pass --tmpdir from test.py to all Python tests
`--tmpdir` CLI argument is used to point to the directory with logs
and other test artifacts.  It has default values both in test.py
and pytest (`test/conftest.py`). These values are the same.  But for
non-default values it's required to pass it from test.py to pytest
explicitly.  This done for Topology tests, but not for all Python test
suites.  The commit fixes the problem by adding the argument in
`_prepare_pytest_command()` method of the base `PythonTest` class.
2025-06-03 05:45:05 +00:00
Evgeniy Naydanov
17401aaf31 test.py: remove dead code after removing of write_junit_report()
There is `write_junit_failure_report()` method in Test class which
was used to generate a JUnitXML report.  But it became a dead code
after removal of `write_junit_report()` function in
1e1d213592 to avoid duplication of
error reporting in Jenkins (see #23220.)  This commit removes this
method and all its implementations in subclasses.
2025-06-03 02:28:41 +00:00
Andrei Chekun
738cbc07b5 test.py: enhance boost_facade missing log file handling
If the log file absent or empty test fails with an error regarding a missing boost log file, however, it's not helpful since it's not a root cause of the fail. Adding logic to log this issue as a warning in a pytest's log file and continue with providing results to the pytest itself.
2025-06-02 12:17:10 +02:00
Andrei Chekun
5f6740c1fa test.py: switch using f-string instead format in facades
Switching to f-string formatting to simplify the code and to unify it with a general approach for formatting strings.
2025-06-02 12:16:47 +02:00
Pavel Emelyanov
7fef2c4f61 Merge 'test.py: fix metrics gathering' from Andrei Chekun
Move of the run_process done in https://github.com/scylladb/scylladb/pull/24091 was not fully correct. The method run_process was not overridden in the class ResourceGatherOn, so no metrics are collected at all.
Additionally, fix metrics DB location second time.

Closes scylladb/scylladb#24306

* github.com:scylladb/scylladb:
  test.py: fix metrics DB location
  test.py: fix the possibility to gather resource metrics for test
2025-06-02 13:12:42 +03:00
Evgeniy Naydanov
e780164a67 test.py: dtest: make auth_roles_test.py run using test.py
As a part of the porting process, remove unused imports and
markers, remove non-next_gating tests, and code for old
ScyllaDB versions.

Enable the test in suite.yaml (run in dev mode only)
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
145c2fed97 test.py: dtest: add wait_for_any_log() to tools/log_utils.py
Copy wait_for_any_log() function from dtest tools/log_utils.py
with few modifications:

 - Add type hints;
 - Change timeout for node.watch_log_for() calls from 0 to 0.1
   because dtest shim's implementation uses asyncio.timeout()
   and 0 means not "one time" but "never run";
 - Use set() instead of list() for `ret` variable;
 - Remove redundant `found` variable.
 - Remove `remaining` variable and use shallow copies to make
   the code more correct.  As a side effect this makes the
   TimeoutError message more correct too;
 - Use f-string formatting for TimeoutError message;
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
ff2aea7e5b test.py: dtest: add part of tools/assertions.py
Copy few assertion functions from dtest tools/assertions.py:

 - assertion_exception()
 - assertion_invalid()
 - assertion_one()
 - assertion_all()
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
9d70b6307b test.py: dtest: pickup latest code for retrying.py from dtest
Sync retrying.py with dtest.
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
40464faef3 test.py: dtest: copy unmodified auth_roles_test.py
The test is disabled in suite.yaml
2025-06-02 05:14:41 +00:00
Botond Dénes
c52aec3d2f Merge 'tablets: fix missing data after tablet merge ' from Raphael Raph Carvalho
Consider the following scenario:

1) let's assume tablet 0 has range [1, 5] (pre merge)
2) tablet merge happens, tablet 0 has now range [1, 10]
3) tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5]
4) during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time
5) replica service is asked to consume range [1, 10] of tablet 0 (post merge)

We have two possible outcomes:

With cache bypass:

1) cache reader is bypassed
2) sstable reader is created on range [1, 10]
3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10]

With cache:

1) cache reader is created
2) finds partition with token 5 is cached
3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0)
4) incremental selector consumes the pre-merge sstable spanning range [1, 5]
4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached
4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed.
So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read.

This change fixes the bug and triggers a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets.

Additionally, includes a fix for range reads that span more than one tablet, which can happen during split execution.

Fixes: https://github.com/scylladb/scylladb/issues/23313

This change needs to be backported to all supported versions which implement tablet merge.

Closes scylladb/scylladb#24287

* github.com:scylladb/scylladb:
  replica: Fix range reads spanning sibling tablets
  test: add reproducer and test for mutation source refresh after merge
  tablets: trigger mutation source refresh on tablet count change
2025-05-30 15:37:29 +03:00
Pavel Emelyanov
a65ffdd0df test/result_utils: Do not assume map_reduce reducing order
When map_reduce is called on a collection, one shouldn't expect that it
processes the elements of the collection in any specific order.

Current test of map-reduce over boost outcome assumes that if reduce
function is the string concatenation, then it would concatenate the
given vector of strings in the order they are listed. That requirement
should be relaxed, and the result may have reversed concatentation.

Fixes scylladb/scylladb#24321

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24325
2025-05-30 09:38:59 +02:00
Michael Litvak
3a1be33143 test_cdc_generation_publishing: fix to read monotonically
The test test_multiple_unpublished_cdc_generations reads the CDC
generation timestamps to verify they are published in the correct order.
To do so it issues reads in a loop with a short sleep period and checks
the differences between consecutive reads, assuming they are monotonic.

However the assumption that the reads are monotonic is not valid,
because the reads are issued with consistency_level=ONE, thus we may read
timestamps {A,B} from some node, then read timestamps {A} from another
node that didn't apply the write of the new timestamp B yet. This will
trigger the assert in the test and fail.

To ensure the reads are monotonic we change the test to use consistency
level ALL for the reads.

Fixes scylladb/scylladb#24262

Closes scylladb/scylladb#24272
2025-05-30 08:35:56 +02:00
Pavel Emelyanov
086777e5de Merge 'test.py: python: run tests using bare pytest command' from Evgeniy Naydanov
Main change is splitting logic of `PythonTest.run()` method into `PythonTest.run_ctx()` context manager and `PythonTest.run()` method itself and add the `host` fixture which uses `PythonTest.run_ctx()` context manager to setup and teardown ScyllaDB node if `--test-py-init` argument is used. Otherwise, this fixture returns a value of `--host` CLI argument. Use dynamic scope provided by `testpy_test_fixture_scope()` function instead of `session` to maintain compatibility with `test.py` and `./run` scripts.

Other related changes:

* Add utility `get_testpy_test()` function to `pylib.suite.base` which combines all required steps to create an instance of `Test` class and rework `testpy_test` fixture to use it.

* Switch to use dynamic fixture scope controlled by `--test-py-init` CLI argument to improve compatibility with test.py.  And because in test.py mode the scope is `session`, also change default event loop scope to `session`.

* Convert `get_valid_alternator_role()` to fixture to have more control on the scope of the cache used. Additionally, function `new_dynamodb_session()` was also converted to a fixture, because it uses `get_valid_alternator_role()`.

* Replace dups of `cql` and `this_dc` fixtures in `rest_api` and `pylib/cql_repl` with imports from `cqlpy`.

* Change `build_mode` fixture to return "unknown" if no --mode arguments provided (this is mainly for alternator and cqlpy tests)

* Create a parent directory for a test log file just before opening this file in `run_test()` function instead of having this as a side effect in `Test.__init__()`.

And changes that remove pytest CLI argument duplicates to be able to run tests from different test suites in one pytest session:

* Add 3 supplementary functions to `test.pylib.suite.python`: `add_host_option()` (which adds `--host` options to pytest session), `add_cql_connection_options()` (which adds `--port`, and `--ssl`), and `--add-s3-options` (which adds options related to S3 connection.) Each function decorated with `@cache` decorator to be executed once per pytest session and avoid CLI options duplication for runs which executes `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables` in one pytest session.

* Move `--auth_username` and `--auth_password` options from `cluster/conftest.py` to add_scylla_cql_connection_options() and slightly rework `cql` fixture to support these options.

* Remove `--input`, `--output`, and `--keep-tmp` pytest CLI opionts from `cluster/object_store/conftest.py` because they are not used in these suite.

* Remove `--omit-scylla-output` CLI option from pytest argparser. Instead, remove it from `sys.argv` in `cqlpy/run.py`.  Also, no need to check this option in `alternator/run`.

Closes scylladb/scylladb#23849

* github.com:scylladb/scylladb:
  test.py: python: run tests using bare pytest command
  test.py: rework testpy_test fixture
  test.py: alternator: convert get_valid_alternator_role() to fixture
  test.py: python: split logic of PythonTest.run()
  test.py: add credentials options to add_cql_connection_options()
  test.py: python: remove dups of cql and this_dc fixtures
  test.py: remove duplication of pytest CLI options
  test.py: remove unused CLI options
  test.py: remove `--omit-scylla-output` from pytest argparser
  test.py: set build_mode to "unknown" if no --mode argument
  test.py: create directory for test log in run_test()
2025-05-30 08:48:43 +03:00
Botond Dénes
7db956965e mutation/mutation_compactor: cache regular/shadowable max-purgable in separate members
Max purgeable has two possible values for each partition: one for
regular tombstones and one for shadowable ones. Yet currently a single
member is used to cache the max-purgeable value for the partition, so
whichever kind of tombstone is checked first, its max-purgeable will
become sticky and apply to the other kind of tombstones too. E.g. if the
first can_gc() check is for a regular tombstone, its max-purgeable will
apply to shadowable tombstones in the partition too, meaning they might
not be purged, even though they are purgeable, as the shadowable
max-purgeable is expected to be more lenient. The other way around is
worse, as it will result in regular tombstone being incorrectly purged,
permitted by the more lenient shadowable tombstone max-purgeable.
Fix this by caching the two possible values in two separate members.
A reproducer unit test is also added.

Fixes: scylladb/scylladb#23272

Closes scylladb/scylladb#24171
2025-05-29 22:52:08 +03:00
Szymon Malewski
18d237a393 alternator/executor: Added checks in batch_write_item
This patch adds checks validating 'BatchWriteItem' requests mostly to avoid ugly fallback message.
It changes request's behaviour in case of an empty array of WriteRequests - previously such an array was ignored and whole request might succeed, now it raises ValidationException, following the documentation and behaviour of DynamoDB.
Patch includes tests in test_manual_requests (`test_batch_write_item_invalid_payload`, `test_batch_write_item_empty_request_list`) testing with several offending cases.

Fixes #23233

Closes scylladb/scylladb#23878
2025-05-29 20:33:57 +03:00
Robert Bindar
c570941692 Add nodetool refresh --scope option
This change adds the --scope option to nodetool refresh.
Like in the case of nodetool restore, you can pass either of:
* node - On the local node.
* rack - On the local rack.
* dc - In the datacenter (DC) where the local node lives.
* all (default) - Everywhere across the cluster.
as scope.

The feature is based on the existing load_and_stream paths, so it
requires passing --load-and-stream to the refresh command.
Also, it is not compatible with the --primary-replica-only option.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#23861
2025-05-29 16:12:09 +03:00
Evgeniy Naydanov
0ee0e3f14d test.py: python: run tests using bare pytest command
Add the `host` fixture which uses `PythonTest.run_ctx()` context manager
to setup and teardown ScyllaDB node if `--test-py-init` argument is used.
Otherwise, this fixture returns a value of `--host` CLI argument.

Use dynamic scope provided by `testpy_test_fixture_scope()` function
instead of `session` to maintain compatibility with test.py and ./run
scripts.
2025-05-29 12:33:41 +00:00
Evgeniy Naydanov
b67048f3ee test.py: rework testpy_test fixture
Add utility `get_testpy_test()` function to `pylib.suite.base` which
combines all required steps to create an instance of `Test` class.

Remove redundant `testpy_testsuite` fixture.

Switch to use dynamic fixture scope controlled by `--test-py-init` CLI
argument to improve compatibility with test.py.  And because in test.py
mode the scope is `session`, also change default event loop scope to
`session`.

The fixture is None for test.py mode.

test.py runs tests file-by-file as separate pytest sessions, so, `session`
scope is effectively close to be the same as `module` (can be a difference
in the order.)  In case of running tests with bare pytest command, we need
to use `module` scope to maintain same behavior as test.py, since we run
all tests in one pytest session.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
b65cb517b8 test.py: alternator: convert get_valid_alternator_role() to fixture
Convert `get_valid_alternator_role()` to fixture to have more control
on the scope of the cache used.

Additionally, function `new_dynamodb_session()` was also converted to
a fixture, because it uses `get_valid_alternator_role()`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
1f94a9c052 test.py: python: split logic of PythonTest.run()
Split logic of `PythonTest.run()` method into `PythonTest.run_ctx()`
context manager and `PythonTest.run()` method itself.

Done this to reuse setup/teardown code with bare pytest command runs.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
27cbfc77fb test.py: add credentials options to add_cql_connection_options()
Move `--auth_username` and `--auth_password` options from
`cluster/conftest.py` to add_cql_connection_options() and slightly
rework `cql` fixture to support these options.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
2bba4acdea test.py: python: remove dups of cql and this_dc fixtures
Replace dups of `cql` and `this_dc` fixtures in `rest_api` and
`pylib/cql_repl` with imports from `cqlpy`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
6780461df8 test.py: remove duplication of pytest CLI options
Add 3 supplementary functions to `test.pylib.suite.python`:
`add_host_option()` (which adds `--host` options to pytest session),
`add_cql_connection_options()` (which adds `--port`, and `--ssl`),
and `--add-s3-options` (which adds options related to S3 connection.)
Each function decorated with `@cache` decorator to be executed once per
pytest session and avoid CLI options duplication for runs which
executes `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables`
in one pytest session.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
056c5db829 test.py: remove unused CLI options
Remove `--input`, `--output`, and `--keep-tmp` pytest CLI opionts
from `cluster/object_store/conftest.py` because they are not used
in these suite.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
b7b68355ef test.py: remove --omit-scylla-output from pytest argparser
Remove `--omit-scylla-output` CLI option from pytest argparser.
Instead, remove it from `sys.argv` in `cqlpy/run.py`.  Also, no need
to check this option in `alternator/run`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
f262d4c323 test.py: set build_mode to "unknown" if no --mode argument
Change `build_mode` fixture to return "unknown" if no --mode arguments
provided (this is mainly for alternator and cqlpy tests)
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
30d542b8f1 test.py: create directory for test log in run_test()
Create a parent directory for a test log file just before opening this
file in `run_test()` function instead of having this as a side effect
in `Test.__init__()`.
2025-05-29 12:15:28 +00:00