Commit Graph

11801 Commits

Author SHA1 Message Date
Evgeniy Naydanov
145c2fed97 test.py: dtest: add wait_for_any_log() to tools/log_utils.py
Copy wait_for_any_log() function from dtest tools/log_utils.py
with few modifications:

 - Add type hints;
 - Change timeout for node.watch_log_for() calls from 0 to 0.1
   because dtest shim's implementation uses asyncio.timeout()
   and 0 means not "one time" but "never run";
 - Use set() instead of list() for `ret` variable;
 - Remove redundant `found` variable.
 - Remove `remaining` variable and use shallow copies to make
   the code more correct.  As a side effect this makes the
   TimeoutError message more correct too;
 - Use f-string formatting for TimeoutError message;
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
ff2aea7e5b test.py: dtest: add part of tools/assertions.py
Copy few assertion functions from dtest tools/assertions.py:

 - assertion_exception()
 - assertion_invalid()
 - assertion_one()
 - assertion_all()
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
9d70b6307b test.py: dtest: pickup latest code for retrying.py from dtest
Sync retrying.py with dtest.
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
40464faef3 test.py: dtest: copy unmodified auth_roles_test.py
The test is disabled in suite.yaml
2025-06-02 05:14:41 +00:00
Botond Dénes
c52aec3d2f Merge 'tablets: fix missing data after tablet merge ' from Raphael Raph Carvalho
Consider the following scenario:

1) let's assume tablet 0 has range [1, 5] (pre merge)
2) tablet merge happens, tablet 0 has now range [1, 10]
3) tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5]
4) during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time
5) replica service is asked to consume range [1, 10] of tablet 0 (post merge)

We have two possible outcomes:

With cache bypass:

1) cache reader is bypassed
2) sstable reader is created on range [1, 10]
3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10]

With cache:

1) cache reader is created
2) finds partition with token 5 is cached
3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0)
4) incremental selector consumes the pre-merge sstable spanning range [1, 5]
4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached
4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed.
So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read.

This change fixes the bug and triggers a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets.

Additionally, includes a fix for range reads that span more than one tablet, which can happen during split execution.

Fixes: https://github.com/scylladb/scylladb/issues/23313

This change needs to be backported to all supported versions which implement tablet merge.

Closes scylladb/scylladb#24287

* github.com:scylladb/scylladb:
  replica: Fix range reads spanning sibling tablets
  test: add reproducer and test for mutation source refresh after merge
  tablets: trigger mutation source refresh on tablet count change
2025-05-30 15:37:29 +03:00
Pavel Emelyanov
a65ffdd0df test/result_utils: Do not assume map_reduce reducing order
When map_reduce is called on a collection, one shouldn't expect that it
processes the elements of the collection in any specific order.

Current test of map-reduce over boost outcome assumes that if reduce
function is the string concatenation, then it would concatenate the
given vector of strings in the order they are listed. That requirement
should be relaxed, and the result may have reversed concatentation.

Fixes scylladb/scylladb#24321

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24325
2025-05-30 09:38:59 +02:00
Michael Litvak
3a1be33143 test_cdc_generation_publishing: fix to read monotonically
The test test_multiple_unpublished_cdc_generations reads the CDC
generation timestamps to verify they are published in the correct order.
To do so it issues reads in a loop with a short sleep period and checks
the differences between consecutive reads, assuming they are monotonic.

However the assumption that the reads are monotonic is not valid,
because the reads are issued with consistency_level=ONE, thus we may read
timestamps {A,B} from some node, then read timestamps {A} from another
node that didn't apply the write of the new timestamp B yet. This will
trigger the assert in the test and fail.

To ensure the reads are monotonic we change the test to use consistency
level ALL for the reads.

Fixes scylladb/scylladb#24262

Closes scylladb/scylladb#24272
2025-05-30 08:35:56 +02:00
Pavel Emelyanov
086777e5de Merge 'test.py: python: run tests using bare pytest command' from Evgeniy Naydanov
Main change is splitting logic of `PythonTest.run()` method into `PythonTest.run_ctx()` context manager and `PythonTest.run()` method itself and add the `host` fixture which uses `PythonTest.run_ctx()` context manager to setup and teardown ScyllaDB node if `--test-py-init` argument is used. Otherwise, this fixture returns a value of `--host` CLI argument. Use dynamic scope provided by `testpy_test_fixture_scope()` function instead of `session` to maintain compatibility with `test.py` and `./run` scripts.

Other related changes:

* Add utility `get_testpy_test()` function to `pylib.suite.base` which combines all required steps to create an instance of `Test` class and rework `testpy_test` fixture to use it.

* Switch to use dynamic fixture scope controlled by `--test-py-init` CLI argument to improve compatibility with test.py.  And because in test.py mode the scope is `session`, also change default event loop scope to `session`.

* Convert `get_valid_alternator_role()` to fixture to have more control on the scope of the cache used. Additionally, function `new_dynamodb_session()` was also converted to a fixture, because it uses `get_valid_alternator_role()`.

* Replace dups of `cql` and `this_dc` fixtures in `rest_api` and `pylib/cql_repl` with imports from `cqlpy`.

* Change `build_mode` fixture to return "unknown" if no --mode arguments provided (this is mainly for alternator and cqlpy tests)

* Create a parent directory for a test log file just before opening this file in `run_test()` function instead of having this as a side effect in `Test.__init__()`.

And changes that remove pytest CLI argument duplicates to be able to run tests from different test suites in one pytest session:

* Add 3 supplementary functions to `test.pylib.suite.python`: `add_host_option()` (which adds `--host` options to pytest session), `add_cql_connection_options()` (which adds `--port`, and `--ssl`), and `--add-s3-options` (which adds options related to S3 connection.) Each function decorated with `@cache` decorator to be executed once per pytest session and avoid CLI options duplication for runs which executes `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables` in one pytest session.

* Move `--auth_username` and `--auth_password` options from `cluster/conftest.py` to add_scylla_cql_connection_options() and slightly rework `cql` fixture to support these options.

* Remove `--input`, `--output`, and `--keep-tmp` pytest CLI opionts from `cluster/object_store/conftest.py` because they are not used in these suite.

* Remove `--omit-scylla-output` CLI option from pytest argparser. Instead, remove it from `sys.argv` in `cqlpy/run.py`.  Also, no need to check this option in `alternator/run`.

Closes scylladb/scylladb#23849

* github.com:scylladb/scylladb:
  test.py: python: run tests using bare pytest command
  test.py: rework testpy_test fixture
  test.py: alternator: convert get_valid_alternator_role() to fixture
  test.py: python: split logic of PythonTest.run()
  test.py: add credentials options to add_cql_connection_options()
  test.py: python: remove dups of cql and this_dc fixtures
  test.py: remove duplication of pytest CLI options
  test.py: remove unused CLI options
  test.py: remove `--omit-scylla-output` from pytest argparser
  test.py: set build_mode to "unknown" if no --mode argument
  test.py: create directory for test log in run_test()
2025-05-30 08:48:43 +03:00
Botond Dénes
7db956965e mutation/mutation_compactor: cache regular/shadowable max-purgable in separate members
Max purgeable has two possible values for each partition: one for
regular tombstones and one for shadowable ones. Yet currently a single
member is used to cache the max-purgeable value for the partition, so
whichever kind of tombstone is checked first, its max-purgeable will
become sticky and apply to the other kind of tombstones too. E.g. if the
first can_gc() check is for a regular tombstone, its max-purgeable will
apply to shadowable tombstones in the partition too, meaning they might
not be purged, even though they are purgeable, as the shadowable
max-purgeable is expected to be more lenient. The other way around is
worse, as it will result in regular tombstone being incorrectly purged,
permitted by the more lenient shadowable tombstone max-purgeable.
Fix this by caching the two possible values in two separate members.
A reproducer unit test is also added.

Fixes: scylladb/scylladb#23272

Closes scylladb/scylladb#24171
2025-05-29 22:52:08 +03:00
Szymon Malewski
18d237a393 alternator/executor: Added checks in batch_write_item
This patch adds checks validating 'BatchWriteItem' requests mostly to avoid ugly fallback message.
It changes request's behaviour in case of an empty array of WriteRequests - previously such an array was ignored and whole request might succeed, now it raises ValidationException, following the documentation and behaviour of DynamoDB.
Patch includes tests in test_manual_requests (`test_batch_write_item_invalid_payload`, `test_batch_write_item_empty_request_list`) testing with several offending cases.

Fixes #23233

Closes scylladb/scylladb#23878
2025-05-29 20:33:57 +03:00
Robert Bindar
c570941692 Add nodetool refresh --scope option
This change adds the --scope option to nodetool refresh.
Like in the case of nodetool restore, you can pass either of:
* node - On the local node.
* rack - On the local rack.
* dc - In the datacenter (DC) where the local node lives.
* all (default) - Everywhere across the cluster.
as scope.

The feature is based on the existing load_and_stream paths, so it
requires passing --load-and-stream to the refresh command.
Also, it is not compatible with the --primary-replica-only option.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#23861
2025-05-29 16:12:09 +03:00
Evgeniy Naydanov
0ee0e3f14d test.py: python: run tests using bare pytest command
Add the `host` fixture which uses `PythonTest.run_ctx()` context manager
to setup and teardown ScyllaDB node if `--test-py-init` argument is used.
Otherwise, this fixture returns a value of `--host` CLI argument.

Use dynamic scope provided by `testpy_test_fixture_scope()` function
instead of `session` to maintain compatibility with test.py and ./run
scripts.
2025-05-29 12:33:41 +00:00
Evgeniy Naydanov
b67048f3ee test.py: rework testpy_test fixture
Add utility `get_testpy_test()` function to `pylib.suite.base` which
combines all required steps to create an instance of `Test` class.

Remove redundant `testpy_testsuite` fixture.

Switch to use dynamic fixture scope controlled by `--test-py-init` CLI
argument to improve compatibility with test.py.  And because in test.py
mode the scope is `session`, also change default event loop scope to
`session`.

The fixture is None for test.py mode.

test.py runs tests file-by-file as separate pytest sessions, so, `session`
scope is effectively close to be the same as `module` (can be a difference
in the order.)  In case of running tests with bare pytest command, we need
to use `module` scope to maintain same behavior as test.py, since we run
all tests in one pytest session.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
b65cb517b8 test.py: alternator: convert get_valid_alternator_role() to fixture
Convert `get_valid_alternator_role()` to fixture to have more control
on the scope of the cache used.

Additionally, function `new_dynamodb_session()` was also converted to
a fixture, because it uses `get_valid_alternator_role()`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
1f94a9c052 test.py: python: split logic of PythonTest.run()
Split logic of `PythonTest.run()` method into `PythonTest.run_ctx()`
context manager and `PythonTest.run()` method itself.

Done this to reuse setup/teardown code with bare pytest command runs.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
27cbfc77fb test.py: add credentials options to add_cql_connection_options()
Move `--auth_username` and `--auth_password` options from
`cluster/conftest.py` to add_cql_connection_options() and slightly
rework `cql` fixture to support these options.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
2bba4acdea test.py: python: remove dups of cql and this_dc fixtures
Replace dups of `cql` and `this_dc` fixtures in `rest_api` and
`pylib/cql_repl` with imports from `cqlpy`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
6780461df8 test.py: remove duplication of pytest CLI options
Add 3 supplementary functions to `test.pylib.suite.python`:
`add_host_option()` (which adds `--host` options to pytest session),
`add_cql_connection_options()` (which adds `--port`, and `--ssl`),
and `--add-s3-options` (which adds options related to S3 connection.)
Each function decorated with `@cache` decorator to be executed once per
pytest session and avoid CLI options duplication for runs which
executes `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables`
in one pytest session.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
056c5db829 test.py: remove unused CLI options
Remove `--input`, `--output`, and `--keep-tmp` pytest CLI opionts
from `cluster/object_store/conftest.py` because they are not used
in these suite.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
b7b68355ef test.py: remove --omit-scylla-output from pytest argparser
Remove `--omit-scylla-output` CLI option from pytest argparser.
Instead, remove it from `sys.argv` in `cqlpy/run.py`.  Also, no need
to check this option in `alternator/run`.
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
f262d4c323 test.py: set build_mode to "unknown" if no --mode argument
Change `build_mode` fixture to return "unknown" if no --mode arguments
provided (this is mainly for alternator and cqlpy tests)
2025-05-29 12:15:28 +00:00
Evgeniy Naydanov
30d542b8f1 test.py: create directory for test log in run_test()
Create a parent directory for a test log file just before opening this
file in `run_test()` function instead of having this as a side effect
in `Test.__init__()`.
2025-05-29 12:15:28 +00:00
Piotr Dulikowski
c8d52a4318 Merge 'test.py: dtest: port bypass_cache_test.py' from Evgeniy Naydanov
Copy bypass_cache_test.py from scylla-dtest test suite and make it works with test.py

As a part of the porting process, copy missed utility functions from scylla-dtest, remove unused imports and markers, and add missed `single_node` marker description to pytest.ini

Enable the test in suite.yaml (run in dev mode only.)

Also add missed `ScyllaCluster.nodetool()` method in dtest shim code.

Closes scylladb/scylladb#24230

* github.com:scylladb/scylladb:
  test.py: dtest: make bypass_cache_test.py run using test.py
  test.py: dtest: add missed ScyllaCluster.nodetool()
  test.py: dtest: copy unmodified bypass_cache_test.py
2025-05-29 13:48:10 +02:00
Avi Kivity
c00824c7df Merge 'transport: Implement SCYLLA_USE_METADATA_ID support' from Andrzej Jackowski
Metadata id was introduced in CQLv5 to make metadata of prepared
statement metadata consistent between driver and database.
This commit introduces a protocol extension that allows to use the same
mechanism in CQLv4. As CQLv5 is currently unsupported in ScyllaDb (as well
as in some of the drivers), the motivation is to allow fixing https://github.com/scylladb/scylladb/issues/20860.

This change:
     - Implement metadata::calculate_metadata_id()
     - Implement SCYLLA_USE_METADATA_ID protocol extension for CQLv4
     - Added description of SCYLLA_USE_METADATA_ID in documentation
     - Add boost tests to confirm correctness of the function
     - Add python tests for table metadata change corner-cases

Fixes scylladb/scylladb#20860

Also see related https://scylladb.atlassian.net/wiki/spaces/RND/pages/42238631/MetadataId+extension+in+CQLv4+Requirement+Document

No backport needed (unless specifically requested by a customer), because there are existing workarounds for the issue

Closes scylladb/scylladb#23292

* github.com:scylladb/scylladb:
  test: add tests for prepared statement metadata consistency corner cases
  transport: implement SCYLLA_USE_METADATA_ID support
  cql3: implement metadata::calculate_metadata_id()
2025-05-29 12:27:31 +03:00
Andrei Chekun
0c5676ffb4 test.py: fix metrics DB location
This was already fixed, but unintentionally during rebases it was
reverted and merged to master in the same PR.
2025-05-28 20:13:38 +02:00
Andrei Chekun
6e92791538 test.py: fix the possibility to gather resource metrics for test
Move of the run_process done in #24091 was not fully correct. The method
run_process was not overridden in the class ResourceGatherOn, so no
metrics are collected at all.
2025-05-28 20:13:31 +02:00
Nadav Har'El
61581d458e Merge 'vector_index: add custom index class from Michał Hudobski
This PR adds a class that allows for validation (and in the future creating and querying) of custom indexes and implements it for vector indexes. Currently custom vector_index creation runs a usual index creation process. This PR does not change that, however it adds validation of the parameters that need to have certain values for the actual creation of the vector index in the future. The only thing left for the vector_index feature to work as intended should be the integration with the Vector Store service.

This is a continuation of https://github.com/scylladb/scylladb/pull/23720
Refs:  [VS-55
](https://scylladb.atlassian.net/browse/VS-55) (Support setting index parametrs and similarity function in CREATE INDEX)
Fixes: [VS-13](https://scylladb.atlassian.net/browse/VS-13) (Validate that the base type is numeric when creating the vector index)

[VS-13]: https://scylladb.atlassian.net/browse/VS-13?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#24212

* github.com:scylladb/scylladb:
  test/cqlpy: remove xfail and add more vector tests
  vector_index: allow options when custom class is provided
  vector_index: add custom index and vector index classes
2025-05-28 10:42:29 +03:00
Raphael S. Carvalho
53df911145 replica: Fix range reads spanning sibling tablets
We don't guarantee that coordinators will only emit range reads that
span only one tablet.

Consider this scenario:

1) split is about to be finalized, barrier is executed, completes.
2) coordinator starts a read, uses pre-split erm (split not committed to group0 yet)
3) split is committed to group0, all replicas switch storage.
4) replica-side read is executed, uses a range which spans tablets.

We could fix it with two-phase split execution. Rather than pushing the
complexity to higher levels, let's fix incremental selector which should
be able to serve all the tokens owned by a given shard. During split
execution, either of sibling tablets aren't going anywhere since it
runs with state machine locked, so a single read spanning both
sibling tablets works as long as the selector works across tablet
boundaries.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-05-27 22:39:40 -03:00
Michał Hudobski
195e6a82de test/cqlpy: remove xfail and add more vector tests
We have added validation for options and type of column for vector
indexes. This commit adds tests for that validation.
2025-05-27 21:04:50 +02:00
Dawid Mędrek
c60035cbf6 test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default
We've adjusted all of the Boost tests so they respect the invariant
enforced by the `rf_rack_valid_keyspaces` configuration option, or
explicitly disabled the option in those that turned out to be more
problematic and will require more attention. Thanks to that, we can
now enable it by default in the test suite.
2025-05-27 18:53:39 +02:00
Dawid Mędrek
237638f4d3 test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests
Some of the tests in the file verify more subtle parts of the behavior
of tablets and rely on topology layouts or using keyspaces that violate
the invariant the `rf_rack_valid_keyspaces` configuration option is
trying to enforce. Because of that, we explicitly disable the option
to be able to enable it by default in the rest of the test suite in
the following commit.
2025-05-27 18:53:36 +02:00
Dawid Mędrek
22d6c7e702 test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load 2025-05-27 16:01:14 +02:00
Dawid Mędrek
fa62f68a57 test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity
We make sure that the keyspaces created in the test are always RF-rack-valid.
To achieve that, we change how the test is performed.

Before this commit, we first created a cluster and then ran the actual test
logic multiple times. Each of those test cases created a keyspace with a random
replication factor.

That cannot work with `rf_rack_valid_keyspaces` set to true. We cannot modify
the property file of a node (see commit: eb5b52f598),
so once we set up the cluster, we cannot adjust its layout to work with another
replication factor.

To solve that issue, we also recreate the cluster in each test case. Now we choose
the replication factor at random, create a cluster distributing nodes across as many
racks as RF, and perform the rest of the logic. We perform it multiple times in
a loop so that the test behaves as before these changes.
2025-05-27 15:52:38 +02:00
Dawid Mędrek
cd615c3ef7 test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity
We distribute the nodes used in the test across two racks so we can
run the test with `rf_rack_valid_keyspaces` set to true.

We want to avoid cross-rack migrations and keep the test as realistic
as possible. Since host3 is supposed to function as a new node in the
cluster, we change the layout of it: now, host1 has 2 shards and resides
in a separate rack. Most of the remaining test logic is preserved and behaves
as before this commit.

There is a slight difference in the tablet migrations. Before the commit,
we were migrating a tablet between nodes of different shard counts. Now
it's impossible because it would force us to migrate tablets between racks.
However, since the test wants to simply verify that an ongoing migration
doesn't interfere with load balancing and still leads to a perfect balance,
that still happens: we explicitly migrate ONLY 1 tablet from host2 to host3,
so to achieve the goal, one more tablet needs to be migrated, and we test
that.
2025-05-27 15:41:27 +02:00
Ferenc Szili
1f9f724441 test: add reproducer and test for mutation source refresh after merge
This change adds a reproducer and test for the fix where the local mutation
source is not always refreshed after a tablet merge.
2025-05-27 15:18:36 +02:00
Wojciech Mitros
5074daf1b7 test: actually wait for tablets to distribute across nodes
In test_tablet_mv_replica_pairing_during_replace, after we create
the tables, we want to wait for their tablets to distribute evenly
across nodes and we have a wait_for for that.
But we don't await this wait_for, so it's a no-op. This patch fixes
it by adding the missing await.

Refs scylladb/scylladb#23982
Refs scylladb/scylladb#23997

Closes scylladb/scylladb#24250
2025-05-27 15:12:25 +02:00
Botond Dénes
542b2ed0de Merge 'Remove req_params facility from API' from Pavel Emelyanov
The class was introduced to facilitate path and query parameters parsing from requests, but in fact it's mostly dead code.

First, the class introduces the concept of  "mandatory" parameters which are seastar path params. If missing, the parameter validation throws, but in all cases where this option is used in scylla it's impossible to get empty path param -- if the parameter is missing seastar returns 404 (not found) before calling handler.

Second, the req_params::get<T>() doesn't work for anything but string argument (or types such that optional<T> can be implicitly casted to optional<sstring>). And it's in fact only used to get sstrings, so it compiles and works so far.

The remaining ability to parse bool from string is partially duplicated by the validate_bool() method. Using plain method to parse string to bool is less code than req_params introduce.

One (arguably) useful thing req_params do it validate the incoming request _not_ to contain unknown query parameters. However, quite a few endpoints use this, most of them just cherry-pick parameters they want and ignore the others. There's already a comprehensive description of accepted parameters for each endpoint in api-doc/ and req_params duplicate it. Good validation code should rely on api-doc/, not on its partial copy.

Having said that, this PR introduces validate_bool_x() helper to do req_params-like parsing of strings to bools, patches existing handlers to use existing parameters parsing facilities (such as validate_keyspace() and parse_table_infos()) and drops the req_params.

Closes scylladb/scylladb#24159

* github.com:scylladb/scylladb:
  api: Drop class req_params
  api: Stop using req_params in parse_scrub_options
  api: Stop using req_params in tasks::force_keyspace_compaction_async
  api: Stop using req_params in ss::force_keyspace_compaction
  api: Stop using req_params in ss::force_compaction
  api: Stop using req_params in cf::force_major_compaction
  api: Add validate_bool_x() helper
2025-05-27 14:29:05 +03:00
Ernest Zaslavsky
7d0d3ec1c8 load_and_stream: Add abortion flow to mutation streaming
* The new abort command explicitly represents the abortion flow in
mutation streaming, clearly identifying operations that are
intentionally aborted. This reduces ambiguity around failures in
streaming operations.
* In the error-handling section, aborted operations are now
explicitly marked as the cause of the streaming failure. This allows
us to differentiate them from genuine errors and appropriately adjust
log severity to reduce unnecessary alarm caused by aborted streaming
failures.
* To avoid alarming users with excessive error logs, log severity for
streaming failures caused by aborted operations has been downgraded.
This helps keep logs cleaner and prevents unnecessary concerns.
* A new feature has been added to ensure mixed clusters during updates
do not receive unsupported RPC messages, improving compatibility and
stability.

fixes: https://github.com/scylladb/scylladb/issues/23076

Closes scylladb/scylladb#23214
2025-05-27 14:21:58 +03:00
Dawid Mędrek
1199c68bac test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity
We assign the nodes created by the test to separate racks. It has no impact
on the test since the keyspace used in the test uses RF=2, so the tablet
replicas will still be the same.
2025-05-27 13:18:11 +02:00
Dawid Mędrek
e4e3b9c3a1 test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity
We distribute the nodes used in the test between two racks. Although
that may affect how tablets behave in general, this change will not
have any real impact on the test. The test verifies that load balancing
eventually balances tablets in the cluster, which will still happen.
Because of that, the changes in this commit are safe to apply.
2025-05-27 13:18:09 +02:00
Dawid Mędrek
6e2fb79152 test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity
We distribute the nodes used in the test between two racks. Although that
may have an impact on how tablets behave, it's orthogonal to what the test
verifies -- whether the topology coordinator is continuously in the tablet
migration track. Because of that, it's safe to make this change without
influencing the test.
2025-05-27 13:18:07 +02:00
Botond Dénes
485df63fd5 Merge 'Extend compaction_history table with additional compaction statistics' from Łukasz Paszkowski
Currently, the `system.compaction_history` table miss information like the type of compaction (cleanup, major, resharding, etc), the sstable generations involved (in and out), shard's id the compaction was triggered on and statistics on purged tombstones to be collected during compaction.

The series extends the table with the following columns:

-  "compaction_type" (text)
- "shard_id" (int)
- "sstables_in" (list<sstableinfo_type>)
- "sstables_out" (list<sstableinfo_type>)
- "total_tombstone_purge_attempt" (long)
- "total_tombstone_purge_failure_due_to_overlapping_with_memtable" (long)
- "total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable" (long)

with a user defined type `sstableinfo_type` that holds the information about sstable file

- generation (uuid)
- origin (text)
- size (long)

Additional statistics stored in the compaction_history have been incorporated in the API  `/compaction_manager/compaction_history` and the `nodetool compactionhistory` command.

No backport is required. It extends the existing compaction history output.

Fixes https://github.com/scylladb/scylladb/issues/3791

Closes scylladb/scylladb#21288

* github.com:scylladb/scylladb:
  nodetool: Refactor of compactionhistory_operation
  nodetool: Add more stats into compactionhistory output
  api/compaction_manager: Extend compaction_history api
  compaction: Collect tombstone purge stats during compaction
  compacting_reader: Extend to accept tombstone purge statistics
  mutation_compactor: Collect tombstone purge attempts
  compaction_garbage_collector: Extend return type of max_purgeable_fn
  compaction: Extend compaction_result to collect more information
  system_keyspace: Upgrade compaction_history table
  system_keyspace: Create UDT: sstableinfo_type
  system_keyspace: Extract compaction_history struct
  system_keyspace: Squeeze update_compaction_history parameters
  compaction/compaction_manager: update_history accepts compaction_result as rvalue
2025-05-27 14:12:13 +03:00
Andrzej Jackowski
555d897a15 test: wait for normal state propagation in test_auth_v2_migration
By default, cluster tests have skip_wait_for_gossip_to_settle=0 and
ring_delay_ms=0. In tests with gossip topology, it may lead to a race,
where nodes see different state of each other.

In case of test_auth_v2_migration, there are three nodes. If the first
node already knows that the third node is NORMAL, and the second node
does not, the system_auth tables can return incomplete results.

To avoid such a race, this commit adds a check that all nodes see other
nodes as NORMAL before any writes are done.

Refs: #24163

Closes scylladb/scylladb#24185
2025-05-27 11:41:09 +03:00
Nadav Har'El
8487d81c6e Merge 'test: mark difference in handling IFs in LWT as scylla_only' from Andrzej Jackowski
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

In this series, we extend the documentation of the ScyllaDB behaviour
by extending the documents and improving relevant LWT tests.

Fixes: https://github.com/scylladb/scylladb/issues/13011

Backport not needed, only docs and minor tests changes.

Closes scylladb/scylladb#24086

* github.com:scylladb/scylladb:
  test: mark difference in handling IFs in LWT as scylla_only
  docs: cql: add explicit explanation how mixing IFs works in LWT
  docs: lwt: add two missing spaces
2025-05-27 09:35:41 +03:00
Evgeniy Naydanov
efdb2abdc6 test.py: dtest: make bypass_cache_test.py run using test.py
As a part of the porting process, copy missed utility functions from scylla-dtest,
remove unused imports and markers, and add single_node marker description to pytest.ini

Enable the test in suite.yaml (run in dev mode only)
2025-05-27 05:48:26 +00:00
Evgeniy Naydanov
3a2410324c test.py: dtest: add missed ScyllaCluster.nodetool()
The method executes nodetool command on each running node in a cluster.
2025-05-27 05:48:26 +00:00
Evgeniy Naydanov
6105bb9530 test.py: dtest: copy unmodified bypass_cache_test.py
Test is disabled in suite.yaml
2025-05-27 05:48:26 +00:00
Andrzej Jackowski
7dc0c4cf4f test: close logfile/socket_dir for stopped servers in recycle_cluster
PythonTestSuite::recycle_cluster is a function that releases resources
of an old, dirty cluster to make it reusable. It closes log_file and
maintenance_socket_dir for running nodes in a dirty cluster, however it
doesn't do the same for stopped nodes. It leads to leakage of file
descriptors of stopped nodes, which in turn can lead to hitting ulimit
of open files (that is often 1024) if the leaking test is repeated with
`./test.py --repeat ...`. The problem was detected when tests from
`test/cluster/dtest/` directory were executed with high `repeat` value.

This commit extends `recycle_cluster` to close and cleanup logfile and
`socket_dir` for nodes that are stopped (because self.servers in
ScyllaCluster is ChainMap of self.running and self.stopped).

Closes scylladb/scylladb#24243
2025-05-27 08:37:43 +03:00
Andrzej Jackowski
ea6ef5d0aa test: mark difference in handling IFs in LWT as scylla_only
There is a difference how ScyllaDB and Cassandra handle conditional
batches with different IF statements (such as "IF EXISTS" and "IF NOT
EXISTS"). Cassandra tries to detect condition conflicts, and prints
an error instead of silently failing the batch, but in ScyllaDB
we considered this check to be inconsistent and unhelpful, and
decided not to implement it.

This commit:
 - Make test_lwt_with_batch_conflict_1 scylla_only instead of xfail,
   change the scenario to pass with the current implementation.
 - Add test_lwt_with_batch_conflict_3 that shows how Cassandra fails
   batch statement with different conditions, even when the conditions
   are not contradictory.
 - Add test_lwt_with_batch_conflict_4/5 that shows how static rows
   are handled in conditional batches.

Fixes: #13011
2025-05-26 15:47:11 +02:00
Piotr Dulikowski
4508823294 Merge 'test.py: dtest: few fixes missed in the initial implementation' from Evgeniy Naydanov
There are few problems found in the dtest shim code after scylladb/scylladb#21580 was merged:

- The call of `init_default_config()` method was missed in scylladb/scylladb#21580.  It is required to handle dtest options and markers.
- The implementation of dtest shim uses `server_id` to format a name of a node in a cluster. This is a difference in behavior with dtest. Some of dtests use code like `cluster.nodes()["node1"]` to get access to a node object.
- Default timeout was missed in `ScyllaNode.wait_until_stopped()` method. Set it to 600 for debug mode or to 127 otherwise.

Closes scylladb/scylladb#24225

* github.com:scylladb/scylladb:
  test.py: dtest: set default wait_seconds based on build mode
  test.py: dtest: name nodes in cluster using index starting from 1
  test.py: dtest: initialize default config in dtest setup fixture
2025-05-26 13:37:12 +02:00