Commit Graph

9398 Commits

Author SHA1 Message Date
Dario Mirovic
84e6979adf test/cqlpy: test_protocol_exceptions.py refactor message frame building
Frame building is repetitive and increases verbosity, reducing code readability.
This patch solves it by extracting common functionality of frame building into
`_build_frame`. Also, helpers `_send_frame` and `_recv_frame` are introduced.
While `_recv_frame` is not really useful, it goes well in pair with `_send_frame`.

Refs: #24567
2025-08-31 23:40:01 +02:00
Dario Mirovic
19c610d9f7 test/cqlpy: test_protocol_exceptions.py refactor duplicate code
The code that measures errors and exceptions in `test_protocol_exceptions.py` tests
is repetitive. This patch refactors common functionality in a separate `_test_impl`
function, improving readability.

Refs: #24567
2025-08-31 23:39:58 +02:00
Dario Mirovic
8120709231 transport: replace make_frame throw with return result
`cql_transport::response::make_frame` used to throw `protocol_exception`.
With this change it will return `result_with_exception_ptr<sstring>` instead.
Code changes are propagated to `cql_transport::cql_server::response::make_message`
and from there to `cql_transport::cql_server::connection::write_response`.

`write_response` continuation calling `make_message` used to transform
the exception from `make_message` to an exception future, and now the
logic stays the same, just explicitly stated at this code layer,
so the behavior is not changed.

Refs: #24567
2025-08-28 23:33:33 +02:00
Dario Mirovic
51995af258 transport: replace throwing protocol_exception with returns
Replace throwing `protocol_exception` with returning it as a result
or an exceptional future in the transport server module. The goal is
to improve performance.

Most of the `protocol_exception` throws were made from
`fragmented_temporary_buffer` module, by passing `exception_thrower()`
to its `read*` methods. `fragmented_temporary_buffer` is changed so
that it now accepts an exception creator, not exception thrower.
`fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced
`fragmented_temporary_buffer_concepts::ExceptionThrower` and all
methods that have been throwing now return failed result of type
`utils::result_with_exception_ptr`. This change is then propagated to the callers.

The scope of this patch is `protocol_exception`, so commitlog just calls
`.value()` method on the result. If the result failed, that will throw the
exception from the result, as defined by `utils::result_with_exception_ptr_throw_policy`.
This means that the behavior of commitlog module stays the same.

transport server module handles results gracefully. All the caller functions
that return non-future value `T` now return `utils::result_with_exception_ptr<T>`.
When the caller is a function that returns a future, and it receives
failed result, `make_exception_future(std::move(failed_result).value())`
is returned. The rest of the callstack up to the transport server `handle_error`
function is already working without throwing, and that's how zero throws is
achieved.

Fixes: #24567
2025-08-28 23:31:36 +02:00
Dario Mirovic
8b0a551177 test/cqlpy: add unknown compression algorithm test case
Add `test_unknown_compression_algorithm` test case to `test_protocol_exceptions.py` test suite.
This change improves test coverage for zero throws protocol exception handling.

Refs: #24567
2025-08-25 13:31:40 +02:00
Nadav Har'El
87dd96f9a2 Merge ' Alternator: DynamoDB compatible WCU Calculation via Read-Before-Write Support' from Amnon Heiman
This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism.

Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior.

To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences.

In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem.

This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary.
**New feature, no need to backport**

Closes scylladb/scylladb#24436

* github.com:scylladb/scylladb:
  alternator/test_returnconsumedcapacity.py: Test forced read before write
  alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write
  executor.cc: get_previous_item with consistency level
  executor: Extend API of put_or_delete_item
  alternator/executor.cc: Accurate WCU for put, update, delete
  config: add alternator_force_read_before_write
2025-08-24 11:38:24 +03:00
Avi Kivity
8815491085 treewide: include boost headers as "system" headers
Boost is external to the project so treat its headers as "system"
headers and include them with angle brackets.

Closes scylladb/scylladb#25619
2025-08-22 17:21:24 +03:00
Piotr Dulikowski
5709d94826 Merge 'cql3: Warn when creating RF-rack-invalid keyspace' from Dawid Mędrek
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

If the option is turned off, we will also log all of the
RF-rack-invalid keyspaces at start-up.

We provide validation tests.

Fixes scylladb/scylladb#23330

Backport: we'd like to encourage the user to abide by the restriction
even when they don't enforce it to make it easier in the future to
adjust the schema when there's no way to disable it anymore. Because
of that, we'd like to backport it to all relevant versions, starting with 2025.1.

Closes scylladb/scylladb#24785

* github.com:scylladb/scylladb:
  main: Log RF-rack-invalid keyspaces at startup
  cql3/statements: Fix indentation
  cql3: Warn when creating RF-rack-invalid keyspace
2025-08-22 11:33:32 +02:00
Evgeniy Naydanov
ab15c94a09 test.py: dtest/commitlog_test: add test_pinned_cl_segment_doesnt_resurrect_data
test_pinned_cl_segment_doesnt_resurrect_data was not moved in #24946 from
scylla-dtest to this repo, because it's marked as xfail (#14879), but
actually the issue is fixed and there is no reason to keep the test in
scylla-dtest.

Also remove unused imports.

Closes scylladb/scylladb#25592
2025-08-22 11:30:10 +03:00
Raphael S. Carvalho
149f9d8448 replica: Fix race between drop table and merge completion handling
Consider this:
1) merge finishes, wakes up fiber to merge compaction groups
2) drop table happens, which in turn invokes truncate underneath
3) merge fiber stops old groups
4) truncate disables compaction on all groups, but the ones stopped
5) truncate performs a check that compaction has been disabled on
all groups, including the ones stopped
6) the check fails because groups being stopped didn't have compaction
explicitly disabled on them

To fix it, the check on step 6 will ignore groups that have been
stopped, since those are not eligible for having compaction explicitly
disabled on them. The compaction check is there, so ongoing compaction
will not propagate data being truncated, but here it happens in the
context of drop table which doesn't leave anything behind. Also, a
group stopped is somewhat equivalent to compaction disabled on it,
since the procedure to stop a group stops all ongoing compaction
and eventually removes its state from compaction manager.

Fixes #25551.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#25563
2025-08-22 10:19:43 +03:00
Botond Dénes
3dcb596201 Merge 'test: properly unset recovery_leader in the recovery procedure tests' from Patryk Jędrzejczak
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.

Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.

Closes scylladb/scylladb#25365

* github.com:scylladb/scylladb:
  test: properly unset recovery_leader in the recovery procedure tests
  test: manager_client: allow removing a config option
  test: manager_client: add docstring to server_update_config
2025-08-22 10:09:37 +03:00
Patryk Jędrzejczak
193a74576a test/cluster/conftest: cluster_con: provide default values for port and use_ssl
Some cluster tests use `cluster_con` when they need a different load
balancing policy or auth provider. However, no test uses a port
other than 9042 or enables SSL, but all tests must pass `9042, False`
because these parameters don't have default values. This makes the code
more verbose. Also, it's quite obvious that 9042 stands for port, but
it's not obvious what `False` is related to, so there is a need to check
the definition of `cluster_con` while reading any test that uses it.

No reason to backport, it's only a minor refactoring.

Closes scylladb/scylladb#25516
2025-08-22 09:51:24 +03:00
Andrzej Jackowski
86fc513bd9 auth: allow dropping roles in saslauthd_authenticator
Before this change, `saslauthd_authenticator` prevented dropping
roles. The current documentation instructs users to `Ensure Scylla has
the same users and roles as listed in the LDAP directory`. Therefore,
ScyllaDB should allow dropping roles so administrators can remove
obsolete roles from both LDAP and ScyllaDB.

The code change is minimal — dropping a role is a no-op, similar to the
existing no-op implementations for successful `create` and `alter`
operations.

`saslauthd_authenticator_test` is updated to verify that dropping
a role doesn't throw anymore.

Fixes: scylladb/scylladb#25571

Closes scylladb/scylladb#25574
2025-08-22 09:40:44 +03:00
Dawid Mędrek
837d267cbf main: Log RF-rack-invalid keyspaces at startup
When the configuration option `rf_rack_valid_keyspaces` is enabled and there
is an RF-rack-invalid keyspace, starting a node fails. However, when the
configuration option is disabled, but there still is a keyspace that violates
the condition, we'd like Scylla to print a warning informing the user about
the fact. That's what happens in this commit.

We provide a validation test.
2025-08-21 19:35:33 +02:00
Dawid Mędrek
60ea22d887 cql3: Warn when creating RF-rack-invalid keyspace
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

We provide a validation test.
2025-08-21 19:29:33 +02:00
Evgeniy Naydanov
3a98331731 test.py: don't fail if use multiple tests from one dir in commandline
There is the stash item REPEATED_FILES for directory items which used to cut
recursion.  But if multiple tests from one directory added to ./test.py
commandline this solution prevents handling non-first tests well because
it was already collected for the first one.  Change behavior to not store
all repeated files in the stash but just files which are in the process
of repetition.  Rename the stash item to REPEATING_FILES to reflect this
change.

Closes scylladb/scylladb#25611
2025-08-21 19:43:13 +03:00
Botond Dénes
09dc285b4a Merge 'Remove redis from scylla source tree' from Ran Regev
- **remove redis documentation**
First, remove the redis documentation.

- **remove ./redis and dependencies**
Second, remove the redis directory and its dependencies from the project.

Fixes: #25144

This is a cleanup, no need to backport.

Closes scylladb/scylladb#25148

* github.com:scylladb/scylladb:
  remove ./redis and dependencies
  remove redis documentation
2025-08-21 14:26:11 +03:00
Pavel Emelyanov
47750496d2 Merge 'test.py: metrics: add host_id suffix to .db file' from Evgeniy Naydanov
CI can run several test.py sessions on different machines (builders) for one build and, and to be not overwritten, .db file with metrics need to have some unique name: add host_id as we already do for .xml report in `run_pytest()`

Also add host_id columns to metric tables in case we will somehow aggregate .db files.

Add host_id suffix to `toxiproxy_server.log` for the same reason.

Fixes: https://github.com/scylladb/scylladb/issues/25462

Closes scylladb/scylladb#25542

* github.com:scylladb/scylladb:
  test.py: add host_id suffix to toxiproxy_server.log
  test.py: metrics: add host_id suffix to .db file
2025-08-21 11:34:47 +03:00
Robert Bindar
3291a5cc75 Fix dbuild boost::gregorian usage error
On my dbuild runs, compiler complained about
no member "gregorian" in namespace boost in the
user_function_test.cc file. Was also noticed in CI.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#25593
2025-08-21 11:32:47 +03:00
Ran Regev
ebf1db5c5e remove ./redis and dependencies
Remove ./redis and all its usages.
This is the second commit that removes
./redis from Scylla

Signed-off-by: Ran Regev <ran.regev@scylladb.com>
2025-08-20 17:53:23 +03:00
Avi Kivity
eefb6a0642 Merge 'storage_proxy: node_local_only: always use my_host_id' from Petr Gusev
The previous implementation did not handle topology changes well:
* In `node_local_only` mode with CL=1, if the current node is pending, the CL is increased to 2, causing
`unavailable_exception`.
* If the current tablet is in `write_both_read_old` and we try to read with `node_local_only` on the new node, the replica list will be empty.

This patch changes `node_local_only` mode to always use `my_host_id` as the replica list. An explicit check ensures the current node is a replica for the operation; otherwise `on_internal_error` is called.

backport: not needed, since `node_local_only` is only used in LWT for tablets and it hasn't been released yet.

Closes scylladb/scylladb#25508

* github.com:scylladb/scylladb:
  test_tablets_lwt: add test_lwt_during_migration
  storage_proxy: node_local_only: always use my_host_id
2025-08-20 12:11:44 +03:00
Botond Dénes
d20304fdf8 Merge 'test.py: dtest: port next_gating tests from commitlog_test.py' from Evgeniy Naydanov
Copy `commitlog_test.py` from scylla-dtest test suite and make it works with `test.py`

As a part of the porting process, remove unused imports and markers, remove non-next_gating tests and tests marked with `skip`, 'skip_if', and `xfail` markers.

test.py uses `commitlog` directory instead of dtest's `commitlogs`.

Also, add `commitlog_segment_size_in_mb: 32` option to test_stop_failure_policy to make _provoke_commitlog_failure
work.

Tests `test_total_space_limit_of_commitlog_with_large_limit` and `test_total_space_limit_of_commitlog_with_medium_limit` use too much disk space and have too big execution time.  Keep them in scylla-dtest for now.

Enable the test in `suite.yaml` (run in dev mode only.)

Additional modifications to test.py/dtest shim code:
- add ScyllaCluster.flush() method
- add ScyllaNode.stress() method
-  add tools/files.py::corrupt_file() function
- add tools/data.py::run_query_with_data_processing() function
- copy some assertions from dtest

Also add missed mode restriction for auth_test.py file.

Closes scylladb/scylladb#24946

* github.com:scylladb/scylladb:
  test.py: dtest: remove slow and greedy tests from commitlog_test.py
  test.py: dtest: make commitlog_test.py run using test.py
  test.py: dtest: add ScyllaCluster.flush() method
  test.py: dtest: add ScyllaNode.stress() method
  test.py: dtest: add tools/data.py::run_query_with_data_processing() function
  test.py: dtest: add tools/files.py::corrupt_file() function
  test.py: dtest: copy some assertions from dtest
  test.py: dtest: copy unmodified commitlog_test.py
2025-08-19 17:25:07 +03:00
Petr Gusev
894c8081e6 test_tablets_lwt: add test_lwt_during_migration 2025-08-19 16:11:56 +02:00
Evgeniy Naydanov
47e4d470af test.py: add host_id suffix to toxiproxy_server.log 2025-08-19 11:33:47 +00:00
Evgeniy Naydanov
8ea49092b7 test.py: metrics: add host_id suffix to .db file
CI can run several test.py sessions on different machines (builders) for
one build and, and to be not overwritten, .db file with metrics need to
have some unique name: add host_id as we already do for .xml report in
run_pytest()

Also add host_id columns to metric tables in case we will somehow
aggregate .db files.
2025-08-19 11:33:11 +00:00
Botond Dénes
66db95c048 Merge 'Preserve PyKMIP logs from failed KMIP tests' from Nikos Dragazis
This PR extends the `tmpdir` class with an option to preserve the directory if the destructor is called during stack unwinding. It also uses this feature in KMIP tests, where the tmpdir contains PyKMIP server logs, which may be useful when diagnosing test failures.

Fixes #25339.

Not so important to be backported.

Closes scylladb/scylladb#25367

* github.com:scylladb/scylladb:
  encryption_at_rest_test: Preserve tmpdir from failing KMIP tests
  test/lib: Add option to preserve tmpdir on exception
2025-08-19 13:17:29 +03:00
Avi Kivity
611918056a Merge 'repair: Add tablet incremental repair support' from Asias He
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.

Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.

This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:

The repaired_at on disk field of a SSTable is used.
   - A 64-bit number increases sequentially

The sstables_repaired_at is added to the system.tablets table.
   - repaired_at <= sstables_repaired_at means the sstable is repaired

The being_repaired in memory field of a SSTable is added.
   - A repair UUID tells which sstable has participated in the repair

Initial test results:

    1) Medium dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~500GB
    Cluster pre-populated with ~500GB of data before starting repairs job.
    Results for Repair Timings:
    The regular repair run took 210 mins.
    Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
    The speedup is: 183 mins  / 48s = 228X

    2) Small dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~167GB
    Cluster pre-populated with ~167GB of data before starting the repairs job.
    Regular repair 1st run took 110s,  2nd and 3rd runs took 110s.
    Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
    The speedup is: 110s / 1.5s = 73X

    3) Large dataset results
    Node amount: 6
    Instance type: i4i.2xlarge, 3 racks
    50% of base load, 50% read/write
    Dataset == Sum of data on each node

    Dataset     Non-incremental repair (minutes)
    1.3 TiB     31:07
    3.5 TiB     25:10
    5.0 TiB     19:03
    6.3 TiB     31:42

    Dataset     Incremental repair (minutes)
    1.3 TiB     24:32
    3.0 TiB     13:06
    4.0 TiB     5:23
    4.8 TiB     7:14
    5.6 TiB     3:58
    6.3 TiB     7:33
    7.0 TiB     6:55

Fixes #22472

Closes scylladb/scylladb#24291

* github.com:scylladb/scylladb:
  replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
  compaction: Move compaction_reenabler to compaction_reenabler.hh
  topology_coordinator: Make rpc::remote_verb_error to warning level
  repair: Add metrics for sstable bytes read and skipped from sstables
  test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
  test.py: Add tests for tablet incremental repair
  repair: Add tablet incremental repair support
  compaction: Add tablet incremental repair support
  feature_service: Add TABLET_INCREMENTAL_REPAIR feature
  tablet_allocator: Add tablet_force_tablet_count_increase and decrease
  repair: Add incremental helpers
  sstable: Add being_repaired to sstable
  sstables: Add set_repaired_at to metadata_collector
  mutation_compactor: Introduce add operator to compaction_stats
  tablet: Add sstables_repaired_at to system.tablets table
  test: Fix drain api in task_manager_client.py
2025-08-19 13:13:22 +03:00
Dawid Mędrek
2227eb48bb test/cqlpy/test_cdc.py: Add validation test for re-attached log tables
When the user disables CDC on a table, the CDC log table is not removed.
Instead, it's detached from the base table, and it functions as a normal
table (with some differences). If that log table lives up to the point
when the user re-enabled CDC on the base table, instead of creating a new
log table, the old one is re-attached to the base.

For more context on that, see commit:
scylladb/scylladb@adda43edc7.

In this commit, we add validation tests that check whether the changes
on the base table after disabling CDC are reflected on the log table
after re-enabling CDC. The definition of the log table should be the same
as if CDC had never been disabled.

Closes scylladb/scylladb#25071
2025-08-19 10:15:41 +02:00
Botond Dénes
f8b79d563a Merge 's3: Minor refactoring and beautification of S3 client and tests' from Ernest Zaslavsky
This pull request introduces minor code refactoring and aesthetic improvements to the S3 client and its associated test suite. The changes focus on enhancing readability, consistency, and maintainability without altering any functional behavior.

No backport is required, as the modifications are purely cosmetic and do not impact functionality or compatibility.

Closes scylladb/scylladb#25490

* github.com:scylladb/scylladb:
  s3_client: relocate `req` creation closer to usage
  s3_client: reformat long logging lines for readability
  s3_test: extract file writing code to a function
2025-08-18 18:48:42 +03:00
Avi Kivity
96956e48c4 Merge 'utils: stall_free: detect clear_gently method of const payload types' from Benny Halevy
Currently, when a container or smart pointer holds a const payload
type, utils::clear_gently does not detect the object's clear_gently
method as the method is non-const and requires a mutable object,
as in the following example in class tablet_metadata:
```
    using tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>;
    using table_to_tablet_map = std::unordered_map<table_id, tablet_map_ptr>;
```

That said, when a container is cleared gently the elements it holds
are destroyed anyhow, so we'd like to allow to clear them gently before
destruction.

This change still doesn't allow directly calling utils::clear_gently
an const objects.

And respective unit tests.

Fixes #24605
Fixed #25026

* This is an optimization that is not strictly required to backport (as https://github.com/scylladb/scylladb/pull/24618 dealt with clear_gently of `tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>` well enough)

Closes scylladb/scylladb#24606

* github.com:scylladb/scylladb:
  utils: stall_free: detect clear_gently method of const payload types
  utils: stall_free: clear gently a foreign shared ptr only when use_count==1
2025-08-18 12:52:02 +03:00
Evgeniy Naydanov
ab1a093d94 test.py: dtest: remove slow and greedy tests from commitlog_test.py
Tests test_total_space_limit_of_commitlog_with_large_limit and
test_total_space_limit_of_commitlog_with_medium_limit use too much
disk space and have too big execution time.  Keep them in
scylla-dtest for now.
2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
647043d957 test.py: dtest: make commitlog_test.py run using test.py
As a part of the porting process, remove unused imports and
markers, remove non-next_gating tests and tests marked with
`skip`, 'skip_if', and `xfail` markers.

test.py uses `commitlog` directory instead of dtest's
`commitlogs`.

Remove test_stop_failure_policy test because the way how it
provoke commitlog failure (change file permission) doesn't
work on CI.

Enable the test in suite.yaml (run in dev mode only)
2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
5f6e083124 test.py: dtest: add ScyllaCluster.flush() method 2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
c378dc3fab test.py: dtest: add ScyllaNode.stress() method 2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
6f42019900 test.py: dtest: add tools/data.py::run_query_with_data_processing() function 2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
2c4f2de3b0 test.py: dtest: add tools/files.py::corrupt_file() function 2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
80b797e376 test.py: dtest: copy some assertions from dtest
Copy assertions required for commitlog_test.py:
  - assert_almost_equal
  - assert_row_count
  - assert_row_count_in_select_less
  - assert_lists_equal_ignoring_order
2025-08-18 09:42:13 +00:00
Evgeniy Naydanov
1a2d132456 test.py: dtest: copy unmodified commitlog_test.py 2025-08-18 09:42:13 +00:00
Pavel Emelyanov
4f55af9578 Merge 'test.py: pytest: support --mode/--repeat in a common way for all tests' from Evgeniy Naydanov
Implement repetition of files using `pytest_collect_file` hook: run file collection as many times as needed to cover all `--mode`/`--repeat` combinations.  Store build mode and run ID to the stash of repeated item.

Some additional changes done:

- Add `TestSuiteConfig` class to handle all operations with `test_config.yaml`
- Add support for `run_first` option in `test_config.yaml`
- Move disabled test logic to `pytest_collect_file` hook.

These changes allow to to remove custom logic for  `--mode`, `--repeat`, and disabled tests in the code for C++ tests and prepare for switching of Python/CQLApproval/Topology tests to pytest runner.

Also, this PR includes required refactoring changes and fixes:

- Simplify support of C++ tests: remove redundant facade abstraction and put all code into 3 files: `base.py`, `boost.py`, and `unit.py`
- Remove unused imports in `test.py`
- Use the constant for `"suite.yaml"` string
- Some test suites have own test runners based on pytest, and they don't need all stuff we use for `test.py`.  Move all code related to `test.py` framework to `test/pylib/runner.py` and use it as a plugin conditionally (by using `SCYLLA_TEST_RUNNER` env variable.)
- Add `cwd` parameter to `run_process()` methods in `resource_gather` module to avoid using of `os.chdir()` (and sort parameters in the same order as in `subprocess.Popen`.)
- `extra_scylla_cmdline_options` is a list of commandline arguments and, actually, each argument should be a separate item.  Few configuration files have `--reactor-backend` option added in the format which doesn't follow this rule.

This PR is a refactoring step for https://github.com/scylladb/scylladb/pull/25443

Closes scylladb/scylladb#25465

* github.com:scylladb/scylladb:
  test.py: pytest: support --mode/--repeat in a common way for all tests
  test.py: pytest: streamline suite configuration handling
  test.py: refactor: remove unused imports in test.py
  test.py: fix run with bare pytest after merge of scylladb/scylladb#24573
  test.py: refactor: move framework-related code to test.pylib.runner
  test.py: resource_gather: add cwd parameter to run_process()
  test.py: refactor: use proper format for extra_scylla_cmdline_options
2025-08-18 12:24:04 +03:00
Avi Kivity
e9928b31b8 Merge 'sstables/trie: add BTI key translation routines' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25396
Next part: implementing sstable index writers and readers on top of the abstract trie writers/readers.

The new code added in this PR isn't used outside of tests yet, but it's posted as a separate PR for reviewability.

This series provides translation routines for ring positions and clustering positions
from Scylla's native in-memory structures to BTI's byte-comparable encoding.

This translation is performed whenever a new decorated key or clustering block
are added to a BTI index, and whenever a BTI index is queried for a range of positions.

For a description of the encoding, see
fad1f74570/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md (multi-component-sequences-partition-or-clustering-keys-tuples-bounds-and-nulls)

The translation logic, with all the fragment awareness, lazy
evaluation and avoidable copies, is fairly bloated for the common cases
of simple and small keys. This is a potential optimization target for later.

No backports needed, new functionality.

Closes scylladb/scylladb#25506

* github.com:scylladb/scylladb:
  sstables/trie: add BTI key translation routines
  tests/lib: extract generate_all_strings to test/lib
  tests/lib: extract nondeterministic_choice_stack to test/lib
  sstables/trie/trie_traversal: extract comparable_bytes_iterator to its own file
  sstables/mx: move clustering_info from writer.cc to types.hh
  sstables/trie: allow `comparable_bytes_iterator` to return a mutable span
  dht/ring_position: add ring_position_view::weight()
2025-08-18 11:55:26 +03:00
Asias He
76316f44a7 repair: Add metrics for sstable bytes read and skipped from sstables
scylla_repair_inc_sst_skipped_bytes: Total number of bytes skipped from
sstables for incremental repair on this shard.

scylla_repair_inc_sst_read_bytes : Total number of bytes read from
sstables for incremental repair on this shard.
2025-08-18 11:01:22 +08:00
Asias He
b0364fcba3 test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
Disable incremental repair so that the second repair can still work on
the repaired data set.
2025-08-18 11:01:22 +08:00
Asias He
ad5275fd4c test.py: Add tests for tablet incremental repair
The following tests are added for tablet incremental repair:

- Basic incremental repair

- Basic incremental repair with error

- Minor compaction and incremental repair

- Major compaction and incremental repair

- Scrub compaction and incremental repair

- Cleanup/Upgrade compaction and incremental repair

- Tablet split and incremental repair

- Tablet merge and incremental repair
2025-08-18 11:01:21 +08:00
Asias He
f9021777d8 compaction: Add tablet incremental repair support
This patch addes incremental_repair support in compaction.

- The sstables are split into repaired and unrepaired set.

- Repaired and unrepaired set compact sperately.

- The repaired_at from sstable and sstables_repaired_at from
  system.tablets table are used to decide if a sstable is repaired or
  not.

- Different compactions tasks, e.g., minor, major, scrub, split, are
  serialized with tablet repair.
2025-08-18 11:01:21 +08:00
Evgeniy Naydanov
e44b26b809 test.py: pytest: support --mode/--repeat in a common way for all tests
Implement repetition of files using pytest_collect_file hook: run
file collection as many times as needed to cover all --mode/--repeat
combinations.  Also move disabled test logic to this hook.

Store build mode and run_id in pytest item stashes.

Simplify support of C++ tests: remove redundant facade abstraction and put
all code into 3 files: base.py, boost.py, and unit.py

Add support for `run_first` option in test_config.yaml
2025-08-17 15:26:23 +00:00
Evgeniy Naydanov
bffb6f3d01 test.py: pytest: streamline suite configuration handling
Move test_config.yaml handling code from common_cpp_conftest.py to
TestSuiteConfig class in test/pylib/runner.py
2025-08-17 12:32:36 +00:00
Evgeniy Naydanov
a2a59b18a3 test.py: refactor: remove unused imports in test.py
Also use the constant for "suite.yaml" string.
2025-08-17 12:32:36 +00:00
Evgeniy Naydanov
a188523448 test.py: fix run with bare pytest after merge of scylladb/scylladb#24573
To run tests with bare pytest command we need to have almost the
same set of options as test.py because we reuse code from test.py.

scylladb/scylladb#24573 added `--pytest-arg` option to test.py but
not to test/conftest.py which breaks running Python tests using
bare pytest command.
2025-08-17 12:32:35 +00:00
Evgeniy Naydanov
600d05471b test.py: refactor: move framework-related code to test.pylib.runner
Some test suites have own test runners based on pytest, and they
don't need all stuff we use for test.py.  Move all code related to
test.py framework to test/pylib/runner.py and use it as a plugin
conditionally (by using TEST_RUNNER variable.)
2025-08-17 12:32:35 +00:00
Evgeniy Naydanov
f2619d2bb0 test.py: resource_gather: add cwd parameter to run_process()
Also done sort arguments in Popen call to match the signature.
2025-08-17 12:32:35 +00:00