Commit Graph

49073 Commits

Author SHA1 Message Date
Avi Kivity
6dc2c42f8b alternator: streams: refactor std::views::transform with side effect
std::views::transform()s should not have side effects since they could be
called several times, depending on the algorithm they're paired with.
For example, std::ranges::to() can run the algorithm once to measure
the resulting container size, and then a second time to copy the data
(avoiding reallocations). If that happens, then the side-effect happens
twice.

Avoid this by refactoring the code. Make the side-effect -- appending
to the `column` vector -- happen first, then use that result to generate
the `regular_column` vector.

In this case, the side effect did not happen twice because small_vector's
std::from_range_t constructor only reserves if the input range is sized
(and it is not), but better not have the weakness in the code.
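
As a rough Python analogy (the actual change is C++; `LazyTransform`, `two_pass_collect` and the column names are hypothetical, invented for this sketch), a lazy transform with a side effect misbehaves when its consumer walks it twice, and doing the side effect first avoids that:

```python
class LazyTransform:
    """Re-applies func on every iteration, loosely like a lazy view."""

    def __init__(self, func, items):
        self._func = func
        self._items = items

    def __iter__(self):
        return (self._func(x) for x in self._items)


def two_pass_collect(view):
    """Hypothetical consumer: one pass to measure, a second pass to copy."""
    size = sum(1 for _ in view)  # first pass: count the elements
    out = list(view)             # second pass: copy the elements
    assert len(out) == size
    return out


names = ["a", "b"]

# Problematic shape: the transform appends to `columns` as a side effect.
columns = []

def record_and_upper(name):
    columns.append(name)
    return name.upper()

regular = two_pass_collect(LazyTransform(record_and_upper, names))
# `columns` now holds every name twice, because the view was walked twice.

# Refactored shape: do the side effect first, then derive from its result.
columns_fixed = list(names)
regular_fixed = two_pass_collect(LazyTransform(str.upper, columns_fixed))
```

With a single-pass consumer both shapes give the same result, which is why the weakness can stay hidden.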

Closes scylladb/scylladb#25011
2025-08-25 09:40:05 +03:00
Michael Litvak
25fb3b49fa dist/docker: add dc and rack arguments
Add --dc and --rack command-line arguments to the scylla docker image to
allow starting a node with specified dc and rack names in a simple
way.

This is useful mostly for small examples and demonstrations of starting
multiple nodes with different racks, when we prefer not to bother with
editing configuration files. The ability to assign nodes to different
racks is especially important with RF=Rack enforcing.

The previous method to achieve this is to set the snitch to
GossipingPropertyFileSnitch and provide a configuration file in
/etc/scylla/cassandra-rackdc.properties with the name of the dc and
rack.

The new dc and rack parameters are implemented similarly by using the
snitch GossipingPropertyFileSnitch and writing the dc and rack values to
the rackdc properties file. We don't support passing the parameters
together with a different snitch, or when mounting a properties file
from the host, because we don't want to overwrite it.

Example:
docker run -d --name scylla1 scylladb/scylla --dc my_dc1 --rack my_rack1
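
The behaviour described above can be sketched in Python as follows. This is an illustration only: `render_rackdc_properties` and its parameters are hypothetical names, not the image's actual entrypoint code, and the `dc=`/`rack=` keys are assumed to follow the usual cassandra-rackdc.properties layout.

```python
REQUIRED_SNITCH = "GossipingPropertyFileSnitch"


def render_rackdc_properties(dc, rack, snitch=REQUIRED_SNITCH,
                             properties_mounted=False):
    """Return the text to write to cassandra-rackdc.properties.

    Raises ValueError for the combinations the commit says are unsupported.
    """
    if snitch != REQUIRED_SNITCH:
        raise ValueError("--dc/--rack require " + REQUIRED_SNITCH)
    if properties_mounted:
        # Never overwrite a properties file mounted from the host.
        raise ValueError("refusing to overwrite a mounted properties file")
    return f"dc={dc}\nrack={rack}\n"


print(render_rackdc_properties("my_dc1", "my_rack1"), end="")
```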

Fixes scylladb/scylladb#23423

Closes scylladb/scylladb#25607
2025-08-24 17:48:07 +03:00
Nadav Har'El
87dd96f9a2 Merge ' Alternator: DynamoDB compatible WCU Calculation via Read-Before-Write Support' from Amnon Heiman
This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism.

Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior.

To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences.

In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem.
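
A minimal sketch of the "larger of the old and new items" rule, assuming DynamoDB's documented 1 KB write unit with sizes rounded up, and assuming a minimum charge of one unit. `wcu_for_write` is a hypothetical helper, not Alternator's actual code:

```python
import math

WCU_BLOCK_BYTES = 1024  # assumption: one WCU covers up to 1 KB


def wcu_for_write(new_item_bytes, old_item_bytes=0):
    """WCU for a standard write when the old item size is known."""
    billed_bytes = max(new_item_bytes, old_item_bytes)
    # Assumption: even an empty write is charged one unit.
    return max(1, math.ceil(billed_bytes / WCU_BLOCK_BYTES))
```

Without the read-before-write step the old item size is unknown, so only `new_item_bytes` could be taken into account; that is the under-reporting the series addresses.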

This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary.
**New feature, no need to backport**

Closes scylladb/scylladb#24436

* github.com:scylladb/scylladb:
  alternator/test_returnconsumedcapacity.py: Test forced read before write
  alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write
  executor.cc: get_previous_item with consistency level
  executor: Extend API of put_or_delete_item
  alternator/executor.cc: Accurate WCU for put, update, delete
  config: add alternator_force_read_before_write
2025-08-24 11:38:24 +03:00
Avi Kivity
8815491085 treewide: include boost headers as "system" headers
Boost is external to the project so treat its headers as "system"
headers and include them with angle brackets.

Closes scylladb/scylladb#25619
2025-08-22 17:21:24 +03:00
Piotr Dulikowski
5709d94826 Merge 'cql3: Warn when creating RF-rack-invalid keyspace' from Dawid Mędrek
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

If the option is turned off, we will also log all of the
RF-rack-invalid keyspaces at start-up.

We provide validation tests.

Fixes scylladb/scylladb#23330

Backport: we'd like to encourage the user to abide by the restriction
even when they don't enforce it to make it easier in the future to
adjust the schema when there's no way to disable it anymore. Because
of that, we'd like to backport it to all relevant versions, starting with 2025.1.

Closes scylladb/scylladb#24785

* github.com:scylladb/scylladb:
  main: Log RF-rack-invalid keyspaces at startup
  cql3/statements: Fix indentation
  cql3: Warn when creating RF-rack-invalid keyspace
2025-08-22 11:33:32 +02:00
Evgeniy Naydanov
ab15c94a09 test.py: dtest/commitlog_test: add test_pinned_cl_segment_doesnt_resurrect_data
test_pinned_cl_segment_doesnt_resurrect_data was not moved in #24946 from
scylla-dtest to this repo, because it's marked as xfail (#14879), but
actually the issue is fixed and there is no reason to keep the test in
scylla-dtest.

Also remove unused imports.

Closes scylladb/scylladb#25592
2025-08-22 11:30:10 +03:00
Raphael S. Carvalho
149f9d8448 replica: Fix race between drop table and merge completion handling
Consider this:
1) merge finishes, wakes up fiber to merge compaction groups
2) drop table happens, which in turn invokes truncate underneath
3) merge fiber stops old groups
4) truncate disables compaction on all groups, but the ones stopped
5) truncate performs a check that compaction has been disabled on
all groups, including the ones stopped
6) the check fails because groups being stopped didn't have compaction
explicitly disabled on them

To fix it, the check on step 6 will ignore groups that have been
stopped, since those are not eligible for having compaction explicitly
disabled on them. The compaction check is there, so ongoing compaction
will not propagate data being truncated, but here it happens in the
context of drop table which doesn't leave anything behind. Also, a
group stopped is somewhat equivalent to compaction disabled on it,
since the procedure to stop a group stops all ongoing compaction
and eventually removes its state from compaction manager.
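
The scenario and the fix can be sketched in Python. All names here are hypothetical stand-ins for the C++ structures; only the shape of the check is taken from the description above:

```python
from dataclasses import dataclass


@dataclass
class CompactionGroup:
    name: str
    stopped: bool = False
    compaction_disabled: bool = False


def truncate_disable_compaction(groups):
    """Step 4: disable compaction on every group except the stopped ones."""
    for group in groups:
        if not group.stopped:
            group.compaction_disabled = True


def old_check(groups):
    """Step 5 before the fix: also inspects stopped groups."""
    return all(g.compaction_disabled for g in groups)


def fixed_check(groups):
    """After the fix: stopped groups are ignored."""
    return all(g.compaction_disabled for g in groups if not g.stopped)


# The merge fiber has already stopped the two old groups (step 3).
groups = [
    CompactionGroup("old-1", stopped=True),
    CompactionGroup("old-2", stopped=True),
    CompactionGroup("merged"),
]
truncate_disable_compaction(groups)
```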

Fixes #25551.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#25563
2025-08-22 10:19:43 +03:00
kendrick-ren
d6e62aeb6a Update launch-on-gcp.rst
Add the missing '=' mark in --zone option. Otherwise the command complains.

Closes scylladb/scylladb#25471
2025-08-22 10:13:52 +03:00
Botond Dénes
3dcb596201 Merge 'test: properly unset recovery_leader in the recovery procedure tests' from Patryk Jędrzejczak
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.

Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.

Closes scylladb/scylladb#25365

* github.com:scylladb/scylladb:
  test: properly unset recovery_leader in the recovery procedure tests
  test: manager_client: allow removing a config option
  test: manager_client: add docstring to server_update_config
2025-08-22 10:09:37 +03:00
Benny Halevy
45c496c276 api: storage_service: fix token_range documentation
Note that the token_range type is used only by describe_ring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25609
2025-08-22 10:06:21 +03:00
Patryk Jędrzejczak
193a74576a test/cluster/conftest: cluster_con: provide default values for port and use_ssl
Some cluster tests use `cluster_con` when they need a different load
balancing policy or auth provider. However, no test uses a port
other than 9042 or enables SSL, but all tests must pass `9042, False`
because these parameters don't have default values. This makes the code
more verbose. Also, it's quite obvious that 9042 stands for port, but
it's not obvious what `False` is related to, so there is a need to check
the definition of `cluster_con` while reading any test that uses it.
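
The effect on call sites can be sketched as below. This `cluster_con` is a stand-in that only records its arguments; the parameter list is an assumption based on the description above, not the real helper's signature:

```python
def cluster_con(hosts, port=9042, use_ssl=False,
                auth_provider=None, load_balancing_policy=None):
    """Stand-in: the real helper builds a driver connection instead."""
    return {
        "hosts": list(hosts),
        "port": port,
        "use_ssl": use_ssl,
        "auth_provider": auth_provider,
        "load_balancing_policy": load_balancing_policy,
    }


# Before: every caller had to spell out `9042, False`.
before = cluster_con(["127.0.0.1"], 9042, False,
                     load_balancing_policy="whitelist")

# After: callers pass only what differs from the defaults.
after = cluster_con(["127.0.0.1"], load_balancing_policy="whitelist")
```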

No reason to backport, it's only a minor refactoring.

Closes scylladb/scylladb#25516
2025-08-22 09:51:24 +03:00
David Garcia
07d798a59d docs: fix sidebar on local preview
Closes scylladb/scylladb#25560
2025-08-22 09:50:07 +03:00
David Garcia
c3c70ba73f docs: expose alternator metrics
Renders in the docs some metrics introduced in https://github.com/scylladb/scylladb/pull/24046/files that were not being displayed in https://docs.scylladb.com/manual/stable/reference/metrics.html

Closes scylladb/scylladb#25561
2025-08-22 09:49:52 +03:00
David Garcia
461a0bad8a docs: do not show any version warning for upgrade guide pages
Closes scylladb/scylladb#25562
2025-08-22 09:49:27 +03:00
Avi Kivity
e140bd6355 Update seastar submodule
* seastar 1520326e6...0a90f7945 (13):
  > include: Keep linux-aio.hh deeper in internal
  > include: Move array_map.hh to util/internal/
  > io_tester: Add support for scheduling supergroups
  > Merge 'tls: Force buffer splitting into gnutls record block sized chunks' from Calle Wilund
  > Add iotune --get-best-iops-with-buffer-sizes option
  > noncopyable_function: use memcpy instead of bytewise copy loop
  > test: Make fair_queue_test validation code use BOOST_CHECK_...-s
  > test: Rework test_fair_queue_random_run
  > Merge 'Remove capacity configuration for fair_queue tests' from Pavel Emelyanov
  > reactor: replace boost::barrier with std::barrier<>
  > rpc: server::process(): reindent
  > test: Remove no-op dispatch from fair_queue ticker
  > Merge 'json: API level 8: use noncopyable_function in json_return_type' from Benny (#2921)

Closes scylladb/scylladb#25624
2025-08-22 09:41:02 +03:00
Andrzej Jackowski
86fc513bd9 auth: allow dropping roles in saslauthd_authenticator
Before this change, `saslauthd_authenticator` prevented dropping
roles. The current documentation instructs users to `Ensure Scylla has
the same users and roles as listed in the LDAP directory`. Therefore,
ScyllaDB should allow dropping roles so administrators can remove
obsolete roles from both LDAP and ScyllaDB.

The code change is minimal — dropping a role is a no-op, similar to the
existing no-op implementations for successful `create` and `alter`
operations.

`saslauthd_authenticator_test` is updated to verify that dropping
a role doesn't throw anymore.

Fixes: scylladb/scylladb#25571

Closes scylladb/scylladb#25574
2025-08-22 09:40:44 +03:00
Yaron Kaikov
c0fd3deeab github: Enhance label sync to support P0/P1 priority labels
Extend the existing label synchronization system to handle P0 and P1
priority labels in addition to backport/* labels:

- Add P0/P1 label syncing between issues and PRs bidirectionally
- Automatically add 'force_on_cloud' label to PRs when P0/P1 labels
  are present (either copied from issues or added directly)

The workflow now triggers on P0 and P1 label events in addition to
backport/* labels, ensuring priority labels are properly reflected
across the entire PR lifecycle.

Refs: https://github.com/scylladb/scylla-pkg/issues/5383

Closes scylladb/scylladb#25604
2025-08-22 06:50:13 +03:00
Dawid Mędrek
837d267cbf main: Log RF-rack-invalid keyspaces at startup
When the configuration option `rf_rack_valid_keyspaces` is enabled and there
is an RF-rack-invalid keyspace, starting a node fails. However, when the
configuration option is disabled, but there still is a keyspace that violates
the condition, we'd like Scylla to print a warning informing the user about
the fact. That's what happens in this commit.

We provide a validation test.
2025-08-21 19:35:33 +02:00
Dawid Mędrek
af8a3dd17b cql3/statements: Fix indentation
2025-08-21 19:29:36 +02:00
Dawid Mędrek
60ea22d887 cql3: Warn when creating RF-rack-invalid keyspace
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

We provide a validation test.
2025-08-21 19:29:33 +02:00
Evgeniy Naydanov
3a98331731 test.py: don't fail if use multiple tests from one dir in commandline
The stash item REPEATED_FILES for directory items is used to cut
recursion.  But if multiple tests from one directory are added to the
./test.py command line, this prevents the non-first tests from being
handled properly, because it was already collected for the first one.
Change the behavior to store in the stash only the files that are in the
process of repetition, rather than all repeated files.  Rename the stash
item to REPEATING_FILES to reflect this change.

Closes scylladb/scylladb#25611
2025-08-21 19:43:13 +03:00
Dawid Pawlik
01e7a48030 vector_store_client: fix HTTP error message formatting
The content of the HTTP error was logged in Scylla as a literal
list of chars (the default temporary buffer formatting).

Print an sstring built from the temporary buffer instead,
which fixes the formatting and makes the output
clear and readable for humans.

Fixes: VECTOR-141

Closes scylladb/scylladb#25329
2025-08-21 14:33:41 +02:00
Botond Dénes
09dc285b4a Merge 'Remove redis from scylla source tree' from Ran Regev
- **remove redis documentation**
First, remove the redis documentation.

- **remove ./redis and dependencies**
Second, remove the redis directory and its dependencies from the project.

Fixes: #25144

This is a cleanup, no need to backport.

Closes scylladb/scylladb#25148

* github.com:scylladb/scylladb:
  remove ./redis and dependencies
  remove redis documentation
2025-08-21 14:26:11 +03:00
Pavel Emelyanov
47750496d2 Merge 'test.py: metrics: add host_id suffix to .db file' from Evgeniy Naydanov
CI can run several test.py sessions on different machines (builders) for one build, so the .db file with metrics needs a unique name to avoid being overwritten: add host_id, as we already do for the .xml report in `run_pytest()`.

Also add host_id columns to the metric tables in case we ever aggregate .db files.

Add host_id suffix to `toxiproxy_server.log` for the same reason.

Fixes: https://github.com/scylladb/scylladb/issues/25462

Closes scylladb/scylladb#25542

* github.com:scylladb/scylladb:
  test.py: add host_id suffix to toxiproxy_server.log
  test.py: metrics: add host_id suffix to .db file
2025-08-21 11:34:47 +03:00
Robert Bindar
3291a5cc75 Fix dbuild boost::gregorian usage error
On my dbuild runs, the compiler complained about
no member "gregorian" in namespace boost in the
user_function_test.cc file. This was also noticed in CI.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#25593
2025-08-21 11:32:47 +03:00
Petr Gusev
f261b4594d ip_address_updater: call raft_topology_update_ip even if ip hasn't changed
Previously, the prev_ip check caused problems for bootstrapping nodes.
Suppose a bootstrapping node A appears in the system.peers table of
some other node B. Its record has only ID and IP of the node A, due to
the special handling of bootstrapping nodes in raft_topology_update_ip.
Suppose node B gets temporarily isolated from the topology coordinator.
The topology coordinator fences out node B and successfully finishes
bootstrapping of the node A. Later, when the connectivity is restored,
topology_state_load runs on node B. Node A is already in the
normal state, but the gossiper on B might not yet have any state for
it. In this case, raft_topology_update_ip would not update
system.peers because the gossiper state is missing. Subsequently,
on_join/on_restart/on_alive events would skip updates because the IP
in gossiper matches the IP for that node in system.peers.

Removing the check avoids this issue, with negligible overhead:
* on_join/on_restart/on_alive happen only once in a
node’s lifetime
* topology_state_load already updates all nodes each time it runs.

This problem was found by a fencing test, which crashed a
node while another node was going through the bootstrapping
process. After the restart, the node saw that the other node was
already in the normal state, since the topology coordinator fenced out
this node and managed to finish the bootstrapping process
successfully. This test will be provided in a separate
fencing-for-paxos PR.

Closes scylladb/scylladb#25596
2025-08-21 10:02:06 +02:00
Ernest Zaslavsky
4bee0491ba cmake: Add missing incremental.cc to repair/CMakeLists.txt
Add `incremental.cc` to `repair/CMakeLists.txt` to fix the CMake-based build.

Closes scylladb/scylladb#25601
2025-08-21 09:40:36 +03:00
Asias He
b12404ba52 streaming: Enclose potential throws in try block and ensure sink close before logging
- Move the initialization of log_done inside the try block to catch any
  exceptions it may throw.

- Relocate the failure warning log after sink.close() cleanup
  to guarantee sink.close() is always called before logging errors.

Refs #25497

Closes scylladb/scylladb#25591
2025-08-20 19:46:56 +02:00
Ran Regev
ebf1db5c5e remove ./redis and dependencies
Remove ./redis and all its usages.
This is the second commit that removes
./redis from Scylla

Signed-off-by: Ran Regev <ran.regev@scylladb.com>
2025-08-20 17:53:23 +03:00
Ran Regev
6eca083137 remove redis documentation
As part of removing redis from the Scylla source tree,
this commit removes all related documentation.
The following commit removes the code itself.

Signed-off-by: Ran Regev <ran.regev@scylladb.com>
2025-08-20 17:53:23 +03:00
Benny Halevy
6129411a5e locator: utils: get_all_ranges, construct_range_to_endpoint_map: use end-bound ranges
Commit 60d2cc886a changed
get_all_ranges to return start-bound ranges and pre-calculate
the wrapping range, and then construct_range_to_endpoint_map
to pass r.start() (that is now always engaged) as the vnode token.

However, as can be seen in token_metadata_impl::first_token
the token ranges (a.k.a. vnodes) **end** with the sorted tokens,
not start with them, so an arbitrary token t belongs to a
vnode in some range `sorted_tokens[i-1] < t <= sorted_tokens[i]`
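
The end-bound rule can be illustrated with a short Python sketch. `owning_vnode_token` is a hypothetical helper, and the wrap-around to the first token for values above the last token is an assumption about the ring, in the spirit of `first_token`:

```python
from bisect import bisect_left


def owning_vnode_token(sorted_tokens, t):
    """Return the token that ends the vnode containing t.

    Implements sorted_tokens[i-1] < t <= sorted_tokens[i].
    """
    # bisect_left gives the first index i with sorted_tokens[i] >= t.
    i = bisect_left(sorted_tokens, t)
    # Assumption: tokens above the last one wrap to the first vnode.
    return sorted_tokens[i % len(sorted_tokens)]
```

With tokens `[10, 20, 30]`, token 20 itself belongs to the vnode ending at 20, while 21 belongs to the vnode ending at 30; a start-bound interpretation would assign 20 to the next vnode.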

Fixes #25541

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25580
2025-08-20 15:15:40 +02:00
Avi Kivity
eefb6a0642 Merge 'storage_proxy: node_local_only: always use my_host_id' from Petr Gusev
The previous implementation did not handle topology changes well:
* In `node_local_only` mode with CL=1, if the current node is pending, the CL is increased to 2, causing `unavailable_exception`.
* If the current tablet is in `write_both_read_old` and we try to read with `node_local_only` on the new node, the replica list will be empty.

This patch changes `node_local_only` mode to always use `my_host_id` as the replica list. An explicit check ensures the current node is a replica for the operation; otherwise `on_internal_error` is called.
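
The new rule amounts to the following Python sketch. The names are hypothetical, and `InternalError` merely stands in for the `on_internal_error` call; how the set of replicas for the operation is computed is left abstract:

```python
class InternalError(Exception):
    """Stand-in for the effect of on_internal_error."""


def node_local_only_replicas(my_host_id, operation_replicas):
    """Return the replica list to use in node_local_only mode."""
    if my_host_id not in operation_replicas:
        raise InternalError(
            f"{my_host_id} is not a replica for this operation")
    # Always exactly this node, regardless of topology transitions.
    return [my_host_id]
```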

backport: not needed, since `node_local_only` is only used in LWT for tablets and it hasn't been released yet.

Closes scylladb/scylladb#25508

* github.com:scylladb/scylladb:
  test_tablets_lwt: add test_lwt_during_migration
  storage_proxy: node_local_only: always use my_host_id
2025-08-20 12:11:44 +03:00
Avi Kivity
34f661e5aa Merge 'Make api/column_family endpoints capture and use sharded<database>' from Pavel Emelyanov
The http_context object carries a sharded<database> reference, and all handlers in the api/ code can use it the way they want. This creates a potential use-after-free, because the context is initialized very early and is destroyed very late. All other services are used by handlers differently -- after a service is initialized, the relevant endpoints are registered and the service reference is captured on handlers. Since endpoint deregistration is defer-scheduled at the same place, this guarantees that handlers cannot use the service after it's stopped.

This PR does the same for api/ handlers -- the sharded<database> reference is captured inside set_server_column_family() and then used by handlers lambdas.

Similar changes for other services: #21053, #19417, #15831, etc

It's a part of the on-going cleanup of service dependencies, no need to backport

Closes scylladb/scylladb#25467

* github.com:scylladb/scylladb:
  api/column_family: Capture sharded<database> to call get_cf_stats()
  api: Patch get_cf_stats to get sharded<database>& argument
  api: Drop CF map-reducers ability to work with http context
  api: Patch callers of map_reduce_cf(_raw)? to use sharded<database>
  api: Use captured sharded<database> reference in handlers
  api/column_family: Make map_reduce_cf_time_histogram() use sharded<database>
  api/column_famliy: Make sum_sstable() use sharded<database>
  api/column_family: Make get_cf_unleveled_sstables() use sharded<database>
  api/column_famliy: Make get_cf_stats_count() use sharded<database>
  api/column_family: Make get_cf_rate_and_histogram() use sharded<database>
  api/column_family: Make get_cf_histogram() use sharded<database>
  api/column_family: Make get_cf_stats_sum() use sharded<database>
  api/column_family: Make set_tables_tombstone_gc() use sharded<database>
  api/column_family: Make set_tables_autocompaction() use sharded<database>
  api/column_family: Make for_tables_on_all_shards() use sharded<database>
  api: Capture sharded<database> for set_server_column_family()
  api: Make CF map-reducers work on sharded<database> directly
  api: Make map_reduce_cf_time_histogram() file-local
  api: Remove unused ctx argument from run_toppartitions_query()
2025-08-20 12:09:39 +03:00
Avi Kivity
352cda4467 treewide: avoid including gms/feature_service.hh from headers
To avoid dependency proliferation, switch to forward declarations.

In one case, we introduce indirection via std::unique_ptr and
deinline the constructor and destructor.

Ref #1

Closes scylladb/scylladb#25584
2025-08-20 10:30:27 +03:00
Botond Dénes
d20304fdf8 Merge 'test.py: dtest: port next_gating tests from commitlog_test.py' from Evgeniy Naydanov
Copy `commitlog_test.py` from the scylla-dtest test suite and make it work with `test.py`.

As a part of the porting process, remove unused imports and markers, remove non-next_gating tests and tests marked with `skip`, `skip_if`, and `xfail` markers.

test.py uses `commitlog` directory instead of dtest's `commitlogs`.

Also, add `commitlog_segment_size_in_mb: 32` option to test_stop_failure_policy to make _provoke_commitlog_failure work.

Tests `test_total_space_limit_of_commitlog_with_large_limit` and `test_total_space_limit_of_commitlog_with_medium_limit` use too much disk space and take too long to run.  Keep them in scylla-dtest for now.

Enable the test in `suite.yaml` (run in dev mode only.)

Additional modifications to test.py/dtest shim code:
- add ScyllaCluster.flush() method
- add ScyllaNode.stress() method
- add tools/files.py::corrupt_file() function
- add tools/data.py::run_query_with_data_processing() function
- copy some assertions from dtest

Also add the missing mode restriction for the auth_test.py file.

Closes scylladb/scylladb#24946

* github.com:scylladb/scylladb:
  test.py: dtest: remove slow and greedy tests from commitlog_test.py
  test.py: dtest: make commitlog_test.py run using test.py
  test.py: dtest: add ScyllaCluster.flush() method
  test.py: dtest: add ScyllaNode.stress() method
  test.py: dtest: add tools/data.py::run_query_with_data_processing() function
  test.py: dtest: add tools/files.py::corrupt_file() function
  test.py: dtest: copy some assertions from dtest
  test.py: dtest: copy unmodified commitlog_test.py
2025-08-19 17:25:07 +03:00
Michał Chojnowski
c1b513048c sstables/types.hh: fix fmt::formatter<sstables::deletion_time>
Obvious typo.

Fixes scylladb/scylladb#25556

Closes scylladb/scylladb#25557
2025-08-19 17:21:18 +03:00
Petr Gusev
894c8081e6 test_tablets_lwt: add test_lwt_during_migration
2025-08-19 16:11:56 +02:00
Petr Gusev
ed6bec2cac storage_proxy: node_local_only: always use my_host_id
The previous implementation did not handle topology changes well:
* In node_local_only mode with CL=1, if the current node is pending,
  the CL is raised to 2, causing unavailable_exception.
* If the current tablet is in write_both_read_old and we read with
  node_local_only on the new node, the replica list is empty.

This patch changes node_local_only mode to always use my_host_id as
the replica list. An explicit check ensures the current node is a
replica for the operation; otherwise on_internal_error is called.
2025-08-19 16:11:49 +02:00
Evgeniy Naydanov
47e4d470af test.py: add host_id suffix to toxiproxy_server.log
2025-08-19 11:33:47 +00:00
Evgeniy Naydanov
8ea49092b7 test.py: metrics: add host_id suffix to .db file
CI can run several test.py sessions on different machines (builders) for
one build, so the .db file with metrics needs a unique name to avoid
being overwritten: add host_id, as we already do for the .xml report in
run_pytest()

Also add host_id columns to the metric tables in case we ever
aggregate .db files.
2025-08-19 11:33:11 +00:00
Botond Dénes
66db95c048 Merge 'Preserve PyKMIP logs from failed KMIP tests' from Nikos Dragazis
This PR extends the `tmpdir` class with an option to preserve the directory if the destructor is called during stack unwinding. It also uses this feature in KMIP tests, where the tmpdir contains PyKMIP server logs, which may be useful when diagnosing test failures.
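
In Python terms the idea resembles a context manager that skips cleanup when it is left through an exception. This is only an analogy with hypothetical names; the PR itself changes a C++ class whose destructor detects stack unwinding:

```python
import os
import shutil
import tempfile


class PreservingTmpDir:
    """Temporary directory that can survive a failing block."""

    def __init__(self, preserve_on_exception=False):
        self.preserve_on_exception = preserve_on_exception
        self.path = None

    def __enter__(self):
        self.path = tempfile.mkdtemp()
        return self

    def __exit__(self, exc_type, exc, tb):
        failing = exc_type is not None
        if not (failing and self.preserve_on_exception):
            shutil.rmtree(self.path, ignore_errors=True)
        return False  # never swallow the exception
```

A failing test then leaves its logs on disk at `path` for later inspection, while passing tests still clean up after themselves.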

Fixes #25339.

Not so important to be backported.

Closes scylladb/scylladb#25367

* github.com:scylladb/scylladb:
  encryption_at_rest_test: Preserve tmpdir from failing KMIP tests
  test/lib: Add option to preserve tmpdir on exception
2025-08-19 13:17:29 +03:00
Avi Kivity
611918056a Merge 'repair: Add tablet incremental repair support' from Asias He
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This patch
implements the file-based selection method.

Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTable could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.

This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field is set to its default value, 0. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:

The repaired_at on disk field of a SSTable is used.
   - A 64-bit number increases sequentially

The sstables_repaired_at is added to the system.tablets table.
   - repaired_at <= sstables_repaired_at means the sstable is repaired

The being_repaired in memory field of a SSTable is added.
   - A repair UUID tells which sstable has participated in the repair
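
A minimal Python sketch of how these fields could combine to select SSTables for repair. The helper names are hypothetical, and treating `repaired_at == 0` as "never repaired" regardless of the threshold is an assumption drawn from the default value mentioned above:

```python
def is_repaired(repaired_at, sstables_repaired_at):
    """Repaired means marked (non-zero) and not newer than the tablet's mark."""
    # Assumption: 0 is the "not repaired" default and never counts.
    return repaired_at != 0 and repaired_at <= sstables_repaired_at


def select_unrepaired(sstables, sstables_repaired_at):
    """sstables maps an SSTable name to its on-disk repaired_at value."""
    return sorted(
        name
        for name, repaired_at in sstables.items()
        if not is_repaired(repaired_at, sstables_repaired_at)
    )
```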

Initial test results:

    1) Medium dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~500GB
    Cluster pre-populated with ~500GB of data before starting repairs job.
    Results for Repair Timings:
    The regular repair run took 210 mins.
    Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
    The speedup is: 183 mins  / 48s = 228X

    2) Small dataset results
    Node amount: 3
    Instance type: i4i.2xlarge
    Disk usage per node: ~167GB
    Cluster pre-populated with ~167GB of data before starting the repairs job.
    Regular repair 1st run took 110s,  2nd and 3rd runs took 110s.
    Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
    The speedup is: 110s / 1.5s = 73X

    3) Large dataset results
    Node amount: 6
    Instance type: i4i.2xlarge, 3 racks
    50% of base load, 50% read/write
    Dataset == Sum of data on each node

    Dataset     Non-incremental repair (minutes)
    1.3 TiB     31:07
    3.5 TiB     25:10
    5.0 TiB     19:03
    6.3 TiB     31:42

    Dataset     Incremental repair (minutes)
    1.3 TiB     24:32
    3.0 TiB     13:06
    4.0 TiB     5:23
    4.8 TiB     7:14
    5.6 TiB     3:58
    6.3 TiB     7:33
    7.0 TiB     6:55

Fixes #22472

Closes scylladb/scylladb#24291

* github.com:scylladb/scylladb:
  replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
  compaction: Move compaction_reenabler to compaction_reenabler.hh
  topology_coordinator: Make rpc::remote_verb_error to warning level
  repair: Add metrics for sstable bytes read and skipped from sstables
  test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
  test.py: Add tests for tablet incremental repair
  repair: Add tablet incremental repair support
  compaction: Add tablet incremental repair support
  feature_service: Add TABLET_INCREMENTAL_REPAIR feature
  tablet_allocator: Add tablet_force_tablet_count_increase and decrease
  repair: Add incremental helpers
  sstable: Add being_repaired to sstable
  sstables: Add set_repaired_at to metadata_collector
  mutation_compactor: Introduce add operator to compaction_stats
  tablet: Add sstables_repaired_at to system.tablets table
  test: Fix drain api in task_manager_client.py
2025-08-19 13:13:22 +03:00
Dawid Pawlik
50eeb11c84 .gitignore: add rust target
When using automatic Rust build tools in an IDE,
the files generated in the `rust/target/` directory
were treated by git as unstaged changes.

After the change, the generated files will not
pollute the git changes interface.

Closes scylladb/scylladb#25389
2025-08-19 13:09:18 +03:00
Dawid Mędrek
6a71461e53 treewide: Fix spelling errors
The errors were spotted by our GitHub Actions.

Closes scylladb/scylladb#24822
2025-08-19 13:07:43 +03:00
libo2_yewu
fa84e20b7a scripts/coverage.py: correct the coverage report path
The `path/name` directory does not exist and needs to be created first.

Signed-off-by: libo-sober <libo_sober@163.com>

Closes scylladb/scylladb#25480
2025-08-19 13:01:49 +03:00
Avi Kivity
41475858aa storage_proxy: endpoint_filter(): fix rack count confusion
endpoint_filter() is used by batchlog to select nodes to replicate
to.

It contains an unordered_multimap data structure that maps rack names
to nodes.

It misuses std::unordered_map::bucket_count() to count the number of
racks. While values that share a key in a multimap will definitely
be in the same bucket, it's possible for values that don't share a
key to share a bucket. Therefore bucket_count() undercounts the
number of racks.

Fix this by using a more accurate data structure: a map of a set.

The patch changes validated.bucket_count() to validated.size()
and validated.size() to a new variable nr_validated.

The patch does cause an extra two allocations per rack (one for the
unordered_map node, one for the unordered_set bucket vector), but
this is only used for logged batches, so it is amortized over all
the mutations in the logged batch.
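
As a rough illustration of the data-structure change described above (not the actual endpoint_filter() code), the sketch below groups nodes into a map of sets, so that size() on the outer map is the number of distinct racks. The names rack_name, node_id, group_by_rack and count_nodes are hypothetical; count_nodes only plays the role that the message assigns to nr_validated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for the real rack and node types.
using rack_name = std::string;
using node_id = int;

// Multimap form: bucket_count() reports the size of the hash table's
// bucket array, which is not the number of distinct keys.
using rack_multimap = std::unordered_multimap<rack_name, node_id>;

// Map-of-sets form: size() is exactly the number of distinct racks.
using rack_map = std::unordered_map<rack_name, std::unordered_set<node_id>>;

// Total number of nodes across all racks (assumed analogue of nr_validated).
inline std::size_t count_nodes(const rack_map& racks) {
    std::size_t n = 0;
    for (const auto& [rack, nodes] : racks) {
        n += nodes.size();
    }
    return n;
}

// Regroups a rack -> node multimap into a rack -> {nodes} map.
inline rack_map group_by_rack(const rack_multimap& nodes) {
    rack_map grouped;
    for (const auto& [rack, node] : nodes) {
        grouped[rack].insert(node);
    }
    // Grouping can deduplicate entries but never adds any.
    assert(count_nodes(grouped) <= nodes.size());
    return grouped;
}
```

With this layout, grouped.size() answers "how many racks" and count_nodes(grouped) answers "how many nodes", so neither question depends on how the hash table happens to distribute keys over buckets.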

Closes scylladb/scylladb#25493
2025-08-19 11:58:39 +03:00
Dawid Mędrek
2227eb48bb test/cqlpy/test_cdc.py: Add validation test for re-attached log tables
When the user disables CDC on a table, the CDC log table is not removed.
Instead, it's detached from the base table, and it functions as a normal
table (with some differences). If that log table lives up to the point
when the user re-enables CDC on the base table, instead of creating a new
log table, the old one is re-attached to the base.

For more context on that, see commit:
scylladb/scylladb@adda43edc7.

In this commit, we add validation tests that check whether the changes
on the base table after disabling CDC are reflected on the log table
after re-enabling CDC. The definition of the log table should be the same
as if CDC had never been disabled.

Closes scylladb/scylladb#25071
2025-08-19 10:15:41 +02:00
Botond Dénes
f8b79d563a Merge 's3: Minor refactoring and beautification of S3 client and tests' from Ernest Zaslavsky
This pull request introduces minor code refactoring and aesthetic improvements to the S3 client and its associated test suite. The changes focus on enhancing readability, consistency, and maintainability without altering any functional behavior.

No backport is required, as the modifications are purely cosmetic and do not impact functionality or compatibility.

Closes scylladb/scylladb#25490

* github.com:scylladb/scylladb:
  s3_client: relocate `req` creation closer to usage
  s3_client: reformat long logging lines for readability
  s3_test: extract file writing code to a function
2025-08-18 18:48:42 +03:00
Aleksandra Martyniuk
a10e241228 replica: lower severity of failure log
Flush failure with seastar::named_gate_closed_exception is expected
if a respective compaction group was already stopped.

Lower the severity of a log in dirty_memory_manager::flush_one
for this exception.
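
A minimal sketch of the idea, assuming the severity is chosen by exception type. It does not use Seastar: gate_closed_error is a hypothetical stand-in for seastar::named_gate_closed_exception, and the concrete levels (warn for the expected case, error otherwise) are assumptions, since the message only says the severity is lowered.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Assumed log levels; the commit does not name the exact ones.
enum class log_level { warn, error };

// Hypothetical stand-in for seastar::named_gate_closed_exception.
struct gate_closed_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Picks the severity for a flush failure: the expected "gate closed"
// case is reported at a lower level than any other failure.
inline log_level flush_failure_level(std::exception_ptr ep) {
    assert(ep); // rethrowing a null exception_ptr is undefined behavior
    try {
        std::rethrow_exception(ep);
    } catch (const gate_closed_error&) {
        return log_level::warn;
    } catch (...) {
        return log_level::error;
    }
}
```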

Fixes: https://github.com/scylladb/scylladb/issues/25037.

Closes scylladb/scylladb#25355
2025-08-18 13:30:42 +03:00
Avi Kivity
96956e48c4 Merge 'utils: stall_free: detect clear_gently method of const payload types' from Benny Halevy
Currently, when a container or smart pointer holds a const payload
type, utils::clear_gently does not detect the object's clear_gently
method as the method is non-const and requires a mutable object,
as in the following example in class tablet_metadata:
```
    using tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>;
    using table_to_tablet_map = std::unordered_map<table_id, tablet_map_ptr>;
```

That said, when a container is cleared gently the elements it holds
are destroyed anyhow, so we'd like to allow clearing them gently before
destruction.

This change still doesn't allow directly calling utils::clear_gently
on const objects.

Respective unit tests are added.
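
The detection idea can be sketched roughly as follows. This is a simplified, synchronous illustration and not the utils::clear_gently implementation (which is asynchronous): has_clear_gently, payload and clear_container_gently are hypothetical names, and std::unique_ptr stands in for the smart pointers mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Detects a non-const clear_gently() member, looking through const on T.
template <typename T, typename = void>
struct has_clear_gently : std::false_type {};

template <typename T>
struct has_clear_gently<
    T,
    std::void_t<decltype(std::declval<std::remove_const_t<T>&>().clear_gently())>>
    : std::true_type {};

// Hypothetical payload whose clear_gently() is non-const.
struct payload {
    int* clear_calls; // counts invocations, for demonstration only
    void clear_gently() { ++*clear_calls; }
};

// Clears each owned element gently before the container destroys it.
// The const_cast is only valid because the container owns the elements,
// is about to destroy them, and the objects were not created as const.
template <typename T>
void clear_container_gently(std::vector<std::unique_ptr<T>>& c) {
    for (auto& p : c) {
        if constexpr (has_clear_gently<T>::value) {
            if (p) {
                const_cast<std::remove_const_t<T>&>(*p).clear_gently();
            }
        }
    }
    c.clear();
}
```

In this sketch the const is stripped only inside the container-level helper. A direct call on a const object gets no such overload, which mirrors the restriction stated above.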

Fixes #24605
Fixed #25026

* This is an optimization that is not strictly required to backport (as https://github.com/scylladb/scylladb/pull/24618 dealt with clear_gently of `tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>` well enough)

Closes scylladb/scylladb#24606

* github.com:scylladb/scylladb:
  utils: stall_free: detect clear_gently method of const payload types
  utils: stall_free: clear gently a foreign shared ptr only when use_count==1
2025-08-18 12:52:02 +03:00