Commit Graph

49299 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
e41fc841cd test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency
In the Raft-based topology, a decommissioning node is removed from group
0 after the decommission request is considered finished (and the token
ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't
handle such a case; it only handles cases where the token ring is
updated later. We fix this in this commit.

We rely on the new implementation of
`wait_for_token_ring_and_group0_consistency` in the following commit to
fix flakiness of some tests.

We also update the obsolete docstring in this commit.
2025-09-09 19:01:09 +02:00
Botond Dénes
a89d0a747b Merge 'test.py: add different levels of verbosity for output' from Andrei Chekun
Add another level of verbosity: quiet.
Previously it was used as the default level, but it does not provide enough information.
These changes should be coupled with the pytest-sugar plugin to get the intended information at each level.
Invoke pytest as a module, instead of as a separate process, to get access to the terminal and be able to use it interactively.

Framework change only, so backporting to 2025.3

Fixes: #25403

Closes scylladb/scylladb#25698

* github.com:scylladb/scylladb:
  test.py: add additional level of verbosity for output
  test.py: start pytest as a module instead of subprocess
2025-09-09 11:49:51 +03:00
Asias He
cb7db47ae1 repair: Add incremental_mode option for tablet repair
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.

Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:

- `regular`: This is the default mode. It performs a standard
  incremental repair, processing only unrepaired sstables and skipping
  those that are already repaired. The repair state (`repaired_at`,
  `sstables_repaired_at`) is updated.

- `full`: This mode forces the repair to process all sstables, including
  those that have been previously repaired. This is useful when a full
  data validation is needed without disabling the incremental repair
  feature. The repair state is updated.

- `disabled`: This mode completely disables the incremental repair logic
  for the current repair operation. It behaves like a classic
  (pre-incremental) repair, and it does not update any incremental
  repair state (`repaired_at` in sstables or `sstables_repaired_at` in
  the system.tablets table).

The implementation includes:

- Adding the `incremental_mode` parameter to the
  `/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.
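
As a rough illustration of the three modes described above, the following Python sketch builds a request URL for the endpoint and validates the mode. The base URL, the helper names, and passing `incremental_mode` as a query parameter are assumptions made for illustration; the commit message only names the endpoint, the parameter, and its three values.

```python
from urllib.parse import urlencode

# Mode names are taken from the commit message above.
INCREMENTAL_MODES = ("regular", "full", "disabled")


def tablet_repair_url(base_url, incremental_mode="regular"):
    """Build a tablet repair URL (hypothetical helper, not part of Scylla)."""
    if incremental_mode not in INCREMENTAL_MODES:
        raise ValueError(f"unknown incremental_mode: {incremental_mode!r}")
    query = urlencode({"incremental_mode": incremental_mode})
    return f"{base_url}/storage_service/repair/tablet?{query}"


def updates_repair_state(incremental_mode):
    """Per the description above, only `disabled` leaves the repair state untouched."""
    return incremental_mode != "disabled"
```

The default of `regular` mirrors the default stated in the message; any other request parameters the real endpoint needs are deliberately left out.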

Fixes #25605

Closes scylladb/scylladb#25693
2025-09-09 06:50:21 +03:00
Avi Kivity
c4ed7dd814 Merge 'gossiper: fix issues in processing gossip status during the startup and when messages are delayed to avoid empty host ids' from Emil Maskovsky
Populate the local state during gossiper initialization in start_gossiping. This prevents an empty state from being added to _endpoint_state_map and returned in get_endpoint_states responses, which was causing an 'empty host id' issue on the other nodes during node restarts.

Check for a race condition in do_apply_state_locally.
In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked.
During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash.

This change adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map.

Fixes https://github.com/scylladb/scylladb/issues/25831
Fixes https://github.com/scylladb/scylladb/issues/25803
Fixes https://github.com/scylladb/scylladb/issues/25702
Fixes https://github.com/scylladb/scylladb/issues/25621

Ref https://github.com/scylladb/scylla-enterprise/issues/5613

Backport: The issue affects all current releases (2025.x), therefore this PR needs to be backported to 2025.1 through 2025.3.

Closes scylladb/scylladb#25849

* github.com:scylladb/scylladb:
  gossiper: fix empty initial local node state
  gossiper: add test for a race condition in start_gossiping
  gossiper: check for a race condition in `do_apply_state_locally`
  test/gossiper: add reproducible test for race condition during node decommission
2025-09-08 20:51:01 +03:00
Andrei Chekun
ea4cd431c9 test.py: add pytest-sugar plugin to the dependencies
This plugin allows having better terminal output with progress bar for
the tests.

Closes scylladb/scylladb#25845

[avi: regenerate frozen toolchain]

Closes scylladb/scylladb#25860
2025-09-08 20:50:02 +03:00
Radosław Cybulski
6d150e2d0c Fix oversized allocation in paxos under pressure
When cpu pressured, `_locks` structure in paxos might grow and cause
oversized allocations and performance drops. We reserve memory ahead of
time.

Fixes #25559

Closes scylladb/scylladb#25874
2025-09-08 20:49:00 +03:00
Yaron Kaikov
d57741edc2 build_docker.sh: enable debug symbols installation
Add the latest scylla.repo location to our docker container. This
allows installing the scylla-debuginfo package in case it's needed.

Fixes: https://github.com/scylladb/scylladb/issues/24271

Closes scylladb/scylladb#25646
2025-09-08 18:39:27 +03:00
Pavel Emelyanov
34d1648d21 main: Properly handle zero allocation warning threshold
The --help text says about --large-memory-allocation-warning-threshold:

"Warn about memory allocations above this size; set to zero to disable."

That's half-true: setting the value to zero spams logs with warnings about
allocations of any size, as seastar treats a zero threshold literally.
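
One way to picture the intended behaviour is the sketch below. It is not the actual patch: the helper names are hypothetical, and substituting a very large value for zero is only an assumed way of making "zero disables" hold.

```python
import sys


def effective_threshold(configured):
    """Translate the configured threshold into the value actually enforced."""
    # Assumption: zero means "disabled", modelled here as a size that no
    # allocation can exceed.
    return sys.maxsize if configured == 0 else configured


def should_warn(allocation_size, configured):
    """Warn only about allocations above the effective threshold."""
    return allocation_size > effective_threshold(configured)
```

With this mapping, `should_warn(size, 0)` is false for every realistic size, which matches the help text quoted above.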

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25850
2025-09-08 12:41:19 +02:00
Asias He
451e1ec659 streaming: Fix use after move in the tablet_stream_files_handler
The files object is moved before it is logged when the stream finishes.
The files are already logged when the stream starts, so skip logging them
at the end of streaming.

Fixes #25830

Closes scylladb/scylladb#25835
2025-09-08 11:59:52 +02:00
Sergey Zolotukhin
b34d543f30 gossiper: fix empty initial local node state
This change removes the addition of an empty state to `_endpoint_state_map`.
Instead, a new state is created locally and then published via replicate,
avoiding the issue of an empty state existing in `_endpoint_state_map`
before the preemption point. Since this resolves the issue tested in
`test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed.

Fixes: scylladb/scylladb#25831
2025-09-08 11:38:31 +02:00
Sergey Zolotukhin
775642ea23 gossiper: add test for a race condition in start_gossiping
This change adds a test for a race condition in `start_gossiping` that
can lead to an empty self state sent in `gossip_get_endpoint_states_response`.

Test for scylladb/scylladb#25831
2025-09-08 11:38:30 +02:00
Sergey Zolotukhin
f08df7c9d7 gossiper: check for a race condition in do_apply_state_locally
In do_apply_state_locally, a race condition can occur if a task is
suspended at a preemption point while the node entry is not locked.
During this time, the host may be removed from _endpoint_state_map.
When the task resumes, this can lead to inserting an entry with an
empty host ID into the map, causing various errors, including a node
crash.

This change
1. adds a check after locking the map entry: if a gossip ACK update
   does not contain a host ID, we verify that an entry with that host ID
   still exists in the gossiper’s _endpoint_state_map.
2. Removes xfail from the test_gossiper_race test since the issue is now
   fixed.
3. Adds exception handling in `do_shadow_round` to skip responses from
   nodes that sent an empty host ID.
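
The check in step 1 can be sketched with a toy asyncio model. This is only an illustration of the "re-check after locking" idea; the class, its fields, and the addresses are hypothetical and do not mirror the gossiper's real data structures.

```python
import asyncio


class EndpointStates:
    """Toy stand-in for the gossiper's endpoint map (all names are hypothetical)."""

    def __init__(self):
        self.host_ids = {}  # endpoint -> host ID
        self.locks = {}     # endpoint -> per-entry lock

    async def apply_update(self, endpoint, host_id=None):
        lock = self.locks.setdefault(endpoint, asyncio.Lock())
        await asyncio.sleep(0)  # preemption point: the entry may be removed here
        async with lock:
            # Re-check after locking: an update without a host ID is applied
            # only if the endpoint is still known.
            if host_id is None and endpoint not in self.host_ids:
                return False
            if host_id is not None:
                self.host_ids[endpoint] = host_id
            return True

    async def remove(self, endpoint):
        self.host_ids.pop(endpoint, None)


async def demo():
    states = EndpointStates()
    states.host_ids["10.0.0.3"] = "host-c"
    # The removal runs while apply_update is suspended before taking the lock.
    applied, _ = await asyncio.gather(
        states.apply_update("10.0.0.3"), states.remove("10.0.0.3")
    )
    return applied, states.host_ids
```

In `demo()` the update is skipped instead of re-creating an entry for the removed endpoint.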

This re-applies the commit 13392a40d4 that
was reverted in 46aa59fe49, after fixing
the issues that caused the CI to fail.

Fixes: scylladb/scylladb#25702
Fixes: scylladb/scylladb#25621

Ref: scylladb/scylla-enterprise#5613
2025-09-08 11:38:30 +02:00
Emil Maskovsky
28e0f42a83 test/gossiper: add reproducible test for race condition during node decommission
This change introduces a targeted test that simulates the gossiper race
condition observed during node decommissioning. The test delays gossip
state application and host ID lookup to reliably reproduce the scenario
where `gossiper::get_host_id()` is called on a removed endpoint,
potentially triggering an abort in `apply_new_states`.

There is a specific error injection added to widen the race window, in
order to increase the likelihood of hitting the race condition. The
error injection is designed to delay the application of gossip state
updates, for the specific node that is being decommissioned. This should
then result in the server abort in the gossiper.

This re-applies the commit 5dac4b38fb that
was reverted in dc44fca67c, but modified
to relax the check from "on_internal_error" to just a warning log. The
stricter check can be re-introduced later once we are sure that all
remaining problems are resolved and it will not break the CI.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25721
2025-09-08 11:38:30 +02:00
Dawid Mędrek
bb0255b2fb tools/scylla-sstable: Enable rf_rack_valid_keyspaces
Enabling the configuration option should have no negative impact on how the tool
behaves. There is no topology and we do not create any keyspaces (except for
trivial ones using `SimpleStrategy` and RF=1), only their metadata. Thanks to
that, we don't go through validation logic that could fail in presence of an
RF-rack-invalid keyspace.

On the other hand, enabling `rf_rack_valid_keyspaces` lets the tool access code
hidden behind that option. While that might not be of any consequence right now,
in the future it might be crucial (for instance, see: scylladb/scylladb#23030).

Note that other tools don't need an adjustment:

* scylla-types: it uses schema_builder, but it doesn't reuse any other
  relevant part of Scylla.
* nodetool: it manages Scylla instances but is not an instance itself, and it
  does not reuse any codepaths.
* local-file-key-generator: it has nothing to do with Scylla's logic.

Other files in the `tools` directory are auxiliary and are instructed with an
already created instance of `db::config`. Hence, no need to modify them either.

Fixes scylladb/scylladb#25792

Closes scylladb/scylladb#25794
2025-09-08 11:52:43 +03:00
Yaron Kaikov
b07505a314 auto-backport.py: sync P0 and P1 labels when applied
When triggering the backport process, check for P0 and P1 labels and, if present, add them to the backport PR together with the force_on_cloud label.

This is implemented first in pkg to test the process, and will then be moved to scylladb.

Fixes: PKG-62

Closes scylladb/scylladb#25856
2025-09-08 11:42:36 +03:00
Yaron Kaikov
407b7b0e18 Fix label parsing logic in backport check script
Previously, the script attempted to assign GitHub Actions expressions directly within a Bash string using '${{ ... }}', which is invalid syntax in shell scripts. This caused the label JSON to be treated as a literal string instead of actual data, leading to parsing failures and incorrect backport readiness checks.

This update ensures the label data is passed correctly via the LABELS_JSON environment variable, allowing jq to properly evaluate label names and conditions.
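
The same idea, reading the labels from the environment instead of interpolating them into the script text, can be sketched in Python. The real workflow uses jq in a shell step; the helper names below are hypothetical, and the assumption is that `LABELS_JSON` holds a JSON array of objects with a `name` key.

```python
import json
import os


def label_names(env=os.environ):
    """Return the label names found in the LABELS_JSON environment variable."""
    # Assumption: LABELS_JSON is a JSON array of label objects with a "name" key.
    raw = env.get("LABELS_JSON", "[]")
    return [label["name"] for label in json.loads(raw)]


def has_label(name, env=os.environ):
    """Check whether a given label is present."""
    return name in label_names(env)
```

Because the JSON arrives as data rather than as part of the script source, quoting inside label names cannot change how the script itself is parsed.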

Fixes: PKG-74

Closes scylladb/scylladb#25858
2025-09-08 11:42:16 +03:00
Pawel Pery
61ee630f42 vector_store_client: add timeouts to tests
Sometimes the `vector_store_client_test_ann_request` test hangs. It is hard to
reproduce.

It seems that the tests are unreliable in the case of stalled requests.
This patch attaches a timer to the abort_source to ensure that the test
at least finishes with a timeout.

Fixes: VECTOR-150
Fixes: #25234

Closes scylladb/scylladb#25301
2025-09-08 10:20:48 +03:00
Wojciech Mitros
10b8e1c51c storage_proxy: send hints to pending replicas
Consider the following scenario:
- Current replica set is [A, B, C]
- write succeeds on [A, B], and a hint is logged for node C
- before the hint is replayed, D bootstraps and the token migrates from C to D
- hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done
- C is cleaned up, replayed data is lost, and D has a stale copy until next repair.
In this scenario we effectively fail to send the hint. This scenario is also more likely to happen with tablets,
as it can happen for every tablet migration.

This issue is particularly detrimental to materialized views. View updates use hints by default and a specific
view update may be sent to just one view replica (when a single base replica has a different row state due to
reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency
between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due
to a lost row update. Such inconsistencies can't be fixed by repairing either the view or the base table.

To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original
target is still alive.
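
A minimal sketch of the target selection described above, with a hypothetical helper name and plain strings standing in for replicas:

```python
def hint_targets(original_target, pending_replicas):
    """Return the replicas a hint is sent to: the original target plus pending ones."""
    targets = [original_target]
    for replica in pending_replicas:
        # Avoid listing the same replica twice.
        if replica not in targets:
            targets.append(replica)
    return targets
```

For the scenario at the top of this message, `hint_targets("C", ["D"])` includes both C and the bootstrapping node D.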

This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't
be too common either. The scenarios for them are:
1. managing to send the hint to the source of a migrating replica before its token is streamed - the write will
arrive on the pending replica anyway in streaming
2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to
the actual source of the migration, the pending replica will get it during streaming
3. sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending
replica for the hint so we'll send it multiple times
4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the
pending replica, so we need to retry the entire write

This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid
sending the hint to pending replicas if the hint target is not the source of the migration, which would allow us to
avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are
in the same rack.

We also add a test case reproducing the issue.

Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com>

Fixes https://github.com/scylladb/scylladb/issues/19835

Closes scylladb/scylladb#25590
2025-09-08 09:18:20 +02:00
Pavel Emelyanov
9deea3655f s3: Fix chunked download source metrics calculations
In S3 client both read and write metrics have three counters -- number
of requests made, number of bytes processed and request latency. In most
of the cases all three counters are updated at once -- upon response
arrival.

However, in case of chunked download source this way of accounting
metrics is misleading. In this code the request is made once, and then
the obtained bytes are consumed eventually as the data arrive.

Currently, each time a new portion of data is read from the socket the
number of read requests is incremented. That's wrong, the request is
made once, and this counter should also be incremented once, not for
every data buffer that arrived in response.

Same for read request latency -- it's "added" for every data buffer that
arrives, but it's a lengthy process, the _request_ latency should be
accounted once per response. Maybe later we'll want to have "data
latency" metrics as well, but for what we have now it's request latency.

The number of read bytes is accounted properly, so not touched here.
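
The accounting rule can be illustrated as follows. This is a simplified model with hypothetical names, not the S3 client's code: the request count and latency are updated once per request, while bytes are added for every buffer that arrives.

```python
class ReadMetrics:
    """Three read counters, as described above."""

    def __init__(self):
        self.requests = 0
        self.bytes = 0
        self.latency_total = 0.0


def consume_chunked_response(buffers, metrics, request_latency):
    """Account one chunked download whose body arrives as several buffers."""
    # Once per request, regardless of how many buffers follow.
    metrics.requests += 1
    metrics.latency_total += request_latency
    # Once per buffer.
    for buf in buffers:
        metrics.bytes += len(buf)
```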

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25770
2025-09-08 09:49:03 +03:00
Avi Kivity
03ee862b50 cql3: statement_restrictions: forbid querying a single-column or token restriction on a multi-column restriction
In 41880bc893 ("cql3: statement_restrictions: forbid
querying a single-column inequality restriction on a
multi-column restriction"), we removed the ability to constrain
a single column on a tuple inequality, on the grounds that it
isn't used and can't be used.

Here, we extend this to remove the ability to constrain a
single column on a tuple equality, on the grounds that it isn't used
and hampers further refactoring.

CQL supports multi-column equality restrictions in the form

  (ck1, ck2, ck3) = (:v1, :v2, :v3)

This restriction shape is only allowed on clustering keys, and
is translated into a partition_slice allowing the primary index
to efficiently select the part of the partition that satisfies the
restriction.

The possible_lhs_values() function allows extracting
single-column restrictions from this and similar tuple restrictions.
For example, the multi-column restriction

  (ck1, ck2, ck3) = (:v1, :v2, :v3)

implies that ck2 = :v2. If we have an index on ck2, and if we don't
further have a restriction on the partition key, then it is
advantageous to use the index to select rows, and then filter
on ck1 and ck3 to satisfy the full restriction.

However, we never actually do that. The following sequence

```cql
CREATE TABLE ks.t1 (
    pk int,
    ck1 int,
    ck2 int,
    PRIMARY KEY (pk, ck1, ck2)
);

CREATE INDEX ON ks.t1(ck1);

SELECT *
FROM ks.t1
WHERE (ck1, ck2) = (1, 2);
```

could have been used to query a single partition via the index, but instead
is executed as a full table scan, using the partition slice to skip through
unselected rows.

We can't easily start using a new query plan via the index, since
switching plans mid-query (due to paging and moving from one coordinator
to another during upgrade) would cause the sort order to change, therefore
causing some rows to be omitted and some rows to be returned twice.

Similarly, we cannot extract a token restriction from a tuple, since
the grammar doesn't allow for

```cql
WHERE (token(pk)) = (:var1)
```

Since it's not used, remove it.

This code was first introduced in d33053b841 ("cql3/restrictions: Add
free functions over new classes")

It does not directly correspond to pre-expression code.

Closes scylladb/scylladb#25757

Closes scylladb/scylladb#25821
2025-09-07 18:36:05 +03:00
Nadav Har'El
040d6e2245 Merge 'interval: specialize for trivially copyable types' from Avi Kivity
Interval's copy and move constructors are full of branches since the two payload T:s are
optional and therefore have to be optionally-constructed. This can be eliminated for
trivially copyable types (like dht::token) by eliminating interval's user-defined special member
functions (constructors etc) in that special case.

In turn, this enables optimizations in the standard library (and our own containers) that
convert moves/copies of spans of such types into memcpy().

Minor optimization, not a candidate to backport.

Closes scylladb/scylladb#25841

* github.com:scylladb/scylladb:
  test: nonwrapping_interval_test: verify an interval of tokens is trivial
  interval: specialize interval_data<T> for trivial types
  interval: split data members into new interval_data class
2025-09-07 17:10:32 +03:00
Avi Kivity
49b0751980 test: nonwrapping_interval_test: verify an interval of tokens is trivial
Since dht::token is trivial, an interval<dht::token> ought to be trivial
too. Verify that.
2025-09-06 18:41:00 +03:00
Avi Kivity
ed483647a4 interval: specialize interval_data<T> for trivial types
C++ data movement algorithms (std::uninitialized_copy()) and friends
and the containers that use them optimize for trivially copyable
and destructible types by calling memcpy instead of using a loop
around constructors/destructors. Make intervals of trivially
copyable and destructible types also trivially copyable and
destructible by specializing interval_data<T> not to have
user-defined special member functions. This requires that T have
a default constructor since we can't skip construction when
!_start_exists or !_end_exists.

To choose whether we specialize or not, we look at default
constructibility (see above) and trivial destructibility. This is
wider than trivial copyability (a user-defined copy constructor
can exist) but is still beneficial, since the generated copy
constructor for interval_data<T> will be branch-free.

We don't implement the poison words in debug mode; nor are they
necessary, since we no longer manage the lifetime of _start_value
and _end_value manually but let the compiler do that for us.

Note [1] prevents full conversion to memcpy for now, but we still
get branch free code.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121789
2025-09-06 18:38:24 +03:00
Avi Kivity
20751517a4 interval: split data members into new interval_data class
Prepare for specialized handling of trivial types by extracting
the data members of wrapping_interval<T> and the special member
functions (constructors/destructors/assignment) into a new
interval_data<T> template.

To avoid having to refer to data members with a this-> prefix,
add using declarations in wrapping_interval<T>.
2025-09-06 18:31:58 +03:00
Pavel Emelyanov
b26816f80d s3: Export memory usage gauge (metrics)
The memory usage is tracked with the help of a semaphore, so just export
its "consumed" units.

One tricky place here is the need to skip metrics registration for
the scylla-sstable tool. The tool starts the storage manager and
sstables manager on start, and then some of the tool's operations
may want to start both managers again (via the cql environment), causing
a double metrics registration exception.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#25769
2025-09-05 18:25:34 +03:00
Botond Dénes
a96d31e684 Merge 'Update workflow trigger to pull_request_target - fixing fork PR bug' from Dani Tweig
The previous version had a problem: Fork PRs didn't pass the Jira credentials to the main code, which updates the Jira key status.

No need for backport. This is not the Scylla code, but a fix to GitHub Actions.

Closes scylladb/scylladb#25833

* github.com:scylladb/scylladb:
  Change pull_request event to pull_request_target - ready for merge
  Update workflow to use pull_request_target event - in review
  Change pull_request event to pull_request_target - in progress
2025-09-05 18:23:19 +03:00
Anna Stuchlik
f66580a28f doc: add support for i7i instances
This commit adds currently supported i7i and i7ie instances
to the list of instance recommendations.

Fixes https://github.com/scylladb/scylladb/issues/25808

Closes scylladb/scylladb#25817
2025-09-05 14:14:58 +02:00
Andrei Chekun
da4990e338 test.py: add additional level of verbosity for output
Add another level of verbosity: quiet.
Previously it was used as the default level, but it does not provide enough
information. These changes should be coupled with the pytest-sugar plugin to get
the intended information at each level.
2025-09-05 11:54:49 +02:00
Andrei Chekun
7e34d5aa28 test.py: start pytest as a module instead of subprocess
Invoke pytest as a module, instead of as a separate process, to get access to
the terminal and be able to use it interactively.
2025-09-05 11:54:49 +02:00
Pavel Emelyanov
dc44fca67c Revert "test/gossiper: add reproducible test for race condition during node decommission"
This reverts commit 5dac4b38fb as per
request from #25803
2025-09-05 09:56:46 +03:00
Pavel Emelyanov
46aa59fe49 Revert "gossiper: check for a race condition in do_apply_state_locally"
This reverts commit 13392a40d4 as per
request from #25803
2025-09-05 09:56:21 +03:00
Anna Mikhlin
21ee24f7cd trigger-scylla-ci: ignore comment from scylladbbot
Ignore comments posted by scylladbbot, to allow adding instructions to the
CI completion report on how to re-trigger CI.

Closes scylladb/scylladb#25838
2025-09-05 06:18:51 +03:00
dependabot[bot]
862f965196 build(deps): bump sphinx-scylladb-theme from 1.8.7 to 1.8.8 in /docs
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.7 to 1.8.8.
- [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases)
- [Commits](https://github.com/scylladb/sphinx-scylladb-theme/compare/1.8.7...1.8.8)

---
updated-dependencies:
- dependency-name: sphinx-scylladb-theme
  dependency-version: 1.8.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Closes scylladb/scylladb#25823
2025-09-04 18:24:09 +03:00
Nadav Har'El
a1ed2c9d4b Merge 'Allow users to SELECT from CDC log tables they created.' from Dawid Pawlik
Before the patch, a user with CREATE access could create a table with CDC or alter a table to enable CDC, but could not run a SELECT on the CDC log table they created.
This was because the SELECT permission was checked on the CDC log and then on its "parent", the keyspace, but not on the base table, on which the user had the SELECT permission automatically granted on CREATE.

This patch matches the behavior of querying the CDC log to the one implemented for Materialized Views:
1. No new permissions are granted on CREATE.
2. When querying SELECT, the permissions on base table SELECT are checked.
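
The second rule can be sketched as a small lookup. The helper names are hypothetical, permissions are modelled as a plain set of table names, and the `_scylla_cdc_log` suffix is assumed to be how a CDC log table is recognised.

```python
# Assumption: CDC log tables are named after the base table plus this suffix.
CDC_LOG_SUFFIX = "_scylla_cdc_log"


def select_permission_target(table_name):
    """Return the table whose SELECT permission should be checked."""
    if table_name.endswith(CDC_LOG_SUFFIX):
        # For a CDC log, check the base table instead.
        return table_name[: -len(CDC_LOG_SUFFIX)]
    return table_name


def may_select(granted_tables, table_name):
    """granted_tables: set of tables the user holds SELECT on (toy model)."""
    return select_permission_target(table_name) in granted_tables
```

A user holding SELECT on `t1` can then read `t1_scylla_cdc_log` without any extra grant, which is the behaviour the merge describes.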

Fixes: https://github.com/scylladb/scylladb/issues/19798
Fixes: VECTOR-151

Closes scylladb/scylladb#25797

* github.com:scylladb/scylladb:
  cqlpy/test_permissions: run the reproducer tests for #19798
  select_statement: check for access to CDC base table
2025-09-04 16:56:52 +03:00
Dani Tweig
ddac32b656 Change pull_request event to pull_request_target - ready for merge
Fix the fork PRs bug
2025-09-04 12:47:25 +03:00
Dani Tweig
eb0bb0f3a0 Update workflow to use pull_request_target event - in review
Fix a fork PRs bug.
2025-09-04 12:42:52 +03:00
Dani Tweig
4c460464b8 Change pull_request event to pull_request_target - in progress
Fix fork PRs bug.
2025-09-04 12:41:29 +03:00
Botond Dénes
db72430d82 Merge 'Don't leave pre-scrub snapshot on API error' from Pavel Emelyanov
The pre-scrub snapshot is taken in the middle of parsing options from the request. If the post-snapshot part of the parsing throws (it can do so if the "quarantine_mode" value is not recognized), the snapshot remains on disk, but the API call fails.

The fix is to move snapshot taking out of the parse_scrub_options() helper. It could be moved at the end of it, but the helper name doesn't tell that it also takes a snapshot, so no. After the fix the helper in question can be simplified further.
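
The ordering change can be illustrated with a short sketch. Function names other than `parse_scrub_options` are hypothetical, and the set of accepted `quarantine_mode` values is an assumption made for the example.

```python
# Assumption: an illustrative set of accepted values.
QUARANTINE_MODES = ("include", "exclude", "only")


def parse_scrub_options(request):
    """Pure parsing: validates the request and has no side effects."""
    mode = request.get("quarantine_mode", "include")
    if mode not in QUARANTINE_MODES:
        raise ValueError(f"unknown quarantine_mode: {mode!r}")
    return {"quarantine_mode": mode}


def scrub(request, take_snapshot):
    """Parse everything first; take the snapshot only once parsing succeeded."""
    options = parse_scrub_options(request)  # may raise; nothing is on disk yet
    take_snapshot()
    return options
```

With this order, a request carrying an unrecognized value fails before `take_snapshot` is ever called.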

The issue exists in older versions, but likely doesn't reveal itself for real, so it doesn't look worthwhile to backport it.

Closes scylladb/scylladb#25824

* github.com:scylladb/scylladb:
  api: Simplify parse_scrub_options() helper
  api: Take snapshot after parsing scrub options
2025-09-04 12:13:16 +03:00
Avi Kivity
169092b340 Merge 'pgo: add auth connections stress workload' from Marcin Maliszkiewicz
This series improves the pgo workloads by enabling authentication and authorization and adding new
stress scenarios.

- Enables auth in training clusters
  All training workloads now run with auth enabled, following best
  practices and avoiding config proliferation.

- Adds auth connections stress workload
  Introduces a workload that uses derived roles and permissions, stressing auth
  code paths while also creating a new connection per request to exercise
  server transport handling.

- Enables counters workload
  The counters workload is re-enabled without introducing extra dependencies on
  cqlsh. Instead, a lightweight exec_cql.py wrapper (shared with the auth
  workload) handles preparation statements.

Backport: no, it's not a bug fix

----------------------------------------------------------
Performance results for auth PGO: there seems to be no difference, or it is too small to measure:

scylladb pgo_auth ≡ ◦ ⤖ python3 ./pgo/auth_conns_stress.py localhost cassandra cassandra 10000 100 &
scylladb pgo_auth ≡ ◦ ⤖ perf stat -e instructions --timeout 5000 -p 51591

on both before and after instructions counter varies from 179,818,558,011 to 180,664,528,198.

----------------------------------------------------------
Performance results for counters PGO: notably improved, by 16-22% for the write workload and 4-5% for the read workload:

scylladb pgo_auth ≡ ◦ ⤖ ./scylla_master perf-simple-query 2> /dev/null --counters --write
random-seed=3839439576
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=yes}
Disabling auto compaction
2413435.37 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   51167 insns/op,   33157 cycles/op,        0 errors)
2413009.40 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   51053 insns/op,   33009 cycles/op,        0 errors)
2403794.31 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   50867 insns/op,   32899 cycles/op,        0 errors)
2384572.52 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   50562 insns/op,   32811 cycles/op,        0 errors)
2195388.31 tps (122.6 allocs/op,   8.0 logallocs/op,  14.1 tasks/op,   51818 insns/op,   34504 cycles/op,        0 errors)
throughput:
	mean=   2362039.98 standard-deviation=93892.61
	median= 2403794.31 median-absolute-deviation=50969.42
	maximum=2413435.37 minimum=2195388.31
instructions_per_op:
	mean=   51093.44 standard-deviation=465.18
	median= 51052.98 median-absolute-deviation=226.61
	maximum=51818.04 minimum=50562.30
cpu_cycles_per_op:
	mean=   33275.85 standard-deviation=698.65
	median= 33008.58 median-absolute-deviation=377.16
	maximum=34504.13 minimum=32811.18

scylladb pgo_auth ≡ ◦ ⤖ ./scylla_master perf-simple-query 2> /dev/null --counters
random-seed=1134551638
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=yes}
Disabling auto compaction
Creating 10000 partitions...
5499534.56 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   21463 insns/op,   14902 cycles/op,        0 errors)
5478913.87 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   21385 insns/op,   14839 cycles/op,        0 errors)
5346525.04 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.1 tasks/op,   21454 insns/op,   15082 cycles/op,        0 errors)
5467947.74 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   21275 insns/op,   14775 cycles/op,        0 errors)
5454894.98 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   21250 insns/op,   14766 cycles/op,        0 errors)
throughput:
	mean=   5449563.24 standard-deviation=59878.80
	median= 5467947.74 median-absolute-deviation=29350.63
	maximum=5499534.56 minimum=5346525.04
instructions_per_op:
	mean=   21365.28 standard-deviation=98.95
	median= 21384.65 median-absolute-deviation=90.57
	maximum=21463.17 minimum=21250.33
cpu_cycles_per_op:
	mean=   14872.93 standard-deviation=129.26
	median= 14838.65 median-absolute-deviation=97.52
	maximum=15082.44 minimum=14766.13

scylladb pgo_auth ≡ ◦ ⤖ ./scylla_pgo_counters perf-simple-query 2> /dev/null --counters --write
random-seed=437758611
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=yes}
Disabling auto compaction
2950968.10 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   41540 insns/op,   27097 cycles/op,        0 errors)
2923325.10 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   41366 insns/op,   27017 cycles/op,        0 errors)
2928666.67 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   41274 insns/op,   26929 cycles/op,        0 errors)
2918378.39 tps (122.1 allocs/op,   8.0 logallocs/op,  14.0 tasks/op,   41165 insns/op,   26880 cycles/op,        0 errors)
2209053.17 tps (128.4 allocs/op,   8.0 logallocs/op,  14.6 tasks/op,   48176 insns/op,   34726 cycles/op,        0 errors)
throughput:
	mean=   2786078.28 standard-deviation=322807.25
	median= 2923325.10 median-absolute-deviation=142588.38
	maximum=2950968.10 minimum=2209053.17
instructions_per_op:
	mean=   42704.41 standard-deviation=3062.05
	median= 41366.40 median-absolute-deviation=1430.69
	maximum=48176.45 minimum=41165.23
cpu_cycles_per_op:
	mean=   28529.92 standard-deviation=3464.99
	median= 27016.51 median-absolute-deviation=1601.02
	maximum=34726.49 minimum=26880.18

scylladb pgo_auth ≡ ◦ ⤖ ./scylla_pgo_counters 2> /dev/null --counters
random-seed=4277130772
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=yes}
Disabling auto compaction
Creating 10000 partitions...
5691320.62 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   20656 insns/op,   14279 cycles/op,        0 errors)
5708878.25 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   20486 insns/op,   14104 cycles/op,        0 errors)
5727060.22 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   20439 insns/op,   14044 cycles/op,        0 errors)
5700157.92 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   20416 insns/op,   14054 cycles/op,        0 errors)
5610730.84 tps ( 73.0 allocs/op,   0.0 logallocs/op,  18.0 tasks/op,   20459 insns/op,   14195 cycles/op,        0 errors)
throughput:
	mean=   5687629.57 standard-deviation=44972.99
	median= 5700157.92 median-absolute-deviation=21248.68
	maximum=5727060.22 minimum=5610730.84
instructions_per_op:
	mean=   20491.27 standard-deviation=95.74
	median= 20459.35 median-absolute-deviation=52.13
	maximum=20656.27 minimum=20415.95
cpu_cycles_per_op:
	mean=   14135.25 standard-deviation=100.27
	median= 14104.47 median-absolute-deviation=81.48
	maximum=14278.97 minimum=14043.76
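
The summary lines can be recomputed from the five per-iteration figures. Below is a minimal sketch in Python, using the throughput samples of the last run above. It is not the benchmark's own code; the `summarize` helper is hypothetical. Judging only from the printed figures, the standard deviation is the sample one (n - 1), and the reported median-absolute-deviation matches deviations measured from the mean, not from the median.

```python
import statistics

# Throughput samples (tps) copied from the last run shown above.
samples = [5691320.62, 5708878.25, 5727060.22, 5700157.92, 5610730.84]


def summarize(values):
    """Recompute the summary block for one metric (hypothetical helper)."""
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        # Sample standard deviation (divides by n - 1).
        "standard-deviation": statistics.stdev(values),
        "median": statistics.median(values),
        # Assumption inferred from the printed numbers: the deviations
        # are taken around the mean, then their median is reported.
        "median-absolute-deviation": statistics.median(
            abs(v - mean) for v in values
        ),
        "maximum": max(values),
        "minimum": min(values),
    }


summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}={value:.2f}")
```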

Closes scylladb/scylladb#25651

* github.com:scylladb/scylladb:
  pgo: add links to issues about tablet missing features
  pgo: enable counters workload
  pgo: add auth connections stress workload
  pgo: enable auth in training clusters
2025-09-04 11:46:39 +03:00
Pavel Emelyanov
b86b4fc251 api: Simplify parse_scrub_options() helper
It no longer needs to be a coroutine, nor does it need the snapshot_ctl
reference argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-03 19:06:31 +03:00
Pavel Emelyanov
ee4197fa80 api: Take snapshot after parsing scrub options
Parsing scrub options may throw after a snapshot is taken, thus leaving
it on disk even though the operation is reported as "failed". Probably not
critical, but not nice either.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-03 19:05:50 +03:00
Marcin Maliszkiewicz
2109110037 pgo: add links to issues about tablet missing features 2025-09-03 15:43:52 +02:00
Marcin Maliszkiewicz
8aa2825caa pgo: enable counters workload
It was not enabled because of a missing cqlsh dependency.
After 3 years it's hard to say whether that has been fixed,
but in any case we don't need another big dependency when the
Python driver is already used extensively in tests. We use a simple
wrapper file, exec_cql.py, shared with the auth_conns workload, to
conveniently read the required preparation statements from a file.

Additionally, we switch tablets off, as counters don't support
them yet.
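
A wrapper of this kind might look roughly like the sketch below. It is not the actual exec_cql.py: the `read_statements` and `exec_statements` names are hypothetical, and the splitting on `;` is deliberately naive (it does not handle semicolons inside string literals). `session` is assumed to be any object with an `execute()` method, for instance a session from the Python driver.

```python
def read_statements(text):
    """Split CQL text into statements, skipping blank lines and `--` comments."""
    lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("--")
    ]
    # Naive split: assumes no ';' appears inside string literals.
    return [stmt.strip() for stmt in "\n".join(lines).split(";") if stmt.strip()]


def exec_statements(session, text):
    """Execute every statement in order; returns how many were run."""
    statements = read_statements(text)
    for stmt in statements:
        session.execute(stmt)
    return len(statements)


class RecordingSession:
    """Stand-in for a driver session, used only to demonstrate the flow."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)


demo_cql = """
-- preparation statements (illustrative only)
CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1};
CREATE TABLE ks.c (
    p int PRIMARY KEY,
    n counter
);
"""

demo_session = RecordingSession()
demo_count = exec_statements(demo_session, demo_cql)
```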
2025-09-03 15:43:51 +02:00
Marcin Maliszkiewicz
09476a4df8 pgo: add auth connections stress workload
It uses some derived roles and permissions
to exercise auth code paths, and it creates a new
connection for each stress request to also exercise the
connection handling code in transport/server.cc.
2025-09-03 15:43:51 +02:00
Marcin Maliszkiewicz
f2270034ec pgo: enable auth in training clusters
As it's best practice to use auth, and we don't
want to have 2^n configs to train, we just enable
auth for every workload.
2025-09-03 15:29:27 +02:00
Dawid Mędrek
d2c5268196 cql3: Produce CREATE MATERIALIZED VIEW statement when describing MV of index
Before this change, executing `DESCRIBE MATERIALIZED VIEW` on the underlying
materialized view of a secondary index would produce a `CREATE INDEX` statement.
It was not only confusing, but it also prevented users from learning
the definition of the view. The only way to do so was to query system tables.

We change that behavior and produce a `CREATE MATERIALIZED VIEW` statement
instead. The statement is printed as a comment to implicitly convey that
the user should not attempt to execute it to restore the view. A short comment
is provided to make it clearer.

Before this commit:

```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE MATERIALIZED VIEW ks.i;

CREATE INDEX i ON ks.t(v);
```

After this commit:

```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE MATERIALIZED VIEW ks.i;

/* Do NOT execute this statement! It's only for informational purposes.
   This materialized view is the underlying materialized view of a secondary
   index. It can be restored via restoring the index.

CREATE MATERIALIZED VIEW ks.i_index [...];

*/
```

Note that describing the base table has not been affected and still works
as follows:

```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE TABLE ks.t;

CREATE TABLE ks.t (
    p int,
    v int,
    PRIMARY KEY (p)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'IncrementalCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99.0PERCENTILE'
    AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'};

CREATE INDEX i ON ks.t(v);
```

We also provide two reproducers of scylladb/scylladb#24610.

Fixes scylladb/scylladb#24610

Closes scylladb/scylladb#25697
2025-09-03 15:21:37 +02:00
Piotr Dulikowski
f95808cbe7 Merge 'cdc/generation: Clone topology_description asynchronously' from Dawid Mędrek
An instance of `cdc::topology_description` can be quite big. The vector
it consists of stores as many `token_range_description`s as there are
vnodes, and the size of each `token_range_description` is O(#shards).

Because of that, copying an instance of the type can lead to reactor
stalls. To prevent that, we introduce an asynchronous function copying
the contents of the object.

Reactor stalls were detected in the call to `map_reduce` in
`generation_service::legacy_do_handle_cdc_generation`, so let's start
using the new function there.

A similar scenario occurs in `generation_service::handle_cdc_generation`,
so we modify it too.

Unfortunately, it doesn't seem viable to provide a reproducer of said
problem.
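
The general idea of copying a large container without monopolizing the event loop can be illustrated with a small Python asyncio analogy. This is only a sketch of the technique, not the function added by the series (which is C++); `clone_in_chunks` and `chunk_size` are hypothetical names.

```python
import asyncio


async def clone_in_chunks(entries, chunk_size=128):
    """Copy a list of lists, yielding to the event loop after each chunk."""
    result = []
    for start in range(0, len(entries), chunk_size):
        for entry in entries[start:start + chunk_size]:
            # Copy the per-entry payload so the clone shares no inner lists.
            result.append(list(entry))
        # Give other scheduled tasks a chance to run before the next chunk.
        await asyncio.sleep(0)
    return result


# One entry per (made-up) token range, each holding a few per-shard values.
original = [[token, token + 1, token + 2] for token in range(1000)]
copy = asyncio.run(clone_in_chunks(original, chunk_size=100))
```

The chunk size trades copy throughput against how long a single step can keep the loop busy.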

Fixes scylladb/scylladb#24522

Backport: none. Reactor stalls are not critical.

Closes scylladb/scylladb#25730

* github.com:scylladb/scylladb:
  cdc/generation: Delete copy constructors of topology_description
  cdc/generation: Clone topology_description asynchronously
2025-09-03 13:41:58 +02:00
Dawid Pawlik
5e72d71188 cqlpy/test_permissions: run the reproducer tests for #19798
Since the previous commit fixes the issue, we can remove the xfail mark.
The tests should pass now.
2025-09-03 13:20:39 +02:00
Dawid Pawlik
be54346846 select_statement: check for access to CDC base table
Before this patch, a user with CREATE access could create a table
with CDC, or alter a table to enable CDC, but could not run
a SELECT on the CDC log table they created.
This was because the SELECT permission was checked on
the CDC log and then on its "parent", the keyspace,
but not on the base table, on which the user had the SELECT permission
automatically granted on CREATE.

This patch aligns the behaviour of querying the CDC log
with the one implemented for Materialized Views:
    1. No new permissions are granted on CREATE.
    2. On SELECT, the SELECT permission on the base table
is checked.
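
The rule can be sketched in Python as follows. This is illustrative only and not the code in select_statement: the helper names are hypothetical, permissions are modelled as a plain set of (keyspace, table) pairs, and the sketch assumes that a CDC log table can be recognized by the `_scylla_cdc_log` name suffix, whereas the real check presumably consults the schema.

```python
# Assumption: CDC log tables are named "<base table>_scylla_cdc_log".
CDC_LOG_SUFFIX = "_scylla_cdc_log"


def table_to_check(table_name):
    """Return the table whose SELECT permission governs a query on table_name."""
    if table_name.endswith(CDC_LOG_SUFFIX) and len(table_name) > len(CDC_LOG_SUFFIX):
        # Querying the CDC log: check the base table instead.
        return table_name[: -len(CDC_LOG_SUFFIX)]
    return table_name


def can_select(granted, keyspace, table_name):
    """granted is a set of (keyspace, table) pairs that carry SELECT."""
    return (keyspace, table_to_check(table_name)) in granted


# The creator of ks.t gets SELECT on ks.t only; nothing is granted on the log.
granted_select = {("ks", "t")}
```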

Fixes: #19798
2025-09-03 13:20:39 +02:00
Botond Dénes
6116f9e11b Merge 'Compaction tasks progress' from Aleksandra Martyniuk
Determine the progress of compaction tasks that have
children.

The progress of a compaction task is calculated using the default
get_progress method. If the expected_total_workload method is
implemented, the default progress is computed as:
(sum of child task progresses) / (expected total workload)

If expected_total_workload is not defined, progress is estimated based
on the progress of the children. However, in this case, the total progress
may increase over time as the task executes.

All compaction tasks, except for reshape tasks, implement the
expected_children_number method. To compute expected_total_workload,
iterate over all SSTables covered by the task and sum their sizes. Note
that expected_total_workload is just an approximation and the real workload
may differ if the set of SSTables for the keyspace/table/compaction group changes.

Reshape tasks are an exception, as their scope is determined during
execution. Hence, for these tasks expected_total_workload isn't defined
and their progress (both total and completed) is determined based
on currently created children.
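
The two cases described above can be sketched in Python. This is a simplified model, not the task manager code: `Progress` and `parent_progress` are hypothetical names, and the workload unit is assumed to be bytes of SSTable data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Progress:
    completed: float
    total: float


def parent_progress(
    children: Sequence[Progress],
    expected_total_workload: Optional[float],
) -> Progress:
    """Combine child progress into the parent's progress."""
    completed = sum(child.completed for child in children)
    if expected_total_workload is not None:
        # Known scope: the total is fixed up front.
        return Progress(completed, expected_total_workload)
    # Unknown scope (e.g. reshape): the total is whatever the children
    # created so far report, so it may grow while the task runs.
    return Progress(completed, sum(child.total for child in children))


children = [Progress(completed=40.0, total=100.0), Progress(completed=10.0, total=50.0)]
with_estimate = parent_progress(children, expected_total_workload=400.0)
without_estimate = parent_progress(children, expected_total_workload=None)
```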

Fixes: https://github.com/scylladb/scylladb/issues/8392.
Fixes: https://github.com/scylladb/scylladb/issues/6406.
Fixes: https://github.com/scylladb/scylladb/issues/7845.

New feature, no backport needed

Closes scylladb/scylladb#15158

* github.com:scylladb/scylladb:
  test: add compaction task progress test
  compaction: set progress unit for compaction tasks
  compaction: find expected workload for reshard tasks
  compaction: find expected workload for global cleanup compaction tasks
  compaction: find expected workload for global major compaction tasks
  compaction: find expected workload for keyspace compaction tasks
  compaction: find expected workload for shard compaction tasks
  compaction: find expected workload for table compaction tasks
  compaction: return empty progress when compaction_size isn't set
  compaction: update compaction_data::compaction_size at once
  tasks: do not check expected workload for done task
2025-09-03 13:23:42 +03:00