Compare commits

..

649 Commits

Author SHA1 Message Date
Yaron Kaikov
dcbc6c839d ./github/scripts/auto-backport.py: don't remove backport label when backport process has an error
Today, when the `Fixes` prefix is missing or the developer is not a collaborator with `scylladbbot` we remove the backport labels to prevent the process from starting and notifying the developers.

Developers are worried that removing these backport labels will cause us to forget we need to do these backports. @nyh suggested to add a `scylladbbot/backport_error` label instead

Applied those changes, so when a `Fixes` prefix is missing we will add a `scylladbbot/backport_error` label and stop the process

When a user doesn't accept the invite we will still open the PR but he will not be assigned and will not be able to edit the branch when we have conflicts

Fixes: https://github.com/scylladb/scylla-pkg/issues/4898
Fixes: https://github.com/scylladb/scylla-pkg/issues/4897
2025-03-18 13:58:59 +02:00
Avi Kivity
f18e8edcb7 Merge 'dist/docker: switch to UBI9' from Takuya ASADA
Switch container base image to UBI9, and make it ready for Red Hat
OpenShift Certification.

Fixes https://github.com/scylladb/scylla-pkg/issues/4858

Closes scylladb/scylladb#22910

* github.com:scylladb/scylladb:
  dist/docker: run the container as non-root user
  dist/docker: switch to UBI9
2025-03-10 15:33:30 +02:00
Luis Freitas
09e790d5af .github: Update github action for triggering next gating
Before we were using a marketplace Github action which had some limitations.
With this pull request we are updating the github action using curl option which will gives us full control of the flow instead of relying on pre made github action.

Fixes: scylladb#23088

Closes scylladb/scylladb#23215
2025-03-10 14:38:08 +02:00
Piotr Szymaniak
b6ba573dfe HACKING.md: Provide step-by-step support to enable development with CLion
Claim that building with CMake files is just 'not supported' instead of
not intended, especially that there are attempts to enable this.

Remove the obsolete mention of the `FOR_IDE` flag.

Closes scylladb/scylladb#22890
2025-03-09 16:22:24 +02:00
Ernest Zaslavsky
6a3cef5703 metadata: Correct "DESCRIBE" output for keyspace metadata
Update the "DESCRIBE" command output to accurately display `tablet` settings in keyspace metadata.

Closes scylladb/scylladb#23056
2025-03-09 14:50:08 +02:00
Anna Stuchlik
9ac0aa7bba doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348
2025-03-07 16:39:02 +01:00
Kefu Chai
2a9966a20e gms: Fix fmt formatter for gossip_digest_sync
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.

This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.

Fixes #23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23155
2025-03-07 15:36:03 +01:00
Robert Bindar
27f2d64725 Remove object storage config credentials provider
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to exopose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#22951
2025-03-07 10:40:58 +03:00
Luis Freitas
84b30d11ec .github: trigger Jenkins job using github action
This action will help preventing next-trigger for running every 15 minutes.

This action will run on push for a specific branch (next, next-enterprise, 2024.x, x.x)

Fixes: scylladb#23088

update action

Closes scylladb/scylladb#23141
2025-03-07 06:41:58 +02:00
Avi Kivity
28906c9261 Merge 'scylla-sstable: introduce the query command' from Botond Dénes
The scylla-sstable dump-* command suite has proven invaluable  in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data`  is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.

Select everything from a table:

    $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
     keyspace_name                 | durable_writes | replication
    -------------------------------+----------------+-------------------------------------------------------------------------------------
            system_replicated_keys |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})
                       system_auth |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
                     system_schema |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                system_distributed |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
                            system |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                                ks |           true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
                     system_traces |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
     system_distributed_everywhere |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})

Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):

    $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
    [
      {
        "keyspace_name": "system_schema",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      },
      {
        "keyspace_name": "system",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      }
    ]

Select a specific field in a specific partition using the command-line:

    $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

Select a specific field in a specific partition using ``--query-file``:

    $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
    $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

New functionality: no backport needed.

Closes scylladb/scylladb#22007

* github.com:scylladb/scylladb:
  docs/operating-scylla: document scylla-sstable query
  test/cqlpy/test_tools.py: add tests for scylla-sstable query
  test/cqlpy/test_tools.py: make scylla_sstable() return table name also
  scylla-sstable: introduce the query command
  tools/utils: get_selected_operation(): use std::string for operation_options
  utils/rjson: streaming_writer: add RawValue()
  cql3/type_json: add to_json_type()
  test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
2025-03-06 13:42:45 +02:00
Botond Dénes
1139cf3a98 Merge 'Speed up (and generalize) the way API calculates sstable disk usage' from Pavel Emelyanov
There are several API endpoints that walk a specific list of sstables and sum up their bytes_on_disk() values. All those endpoints accumulate a map of sstable names to their sizes, then squashe the maps together and, finally, sum up the map values to report it back. Maintaining these intermediate collections is the waste of CPU and memory, the usage values can be summed up instantly.

Also add a test for per-cf endpoints to validate the change, and generalize the helper functions while at it.

Closes scylladb/scylladb#23143

* github.com:scylladb/scylladb:
  api: Generalize disk space counting for table and system
  api: Use map_reduce_cf_raw() overload with table name
  api: Don't collect sstables map to count disk space usage
  test: Add unit test for total/live sstable sizes
2025-03-06 11:26:35 +02:00
Raphael S. Carvalho
fedd838b9d replica: Fix race of some operations like cleanup with snapshot
There are two semaphores in table for synchronizing changes to sstable list:

sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.

sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.

they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.

problem:

A = tablet cleanup
B = take_snapshot()

1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk

Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.

But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.

That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.

The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.

So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.

Fixes #23049.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23117
2025-03-06 11:00:48 +02:00
Pavel Emelyanov
86b3e9b50b code: Move checked-file-impl.hh to util/
fixes: #22100

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23123
2025-03-06 10:22:05 +02:00
Petr Hála
f3c3eb6ae3 doc: Fix object_storage_config_file option
It needs to use underscores, not dash

Closes scylladb/scylladb#23161
2025-03-06 10:30:51 +03:00
Pavel Emelyanov
e7d1ea3ab6 commitlog: Use shorter input stream creation overload
There's one that doesn't need the offset argument when it's 0

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23140
2025-03-06 08:06:42 +01:00
Botond Dénes
49d6bf8947 Merge 'main: safely check stop_signal in-between starting services' from Benny Halevy
To simplify aborting scylla while starting the services,
add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

This patch prevents gate_closed_exception to escape handling
when start-up is aborted early with the stop signal,
causing https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c

Fixes https://github.com/scylladb/scylladb/issues/23153

* Requires backport to 2025.1 due to a25c3eaa1c

Closes scylladb/scylladb#23103

* github.com:scylladb/scylladb:
  main: add checkpoints
  main: safely check stop_signal in-between starting services
  main: move prometheus start message
  main: move per-shard database start message
2025-03-06 08:28:29 +02:00
Takuya ASADA
781dec5852 dist/docker: run the container as non-root user
Since it is requirement for Red Hat OpenShift Certification, we need to
run the container as non-root user.

Related scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Takuya ASADA
1abf981a73 dist/docker: switch to UBI9
Switch container base image to UBI9, to prepare for Red Hat OpenShift
Certification.

Fixes scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Benny Halevy
b6705ad48b main: add checkpoints
Before starting significant services that didn't
have a corresponding call to supervisor::notify
before them.

Fixes #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:29:34 +02:00
Benny Halevy
feef7d3fa1 main: safely check stop_signal in-between starting services
To simplify aborting scylla while starting the services,
Add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

Refs #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:15:17 +02:00
Benny Halevy
282ff344db main: move prometheus start message
The `prometheus_server` is started only conditionally
but the notification message is sent and logged
unconditionally.
Move it inside the condtional code block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Benny Halevy
23433f593c main: move per-shard database start message
It is now logged out of place, so move it to right before
calling `start` on every database shard.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Nadav Har'El
e0f24c03e7 Merge 'test.py: merge all 'Topology' suite types int one folder 'cluster'' from Artsiom Mishuta
Now that we support suite subfolders, there is no
need to create an own suite for object_store and auth_cluster, topology, topology_custom.
this PR merge all these folders into one: 'cluster"

this pr also introduce and apply 'prepare_3_nodes_cluster' fixture  that  allow preparing non-dirty 3 nodes cluster
that can be reused between tests(for tests that was in topology folder)

number of tests in master
release -3461
dev       -3472
debug   -3446

number of tests in this PR
release -3460
dev       -3471
debug   -3445

There is a minus one test in each mode because It was 2 test_topology_failure_recovery files(topology and topology_custom) with the same utility functions but different test cases. This PR merged them into one

Closes scylladb/scylladb#22917

* github.com:scylladb/scylladb:
  test.py: merge object_store into cluster folder
  test.py: merge auth_cluster into cluster folter
  test.py: rename topology_custom folder to cluster
  test.py: merge topology test suite into topology_custom
  test.py delete conftest in topology_custom
  test.py apply prepare_3_nodes_cluster in topology
  test.py: introduce prepare_3_nodes_cluster marker
2025-03-04 19:26:32 +02:00
Pavel Emelyanov
c084de1406 api: Generalize disk space counting for table and system
Now when the bodies of both map-reduce reducers are the same, they can
be generalized with each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:56:16 +03:00
Pavel Emelyanov
4e2abba5a1 api: Use map_reduce_cf_raw() overload with table name
The existing helper that counds disk space usage for a table map-reduces
the table object "by hand". Its peer that counts the usage for all
tables uses the map_reduce_cf_raw() helper. The latter exists for
specific table as well, so the first counter can benefit from using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:55:05 +03:00
Pavel Emelyanov
b43e2390db api: Don't collect sstables map to count disk space usage
All the API calls that collect disk usage of sstables accumulate
map<sstable name, disk size>, then merges shard maps into one, then
counts the "disk size" values and drops the map itself on the floor.
This is waste of CPU cycles, disk usage can be just summed up along
cf/sstables iterations, no need to accumulate map with names for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:53:42 +03:00
Pavel Emelyanov
a8fc1d64bc test: Add unit test for total/live sstable sizes
The pair of column_family/metrics/(total|live)_disk_space_used/{name}
reports the disk usage by sstables. The test creates table, populates,
flushes and checks that the size corresonds to what stat(2) reports for
the respective files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:52:33 +03:00
Patryk Jędrzejczak
c13b6c91d3 Merge 'raft topology: drop changing the raft voters config via storage_service' from Emil Maskovsky
For the limited voters feature to work properly we need to make sure that we are only managing the voter status through the topology coordinator. This means that we should not change the node votership from the storage_service module for the raft topology directly.

We can drop the voter status changes from the storage_service module because the topology coordinator will handle the votership changes eventually. The calls in the storage_service module were not essential and were only used for optimization (improving the HA under certain conditions).
Furthermore, the other bundled commit improves the reaction again by reacting to the node `on_up()` and `on_down()` events, which again shortens the reaction time and improves the HA.

The change has effect on the timing in the tablets migration test though, as it previously relied on the node being made non-voter from the service_storage `raft_removenode()` function. The fix is to add another server to the topology to make sure we will keep the quorum.

Previously the test worked because the test waits for an injection to be reached and it was ensured that the injection (log line) has only been triggered after the node has been made non-voter from the `raft_removenode()`. This is not the case anymore. An alternative fix would be to wait for the first node to be made non-voter before stopping the second server, but this would make the test more complex (and it is not strictly required to only use 4 servers in the test, it has been only done for optimization purposes).

Fixes: scylladb/scylladb#22860

Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969

No backport: Part of the limited voters new feature, so this shouldn't to be backported.

Closes scylladb/scylladb#22847

* https://github.com/scylladb/scylladb:
  raft: use direct return of future for `run_op_with_retry`
  raft: adjust the voters interface to allow atomic changes
  raft topology: drop removing the node from raft config via storage_service
  raft topology: drop changing the raft voters config via storage_service
2025-03-04 13:59:47 +01:00
Nadav Har'El
d096aac200 test/cqlpy/run: reduce number of tablets
In commit 2463e524ed, Scylla's default changed
from starting with one tablet per shard to starting 10 per shard. The
functional tests don't need more tablets and it can only slow down the
tests, so the patch added --tablets-initial-scale-factor=1 to test/*/suite.yaml
but forgot to add it to test/cqlpy/run.py (to affect test/cqlpy/run) so
this patch does this now.

This patch should *only* be about making tests faster, although to be
honest, I don't see any measurable improvement in test speed (10 isn't
so many). But, unfortunately, this is only part of the story. Over time
we allowed a few cqlpy tests to be written in a way that relies on having
only a small number of tablets or even exactly one tablet per shard (!).
These tests are buggy and should be fixed - see issues #23115 and #23116
as examples. But adding the option --tablets-initial-scale-factor=1 also
to run.py will make these bugs not affect test/cqlpy/run in the same way
as it doesn't affect test.py.

These buggy tests will still break with `pytest cqlpy` against a Scylla
you ran yourself manually, so eventually will still need to fix those
test bugs.

Refs #23115
Refs #23116

Closes scylladb/scylladb#23125
2025-03-04 15:39:21 +03:00
Asias He
60913312af repair: Enable small table optimization for system_replicated_keys
This enterprise-only system table is replicated and small. It should be
included for small table optimization.

Fixes scylladb/scylla-enterprise#5256

Closes scylladb/scylladb#23135
2025-03-04 12:40:56 +02:00
Artsiom Mishuta
97a620cda9 test.py: merge object_store into cluster folder
Now that we support suite subfolders, there is no
need to create an own suite for object_store
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
a283b391c2 test.py: merge auth_cluster into cluster folter
Now that we support suite subfolders, there is no
need to create an own suite for auth_cluster
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
rename topology_custom folder to cluster
as it contains not only topology test cases
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
d8e17c4356 test.py: merge topology test suite into topology_custom
Now that we support suite subfolders, there is no
need to create an own suite for topology
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
ef62dfa6a9 test.py delete conftest in topology_custom
delete conftest in the sepatate commi for brtter diff listing during
merge topology_custom and topology
2025-03-04 10:32:43 +01:00
Artsiom Mishuta
cf48444e3b test.py apply prepare_3_nodes_cluster in topology
apply prepare_3_nodes_cluster for all tests in the topology folder
via applying mark at the test module level using pytestmark
https://docs.pytest.org/en/stable/example/markers.html#marking-whole-classes-or-modules

set initial initial_size for topology folder to 0
2025-03-04 10:32:43 +01:00
Artsiom Mishuta
20777d7fc6 test.py: introduce prepare_3_nodes_cluster marker
prepare_3_nodes_cluster marker will allow preparing non-dirty 3 nodes cluster
that can be reused between tests
2025-03-04 10:32:43 +01:00
Nadav Har'El
a56751e71b test/cqlpy: fix test assuming just one tablet
The cqlpy test test_compaction.py::test_compactionstats_after_major_compaction
was written to assume we have just one tablet per shard - if there are many
tablets compaction splitting the data, the test scenario might not need
compaction in the way that the test assumes it does.

Recently (commit 2463e524ed) Scylla's default
was changed to have 10 tablets per shard - not one. This broke this test.
The same commit modified test/cqlpy/suite.yaml, but that affects only test.py
and not test/cqlpy/run, and also not manual runs against a manually-installed
Scylla. If this test absolutely requires a keyspace with 1 and not 10
tablets, then it should create one explicitly. So this is what this test
does (but only if tablets are in use; if vnodes are used that's fine
too).

Before this patch,
  test/cqlpy/run test_compaction.py::test_compactionstats_after_major_compaction
fails. After the patch, it passes.

Fixes #23116

Closes scylladb/scylladb#23121
2025-03-04 10:15:29 +02:00
Kefu Chai
a43072a21e cql3,test: replace boost::range::adjacent_find with std::ranges
to reduce third-party dependencies and modernize the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22998
2025-03-04 10:08:02 +02:00
Artsiom Mishuta
d7f9c5654b test.py: change test uname
This commit change the test uname replacement fron "_" to "." to be able support sub-folders in
scylla-pkg scripts logic

Closes scylladb/scylladb#23130
2025-03-04 09:58:58 +02:00
Wojciech Mitros
dae7221342 rust: update dependencies
The currently used versions of "wasmtime", "idna", "cap-std" and
"cap-primitives" packages had low to moderate security issues.
In this patch we update the dependencies to versions with these
issues fixed.
The update was performed by changing the "wasmtime" (and "wasmtime-wasi")
version in rust/wasmtime_bindings/Cargo.toml and updating rust/Cargo.lock
using the "cargo update" command with the affected package. To fix an
issue with different dependencies having different versions of
sub-dependencies, the package "smallvec" was also updated to "1.13.1".
After the dependency update, the Rust code also needed to be updated
because of the slightly changed API. One Wasm test case needed to be
updated, as it was actually using an incorrect Wat module and not
failing before. The crate also no longer allows multiple tables in
Wasm modules by default - it is now enabled by setting the "gc" crate
feature and configuring the Engine with config.wasm_reference_types(true).

Fixes https://github.com/scylladb/scylladb/issues/23127

Closes scylladb/scylladb#23128
2025-03-04 09:45:23 +02:00
Pavel Emelyanov
e4e15a00b7 Merge 'reader_concurrency_semaphore: register_inactive_read(): handle aborted permit' from Botond Dénes
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.

Fixes: scylladb/scylladb#22919

Bug is present in all live versions, backports are required.

Closes scylladb/scylladb#23044

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
  test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
2025-03-04 10:40:28 +03:00
Botond Dénes
71d8b7aa9f querier: demote tombstone warning for range-scans to debug level
Range scans are expected to go though lots of tombstones, no need to
spam the logs about this. The tombstone warning log is demoted to debug
level, if somebody wants to see it they can bump the logger to debug
level.

Fixes: https://github.com/scylladb/scylladb/issues/23093

Closes scylladb/scylladb#23094
2025-03-04 10:38:06 +03:00
Kefu Chai
a483ff8647 mutation: replace boost::upper_bound with std::ranges::upper_bound
Reduces dependencies on boost/range.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23119
2025-03-04 10:36:57 +03:00
Kefu Chai
a20cd6539c cql3, dht: Remove redundant std::move() calls
These redundant `std::move()` calls were identified by GCC-14.
In general, copy elision applies to these places, so adding
`std::move()` is not only unnecessary but can actually prevent
the compiler from performing copy elision, as it causes the
return statement to fail to satisfy the requirements for
copy elision optimization.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23063
2025-03-04 10:36:49 +03:00
Botond Dénes
6f7a069bce Merge 'Label basic metrics' from Amnon Heiman
This series is part of the effort to reduce the overall overhead originating from metrics reporting, both on the Scylla side and the metrics collecting server (Prometheus or similar)

The idea in this series is to create an equivalent of levels with a label.
First, label a subset of the metrics used by the dashboards.
Second, the per-table metrics that are now off by default will be marked with a different label.
The following specific optional features: CDC, CAS, and Alternator have a dedicated label now.
This will allow users to disable all metrics of features that are not in use.

All the rest of the metrics are left unlabeled.

Without any changes, users would get the same metrics they are getting today.
But you could pass the `__level=1` and get only those metrics the dashboard needs. That reduces between 50% and 70% (many metrics are hidden if not used, so the overall number of metrics varies).

The labels are not reported based on the seastar feature of hiding labels that start with an underscore.

Closes scylladb/scylladb#12246

* github.com:scylladb/scylladb:
  db/view/view.cc: label metrics with basic_level
  transport/server.cc: label metrics with basic_level
  service/storage_proxy.cc: label metrics with basic_level and cas
  main.cc: label metrics with basic_level
  streaming/stream_manager.cc: label metrics with basic_level
  repair/repair.cc: label metrics with basic_level
  service/storage_service.cc: label metrics with basic_level
  gms/gossiper.cc: label metrics with basic_level
  replica/database.cc: label metrics with basic_level
  cdc/log.cc: label metrics with basic_level and cdc
  alternator: label metrics with basic_level and alternator
  row_cache.cc: label metrics with basic_level
  query_processor.cc: label metrics with basic_level
  sstables.cc: label metrics with basic_level
  utils/logalloc.cc label metrics with basic_level
  commitlog.cc: label metrics with basic_level
  compaction_manager.cc: label metrics with basic_level
  Adding the __level and features labels
2025-03-04 09:32:11 +02:00
Calle Wilund
2f10205714 config: Enable optional TLS1.3 session ticket usage in cert setup
Refs #22916

Adds an "enable_session_tickets" option to TLS setup for our server
endpoints (not documented for internode RPC, as we don't handle it
on the client side there), allowing enabling of TLS3 client session
ticket, i.e. quicker reconnect.

Session tickets are valid within a time frame or until a node
restarts, whichever comes first.

v2:
Use "TLS1.3" in help message

Closes scylladb/scylladb#22928
2025-03-04 09:30:53 +02:00
Amnon Heiman
19a414598b db/view/view.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_view_builder_builds_in_progress

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
9518a85ad0 transport/server.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_transport_cql_errors_total
scylla_transport_current_connections
scylla_transport_requests_served
scylla_transport_requests_shed

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
cbae9a4abe service/storage_proxy.cc: label metrics with basic_level and cas
The following metrics will be marked with basic_level label:
scylla_storage_proxy_coordinator_background_reads
scylla_storage_proxy_coordinator_background_writes
scylla_storage_proxy_coordinator_cas_background
scylla_storage_proxy_coordinator_cas_dropped_prune
scylla_storage_proxy_coordinator_cas_failed_read_round_optimization
scylla_storage_proxy_coordinator_cas_foreground
scylla_storage_proxy_coordinator_cas_prune
scylla_storage_proxy_coordinator_cas_read_contention_bucket
scylla_storage_proxy_coordinator_cas_read_contention_count
scylla_storage_proxy_coordinator_cas_read_latency_count
scylla_storage_proxy_coordinator_cas_read_latency_sum
scylla_storage_proxy_coordinator_cas_read_timeouts
scylla_storage_proxy_coordinator_cas_read_unavailable
scylla_storage_proxy_coordinator_cas_read_unfinished_commit
scylla_storage_proxy_coordinator_cas_total_operations
scylla_storage_proxy_coordinator_cas_write_condition_not_met
scylla_storage_proxy_coordinator_cas_write_contention_count
scylla_storage_proxy_coordinator_cas_write_latency_count
scylla_storage_proxy_coordinator_cas_write_latency_sum
scylla_storage_proxy_coordinator_cas_write_timeout_due_to_uncertainty
scylla_storage_proxy_coordinator_cas_write_timeouts
scylla_storage_proxy_coordinator_cas_write_unavailable
scylla_storage_proxy_coordinator_cas_write_unfinished_commit
scylla_storage_proxy_coordinator_current_throttled_base_writes
scylla_storage_proxy_coordinator_foreground_reads
scylla_storage_proxy_coordinator_foreground_writes
scylla_storage_proxy_coordinator_range_timeouts
scylla_storage_proxy_coordinator_range_unavailable
scylla_storage_proxy_coordinator_read_errors_local_node
scylla_storage_proxy_coordinator_read_latency_count
scylla_storage_proxy_coordinator_read_latency_sum
scylla_storage_proxy_coordinator_reads_local_node
scylla_storage_proxy_coordinator_reads_remote_node
scylla_storage_proxy_coordinator_read_timeouts
scylla_storage_proxy_coordinator_read_unavailable
scylla_storage_proxy_coordinator_speculative_data_reads
scylla_storage_proxy_coordinator_speculative_digest_reads
scylla_storage_proxy_coordinator_total_write_attempts_local_node
scylla_storage_proxy_coordinator_write_errors_local_node
scylla_storage_proxy_coordinator_write_latency_bucket
scylla_storage_proxy_coordinator_write_latency_count
scylla_storage_proxy_coordinator_write_latency_sum
scylla_storage_proxy_coordinator_write_timeouts
scylla_storage_proxy_coordinator_write_unavailable
scylla_storage_proxy_replica_received_counter_updates

All cas related metrics are labeled with __cas label.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
fd5d1f1f6a main.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_scylladb_current_version
scylla_reactor_utilization

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
5747af8555 streaming/stream_manager.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_node_ops_finished_percentage

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
48397f8dff repair/repair.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_node_ops_finished_percentage

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
83bfcb53be service/storage_service.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_node_operation_mode

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
1b64fa2283 gms/gossiper.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_gossip_heart_beat
scylla_gossip_live
scylla_gossip_unreachable

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
cfc5c60ba5 replica/database.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_database_active_reads
scylla_database_dropped_view_updates
scylla_database_queued_reads
scylla_database_requests_blocked_memory
scylla_database_requests_blocked_memory_current
scylla_database_schema_changed
scylla_database_total_reads
scylla_database_total_reads_failed
scylla_database_total_view_updates_pushed_local
scylla_database_total_view_updates_pushed_remote
scylla_database_total_writes
scylla_database_total_writes_failed
scylla_database_total_writes_timedout
scylla_database_total_writes_rate_limited
scylla_database_view_update_backlog

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
cf50c71ef5 cdc/log.cc: label metrics with basic_level and cdc
The following metrics will be marked with basic_level label:
scylla_cdc_operations_failed
scylla_cdc_operations_total

All metrics are labeld with the __cdc label.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Amnon Heiman
a474e95ef0 alternator: label metrics with basic_level and alternator
The following metrics will be marked with basic_level label:
scylla_alternator_operation
scylla_alternator_op_latency_bucket
scylla_alternator_op_latency_count
scylla_alternator_op_latency_sum
scylla_alternator_total_operations
scylla_alternator_batch_item_count
scylla_alternator_op_latency
scylla_alternator_op_latency_summary
scylla_expiration_items_deleted

All alternator metrics are marked with __alternator label.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Amnon Heiman
f40dc4e5c4 row_cache.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_cache_bytes_total
scylla_cache_bytes_used
scylla_cache_partition_evictions
scylla_cache_partition_hits
scylla_cache_partition_insertions
scylla_cache_partition_merges
scylla_cache_partition_misses
scylla_cache_partition_removals
scylla_cache_range_tombstone_reads
scylla_cache_reads
scylla_cache_reads_with_misses
scylla_cache_row_evictions
scylla_cache_row_hits
scylla_cache_row_insertions
scylla_cache_row_misses
scylla_cache_row_removals
scylla_cache_rows
scylla_cache_rows_merged_from_memtable
scylla_cache_row_tombstone_reads

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Amnon Heiman
0dde54d053 query_processor.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_cql_authorized_prepared_statements_cache_evictions
scylla_cql_batches
scylla_cql_deletes
scylla_cql_deletes_per_ks
scylla_cql_filtered_read_requests
scylla_cql_filtered_rows_dropped_total
scylla_cql_filtered_rows_matched_total
scylla_cql_filtered_rows_read_total
scylla_cql_inserts
scylla_cql_inserts_per_ks
scylla_cql_prepared_cache_evictions
scylla_cql_reads
scylla_cql_reads_per_ks
scylla_cql_reverse_queries
scylla_cql_rows_read
scylla_cql_secondary_index_reads
scylla_cql_select_bypass_caches
scylla_cql_select_partition_range_scan_no_bypass_cache
scylla_cql_statements_in_batches
scylla_cql_unpaged_select_queries
scylla_cql_unpaged_select_queries_per_ks
scylla_cql_updates
scylla_cql_updates_per_ks
2025-03-03 16:58:38 +02:00
Amnon Heiman
94ba8af788 sstables.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_sstables_cell_tombstone_writes
scylla_sstables_range_tombstone_reads
scylla_sstables_range_tombstone_writes
scylla_sstables_row_tombstone_reads
scylla_sstables_tombstone_writes
2025-03-03 16:58:38 +02:00
Amnon Heiman
bf39a760aa utils/logalloc.cc label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_lsa_total_space_bytes
scylla_lsa_non_lsa_used_space_bytes

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Amnon Heiman
6826b98c88 commitlog.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_commitlog_segments
scylla_commitlog_allocating_segments
scylla_commitlog_unused_segments
scylla_commitlog_alloc
scylla_commitlog_flush
scylla_commitlog_bytes_written
scylla_commitlog_pending_allocations
scylla_commitlog_requests_blocked_memory
scylla_commitlog_flush_limit_exceeded
scylla_commitlog_disk_total_bytes
scylla_commitlog_disk_active_bytes
scylla_commitlog_disk_slack_end_bytes
2025-03-03 16:58:38 +02:00
Amnon Heiman
67ca02b361 compaction_manager.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_compaction_manager_compactions
2025-03-03 16:58:38 +02:00
Amnon Heiman
30b34d29b2 Adding the __level and features labels
Scylla generates many metrics, and when multiplied by the number of
shards, the total number of metrics adds a significant load to a
monitoring server.

With multi-tier monitoring, it is helpful to have a smaller subset of
metrics users care about and allow them to get only those.

This patch adds two kind of labels, the a __level label, currently with
a single value, but we can add more in the future.
The second kind, is a cross feature label, curently for alternator, cdc
and cas.

We will use the __level label to mark the interesting user-facing metrics.

The current level value is:
basic - metrics for Scylla monitoring

In this phase, basic will mark all metrics used in the dashboards.
In practice, without any configuration change, Prometheus would get the
same metrics as it gets today.

While it is possible to filter by the label, e.g.:
curl http://localhost:9180/metrics?__level=basic

The labels themselves are not reported thanks to label filtering of
labels begin with __.

The feature labels:
__cdc, __cas and __alternator can be an easy way to disable a set of
metrics when not using a feature.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:38 +02:00
Emil Maskovsky
8c67307971 raft: use direct return of future for run_op_with_retry
Clean up the code by using direct return of future for `run_op_with_retry`.

This can be done as the `run_op_with_retry` function is already returning
a future that we can reuse directly. What needs to be taken care of is
to not use temporaries referenced from inside the lambda passed to the
`run_op_with_retry`.
2025-03-03 15:19:58 +01:00
Emil Maskovsky
28d1aeb1fa raft: adjust the voters interface to allow atomic changes
Allow setting the voters and non-voters in a single operation. This
ensures that the configuration changes are done atomically.

In particular, we don't want to set voters and non-voters separately
because it could lead to inconsistencies or even the loss of quorum.

This change also partially reverts the commit 115005d, as we will only
need the convenience wrappers for removing the voters (not for adding
them).

Refs: scylladb/scylladb#18793
2025-03-03 15:19:58 +01:00
Emil Maskovsky
074f4fcdf1 raft topology: drop removing the node from raft config via storage_service
For the limited voters feature to work properly we need to make sure
that we are only managing the voter status through the topology
coordinator. This means that we should not change the node votership
from the storage_service module for the raft topology directly.

This needs to be done in addition to dropping of the votership change
from the storage_service module.

The `remove_from_raft_config` is redundant and can be removed because
a successfully completed `removenode` operation implies that the node
has been removed from group 0 by the topology coordinator.

Refs: scylladb/scylladb#22860
Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969
2025-03-03 15:15:43 +01:00
Emil Maskovsky
834f506790 raft topology: drop changing the raft voters config via storage_service
For the limited voters feature to work properly we need to make sure
that we are only managing the voter status through the topology
coordinator. This means that we should not change the node votership
from the storage_service module for the raft topology directly.

We can drop the voter status changes from the storage_service module
because the topology coordinator will handle the votership changes
eventually. The calls in the storage_service module were not essential
and were only used for optimization (improving the HA under certain
conditions).

This has effect on the timing in the tablets migration test though,
as it relied on the node being made non-voter from the service_storage
`raft_removenode()` function. The fix is to add another server to the
topology to make sure we will keep the quorum.

Previously the test worked because the test waits for an injection to be
reached and it was ensured that the injection (log line) has only been
triggered after the node has been made non-voter from the
`raft_removenode()`. This is not the case anymore. An alternative fix
would be to wait for the first node to be made non-voter before stopping
the second server, but this would make the test more complex (and it is
not strictly required to only use 4 servers in the test, it has been
only done for optimization purposes).

Fixes: scylladb/scylladb#22860

Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969
2025-03-03 15:15:43 +01:00
Artsiom Mishuta
90106c6f19 test.py: skip test_incremental_read_repair[row-tombstone]
skip test test_incremental_read_repair[row-tombstone]
due to https://github.com/scylladb/scylladb/issues/21179

Closes scylladb/scylladb#23126
2025-03-03 15:26:28 +02:00
Nadav Har'El
ea19b79fe2 Merge 'De-duplicate API's table name to table ID conversion' from Pavel Emelyanov
This is continuation of #21533

There are two almost identical helpers in api/ -- validate_table(ks, cf) and get_uuid(ks, cf). Both check if the ks:cf table exists, throwing bad_param_exception if it doesn't. There's slight difference in their usage, namely -- callers of the latter one get the table_id found and make use of it, while the former helper is void and its callers need to re-search for the uuid again if the need (spoiler: they do).

This PR merges two helpers together, so there's less code to maintain. As a nice side effect, the existing validate_table() callers save one re-lookup of the ks:cf pair in database mappings.

Affected endpoints are validated by existing tests:
* column_family/{autocompation|tombstone_gc|compaction_strategy}, validated by the tests described in #21533
* /storage_service/{range_to_endpoint_map|describe_ring|ownership}, validated by nodetool tests
* /storage_service/tablets/{move|repair}, validated by tablets move and repair tests

Closes scylladb/scylladb#22742

* github.com:scylladb/scylladb:
  api: Remove get_uuid() local helper
  api: Make use of validate_table()'s table_id
  api: Make validate_table() helper return table_id after validation
  api: Change validate_table()'s ctx argument to database
2025-03-03 13:39:50 +02:00
Kefu Chai
5571b537b5 tree: Make values mutable to enable move semantics
Previously, variables were marked as const, causing std::move() calls to
be redundant as reported by GCC warnings. This change either removes
const qualifiers or marks related lambdas as mutable, allowing the
compiler to properly utilize move constructors for better performance.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23066
2025-03-03 13:53:02 +03:00
Evgeniy Naydanov
cb0e0ebcf7 test.py: extract prepare dirs and S3 mock steps to test/conftest.py
As a part of the moving to bare pytest we need to extract the required test
environment preparation steps into pytest's hooks/fixtures.

Do this for S3 mock stuff (MinioServer, MockS3Server, and S3ProxyServer)
and for directories with test artifacts.

For compatibility reason add --test-py-init CLI option for bare pytest
test runner: need to add it to pytest command if you need test.py
stuff in your tests (boost, topology, etc.)

Also, postpone initialization of TestSuite.artifacts and TestSuite.hosts
from import-time to runtime.

Closes scylladb/scylladb#23087
2025-03-03 13:24:37 +03:00
Kefu Chai
a3ac7c3d33 remove redundant std::move() from position_in_partition::key()
Fix GCC warning about moving from a const reference in mp_row_consumer_k_l::flush_if_needed.
Since position_in_partition::key() returns a const reference, std::move has no effect.

Considered adding an rvalue reference overload (clustering_key_prefix&& key() &&) but
since the "me" sstable format is mandatory since 63b266e9, this approach offers no benefit.

This change simply removes the redundant std::move() call to silence the warning and
improve code clarity.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23085
2025-03-03 12:51:40 +03:00
Piotr Dulikowski
d402a19e9a Merge 'replica: Prepare full keyspace config in make_keyspace_config()' from Pavel Emelyanov
Currently for system keyspace part of config members are configured outside of this helper, in the caller. It's more consistent to have full config initialization in one place.

Closes scylladb/scylladb#22975

* github.com:scylladb/scylladb:
  replica: Mark database::make_keyspace_config() private
  replica: Prepare full keyspace config in make_keyspace_config()
2025-03-03 10:44:42 +01:00
Kefu Chai
65bc6b449e scripts/open-coredump.sh: use the remote repo containing given sha1
Enhance how the script handles remote repository selection for a given
SHA1 commit hash.

Previously, in 3bdbe620, the script fetched from all remotes containing
the product name, which could lead to inefficiencies and errors,
especially with multiple matching remotes. Now, it first checks if the
SHA1 is in any local remote-tracking branch, using that remote if found,
and otherwise fetches from each remote sequentially to find the first
one containing the SHA1. This approach minimizes unnecessary fetches,
making the script more efficient for debugging coredumps in repositories
with multiple remotes.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23026
2025-03-03 08:22:41 +02:00
Paweł Zakrzewski
9e7f79d1ab cql3/select_statement: require LIMIT and PER PARTITION LIMIT to be strictly positive
LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken
into consideration by a query. It makes no logical sense to have this
value at less than 1. Cassandra also has this requirement.

This patch ensures that the limit value is strictly positive and adds
an explicit test for it - it was only tested in a test ported from
Cassandra, that is disabled due to other issues.

Closes scylladb/scylladb#23013
2025-03-03 08:13:27 +02:00
Tomasz Grabiec
0343235aa2 Merge 'tablets: repair: fix hosts and dcs filters behavior for tablet repair' from Aleksandra Martyniuk
If hosts and/or dcs filters are specified for tablet repair and
some replicas match these filters, choose the replica that will
be the repair master according to round-robin principle
(currently it's always the first replica).

If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, the repair succeeds and
the repair request is removed (currently an exception is thrown
and tablet repair scheduler reschedules the repair forever).

Fixes: https://github.com/scylladb/scylladb/issues/23100.

Needs backport to 2025.1 that introduces hosts and dcs filters for tablet repair

Closes scylladb/scylladb#23101

* github.com:scylladb/scylladb:
  test: add new cases to tablet_repair tests
  test: extract repiar check to function
  locator: add round-robin selection of filtered replicas
  locator: add tablet_task_info::selected_by_filters
  service: finish repair successfully if no matching replica found
2025-03-01 14:47:43 +01:00
Jenkins Promoter
7b50fbafb3 Update pgo profiles - aarch64 2025-03-01 04:58:49 +02:00
Jenkins Promoter
84e1514152 Update pgo profiles - x86_64 2025-03-01 04:26:11 +02:00
Anna Stuchlik
850aec58e0 doc: add the 2025.1 upgrade guides and reorganize the upgrade section
This commit adds the upgrade guides relevant in version 2025.1:
- From 6.2 to 2025.1
- From 2024.x to 2025.1

It also removes the upgrade guides that are not relevant in 2025.1 source available:
- Open Source upgrade guides
- From Open Source to Enterprise upgrade guides
- Links to the Enterprise upgrade guides

Also, as part of this PR, the remaining relevant content has been moved to
the new About Upgrade page.

WHAT NEEDS TO BE REVIEWED
- Review the instructions in the 6.2-to-2025.1 guide
- Review the instructions in the 2024.x-to-2025.1 guide
- Verify that there are no references to Open Source and Enterprise.

The scope of this PR does not have to include metrics - the info can be added
in a follow-up PR.

Fixes https://github.com/scylladb/scylladb/issues/22208
Fixes https://github.com/scylladb/scylladb/issues/22209
Fixes https://github.com/scylladb/scylladb/issues/23072
Fixes https://github.com/scylladb/scylladb/issues/22346

Closes scylladb/scylladb#22352
2025-02-28 15:18:34 +03:00
Aleksandra Martyniuk
c7c6d820d7 test: add new cases to tablet_repair tests
Add tests for tablet repair with host and dc filters that select
one or no replica.
2025-02-28 13:03:04 +01:00
Aleksandra Martyniuk
c40eaa0577 test: extract repiar check to function 2025-02-28 13:01:10 +01:00
Aleksandra Martyniuk
2b538d228c locator: add round-robin selection of filtered replicas 2025-02-28 12:32:55 +01:00
Aleksandra Martyniuk
fe4e99d7b3 locator: add tablet_task_info::selected_by_filters
Extract dcs and hosts filters check to a method.
2025-02-28 12:02:21 +01:00
Kefu Chai
af6895548c cql3: result_set: Initialize result_generator::_stats to prevent undefined behavior
Previously, when result_generator's default constructor was called, the
_stats member variable remained uninitialized. This could lead to
undefined behavior in release builds where uninitialized values are
unpredictable, making issues difficult to debug.

This change initializes the pointer to nullptr, ensuring consistent
behavior across all build types and preventing potential memory-related
bugs.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23073
2025-02-28 13:57:13 +03:00
Kefu Chai
9e0e99347f sstables: explicitly call parent's default constructor in copy constructor
When implementing the copy constructor for `sstable_set` (derived from
`enable_lw_shared_from_this`), we intentionally need the parent's default
constructor rather than its copy constructor. This is because each new
`sstable_set` instance maintains its own reference count and owns a clone
of the source object's implementation (`x._impl->clone()`).

Although this behavior is correct, GCC warns about not calling the parent's
copy constructor. This change explicitly calls the parent's default constructor
to:

1. Silence GCC warnings
2. Clearly document our intention to use the default constructor
3. Follow best practices for constructor initialization

The functionality remains unchanged, but the code is now more explicit about
its design and free of compiler warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23083
2025-02-28 13:52:24 +03:00
Aleksandra Martyniuk
9bce40d917 service: finish repair successfully if no matching replica found
If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, an exception is thrown. The repair
fails and tablet repair scheduler reschedules it forever.

Such a repair should actually succeed (as all specified relpicas were
repaired) and the repair request should be removed.

Treat the repair as successful if the filters were specified and
selected no replica.
2025-02-28 11:50:52 +01:00
Botond Dénes
7ba29ec46c reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
It is possible that the permit handed in to register_inactive_read() is
already aborted (currently only possible if permit timed out).
If the permit also happens to have wait for memory, the current code
will attempt to call promise<>::set_exception() on the permit's promise
to abort its waiters. But if the permit was already aborted via timeout,
this promise will already have an exception and this will trigger an
assert. Add a separate case for checking if the permit is aborted
already. If so, treat it as immediate eviction: close the reader and
clean up.

Fixes: scylladb/scylladb#22919
2025-02-28 01:32:46 -05:00
Botond Dénes
4d8eb02b8d test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
Unless the test in question actually wants to test timeouts. Timeouts
will have more pronounced consequences soon and thus using
db::timeout_clock::now() becomes a sure way to make tests flaky.
To avoid this, use db::no_timeout in the tests that don't care about
timeouts.
2025-02-28 01:31:33 -05:00
Anna Stuchlik
439463dbbf doc: add support for Ubuntu 24.04 in 2024.1
Fixes https://github.com/scylladb/scylladb/issues/22841

Refs https://github.com/scylladb/scylla-enterprise/issues/4550

Closes scylladb/scylladb#22843
2025-02-27 15:12:31 +03:00
Anna Stuchlik
0999fad279 doc: add information about tablets limitation to the CQL page
This commit adds a link to the Limitations section on the Tablets page
to the CQL pag, the tablets option.
This is actually the place where the user will need the information:
when creating a keyspace.

In addition, I've reorganized the section for better readability
(otherwise, the section about limitations was easy to miss)
and moved the section up on the page.

Note that I've removed the updated content from the  `_common` folder
(which I deleted) to the .rst page - we no longer split OSS and Enterprise,
so there's no need to keep using the `scylladb_include_flag` directive
to include OSS- and Ent-specific content.

Fixes https://github.com/scylladb/scylladb/issues/22892

Fixes https://github.com/scylladb/scylladb/issues/22940

Closes scylladb/scylladb#22939
2025-02-27 15:11:19 +03:00
Asias He
3f59a89e85 repair: Fix return type for storage_service/tablets/repair API
The API returns the repair task UUID. For example:

{"tablet_task_id":"3597e990-dc4f-11ef-b961-95d5ead302a7"}

Fixes #23032

Closes scylladb/scylladb#23050
2025-02-27 12:38:12 +02:00
Kefu Chai
834450f604 github: Skip clang-tidy when not explicitly requested
Previously, the clang-tidy.yaml workflow would cancel the clang-tidy job
when a comment wasn't prefixed with "/clang-tidy", instead of skipping it.
This cancellation triggered unnecessary email notifications for developers
with GitHub action notifications enabled.

This change modifies the workflow to only run clang-tidy when the
read-toolchain job succeeds, reducing notification noise by properly
skipping the job rather than cancelling it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23084
2025-02-27 13:28:35 +03:00
Artsiom Mishuta
cd5d34f9b7 test.py: fix failed_test collection
after introducing the test.py subfolders support,
test.py start creating weird log files like
testlog/topology_custom.mv/tablets/test_mv_tablets.1

that affect failed test collection logic
this commit fixes this and test.py logs as previously in testlog directory
without any subfolders: topology_custom.mv_tablets_test_mv_tablets.1

Closes scylladb/scylladb#23009
2025-02-27 12:37:11 +03:00
Avi Kivity
3f05fa3a9b test: lib: replace boost::generate with std equivalent
Reduces dependencies on boost/range.

Closes scylladb/scylladb#23034
2025-02-27 01:05:46 +01:00
Kefu Chai
c45f9b7155 utils/sorting: Fix VerticesContainer concept constraints
Fix a bug where std::same_as<...> constraint was incorrectly used as a
simple requirement instead of a nested requirement or part of a
conjunction. This caused the constraint to be always satisfied
regardless of the actual types involved.

This change promotes std::same_as<...> to a top-level constraint,
ensuring proper type checking while improving code readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23068
2025-02-26 23:23:53 +02:00
Kefu Chai
6e4cb20a69 tree: implement boost::accumulate with std::ranges library
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23062
2025-02-26 23:22:02 +02:00
Kefu Chai
41dd004c20 conf: scylla.yaml: correct a misspelling
s/ommitted/omitted/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23055
2025-02-26 23:19:56 +02:00
Pavel Emelyanov
27e96be6ad B+tree: Clean const_iterator->iterator conversion
The tree code have const and non-const overloads for searching methods
like find(), lower_bound(), etc. Not to implement them twice, it's coded
like

   const_iterator find() const {
       ... // the implementation itself
   }

   iterator find() {
       return iterator(const_cast<const *>(this)->find());
   }

i.e. -- const overload is called, and returned by it const_iterator is
converted into a non-const iterator. For that the latter has dedicated
constructor with two inaccuracies: it's not marked as explicit and it
accepts const rvalue reference.

This patch fixes both.

Althogh this disables implicit const -> non-const conversion of
iterators, the constructor in question is public, which still opens a
way for conversion (without const_cast<>). This constructor is better
be marked private, but there's double_decker class that uses bptree
and exploits the same hacks in its finding methods, so it needs this
constructor to be callable. Alas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23069
2025-02-26 23:17:27 +02:00
Kefu Chai
da9960db1c tree: Fix polymorphic exception handling by using references
Replace value-based exception catching with reference-based catching to address
GCC warnings about polymorphic type slicing:

```
warning: catching polymorphic type ‘class seastar::rpc::stream_closed’ by value [-Wcatch-value=]
```

When catching polymorphic exceptions by value, the C++ runtime copies the
thrown exception into a new instance of the specified type, slicing the
actual exception and potentially losing important information. This change
ensures all polymorphic exceptions are caught by reference to preserve the
complete exception state.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23064
2025-02-26 23:15:16 +02:00
Piotr Szymaniak
f887466c3f alternator: Clean error handling on CreateTable without AttributeDefinitions
If user fails to supply the AttributeDefinitions parameter when creating
a table, Scylla used to fail on RAPIDJSON_ASSERT. Now it calls a polite
exception, which is fully in-line with what DynamoDB does.

The commit supplies also a new, relevant test routine.

Fixes #23043

Closes scylladb/scylladb#23041
2025-02-26 14:24:57 +02:00
Botond Dénes
5d63ef4d15 Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund
Fixes #22314

Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.

Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points.
Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line.

Closes scylladb/scylladb#22327

* github.com:scylladb/scylladb:
  tools: Add standard extensions and propagate to schema load
  cql_test_env: Use add all extensions instead of inidividually
  main: Move extensions adding to function
  tomstone_gc: Make validate work for tools
2025-02-26 13:52:47 +02:00
Kefu Chai
6e4df57f97 mutation,test: replace boost::equal with std::ranges::equal
to reduce third-party dependencies and modernize the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22999
2025-02-26 14:27:42 +03:00
Andrzej Jackowski
b4f0a5149a db: cql3: add comments regarding unsafe interval<clustering_key_prefix>
class clustering_range is a range of Clustering Key Prefixes implemented
as interval<clustering_key_prefix>. However, due to the nature of
Clustering Key Prefix, the ordering of clustering_range is complex and
does not satisfy the invariant of interval<>. To be more specific, as a
comment in interval<> implementation states: “The end bound can never be
smaller than the start bound”. As a range of CKP violates the invariant,
some algorithms, like intersection(), can return incorrect results.
For more details refer to scylladb#8157, scylladb#21604, scylladb#22817.

This commit:
 - Add a WARNING comment to discourage usage of clustering_range
 - Add WARNING comments to potentially incorrect uses of
   interval<clustering_key_prefix> non-trivial methods
 - Add a FIXME comment to incorrect use of
   interval<clustering_key_prefix_view>::deoverlap and WARNING comments
   to related interval<clustering_key_prefix_view> misuse.

Closes scylladb/scylladb#22913
2025-02-26 12:01:28 +01:00
Wojciech Mitros
6bc445b841 test: increase timeout for adding a server in test_mv_topology_change
Currently, when we add servers to the cluster in the test, we use
a 60s timeout which proved to be not enough in one of the debug runs.
There is no reason for this test to use a shorter timeout than all
the other tests, so in this patch we reset it to the higher default.

Fixes https://github.com/scylladb/scylladb/issues/23047

Closes scylladb/scylladb#23048
2025-02-26 10:18:05 +02:00
Pavel Emelyanov
db1e29cfea replica: Mark database::make_keyspace_config() private
It's not been used outside of database class for long ago already

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-26 09:56:07 +03:00
Pavel Emelyanov
d7018ae3d9 replica: Prepare full keyspace config in make_keyspace_config()
Currently for system keyspace part of config members are configured
outside of this helper, in the caller. It's more consistent to have full
config initialization in one place.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-26 09:54:12 +03:00
Pavel Emelyanov
eff61b167c treewide: Reduce db/config.hh header fanout
Drop it from files that obviously don't need it. Also kill some forward
declarations while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22979
2025-02-25 15:16:40 +01:00
Piotr Dulikowski
43ae3ab703 test: test_mv_topology_change: increase timeout for removenode
The test `test_mv_topology_change` is a regression test for
scylladb/scylladb#19529. The problem was that CL=ANY writes issued when
all replicas were down would be kept in memory until the timeout. In
particular, MV updates are CL=ANY writes and have a 5 minute timeout.
When doing topology operations for vnodes or when migrating tablet
replicas, the cluster goes through stages where the replica sets for
writes undergo changes, and the writes started with the old replica set
need to be drained first.

Because of the aforementioned MV updates, the removenode operation could
be delayed by 5 minutes or more. Therefore, the
`test_mv_topology_change` test uses a short timeout for the removenode
operation, i.e. 30s. Apparently, this is too low for the debug mode and
the test has been observed to time out even though the removenode
operation is progressing fine.

Increase the timeout to 60s. This is the lowest timeout for the
removenode operation that we currently use among the in-repo tests, and
is lower than 5 minutes so the test will still serve its purpose.

Fixes: scylladb/scylladb#22953

Closes scylladb/scylladb#22958
2025-02-25 17:00:36 +03:00
Evgeniy Naydanov
e572771f36 test.py: refactor test.py: move test suites classes into pylib
Split huge test.py into smaller pieces: test.pylib.suite.*

Closes scylladb/scylladb#23005
2025-02-25 14:35:29 +03:00
Avi Kivity
6e70e69246 test/lib: mutation_assertions: deinline
While generally better to reduce inline code, here we get
rid of the clustering_interval_set.hh dependency, which in turns
depends on boost interval_set, a large dependency.

incremental_compaction_test.cc is adjusted for a missing header.

Closes scylladb/scylladb#22957
2025-02-25 11:40:54 +01:00
Calle Wilund
e49f2046e5 generic_server: Update conditions for is_broken_pipe_or_connection_reset
Refs scylla-enterprise#5185
Fixes #22901

If a tls socket gets EPIPE the error is not translated to a specific
gnutls error code, but only a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind nested exception
here to detect that the error was in fact due to this, so we can suppress
log output for this.

Closes scylladb/scylladb#22888
2025-02-25 10:35:11 +02:00
Kefu Chai
9fdbe0e74b tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22997
2025-02-25 10:32:32 +03:00
Kefu Chai
42335baec5 backup_task: Use INFO level for upload abort during shutdown
When a backup upload is aborted due to instance shutdown, change the log
level from ERROR to INFO since this is expected behavior. Previously,
 `abort_requested_exception` during upload would trigger an ERROR log, causing
test failures since error logs indicate unexpected issues.

This change:
- Catches `abort_requested_exception` specifically during file uploads
- Logs these shutdown-triggered aborts at INFO level instead of ERROR
- Aligns with how `abort_requested_exception` is handled elsewhere in the service

This prevents false test failures while still informing administrators
about aborted uploads during shutdown.

Fixes scylladb/scylladb#22391

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22995
2025-02-25 10:32:10 +03:00
Benny Halevy
55dbf5493c docs: document the views-with-tablets experimental feature
Refs scylladb/scylladb#22217

Fixes scylladb/scylladb#22893

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22896
2025-02-24 17:23:08 +01:00
Avi Kivity
d99df7af6c Merge 'Respect per-shard tablet goal and 10x default per-shard tablet count' from Tomasz Grabiec
This series achieves two things:

1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards

    This will result in new tables having at least 10 tablet replicas per
    shard by default.

    We want this to reduce tablet load imbalance due to differences in
    tablet count per shard, where some shards have 1 tablet and some
    shards have 2 tablets. With higher tablet count per shard, this
    difference-by-one is less relevant.

    Fixes https://github.com/scylladb/scylladb/issues/21967

2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count

    The per-shard goal is enforced by controlling average per-shard tablet replica
    count in a given DC, which is controlled by per-table tablet
    count. This is effective in respecting the limit on individual shards
    as long as tablet replicas are distributed evenly between shards.
    There is no attempt to move tablets around in order to enforce limits
    on individual shards in case of imbalance between shards.

    If the average per-shard tablet count exceeds the limit, all tables
    which contribute to it (have replicas in the DC) are scaled down
    by the same factor. Due to rounding up to the nearest power of 2,
    we may overshoot the per-shard goal by at most a factor of 2.

    The scaling is applied after computing desired tablet count due to
    all other factors: per-table tablet count hints, defaults, average tablet size.

    If different DCs want different scale factors of a given table, the
    lowest scale factor is chosen for a given table.

    When creating a new table, its tablet count is determined by tablet
    scheduler using the scheduler logic, as if the table was already created.
    So any scaling due to per-shard tablet count goal is reflected immediately
    when creating a table. It may however still take some time for the system
    to shrink existing tables. We don't reject requests to create new tables.

    Fixes #21458

Closes scylladb/scylladb#22522

* github.com:scylladb/scylladb:
  config, tablets: Allow tablets_initial_scale_factor to be a fraction
  test: tablets_test: Test scaling when creating lots of tables
  test: tablets_test: Test tablet count changes on per-table option and config changes
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config
  config: Make tablets_initial_scale_factor live-updateable
  tablets: load_balancer: Pick initial_scale_factor from config
  tablets, load_balancer: Fix and improve logging of resize decisions
  tablets, load_balancer: Log reason for target tablet count
  tablets: load_balancer: Move hints processing to tablet scheduler
  tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
  tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
  tablets: load_balancer: Determine desired count from size separately from count from options
  tablets: load_balancer: Determine resize decision from target tablet count
  tablets: load_balancer: Allow splits even if table stats not available
  tablets: load_balancer: Extract make_sizing_plan()
  tablets: Add formatter for resize_decision::way_type
  tablets: load_balancer: Simplify resize_urgency_cmp()
  tablets: load_balancer: Keep config items as instance members
  locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()
  tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
  tablets: Set default initial tablet count scale to 10
  tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
  tablets: load_balancer: Extract get_schema_and_rs()
  tablets: load_balancer: Drop test_mode
2025-02-24 17:59:26 +02:00
Łukasz Paszkowski
9ec1a457d6 alter_keyspace_statement: Include tablets information in system.topology
Altering a keyspace (that has tablets enabled) without changing
tablets attributes, i.e. no `AND tablets = {...}` results in incorrect
"Update Keyspace..." log message being printed. The printed log
contains "tablets={"enabled":false}".

Refs https://github.com/scylladb/scylladb/issues/22261

Closes scylladb/scylladb#22324
2025-02-24 15:11:14 +02:00
Botond Dénes
6ae3076b4e Merge 'tablet-mon.py: Improve split&merge visualization and make tablet id text optional in table mode' from Tomasz Grabiec
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.

Also includes an enhancement to make showing of tablet id text
optional in table mode.

Closes scylladb/scylladb#22981

* github.com:scylladb/scylladb:
  tablet-mon.py: Don't show merges and splits as full table recreations
  tablet-mon.py: Add toggle for tablet ids
2025-02-24 15:09:54 +02:00
Takuya ASADA
f2a8ae101b dist/docker: drop hostname package, use Python API
We currently depends on hostname command to get local IP, but we can do
this on Python API.
After the change, we can drop the package.

Closes scylladb/scylladb#22909
2025-02-24 15:03:44 +02:00
Anna Stuchlik
d0a48c5661 doc: remove the reference to the 6.2 version
This commit removes the OSS version name, which is irrelevant
and confusing for 2025.1 and later users.
Also, it updates the warning to avoid specifying the release
when the deprecated feature will be removed.

Fixes https://github.com/scylladb/scylladb/issues/22839

Closes scylladb/scylladb#22936
2025-02-24 15:02:11 +02:00
Botond Dénes
6ab16006a2 Merge 'Untangle sstable-directory vs sstable in pending log creation code' from Pavel Emelyanov
There's a sstable_directory::create_pending_deletion_log() helper method that's called by sstable's filesystem_storage atomic-delete methods and that prepares the deletion log for a bunch of sstables. For that method to do its job it needs to get private sstable->_storage field (which is always the filesystem_storage one), also the atomic-delete transparent context object is leaked into the sstable_directory code and low-level sstable storage code needs to include higher-level sstable_directory header.

This patch unties these knots. As the result:
- friendship between sstable and sstable_directory is removed
- transparent atomic_delete_context is encapsulated in storage.(cc|hh) code
- less code for create_pending_deletion_log() to dump TOC filename into log

Closes scylladb/scylladb#22823

* github.com:scylladb/scylladb:
  sstable: Unfriend sstable_directory class
  sstable_directory: Move sstable_directory::pending_delete_result
  sstable_directory: Calculate prefixes outside of create_pending_deletion_log()
  sstable_directory: Introduce local pending_delete_log variable
  sstable_directory: Relax toc file dumping to deletion log
2025-02-24 14:58:37 +02:00
Paweł Zakrzewski
854d2917a1 cql3/select_statement: reject PER PARTITION LIMIT with SELECT DISTINCT
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.

Cassandra behaves the same way, so this patch also ensures
compatibility.

Fixes scylladb/scylladb#15109

Closes scylladb/scylladb#22950
2025-02-24 14:50:18 +02:00
Yaron Kaikov
e6227f9a25 install-dependencies.sh: update node_exporter to 1.9.0
Update node_exporter to 1.9.0 to resolve the following CVE's
https://github.com/advisories/GHSA-49gw-vxvf-fc2g
https://github.com/advisories/GHSA-8xfx-rj4p-23jm
https://github.com/advisories/GHSA-crqm-pwhx-j97f
https://github.com/advisories/GHSA-j7vj-rw65-4v26

Fixes: https://github.com/scylladb/scylladb/issues/22884

regenerate frozen toolchain with optimized clang from
* https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
* https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Closes scylladb/scylladb#22987
2025-02-24 13:49:36 +02:00
Avi Kivity
1891e10b7b sstables: writer.hh: drop unneeded boost depedencies
Closes scylladb/scylladb#22955
2025-02-24 13:26:44 +03:00
Avi Kivity
58d4d8142a install-dependencies.sh: harden pip_packages against shellcheck
pip_packages is an associative array, which in bash is constructed
as ([key]=value...). In our case the value is often empty (indicating
no version constraint). Shellcheck warns against it, since `[key]= x`
could be a mistype of `[key]=x`. It's not in our case, but shellcheck
doesn't know that.

Make shellcheck happier by specifying the empty values explicitly.

Closes scylladb/scylladb#22990
2025-02-24 13:26:10 +03:00
Kefu Chai
dfa40972bb topology_custom/test_zero_token_nodes_multidc: Enhance test logging and error handling
Add verbose logging to identify failing test combinations in multi-DC
setup:

- Log replication factor (RF) and consistency level (CL) for each test
  iteration
- Add validation checks for empty result sets

Improve error handling:
- Before indexing in a list, use `assert` to check for its emptiness
- Use assertion failures instead of exceptions for clearer test diagnostics

This change helps debug test failures by showing which RF/CL
combinations cause inconsistent results between zero-token and regular
nodes.

Refs scylladb/scylladb#22967

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22968
2025-02-24 11:09:51 +01:00
Kefu Chai
7bf7817e8a docs/cql: s/wasm32-wasi/wasm32-wasip1/
Rust's WASI target of wasm32-wasi was renamed to wasm32-wasip1, see
https://blog.rust-lang.org/2024/04/09/updates-to-rusts-wasi-targets.html.
and our building system has been adapted to this change. let's update
the document to reflect this change.

Fixes scylladb/scylladb#20878
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21184
2025-02-24 11:06:46 +01:00
Patryk Jędrzejczak
de751cad03 Merge 'test/topology_experimental_raft: add test_topology_upgrade_stuck' from Piotr Dulikowski
The test simulates the cluster getting stuck during upgrade to raft
topology due to majority loss, and then verifies that it's possible to
get out of the situation by performing recovery and redoing the upgrade.

Fixes: #17410

Closes scylladb/scylladb#17675

* https://github.com/scylladb/scylladb:
  test/topology_experimental_raft: add test_topology_upgrade_stuck
  test.py: bump minimum python version to 3.11
  test.py: move gather_safely to pylib utils
  cdc: generation: don't capture token metadata when retrying update
  test.py: topology: ignore hosts when waiting for group0 consistency
  raft: add error injection that drops append_entries
  topology_coordinator: add injection which makes upgrade get stuck
2025-02-24 11:02:32 +01:00
Kefu Chai
d92646a17e install.sh: simplify check_usermode_support()
because we don't care about the exact output of grep, let's silence
its output. also, no need to check for the string is empty, so let's
just use the status code of the grep for the return value of the
function, more idiomatic this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22737
2025-02-24 11:29:30 +03:00
Evgeniy Naydanov
99be9ac8d8 test.py: test_random_failures: improve handling of hung node
In some cases the paused/unpaused node can hang not after 30s timeout.
This make the test flaky.  Change the condition to always check the
coordinator's log if there is a hung node.

Add `stop_after_streaming` to the list of error injections which can
cause a node's hang.

Also add a wait for a new coordinator election in cluster events
which cause such elections.

Closes scylladb/scylladb#22825
2025-02-24 10:23:05 +03:00
Kefu Chai
fd52b0a3cc cql3: fix false-positive "used-after-move" warning in clang-tidy
`slice.is_reversed()` was falsely flagged as accessing moved data, since
the underlying enum_set remains valid after move. However, to improve code
clarity and silence the warning, now reference `command->slice` directly
instead, which is guaranteed to be valid as the move target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22971
2025-02-23 18:58:35 +02:00
Marcin Maliszkiewicz
f34ea308b3 transport: remove unused _request_cpu from connection 2025-02-23 18:32:14 +02:00
Benny Halevy
7a4c563e40 feed_writers: optimize error path
Eliminate one try/catch block around call to wr.close()
by using coroutine::as_future.

Mark error paths as `[[unlikely]]`.

Use `coroutine::return_exception_ptr` to avoid rethrowing
the final exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22831
2025-02-23 18:22:39 +02:00
Dawid Mędrek
138645f744 install-dependencies.sh: Make script capable of updating pip packages
Before these changes, the script didn't update the listed pip packages
if they were already installed. If the latest version of Scylla started
using new features and required an updated Python driver, for example,
the developers (and possibly the user) were forced to update it manually.

In this commit, we modify the script so that it updates the installed
packages when run. This should make things easier for everyone.

Closes scylladb/scylladb#22912
2025-02-23 16:26:50 +02:00
Yaron Kaikov
084f4d2ee3 .github/scripts/auto-backport.py: search for Fixes also in commits
In #22650 the backport process wasn't completed since the PR body didn't include the Fixes ref as expected but the commits did have it

Expanding the search for `Fixes` to include commits in the same PR

Fixes: https://github.com/scylladb/scylla-pkg/issues/4899

Closes scylladb/scylladb#22988
2025-02-23 13:20:28 +02:00
Pavel Emelyanov
a6c882e4e3 sstables: Remove dead get_config() and db::config declarations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22974
2025-02-21 15:56:04 +01:00
Tomasz Grabiec
62d53d2a47 tablet-mon.py: Don't show merges and splits as full table recreations
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.
2025-02-21 15:34:48 +01:00
Tomasz Grabiec
7227d70d4d tablet-mon.py: Add toggle for tablet ids 2025-02-21 15:34:48 +01:00
Kefu Chai
a80d7e6159 test/pylib: test/pylib: Simplify boolean logic in pagination check
Replace complex boolean expression:
```py
  not driver_response_future.has_more_pages or not all_pages
```
with clearer equivalent:
```py
  driver_response_future.has_more_pages and all_pages
```
The new expression is more intuitive as it directly checks for both
conditions (having more pages and wanting all pages) rather than using double
negation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22969
2025-02-21 14:21:09 +03:00
Emil Maskovsky
574224491d raft/test: adjust the "raft_ignore_nodes" test for limited voters
Before the limited voters feature, the "raft_ignore_nodes" test was
relying upon the fact that all nodes will become voters.

With the limited voters feature, the test needs to be adjusted to
ensure that we do not lose the majority of the cluster. This could
happen when there are 7 nodes, but only 5 of them are voters - then if
we kill 3 nodes randomly we might end up with only 2 voters left.

Therefore we need to ensure that we only stop the appropriate number of
voter nodes. So we need to determine which nodes became voters and which
ones are non-voters, and select the nodes to be stopped based on that.

That means with 7 nodes and 5 voters, we can stop up to 2 voter nodes,
but at least one of the stopped nodes must be a non-voter.

Fixes: scylladb/scylladb#22902

Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969

Closes scylladb/scylladb#22904
2025-02-20 18:42:03 +01:00
Patryk Jędrzejczak
6bb1ed2ef4 Merge 'Merge topology_tasks and topology_random_failures into topology_custom' from Artsiom Mishuta
Now that we support suite subfolders, there is no
need to create an own suite for topology_tasks and topology_random_failures.

Closes scylladb/scylladb#22879

* https://github.com/scylladb/scylladb:
  test.py: merge topology_tasks suite into topology_custom suite
  test.py: merge topology_random_failures suite into topology_customs
2025-02-20 16:02:45 +01:00
Patryk Jędrzejczak
78c227c521 Merge 'raft topology: Add support for raft topology init to happen before group0 initialization' from Abhinav Kumar Jha
In the current scenario, the problem discovered is that there is a time
gap between group0 creation and raft_initialize_discovery_leader call.
Because of that, the group0 snapshot/apply entry enters wrong values
from the disk(null) and updates the in-memory variables to wrong values.
During the above time gap, the in-memory variables have wrong values and
perform absurd actions.

This PR removes the variable `_manage_topology_change_kind_from_group0`
which was used earlier as a work around for correctly handling
`topology_change_kind` variable, it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
that _manage_topology_change_kind used to block reading from disk and
was enabled after group0 initialization and starting raft server for the
restart case. Similarly, it was hard to manage `topology_change_kind`
using `_manage_topology_change_kind_from_group0` correctly in bug free
manner.

Post `_manage_topology_change_kind_from_group0` removal, careful
management of `topology_change_kind` variable was needed for maintaining
correct `topology_change_kind` in all scenarios. So this PR also
performs a refactoring to populate all init data to system tables even
before group0 creation(via `raft_initialize_discovery_leader` function).
Now because `raft_initialize_discovery_leader` happens before the group
0 creation, we write mutations directly to system tables instead of a
group 0 command. Hence, post group0 creation, the node can read the
correct values from system tables and correct values are maintained
throughout.

Added a new function `initialize_done_topology_upgrade_state` which
takes care of updating the correct upgrade state to system tables before
starting group0 server. This ensures that the node can read the correct
values from system tables and correct values are maintained throughout.

By moving `raft_initialize_discovery_leader` logic to happen before
starting group0 server, and not as group0 command post server start, we
also get rid of the potential problem of init group0 command not being
the 1st command on the server. Hence ensuring full integrity as expected
by programmer.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#21114

Closes scylladb/scylladb#22484

* https://github.com/scylladb/scylladb:
  storage_service: Remove the variable _manage_topology_change_kind_from_group0
  storage_service: fix indentation after the previous commit
  raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
  service/raft: Refactor mutation writing helper functions.
2025-02-20 14:42:39 +01:00
Benny Halevy
29b795709b token_group_based_splitting_mutation_writer: maybe_switch_to_new_writer: prevent double close
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.

Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.

Fixes #22715

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22922
2025-02-20 15:41:09 +03:00
Kefu Chai
ccbfe4f669 compaction: replace boost::range::find with std::ranges::find
Replace boost::range::find() calls with std::ranges::find(). This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22942
2025-02-20 14:25:08 +02:00
Anna Stuchlik
a28bbc22bd doc: remove references to Enterprise
This commit removes the redundant references to Enterprise,
which are no longer valid.

Fixes https://github.com/scylladb/scylladb/issues/22927

Closes scylladb/scylladb#22930
2025-02-20 11:24:34 +02:00
Raphael S. Carvalho
4d8a333a7f storage_service: Don't retry split when table is dropped
The split monitor wasn't handling the scenario where the table being
split is dropped. The monitor would be unable to find the tablet map
of such a table, and the error would be treated as a retryable one
causing the monitor to fall into an endless retry loop, with sleeps
in between. And that would block further splits, since the monitor
would be busy with the retries. The fix is about detecting table
was dropped and skipping to the next candidate, if any.

Fixes #21859.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22933
2025-02-20 10:13:55 +01:00
Gleb Natapov
914c9f1711 treewide: include build_mode.hh for SCYLLA_BUILD_MODE_RELEASE where it is missing
Fixes: #22914

Closes scylladb/scylladb#22915
2025-02-20 10:50:04 +03:00
Botond Dénes
1f553457dc Merge 'test/topology: use standard new_test_keyspace functions' from Benny Halevy
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317

Fixes #22342
Fixes #21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060

* No backport required, though may be desired to stabilize CI also in release branches.

Closes scylladb/scylladb#22399

* github.com:scylladb/scylladb:
  test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
  test/repair: create_table_insert_data_for_repair: create keyspace with unique name
  topology_tasks/test_tablet_tasks: use new_test_keyspace
  topology_tasks/test_node_ops_tasks: use new_test_keyspace
  topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
  topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
  topology_custom/test_view_build_status: use new_test_keyspace
  topology_custom/test_truncate_with_tablets: use new_test_keyspace
  topology_custom/test_topology_failure_recovery: use new_test_keyspace
  topology_custom/test_tablets_removenode: use create_new_test_keyspace
  topology_custom/test_tablets_migration: use new_test_keyspace
  topology_custom/test_tablets_merge: use new_test_keyspace
  topology_custom/test_tablets_intranode: use new_test_keyspace
  topology_custom/test_tablets_cql: use new_test_keyspace
  topology_custom/test_tablets2: use *new_test_keyspace
  topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
  topology_custom/test_tablets: use new_test_keyspace
  topology_custom/test_table_desc_read_barrier: use new_test_keyspace
  topology_custom/test_shutdown_hang: use new_test_keyspace
  topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
  topology_custom/test_rpc_compression: use new_test_keyspace
  topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
  topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
  topology_custom/test_raft_no_quorum: use new_test_keyspace
  topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
  topology_custom/test_query_rebounce: use new_test_keyspace
  topology_custom/test_not_enough_token_owners: use new_test_keyspace
  topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
  topology_custom/test_node_isolation: use create_new_test_keyspace
  topology_custom/test_mv_topology_change: use new_test_keyspace
  topology_custom/test_mv_tablets_replace: use new_test_keyspace
  topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
  topology_custom/test_mv_tablets: use new_test_keyspace
  topology_custom/test_mv_read_concurrency: use new_test_keyspace
  topology_custom/test_mv_fail_building: use new_test_keyspace
  topology_custom/test_mv_delete_partitions: use new_test_keyspace
  topology_custom/test_mv_building: use new_test_keyspace
  topology_custom/test_mv_backlog: use new_test_keyspace
  topology_custom/test_mv_admission_control: use new_test_keyspace
  topology_custom/test_major_compaction: use new_test_keyspace
  topology_custom/test_maintenance_mode: use new_test_keyspace
  topology_custom/test_lwt_semaphore: use new_test_keyspace
  topology_custom/test_ip_mappings: use new_test_keyspace
  topology_custom/test_hints: use new_test_keyspace
  topology_custom/test_group0_schema_versioning: use new_test_keyspace
  topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
  topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
  topology_custom/test_read_repair: use new_test_keyspace
  topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
  topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
  topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
  topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
  test/topology/util: new_test_keyspace: drop keyspace only on success
  test/topology/util: refactor new_test_keyspace
  test/topology/util: CREATE KEYSPACE IF NOT EXISTS
  test/topology/util: new_test_keyspace: accept ManagerClient
2025-02-20 09:43:15 +02:00
Kefu Chai
ddfd438434 cql3: replace boost::accumulate() with std::ranges::fold_left()
Replace boost::accumulate() calls with std::ranges::fold_left(). This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22924
2025-02-20 09:32:17 +03:00
Kefu Chai
5be39740a8 tree: migrate from boost::find to std::ranges algorithms
Replace boost::find() calls with std::ranges::find() and std::ranges::contains()
to leverage modern C++ standard library features. This change reduces external
dependencies and modernizes the codebase.

The following changes were made:
- Replaced boost::find() with std::ranges::find() where index/iterator is needed
- Used std::ranges::contains() for simple element presence checks

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22920
2025-02-20 09:28:57 +03:00
Tomasz Grabiec
1a7023c85a config, tablets: Allow tablets_initial_scale_factor to be a fraction
We may want fewer than 1 tablets per shard in large clusters.

The per-table option is a fraction, so for consistency, this should be
too.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
2b2fa0203e test: tablets_test: Test scaling when creating lots of tables 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
0e111990a1 test: tablets_test: Test tablet count changes on per-table option and config changes 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
5e471c6f1b test: tablets_test: Add support for auto-split mode
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. It's inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default, can be disabled for tests which expect
differently.

shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
f3b63bfeff test: cql_test_env: Expose db config 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
3d01ce3707 config: Make tablets_initial_scale_factor live-updateable 2025-02-19 16:29:08 +01:00
Tomasz Grabiec
7e4a61953d tablets: load_balancer: Pick initial_scale_factor from config
So that it can be live-updated.
2025-02-19 16:29:08 +01:00
Tomasz Grabiec
41789962ef tablets, load_balancer: Fix and improve logging of resize decisions
Resize is no longer only due to avg tablet size. Log avg tablet size as an
information, not the reason, and log the true reason for target tablet
count.
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
d1ccbee7f9 tablets, load_balancer: Log reason for target tablet count
Helps in debugging.
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
029505b179 tablets: load_balancer: Move hints processing to tablet scheduler
Hints have common meaning for all strategies, so the logic
belongs more to make_sizing_plan().

As a side effect, we can reuse shard capacity computation across
tables, which reduces computational complexity from O(tables*nodes) to
O(tables * DCs + nodes)
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
f1bda8d4c1 tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
The limit is enforced by controlling average per-shard tablet replica
count in a given DC, which is controlled by per-table tablet
count. This is effective in respecting the limit on individual shards
as long as tablet replicas are distributed evenly between shards.

There is no attempt to move tablets around in order to enforce limits
on individual shards in case of imbalance between shards.

If the average per-shard tablet count exceeds the limit, all tables
which contribute to it (have replicas in the DC) are scaled down
by the same factor. Due to rounding up to the nearest power of 2,
we may overshoot the per-shard goal by at most a factor of 2.

If different DCs want different scale factors of a given table, the
lowest scale factor is chosen for a given table.

The limit is configurable. It's a global per-cluster config which
controls how many tablet replicas per shard in total we consider to be
still ok. It controls tablet allocator behavior, when choosing initial
tablet count. Even though it's a per-node config, we don't support
different limits per node. All nodes must have the same value of that
config. It's similar in that regard to other scheduler config items
like tablets_initial_scale_factor and target_tablet_size_in_bytes.
2025-02-19 16:29:07 +01:00
Tomasz Grabiec
94b5165ac7 tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
This makes decisions made by the scheduler consistent with decisions
made on table creation, with regard to tablet count.

We want to avoid over-allocation of tablets when table is created,
which would then be reduced by the scheduler's scaling logic. Not just
to avoid wasteful migrations post table creation, but to respect the
per-shard goal. To respect the per-shard goal, the algorithm will no
longer be as simple as looking at hints, and we want to share the
algorithm between the scheduler and initial tablet allocator. So
invoke the scheduler to get the tablet count when table is created.
2025-02-19 14:40:07 +01:00
Tomasz Grabiec
dd68c1e526 tablets: load_balancer: Determine desired count from size separately from count from options
For debugging purposes. Later we will want to know which rule
determined the count.
2025-02-19 14:40:07 +01:00
Tomasz Grabiec
e4c5e2ab55 tablets: load_balancer: Determine resize decision from target tablet count
The flow is simpler this way, since the decision cannot now be
mismatched with target tablet count.
2025-02-19 14:40:07 +01:00
Tomasz Grabiec
35192e2d6f tablets: load_balancer: Allow splits even if table stats not available
This is in preparation for using the sizing plan during table creation
where we never have size stats, and hints are the only determining
factor for target tablet count.
2025-02-19 14:40:07 +01:00
Tomasz Grabiec
d3ffea77e6 tablets: load_balancer: Extract make_sizing_plan()
Resize plan making will now happen in two stages:

  1) Determine desired tablet counts per table (sizing plan)
  2) Schedule resize decisions

We need intermediate step in the resize plan making, which gives us
the planned tablet counts, so that we can plug this part of the
algorithm into initial tablet allocation on table construction.

We want decisisons made by the scheduler to be consistent with
decisions made on table creation. We want to avoid over-allocation of
tablets when table is created, which would then be reduced by the
scheduler. Not just to avoid wasteful migrations post table creation,
but to respect the per-shard goal. To respect the per-shard goal, the
algorithm will no longer be as simple as looking at hints, and we want
to share the algorithm between the scheduler and initial tablet
allocator.

Also, this sizing plan will be later plugged into a virtual table for
observability.
2025-02-19 14:40:06 +01:00
Tomasz Grabiec
33db0d4fea tablets: Add formatter for resize_decision::way_type 2025-02-19 14:39:40 +01:00
Tomasz Grabiec
b7e5919fdd tablets: load_balancer: Simplify resize_urgency_cmp()
Logic is preserved since target tablet size is constant for all
tables.

Dropping d.target_max_tablet_size() will allow us to move it
to the load_balancer scope.
2025-02-19 14:39:40 +01:00
Tomasz Grabiec
997007a2df tablets: load_balancer: Keep config items as instance members
It fits preexisting pattern for other config items, and makes the code
less cluttered because we don't have to carry config items across
calls.
2025-02-19 14:39:39 +01:00
Tomasz Grabiec
ce959818a3 locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology() 2025-02-19 14:38:50 +01:00
Tomasz Grabiec
f043c83ba5 tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
Currently the scale is applied post rounding up of tablet count so
that tablet count per shard is at least 1. In order to be able to use
the scale to increase tablet count per shard, we need to apply it
prior to division by RF, otherwise we will overshoot per-shard tablet
replica count.

Example:

  4 nodes, -c1, rf=3, initial_tablets_scale=10

  Before: initial_tablet_count=20, tablet-per-shard=15
  After: initial_tablet_count=14, tablets-per-shard=10.5
2025-02-19 14:38:50 +01:00
Tomasz Grabiec
2463e524ed tablets: Set default initial tablet count scale to 10
This will result in new tables having at least 10 tablet replicas per
shard by default.

We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.

Fixes #21967

In some tests, we explicity set the initial scale to 1 as some of the
existing tests assume 1 compaction group per shard.

test.py uses a lower default. Having many tablets per shard slows down
certain topology operations like decommission/replace/removenode,
where the running time is proportional to tablet count, not data size,
because constant cost (latency) of migration dominates. This latency
is due to group0 operations and barriers. This is especially
pronounced in debug mode. Scheduler allows at most 2 migrations per
shard, so this latency becomes a determining factor for decommission
speed.

To avoid this problem in tests, we use lower default for tablet count per
shard, 2 in debug/dev mode and 4 in release mode. Alternatively, we
could compensate by allowing more concurrency when migrating small
tablets, but there's no infrastructure for that yet.

I observed that with 10 tablets per shard, debug-mode
topology_custom.mv/test_mv_topology_change starts to time-out during
removenode (30 s).
2025-02-19 14:38:50 +01:00
Tomasz Grabiec
8eedb551b5 tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
To insert preemption points later.
2025-02-19 14:38:49 +01:00
Tomasz Grabiec
eef18d879c tablets: load_balancer: Extract get_schema_and_rs()
For better readability.
2025-02-19 14:38:49 +01:00
Tomasz Grabiec
9d600dd783 tablets: load_balancer: Drop test_mode
tablets_test is now creating proper schema in the database, so
test_mode is no longer needed.
2025-02-19 14:38:48 +01:00
yangpeiyu2_yewu
0de232934a mutation_writer/multishard_writer.cc: wrap writer into futurize_invoke
wrapped writer in seastar::futurize_invoke to make sure that the close() for the mutation_reader can be executed before destruction.

Fixes #22790

Closes scylladb/scylladb#22812
2025-02-19 13:00:45 +02:00
Pavel Emelyanov
d79eec2e76 sstable: Unfriend sstable_directory class
It was only needed there for create_pending_deletion_log() method to get
private "_storage" from sstable. Now it's all gone and friendship can be
broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-19 13:09:04 +03:00
Pavel Emelyanov
96a867c869 sstable_directory: Move sstable_directory::pending_delete_result
... to where it belongs -- to the filesystem storage driver itself.
Continuation of the previous patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-19 13:09:04 +03:00
Pavel Emelyanov
f6de6d6887 sstable_directory: Calculate prefixes outside of create_pending_deletion_log()
The method in question walks the list of sstables and accumulates
sstables' prefixes into a set on pending_delete_result object. The set
in question is not used at all in this method and is in fact alien to it
-- the p.d._result object is used by the filesystem storage driver as
atomic deletion prepare/commit transparent context.

Said that, move the whole pending_delete_result to where it belongs and
relax the create_pending_deletion_log() to only return the log
directory path string.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-19 13:09:04 +03:00
Pavel Emelyanov
b0c1a77528 sstable_directory: Introduce local pending_delete_log variable
This is simply to reduce the churn in the next patch, nothing special
here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-19 13:09:04 +03:00
Pavel Emelyanov
5b92c4549e sstable_directory: Relax toc file dumping to deletion log
The current code takes sstable prefix() (e.g. the /foo/bar string), then
trims from its fron the basedir (e.g. the /foo/ string) and then writes
the remainder, a slash and TOC component name (e.g. the xxx-TOC.txt
string). The final result is "bar/xxx-TOC.txt" string.

The taking into account sstable.toc_filename() renders into
sstable.prefix + \slash + component-name, the above result can be
achieved by trimming basedir directory from toc_filename().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-19 13:09:04 +03:00
Botond Dénes
820f196a49 replica/database: setup_scylla_memory_diagnostics_producer() un-static semaphore dump lambda
The lambda which dumps the diagnostics for each semaphore, is static.
Considering that said lambda captures a local (writeln) by reference, this
is wrong on two levels:
* The writeln captured on the shard which happens to initialize this
  static, will be used on all shards.
* The writeln captured on the first dump, will be used on later dumps,
  possibly triggering a segfault.

Drop the `static` to make the lambda local and resolve this problem.

Fixes: scylladb/scylladb#22756

Closes scylladb/scylladb#22776
2025-02-19 12:22:16 +03:00
Nadav Har'El
a7bf36831c test: remove spammy deprecation warnings
Recently, when running Alternator tests we get hundreds of warnings like
the following from basically all test files:

    /usr/lib/python3.12/site-packages/botocore/crt/auth.py:59:
    DeprecationWarning: datetime.datetime.utcnow() is deprecated and
    scheduled for removal in a future version. Use timezone-aware objects
    to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

    /usr/local/lib/python3.12/site-packages/pytest_elk_reporter.py:299:
    DeprecationWarning: datetime.datetime.utcnow() is deprecated and
    scheduled for removal in a future version. Use timezone-aware objects
    to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

These warnings all come from two libraries that we use in the tests -
botocore is used by Alternator tests, and elk reporter is a plugin that
we don't actually use, but it is installed by dtest and we often see
it in our runs as well. These warnings have zero interest to us - not
only do we not care if botocore uses some deprecated Python APIs and
will need to be updated in the future, all these warnings are hiding
*real* warnings about deprecated things we actually use in our own
test code.

The patch modifies test/pytest.ini (used by all our Python tests,
including but not limited to Alternator tests) to ignore deprecation
warnings from *inside* these two libraries, botocore and elk_reporter.

After this patch, test/alternator/run finishes without any warnings
at all. test/cqlpy does still have a few warnings left, which earlier
were hidden by the thousands of spammy warning eliminated in this patch.

We fix one of these warnings in this patch:

    ResultSet indexing support will be removed in 4.0.
    Consider using ResultSet.one()

by doing exactly what the warning recommended.

Some deprecation warnings in test/cqlpy remain in calls to
get_query_trace(). The "blame" for these warning is misplaced - this
function is part of the cassandra driver, but Python seems to think it's
part of our test code so I can't avoid them with the pytest.ini trick,
I'm not sure why. So I don't know yet how to eliminate these last warnings.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22881
2025-02-19 12:15:51 +03:00
Avi Kivity
45b2026209 service: raft: drop unused dependency from group0_state_machine_merger.hh
Reduces dependency load.

Closes scylladb/scylladb#22781
2025-02-19 12:14:58 +03:00
Kefu Chai
d1f117620a build: restrict -Xclang options to Clang compiler only
Modify CMake configuration to only apply "-Xclang" options when building
with the Clang compiler. These options are Clang-specific and can cause
errors or warnings when used with other compilers like g++.

This change:
- Adds compiler detection to conditionally apply Clang-specific flags
- Prevents build failures when using non-Clang compilers

Previously, the build system would apply these flags universally, which
could lead to compilation errors with other compilers.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22899
2025-02-19 12:13:35 +03:00
Kefu Chai
d384b0a63e utils: use std::to_underlying() when appropriate
Use std::to_underlying() when comparing unsigned types with enumeration values
to fix type mismatch warnings in GCC-14. This specifically addresses an issue in
utils/advanced_rpc_compressor.hh where comparing a uint8_t with 0 triggered a
'-Werror=type-limits' warning:
```
    error: comparison is always false due to limited range of data type [-Werror=type-limits]
    if (x < 0 || x >= static_cast<underlying>(type::COUNT))
        ~~^~~
```
Using std::to_underlying() provides clearer type semantics and avoids these kind
of comparison warnings. This change improves code readability while maintaining
the same behavior.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22898
2025-02-19 12:12:28 +03:00
Benny Halevy
cc281ff88d test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
and return the keyspace unique name to the caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 09:35:33 +02:00
Aleksandra Martyniuk
f8e4198e72 service: tasks: hold token_metadata_ptr in tablet_virtual_task
Hold token_metadata_ptr in tablet_virtual_task methods that iterate
over tablets, to keep the tablet_map alive.

Fixes: https://github.com/scylladb/scylladb/issues/22316.

Closes scylladb/scylladb#22740
2025-02-19 09:33:53 +02:00
Dusan Malusev
4e6ea232d2 docs: add instruction for installing cassandra-stress
Signed-off-by: Dusan Malusev <dusan.malusev@scylladb.com>

Closes scylladb/scylladb#21723
2025-02-19 09:25:16 +02:00
Benny Halevy
cbe79b20f7 test/repair: create_table_insert_data_for_repair: create keyspace with unique name
and return it to the caller

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:56:07 +02:00
Benny Halevy
9829b1594f topology_tasks/test_tablet_tasks: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:59 +02:00
Benny Halevy
12f85ce57c topology_tasks/test_node_ops_tasks: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:59 +02:00
Benny Halevy
0564e95c51 topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:59 +02:00
Benny Halevy
46b1850f0c topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:59 +02:00
Benny Halevy
b810791fbb topology_custom/test_view_build_status: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:59 +02:00
Benny Halevy
2d4af01281 topology_custom/test_truncate_with_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:52:58 +02:00
Benny Halevy
16ef78075c topology_custom/test_topology_failure_recovery: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
96d327fb83 topology_custom/test_tablets_removenode: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
f30e4c6917 topology_custom/test_tablets_migration: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
20f7eda16e topology_custom/test_tablets_merge: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
5ff3153912 topology_custom/test_tablets_intranode: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
e59aca66bf topology_custom/test_tablets_cql: use new_test_keyspace
And create_new_test_keyspace when we need drop
to be explicit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
6b37d04aa9 topology_custom/test_tablets2: use *new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
0b88ea9798 topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
649e68c6db topology_custom/test_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
005ceb77d3 topology_custom/test_table_desc_read_barrier: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
50a8f5c1c0 topology_custom/test_shutdown_hang: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
4fd6c2d24e topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
72bc4016e7 topology_custom/test_rpc_compression: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
47326d01b7 topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
e72a9d3faa topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
Using the new_test_keyspace fixture is awkward for this test
as it is written to explicitly drop the created keyspaces
at certain points.

Therefore, just use create_new_test_keyspace to standardize the
creation procedure.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
3f35491264 topology_custom/test_raft_no_quorum: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
380c5e5ac8 topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
e05372afa4 topology_custom/test_query_rebounce: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
c68d2a471c topology_custom/test_not_enough_token_owners: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
5759a97eb4 topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
55b35eb21c topology_custom/test_node_isolation: use create_new_test_keyspace
new_test_keyspace is problematic here since
the presence of the banned node can fail the automatic drop of
the test keyspace due to NoHostAvailable (in debug mode for
some reason)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
ff9c8428df topology_custom/test_mv_topology_change: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
11005b10db topology_custom/test_mv_tablets_replace: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
966cf82dae topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
c05794c156 topology_custom/test_mv_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
d5e3c578f5 topology_custom/test_mv_read_concurrency: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
42a104038d topology_custom/test_mv_fail_building: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
629ee3cb46 topology_custom/test_mv_delete_partitions: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
a82e734110 topology_custom/test_mv_building: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
b13e48b648 topology_custom/test_mv_backlog: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
ef85c4b27e topology_custom/test_mv_admission_control: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
0e11aad9c5 topology_custom/test_major_compaction: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
0668c642a2 topology_custom/test_maintenance_mode: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
9c095b622b topology_custom/test_lwt_semaphore: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
c6653e65ba topology_custom/test_ip_mappings: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
fed078a38a topology_custom/test_hints: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
480a5837ab topology_custom/test_group0_schema_versioning: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
4fefffe335 topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
57faab9ffa topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
205ed113dd topology_custom/test_read_repair: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
fdb339bf28 topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
59687c25e0 topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
df84097a4b topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
a66ddb7c04 topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
0fd1b846fe test/topology/util: new_test_keyspace: drop keyspace only on success
When the test fails with exception, keep the keyspace
intact for post-mortem analysis.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
f946302369 test/topology/util: refactor new_test_keyspace
Define create_new_test_keyspace that can be used in
cases we cannot automatically drop the newly created keyspace
due to e.g. loss of raft majority at the end of the test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
5d448f721e test/topology/util: CREATE KEYSPACE IF NOT EXISTS
Workaround spurious keyspace creation errors
due to retries caused by
https://github.com/scylladb/python-driver/issues/317.
This is safe since the function uses a unique_name for the
keyspace so it should never exist by mistake.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:35 +02:00
Benny Halevy
50ce0aaf1c test/topology/util: new_test_keyspace: accept ManagerClient
Following patch will convert topology tests to use
new_test_keyspace and friends.

Some tests restart server and reset the driver connection
so we cannot use the original cql Session for
dropping the created keyspace in the `finally` block.

Pass the ManagerClient instead to get a new cql
session for dropping the keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-19 08:43:26 +02:00
Kefu Chai
727d5637ab cql3: remove redundant std::move() in select_statement.cc
GCC-14 correctly flagged unnecessary use of std::move() where copy elision applies:
```
    return std::move(paging_state_copy);
```
This error occurs in indexed_table_select_statement::generate_view_paging_state_from_base_query_results
at line 1122. The C++17 standard guarantees copy elision for returning local variables, making
std::move() redundant in this context and potentially hindering compiler optimizations.

Fixes build failure with GCC-14 which treats redundant moves as errors
with -Werror=redundant-move.

The error message looks like:
```
/usr/lib64/ccache/g++ -DDEVEL -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Dev/seastar/gen/include -isystem /home/kefu/dev/scylladb/abseil -I/usr/include/p11-kit-1  -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -Wno-changes-meaning -Wno-ignored-attributes -Wno-dangling-pointer -Wno-array-bounds -Wno-narrowing -Wno-type-limits -ffile-prefix-map=/home/kefu/dev/scylladb/= -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -ffile-prefix-map=/home/kefu/dev/scylladb/build/=build -march=westmere -Wstack-usage=21504 -std=gnu++23 -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -DSEASTAR_P2581R1 -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=19 -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_THREAD_NO_LIB -DBOOST_THREAD_DYN_LINK -DFMT_SHARED -MD -MT cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o -MF cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o.d -o cql3/CMakeFiles/cql3.dir/Dev/statements/select_statement.cc.o -c /home/kefu/dev/scylladb/cql3/statements/select_statement.cc
/home/kefu/dev/scylladb/cql3/statements/select_statement.cc: In member function ‘seastar::lw_shared_ptr<const service::pager::paging_state> cql3::statements::indexed_table_select_statement::generate_view_paging_state_from_base_query_results(seastar::lw_shared_ptr<const service::pager::paging_state>, const seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&, service::query_state&, const cql3::query_options&) const’:
/home/kefu/dev/scylladb/cql3/statements/select_statement.cc:1122:21: error: redundant move in return statement [-Werror=redundant-move]
 1122 |     return std::move(paging_state_copy);
      |            ~~~~~~~~~^~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22903
2025-02-18 21:12:58 +02:00
Tomasz Grabiec
22386a6ceb Merge 'truncate: don't fail on already waiting truncate for the same table' from Ferenc Szili
Currently, we can not have more than one global topology operation at the same time. This means that we can not have concurrent truncate operations because truncate is implemented as a global topology operation.

Truncate excludes with other topology operations, and has to wait for those to complete before truncate starts executing. This can lead to truncate timeouts. In these cases the client retries the truncate operation, which will check for ongoing global topology operations, and will fail with
an "Another global topology request is ongoing, please retry." error.

This can be avoided by truncate checking if the ongoing global topology operation is a truncate running for the same table who's truncate has just been requested again. In this case, we can wait for the ongoing truncate to complete instead of immediately failing the operation, and
provide a better user experience.

This is an improvement, backport is not needed.

Closes #22166

Closes scylladb/scylladb#22371

* github.com:scylladb/scylladb:
  test: add test for re-cycling ongoing truncate operations
  truncate: add additional logging and improve error message during truncate
  storage_proxy: wait on already running truncate for the same table
  storage_proxy: allow multiple truncate table fibers per shard
2025-02-18 15:54:00 +01:00
Lakshmi Narayanan Sreethar
0f7d08d41d topology_coordinator: handle_table_migration: do not continue after executing metadata barrier
Return after executing the global metadata barrier to allow the topology
handler to handle any transitions that might have started by a
concurrect transaction.

Fixes #22792

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#22793
2025-02-18 15:48:45 +01:00
Botond Dénes
2e062e4e10 docs/operating-scylla: document scylla-sstable query 2025-02-18 07:37:05 -05:00
Botond Dénes
ddab1b939b test/cqlpy/test_tools.py: add tests for scylla-sstable query 2025-02-18 07:37:05 -05:00
Artsiom Mishuta
3c3a23637a test.py: merge topology_tasks suite into topology_custom suite
Now that we support suite subfolders, there is no
need to create an own suite for tasks.
2025-02-18 13:15:31 +01:00
Artsiom Mishuta
dbdd0dd844 test.py: merge topology_random_failures suite into topology_customs
Now that we support suite subfolders, there is no
need to create an own suite for random_failures
2025-02-18 13:15:24 +01:00
Botond Dénes
3928851ab0 Merge 'encryption_at_rest_test/encryption: Add some verbosity etc to help diagnose test run issues' from Calle Wilund
Refs #22628

Adds exception handler + cleanup for the case where we have a bad config/env vars (hint minio) or similar, such that we fail with exception during setting up the EAR context. In a normal startup, this is ok. We will report the exception, and the do a exit(1).

In tests however, we don't and active context will instead be freed quite proper, in which case we need to call stop to ensure we don't crash on shared pointer destruction on wrong shard. Doing so will hide the real issue from whomever runs the test.

Adds some verbosity to track issues with the network proxy used to test EAR connector difficulties. Also adds an earlier close in input stream to help network usage.

Note: This is a diagnostic helper. Still cannot repro the issue above.

Closes scylladb/scylladb#22810

* github.com:scylladb/scylladb:
  gcp/aws kms: Promote service_error to recoverable + use malformed_response_error
  encryption_at_rest_test: Add verbosity + earlier stream close to proxy
  encryption: Add exception handler to context init (for tests)
2025-02-18 10:29:30 +02:00
Kefu Chai
9c5155fa63 compaction: switch from boost::accumulate to std::views::join
Replace boost::accumulate() with the standard library's alternatives
to reduce external dependencies and simplify the codebase. This
change eliminates the requirement for boost::range and makes the
implementation more maintainable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22856
2025-02-18 10:23:40 +02:00
Botond Dénes
aba4d07c62 tools/utils: configure_tool_mode: set auto_handle_sigint_sigterm = false
Disable seastar's built in handlers for SIGINT and SIGTERM and thus
fall-back to the OS's default handlers, which terminate the process.
This makes tool applications interruptable by SIGINT and SIGTERM.
The default handler just terminates the tool app immediately and
doesn't allow for cleanup, but this is fine: the tools have no important
data to save or any critical cleanup to do before exiting.

Fixes: scylladb/scylladb#16954

Closes scylladb/scylladb#22838
2025-02-17 23:28:18 +02:00
Avi Kivity
30a38e61d4 Merge 'sstables_manager: trigger reclaim/reload on components_memory_reclaim_threshold update' from Lakshmi Narayanan Sreethar
The config variable `components_memory_reclaim_threshold` limits the
memory available to the sstable bloom filters. Any change to its value
is not immediately propagated to the sstable manager, despite it being
a LiveUpdate variable. The updated value takes effect only when a new
sstable is created or deleted.

This PR first refactors the reclaim and reload logic into a single
background fiber. It then updates the sstable manager to subscribe to
changes in the `components_memory_reclaim_threshold` configuration value
and immediately triggers the reclaim/reload fiber when a change is
detected.

Fixes #21947

This is an improvement and does not need to be backported.

Closes scylladb/scylladb#22725

* github.com:scylladb/scylladb:
  sstables_manager: trigger reclaim/reload on `components_memory_reclaim_threshold` update
  sstables_manager: maybe_reclaim_components: yield between iterations
  sstables_manager: rename `increment_total_reclaimable_memory_and_maybe_reclaim()`
  sstables_manager: move reclaim logic into `components_reclaim_reload_fiber()`
  sstables_manager: rename `_sstable_deleted_event` condition variable
  sstables_manager: rename `components_reloader_fiber()`
  sstables_manager: fix `maybe_reclaim_components()` indentation
  sstables_manager: reclaim components memory until usage falls below threshold
  sstables_manager: introduce `get_components_memory_reclaim_threshold()`
  sstables_manager: extract `maybe_reclaim_components()`
  sstables_manager: fix `maybe_reload_components()` indentation
  sstables_manager: extract out `maybe_reload_components()`
2025-02-17 22:33:33 +02:00
Lakshmi Narayanan Sreethar
064bf2fd85 sstables_manager: trigger reclaim/reload on components_memory_reclaim_threshold update
The config variable `components_memory_reclaim_threshold` limits the
memory available to the sstable bloom filters. Any change to its value
is not immediately propagated to the sstable manager, despite it being
a LiveUpdate variable. The updated value takes effect only when a new
sstable is created or deleted.

This patch updates the sstable manager to subscribe to any changes in
the above mentioned config value and immediately trigger the
reclaim/reload fiber when a change occurs. Also, adds a testcase to
verify the fix.

Fixes #21947

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-17 20:55:45 +05:30
Calle Wilund
00263aa57a gcp/aws kms: Promote service_error to recoverable + use malformed_response_error
Refs #22628

Mark problems parsing response (partial message, network error without exception etc
- hello testing), as "malformed_response_error", and promote this as well as
general "service_error" to recoverable exceptions (don't isolate node on error).

This to better handle intermittent network issues as well as making error-testing
more deterministic.
2025-02-17 13:49:43 +00:00
Calle Wilund
5905c19ab4 encryption_at_rest_test: Add verbosity + earlier stream close to proxy
Refs #22628

Adds some verbosity to track issues with the network proxy used to test
EAR connector difficulties. Also adds an earlier close in input stream
to help network usage.

Note: This is a diagnostic helper. Still cannot repro the issue above.
2025-02-17 13:49:43 +00:00
Calle Wilund
83aa66da1a encryption: Add exception handler to context init (for tests)
Adds exception handler + cleanup for the case where we have a
bad config/env vars (hint minio) or similar, such that we fail
with exception during setting up the EAR context.
In a normal startup, this is ok. We will report the exception,
and the do a exit(1).

In tests however, we don't and active context will instead be
freed quite proper, in which case we need to call stop to ensure
we don't crash on shared pointer destruction on wrong shard.
Doing so will hide the real issue from whomever runs the test.
2025-02-17 13:49:42 +00:00
Piotr Dulikowski
35df6bb6b2 Merge 'raft_rpc::send_append_entries: limit memory usage' from Petr Gusev
Serializing `raft::append_request` for transmission requires approximately the same amount of memory as its size. This means when the Raft library replicates a log item to M servers, the log item is effectively copied M times. To prevent excessive memory usage and potential out-of-memory issues, we limit the total memory consumption of in-flight `raft::append_request` messages.

Fixes scylladb/scylladb#14411

Closes scylladb/scylladb#22835

* github.com:scylladb/scylladb:
  raft_rpc::send_append_entries: limit memory usage
  fms: extract entry_size to log_entry::get_size
2025-02-17 14:11:12 +01:00
Botond Dénes
a32b4d20cf test/cqlpy/test_tools.py: make scylla_sstable() return table name also
Not used by current users, will be needed by next patch.
2025-02-17 08:01:39 -05:00
Botond Dénes
5d09182ce5 scylla-sstable: introduce the query command
Allows querying the content of sstables. Simple queries can be
constructed on the command-line. More advanced queries can be passed in
a file. The output can be text (similar to CQLSH) or json (similar to
SELECT JSON).
Uses a cql_test_env behind the scenes to set-up a query pipeline. The
queried sstables are not registered into cql_test_env, instead they are
queried via the virtual-table interface. This is to isolate the sstables
from any accidental modifications cql_test_env might want to do to them.
2025-02-17 08:01:39 -05:00
Botond Dénes
5e76dd90a9 tools/utils: get_selected_operation(): use std::string for operation_options
tool_app_template::run() calls get_selected_operation() to obtain the
operation (command) the user selected. To do this,
get_selected_operation() does a CLI pre-parsing pass, with a minimal
boost::program_options, so things like mixed positional/non-positional
args are correctly handled. This code use `sstring` for generic
operation-options. The problem is that boost doesn't allow values with
spaces inside for non-std::string types. This therefore prevents such
values from being used for any option downstream, because parsing would
fail at this stage. Change the type to std::string to solve this
problem.
2025-02-17 08:01:39 -05:00
Botond Dénes
a6caade11d utils/rjson: streaming_writer: add RawValue()
Exposes the RawValue() method of the underlying rapidjson::Writer. This
method allows writing a pre-formatted json value to the stream. This
will allow using cql3/type_json.hh to pre-format CQL3 types, then write
these pre-formatted values into a json stream.
2025-02-17 08:01:38 -05:00
Botond Dénes
c917ee0638 cql3/type_json: add to_json_type()
Translate a CQL value of a CQL type into the appropriate rjson::type.
2025-02-17 08:01:38 -05:00
Botond Dénes
01a4d30d88 test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
This variant of do_with_cql_env(), forgoes the reentrancy support in the
regular do_with_cql_env() variants, and re-uses the caller's exsting
seastar thread. This is an optimized version for callers which don't
need reentrancy and already have a thread.
2025-02-17 08:01:38 -05:00
Piotr Dulikowski
da2237417c test/topology_experimental_raft: add test_topology_upgrade_stuck
The test simulates the cluster getting stuck during upgrade to raft
topology due to majority loss, and then verifies that it's possible
to get out of the situation by performing recovery and redoing the
upgrade.

Fixes: scylladb/scylladb#17410
2025-02-17 13:12:53 +01:00
Piotr Dulikowski
a2f5e6ab0a test.py: bump minimum python version to 3.11
Python 3.11 introduces asyncio.TaskGroup, which I would like to use in a
test that I'll introduce in the next commit. Modify the python version
check in test.py to prevent from accidentally running with an older
version of python.
2025-02-17 13:12:49 +01:00
Piotr Dulikowski
1b6fb95efc test.py: move gather_safely to pylib utils
The gather_safely function was originally defined in the
test.pylib.scylla_cluster module, but it is a generic concurrency
combinator which is not tied to the concept of Scylla clusters at all.
Move it to test.pylib.util to make this fact more clear.
2025-02-17 12:47:13 +01:00
Piotr Dulikowski
56ae119b19 cdc: generation: don't capture token metadata when retrying update
In legacy topology mode, on startup, a node will attempt to insert data
of the newest CDC generation into the legacy distributed tables. In case
of any errors, the operation will be retried until success in 60s
intervals. While the node waits for the operation to be retried, it
keeps a token_metadata_ptr instance. This is a problem for two reasons:

- The tmptr instance is used in a lambda which determines the cluster
  size. This lambda is used to determine the consistency level when
  inserting the generation to the distributed tables - if there is only
  one node, CL=ONE should be used instead of CL=QUORUM. The tmptr is
  immutable so it can technically happen the the cluster is shrinked
  while the code waits for the generation to be inserted.
- Token metadata instance keeps a version tracker that which prevents
  topology operations from proceeding while the tracker exists. This is
  a very niche problem, but it might happen that a leftover instance of
  token metadata held by update_streams_description might delay a
  topology operation which happens after upgrade to raft topology
  happens. This actually slows down the test which simulates upgrade to
  raft topology getting stuck (to be introduced in later commits).

Instead of capturing a token_metadata_ptr instance, capture a reference
to shared_token_metadata and use a freshly issued token_metadata_ptr
when computing the cluster size in order to choose the consistency
level.
2025-02-17 12:28:53 +01:00
Piotr Dulikowski
d75888460d test.py: topology: ignore hosts when waiting for group0 consistency
Now, check_system_topology_and_cdc_generations_v3_consistency has an
additional list argument and will ignore hosts from that list if some of
them are found to be in the "left" state. Additionally, the function now
requires that the set of the live hosts in the cluster is exactly
`live_hosts` - no more, no less.

It will be needed for the test which simulates upgrade procedure getting
stuck - "un-stucking" the procedure requires removing some nodes via
legacy removenode procedure which marks them as "left" in gossip, and
then those nodes might get inserted as "left" nodes into raft topology
by the gossiper orphan remover fiber.

Some of the existing tests had to be adjusted because of the changes:

- test_unpublished_cdc_generations_arent_cleared passed only one of the
  cluster's live hosts, now it passes all of them.
- test_topology_recovery_after_majority_loss removes some nodes during
  the test, so they need to be put into the ignore_nodes list.
- test_topology_upgrade_basic did not include the last-added node to the
  check_system_topology_and_cdc_generations_v3_consistency call, now it
  does.
2025-02-17 12:28:52 +01:00
Piotr Dulikowski
f112d76422 raft: add error injection that drops append_entries
It will be needed for a test that simulates the cluster getting stuck
during upgrade. Specifically, it will be used to simulate network
isolation and to prevent raft commands from reaching that node.
2025-02-17 12:28:52 +01:00
Piotr Dulikowski
cd1a336885 topology_coordinator: add injection which makes upgrade get stuck
The injection will necessary for the test, introduced in the next
commit, which verifies that it's possible to recover from an upgrade of
raft topology which gets stuck.
2025-02-17 12:28:52 +01:00
Kefu Chai
3cf0f71420 query-result-writer: reorder initialization to prevent use-after-move
Reorder member variable initialization sequence to ensure `pw` is accessed
before being moved. While the current use-after-move warning from clang-tidy
is a false positive, this change:
- Makes the initialization order more logical
- Eliminates misleading static analysis warnings
- Prevents potential future issues if class structure changes

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22830
2025-02-17 13:45:35 +03:00
Abhi
d7884cf651 storage_service: Remove the variable _manage_topology_change_kind_from_group0
This commit removes the variable _manage_topology_change_kind_from_group0
which was used earlier as a work around for correctly handling
topology_change_kind variable, it was brittle and had some bugs. Earlier commits
made some modifications to deal with handling topology_change_kind variable
post _manage_topology_change_kind_from_group0 removal
2025-02-17 15:19:39 +05:30
Abhi
623e01344b storage_service: fix indentation after the previous commit 2025-02-17 15:06:27 +05:30
Nadav Har'El
5693c18637 test/cqlpy, alternator: allow downloading 2025 releases
This patch adds to the fetch_scylla.py script, used by the "--release"
option of test/{cqlpy,alternator}/run, the ability to download the new
2025.1 releases.

In the new single-stream releases, the number looks like the old
Scylla Enterprise releases, but the location of the artifacts in the
S3 bucket look like the old open-source releases (without the word
"-enterprise" in the paths). So this patch introduces a new "if"
for the (major >= 2025) case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22778
2025-02-17 12:30:42 +03:00
Ferenc Szili
8f8c5c5e24 test: add test for re-cycling ongoing truncate operations
This change adds a test for truncate waiting for already queued
truncate operation for the same table.
2025-02-17 10:18:29 +01:00
Ferenc Szili
af3fb1941a truncate: add additional logging and improve error message during truncate
This change adds two log messages. One for the creation of the truncate
global topology request, and another for the truncate timeout. This is
added in order to help with tracking truncate operation events.

It also extends the "Another global topology request is ongoing, please
retry." error message with more information: keyspace and table name.
2025-02-17 10:18:29 +01:00
Ferenc Szili
e87768c5a0 storage_proxy: wait on already running truncate for the same table
Currently, we can not have more than one global topology operation at
the same time. This means that we can not have concurrent truncate
operations because truncate is implemented as a global topology
operation.

Truncate excludes with other topology operations, and has to wait for
those to complete before truncate starts executing. This can lead to
truncate timeouts. In these cases the client retries the truncate operation,
which will check for ongoing global topology operations, and will fail with
an "Another global topology request is ongoing, please retry." error.

This can be avoided by truncate checking if we have a truncate for the same
table already queued. In this case, we can wait for the ongoing truncate to
complete instead of immediatelly failing the operation, and provide a better
user experience.
2025-02-17 10:18:20 +01:00
Piotr Dulikowski
e4d574fdbb Merge 'Fix view-builder vs (repair and streaming) initialization order' from Pavel Emelyanov
Both, repair and streaming depend on view builder, but since the builder is started too late, both keep sharded<> reference on it and apply `if (view_builder.local_is_initialized())` safety checks.

However, view builder can do its sharded start much earlier, there's currently nothing that prevents it from doing so. This PR moves view builder start up together with some other of its dependencies, and relaxes the way repair and streaming use their view-builder references, in particular -- removes those ugly initialization checks.

refs: scylladb/scylladb#2737

Closes scylladb/scylladb#22676

* github.com:scylladb/scylladb:
  streaming: Relax streaming::make_streamig_consumer() view builder arg
  streaming: Keep non-sharded view_builder dependency reference
  streaming: Remove view_builder.local_is_initialized() checks
  repair: Keep non-sharded view_builder dependency reference
  repair: Remove view_builder.local_is_initialized() checks
  main: Start sharded<view_builder> earlier
  test/cql_env: Move stream manager start lower
2025-02-17 10:03:28 +01:00
Kefu Chai
2ed465e70a install.sh: address shellcheck warnings
Replace legacy shell test operator (-o) with more portable OR (||) syntax.
Fix fragile file handling in find loop by using while read loop instead.

Warnings fixed:
- SC2166: Replace [ p -o q ] with [ p ] || [ q ]
- SC2044: Replace for loop over find with while read loop

While no issues were observed with the current code, these changes improve
robustness and portability across different shell environments.

also, set the pipefail option, so that we can catch the unexpected
failure of `find` command call.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22385
2025-02-17 12:01:51 +03:00
Pavel Emelyanov
ac989f7c30 api: Remove get_uuid() local helper
This helper now fully duplicates the validate_table() one, so it
can be removed. Two callers are updated respectively.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-17 11:42:33 +03:00
Pavel Emelyanov
a4cbc4db55 api: Make use of validate_table()'s table_id
There are several places that validate_table() and then call
database::find_column_family(ks, cf) which goes and repeats the search
done by validate_table() before that.

To remove the unneeded work, re-use the table_id found by
validate_table() helper.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-17 11:42:33 +03:00
Pavel Emelyanov
e698259557 api: Make validate_table() helper return table_id after validation
This helper calls database::find_column_family() and ignores the result.
The intention of this is just to check if the c.f. in question exists.

The find_column_family() in turn calls find_uuid() and then finds the
c.f. object using the uuid found. The latter search is not supposed to
fail, if it does, the on_internal_error() is called.

Said that, replacing find_column_family() with find_uuid() is
idempotent. And returning the found table_id will be used by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-17 11:42:32 +03:00
Pavel Emelyanov
1991512826 api: Change validate_table()'s ctx argument to database
This is to be in-sync with another get_uuid() helper from API. This, in
turn, is to ease the unification of those two, because they are
effectively identical (see next patches)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-17 11:42:32 +03:00
Botond Dénes
b87f5a0b58 reader_concurrency_semaphore: remove reduntant inactive_read::ttl_timer
It is redundant with reader_permit::impl::_ttl_timer. Use the latter for
TTL of inactive reads too. The usage of the two exclude each other, at
any point in time, either one or the other is used, so no reason to keep
both.

Closes scylladb/scylladb#22863
2025-02-17 11:41:16 +03:00
Botond Dénes
15126e4c9f reader_concurrency_semaphore: use std::ranges::for_each()
Instead of boost::for_each().

Closes scylladb/scylladb#22862
2025-02-17 11:35:32 +03:00
Avi Kivity
b7f804659b clustering_range_walker: drop boost iterator_range dependency
Reduces dependency load.

Closes scylladb/scylladb#22880
2025-02-17 11:34:46 +03:00
Avi Kivity
03ae67f9ea tablets: load_balancer: don't log decisions to do nothing
Demote do-nothing decisions to debug level, but keep them at info
if we did decide to do nothing (such as migrate a tablet). Information
about more major events (like split/merge) is kept at info level.

Once log line that logs node information now also logs the datacenter,
which was previously supplied by a log line that is now debug-only.

Closes scylladb/scylladb#22783
2025-02-17 11:34:27 +03:00
Botond Dénes
3439d015cb Merge 'repair: Introduce Host and DC filter support' from Aleksandra Martyniuk
Currently, the tablet repair scheduler repairs all replicas of a tablet. It does not support hosts or DCs selection. It should be enough for most cases. However, users might still want to limit the repair to certain hosts or DCs in production. https://github.com/scylladb/scylladb/pull/21985 added the preparation work to add the config options for the selection. This patch adds the hosts or DCs selection support.

Fixes https://github.com/scylladb/scylladb/issues/22417

New feature. No backport is needed.

Closes scylladb/scylladb#22621

* github.com:scylladb/scylladb:
  test: add test to check dcs and hosts repair filter
  test: add repair dc selection to test_tablet_metadata_persistence
  repair: Introduce Host and DC filter support
  docs: locator: update the docs and formatter of tablet_task_info
2025-02-17 10:04:09 +02:00
Kefu Chai
aa8c27b872 db: prevent accidental copies of result_set_row by making it move-only
result_set_row is a heavyweight object containing multiple cell types:
regular columns, partition keys, and static values. To prevent expensive
accidental copies, delete the copy constructor and replace it with:

1. A move constructor for efficient vector reallocation
2. An explicit copy() method when copies are actually needed

This change reduces overhead in some non-hot paths by eliminating implicit
deep copies. Please note, previously, in `create_view_from_mutation()`,
we kept a copy of `result_set_row`, and then reused `table_rs` for
holding the mutation for `scylla_tables`. Because we don't copy
the `result_set_row` in this change, in order to avoid invalidating
the `row` after reusing `table_rs` in the outer scope, we define a
new `table_rs` shadowing the one in the out scope.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22741
2025-02-17 09:48:08 +02:00
Botond Dénes
57a06a4c35 Merge 'Enhance s3 client perf test with "uploading" facility and related tunables' from Pavel Emelyanov
The existing test measures latencies of object GET-s. That's nice (though incomplete), but we want to measure upload performance. Here it is.

refs: #22460

Closes scylladb/scylladb#22480

* github.com:scylladb/scylladb:
  test/perf/s3: Add --part-size-mb option for upload test
  test/perf/s3: Add uploading test
  test/perf/s3: Some renames not to be download-centric
  test/perf/s3: Make object/file name configurable
  test/perf/s3: Configure maximum number of sockets
  test/perf/s3: Remove parallelizm
  s3/client: Make http client connections limit configurable
2025-02-17 09:46:11 +02:00
Avi Kivity
81821d26cd cql3: functions: add set_intersection()
Given two sets of equivalent types, return the set
intersection.

This is a generic function which adapts to the actual
input type.

A unit test is added.

Closes scylladb/scylladb#22763
2025-02-16 14:06:29 +02:00
Nadav Har'El
4a2654865d Merge 'test.py: suport subfolders' from Artsiom Mishuta
this PR is propper(pythonic) chance of commit 288a47f815

Creating an own folder used to be needed for two reasons:

we want a separate test suite, with its own settings
we want to structure tests, e.g. tablets, raft, schema, gossip.
We've been creating many folders recently. However, test suite
infrastructure is expensive in test.py - each suite has its own
pool of servers, concurrency settings and so on.

Make it possible to structure tests without too many suites,
by supporting subfolders within a suite.

As an example, this PR move mv tests into a separate folder

custom test.py lookup also works.
tests can be run as:

1. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets/test_mv_tablets_empty_ip
2. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets
3. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv

Fixes https://github.com/scylladb/scylladb/issues/20570

Closes scylladb/scylladb#22816

* github.com:scylladb/scylladb:
  test.py: move mv tests into a separate folder
  test.py: suport subfolders
2025-02-16 12:36:25 +02:00
Andrei Chekun
17992c0456 Remove tox
Seems tox is not used anywhere, so there is no need to have it then.
Especially when it messes with pytest. In some cases it can change the
config dir in pytest run.

Closes scylladb/scylladb#22819
2025-02-16 12:23:55 +02:00
Kefu Chai
34517b09a2 alternator,streaming: fix comment typos
Fix misspellings in comments identified by the codespell tool.
fix typos in comment

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22829
2025-02-16 11:34:44 +02:00
Piotr Szymaniak
c1f186c98a alternator: re-enabling/changing existing stream's StreamViewType as well as disabling the nonexistent stream
Table updates that try to enable stream (while changing or not the
StreamViewType) on a table that already has the stream enabled
will result in ValidationError.

Table updates that try to disable stream on a table that does not
have the stream enabled will result in ValidationError.

Add two tests to verify the above.

Mark the test for changing the existing stream's StreamViewType
not to xfail.

Fixes scylladb/scylladb#6939

Closes scylladb/scylladb#22827
2025-02-16 09:57:49 +02:00
Jenkins Promoter
0d5f5e6c9d Update pgo profiles - x86_64 2025-02-15 20:32:23 +02:00
Jenkins Promoter
9daf50d424 Update pgo profiles - aarch64 2025-02-15 20:32:22 +02:00
Lakshmi Narayanan Sreethar
a145a2f83a scylla-gdb: scylla_read_stats: access schema via schema_ptr class
Switch to using schema_ptr wrapper when handling schema references in
scylla_read_stats function. The existing fallback for older versions
(where schema is already a raw pointer) remains preserved.

Fixes #18700

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#22726
2025-02-15 20:32:22 +02:00
Calle Wilund
342df0b1a8 network_topology_strategy/alter ks: Remove dc:s from options once rf=0
Fixes #22688

If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of keyspace will fail,
because the union of new/old options will now contained an unknown keyword.

Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to not rely on
the strategy objects dc mapping, but the full replica topology info
for dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.

v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some

Closes scylladb/scylladb#22693
2025-02-15 20:32:22 +02:00
Nadav Har'El
f89235517d test/topology_custom: fix very slow test test_localnodes_broadcast_rpc_address
The test
topology_custom/test_alternator::test_localnodes_broadcast_rpc_address
sets up nodes with a silly "broadcast rpc address" and checks that
Alternator's "/localnodes" requests returns it correctly.

The problem is that although we don't use CQL in this test, the test
framework does open a CQL connection when the test starts, and closes
it when it ends. It turns out that when we set a silly "broadcast RPC
address", the driver tends to try to connect to it when shutting down,
I'm not even sure why. But the choice of the silly address was 1.2.3.4
is unfortunate, because this IP address is actually routable - and
the driver hangs until it times out (in practice, in a bit over two
minutes). This trivial patch changes 1.2.3.4 to 127.0.0.0 - and equally
silly address but one to which connections fail immediately.

Before this patch, the test often takes more than 2 minutes to finish
on my laptop, after this patch, it always finishes in 4-5 seconds.

Fixes #22744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22746
2025-02-15 20:32:22 +02:00
Botond Dénes
87e8e00de6 tools/scylla-nodetool: netstats: don't assume both senders and receivers
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.

Fixes: #22770

Closes scylladb/scylladb#22771
2025-02-15 20:32:22 +02:00
Pavel Emelyanov
1b44861e8f Merge 'sstable_loader: fix cross-shard resource cleanup in download_task_impl ' from Kefu Chai
This PR addresses two related issues in our task system:

1. Prepares for asynchronous resource cleanup by converting release_resources() to a coroutine. This refactoring enables future improvements in how we handle resource cleanup.

2. Fixes a cross-shard resource cleanup issue in the SSTable loader where destruction of per-shard progress elements could trigger "shared_ptr accessed on non-owner cpu" errors in multi-shard environments. The fix uses coroutines to ensure resources are released on their owner shards.

Fixes #22759

---

this change addresses a regression introduced by d815d7013c, which is contained by 2025.1 and master branches. so it should be backported to 2025.1 branch.

Closes scylladb/scylladb#22791

* github.com:scylladb/scylladb:
  sstable_loader: fix cross-shard resource cleanup in download_task_impl
  tasks: make release_resources() a coroutine
2025-02-15 20:32:22 +02:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Raphael S. Carvalho
d78f57e94a service: Don't use new tablet_resize_finalization state until supported
In a rolling upgrade, nodes that weren't upgraded yet will not recognize
the new tablet_resize_finalization state, that serves both split and
merges, leading to a crash. To fix that, coordinator will pick the
old tablet_split_finalization state for serving split finalization,
until the cluster agrees on merge, so it can start using the new
generic state for resize finalization introduced in merge series.
Regression was introduced in e00798f.

Fixes #22840.

Reported-by: Tomasz Grabiec <tgrabiec@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22845
2025-02-15 20:32:22 +02:00
Li Bo
de8de50fb9 Remove redundant code in mutation_partition.cc
Use the defined `cdef` variable.

Closes scylladb/scylladb#22048
2025-02-15 20:32:22 +02:00
Nadav Har'El
26fa234f87 test/cqlpy,alternator: "--release" should not require AWS credentials
The script fetch_scylla.py is used by the "--release" option of
test/cqlpy/run and test/alternator/run to fetch a given release of
Scylla. The release is fetched from S3, and the script assumed that the
user properly set up $HOME/.aws/config and $HOME/.aws/credentials
to determine the source of that download and the credentials to do this.

But this is unnecessary - Scylla's "downloads.scylladb.com" bucket
actually allows **anonymous** downloads, and this is what we should use.

After this patch, fetch_scylla.py (and the "--release" option of the
run scripts) work correctly even for a user that doesn't have $HOME/.aws
set up at all.

This fix is especially important to new developers, who might not even
have AWS credentials to put into these files.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22773
2025-02-15 20:32:22 +02:00
Pavel Emelyanov
2970567b3a streaming: Relax streaming::make_streamig_consumer() view builder arg
Two callers of it -- repair and stream-manager -- both have non-sharded
reference and can just use it as argument. The helper in question gets
sharded<> one by itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:56 +03:00
Pavel Emelyanov
1140a875e1 streaming: Keep non-sharded view_builder dependency reference
Continuation of the previous path -- view builder is started early
enough and construction of stream manager can happen with non-sharded
reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:56 +03:00
Pavel Emelyanov
3cb9758bd1 streaming: Remove view_builder.local_is_initialized() checks
Now stream_manager starts with sharded<view_builder> started and this
check can be dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:56 +03:00
Pavel Emelyanov
7bd3d31ac6 repair: Keep non-sharded view_builder dependency reference
Continuation of the previous path -- view builder is started early
enough and construction of repair service can happen with non-sharded
reference on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:56 +03:00
Pavel Emelyanov
423abc918c repair: Remove view_builder.local_is_initialized() checks
Now repair service starts with sharded<view_builder> started and those
checks can be dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:55 +03:00
Pavel Emelyanov
5d1f74b86a main: Start sharded<view_builder> earlier
The view_builder service is needed by repair service, but is started
after it. It's OK in a sense that repair service holds a sharded
reference on it and checks whether local_is_initialized() before using
it, which is not nice.

Fortunately, starting sharded view buidler can be done early enough,
because most of its dependencies would be already started by that time.
Two exceptions are -- view_update_generator and
system_distributed_keyspace. Both can be moved up too with the same
justification.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:26:55 +03:00
Pavel Emelyanov
f650e75137 test/cql_env: Move stream manager start lower
This is to keep it in-sync with main code, where stream manager is
started after storage_proxy's and query_processor's remotes. This
doesn't change nothing for now, but next patches will move other
services around main/cql_test_env and early start of stream manager in
cql_test_env will be problematic.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 20:25:20 +03:00
Lakshmi Narayanan Sreethar
10fffcd646 sstables_manager: maybe_reclaim_components: yield between iterations
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
77107ddaa3 sstables_manager: rename increment_total_reclaimable_memory_and_maybe_reclaim()
Renamed the aboved mentioned method to `increment_total_reclaimable_memory()`
as it doesn't directly reclaim memory anymore.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
7f0f839d6d sstables_manager: move reclaim logic into components_reclaim_reload_fiber()
Move the sstable reclaim logic into `components_reclaim_reload_fiber()`
in preparation for the fix for #21947. This also simplifies the overall
reclaim/reload logic by preventing multiple fibers from attempting to
reclaim/reload component memory concurrently.

Also, update the existing test cases to adapt to this change.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
f73b6abcc7 sstables_manager: rename _sstable_deleted_event condition variable
Rename the `_sstable_deleted_event` condition variable to
`_components_memory_change_event` as it will be used by future patches
to signal changes in sstable component memory consumption, (i.e.)
during sstable create and delete, and also when the
`components_memory_reclaim_threshold` config value is changed.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
35a4de3eeb sstables_manager: rename components_reloader_fiber()
A future patch will move components reclaim logic into the current
`components_reloader_fiber()`, so to reflect its new purpose, rename it
to `components_reclaim_reload_fiber()`.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
4d396b9578 sstables_manager: fix maybe_reclaim_components() indentation
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
f53fd40ff0 sstables_manager: reclaim components memory until usage falls below threshold
The current implementation reclaims memory from SSTables only when a new
SSTable is created. An upcoming patch will move this reclaim logic into
the existing component reloader fiber. To support this change, the
`maybe_reclaim_components()` method is updated to reclaim memory until
the total memory consumption falls below the configured threshold.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
30184ead79 sstables_manager: introduce get_components_memory_reclaim_threshold()
Introduce `get_components_memory_reclaim_threshold()`, which returns
the components' memory threshold based on the total available memory.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
4d12ae433a sstables_manager: extract maybe_reclaim_components()
Extract the code from
`increment_total_reclaimable_memory_and_maybe_reclaim()` that reclaims
the components memory into `maybe_reclaim_components()`. The extracted
new method will be used by a following patch to handle reclaim within
the components reload fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:04 +05:30
Lakshmi Narayanan Sreethar
59cbee6fc7 sstables_manager: fix maybe_reload_components() indentation
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:03 +05:30
Lakshmi Narayanan Sreethar
ce2aa15d19 sstables_manager: extract out maybe_reload_components()
Extract the logic that reloads reclaimed components into memory in the
`components_reloader_fiber()` method into a separate method. This is in
preparation for moving the reclaim logic into the same fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-02-14 22:11:03 +05:30
Pavel Emelyanov
8f61d26007 test/perf/s3: Add --part-size-mb option for upload test
Test now uses default internal part size, but for performance
comparisons its good to make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
6211b39f4b test/perf/s3: Add uploading test
The test picks up a file and uploads it into the bucket, then prints the
time it took and uploading speed. For now it's enough, with existing S3
latencies more timing details can be obtained by turning on trace
logging on s3 logger.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
0919a70ac8 test/perf/s3: Some renames not to be download-centric
Now this test is all about reading objects. Rename some bits in it so
that they can be re-used by future uploading test as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
24c194dcf3 test/perf/s3: Make object/file name configurable
Now the download test first creates a temporary object and then reads
data from it. It's good to have an option to download pre-existing file.
This option will also be used for uploading test (next patches)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Pavel Emelyanov
6b27642a79 test/perf/s3: Configure maximum number of sockets
Add the --sockets NR option that limits the number of sockets the
underlying http client is configured to have.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Pavel Emelyanov
230d4d7c5e test/perf/s3: Remove parallelizm
The test spawns several fibers that read the same file in parallel.
There's not much point in it, just makes the code harder to maintain.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Pavel Emelyanov
b52d1a3d99 s3/client: Make http client connections limit configurable
It's now calculated based on sched group shares, but for tests explicit
value is needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Aleksandra Martyniuk
e499f7c971 test: add test to check dcs and hosts repair filter 2025-02-14 13:46:44 +01:00
Ferenc Szili
d598750b2d storage_proxy: allow multiple truncate table fibers per shard
In order to allow concurrent truncate table operations (for the time being,
only for a single table) we have to remove the limitation allowing only one
truncate table fiber per shard.

This change adds the ability to collect the active truncate fibers in
storage_proxy::remote into std::list<> instead of having just a single
truncate fiber. These fibers are waited for completion during
storage_proxy::remote::stop().
2025-02-14 12:35:31 +01:00
Abhinav Jha
e491950c47 raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
In the current scenario, topology_change_kind variable, was been handled using
 _manage_topology_change_kind_from_group0 variable. This method was brittle
and had some bugs(e.g. for restart case, it led to a time gap between group0
server start and topology_change_kind being managed via group0)

Post _manage_topology_change_kind_from_group0 removal, careful management of
topology_change_kind variable was needed for maintaining correct
topology_change_kind in all scenarios. So this PR also performs a refactoring
to populate all init data to system tables even before group0 creation(via
raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader
happens before the group 0 creation, we write mutations directly to system
tables instead of a group 0 command. Hence, post group0 creation, the node
can read the correct values from system tables and correct values are
maintained throughout.

Added a new function initialize_done_topology_upgrade_state which takes
care of updating the correct upgrade state to system tables before starting
group0 server. This ensures that the node can read the correct values from
system tables and correct values are maintained throughout.

By moving raft_initialize_discovery_leader logic to happen before starting
group0 server, and not as group0 command post server start, we also get rid
of the potential problem of init group0 command not being the 1st command on
the server. Hence ensuring full integrity as expected by programmer.

Fixes: scylladb/scylladb#21114
2025-02-14 16:56:17 +05:30
Aleksandra Martyniuk
1c8a41e2dd test: add repair dc selection to test_tablet_metadata_persistence 2025-02-14 09:13:11 +01:00
Asias He
5545289bfa repair: Introduce Host and DC filter support
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.

Fixes #22417
2025-02-14 09:13:11 +01:00
Aleksandra Martyniuk
4c75701756 docs: locator: update the docs and formatter of tablet_task_info 2025-02-14 09:13:11 +01:00
Kefu Chai
b448fea260 sstable_loader: fix cross-shard resource cleanup in download_task_impl
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.

Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
  instead of a custom adder struct

Alternative approaches considered:

1. Using foreign_ptr: Rejected as it would require interface changes
   that complicate stream delegation. foreign_ptr manages the underlying
   pointee with another smart pointer but does not expose the smart
   pointer instance in its APIs, making it impossible to use
   shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
   compatibility reasons.

This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.

Fixes scylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-14 11:13:58 +08:00
Kefu Chai
4c1f1baab4 tasks: make release_resources() a coroutine
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.

This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-14 11:13:58 +08:00
Michał Chojnowski
294b839e34 test_rpc_compression.py: fix an overly-short timeout
The timeout of 10 seconds is too small for CI.
I didn't mean to make it so short, it was an accident.

Fix that by changing the timeout to 10 minutes.

Fixes scylladb/scylladb#22832

Closes scylladb/scylladb#22836
2025-02-13 17:49:39 +01:00
Gleb Natapov
d288d79d78 api: initialize token metadata API after starting the gossiper
Token metadata API now depend on gossiper to do ip to host id mappings,
so initialized it after the gossiper is initialized and de-initialized
it before gossiper is stopped.

Fixes: scylladb/scylladb#22743

Closes scylladb/scylladb#22760
2025-02-13 14:39:05 +01:00
Takuya ASADA
b5e306047f dist: fix upgrade error from 2024.1
We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2,
just like we did for scylla-tools < 5.5.
This is required to make packages able to upgrade from 2024.1.

Fixes #22820

Closes scylladb/scylladb#22821
2025-02-13 12:36:24 +02:00
Botond Dénes
c57492bd73 Update tools/java submodule
* tools/java 807e991d...4f1353ba (1):
  > dist: support smooth upgrade from enterprise to source availalbe

Refs scylladb/scylladb#22820
2025-02-13 12:32:07 +02:00
Petr Gusev
12cc84f8a9 raft_rpc::send_append_entries: limit memory usage
Serializing raft::append_request for transmission requires approximately
the same amount of memory as its size. This means when the Raft
library replicates a log item to M servers, the log item is
effectively copied M times. To prevent excessive memory usage
and potential out-of-memory issues, we limit the total memory
consumption of in-flight raft::append_request messages.

Fixes [scylladb/scylladb#14411](https://github.com/scylladb/scylladb/issues/14411)
2025-02-13 10:29:09 +01:00
Nadav Har'El
e6dcb605cb Merge 'Fix typos' from Dmitriy Rokhfeld (TripleChecker)
Hey, our tool caught a few typos in your repository.

Also, here is your site's error report: https://triplechecker.com/s/Dza11H/scylladb.com

Hope it's helpful!

Closes scylladb/scylladb#22787

* github.com:scylladb/scylladb:
  Fix typos
  Fix typos
2025-02-13 11:14:29 +02:00
TripleChecker
8d64be94e2 Fix typos 2025-02-13 01:54:08 +02:00
Wojciech Mitros
86838a147d test: skip test_complex_null_values in uf_typest_test
test_complex_null_values is currently flaky, causing many failures
in CI. The reason for the failures is unclear, and a fix might not
be simple, so because UDFs are experimental, for now let's skip
this test until the corresponding issue is fixed.

Refs scylladb/scylladb#22799

Closes scylladb/scylladb#22818
2025-02-12 21:37:34 +01:00
Andrei Chekun
54c165c94c test: Skip test_raft_voters because of existing issue
https://github.com/scylladb/scylladb/issues/18793

Closes scylladb/scylladb#22710
2025-02-12 16:41:17 +03:00
Petr Gusev
043291a2b4 fms: extract entry_size to log_entry::get_size
We intend to reuse it in subsequent commit.
2025-02-12 14:33:41 +01:00
Anna Stuchlik
b860b2109f doc: add a warning for admins launching ScyllaDB on Azure
Fixes scylladb/scylladb#22686
Refs scylladb/scylladb#22505

Closes scylladb/scylladb#22687
2025-02-12 14:27:19 +01:00
Tomasz Grabiec
d8ea780244 Merge 'scylla-gdb.py: introduce scylla tablet-metadata command' from Botond Dénes
Dumps the content of the tablet metadata. Very useful for debugging tablet related problems.

Example output:
```
(gdb) scylla tablet-metadata --table usertable_no_lwt

This node: host_id: b90662a9-98b1-4452-bc45-44d460ecab62, shard: 0

table alternator_usertable_no_lwt.usertable_no_lwt: id: 68316fa0-78ec-11ef-af10-98d4ab71aac4, tablets: 32, resize decision: merge#1, transitions: 0
  tablet#0: last token: -8646911284551352321, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#0, 84d0cb45-1c6c-4870-b727-03db3130641f#0, b933959e-8134-4ba0-8c44-33dbd51170e9#0]
  tablet#1: last token: -8070450532247928833, replicas: [fb0167dc-7a7d-476d-b4a5-4a55a52dadff#0, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#0, ac2fdd20-2f54-4960-9856-27fd07ed38ef#0]
  tablet#2: last token: -7493989779944505345, replicas: [fb0167dc-7a7d-476d-b4a5-4a55a52dadff#1, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#1, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#1]
  tablet#3: last token: -6917529027641081857, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#1, b933959e-8134-4ba0-8c44-33dbd51170e9#1, 84d0cb45-1c6c-4870-b727-03db3130641f#1]
  tablet#4: last token: -6341068275337658369, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#2, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#2, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#2]
  tablet#5: last token: -5764607523034234881, replicas: [4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#2, b933959e-8134-4ba0-8c44-33dbd51170e9#2, 84d0cb45-1c6c-4870-b727-03db3130641f#2]
  tablet#6: last token: -5188146770730811393, replicas: [84d0cb45-1c6c-4870-b727-03db3130641f#3, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#3, ac2fdd20-2f54-4960-9856-27fd07ed38ef#3]
  tablet#7: last token: -4611686018427387905, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#3, b933959e-8134-4ba0-8c44-33dbd51170e9#3, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#3]
  tablet#8: last token: -4035225266123964417, replicas: [b933959e-8134-4ba0-8c44-33dbd51170e9#4, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#4, ac2fdd20-2f54-4960-9856-27fd07ed38ef#4]
  tablet#9: last token: -3458764513820540929, replicas: [4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#4, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#4, 84d0cb45-1c6c-4870-b727-03db3130641f#4]
  tablet#10: last token: -2882303761517117441, replicas: [84d0cb45-1c6c-4870-b727-03db3130641f#5, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#5, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#5]
  tablet#11: last token: -2305843009213693953, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#5, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#5, b933959e-8134-4ba0-8c44-33dbd51170e9#5]
  tablet#12: last token: -1729382256910270465, replicas: [b933959e-8134-4ba0-8c44-33dbd51170e9#6, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#6, 84d0cb45-1c6c-4870-b727-03db3130641f#6]
  tablet#13: last token: -1152921504606846977, replicas: [4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#6, ac2fdd20-2f54-4960-9856-27fd07ed38ef#6, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#6]
  tablet#14: last token: -576460752303423489, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#7, 84d0cb45-1c6c-4870-b727-03db3130641f#7, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#7]
  tablet#15: last token: -1, replicas: [b933959e-8134-4ba0-8c44-33dbd51170e9#7, ac2fdd20-2f54-4960-9856-27fd07ed38ef#7, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#7]
  tablet#16: last token: 576460752303423487, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#8, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#8, 84d0cb45-1c6c-4870-b727-03db3130641f#8]
  tablet#17: last token: 1152921504606846975, replicas: [b933959e-8134-4ba0-8c44-33dbd51170e9#8, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#8, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#8]
  tablet#18: last token: 1729382256910270463, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#9, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#9, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#9]
  tablet#19: last token: 2305843009213693951, replicas: [84d0cb45-1c6c-4870-b727-03db3130641f#9, ac2fdd20-2f54-4960-9856-27fd07ed38ef#9, b933959e-8134-4ba0-8c44-33dbd51170e9#9]
  tablet#20: last token: 2882303761517117439, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#10, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#10, b933959e-8134-4ba0-8c44-33dbd51170e9#10]
  tablet#21: last token: 3458764513820540927, replicas: [84d0cb45-1c6c-4870-b727-03db3130641f#10, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#10, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#10]
  tablet#22: last token: 4035225266123964415, replicas: [4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#11, 84d0cb45-1c6c-4870-b727-03db3130641f#11, b933959e-8134-4ba0-8c44-33dbd51170e9#11]
  tablet#23: last token: 4611686018427387903, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#11, ac2fdd20-2f54-4960-9856-27fd07ed38ef#11, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#11]
  tablet#24: last token: 5188146770730811391, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#12, 84d0cb45-1c6c-4870-b727-03db3130641f#12, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#12]
  tablet#25: last token: 5764607523034234879, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#12, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#12, b933959e-8134-4ba0-8c44-33dbd51170e9#12]
  tablet#26: last token: 6341068275337658367, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#13, b933959e-8134-4ba0-8c44-33dbd51170e9#13, 84d0cb45-1c6c-4870-b727-03db3130641f#13]
  tablet#27: last token: 6917529027641081855, replicas: [ac2fdd20-2f54-4960-9856-27fd07ed38ef#13, fb0167dc-7a7d-476d-b4a5-4a55a52dadff#13, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#13]
  tablet#28: last token: 7493989779944505343, replicas: [b5ddcd7e-45ed-4f20-8841-353bd82cc04c#0, b933959e-8134-4ba0-8c44-33dbd51170e9#0, ac2fdd20-2f54-4960-9856-27fd07ed38ef#0]
  tablet#29: last token: 8070450532247928831, replicas: [fb0167dc-7a7d-476d-b4a5-4a55a52dadff#0, 84d0cb45-1c6c-4870-b727-03db3130641f#0, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#0]
  tablet#30: last token: 8646911284551352319, replicas: [fb0167dc-7a7d-476d-b4a5-4a55a52dadff#1, ac2fdd20-2f54-4960-9856-27fd07ed38ef#1, b5ddcd7e-45ed-4f20-8841-353bd82cc04c#1]
  tablet#31: last token: 9223372036854775807, replicas: [b933959e-8134-4ba0-8c44-33dbd51170e9#1, 4b1e8a42-e8b3-432e-bf7c-b0f7a10eb3cd#1, 84d0cb45-1c6c-4870-b727-03db3130641f#1]
```

The PR includes two marginally related small fixes too.

Improvement, no backport needed.

Closes scylladb/scylladb#20940

* github.com:scylladb/scylladb:
  scylla-gdb.py: add scylla tablet-metadata command
  scylla-gdb.py: register the scylla table command
  scylla-gdb.py: unordered_map: improve flat_hash_map matching
2025-02-12 13:27:36 +01:00
Andrei Chekun
9540e056a4 test: Add the possibility to run raft tests with pytest
Closes scylladb/scylladb#22775
2025-02-12 14:10:19 +02:00
Artsiom Mishuta
b36d586d80 test.py: move mv tests into a separate folder
Now that we support suite subfolders,
As an example, this commit move mv tests into a separate folder

custom test.py lookup also works.
tests can be run as:

1. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets/test_mv_tablets_empty_ip
2. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets
3. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv
2025-02-12 12:27:26 +01:00
Artsiom Mishuta
5ca025a8c1 test.py: suport subfolders
Creating an own folder used to be needed for two reasons:
- we want a separate test suite, with its own settings
- we want to structure tests, e.g. tablets, raft, schema, gossip.

We've been creating many folders recently. However, test suite
infrastructure is expensive in test.py - each suite has its own
pool of servers, concurrency settings and so on.

Make it possible to structure tests without too many suites,
by supporting subfolders within a suite.

Fixes #20570
2025-02-12 11:46:06 +01:00
Botond Dénes
7150442f6a service/storage_proxy: schedule_repair(): materialize the range into a vector
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being pased down and consequently write
handlers being created with nullopt mutations.

Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.

Fixes: scylladb/scylladb#21907
Fixes: scylladb/scylladb#21714

Closes scylladb/scylladb#21910
2025-02-12 12:38:47 +02:00
Kefu Chai
6e1fb2c74e build: limit ThinLTO link parallelism to prevent OOM in release builds
When building Scylla with ThinLTO enabled (default with Clang), the linker
spawns threads equal to the number of CPU cores during linking. This high
parallelism can cause out-of-memory (OOM) issues in CI environments,
potentially freezing the build host or triggering the OOM killer.

In this change:

1. Rename `LINK_MEM_PER_JOB` to `Scylla_RAM_PER_LINK_JOB` and make it
   user-configurable
2. Add `Scylla_PARALLEL_LINK_JOBS` option to directly control concurrent
   link jobs (useful for hosts with large RAM)
3. Increase the default value of `Scylla_PARALLEL_LINK_JOBS` to 16 GiB
   when LTO is enabled
4. Default to 2 parallel link jobs when LTO is enabled if the calculated
   number if less than 2 for faster build.

Notes:
- Host memory is shared across job pools, so pool separation alone doesn't help
- Ninja lacks per-job memory quota support
- Only affects link parallelism in LTO-enabled builds

See
https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelism

Fixes scylladb/scylladb#22275

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22383
2025-02-12 10:24:13 +02:00
Alexander Turetskiy
3ac533251a allow "UTC" and "GMT" in string format of timestamp
fix problem with statements like:
INSERT INTO tbl (pk, time) VALUES (1, '2016-09-27 16:10:00 UTC');

fixes #20501

Closes scylladb/scylladb#22426
2025-02-12 09:38:28 +02:00
Alexander Turetskiy
47011ab830 Materialized view name length should be limited
Oversized materialized view and index names are rejected;
Materialized view names with invalid symbols are rejected.

fixes: #20755

Closes scylladb/scylladb#21746
2025-02-11 22:16:09 +02:00
Avi Kivity
5c647408c7 systemd: map libraries close to the executable
The Intel Optimizaton Manual states that branches with relative offsets
greater than 2GB suffer a penalty. They cite a 6% improvement when this
is avoided. Our code doesn't rely heavily on dynamically linked
libraries, so I don't expect a similar win, but it's still better to do
it than not.

Eliminate long branches by asking the dynamic linker to restrict itself
to the lower 4GB of the address space. I saw that it maps libraries
at 1GB+ addresses, so this satisfies the limitation.

Fix is from the Intel Optimization Manual as well.

This change was ported from ScyllaDB Enterprise.

Closes scylladb/scylladb#22498
2025-02-11 22:16:09 +02:00
Avi Kivity
de3b2c827f service: topology coordinator: demote log message about refreshing stats
This repeats every minute and isn't very interesting. Demote to debug
to reduce log clutter.

Closes scylladb/scylladb#22784
2025-02-11 22:16:09 +02:00
Botond Dénes
f808f84a45 db/config: improve description of repair_multishard_reader_enable_read_ahead
The current description has a typo and in general not informative enough
on when this option should be used.

Closes scylladb/scylladb#21758
2025-02-11 22:16:09 +02:00
Botond Dénes
be5c28e149 scylla-gdb.py: add scylla tablet-metadata command
Dumps the content of the tablet-metadata. Very useful for debugging
tablet-replated problems.
2025-02-11 07:29:46 -05:00
Botond Dénes
23db82b957 scylla-gdb.py: register the scylla table command
This command exists but is not registered. There is a test for it, but
it happens to work only because scylla table is a prefix of scylla
tables (another command), so gdb invokes that other command instead.
2025-02-11 07:29:46 -05:00
Botond Dénes
3ec8ef90fe scylla-gdb.py: unordered_map: improve flat_hash_map matching
Strip typedefs from the type before matching.
2025-02-11 07:29:30 -05:00
Avi Kivity
5adaf0a605 Merge 'tree: migrate from boost::remove_if() to the standard library based alternatives' from Kefu Chai
Replace boost::remove_if() with the standard library's std::erase_if() or std::ranges::remove_if() to reduce external dependencies and simplify the codebase. This change eliminates the requirement for boost::range and makes the implementation more maintainable.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#22788

* github.com:scylladb/scylladb:
  service: migrate from boost::range::remove_if() to std::ranges::remove_if
  sstable: migrate from boost::remove_if() to std::erase_if()
2025-02-11 14:07:48 +02:00
Kefu Chai
481397317d sstables, test: migrate from boost::copy() to std::ranges::copy()
Replace boost::copy() with the standard library's std::ranges::copy()
to reduce external dependencies and simplify the codebase. This change
eliminates the requirement for boost::range and makes the implementation
more maintainable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22789
2025-02-11 14:55:25 +03:00
Asias He
fb318d0c81 repair: Add await_completion option for tablet_repair api
Set true to wait for the repair to complete. Set false to skip waiting
for the repair to complete. When the option is not provided, it defaults
to false.

It is useful for management tool that wants the api to be async.

Fixes #22418

Closes scylladb/scylladb#22436
2025-02-11 12:49:12 +02:00
Avi Kivity
770dc37f0f tools: toolchain: prepare: fix optimized_clang archive printout
prepare helpfully prints out the path where optimized clang is stored,
but a couple of typos mean it prints out an empty string. Fix that.

Closes scylladb/scylladb#22714
2025-02-11 11:50:01 +02:00
Nadav Har'El
1842d456a1 test/cqlpy: fix some false failures on Cassandra
Developers are expected to run new cqlpy tests against Cassandra - to
verify that the new test itself is correct. Usually there is no need
to run the entire cqlpy test suite against Cassandra, but when users do
this, it isn't confidence-inspiring to see hundreds of tests failing.
In this patch I fix many but not all of these failures.

Refs #11690 (which will remain open until we fix all the failures on
Cassandra)

* Fixed the "compact_storage" fixture recently introduced to enable the
  deprecated feature in Scylla for the tests. This fixture was broken on
  Cassandra and caused all compact-storage related tests to fail
  on Cassandra.

* Marked all tests in test_tombstone_limit.py as scylla_only - as they
  check the Scylla-only query_tombstone_page_limit configuration option.

* Marked all tests in test_service_level_api.py as scylla_only - as they
  check the Scylla-only service levels feature.

* Marked a test specific to the Scylla-only IncrementalCompactionStrategy
  as scylla_only. Some tests mix STCS and ICS testing in one test - this
  is a mistake and isn't fixed in this patch.

* Various tests in test_tablets.py forgot to use skip_without_tablets
  to skip them on Cassandra or older Scylla that doesn't have the
  tablets feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

x

Closes scylladb/scylladb#22561
2025-02-11 11:48:40 +02:00
Botond Dénes
4a7a75dfcb Merge 'tasks: use host_id in task manager' from Aleksandra Martyniuk
Use host_id in a children list of a task in task manager to indicate
a node on which the child was created.

Move TASKS_CHILDREN_REQUEST to IDL. Send it by host_id.

Fixes: https://github.com/scylladb/scylladb/issues/22284.

Ip to host_id transition; backport isn't needed.

Closes scylladb/scylladb#22487

* github.com:scylladb/scylladb:
  tasks: drop task_manager::config::broadcast_address as it's unused
  tasks: replace ip with host_id in task_identity
  api: task_manager: pass gossiper to api::set_task_manager
  tasks: keep host_id in task_manager
  tasks: move tasks_get_children to IDL
2025-02-11 11:32:27 +02:00
Patryk Jędrzejczak
7b8344faa8 Merge 'Fix a regression that sometimes causes an internal error and demote barrier_and_drain rpc error log to a warning ' from Gleb Natapov
The series fixes a regression and demotes a barrier_and_drain logging error to a warning since this particular condition may happen during normal operation.

We want to backport both since one is a bug fix and another is trivial and reduces CI flakiness.

Closes scylladb/scylladb#22650

* https://github.com/scylladb/scylladb:
  topology_coordinator: demote barrier_and_drain rpc failure to warning
  topology_coordinator: read peers table only once during topology state application
2025-02-11 10:25:35 +01:00
Pavel Emelyanov
529ff3efa5 Merge 'Alternator: implement UpdateTable operation to add or delete GSI' from Nadav Har'El
In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explained, this required changing how Alternator stores materialized view keys - instead of insisting that these key must be real columns (that is **not** the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes #11567.

We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes #11471.

This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. **It should be backported to 2025.1**, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1.

Closes scylladb/scylladb#21989

* github.com:scylladb/scylladb:
  alternator: fix view build on oversized GSI key attribute
  mv: clean up do_delete_old_entry
  test/alternator: unflake test for IndexStatus
  test/alternator: work around unrelated bug causing test flakiness
  docs/alternator: adding a GSI is no longer an unimplemented feature
  test/alternator: remove xfail from all tests for issue 11567
  alternator: overhaul implementation of GSIs and support UpdateTable
  mv: support regular_column_transformation key columns in view
  alternator: add new materialized-view computed column for item in map
  build: in cmake build, schema needs alternator
  build: build tests with Alternator
  alternator: add function serialized_value_if_type()
  mv: introduce regular_column_transformation, a new type of computed column
  alternator: add IndexStatus/Backfilling in DescribeTable
  alternator: add "LimitExceededException" error type
  docs/alternator: document two more unimplemented Alternator features
2025-02-11 10:02:01 +03:00
Kefu Chai
a18069fad7 service: migrate from boost::range::remove_if() to std::ranges::remove_if
Replace boost::range::remove_if() with the standard library's
std::ranges::remove_if() to reduce external dependencies and simplify
the codebase. This change eliminates the requirement for boost::range
and makes the implementation more maintainable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-11 09:15:14 +08:00
Kefu Chai
ba724a26f4 sstable: migrate from boost::remove_if() to std::erase_if()
Replace boost::remove_if() with the standard library's std::erase_if()
to reduce external dependencies and simplify the codebase. This change
eliminates the requirement for boost::range and makes the implementation
more maintainable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-11 09:15:14 +08:00
TripleChecker
e72e6fadeb Fix typos 2025-02-11 00:17:43 +02:00
Avi Kivity
6a1ee32cc3 Merge 'raft/group0_state_machine: load current RPC compression dict on startup' from Michał Chojnowski
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port. This causes a restarted node not to use the
dictionary for RPC compression until the next dictionary update.

Fix that.

Fixes scylladb/scylladb#22738

This is more of a bugfix than an improvement, so it should be backported to 2025.1.

Closes scylladb/scylladb#22739

* github.com:scylladb/scylladb:
  test_rpc_compression.py: test the dictionaries are loaded on startup
  raft/group0_state_machine: load current RPC compression dict on startup
2025-02-10 20:40:33 +02:00
Dawid Mędrek
cd50152522 service/mapreduce_service: Cancel query when stopping
Before these changes, shutting down a node could be prolonged because of
mapreduce_service. `mapreduce_service::stop()` uninitializes messaging
service, which includes waiting for all ongoing RPC handlers. We already
had a mechanism for cancelling local mapreduce tasks, but we were missing
one for cancelling external queries.

In this commit, we modify the signature of the request so it supports
cancelling via an abort source. We also provide a reproducer test
for the problem.

Fixes scylladb/scylladb#22337

Closes scylladb/scylladb#22651
2025-02-10 20:12:59 +02:00
Asias He
6f04de3efd streaming: Fail stream plan on stream_mutation_fragments handler in case of error
The following is observed in pytest:

1) node1, stream master, tried to pull data from node3

2) node3, stream follower, found node1 restarted

3) node3 killed the rpc stream

4) node1 did not get the stream session failure message from node3. This
failure message was supposed to kill the stream plan on node1. That's the
reason node1 failed the stream session much later at "2024-08-19 21:07:45,539".
Note, node3 failed the stream on its side, so it should have sent the stream
session failure message.

```
$ cat node1.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO  2024-08-19 20:24:01,162 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers={127.0.34.3}, master
ERROR 2024-08-19 20:24:01,190 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=127.0.34.3: seastar::nested_exception: seastar::rpc::stream_closed (rpc stream was closed by peer) (while cleaning up after seastar::rpc::stream_closed (rpc stream was closed by peer))
WARN  2024-08-19 21:07:45,539 [shard 0:main] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.3}, tx=0 KiB, 0.00 KiB/s, rx=484 KiB, 0.18 KiB/s

$ cat node3.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO  2024-08-19 20:24:01,163 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers=127.0.34.1, slave
INFO  2024-08-19 20:24:01,164 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Start sending ks=ks, cf=cf, estimated_partitions=2560, with new rpc streaming
WARN  2024-08-19 20:24:01,187 [shard 0: gms] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.1}, tx=633 KiB, 26506.81 KiB/s, rx=0 KiB, 0.00 KiB/s
WARN  2024-08-19 20:24:01,188 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] stream_transfer_task: Fail to send to 127.0.34.1:0: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN  2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to send: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN  2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming error occurred, peer=127.0.34.1
```

To be safe in case the stream fail message is not received, node1 could fail
the stream plan as soon as the rpc stream is aborted in the
stream_mutation_fragments handler.

Fixes #20227

Closes scylladb/scylladb#21960
2025-02-10 16:32:12 +01:00
Avi Kivity
cf72c31617 treewide: improve bash error reporting
bash error handling and reporting is atrocious. Without -e it will
just ignore errors. With -e it will stop on errors, but not report
where the error happened (apart from exiting itself with an error code).

Improve that with the `trap ERR` command. Note that this won't be invoked
on intentional error exit with `exit 1`.

We apply this on every bash script that contains -e or that it appears
trivial to set it in. Non-trivial scripts without -e are left unmodified,
since they might intentionally invoke failing scripts.

Closes scylladb/scylladb#22747
2025-02-10 18:28:52 +03:00
Pavel Emelyanov
81f7a6d97d doc: Update system.sstables table schema description
The partition key had been renamed and its type changed some time ago,
but the doc wasn't updated. Fix it.

refs: #20998

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22683
2025-02-10 16:09:49 +02:00
Botond Dénes
51a273401c Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
This PR converts boost load balancer tests in preparation for load balancer changes
which add per-table tablet hints. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.

Topology is created in tests via group0 commands, which is abstracted by
the new `topology_builder` class.

Tests cannot modify token_metadata only in memory now as it needs to be
consistent with the schema and on-disk metadata. That's why modifications to
tablet metadata are now made under group0 guard and save back metadata to disk.

Closes scylladb/scylladb#22648

* github.com:scylladb/scylladb:
  test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
  tests: tablets: Set initial tablets to 1 to exit growing mode
  test: tablets_test: Create proper schema in load balancer tests
  test: lib: Introduce topology_builder
  test: cql_test_env: Expose topology_state_machine
  topology_state_machine: Introduce lock transition
2025-02-10 16:08:41 +02:00
Avi Kivity
c212f5a296 db/config: forward-declare boost options_description_easy_init
Reduces large dependency pull from boost.

Closes scylladb/scylladb#22748
2025-02-10 15:08:11 +02:00
Nikita Kurashkin
025bb379a4 cql: remove expansion of "SELECT *" in DESC MATERIALIZED VIEW
This patch removes expansion of "SELECT *" in DESC MATERIALIZED VIEW.
Instead of explicitly printing each column, DESC command will now just
use SELECT *, if view was created with it. Also, adds a correspodning test.
Fixes #21154

Closes scylladb/scylladb#21962
2025-02-10 15:01:23 +02:00
Kefu Chai
c6bf9d8d11 sstables: switch from boost to std::ranges::all_of()
Replace boost::algorithm::all_of_equal() to std::ranges::all_of()

In order to reduce the header dependency to boost ranges library, let's
use the utility from the standard library when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22730
2025-02-10 15:44:55 +03:00
Kefu Chai
09a090e410 ent/encryption: Replace manual string suffix checks with ends_with()
Replace manual string suffix comparison (length check + std::equal) with
std::string::ends_with() introduced in C++20 for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22764
2025-02-10 15:42:39 +03:00
Avi Kivity
d4c531307d replica: database.hh: drop dependency on boost ranges
Reduces dependency load.

Closes scylladb/scylladb#22749
2025-02-10 13:29:55 +01:00
Michael Litvak
c098e9a327 test/test_view_build_status: fix flaky asserts
In few test cases of test_view_build_status we create a view, wait for
it and then query the view_build_status table and expect it to have all
rows for each node and view.

But it may fail because it could happen that the wait_for_view query and
the following queries are done on different nodes, and some of the nodes
didn't apply all the table updates yet, so they have missing rows.

To fix it, we change the assert to work in the eventual consistency
sense, retrying until the number of rows is as expectd.

Fixes scylladb/scylladb#22644

Closes scylladb/scylladb#22654
2025-02-10 12:41:42 +01:00
Kefu Chai
ca832dc4fb .github: Make "make-pr-ready-for-review" workflow run in base repo
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error.  This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
    Error: Input required and not supplied: token
```

This commit addresses the issue by:

- Running the workflow in the base repository instead of the fork. This
  grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
  controlexecution, as recommended in the GitHub Actions documentation
  (https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
  This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.

This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.

Fixes scylladb/scylladb#22765

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22766
2025-02-10 12:56:39 +02:00
Abhi
4748125a48 service/raft: Refactor mutation writing helper functions.
We use these changes in following commit.
2025-02-10 14:48:25 +05:30
Evgeniy Naydanov
06793978c1 test.py: new Python dependencies for dtest->test.py migration
3rd-party library which provide compatibility between sync and async code:

  universalasync

Few deps from scylla-dtest:

  deepdiff
  cryptography
  boto3-stubs[dynamodb]

[avi: regenerate frozen toolchain with optimized clang from

  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz
]

Closes scylladb/scylladb#22497
2025-02-10 10:52:27 +02:00
Kefu Chai
0185aa458b build: cmake: remove trailing comma in db/CMakeLists.txt source list
In c5668d99, a new source file row_cache.cc was added to the `db` target,
but with an extraneous trailing comma. In CMake's target_sources(),
source files should be space-separated - any comma is interpreted as
part of the filename, causing build failures like:
```
  CMake Error at db/CMakeLists.txt:2 (target_sources):
    Cannot find source file:
      row_cache.cc,
```
Fix the issue by removing the trailing comma.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22754
2025-02-09 17:28:47 +02:00
Nadav Har'El
a492e239e3 Merge 'test.py: Add the possibility to run boost and unit tests with pytest ' from Andrei Chekun
Add the possibility to run boost and unit tests with pytest

test.py should follow the next paradigm - the ability to run all test cases sequentially by ONE pytest command.
With this paradigm, to have the better performance, we can split this 1 command into 2,3,4,5,100,200... whatever we want

It's a new functionality that does not touch test.py way of executing the boost and unit tests.
It supports the main features of test.py way of execution: automatic discovery of modes, repeats.
There is an additional requirement to execute tests in parallel: pytest-xdist. To install it, execute `pip install pytest-xdist`

To run test with pytest execute `pytest test/boost`. To execute only one file, provide the path filename `pytest test/boost/aggregate_fcts_test.cc` since it's a normal path, autocompletion will work on the terminal. To provide a specific mode, use the next parameter `--mode dev`, if parameter will not be provided pytest will try to use `ninja mode_list` to find out the compiled modes.
Parallel execution controlled by pyest-xdist and the parameter `-n 12`.
The useful command to discover the tests in the file or directory is `pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc`. That will return all test functions in the file. To execute only one function from the test, you can invoke the output from the previous command, but suffix for mode should be skipped, for example output will be `test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev`, so to execute this specific test function, please use the next command `pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg`
There is a parameter `--repeat` that used to repeat the test case several times in the same way as test.py did.
It's not possible to run both boost and unit tests directories with one command, so we need to provide explicitly which directory should be executed. Like this `pytest --mode dev test/unit` or `pytest --mode dev test/boost`

Fixes: https://github.com/scylladb/qa-tasks/issues/1775

Closes scylladb/scylladb#21108

* github.com:scylladb/scylladb:
  test.py: Add possibility to run ldap tests from pytest
  test.py: Add the possibility to run unit tests from pytest
  test.py: Add the possibility to run boost test from pytest
  test.py: Add discovery for C++ tests for pytest
  test.py: Modify s3 server mock
  test.py: Add method to get environment variables from MinIO wrapper
  test.py: Move get configured modes to common lib
2025-02-09 11:56:24 +01:00
Yaron Kaikov
93f53f4eb8 dist: support smooth upgrade from enterprise to source availalbe
When upgrading for example from `2024.1` to `2025.1` the package name is
not identical casuing the upgrade command to fail:
```
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Exit code: 100
Stdout:
Selecting previously unselected package scylla.
Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ...
Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
Stderr:
E: Sub-process /usr/bin/dpkg returned an error code (1)
```

Adding `Obsoletes` (for rpm) and `Replaces` (for deb)

Fixes: https://github.com/scylladb/scylladb/issues/22420

Closes scylladb/scylladb#22457
2025-02-08 21:56:09 +02:00
Botond Dénes
be23ebf20f Update tools/python3 submodule
* tools/python3 8415caf4...3e0b8932 (2):
  > reloc: collect package files correctly if the package has an optional dependency
  > dist: support smooth upgrade from enterprise to source availalbe

Closes scylladb/scylladb#22517
2025-02-08 21:54:42 +02:00
Avi Kivity
9712390336 Merge 'Add per-table tablet options in schema' from Benny Halevy
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.

* New feature, no backport required

Closes scylladb/scylladb#22090

* github.com:scylladb/scylladb:
  tablets: resize_decision: get rid of initial_decision
  tablet_allocator: consider tablet options for resize decision
  tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
  network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
  network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
  network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
  cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
  cql3/create_keyspace_statement: add deprecation warning for initial tablets
  test: cqlpy: test_tablets: add tests for per-table tablet options
  schema: add per-table tablet options
  feature_service: add TABLET_OPTIONS cluster schema feature
2025-02-08 20:32:19 +02:00
Avi Kivity
9db9b0963f Merge ' reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Botond Dénes
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature eviction.

Fixes: https://github.com/scylladb/scylladb/issues/22629

Backport required to all active releases, they are all affected.

Closes scylladb/scylladb#22701

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: set_notify_handler(): disable timeout
  reader_permit: mark check_abort() as const
2025-02-08 20:05:03 +02:00
Kefu Chai
a6f703414a db: switch from boost::adaptors::indirected to std::views
replace boost::adaptors::indirected using std::views::transform for
less header dependency.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22731
2025-02-08 17:36:46 +02:00
Avi Kivity
d3b8c9f5ef build: update frozen toolchain to Fedora 41 with clang 19
Update from clang 18 to clang 19. perf-simple-query reports:

clang 18

278102.35 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   36056 insns/op,   16560 cycles/op,        0 errors)
288801.19 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   36018 insns/op,   16004 cycles/op,        0 errors)
287795.23 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   36039 insns/op,   15995 cycles/op,        0 errors)
290495.86 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   36027 insns/op,   15939 cycles/op,        0 errors)
293116.10 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   36020 insns/op,   15780 cycles/op,        0 errors)

clang 19

284742.08 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   35517 insns/op,   16419 cycles/op,        0 errors)
297974.97 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   35497 insns/op,   15926 cycles/op,        0 errors)
279527.99 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   35513 insns/op,   16724 cycles/op,        0 errors)
298229.61 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   35494 insns/op,   15892 cycles/op,        0 errors)
297982.67 tps ( 63.0 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   35494 insns/op,   15819 cycles/op,        0 errors)

So the update delivers a nice performance improvement.

Optimized clang regenerated and stored in

  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Script to prepare optimized clang updated, and upstreamed patch
dropped.

Closes scylladb/scylladb#22380
2025-02-08 17:18:17 +02:00
Andrei Chekun
043534acc6 test.py: Add possibility to run ldap tests from pytest
Add posibility to run ldap tests with pytest.
LDAP server will be created for each worker if xdist will be used.
For one thread one LDAP server will be used for all tests.
2025-02-07 21:40:28 +01:00
Andrei Chekun
36ad813b94 test.py: Add the possibility to run unit tests from pytest
Add the possibility to run unit tests from pytest
2025-02-07 21:40:28 +01:00
Andrei Chekun
8ef840a1c5 test.py: Add the possibility to run boost test from pytest
Add the possibility to run boost test from pytest.
Boost facade based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suite better.
2025-02-07 21:40:25 +01:00
Andrei Chekun
4addc039e5 test.py: Add discovery for C++ tests for pytest
Code based on https://github.com/pytest-dev/pytest-cpp. Updated, customized, enhanced to suit current needs.
Modify generate report to not modify the names, since it will break
xdist way of working. Instead modification will be done in post collect
but before executing the tests.
2025-02-07 19:44:06 +01:00
Andrei Chekun
fb4722443d test.py: Modify s3 server mock
Add the possibility to return environment as a dict to use it later it subprocess created by xdist, without starting another s3 mock server for each thread.
2025-02-07 19:38:53 +01:00
Andrei Chekun
7948c4561d test.py: Add method to get environment variables from MinIO wrapper
Add method to retrieve MinIO server wrapper environment variables for
later processing.
This change will allow to sharing connection information with other
processes and allow reusing the server across multiple tests.
2025-02-07 19:38:53 +01:00
Andrei Chekun
108ef5856f test.py: Move get configured modes to common lib
This will allow using this method inside the test module for pytest launching the boost and unit tests
2025-02-07 19:38:53 +01:00
Tomasz Grabiec
1854ea2165 test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
This scenario is invoked in a loop in the
test_load_balancing_merge_colocation_with_random_load test case, which
will cause accumulation of tablet maps making each reload slower in
subsequent iterations.

It wasn't a problem before because we overwritten tablet_metadata in
each iteration to contain only tablets for the current table, but now
we need to keep it consistent with the schema and don't do that.
2025-02-07 17:13:52 +01:00
Tomasz Grabiec
58460a8863 tests: tablets: Set initial tablets to 1 to exit growing mode
After tablet hints, there is no notion of leaving growing mode and
tablet count is sustained continuously by initial tablet option, so we
need to lower it for merge to happen.
2025-02-07 17:13:52 +01:00
Tomasz Grabiec
ca6159fbe2 test: tablets_test: Create proper schema in load balancer tests
This is in preparation for load balancer changes needed to respect
per-table tablet hints and respecting per-shard tablet count
goal. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.
2025-02-07 17:13:52 +01:00
Tomasz Grabiec
0d259bb175 test: lib: Introduce topology_builder
Will be used by load balancer tests which need more than a single-node
topology, and which want to create proper schema in the database which
depends on that topology, in particular creating keyspaces with
replication factor > 1.

We need to do that because load balancer will use replication strategy
from the database as part of plan making.
2025-02-07 16:48:33 +01:00
Tomasz Grabiec
3bb9d2fbdb test: cql_test_env: Expose topology_state_machine 2025-02-07 16:09:21 +01:00
Tomasz Grabiec
61532eb53b topology_state_machine: Introduce lock transition
Will be used in load balancer tests to prevent concurrent topology
operations, in particular background load balancing.

load balancer will be invoked explicitly by the test. Disabling load
balancer in topology is not a solution, because we want the explicit
call to perform the load balancing.
2025-02-07 16:09:21 +01:00
Ernest Zaslavsky
5a266926e5 s3_client: Increase default part size for optimal performance
Set the `upload_file` part size to 50MiB, as this value provides the best performance based on tests conducted using `perf_s3_client` on an i4i.4xlarge instance.

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 5
INFO  2025-02-06 10:34:08,007 [shard 0:main] perf - Uploaded 1024MB in 27.768863962s, speed 36.87583335786734MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 10
INFO  2025-02-06 10:35:07,161 [shard 0:main] perf - Uploaded 1024MB in 28.175412552s, speed 36.34374467845414MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 20
INFO  2025-02-06 10:35:55,530 [shard 0:main] perf - Uploaded 1024MB in 14.483539631s, speed 70.700949221575MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 30
INFO  2025-02-06 10:36:35,466 [shard 0:main] perf - Uploaded 1024MB in 11.486155799s, speed 89.15080188004683MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 40
INFO  2025-02-06 10:37:46,642 [shard 0:main] perf - Uploaded 1024MB in 10.236196424s, speed 100.03715809898961MB/s

/perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 50
INFO  2025-02-06 10:38:34,777 [shard 0:main] perf - Uploaded 1024MB in 9.490644522s, speed 107.895728011548MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 60
INFO  2025-02-06 10:39:08,832 [shard 0:main] perf - Uploaded 1024MB in 9.767783693s, speed 104.83442633295012MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 70
INFO  2025-02-06 10:39:47,916 [shard 0:main] perf - Uploaded 1024MB in 10.166116742s, speed 100.72675988162482MB/s

Closes scylladb/scylladb#22732
2025-02-07 13:49:54 +03:00
Pavel Emelyanov
3cb0581022 Merge '.github: improve license header check workflow' from Kefu Chai
This patch series contains improvements to our GitHub license header check workflow.
The first patch grants necessary write permissions to the workflow, allowing it to comment directly on pull requests when license header issues are found. This addresses a permissions-related error that previously prevented the workflow from creating comments.
The second patch optimizes the workflow by skipping the license check step when no relevant files have been modified in the pull request. This prevents unnecessary workflow failures that occurred when the check was run without any files to analyze.
Together, these changes make the license header checking process more robust and efficient. The workflow now properly communicates findings through PR comments and avoids running unnecessary checks.

---

no need to backport, as the workflow updated by this change only exists in master.

Closes scylladb/scylladb#22736

* github.com:scylladb/scylladb:
  .github: grant write permissions for PR comments in license check workflow
  .github: skip license check when no relevant files changed
2025-02-07 13:47:53 +03:00
Alexey Novikov
cc35905531 Allow to use memtable_flush_period_in_ms schema option for system tables
It's possible to modify 'memtable_flush_period_in_ms' option only and as
single option, not with any other options together

Refs #20999
Fixes #21223

Closes scylladb/scylladb#22536
2025-02-07 10:33:05 +02:00
Kefu Chai
06b4abce56 .github: grant write permissions for PR comments in license check workflow
Grant write permissions to the check-license-header workflow to enable
commenting on pull requests. This fixes the "Resource not accessible by
integration" HTTP error that occurred when the workflow attempted to
create comments.

The permission is required according to GitHub's API documentation for
creating issue comments.

see also https://docs.github.com/en/rest/issues/comments?apiVersion=2022-11-28#create-an-issue-comment

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-07 16:09:42 +08:00
Kefu Chai
342b640b4b .github: skip license check when no relevant files changed
Skip the license header check step in `check-license-header.yaml` workflow
when no files with configured extensions were changed in the pull request.
Previously, the workflow would fail in this case since the --files
argument requires at least one file path:

```
  check-license.py: error: argument --files: expected at least one argument
```

Add `if` condition to only run the check when steps.changed-files.outputs.files
is not empty.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-07 16:09:42 +08:00
Yaron Kaikov
d50738feca ./github/workflows/pr-require-backport-label: fix regex to match source available version
Until now this action checked if we have a `backport/none` or `backport/x.y` label only, since we moved to the source available and the releases like 2025.1 don't match this regex this action keeps failing

Closes scylladb/scylladb#22734
2025-02-07 10:03:00 +02:00
Botond Dénes
9174f27cc8 reader_concurrency_semaphore: set_notify_handler(): disable timeout
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. This latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature
eviction.

Fixes: #scylladb/scylladb#22629
2025-02-07 02:31:01 -05:00
Botond Dénes
a3ae0c7cee reader_permit: mark check_abort() as const
All it does is read one field, making it const makes using it easier.
2025-02-07 01:32:35 -05:00
Ernest Zaslavsky
97d789043a s3_client: Fix buffer offset reset on request retry
This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.

Closes scylladb/scylladb#22729
2025-02-07 08:52:08 +03:00
Pavel Emelyanov
f331d3b876 Merge 'auth: ensure default superuser password is set before serving CQL' from Andrzej Jackowski
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.

This change:
 - Introduce authenticator::ensure_superuser_is_created() to allow
   waiting for complete initialization of super user authentication
 - Implement ensure_superuser_is_created in password_authenticator, so
   waiting for superuser password initialization is possible
 - Implement ensure_superuser_is_create in transitional_authenticator,
   so the implementation from password_authenticator is used
 - Implement no-op ensure_superuser_is_create for other authenticators
 - Extend service::ensure_superuser_is_created to wait for superuser
   initialization in authenticator, just as it was implemented earlier
   for role_manager
- Add injected error (sleep) in password_authenticator::start to
   reproduce a case of delayed password creation
 - Implement test_delayed_deafult_password to verify the correctness of the fix
 - Ensure superuser is created in single_node_cql_env::run_in_thread to
   make single_node_cql more similar to scylla_main in main.cc

Fixes scylladb/scylladb#20566

Backport not needed - a minor bugfix

Closes scylladb/scylladb#22532

* github.com:scylladb/scylladb:
  test: implement test_auth_password_ensured
  test: implement connect_driver argument in ManagerClient::server_add
  auth: ensure default superuser password is set before serving CQL
  auth: added password_authenticator_start_pause injected error
2025-02-07 08:47:01 +03:00
Michał Chojnowski
8fb2ea61ba test_rpc_compression.py: test the dictionaries are loaded on startup
Reproduces scylladb/scylladb#22738
2025-02-07 04:21:23 +01:00
Michał Chojnowski
dd82b40186 raft/group0_state_machine: load current RPC compression dict on startup
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port.
2025-02-07 04:20:21 +01:00
Avi Kivity
861fb58e14 Merge 'vector: add support for vector type' from Dawid Pawlik
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.

The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
    - add `collection_constructor::style_type::vector`
    - rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation

Co-authored-by: @janpiotrlakomy

Fixes https://github.com/scylladb/scylladb/issues/19455

Closes scylladb/scylladb#22488

* github.com:scylladb/scylladb:
  docs: add vector type documentation
  cassandra_tests: translate tests covering the vector type
  type_codec: add vector type encoding
  boost/expr_test: add vector expression tests
  expression: adjust collection constructor list style
  expression: add vector style type
  test/boost: add vector type cql_env boost tests
  test/boost: add vector type_parser tests
  type_parser: support vector type
  cql3: add vector type syntax
  types: implement vector_type_impl
2025-02-06 20:36:50 +02:00
Benny Halevy
021fc3c756 tablets: resize_decision: get rid of initial_decision
Now, with tablet_hints calculation of min_tablet_count
it is not used anymore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 18:43:47 +02:00
Benny Halevy
20c6ca2813 tablet_allocator: consider tablet options for resize decision
Do not merge tablets if that would drop the tablet_count
below the minimum provided by hints.
Split tablets if the current tablet_count is less than
the minimum tablet count calculated using the table's tablet options.

TODO: override min_tablet_count if the tablet count per shard
is greater than the maximum allowed.  In this case
the tables tablet counts should be scaled down proportionally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 18:43:35 +02:00
Nadav Har'El
c2b870ee54 Merge 'De-duplicate validation of tables in some column_family API endpoints' from Pavel Emelyanov
In column_family.cc and storage_service.cc there exist a bunch of helpers that parse and/or validate ks/cf names, and different endpoints use different combinations of those, duplicating the functionality of each other and generating some mess. This PR cleans the endpoints from column_family.cc that parse and validate fully qualified table name (the '$ks:$cf' string).

A visible "improvement" is that `validate_table()` helper usage in the api/ directory is narrowed down to storage_service.cc file only (with the intent to remove that helper completely), and the aforementioned `for_tables_on_all_shards()` helper becomes shorter and tiny bit faster, because it doesn't perform some re-lookups of tables, that had been performed by validation sanity checks before it.

There's more to be done in those helpers, this PR wraps only one part of this mess.

Below is the list of endpoints this PR affects and the tests that validate the changes:

|endpoint|test|
|-|-|
|column_family/autocompaction|rest_api/test_column_family::test_column_family_auto_compaction_table|
|column_family/tombstone_gc|rest_api/test_column_family::test_column_family_tombstone_gc_api|
|column_family/compaction_strategy|rest_api/test_column_family/test_column_family_compaction_strategy|
|compaction_manager/stop_keyspace_compaction/|rest_api/test_compaction_manager::{test_compaction_manager_stop_keyspace_compaction,test_compaction_manager_stop_keyspace_compaction_tables}|

Closes scylladb/scylladb#21533

* github.com:scylladb/scylladb:
  api: Hide parse_tables() helper
  api: Use parse_table_infos() in stop_keyspace_compaction handler
  api: Re-use parse_table_info() in column_family API
  api: Make get_uuid() return table_info (and rename)
  api: Remove keyspace argument from for_table_on_all_shards()
  api: Switch for_table_on_all_shards() to use table_info-s
  api: Hide validate_table() helper
  api: Tables vector is never empty now in for_table_on_all_shards()
  api: Move vectors of tables, not copy
  api: Add table validation to set_compaction_strategy_class endpoint
  api: Use get_uuid() to validate_table() in column family API
  api: Use parse_table_infos() in column family API
2025-02-06 17:28:08 +01:00
Avi Kivity
c33bbc884b types: listlike_partially_deserializing_iterator: improve compatibility with std::ranges
Range concepts require an iterator_concept tag and a default
constructor, so provide those.

Closes scylladb/scylladb#22138
2025-02-06 15:32:28 +03:00
Kefu Chai
5c7ad745fd db: do not include unused headers
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

also, took this opportunity to remove an unused namespace alias. and
add an include which is used actually. please note,
`std::ranges::pop_heap()` and friends are actually provided by
`<algorithm>` not `<ranges>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22716
2025-02-06 13:38:19 +02:00
Andrzej Jackowski
d5a4f3d4cd test: implement test_auth_password_ensured
Before fix of scylladb#20566, CQL was served irrespectively of default
superuser password creation, which led to an incorrect product behavior
and sporadic test failures. This test verifies race condition of serving
CQL and creating default superuser password. Injected failure is used to
ensure CQL use is attempted before default superuser password creation,
however, the attempt is expected to fail because scylladb#20566 is
fixed. Following that, the injected error is notified, so CQL driver can
be started correctly. Finally, CREATE USER query is executed to confirm
successful superuser authentication.

This change:
 - Implement test_auth_password_ensured.py

The test starts a server without expecting CQL serving, because
expected_server_up_state=ServerUpState.HOST_ID_QUERIED and
connect_driver=False. Error password_authenticator_start_pause is
injected to block superuser password setup during server startup.
Next, the test waits for a log to confirm that the code implementing
injected error is reached. When the server startup procedure is
unfinished, some operations might not complete on a first try, so
waiting for driver connection is wrapped in repeat_if_host_unavailable.
2025-02-06 10:30:55 +01:00
Andrzej Jackowski
e70ba7e3ed test: implement connect_driver argument in ManagerClient::server_add
This commit introduces connect_driver argument in
ManagerClient::server_add. The argument allow skipping CQL driver
initialization part during server start. Starting a server without
the driver is necessary to implement some test scenarios related
to system initialization.

After stopping a server, ManagerClient::server_start can be used to
start the server again, so connect_driver argument is also added here to
allow preventing connecting the driver after a server restart.

This change:
 - Implement connect_driver argument in ManagerClient::server_add
 - Implement connect_driver argument in ManagerClient::server_start
2025-02-06 10:30:55 +01:00
Andrzej Jackowski
7391c9419f auth: ensure default superuser password is set before serving CQL
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.

This change:
 - Introduce authenticator::ensure_superuser_is_created() to allow
   waiting for complete initialization of super user authentication
 - Implement ensure_superuser_is_created in password_authenticator, so
   waiting for superuser password initialization is possible
 - Implement ensure_superuser_is_create in transitional_authenticator,
   so the implementation from password_authenticator is used
 - Implement no-op ensure_superuser_is_create for other authenticators
 - Modify service::ensure_superuser_is_created to wait for superuser
   initialization in authenticator, just as it was implemented earlier
   for role_manager

Fixes scylladb/scylladb#20566
2025-02-06 10:30:55 +01:00
Andrzej Jackowski
7c63df085c auth: added password_authenticator_start_pause injected error
This change:
 - Implement password_authenticator_start_pause injected error to allow
   deterministic blocking of default superuser password creation

This change facilitates manual testing of system behavior when default
superuser password is being initialized. Moreover, this mechanism will
be used in next commits to implement a test to verify a fix for
erroneous CQL serving before default superuser password creation.
2025-02-06 10:30:45 +01:00
Kefu Chai
5443d9dabb .github: add check-license-header workflow
this workflow checks the first 10 lines for
"LicenseRef-ScyllaDB-Source-Available-1.0" in newly introduced files
when a new pull request is created against "master" or "next".

if "LicenseRef-ScyllaDB-Source-Available-1.0" is not found, the
workflow fails. for the sake of simplicity, instead of parsing the
header for SPDX License ID, we just check to see if the
"LicenseRef-ScyllaDB-Source-Available-1.0" is included.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22065
2025-02-06 12:20:23 +03:00
Nadav Har'El
cae8a7222e alternator: fix view build on oversized GSI key attribute
Before this patch, the regular_column_transformation constructor, which
we used in Alternator GSIs to generates a view key from a regular-column
cell, accepted a cell of any size. As a reviewer (Avi) noticed, very
long cells are possible, well beyond what Scylla allows for keys (64KB),
and because regular_column_transformation stores such values in a
contiguous "bytes" object it can cause stalls.

But allowing oversized attributes creates an even more accute problem:
While view building (backfilling in DynamoDB jargon), if we encounter
an oversized (>64KB) key, the view building step will fail and the
entire view building will hang forever.

This patch fixes both problems by adding to regular_column_transformation's
constructor the check that if the cell is 64KB or larger, an empty value
is returned for the key. This causes the backfilling to silently skip
this item, which is what we expect to happen (backfilling cannot do
anything to fix or reject the pre-existing items in the best table).

A test test_gsi_updatetable.py::test_gsi_backfill_oversized_key is
introduced to reproduce this problem and its fix. The test adds a 65KB
attribute to a base table, and then adds GSIs to this table with this
attribute as its partition key or its sort key. Before this patch, the
backfilling process for the new GSIs hangs, and never completes.
After this patch, the backfilling completes and as expected contains
other base-table items but not the item with the oversized attribute.
The new test also passes on DynamoDB.

However, while implementing this fix I realized that issue #10347 also
exists for GSIs. Issue #10347 is about the fact that DynamoDB limits
partition key and sort key attributes to 2048 and 1024 bytes,
respectively. In the fix described above we only handled the accute case
of lengths above 64 KB, but we should actually skip items whose GSI
keys are over 2048 or 1024 bytes - not 64KB. This extra checking is
not handled in this patch, and is part of a wider existing issue:
Refs #10347

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:50 +01:00
Nadav Har'El
7a0027bacc mv: clean up do_delete_old_entry
The function do_delete_old_entry() had an if() which was supposedly for
the case of collection column indexing, and which our previous patch
that improved this function to support caller-specified deletion_ts
left behind.

As a reviewer noticed, the new tombstone-setting code was in an "else"
of that existing if(), and it wasn't clear what happens if we get to that
else in the collection column indexing. So I reviewed the code and added
breakpoints and realized that in fact, do_delete_old_entry() is never
called for the collection-indexing case, which has its own
update_entry_for_computed_column() which view_updates::generate_update()
calls instead of the do_delete_old_entry() function and its friends.
So it appears that do_delete_old_entry() doesn't need that special
case at all, which simplifies it.

We should eventually simplify this code further. In particular, the
function generate_update() already knows the key of the rows it
adds or deletes so do_delete_old_entry() and its friends don't need
to call get_view_rows() to get it again. But these simplifications
and other will need to come in a later patch series, this one is
already long enough :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
67d2ea4c4b test/alternator: unflake test for IndexStatus
The test for IndexStatus verifies that on a newly created table and GSI,
the IndexStatus is "ACTIVE". However, in Alternator, this doesn't strictly
need to happen *immediately* - view building, even for an empty table -
can take a short while in debug mode. This make the test test
test_gsi_describe_indexstatus flaky in debug mode.

The fix is to wait for the GSI to become active with wait_for_gsi()
before checking it is active. This is sort of silly and redundant,
but the important point that if the IndexStatus is incorrect this test
will fail, it doesn't really matter whether the wait_for_gsi() or
the DescribeTable assertion is what fails.

Now that wait_for_gsi() is used in two test files, this patch moves it
(and its friend, wait_for_gsi_gone()) to util.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
4ba17387e6 test/alternator: work around unrelated bug causing test flakiness
The alternator test test_gsi_updatetable.py::test_gsi_delete_with_lsi
Creates a GSI together with a table, and then deletes it. We have a
bug unrelated to the purpose of this test - #9059 - that causes view
building to sometimes crash Scylla if the view is deleted while the
view build is starting. We see specifically in debug builds that even
view building of an *empty* table might not finish before the test
deletes the view - so this bug happens.

Work around that bug by waiting for the GSI to build after creating
the table with the GSI. This shouldn't be necessary (in DynamoDB,
a GSI created with the table always begins ready with the table),
but doesn't hurt either.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
42eabb3b6f docs/alternator: adding a GSI is no longer an unimplemented feature
The previous patches implemented issue #11567 - adding a GSI to a
pre-existing table. So we can finally remove the mention of this
feature as an "unimplemented feature" in docs/alternator/compatibility.md.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
ac648950f1 test/alternator: remove xfail from all tests for issue 11567
The previous patches fully implemented issue 11567 - supporting
UpdateTable to add or delet a GSI on an existing Alternator table.
All 14 tests that were marked xfail because of this issue now pass,
so this patch removes their xfail. There are no more xfailing tests
referring to this issue.

These 14 tests, most of them in test/alternator/test_gsi_updatetable.py,
cover all aspects of this feature, including adding a GSI, deleting a
GSI, interactions between GSI and LSI, RBAC when adding or deleting a GSI,
data type limitation on an attribute that becomes a GSI key or stops
being one, GSI backfill, DescribeTable and backfill, various error
conditions, and more.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
9bfa6bf267 alternator: overhaul implementation of GSIs and support UpdateTable
The main goal of this patch is to fully support UpdateTable's ability
to add a GSI to an existing table, and delete a GSI from an existing
table. But to achieve this, this patch first needs to overhaul how GSIs
are implemented:

Remember that in Alternator's data model, key attributes in a table
are stored as real CQL columns (with a known type), but all other
attributes of an item are stored in one map called ":attrs".

* Before this patch, the GSI's key columns were made into real columns
  in the table's schema, and the materialized view used that column as
  the view's key.

* After this patch, the GSI's key columns usually (when they are not
  the base table's keys, and not any LSI's key) are left in the ":attrs"
  map, just like any other non-key column. We use a new type of computed
  column (added in the previous patch) to extract the desired element from
  this map.

This overhaul of the GSI implementation doesn't change anything in the
functionality of GSIs (and the Alternator test suite tries very hard to
ensure that), but finally allows us to add a GSI to an already-existing
table. This is now possible because the GSI will be able to pick up
existing data from inside the ":attrs" map where it is stored, instead
of requiring the data in the map to be moved to a stand-alone column as
the previous implementation needed.

So this patch also finally implements the UpdateTable operations
(Create and Delete) to add or delete a GSI on an existing table,
as this is now fairly straightfoward. For the process of "backfilling"
the existing data into the new GSI we don't need to do anything - this
is just the materialized-view "view building" process that already
exists.

Fixes #11567.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
bc7b5926d2 mv: support regular_column_transformation key columns in view
In an earlier patch, we introduced regular_column_transformation,
a new type of computed column that does a computation on a cell in
regular column in the base and returns a potentially transformed cell
(value or deletion, timestamp and ttl). In *this* patch, we wire the
materialized view code to support this new kind of computed column that
is usable as a materialized-view key column. This new type of computed
column is not yet used in this patch - this will come in the next
patch, where we will use it for Alternator GSIs.

Before this patch, the logic of deciding when the view update needs
to create a new row or delete a new one, and which timestamp and ttl
to give to the new row, could depend on one (or two - in Alternator)
cells read from base-table regular columns. In this patch, this logic
is rewritten - the notion of "base table regular columns" is generalized
to the notion of "updatable view key columns" - these are view key
columns that an update may change - because they really are base regular
columns, or a computed function of one (regular_column_transformation).

In some sense, the new code is easier to understand - there is no longer
a separate "compute_row_marker()" function, rather the top-level
generate_update() is now in charge of finding the "updatable view key
columns" and calculate the row marker (timestamp and ttl) as part
of deciding what needs to be done.

But unfortunately the code still has separate code paths for "collection
secondary indexing", and also for old-style column_computation (basically,
only token_column_computation). Perhaps in the future this can be further
simplified.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Nadav Har'El
ea87b9fff0 alternator: add new materialized-view computed column for item in map
This patch adds a new computed column class for materialized views,
        extract_from_attrs_column_computation
which is Alternator-specific and knows how to extract a value (of a
known type) from an attribute stored in Alternator's map-of-all-nonkey-
attributes ":attrs".

We'll use this new computed column in the next patch to reimplement GSI.

The new computed-column class is based on regular_column_transformation
introduced in the previous patch. It is not yet wired to anything:
The MV code cannot handle any regular_column_transformation yet, and
Alternator will not yet use it to create a GSI. We'll do those things
in the following patches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
e8d1e8a515 build: in cmake build, schema needs alternator
This patch is to cmake what the previous patch was to configure.py.

In the next patch we want to make schema/schema.o depend on
alternator/executor.o - because when the schema has an Alternator
computed column, the schema code needs to construct the computed column
object (extract_from_attrs_column_computation) and that lives in
alternator/executor.o.

In the cmake-based build, all the schema/* objects are put into one
library "libschema.a". But code that uses this library (e.g., tests)
can't just use that library alone, because it depends on other code
not in schema/. So CMakeLists.txt lists other "libraries" that
libschema.a depends on - including for example "cql3". We now need
to add "alternator" to this dependency list. The dependency is marked
"PRIVATE" - schema needs alternator for its own internal uses, but
doesn't need to export alternator's APIs to its own users.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
1ebdf1a9f7 build: build tests with Alternator
For an unknown (to me) reason, configure.py has two separate source
file lists - "scylla_core" and "alternator". Scylla, and almost all tests,
are compiled with both lists, but just a couple of tests were compiled
with just scylla_core without alternator.

In the next patch we want to make schema/schema.o depened on
alternator/executor.o because when the schema has an Alternator
computed column, the schema code needs to construct the computed column
object (extract_from_attrs_column_computation) and that lives in
alternator/executor.o.

This change will break the build of the two tests that do not include
the Alternator objects. So let's just add the "alternator" dependencies
to the couple of tests that were missing it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
828cc98e4c alternator: add function serialized_value_if_type()
This patch introduces a function serialized_value_if_type() which takes
a serialized value stored in the ":attrs" map, and converts it into a
serialized *CQL* type if it matches a particular type (S, B or N) - or
returns null the value has the wrong type.

We will use this function in the following patch for deserializing
values stored in the ":attrs" map to use them as a materialized view
key. If the value has the right type, it will be converted to the CQL
type and used as the key - but if it has the wrong type the key will
be null and it will not appear in the view. This is exactly how GSI
is supposed to behave.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
c8ea9f8470 mv: introduce regular_column_transformation, a new type of computed column
In the patches that follow, we want Alternator to be able to use as a
key for a materialized view (GSI) not a real column from the schema,
but rather an attribute value deserialized from a member of the ":attrs"
map.

For this, we need the ability for materialized view to define a key
column which is computed as function of a real column (":attrs").

We already have an MV feature which we called "computed column"
(column_computation), but it is wholy inadequate for this job:
column_computation can only take a partition key, and produce a value -
while we need it to take a regular column (one member of ":attrs"),
not just the partition key, and return a cell - value or deletion,
timestamp and TTL.

So in this patch we introduce a new type of computed column, which we
called "regular_column_transformation" since it intends to perform some
sort of transformation on a single column (or more accurately, a single
atomic cell). The limitation that this function transforms a single
column only is important - if we had a function of multiple columns,
we wouldn't know which timestamp or ttl it should use for the result
if the two columns had different timestamps or TTLs.

The new class isn't wired to anything yet: The MV code cannot handle
it yet, and the Alternator code will not use it yet.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
cea7aacc52 alternator: add IndexStatus/Backfilling in DescribeTable
This patch adds the missing IndexStatus and Backfilling fields for the
GSIs listed by a DescribeTable request. These fields allow an application
to check whether a GSI has been fully built (IndexStatus=ACTIVE) or
currently being built (IndexStatus=CREATING, Backfilling=true).

This feature is necessary when a GSI can be added to an existing table
so its backfilling might take time - and the application might want to
wait for it.

One test - test_gsi.py::test_gsi_describe_indexstatus - begins to pass
with this fix, so the xfail tag is removed from it.

Fixes #11471.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:48 +01:00
Nadav Har'El
6239e92776 alternator: add "LimitExceededException" error type
This patch adds to Alternator's api_error type yet another type of
error, api_error::limit_exceeded (error code "LimitExceededException").
DynamoDB returns this error code in certain situations where certain
low limits were exceeded, such as the case we'll need in a following
patch - an UpdateTable that tries to create more than one GSI at once.

The LimitExceededException error type should not be confused with
other similarly-named but different error messages like
ProvisionedThroughputExceededException or RequestLimitExceeded.
In general, we make an attempt to return the same error code that
DynamoDB returns for a given error.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:47 +01:00
Nadav Har'El
279fe43ebe docs/alternator: document two more unimplemented Alternator features
Two new features were added to DynamoDB this month - MultiRegionConsistency
and WarmThroughput. Document them as unimplemented - and link to the
relevant issue in our bug tracker - in docs/alternator/compatibility.md.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:47 +01:00
Pavel Emelyanov
64baab1b95 Merge 'config: prevent SIGHUP from changing non-liveupdatable parameters' from Andrzej Jackowski
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).

This change:
 - Introduce _initialization_completed member in config_file
 - Set _initialization_completed=true when config file is processed on
   server start
 - Verify config_file's initialization status during config update - if
   config_file was initialized, prevent from further changes of
   non-liveupdatable parameters
 - Implement ScyllaRESTAPIClient::get_config() that obtains a current
    value of given configuration parameter via /v2/config REST API
 - Implement test to confirm that only liveupdatable parameters are
    changed when SIGHUP is sent after configuration file change

Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.

Fixes scylladb/scylladb#5382

No backport - minor fix.

Closes scylladb/scylladb#22655

* github.com:scylladb/scylladb:
  test: SIGHUP doesn't change non-liveupdatable configuration
  test: implement ScyllaRESTAPIClient::get_config()
  config: prevent SIGHUP from changing non-liveupdatable parameters
  config: remove unused set_value_on_all_shards(const YAML::Node&)
2025-02-06 11:33:59 +03:00
Pavel Emelyanov
951625ca13 Merge 's3 client: add aws credentials providers' from Ernest Zaslavsky
This update introduces four types of credential providers:

1. Environment variables
2. Configuration file
3. AWS STS
4. EC2 Metadata service

The first two providers should only be used for testing and local runs. **They must NEVER be used in production.**

The last two providers are intended for use on real EC2 instances:

- **AWS STS**: Preferred method for obtaining temporary credentials using IAM roles.
- **EC2 Metadata Service**: Should be used as a last resort.

Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result.

fixes: #21828

Closes scylladb/scylladb#21830

* github.com:scylladb/scylladb:
  docs: update the `object_storage.md` and `admin.rst`
  aws creds: add STS and Instance Metadata service credentials providers
  aws creds: add env. and file credentials providers
  s3 creds: move credentials out of endpoint config
2025-02-06 11:12:37 +03:00
Benny Halevy
559f083dc6 tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
Rather than target_max_tablet_size.  We need both the target
as well as max and min tablet sizes, so there is no sense
in keeping the max and deriving the target and the minimum
for the max value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:59:32 +02:00
Benny Halevy
32c2f7579f network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
Use the keyspace initial_tablets for min_tablet_count, if the latter
isn't set, then take the maximum of the option-based tablet counts:
- min_tablet_count
- and expected_data_size_in_gb / target_tablet_size
- min_per_shard_tablet_count (via
  calculate_initial_tablets_from_topology)
If none of the hints produce a positive tablet_count,
fall back to calculate_initial_tablets_from_topology * initial_scale.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:59:32 +02:00
Benny Halevy
86bcf4cffe network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
Current implementation is inefficient as it calls
get_datacenter_token_owners_ips and then find_node(ep)
while for_each_node easily provides a host_id for
is_normal_token_owner.

Then, since we're interested only in datacenters
configure with a replication factor (but it still might be 0),
simply iterate over the dc->rf map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:59:30 +02:00
Benny Halevy
49dacb1d52 network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
Currently, if a datacenter has no replication_factor option
we consider its replication factor to be 1 in
calculate_initial_tablets_from_topology, but since
we're not going to have any replica on it, it should
be 0.

This is very minor since in the worst case, it
will pessimize the calculation and calculate a value
for initial_tablets that's higher than it could be.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Benny Halevy
8aace28397 cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
Keyspace `initial` tablets option is deprecated
and may be removed in the future.
Rather than relying on `initial`:0 to always enabled
tablets, explicitly print "enabled":true when tablets
are enabled and initial_tablets=0, same as keyspace_metadata::describe.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Benny Halevy
1054e05491 cql3/create_keyspace_statement: add deprecation warning for initial tablets
Per-table hints should be used instead.

Note: the warning is produced by check_against_restricted_replication_strategies
which is called also from alter_keyspace_statement.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Benny Halevy
7cd29810a0 test: cqlpy: test_tablets: add tests for per-table tablet options
Test specifying of per-table tablet options on table creation
and alter table.

Also, add a negative test for atempting to use tablet options
with vnodes (that should fail).

And add a basic test for testing tablet options also with
materialized views.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Benny Halevy
c5668d99c9 schema: add per-table tablet options
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, it creates sstables which token-range is confined
to the tablet owning them.

On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).

Having too few tablets might limit performance due not
being served by all shards or by imbalance between shards
caused by quantization.  The number of tabelts per table has to be
a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small N+1/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.

Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead and having too many tablets
in the system with many tables and many tablets for each of them
may overwhelm the system's and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to acces, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptors limitations.

The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations.  This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Benny Halevy
ad8b0649ff feature_service: add TABLET_OPTIONS cluster schema feature
To be used for enabling per-table tablet options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00
Tomasz Grabiec
3bb19e9ac9 locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
For example, nodes which are being decommissioned should not be
consider as available capacity for new tables. We don't allocate
tablets on such nodes.

Would result in higher per-shard load then planned.

Closes scylladb/scylladb#22657
2025-02-05 23:59:41 +02:00
Kefu Chai
9a20fb43ab tree: replace boost::min_element() with std::ranges::min_element()
in order to reduce the external header dependency, let's switch to
the standardlized std::ranges::min_element().

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22572
2025-02-05 21:54:01 +02:00
Botond Dénes
3d12451d1f db/config: reader_concurrency_semaphore_cpu_concurrency: bump default to 2
This config item controls how many CPU-bound reads are allowed to run in
parallel. The effective concurrency of a single CPU core is 1, so
allowing more than one CPU-bound reads to run concurrently will just
result in time-sharing and both reads having higher latency.
However, restricting concurrency to 1 means that a CPU bound read that
takes a lot of time to complete can block other quick reads while it is
running. Increase this default setting to 2 as a compromise between not
over-using time-sharing, while not allowing such slow reads to block the
queue behind them.

Fixes: #22450

Closes scylladb/scylladb#22679
2025-02-05 21:52:20 +02:00
Tomasz Grabiec
e22e3b21b1 locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
In that case, new_racks will be used, but when we discover no
candidates, we try to pop from existing_racks.

Fixes #22625

Closes scylladb/scylladb#22652
2025-02-05 20:13:05 +02:00
Nadav Har'El
bfdd805f15 test/alternator: fix running against installation blocking CQL
One of the design goals of the Alternator test suite (test/alternator)
is that developers should be able to run the tests against some already
running installation by running `cd test/alternator; pytest [--url ...]`.

Some of our presentations and documents recommend running Alternator
via docker as:

    docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest
         --alternator-port=8000 --alternator-write-isolation=always

This only makes port 8000 available to the host - the CQL port is
blocked. We had a bug in conftest.py's get_valid_alternator_role()
which caused it to fail (and fail every single test) when CQL is
not available. What we really want is that when CQL is not available
and we can't figure out a correct secret key to connect to Alternator,
we just try a connect with a fake key - and hope that the option
alternator-enforce-authorization is turned off. In fact, this is what
the code comments claim was already happening - but we failed to
handle the case that CQL is not available at all.

After this patch, one can run Alternator with the above docker
command, and then run tests against it.

By the way, this provides another way for running any old release of
Scylla and running Alternator tests against it. We already supported
a similar feature via test/alternator/run's "--release" option, but
its implementation doesn't use docker.

Fixes #22591

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22592
2025-02-05 19:01:31 +03:00
Botond Dénes
7ce932ce01 service: query_pager: fix last-position for filtering queries
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that all the tombstone suffix of
the last page, will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.

Fixes: #22620

Closes scylladb/scylladb#22622
2025-02-05 17:23:30 +02:00
Avi Kivity
f3751f0eba tools: toolchain: dbuild: don't use which command
The `which` command is typically not installed on cloud OS images
and so requires the user to remember to install it (or to be prompted
by a failure to install it).

Replace it with the built-in `type` that is always there. Wrap it
in a function to make it clear what it does.

Closes scylladb/scylladb#22594
2025-02-05 17:18:05 +03:00
Avi Kivity
1ef0a48bbe conf: scylla.yaml: add stubs for encryption at rest
These are helpful for configuring encryption-at-rest.

Copied verbatim from scylla-enterprise.

Closes scylladb/scylladb#22653
2025-02-05 17:14:53 +03:00
Raphael S. Carvalho
ce65164315 test: Use linux-aio backend again on seastar-based tests
Since mid December, tests started failing with ENOMEM while
submitting I/O requests.

Logs of failed tests show IO uring was used as backend, but we
never deliberately switched to IO uring. Investigation pointed
to it happening accidentaly in commit 1bac6b75dc,
which turned on IO uring for allowing native tool in production,
and picked linux-aio backend explicitly when initializing Scylla.
But it missed that seastar-based tests would pick the default
backend, which is io_uring once enabled.

There's a reason we never made io_uring the default, which is
that it's not stable enough, and turns out we made the right
choice back then and it apparently continue to be unstable
causing flakiness in the tests.

Let's undo that accidental change in tests by explicitly
picking the linux-aio backend for seastar-based tests.
This should hopefully bring back stability.

Refs #21968.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22695
2025-02-05 15:19:24 +02:00
Ernest Zaslavsky
29e60288de docs: update the object_storage.md and admin.rst
Added additional options and best practices for AWS authentication.
2025-02-05 14:57:19 +02:00
Ernest Zaslavsky
dee4fc7150 aws creds: add STS and Instance Metadata service credentials providers
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
2025-02-05 14:57:19 +02:00
Ernest Zaslavsky
d534051bea aws creds: add env. and file credentials providers
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
2025-02-05 14:57:19 +02:00
Kefu Chai
f7a729c3fd github: use clang-21 in clang-nightly workflow
since clang 20 has been branched. let's track the development brach,
which is clang 21.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22698
2025-02-05 14:58:35 +03:00
Aleksandra Martyniuk
fe02555c46 tasks: drop task_manager::config::broadcast_address as it's unused 2025-02-05 10:11:54 +01:00
Aleksandra Martyniuk
e16b413568 tasks: replace ip with host_id in task_identity
Replace ip with host_id in task_identity. Translate host_id to ip
in task manager api handlers.

Use host_id in send_tasks_get_children.
2025-02-05 10:11:52 +01:00
Aleksandra Martyniuk
0c868870b4 api: task_manager: pass gossiper to api::set_task_manager
Pass gossiper to api::set_task_manager. It will be used later
for host_id to ip transition.
2025-02-05 10:10:29 +01:00
Aleksandra Martyniuk
4470c2f6d3 tasks: keep host_id in task_manager
Keep host_id of a node in task manager. If host_id wasn't resolved
yet, task manager will keep an empty id.

It's a preparation for the following changes.
2025-02-05 10:10:29 +01:00
Aleksandra Martyniuk
7969e98b4e tasks: move tasks_get_children to IDL 2025-02-05 10:10:29 +01:00
Kefu Chai
3aeecd4264 generic_server: correct typo in comment
s/invokation/invocation/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22697
2025-02-05 11:48:50 +03:00
Andrzej Jackowski
6f5ba3dd89 test: SIGHUP doesn't change non-liveupdatable configuration
This change:
 - Implement test to confirm that only liveupdatable parameters are
   changed when SIGHUP is sent after configuration file change
2025-02-05 09:37:37 +01:00
Andrzej Jackowski
a001b20938 test: implement ScyllaRESTAPIClient::get_config()
This change:
 - Implement ScyllaRESTAPIClient::get_config() that obtains a current
   value of given configuration parameter via /v2/config REST API
2025-02-05 09:37:37 +01:00
Andrzej Jackowski
dd899c0f1f config: prevent SIGHUP from changing non-liveupdatable parameters
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).

This change:
 - Introduce _initialization_completed member in config_file
 - Set _initialization_completed=true when config file is processed on
   server start
 - Verify config_file's initialization status during config update - if
   config_file was initialized, prevent from further changes of
   non-liveupdatable parameters

Fixes scylladb/scylladb#5382
2025-02-05 09:37:30 +01:00
Pavel Emelyanov
83f3821f99 Merge 'cql: clean the code validating replication strategy options' from Piotr Smaron
Clean the code validating if a replication strategy can be used.
This PR consists of a bunch of unmerged https://github.com/scylladb/scylladb/pull/20088 commits - the solution to the problem that the linked PR tried to solve has been accomplished in another PR, leaving the refactor commits unmerged. The commits introduced in this PR have already been reviewed in the old PR.

No need to backport, it's just a refactor.

Closes scylladb/scylladb#22516

* github.com:scylladb/scylladb:
  cql: restore validating replication strategies options
  cql: change validating NetworkTopologyStrategy tags to internal_error
  cql: inline abstract_replication_strategy::validate_replication_strategy
  cql: clean redundant code validating replication strategy options
2025-02-05 11:18:50 +03:00
Jenkins Promoter
9add2ccc41 Update pgo profiles - x86_64 2025-02-05 08:44:54 +02:00
Jenkins Promoter
c7660e5962 Update pgo profiles - aarch64 2025-02-05 07:51:46 +02:00
Ferenc Szili
a59618e83d truncate: create session during request handling
Currently, the session ID under which the truncate for tablets request is
running is created during the request creation and queuing. This is a problem
because this could overwrite the session ID of any ongoing operation on
system.topology#session

This change moves the creation of the session ID for truncate from the request
creation to the request handling.

Fixes #22613

Closes scylladb/scylladb#22615
2025-02-04 22:11:24 +01:00
Botond Dénes
f2d5819645 reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.

Fixes: #22588

Closes scylladb/scylladb#22624
2025-02-04 21:27:16 +02:00
Ernest Zaslavsky
c911fc4f34 s3 creds: move credentials out of endpoint config
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
2025-02-04 16:45:23 +02:00
Andrzej Jackowski
fb118bfd3b config: remove unused set_value_on_all_shards(const YAML::Node&)
This change:
 - Remove unused set_value_on_all_shards(const YAML::Node&) member
   function in class config_file::named_value

The function logic was flawed, in a similar way
named_value<T>::set_value(const YAML::Node& node) is flawed: the config
source verification is insufficient for liveupdatable parameters,
allowing overwriting of non-liveupdatable config parameters (refer to
scylladb#5382). As the function was not used, it was removed instead of
fixing.
2025-02-04 15:09:23 +01:00
Michał Chojnowski
bea434f417 pgo: disable tablets for training with secondary index, lwt and counters
As of right now, materialized views (and consequently secondary
indexes), lwt and counters are unsupported or experimental with tablets.
Since by defaults tablets are enabled, training cases using those
features are currently broken.

The right thing to do here is to disable tablets in those cases.

Fixes https://github.com/scylladb/scylladb/issues/22638

Closes scylladb/scylladb#22661
2025-02-04 15:38:53 +02:00
Piotr Smaron
2953d3ebe0 cql: restore validating replication strategies options
`validate_options` needs to be extended with
`topology` parameter, because NetworkTopologyStrategy needs to validate if every
explicitly listed DC is really existing. I did cut corner a bit and
trimmed the message thrown when it's not the case, just to avoid passing
and extra parameter (ks name) to the `validate_options`
function, as I find the longer message to be a bit redundant (the driver will
receive info which KS modification failed).
The tests that have been commented out in the previous commit have been
restored.
2025-02-04 12:27:33 +01:00
Piotr Smaron
100e8d2856 cql: change validating NetworkTopologyStrategy tags to internal_error
The check for `replication_factor` tag in
`network_topology_strategy::validate_options` is redundant for 2 reasons:
- before we reach this part of the code, the `replication_factor` tag
  is replaced with specific DC names
- we actually do allow for `replication_factor` tag in
  NetworkTopologyStrategy for keyspaces that have tablets disabled.

This code is unreachable, hence changing it to an internal error, which
means this situation should never occur.
The place that unrolls `replication_factor` tag checked for presence of
this tag ignoring the case, which lead to an unexpected behaviour:
- `replication_factor` tag (note the lowercase) was unrolled, as
  explained above,
- the same tag but written in any other case resulted in throwing a vague
  message: "replication_factor is an option for SimpleStrategy, not
NetworkTopologyStrategy".

So we're changing this validation to accept and unroll only the
lowercase version of this tag. We can't ignore the case here, as this
tag is present inside a json, and json is case-sensitive, even though the
CQL itself is case insensitive.

Added a test that passes for both scylla and cassandra.

Fixes: #15336
2025-02-04 12:27:29 +01:00
Aleksandra Martyniuk
683176d3db tasks: add shard, start_time, and end_time to task_stats
task_stats contains short info about a task. To get a list of task_stats
in the module, one needs to request /task_manager/list_module_tasks/{module}.

To make identification and navigation between tasks easier, extend
task_stats to contain shard, start_time, and end_time.

Closes scylladb/scylladb#22351
2025-02-04 12:11:24 +02:00
Botond Dénes
8c8db2052e Merge 'service: add child for tablet repair virtual task' from Aleksandra Martyniuk
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.

tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.

No backport needed; new functionality

Closes scylladb/scylladb#22372

* github.com:scylladb/scylladb:
  test: add test to check tablet repair child
  service: add child for tablet repair virtual task
2025-02-04 12:08:24 +02:00
Aleksandra Martyniuk
610a761ca2 service: use read barrier in tablet_virtual_task::contains
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.

To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.

Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.

Fixes: #21975.

Closes scylladb/scylladb#21995
2025-02-04 12:07:42 +02:00
Avi Kivity
6913f054e7 Update tools/cqlsh submodule
The driver update makes cqlsh work well with Python 3.13.

* tools/cqlsh 52c6130...02ec7c5 (18):
  > chore(deps): update dependency scylla-driver to v3.28.2
  > dist: support smooth upgrade from enterprise to source availalbe
  > github action: fix downloading of artifacts
  > chore(deps): update docker/setup-buildx-action action to v3
  > chore(deps): update docker/login-action action to v3
  > chore(deps): update docker/build-push-action action to v6
  > chore(deps): update docker/setup-qemu-action action to v3
  > chore(deps): update peter-evans/dockerhub-description action to v4
  > upload actions: update the usage for multiple artifacts
  > chore(deps): update actions/download-artifact action to v4.1.8
  > chore(deps): update dependency scylla-driver to v3.28.0
  > chore(deps): update pypa/cibuildwheel action to v2.22.0
  > chore(deps): update actions/checkout action to v4
  > chore(deps): update python docker tag to v3.13
  > chore(deps): update actions/upload-artifact action to v4
  > github actions: update it to work
  > add option to output driver debug
  > Add renovate.json (#107)

Closes scylladb/scylladb#22593
2025-02-04 12:06:54 +02:00
Avi Kivity
f25636884a api: storage_service: break out set_storage_service lambdas into free functions
This was originally an attempt to reduce the compile time of this
translation unit, but apparently it doesn't work. Still, it has
the effect of converting stack traces that say "set_storage_service"
and refer to some lambda to stack traces that refer to the operation
being performed, so it's a net positive.

To faciliate the change, we introduce new functions rest_bind(),
similar to (and in fact wrapping) std::bind_front(), that capture
references like the lambdas did originally. We can't use
std::bind_front directly since the call to
seastar::httpd::path_description::set() cannot be disambiguated
after the function is obscured by the template returned by
std::bind_front. The new function rest_bind() has constraints
to understand which overload is in use.

Closes scylladb/scylladb#22526
2025-02-04 12:06:18 +02:00
Ran Regev
edd56a2c1c moved cache files to db
As requested in #22097, moved the files
and fixed other includes and build system.

Fixes: #22097
Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#22495
2025-02-04 12:21:31 +03:00
Pavel Emelyanov
e47c7d5255 Merge 'config: Improve internode_compression option validation and documentation' from Kefu Chai
This PR enhances the internode_compression configuration option in two ways:

1. Add validation for option values
   Previously, we silently defaulted to 'none' when given invalid values. Now we
   explicitly validate against the three supported values (all, dc, none) and
   reject invalid inputs. This provides better error messages when users
   misconfigure the option.

2. Fix documentation rendering
   The help text for this option previously used C++ escape sequences which
   rendered incorrectly in Sphinx-generated HTML. We now use bullet points with
   '*' prefix to list the available values, matching our documentation style
   for other config options. This ensures consistent rendering in both CLI
   and HTML outputs.

Note: The current documentation format puts type/default/liveness information
in the same bullet list as option values. This affects other config options
as well and will need to be addressed in a separate change.

---

this improves the handling of invalid option values, and improves the doc rendering, neither of which is critical. hence no need to backport.

Closes scylladb/scylladb#22548

* github.com:scylladb/scylladb:
  config: validate internode_compression option values
  config: start available options with '*'
2025-02-04 10:17:23 +03:00
Andrei Chekun
2a99494752 test.py: Remove workaround for python bug in asyncio
Bug https://bugs.python.org/issue26789 is resolved in python 3.10.
The frozen tool chain uses python 3.12. Since this is a supported and
recommended way for work environment, removing workaround and bumping
requirements for a newer python version.

Closes scylladb/scylladb#22627
2025-02-03 22:27:34 +02:00
David Garcia
fe4750ffc3 docs: fetch multiverson config from remote sources
fix: brand

Closes scylladb/scylladb#22616
2025-02-03 15:25:10 +02:00
Yaron Kaikov
4f832c31b9 .github/workflows/make-pr-ready-for-review: add missing permissions
Following the work done in ed4bfad5c3, the action is failing with the
following error:
```
Error: Input required and not supplied: token
```

It is due ot missing permissions in the workflow, adding it

Closes scylladb/scylladb#22630
2025-02-03 13:25:27 +02:00
Gleb Natapov
fe45ea505b topology_coordinator: demote barrier_and_drain rpc failure to warning
The failure may happen during normal operation as well (for instance if
leader changes).

Fixes: scylladb/scylladb#22364
2025-02-03 13:09:58 +02:00
Gleb Natapov
1da7d6bf02 topology_coordinator: read peers table only once during topology state application
During topology state application peers table may be updated with the
new ip->id mapping. The update is not atomic: it adds new mapping and
then removes the old one. If we call get_host_id_to_ip_map while this is
happening it may trigger an internal error there. This is a regression
since ef929c5def. Before that commit the
code read the peers table only once before starting the update loop.
This patch restores the behaviour.

Fixes: scylladb/scylladb#22578
2025-02-03 13:09:18 +02:00
Aleksandra Martyniuk
43427b8fe0 test: add test to check tablet repair child 2025-02-03 10:31:16 +01:00
Aleksandra Martyniuk
c23ce40f50 service: add child for tablet repair virtual task
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.

tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
2025-02-03 10:31:14 +01:00
Avi Kivity
d237d0a4ea Update seastar submodule
* seastar 71036ebcc0...5b95d1d798 (3):
  > rpc stream: do not abort stream queue if stream connection was closed without error
  > resource: fallback to sysconf when failed to detect memory size from hwloc
  > Merge 'scheduling_group: improve scheduling group creation exception safety' from Michael Litvak

scylla-gdb.py adjusted for scheduling_group_specific data structure
changes in Seastar. As part of that, a gratuitous dereference of
std::unique_ptr, which fails for std::unique_ptr<void*, ...>, was
removed.
2025-02-03 00:10:38 +02:00
Botond Dénes
e1b1a2068a reader_concurrency_semaphore: foreach_permit(): include _inactive_reads
So inactive reads show up in semaphore diagnostics dumps (currently the
only non-test user of this method).

Fixes: #22574

Closes scylladb/scylladb#22575
2025-01-30 22:46:57 +02:00
Michael Litvak
44c06ddfbb test/test_view_build_status: fix wrong assert in test
The test expects and asserts that after wait_for_view is completed we
read the view_build_status table and get a row for each node and view.
But this is wrong because wait_for_view may have read the table on one
node, and then we query the table on a different node that didn't insert
all the rows yet, so the assert could fail.

To fix it we change the test to retry and check that eventually all
expected rows are found and then eventually removed on the same host.

Fixes scylladb/scylladb#22547

Closes scylladb/scylladb#22585
2025-01-30 21:25:53 +02:00
Michael Litvak
6d34125eb7 view_builder: fix loop in view builder when tokens are moved
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.

A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.

The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.

To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.

Fixes scylladb/scylladb#21829

Closes scylladb/scylladb#22493
2025-01-30 14:35:18 +02:00
Nikos Dragazis
439862a8d4 test/cqlpy: Reproduce bug with exceeded limit on secondary index
Add two cqlpy tests that reproduce a bug where a secondary index query
returns more rows than the specified limit. This occurs when the indexed
column is a partition key column or the first clustering key column,
the query result spans multiple partitions, and the last partition
causes the limit to be exceeded.

`test/cqlpy/run --release ...` shows that the tests fail for Scylla
versions all the way back to 4.4.0. Older Scylla versions fail with a
syntax error in CQL query which suggests some incompatibility in the
CQL protocol. That said, this bug is not a regression.

The tests pass in Cassandra 5.0.2.

Refs #22158.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#22513
2025-01-30 13:24:15 +02:00
Kefu Chai
f39cfd8eb0 compaction: switch boost::algorithm::any_of to std::ranges::any_of
std::any_of was included by C++11, and boost::algorithm::any_of() is
provided by Boost for users stuck in the pre-C++11 era. in our case,
we've moved into C++23, where the ranges variant of this algorithm
is available.

in order to reduce the header dependency, let's switch to
`std::ranges::any_of()`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22503
2025-01-30 13:22:33 +02:00
Artsiom Mishuta
03606b8e22 test.py:topology_random_failures: enable tests deselected for #21534
removed tests deselectios for issue scylladb/scylladb#21534
as it closed now

fixes: scylladb/scylladb#21711

Closes scylladb/scylladb#22424
2025-01-30 12:12:19 +01:00
Wojciech Mitros
677f9962cf mv: forbid views with tablets by default
Materialized views with tablets are not stable yet, but we want
them available as an experimental feature, mainly for teseting.

The feature was added in scylladb/scylladb#21833,
but currently it has no effect. All tests have been updated to use the
feature, so we should finally make it work.
This patch prevents users from creating materialized views in keyspaces
using tablets when the VIEWS_WITH_TABLETS feature is not enabled - such
requests will now get rejected.

Fixes scylladb/scylladb#21832

Closes scylladb/scylladb#22217
2025-01-30 12:10:47 +01:00
aberry-21
69a0431cce schema: add validation for PERCENTILE values in speculative_retry configuration
This commit addresses issue #21825, where invalid PERCENTILE values for
the `speculative_retry` setting were not properly handled, causing potential
server crashes. The valid range for PERCENTILE is between 0 and 100, as defined
in the documentation for speculative retry options, where values above 100 or
below 0 are invalid and should be rejected.

The added validation ensures that such invalid values are rejected with a clear
error message, improving system stability and user experience.

Fixes #21825

Closes scylladb/scylladb#21879
2025-01-30 11:34:46 +02:00
Yaron Kaikov
ed4bfad5c3 .github: add action to make PR ready for review when conflicts label was removed
Moving a PR out of draft is only allowed to users with write access,
adding a github action to switch PR to `ready for review` once the
`conflicts` label was removed

Closes scylladb/scylladb#22446
2025-01-30 11:33:25 +02:00
Nadav Har'El
698a63e14b test/alternator: test for invalid B value in UpdateItem
This patch adds an Alternator test for the case of UpdateItem attempting
to insert in invalid B (bytes) value into an item. Values of type B
use base64 encoding, and an attempt to insert a value which isn't
valid base64 should be rejected, and this is what this test verifies.

The new tests reproduce issue #17539, which claimed we have a bug in
this area. However, test/alternator/run with the "--release" option
shows that this bug existed in Scylla 5.2, but but fixed long ago, in
5.3 and doesn't exist in master. But we never had a regression test this
issue, so now we do.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22029
2025-01-30 11:33:03 +02:00
Botond Dénes
af46894bb7 Merge 'Rack aware view pairing' from Benny Halevy
Enabled with the tablets_rack_aware_view_pairing cluster feature
rack-aware pairing pairs base to view replicas that are in the
same dc and rack, using their ordinality in the replica map

We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
  is a multiple of the number of racks and the minimum number of nodes
  per rack in the dc is greater than or equal to rf / nr_racks.

  In this case (that includes the single rack case), all racks would
  have the same number of replicas, so we first filter all replicas
  by dc and rack, retaining their ordinality in the process, and
  finally, we pair between the base replicas and view replicas,
  that are in the same rack, using their original order in the
  tablet-map replica set.

  For example, nr_racks=2, rf=4:
  base_replicas = { N00, N01, N10, N11 }
  view_replicas = { N11, N12, N01, N02 }
  pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
  Note that we don't optimize for self-pairing if it breaks pairing ordinality.

- Complex rack-aware pairing: when the replication factor is not
  a multiple of nr_racks.  In this case, we attempt best-match
  pairing in all racks, using the minimum number of base or view replicas
  in each rack (given their global ordinality), while pairing all the other
  replicas, across racks, sorted by their ordinality.

  For example, nr_racks=4, rf=3:
  base_replicas = { N00, N10, N20 }
  view_replicas = { N11, N21, N31 }
  pairing would be: { N00, N31 }\*, { N10, N11 }, { N20, N21 }
  \* cross-rack pair

  If we'd simply stable-sort both base and view replicas by rack,
  we might end up with much worse pairing across racks:
  { N00, N11 }\*, { N10, N21 }\*, { N20, N31 }\*
  \* cross-rack pair

Fixes scylladb/scylladb#17147

* This is an improvement so no backport is required

Closes scylladb/scylladb#21453

* github.com:scylladb/scylladb:
  network_topology_strategy_test: add tablets rack_aware_view_pairing tests
  view: get_view_natural_endpoint: implement rack-aware pairing for tablets
  view: get_view_natural_endpoint: handle case when there are too few view replicas
  view: get_view_natural_endpoint: track replica locator::nodes
  locator: topology: consult local_dc_rack if node not found by host_id
  locator: node: add dc and rack getters
  feature_service: add tablet_rack_aware_view_pairing feature
  view: get_view_natural_endpoint: refactor predicate function
  view: get_view_natural_endpoint: clarify documentation
  view: mutate_MV: optimize remote_endpoints filtering check
  view: mutate_MV: lookup base and view erms synchronously
  view: mutate_MV: calculate keyspace-dependent flags once
2025-01-30 11:32:19 +02:00
Aleksandra Martyniuk
328818a50f replica: mark registry entry as synch after the table is added
When a replica get a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.

Mark schema as synced after the table is added to database::_tables_metadata.

Fixes: #22347.

Closes scylladb/scylladb#22348
2025-01-30 11:30:07 +02:00
Aleksandra Martyniuk
477ad98b72 nodetool: tasks: print empty string for start_time/end_time if unspecified
If start_time/end_time is unspecified for a task, task_manager API
returns epoch. Nodetool prints the value in task status.

Fix nodetool tasks commands to print empty string for start_time/end_time
if it isn't specified.

Modify nodetool tasks status docs to show empty end_time.

Fixes: #22373.

Closes scylladb/scylladb#22370
2025-01-30 11:29:36 +02:00
Calle Wilund
7db14420b7 encryption: Fix encrypted components mask check in describe
Fixes #22401

In the fix for scylladb/scylla-enterprise#892, the extraction and check for sstable component encryption mask was copied
to a subroutine for description purposes, but a very important 1 << <value> shift was somehow
left on the floor.

Without this, the check for whether we actually contain a component encrypted can be wholly
broken for some components.

Closes scylladb/scylladb#22398
2025-01-30 11:29:13 +02:00
Botond Dénes
8e89f2e88e Merge 'audit: make categories, tables, and keyspaces liveupdatable' from Andrzej Jackowski
This change:
 - Remove code that prevented audit from starting if audit_categories,
   audit_tables, and audit_keyspaces are not configured
 - Set liveness::LiveUpdate for audit_categories, audit_tables,
   and audit_keyspaces
 - Keep const reference to db::config in audit, so current config values
   can be obtained by audit implementation
 - Implement function audit::update_config to parse given string, update
   audit datastructures when needed, and log the changes.
 - Add observers to call audit::update_config when categories,
   tables, or keyspaces configuration changes

New functionality, so no backport needed.
Fixes https://github.com/scylladb/scylla-enterprise/issues/1789

Closes scylladb/scylladb#22449

* github.com:scylladb/scylladb:
  audit: make categories, tables, and keyspaces liveupdatable
  audit: move static parsing functions above audit constructors
  audit: move statement_category to string conversion to static function
  audit: start audit even with empty categories/tables/keyspaces
2025-01-30 11:28:49 +02:00
Botond Dénes
d8b8a6c5fc Merge 'api: task_manager: do not unregister finish task when its status is queried' from Aleksandra Martyniuk
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.

Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.

Fixes: https://github.com/scylladb/scylladb/issues/21388.

It's a fix to task_manager API, should be backported to all branches

Closes scylladb/scylladb#22310

* github.com:scylladb/scylladb:
  api: task_manager: do not unregister tasks on get_status
  api: task_manager: add /task_manager/drain
2025-01-30 11:27:44 +02:00
Botond Dénes
98fdf05b0e Merge 'Fix repair vs storage services initialization order' from Pavel Emelyanov
Repair service is started after storage service, while storage service needs to reference repair one for its needs. Recently it was noticed, that this reverse order may cause troubles and was fixed with the help of an extra gate. That's not nice and makes the start-stop mess even worse. The correct fix is to fix the order both services start/stop in.

Closes scylladb/scylladb#22368

* github.com:scylladb/scylladb:
  Revert "repair: add repair_service gate"
  main: Start repair before storage service
  repair: Check for sharded<view-builder> when constructing row_level_repair
2025-01-30 11:26:24 +02:00
Nadav Har'El
98a8ae0552 test/alternator: functional tests for Alternator multi-item transactions
This patch adds extensive functional tests for the DynamoDB multi-item
transactions feature - the TransactWriteItems and TransactGetItems
requests. We add 43 test functions, spanning more than 1000 lines of code,
covering the different parameters and corner cases of these requests.

Because we don't support the transaction feature in Alternator yet (this
is issue #5064), all of these tests fail on Alternator but all of them
were tested to pass on DynamoDB. So all new tests are marked "xfail".

These tests will be handy for whoever will implement this feature as
an acceptance test, and can also be useful for whoever will just want to
understand this feature better - the tests are short and simple and
heavily commented.

Note that these tests only check the correct functionality of individual
calls of these requests - these tests cannot and do not check the
consistency or isolation guarantees of concurrent invocations of
several requests. Such tests would require a different test framework,
such as the one requested in issue #6350, and are therefore not part of
this patch.

Note that this patch includes ONLY tests, and does not mean that an
implementation of the feature will soon follow. In fact, nobody is
currently working on implementing this feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22239
2025-01-30 11:22:05 +02:00
Avi Kivity
000791ad5c README: adjust to reflect license change
Adjust the contact section to reflect the license change.

Closes scylladb/scylladb#22537
2025-01-30 10:28:32 +03:00
Kamil Braun
febd45861e test/lib: cql_test_env: make service shutdown more verbose
Introduce `defer_verbose_shutdown` in `cql_test_env` which logs
a message before and after shutting down a service, distinguishing
between success and failure.

The function is similar to the one in `main` but skips special error
handling logic applicable only to the main Scylla binary. The purpose
of the `cql_test_env` version of this function is only more verbose
logging. If necessary it can be extended in the future with additional
logic.

I estimated the impact on the size of produced log files using
`cdc_test` as an example:
```
$ build/dev/test/boost/combined_tests --run_test=cdc_test -- --smp=2 \
    >logfile 2>&1
$ du -b logfile
```

the result before this commit: 1964064 bytes, after: 2196432 bytes,
so estimated ~12% increase of log file size for boost tests that use
`cql_test_env`, assuming that the number of logs printed by each test is
similar to the logs printed by `cdc_test` (but I believe `cdc_test` is
one of the less verbose tests so this is an overestimate).

The motivation for this change is easier debugging of shutdown issues.
When investigating scylladb/scylladb#21983, where an exception is
thrown somewhere during the shutdown procedure, I found it hard to
pinpoint the service from which the exception originates. This change
will make it easier to debug issues like that by wrapping shutdown of
each service in a pair of messages logged when shutdown starts and when
it finishes (including when it fails). We should get more details on
this issue when it reproduces again in CI after this commit is merged
into `master`. (I failed to reproduce it locally with 1000 runs.)

Ref scylladb/scylladb#21983

Closes scylladb/scylladb#22566
2025-01-30 10:27:45 +03:00
Botond Dénes
5dd6fcfe6f Merge 'encrypted_file_impl: Check for reads on or past actual file length in transform' from Calle Wilund
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could allow reading from (_file_size+&lt;1-15&gt;) (if crossing block boundary) and try to decrypt this buffer (last one).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return `_file_size` - `read offset` -> wraparound -> rather larger number than we expected (not to mention the data in question is junk/zero).

Check on last block in `transform` would wrap around size due to us being >= file size (l).
Just do an early bounds check and return zero if we're past the actual data limit.

Closes scylladb/scylladb#22395

* github.com:scylladb/scylladb:
  encrypted_file_test: Test reads beyond decrypted file length
  encrypted_file_impl: Check for reads on or past actual file length in transform
2025-01-29 20:09:32 +02:00
Anna Stuchlik
2a6445343c doc: update the Web Installer docs to remove OSS
Fixes https://github.com/scylladb/scylladb/issues/22292

Closes scylladb/scylladb#22433
2025-01-29 20:00:01 +02:00
Anna Stuchlik
caf598b118 doc: add SStable support in 2025.1
This commit adds the information about SStable version support in 2025.1
by replacing "2022.2" with "2022.2 and above".

In addition, this commit removes information about versions that are
no longer supported.

Fixes https://github.com/scylladb/scylladb/issues/22485

Closes scylladb/scylladb#22486
2025-01-29 19:59:24 +02:00
dependabot[bot]
962bd452f5 build(deps): bump sphinx-scylladb-theme from 1.8.3 to 1.8.5 in /docs
Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.8.3 to 1.8.5.
- [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases)
- [Commits](https://github.com/scylladb/sphinx-scylladb-theme/compare/1.8.3...1.8.5)

---
updated-dependencies:
- dependency-name: sphinx-scylladb-theme
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Closes scylladb/scylladb#22435
2025-01-29 19:57:33 +02:00
Pavel Emelyanov
4aff86ac64 sstables: Mark sstable::sstable_buffer_size const
It really never changes once set in constructor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22533
2025-01-29 19:51:22 +02:00
Avi Kivity
c71f383cf2 Merge 'sstables: use std::variant instead of boost::variant' from Botond Dénes
Continue replacing boost types with std one where possible.

Improvement, no backport needed

Closes scylladb/scylladb#22290

* github.com:scylladb/scylladb:
  sstables: disk_types: disk_set_of_tagged_union: boost::variant -> std::variant
  scylla-gdb.py: std_variant: fix get()
  sstables: disk_types: remove unused disk_tagged_union
2025-01-29 15:29:15 +02:00
Michał Chojnowski
80072eefe5 test/scylla_gdb: add more checks to coro_task()
test_coro_frame is flaky, as if
`service::topology_coordinator::run() [clone .resume]` wasn't running
on the shard. But it's supposed to.

Perhaps this is a bug in `find_vptrs()`?
This patch asks `scylla find` for a second opinion, and also prints
all `find_vptrs()`, to see if it's the only coroutine missing
from there.

Closes scylladb/scylladb#22534
2025-01-29 11:02:24 +02:00
Kefu Chai
e218a62a7a cdc,index: replace boost::ends_with() with .ends_with()
since C++20, std::string and std::string_view started providing
`ends_with()` member function, the same applies to `seastar::sstring`,
so there is no need to use `boost::ends_with()` anymore.

in this change, we switch from `boost::ends_with()` to the member
functions variant to

- improve the readability
- reduce the header dependency

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22502
2025-01-29 11:52:55 +03:00
Pavel Emelyanov
2eb06c6384 Update seastar submodule with recent IO scheduler improvement
* seastar 18221366...71036ebc (2):
  > Merge 'fair_queue: make the fair_group token grabbing discipline more fair' from Michał Chojnowski
      apps/io_tester: add some test cases for the IO scheduler
      test: in fair_queue_test, ensure that tokens are only replenished by test_env
      fair_queue: track the total capacity of queued requests
      fair_queue: make the fair_group token grabbing discipline more fair
  > scheduling: auto-detect scheduling group key rename() method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#22504
2025-01-29 11:39:12 +03:00
Kefu Chai
8d0cabb392 config: validate internode_compression option values
Previously, the internode_compression option silently defaulted to 'none'
for any unrecognized value instead of validating input. It only compared
against 'all' and 'dc', making it error-prone.

Add explicit validation for the three supported values:
- all
- dc
- none

This ensures invalid values are rejected both in command line and YAML
configuration, providing better error messages to users.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-29 14:52:35 +08:00
Kefu Chai
a81416862a config: start available options with '*'
use '*' prefix for config option values instead of escape sequences

The custom Sphinx extension that generates documentation from config.cc
help messages has issues with C++ escape sequences. For example,
"\tall: All traffic" renders incorrectly as "tall: All traffic" in HTML
output.

Instead of using escape sequences, switch to bullet-point style with '*'
prefix which works better in both CLI and HTML rendering. This matches
our existing documentation style for available option values in other
configs.

Note: This change puts type/default/liveness info in the same bullet
list as option values. This limitation affects other similar config
options and will need to be addressed comprehensively in a future
change.

Refs scylladb/scylladb#22423
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-29 14:52:35 +08:00
Dawid Pawlik
a68bf6dcc1 docs: add vector type documentation
Add missing vector type documentation including: definition of vector,
adjustment of term definition, JSON encoding, Lua and cql3 type
mapping, vector dimension limit, and keyword specification.
2025-01-28 21:14:49 +01:00
Jan Łakomy
947933366f cassandra_tests: translate tests covering the vector type
Add cql_vector_test which tests the basic functionalities of
the vector type using CQL.

Add vectors_test which tests if descending ordering of vector
is supported.
2025-01-28 21:14:49 +01:00
Jan Łakomy
84c92837e0 type_codec: add vector type encoding
This change has been introduced to enable CQL drivers to recognize
vector type in query results.

The encoding has been imported from Apache Cassandra implementation
to match Cassandra's and latest drivers' behaviour.

Co-authored-by: Dawid Pawlik <501149991dp@gmail.com>
2025-01-28 21:14:49 +01:00
Dawid Pawlik
489ab1345e boost/expr_test: add vector expression tests
Add and adjust tests using vector and list_or_vector style types.

Implemented utilities used in expr_test similar to those added
in 8f6309bd66.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
ed49093a01 expression: adjust collection constructor list style
Like mentioned in the previous commit, this changes introduce usage
of vector style type and adjusts the functions using list style type
to distinguish vectors from lists.

Rename collection constructor style list to list_or_vector.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
69c754f0d4 expression: add vector style type
Motivation for this changes is to provide a distinguishable interface
for vector type expressions.

The square bracket literal is ambigious for lists and vectors,
so that we need to perform a distinction not using CQL layer.
At first we should use the collection constructor to manage
both lists and vectors (although a vector is not a collection).
Later during preparation of expressions we should be able to get
to know the exact type using given receiver (column specification).

Knowing the type of expression we may use their respective style type
(in this case the vector style type being introduced),
which would make the implementation more precise and allow us to
evaluate the expressions properly.

This commit introduces vector style type and functions making use of it.

However vector style type is not yet used anywhere,
the next commit should adjust collection constructor and make use
of the new vector style type and it's features.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
7554e55c2c test/boost: add vector type cql_env boost tests
These tests check serialization and deserialization (including JSON),
basic inserts and selects, aggregate functions, element validation,
vector usage in user defined types and functions.

test_vector_between_user_types is a translated Apache Cassandra test
to check if it is handled properly internally.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
c3a1760a44 test/boost: add vector type_parser tests
Contains two type_parser tests: one for a valid vector
and another for invalid vector.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
9954eb0ed7 type_parser: support vector type
This change is introduced due to lack of support for vector class name,
used by type_parser to create data_type based on given class name
(especially compound class name with inner types or other parameters).

Add function that parses vector type parameters from a class name.
2025-01-28 21:14:49 +01:00
Dawid Pawlik
aac10d261c cql3: add vector type syntax
Introduce vector_type CQL syntax: VECTOR<`cql_type`, `integer`>.
The parameters are respectively a type of elements of the vector
and the vector's dimension (number of elements).

Co-authored-by: Jan Łakomy <janpiotrlakomy@gmail.com>
2025-01-28 21:14:49 +01:00
Michael Litvak
4f5550d7f2 cdc: fix handling of new generation during raft upgrade
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may have not applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry to read until it succeeds or becomes obsolete.

Another issue is when we fail with a "fatal" exception and not retrying
to read, the cdc metadata is left in an inconsistent state that causes
further attempts to insert this CDC generation to fail.

What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.

To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.

Fixes scylladb/scylladb#21227

Closes scylladb/scylladb#22093
2025-01-28 18:05:32 +01:00
Kamil Braun
add97ccc15 Merge 'Do not update topology on address change' from Gleb Natapov
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated. The series factors out peers table update code from
sync_raft_topology_nodes() and calls it on topology and ip address
updates. As a side effect it fixes #22293 since now topology loading
does not require IP do be present, so the assert that is triggered in
this bug is removed.

Fixes: scylladb/scylladb#22293

Closes scylladb/scylladb#22519

* github.com:scylladb/scylladb:
  topology coordinator: do not update topology on address change
  topology coordinator: split out the peer table update functionality from raft state application
2025-01-28 12:52:29 +01:00
Avi Kivity
7f2d901c89 Merge 'repair: handle no_such_keyspace in repair preparation phase' from Aleksandra Martyniuk
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

Fixes: #22073.

Requires backport to 6.1 and 6.2 as they contain the bug

Closes scylladb/scylladb#22473

* github.com:scylladb/scylladb:
  test: add test to check if repair handles no_such_keyspace
  repair: handle keyspace dropped
2025-01-28 13:42:38 +02:00
Pavel Emelyanov
4d8d7f1f1d Merge 'backup_task: remove a component once it is uploaded ' from Kefu Chai
Previously, during backup, SSTable components are preserved in the
snapshot directory even after being uploaded. This leads to redundant
uploads in case of failed backups or restarts, wasting time and
resources (S3 API calls).

This change removes SSTable components from the snapshot directory once
they are successfully uploaded to the target location. This prevents
re-uploading the same files and reduces disk usage.

This change only "Refs" https://github.com/scylladb/scylladb/issues/20655, because, we can further optimize
the backup process, consider:

- Sending HEAD requests to S3 to check for existing files before uploading.
- Implementing support for resuming partially uploaded files.

Fixes https://github.com/scylladb/scylladb/issues/21799
Refs https://github.com/scylladb/scylladb/issues/20655

---

the backup API is not used in production yet, so no need to backport.

Closes scylladb/scylladb#22285

* github.com:scylladb/scylladb:
  backup_task: remove a component once it is uploaded
  backup_task: extract component upload logic into dedicated function
  snapshot-ctl: change snapshot_ctl::run_snapshot_modify_operation() to regular func
2025-01-28 14:27:50 +03:00
Kefu Chai
5da2691f05 cmake: remove redundant BINARY_DIR setting for Seastar
ExternalProject automatically creates BINARY_DIR for Seastar, but generator
expressions are not supported in this setting. This caused CMake to create
an unused "build/$<CONFIG>/seastar" directory.

Instead, define a dedicated variable matching configure.py's naming and use
it in supported options like BUILD_COMMAND. This:

- Creates build files in the standard "Seastar-prefix/src/Seastar-build"
  directory instead of "build/$<CONFIG>/seastar". see
  https://cmake.org/cmake/help/latest/module/ExternalProject.html#directory-options
- Makes it clearer that the variable should match configure.py settings

No functional changes to the Seastar build process - purely a cleanup to
reduce confusion when inspecting the build directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22437
2025-01-28 14:23:59 +03:00
Kefu Chai
57b14220ce tree: remove unused "#include"s
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

in which, instead of using `seastarx.hh`, `readers/mutation_reader.hh`,
use `using seastar::future` to include `future` in the global namespace,
this makes `readers/mutation_reader.hh` a header exposing `future<>`,
but this is not a good practice, because, unlike `seastarx.hh` or
`seastar/core/future.hh`, `reader/mutation_reader.hh`  is not
responsible for exposing seastar declarations. so, we trade the
using statement for `#include "seastarx.hh"` in that file to decouple
the source files including it from this header because of this statement.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22439
2025-01-28 14:12:06 +03:00
Kefu Chai
ce2d235c88 docs: correct typo of "abd" to "and"
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22442
2025-01-28 14:11:02 +03:00
Pavel Emelyanov
3b081b4839 Merge 'TLS: reduce inotify usage by sharing reloadability across shards' from Calle Wilund
Refs https://github.com/scylladb/seastar/issues/2513

Reloadable certificates use inotify instances. On a loaded test (CI) server, we've seen cases where we literally run out of capacity. This patch uses the extended callback and reload capability of seastar TLS to only create actual reloadable certificate objects on shard 0 for our main TLS points (encryption only does TLS on shard 0 already).

Closes scylladb/scylladb#22425

* github.com:scylladb/scylladb:
  alternator: Make server peering sharded and reuse reloadable certs
  messaging_service: Share reloadability of certificates across shards
  redis/controller: Reuse shard 0 reloadable certificates for all shards
  controller: Reuse shard 0 reloadable certificates for all shards
  generic_server: Allow sharing reloadability of certificates across shards
2025-01-28 14:08:17 +03:00
Tomasz Grabiec
50d9d5b98e Merge 'truncate: trigger truncate logic from a transition state instead of global topology request' from Ferenc Szili
Truncate table for tablets is implemented as a global topology operation. However, it does not have a transition state associated with it, and performs the truncate logic in `topology_coordinator::handle_global_request()` while `topology::tstate` remains empty. This creates problems because `topology::is_busy()` uses transition_state to determine if the topology state machine is busy, and will return false even though a truncate operation is ongoing.

This change introduces a new topology transition `topology::transition_state::truncate_table` and moves the truncate logic to a new method `topology_coordinator::handle_truncate_table()`. This method is now called as a handler of the `truncate_table` transition state instead of a handler of the `trunacate_table` global topology request.

This PR is a bugfix for truncate with tables and needs to be backported to 2025.1

Closes scylladb/scylladb#22452

* github.com:scylladb/scylladb:
  truncate: trigger truncate logic from transition state instead of global request handler
  truncate: add truncate_table transition state
2025-01-28 12:05:57 +01:00
Asias He
0682b1c716 repair: Skip hints and batchlog flush in case of nodes down
The flush api could not detect if the node is down and fail the flush
before the timeout. This patch detects if there is down node and skip
the flush if so, since the flush will fail after the timeout in this
case anyway.

The slowness due to the flush timeout in
compaction_test.py::TestCompaction::test_delete_tombstone_gc_node_down
is fixed with this patch.

Fixes #22413

Closes scylladb/scylladb#22445
2025-01-28 12:04:42 +01:00
Calle Wilund
4843711fbd alternator: Make server peering sharded and reuse reloadable certs
Reuse reloadability across shards by limiting reload to shard 0,
and use call to other shards to reload other shards certs.
2025-01-27 16:16:24 +00:00
Calle Wilund
15d1664a5c messaging_service: Share reloadability of certificates across shards
Only create reloadable cert object on shard 0, and call other shards on
reload callback to reload other shards "manually".
2025-01-27 16:16:24 +00:00
Calle Wilund
5f7c733b1e redis/controller: Reuse shard 0 reloadable certificates for all shards
Provide a getter to "listen" method and only use full reloadable
object on shard 0.
2025-01-27 16:16:24 +00:00
Calle Wilund
aab35e6806 controller: Reuse shard 0 reloadable certificates for all shards
Provide a getter to "listen" method and only use full reloadable
object on shard 0.
2025-01-27 16:16:23 +00:00
Calle Wilund
c59c87c233 generic_server: Allow sharing reloadability of certificates across shards
Adds an optional callback to "listen", returning the shard local object
instance. If provided, instead of creating a "full" reloadable cerificate
object, only do so on shard 0, and use callback to reload other shards
"manually".
2025-01-27 16:16:23 +00:00
Botond Dénes
b70dccb638 sstables: disk_types: disk_set_of_tagged_union: boost::variant -> std::variant
In the spirit of using standard-library types, instead of boost ones
where possible.
Although a disk type, it is serialized/deserialized with custom code, so
the change shouldn't cause any changes in the disk representation.
2025-01-27 09:29:26 -05:00
Botond Dénes
a095b3bb80 scylla-gdb.py: std_variant: fix get()
It calls self.get_with_type() with one too many params.
2025-01-27 09:29:26 -05:00
Botond Dénes
08f1aecc1e sstables: disk_types: remove unused disk_tagged_union 2025-01-27 09:29:26 -05:00
Anna Stuchlik
b2a718547f doc: remove Enterprise labels and directives
This PR removes the now redundant Enterprise labels and directives
from the ScyllDB documentation.

Fixes https://github.com/scylladb/scylladb/issues/22432

Closes scylladb/scylladb#22434
2025-01-27 16:01:48 +02:00
Asias He
0ab64551c5 storage_service: Reject nodetool removenode force
It is almost always a bad idea to run removenode force. This means a
node is removed without the remaining nodes to stream data that they
should own after the removal. This will make the cluster into a worse
state than a node being down.

One can use one of the following procedure instead:

1) Fix the dead node and move it back to the cluster

2) Run replace ops to replace the dead node

3) Run removenode ops again

We have seen misuse of nodetool removenode force by users again and
again. This patch rejects it so it can not be misused anymore.

Fixes scylladb/scylladb#15833

Closes scylladb/scylladb#15834
2025-01-27 14:50:18 +01:00
Anna Stuchlik
1d5ef3dddb doc: enable the FIPS note in the ScyllaDB docs
This commit removes the information about FIPS out of the '.. only:: enterprise' directive.
As a result, the information will now show in the doc in the ScyllaDB repo
(previously, the directive included the note in the Entrprise docs only).

Refs https://github.com/scylladb/scylla-enterprise/issues/5020

Closes scylladb/scylladb#22374
2025-01-27 15:48:54 +02:00
Calle Wilund
bae5b44b97 docs: Remove configuration_encryptor
Fixes #21993

Removes configuration_encryptor mention from docs.
The tool itself (java) is not included in the main branch
java tools, thus need not remove from there. Only the words.

Closes scylladb/scylladb#22427
2025-01-27 15:45:18 +02:00
Nikos Dragazis
2fb95e4e2f encrypted_file_test: Test reads beyond decrypted file length
Add a test to reproduce a bug in the read DMA API of
`encrypted_file_impl` (the file implementation for Encryption-at-Rest).

The test creates an encrypted file that contains padding, and then
attempts to read from an offset within the padding area. Although this
offset is invalid on the decrypted file, the `encrypted_file_impl` makes
no checks and proceeds with the decryption of padding data, which
eventually leads to bogus results.

Refs #22236.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 8f936b2cbc)
2025-01-27 13:19:37 +00:00
Calle Wilund
e96cc52668 encrypted_file_impl: Check for reads on or past actual file length in transform
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could
allow reading from (_file_size+1-15) (block boundary) and try to decrypt this
buffer (last one).
Check on last block in `transform` would wrap around size due to us being >=
file size (l).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return _file_size - read offset
-> wraparound -> rather larger number than we expected
(not to mention the data in question is junk/zero).

Just do an early bounds check and return zero if we're past the actual data limit.

v2:
* Moved check to a min expression instead
* Added lengthy comment
* Added unit test

v3:
* Fixed read_dma_bulk handling of short, unaligned read
* Added test for unaligned read

v4:
* Added another unaligned test case
2025-01-27 13:19:37 +00:00
Avi Kivity
6b85c03221 Merge 'split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Ferenc Szili
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.

The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should exclude with compaction group creation. i.e. DROP TABLE/DROP KEYSPACE

Fixes #22431

This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1

Closes scylladb/scylladb#22330

* github.com:scylladb/scylladb:
  test: add reproducer and test for fix to split ready CG creation
  table: run set_split_mode() on all storage groups during all_storage_groups_split()
2025-01-27 13:13:42 +01:00
Anna Stuchlik
61c822715c doc: add OS support for 2025.1 and reorganize the page
This commit adds the OS support information for version 2025.1.
In addition, the OS support page is reorganized so that:
- The content is moved from the include page _common/os-support-info.rst
  to the regular os-support.rst page. The include page was necessary
  to document different support for OSS and Enterprise versions, so
  we don't need it anymore.
- I skipped the entries for versions that won't be supported when 2025.1
  is released: 6.1 and 2023.1.
- I moved the definition of "supported" to the end of the page for better
  readability.
- I've renamed the index entry to "OS Support" to be shorter on the left menu.

Fixes https://github.com/scylladb/scylladb/issues/22474

Closes scylladb/scylladb#22476
2025-01-27 13:13:41 +01:00
Botond Dénes
9fc14f203b Merge 'Simplify loading_cache_test and use manual_clock' from Benny Halevy
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.

In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it out-of-scope for this test.

Fixes #20322

* The test was flaky forever, so backport is required for all live versions.

Closes scylladb/scylladb#22064

* github.com:scylladb/scylladb:
  tests: loading_cache_test: use manual_clock
  utils: loading_cache: make clock_type a template parameter
  test: loading_cache_test: use function-scope loader
  test: loading_cache_test: simlute loader using sleep
  test: lib: eventually: add sleep function param
  test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
2025-01-27 13:13:41 +01:00
Yaron Kaikov
f91128096d Update ScyllaDB version to: 2025.2.0-dev 2025-01-27 13:13:41 +01:00
Piotr Smaron
2a77405093 cql: inline abstract_replication_strategy::validate_replication_strategy
This function is only called from 1 place and only contains 2 lines of
code, just keeping it increases the code bloat
2025-01-27 12:13:45 +01:00
Piotr Smaron
3848293a43 cql: clean redundant code validating replication strategy options
Most of the code from `recognized_options` is either incorrect or lacks
any implementation, for example:
- comments for Everywhere and Local strategies are contradictory, first
  says to allow all options, second says that the strategy doesn't accept
any options, even though both functions have the same implementation,
- for Local & Everywhere strategies the same logic is repeated in
  `validate_options` member functions, i.e. this function does nothing,
- for NetworkTopology this function returns DC names and tablet options, but tablet
  options are empty; OTOH this strategy also accepts 'replication_factor'
tag, which was ommitted,
- for SimpleStrategy this function returns `replication_factor`, but this is also validated
  in `validate_options` function called just before the removed
function.
All of it makes `validate_replication_strategy` work incorrectly.
That being said, 3 tests fail because of this logic's removal, so it did
something after all. The failing tests are commented out, so that the CI
passes, and will be restored in the next commit(s).
2025-01-27 12:01:59 +01:00
Andrzej Jackowski
5651cc49ed audit: make categories, tables, and keyspaces liveupdatable
This change:
 - Set liveness::LiveUpdate for audit_categories, audit_tables,
   and audit_keyspaces
 - Keep const reference to db::config in audit, so current config values
   can be obtained by audit implementation
 - Implement function audit::update_config to parse given string, update
   audit datastructures when needed, and log the changes.
 - Add observers to call audit::update_config when categories,
   tables, or keyspaces configuration changes

Fixes scylladb/scylla-enterprise#1789
2025-01-27 11:37:13 +01:00
Andrzej Jackowski
5d4eb5d2dc audit: move static parsing functions above audit constructors
This change:
 - Swap static function and audit constructors in audit.cc

This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. It allows future use of static
parsing functions in audit constructor.
2025-01-27 11:35:35 +01:00
Andrzej Jackowski
609d7b2725 audit: move statement_category to string conversion to static function
This change:
 - Move audit_info::category_string to a new static function
 - Start using the new function in audit_info::category_string

This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. The newly created static function
will be required for proper logging of audit categories.
2025-01-27 11:35:35 +01:00
Andrzej Jackowski
99b4a79df0 audit: start audit even with empty categories/tables/keyspaces
This change:
 - Remove code that prevented audit from starting if audit_categories,
   audit_tables, and audit_keyspaces are not configured

This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. Without this change, audit is
not started for particular categories/tables/keyspaces setting and
it is unwanted behavior if customer can change audit configuration via
liveupdate.

This commit has performance implications if audit sink is set (meaning
"audit"="table" or "audit"="syslog" in the config) but categories,
tables, and keyspaces are not set to audit anything. Before this commit,
audit was not started, so some operations (like creating audit_info or
lookup in empty collections) were omitted.
2025-01-27 11:35:35 +01:00
Aleksandra Martyniuk
18cc79176a api: task_manager: do not unregister tasks on get_status
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister queries task if it
has already finished.

The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.
2025-01-27 11:23:45 +01:00
Aleksandra Martyniuk
e37d1bcb98 api: task_manager: add /task_manager/drain
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a functionality to drop a task, so that
they could manipulate only with the tasks for operations that were
invoked by these tests.

Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.
2025-01-27 11:23:45 +01:00
Aleksandra Martyniuk
54e7f2819c test: add test to check if repair handles no_such_keyspace 2025-01-27 09:49:50 +01:00
Aleksandra Martyniuk
bfb1704afa repair: handle keyspace dropped
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.
2025-01-27 09:37:47 +01:00
Avi Kivity
a23a3110b5 utils: config_file: forward_declare boost::program_options classes
Avoid pulling in boost dependencies when all we need is the class name.

Closes scylladb/scylladb#22453
2025-01-27 10:45:43 +03:00
Takuya ASADA
fb4c7dc3d8 dist: Support FIPS mode
- To make Scylla able to run in FIPS-compliant system, add .hmac files for
  crypto libraries on relocatable/rpm/deb packages.
- Currently we just write hmac value on *.hmac files, but there is new
  .hmac file format something like this:

  ```
  [global]
  format-version = 1
  [lib.xxx.so.yy]
  path = /lib64/libxxx.so.yy
  hmac = <hmac>
  ```
  Seems like GnuTLS rejects fips selftest on .libgnutls.so.30.hmac when
  file format is older one.
  Since we need to absolute path on "path" directive, we need to generate
  .libgnutls.so.30.hmac in older format on create-relocatable-script.py,

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes scylladb/scylladb#22384
2025-01-26 22:49:21 +02:00
Kefu Chai
0237913337 sstables: Migrate from boost::adaptors::indexed to std::views::enumerate
This change modernizes the codebase by:
- Replacing Boost's indexed adaptor with C++20's std::views::enumerate
- Removing unnecessary Boost header inclusion

With this change, we can:
- Reduce external dependencies
- Leverage standard library features
- Improve long-term code maintainability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22469
2025-01-26 20:51:14 +02:00
Jan Łakomy
9561ae5fc8 types: implement vector_type_impl
The vector is a fixed-length array of non-null
specified type elements.

Implement serialization, deserialization, comparison,
JSON and Lua support, and other functionalities.

Co-authored-by: Dawid Pawlik <501149991dp@gmail.com>
2025-01-26 19:36:41 +01:00
Gleb Natapov
fbfef6b28a topology coordinator: do not update topology on address change
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated, so call a function that does peers table update only.
2025-01-26 17:49:05 +02:00
Gleb Natapov
ef929c5def topology coordinator: split out the peer table update functionality from raft state application
Raft topology state application does two things: re-creates token metadata
and updates peers table if needed. The code for both task is intermixed
now. The patch separates it into separate functions. Will be needed in
the next patch.
2025-01-26 17:47:38 +02:00
Avi Kivity
60cdf62fae Merge 'Remove sharded<system_distributed_keyspace>& argument from storage_service::join_cluster()' from Pavel Emelyanov
There's such a reference on storage_service itself, it can use this->_sys_dist_ks instead thus making its API (both internal and external) a bit simpler.

Closes scylladb/scylladb#22483

* github.com:scylladb/scylladb:
  storage_service: Drop sys_dist_ks argument from track_upgrade_progress_to_topology_coordinator()
  storage_service: Drop sys_dist_ks argument from raft_state_monitor_fiber()
  storage_service: Drop sys_dist_ks argument from join_topology()
  storage_service: Drop sys_dist_ks argument from join_cluster()
2025-01-26 15:56:37 +02:00
Kefu Chai
769162de91 tree: correct misspellings
these misspellings were identified by codespell. let's fix them.
one of them is a part of a user visble string.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22443
2025-01-26 15:54:06 +02:00
Kefu Chai
d1c222d9bd config: specialize config_from_string() for sstring
Specialize config_from_string() for sstring to resolve lexical_cast
stream state parsing limitation. This enables correct handling of empty
string configurations, such as setting an empty value in CQL:

```cql
UPDATE system.config SET value='' WHERE
name='allowed_repair_based_node_ops';
```
Previous implementation using boost::lexical_cast would fail due to
EOF stream state, incorrectly rejecting valid empty string conversions.

Fixes scylladb/scylladb#22491
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22492
2025-01-26 15:53:12 +02:00
Kefu Chai
4a268362b9 compress: fix compressor initialization order by making namespace_prefix a function
Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.

Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.

Fixes scylladb/scylladb#22444
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22451
2025-01-26 13:43:02 +02:00
Pavel Emelyanov
856832911d storage_service: Drop sys_dist_ks argument from track_upgrade_progress_to_topology_coordinator()
It's unused argument. The only caller is relaxed too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-24 12:29:40 +03:00
Pavel Emelyanov
1e93f51977 storage_service: Drop sys_dist_ks argument from raft_state_monitor_fiber()
And the final drop of that kind -- switch to using this->_sys_dist_ks
here too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-24 12:29:03 +03:00
Pavel Emelyanov
248456cb9a storage_service: Drop sys_dist_ks argument from join_topology()
Similarly to previous patch, there's this->_sys_dist_ks thing

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-24 12:28:27 +03:00
Pavel Emelyanov
ca9b59f3b2 storage_service: Drop sys_dist_ks argument from join_cluster()
Storage service has _sys_dist_ks onboard and can just use it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-24 12:26:32 +03:00
Benny Halevy
32b7cab917 tests: loading_cache_test: use manual_clock
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.

Fixes #20322

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-23 09:28:08 +02:00
Benny Halevy
0841483d68 utils: loading_cache: make clock_type a template parameter
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-23 09:28:08 +02:00
Benny Halevy
b258f8cc69 test: loading_cache_test: use function-scope loader
Rather than a global function, accessing a thread-local `load_count`.
The thread-local load_count cannot be used when multiple test
cases run in parallel.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-23 09:28:07 +02:00
Benny Halevy
d68829243f test: loading_cache_test: simlute loader using sleep
This test isn't about reading values from file,
but rather it's about the loading_cache.
Reading from the file can sometimes take longer than
the expected refresh times, causing flakiness (see #20322).

Rather than reading a string from a real file, just
sleep a random, short time, and co_return the string.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-23 09:28:07 +02:00
Benny Halevy
934a9d3fd6 test: lib: eventually: add sleep function param
To allow support for manual_clock instead of seastar::sleep.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-23 09:28:05 +02:00
Pavel Emelyanov
4edd327c4f Revert "repair: add repair_service gate"
This reverts commit 32ab58cdea.

Now repair service starts before and stops after storage server, so the
problem described in the commit is no longer relevant.
2025-01-22 19:25:56 +03:00
Pavel Emelyanov
fff5b8adbc main: Start repair before storage service
The latter service uses repair, but not the vice-versa, so the correct
(de)initialization order should be the same.

refs: #2737

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-22 19:21:40 +03:00
Pavel Emelyanov
c5aa185e1b repair: Check for sharded<view-builder> when constructing row_level_repair
Currently initialization order of repair and view-builder is not
correct, so there are several places in repair code that check for v.b.
to be initialized before doing anything. There's one more place that
needs that care -- the construction of row_level_repair object.

The class instantiates helper objects that reply on view_builder to be
fully initialized and is itself created by many other task types from
repair code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-22 19:21:40 +03:00
Benny Halevy
b509644972 test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
rather then macros.

This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.

Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 12:47:33 +02:00
Ferenc Szili
9fa254e9a8 truncate: trigger truncate logic from transition state instead of global
request handler

Before this change, the logic of truncate for tablets was triggered from
topology_coordinator::handle_global_request(). This was done without
using a topology transition state which remained empty throughout the
truncate handler's execution.

This change moves the truncate logic to a new method
topology_coordinator::handle_truncate_table(). This method is now called
as a handler of the truncate_table topology transition state instead of
a handler of the trunacate_table global topology request.
2025-01-22 11:08:26 +01:00
Ferenc Szili
29ead7014e truncate: add truncate_table transition state
Truncate table for tablets is implemented as a global topology operation.
However, it does not have a transition state associated with it, and
performs the truncate logic in handle_global_request() while
topology::tstate remains empty. This creates problems because
topology::is_busy() uses transition_state to determine if the topology
state machine is busy, and will return false even though a truncate
operation is ongoing.

This change adds a new transition state: truncate_table
2025-01-22 10:44:36 +01:00
Benny Halevy
dd21d591f6 network_topology_strategy_test: add tablets rack_aware_view_pairing tests
Test the simple case of base/view pairing with replication_factor
that is a multiple of the number of racks.

As well as the complex case when simple_tablets_rack_aware_view_pairing
is not possible.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
249b793674 view: get_view_natural_endpoint: implement rack-aware pairing for tablets
Enabled with the tablets_rack_aware_view_pairing cluster feature
rack-aware pairing pairs base to view replicas that are in the
same dc and rack, using their ordinality in the replica map

We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
  is a multiple of the number of racks and the minimum number of nodes
  per rack in the dc is greater than or equal to rf / nr_racks.

  In this case (that includes the single rack case), all racks would
  have the same number of replicas, so we first filter all replicas
  by dc and rack, retaining their ordinality in the process, and
  finally, we pair between the base replicas and view replicas,
  that are in the same rack, using their original order in the
  tablet-map replica set.

  For example, nr_racks=2, rf=4:
  base_replicas = { N00, N01, N10, N11 }
  view_replicas = { N11, N12, N01, N02 }
  pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
  Note that we don't optimize for self-pairing if it breaks pairing ordinality.

- Complex rack-aware pairing: when the replication factor is not
  a multiple of nr_racks.  In this case, we attempt best-match
  pairing in all racks, using the minimum number of base or view replicas
  in each rack (given their global ordinality), while pairing all the other
  replicas, across racks, sorted by their ordinality.

  For example, nr_racks=4, rf=3:
  base_replicas = { N00, N10, N20 }
  view_replicas = { N11, N21, N31 }
  pairing would be: { N00, N31 }*, { N10, N11 }, { N20, N21 }
  * cross-rack pair

  If we'd simply stable-sort both base and view replicas by rack,
  we might end up with much worse pairing across racks:
  { N00, N11 }*, { N10, N21 }*, { N20, N31 }*
  * cross-rack pair

Fixes scylladb/scylladb#17147

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
0e388a1594 view: get_view_natural_endpoint: handle case when there are too few view replicas
Currently, when reducing RF, we may drop replicas from
the view before dropping replicas from the base table.
Since get_view_natural_endpoint is allowed to return
a disengaged optional if it can't find a pair for the
base replica, replcace the exiting assertion with code
handling this case, and count those events in a new
table metric: total_view_updates_failed_pairing.

Note that this does not fix the root cause for the issue
which is the unsynchronized dropping of replicas, that
should be atomic, using a single group0 transaction.

Refs scylladb/scylladb#21492

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
858b0a51f8 view: get_view_natural_endpoint: track replica locator::nodes
Rather than tracking only the replica host_id, keep
track of the locator:::node& to prepare for
rack-aware pairing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
2589115337 locator: topology: consult local_dc_rack if node not found by host_id
Like get_location by inet_address, there is a case, when
a node is replaced that the node cannot be found by host_id.
Currently get_location would return a reference based on the nullptr
which might cause a segfault as seen in testing.  Instead,
if the host_id is of the location, revert to calling
get_location() which consults this_node or _cfg.local_dc_rack.
Otherwise, throw a runtime_error.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
2bfebc1f62 locator: node: add dc and rack getters
To simplify searching and sorting using std::ranges
projection using std::mem_fn.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
6f8f03f593 feature_service: add tablet_rack_aware_view_pairing feature
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
cadd33bdf6 view: get_view_natural_endpoint: refactor predicate function
Simplify the function logic by calculating the predicate
function once, before scanning all base and view replicas,
rather than testing the different options in the inner loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
97f85e52f7 view: get_view_natural_endpoint: clarify documentation
"self-pairing" is enabled only when use_legacy_self_pairing
is enabled.  That is currently unclear in the documentation
comment for this function.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
6d4de30a3a view: mutate_MV: optimize remote_endpoints filtering check
Currently we always lookup both `my_address` and *target_endpoint
in remote_endpoints. But if my_address is in remote_endpoints
in some cases the second lookup is not needed, so do it only
to decide whether to swap target_endpoint with my_address, if
found in remote_endpoints, or to remove that match, if
*target_endpoint is already pending as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
91d3bf8ebc view: mutate_MV: lookup base and view erms synchronously
Although at the moment storage_service::replicate_to_all_cores
may yield between updating the base and view tables with
a new effective_replication_map, scylladb/scylladb#21781
was submitted to change that so that they are updated
atomically together.

This change prepares for the above change, and is harmless
at the moment.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Benny Halevy
d04cdce0fc view: mutate_MV: calculate keyspace-dependent flags once
All view live in the same keyspace as their base
table, so calculate the keyspace-dependent flags
once, outside the per-view update loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Kefu Chai
8080658df7 backup_task: remove a component once it is uploaded
Previously, during backup, SSTable components are preserved in the
snapshot directory even after being uploaded. This leads to redundant
uploads in case of failed backups or restarts, wasting time and
resources (S3 API calls).

This change

- adds an optional query parameter named "move_files" to
  "/storage_service/backup" API. if it is set to "true", SSTable
  components are removed once they are backed up to object storage.
- conditionally removes SSTable components from the snapshot directory once
  they are successfully uploaded to the target location. This prevents
  re-uploading the same files and reduces disk usage.

This change only "Refs" #20655, because, we can move further optimize
the backup process, consider:

- Sending HEAD requests to S3 to check for existing files before uploading.
- Implementing support for resuming partially uploaded files.

Fixes #21799
Refs #20655

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-22 11:17:01 +08:00
Kefu Chai
32d22371b9 backup_task: extract component upload logic into dedicated function
Extract upload_component() from backup_task_impl::do_backup() to improve
readability and prepare for optional post-upload cleanup. This refactoring
simplifies the main backup flow by isolating the upload logic into its own
function.

The change is motivated by an upcoming feature that will allow optional
deletion of components after successful upload, which would otherwise add
complexity to do_backup().

Refs scylladb/scylladb#21799

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-22 11:17:01 +08:00
Kefu Chai
ded31d1917 snapshot-ctl: change snapshot_ctl::run_snapshot_modify_operation() to regular func
instead of implementing `snapshot_ctl::run_snapshot_modify_operation()` as
a template function, let it accept as plain noncopyable_function
instance, and return `future<>`.

Previously, `snapshot_ctl::run_snapshot_modify_operation` was a template
function that accepted a templated functor parameter. This approach
limited its usability because callers needed to be defined in the same
translation unit as the template implementation.

however, `backup_task_impl` is defined in another translation unit,
and we intend to call `snapshot_ctl::run_snapshot_modify_operation()`
in its implementation. so in order to cater this need, there are two
options:

1. to move the definition of the template function into the header file.
   but the downside is that this slows down the compilation by increaing
   the size of header.
2. to change the template function to a regular function. This change
   restricts the function's parameter to a specific signature. However, all
   current callers already return a `future<>` object, so there's minimal
   impact.

in this change, we implement the second option. this allows us to call
this function from another translation unit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-22 11:17:01 +08:00
Ferenc Szili
8bff7786a8 test: add reproducer and test for fix to split ready CG creation
This adds a reproducer for #22431

In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations which should exclude
with compaction group creation.
2025-01-21 18:43:10 +01:00
Ferenc Szili
24e8d2a55c table: run set_split_mode() on all storage groups during all_storage_groups_split()
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurance of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.

The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations which should exclude with
compaction group creation. i.e. DROP TABLE/DROP KEYSPACE
2025-01-21 18:42:53 +01:00
Calle Wilund
48fda00f12 tools: Add standard extensions and propagate to schema load
Fixes #22314

Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.
2025-01-15 12:10:23 +00:00
Calle Wilund
00b40eada3 cql_test_env: Use add all extensions instead of inidividually 2025-01-15 12:08:09 +00:00
Calle Wilund
4aaf3df45e main: Move extensions adding to function
Easily called from elsewhere. The extensions we should always include (oxymoron?)
2025-01-15 12:07:39 +00:00
Calle Wilund
e6aa09e319 tomstone_gc: Make validate work for tools
Don't crash if validation is done as part of loading a schema from file (schema.cql)
2025-01-15 12:06:02 +00:00
Pavel Emelyanov
65f52db3a8 api: Hide parse_tables() helper
It's no longer used outside of api/storage_service.cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:08 +03:00
Pavel Emelyanov
cf0dc8f90a api: Use parse_table_infos() in stop_keyspace_compaction handler
It now parses only table names from its "cf" argument. Parsing
table_infos has two benefits -- it makes it possible to hide
parse_tables() thus keeping less validation code around, and the
subsequent db.find_column_family() call can avoid re-lookup of table
uuid by its ks:table pair.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
fb09a645b8 api: Re-use parse_table_info() in column_family API
Several places call parse_fully_qualified_cf_name() and get_uuid()
helpers one after another. Previous patch introduced the
parse_table_info() one that wraps both.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
789f468f39 api: Make get_uuid() return table_info (and rename)
The method gets "fully qualified" table name, which is 'ks:cf' string
and returns back the resolved table_id value. Some callers will benefit
from knowing the parsed 'cf' part of it (see next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
b55c05c9d0 api: Remove keyspace argument from for_table_on_all_shards()
This argument is needed to find table by ks:cf prair. The "table" part
is taken from the vector of table_info-s, but table_info-s have table_id
value onboard, and the table can be found by this id. So keyspace is not
needed any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
84ad9fe82b api: Switch for_table_on_all_shards() to use table_info-s
All callers of it already have one. Next patch will make even more use
of those passed table_info-s.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
87cdf25891 api: Hide validate_table() helper
It's no longer used outside of api/storage_service.cc. It's not yet
possible to remove it completely, but it's better not to encourage
others to use it outside of its current .cc file.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
5a038fba39 api: Tables vector is never empty now in for_table_on_all_shards()
Callers of this method provide vectors of two kinds:
- explicitly single-entry one from endpoints that work on single table
- vector returned by parse_table_infos()

The latter helper, if it gets empty list of tables from user, populates
its return value with all tables from the given keyspace.

The removed check became obsolete after recent changes. Prior to those,
the 2nd case provided vector from another helper called parse_tables(),
which could return empty result.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
2016b12252 api: Move vectors of tables, not copy
The set_tables_...() helper called here accept vector by value, so the
existing code copies it. It's better to move, all the more so next
changes will make this place pass vectors with more data onboard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
bf715ca614 api: Add table validation to set_compaction_strategy_class endpoint
This handler doesn't check if the requested table exists. If it doesn't
it will throw later anyway, but most of other endpoints that work with
tables check table early. This early check allows throwing bad-param
exception on missing table, not internal-server-error one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
e35245de36 api: Use get_uuid() to validate_table() in column family API
This helper returns uuid, but also "Validates" the table exists by
calling db.find_uuid() and throwing bad_param exception on error.
This change will allow making for_table_on_all_shards() smaller a bit
later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
Pavel Emelyanov
6ab5bade21 api: Use parse_table_infos() in column family API
The one is the same as parse_tables(), but returns back name:id pairs.
This change will allow making for_table_on_all_shards() smaller a bit
later, as well as removing the parse_tables() code eventually.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
1037 changed files with 20555 additions and 25426 deletions

View File

@@ -29,10 +29,11 @@ def parse_args():
parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
parser.add_argument('--github-event', type=str, help='Get GitHub event type')
return parser.parse_args()
def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft, is_collaborator):
pr_body = f'{pr.body}\n\n'
for commit in commits:
pr_body += f'- (cherry picked from commit {commit})\n\n'
@@ -46,11 +47,12 @@ def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr
draft=is_draft
)
logging.info(f"Pull request created: {backport_pr.html_url}")
backport_pr.add_to_assignees(pr.user)
if is_collaborator:
backport_pr.add_to_assignees(pr.user)
if is_draft:
backport_pr.add_to_labels("conflicts")
pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"
pr_comment += "Please resolve them and remove the 'conflicts' label. The PR will be made ready for review automatically."
pr_comment += "Please resolve them and mark this PR as ready for review"
backport_pr.create_issue_comment(pr_comment)
logging.info(f"Assigned PR to original author: {pr.user}")
return backport_pr
@@ -92,18 +94,7 @@ def get_pr_commits(repo, pr, stable_branch, start_commit=None):
return commits
def create_pr_comment_and_remove_label(pr, comment_body):
labels = pr.get_labels()
pattern = re.compile(r"backport/\d+\.\d+$")
for label in labels:
if pattern.match(label.name):
print(f"Removing label: {label.name}")
comment_body += f'- {label.name}\n'
pr.remove_from_labels(label)
pr.create_issue_comment(comment_body)
def backport(repo, pr, version, commits, backport_base_branch):
def backport(repo, pr, version, commits, backport_base_branch, is_collaborator):
new_branch_name = f'backport/{pr.number}/to-{version}'
backport_pr_title = f'[Backport {version}] {pr.title}'
repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'
@@ -121,46 +112,29 @@ def backport(repo, pr, version, commits, backport_base_branch):
is_draft = True
repo_local.git.add(A=True)
repo_local.git.cherry_pick('--continue')
repo_local.git.push(fork_repo, new_branch_name, force=True)
create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
is_draft, is_collaborator)
# Check if the branch already exists in the remote fork
remote_refs = repo_local.git.ls_remote('--heads', fork_repo, new_branch_name)
if not remote_refs:
# Branch does not exist, create it with a regular push
repo_local.git.push(fork_repo, new_branch_name)
create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
is_draft)
else:
logging.info(f"Remote branch {new_branch_name} already exists in fork. Skipping push.")
except GitCommandError as e:
logging.warning(f"GitCommandError: {e}")
def with_github_keyword_prefix(repo, pr):
# GitHub issue pattern: #123, scylladb/scylladb#123, or full GitHub URLs
github_pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
# JIRA issue pattern: PKG-92 or https://scylladb.atlassian.net/browse/PKG-92
jira_pattern = r"(?:fix(?:|es|ed))\s*:?\s*(?:(?:https://scylladb\.atlassian\.net/browse/)?([A-Z]+-\d+))"
# Check PR body for GitHub issues
github_match = re.findall(github_pattern, pr.body, re.IGNORECASE)
# Check PR body for JIRA issues
jira_match = re.findall(jira_pattern, pr.body, re.IGNORECASE)
match = github_match or jira_match
if match:
pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
match = re.findall(pattern, pr.body, re.IGNORECASE)
if not match:
for commit in pr.get_commits():
match = re.findall(pattern, commit.commit.message, re.IGNORECASE)
if match:
print(f'{pr.number} has a valid close reference in commit message {commit.sha}')
break
if not match:
print(f'No valid close reference for {pr.number}')
return False
else:
return True
for commit in pr.get_commits():
github_match = re.findall(github_pattern, commit.commit.message, re.IGNORECASE)
jira_match = re.findall(jira_pattern, commit.commit.message, re.IGNORECASE)
if github_match or jira_match:
print(f'{pr.number} has a valid close reference in commit message {commit.sha}')
return True
print(f'No valid close reference for {pr.number}')
return False
def main():
args = parse_args()
@@ -181,6 +155,7 @@ def main():
scylladbbot_repo = g.get_repo(fork_repo_name)
closed_prs = []
start_commit = None
is_collaborator = True
if args.commits:
start_commit, end_commit = args.commits.split('..')
@@ -205,21 +180,33 @@ def main():
if not backport_labels:
print(f'no backport label: {pr.number}')
continue
if args.commits and not with_github_keyword_prefix(repo, pr):
if not with_github_keyword_prefix(repo, pr) and args.github_event != 'unlabeled':
comment = f''':warning: @{pr.user.login} PR body or PR commits do not contain a Fixes reference to an issue and can not be backported
please update PR body with a valid ref to an issue. Then remove `scylladbbot/backport_error` label to re-trigger the backport process
'''
pr.create_issue_comment(comment)
pr.add_to_labels("scylladbbot/backport_error")
continue
if not repo.private and not scylladbbot_repo.has_in_collaborators(pr.user.login):
logging.info(f"Sending an invite to {pr.user.login} to become a collaborator to {scylladbbot_repo.full_name} ")
scylladbbot_repo.add_to_collaborators(pr.user.login)
comment = f':warning: @{pr.user.login} you have been added as collaborator to scylladbbot fork '
comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again\n'
create_pr_comment_and_remove_label(pr, comment)
continue
comment = f''':warning: @{pr.user.login} you have been added as collaborator to scylladbbot fork
Please check your inbox and approve the invitation, otherwise you will not be able to edit PR branch when needed
'''
# When a pull request is pending for backport but its author is not yet a collaborator of "scylladbbot",
# we attach a "scylladbbot/backport_error" label to the PR.
# This prevents the workflow from proceeding with the backport process
# until the author has been granted proper permissions
# the author should remove the label manually to re-trigger the backport workflow.
pr.add_to_labels("scylladbbot/backport_error")
pr.create_issue_comment(comment)
is_collaborator = False
commits = get_pr_commits(repo, pr, stable_branch, start_commit)
logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")
for backport_label in backport_labels:
version = backport_label.replace('backport/', '')
backport_base_branch = backport_label.replace('backport/', backport_branch)
backport(repo, pr, version, commits, backport_base_branch)
backport(repo, pr, version, commits, backport_base_branch, is_collaborator)
if __name__ == "__main__":

81
.github/scripts/check-license.py vendored Executable file
View File

@@ -0,0 +1,81 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (C) 2024-present ScyllaDB
#
#
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
#
import argparse
import sys
from pathlib import Path
from typing import Set
def parse_args() -> argparse.Namespace:
"""Parses command-line arguments."""
parser = argparse.ArgumentParser(description='Check license headers in files')
parser.add_argument('--files', required=True, nargs="+", type=Path,
help='List of files to check')
parser.add_argument('--license', required=True,
help='License to check for')
parser.add_argument('--check-lines', type=int, default=10,
help='Number of lines to check (default: %(default)s)')
parser.add_argument('--extensions', required=True, nargs="+",
help='List of file extensions to check')
parser.add_argument('--verbose', action='store_true',
help='Print verbose output (default: %(default)s)')
return parser.parse_args()
def should_check_file(file_path: Path, allowed_extensions: Set[str]) -> bool:
return file_path.suffix in allowed_extensions
def check_license_header(file_path: Path, license_header: str, check_lines: int) -> bool:
try:
with open(file_path, 'r', encoding='utf-8') as f:
for _ in range(check_lines):
line = f.readline()
if license_header in line:
return True
return False
except (UnicodeDecodeError, StopIteration):
# Handle files that can't be read as text or have fewer lines
return False
def main() -> int:
args = parse_args()
if not args.files:
print("No files to check")
return 0
num_errors = 0
for file_path in args.files:
# Skip non-existent files
if not file_path.exists():
continue
# Skip files with non-matching extensions
if not should_check_file(file_path, args.extensions):
print(f" Skipping file with unchecked extension: {file_path}")
continue
# Check license header
if check_license_header(file_path, args.license, args.check_lines):
if args.verbose:
print(f"✅ License header found in: {file_path}")
else:
print(f"❌ Missing license header in: {file_path}")
num_errors += 1
if num_errors > 0:
sys.exit(1)
if __name__ == '__main__':
main()

View File

@@ -7,7 +7,7 @@ on:
- branch-*.*
- enterprise
pull_request_target:
types: [labeled]
types: [labeled, unlabeled]
branches: [master, next, enterprise]
jobs:
@@ -53,19 +53,28 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
- name: Check if label starts with 'backport/' and contains digits
- name: Check if a valid backport label exists and no backport_error
id: check_label
run: |
label_name="${{ github.event.label.name }}"
if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then
echo "Label matches backport/X.X pattern."
echo "backport_label=true" >> $GITHUB_OUTPUT
labels_json='${{ toJson(github.event.pull_request.labels) }}'
echo "Checking labels: $(echo "$labels_json" | jq -r '.[].name')"
# Check if a valid backport label exists
if echo "$labels_json" | jq -e 'any(.[] | .name; test("backport/[0-9]+\\.[0-9]+$"))' > /dev/null; then
# Ensure scylladbbot/backport_error is NOT present
if ! echo "$labels_json" | jq -e '.[] | select(.name == "scylladbbot/backport_error")' > /dev/null; then
echo "A matching backport label was found and no backport_error label exists."
echo "ready_for_backport=true" >> "$GITHUB_OUTPUT"
exit 0
else
echo "The label 'scylladbbot/backport_error' is present, invalidating backport."
fi
else
echo "Label does not match the required pattern."
echo "backport_label=false" >> $GITHUB_OUTPUT
echo "No matching backport label found."
fi
- name: Run auto-backport.py when label was added
if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}
echo "ready_for_backport=false" >> "$GITHUB_OUTPUT"
- name: Run auto-backport.py when PR is closed
if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.ready_for_backport == 'true' && github.event.pull_request.state == 'closed' }}
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }} --github-event ${{ github.event.action }}

View File

@@ -18,7 +18,7 @@ jobs:
// Regular expression pattern to check for "Fixes" prefix
// Adjusted to dynamically insert the repository full name
const pattern = `Fixes:? ((?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)|(?:https://scylladb\\.atlassian\\.net/browse/)?([A-Z]+-\\d+))`;
const pattern = `Fixes:? (?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)`;
const regex = new RegExp(pattern);
if (!regex.test(body)) {

View File

@@ -1,53 +0,0 @@
name: Backport with Jira Integration
on:
push:
branches:
- master
- next-*.*
- branch-*.*
pull_request_target:
types: [labeled, closed]
branches:
- master
- next
- next-*.*
- branch-*.*
jobs:
backport-on-push:
if: github.event_name == 'push'
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'push'
base_branch: ${{ github.ref }}
commits: ${{ github.event.before }}..${{ github.sha }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
backport-on-label:
if: github.event_name == 'pull_request_target' && github.event.action == 'labeled'
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'labeled'
base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
pull_request_number: ${{ github.event.pull_request.number }}
head_commit: ${{ github.event.pull_request.base.sha }}
label_name: ${{ github.event.label.name }}
pr_state: ${{ github.event.pull_request.state }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
backport-chain:
if: github.event_name == 'pull_request_target' && github.event.action == 'closed' && github.event.pull_request.merged == true
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'chain'
base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
pull_request_number: ${{ github.event.pull_request.number }}
pr_body: ${{ github.event.pull_request.body }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -0,0 +1,52 @@
name: License Header Check
on:
pull_request:
types: [opened, synchronize, reopened]
branches: [master]
env:
HEADER_CHECK_LINES: 10
LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.0"
CHECKED_EXTENSIONS: ".cc .hh .py"
jobs:
check-license-headers:
name: Check License Headers
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get changed files
id: changed-files
run: |
# Get list of added files comparing with base branch
echo "files=$(git diff --name-only --diff-filter=A ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | tr '\n' ' ')" >> $GITHUB_OUTPUT
- name: Check license headers
if: steps.changed-files.outputs.files != ''
run: |
.github/scripts/check-license.py \
--files ${{ steps.changed-files.outputs.files }} \
--license "${{ env.LICENSE }}" \
--check-lines "${{ env.HEADER_CHECK_LINES }}" \
--extensions ${{ env.CHECKED_EXTENSIONS }}
- name: Comment on PR if check fails
if: failure()
uses: actions/github-script@v7
with:
script: |
const license = '${{ env.LICENSE }}';
await github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `❌ License header check failed. Please ensure all new files include the header within the first ${{ env.HEADER_CHECK_LINES }} lines:\n\`\`\`\n${license}\n\`\`\`\nSee action logs for details.`
});

View File

@@ -7,7 +7,7 @@ on:
env:
# use the development branch explicitly
CLANG_VERSION: 20
CLANG_VERSION: 21
BUILD_DIR: build
permissions: {}

View File

@@ -34,6 +34,7 @@ jobs:
name: Run clang-tidy
needs:
- read-toolchain
if: "${{ needs.read-toolchain.result == 'success' }}"
runs-on: ubuntu-latest
container: ${{ needs.read-toolchain.outputs.image }}
steps:

View File

@@ -12,15 +12,10 @@ jobs:
mark-ready:
if: github.event.label.name == 'conflicts'
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
repository: ${{ github.repository }}
ref: ${{ env.DEFAULT_BRANCH }}
token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
fetch-depth: 1
- name: Mark pull request as ready for review
run: gh pr ready "${{ github.event.pull_request.number }}"
env:

View File

@@ -17,6 +17,6 @@ jobs:
with:
mode: minimum
count: 1
labels: "backport/none\nbackport/\\d.\\d"
labels: "backport/none\nbackport/\\d{4}\\.\\d+\nbackport/\\d+\\.\\d+"
use_regex: true
add_comment: false

50
.github/workflows/trigger_jenkins.yaml vendored Normal file
View File

@@ -0,0 +1,50 @@
name: Trigger next gating
on:
push:
branches:
- next**
jobs:
trigger-jenkins:
runs-on: ubuntu-latest
steps:
- name: Determine Jenkins Job Name
run: |
if [[ "${{ github.ref_name }}" == "next" ]]; then
FOLDER_NAME="scylla-master"
elif [[ "${{ github.ref_name }}" == "next-enterprise" ]]; then
FOLDER_NAME="scylla-enterprise"
else
VERSION=$(echo "${{ github.ref_name }}" | awk -F'-' '{print $2}')
if [[ "$VERSION" =~ ^202[0-4]\.[0-9]+$ ]]; then
FOLDER_NAME="enterprise-$VERSION"
elif [[ "$VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then
FOLDER_NAME="scylla-$VERSION"
fi
fi
echo "JOB_NAME=${FOLDER_NAME}/job/next" >> $GITHUB_ENV
- name: Trigger Jenkins Job
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
run: |
echo "Triggering Jenkins Job: $JOB_NAME"
if ! curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters" --fail --user "$JENKINS_USER:$JENKINS_API_TOKEN" -i -v; then
echo "Error: Jenkins job trigger failed"
# Send Slack message
curl -X POST -H 'Content-type: application/json' \
-H "Authorization: Bearer $SLACK_BOT_TOKEN" \
--data '{
"channel": "#releng-team",
"text": "🚨 @here '$JOB_NAME' failed to be triggered, please check https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }} for more details",
"icon_emoji": ":warning:"
}' \
https://slack.com/api/chat.postMessage
exit 1
fi

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../scylla-seastar
url = ../seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -66,24 +66,25 @@ if(is_multi_config)
# establishing proper dependencies between them
include(ExternalProject)
# should be consistent with configure_seastar() in configure.py
set(seastar_build_dir "${CMAKE_BINARY_DIR}/$<CONFIG>/seastar")
ExternalProject_Add(Seastar
SOURCE_DIR "${PROJECT_SOURCE_DIR}/seastar"
BINARY_DIR "${CMAKE_BINARY_DIR}/$<CONFIG>/seastar"
CONFIGURE_COMMAND ""
BUILD_COMMAND ${CMAKE_COMMAND} --build <BINARY_DIR>
BUILD_COMMAND ${CMAKE_COMMAND} --build "${seastar_build_dir}"
--target seastar
--target seastar_testing
--target seastar_perf_testing
--target app_iotune
BUILD_ALWAYS ON
BUILD_BYPRODUCTS
<BINARY_DIR>/libseastar.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/libseastar_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/libseastar_perf_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/apps/iotune/iotune
<BINARY_DIR>/gen/include/seastar/http/chunk_parsers.hh
<BINARY_DIR>/gen/include/seastar/http/request_parser.hh
<BINARY_DIR>/gen/include/seastar/http/response_parser.hh
${seastar_build_dir}/libseastar.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/libseastar_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/libseastar_perf_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/apps/iotune/iotune
${seastar_build_dir}/gen/include/seastar/http/chunk_parsers.hh
${seastar_build_dir}/gen/include/seastar/http/request_parser.hh
${seastar_build_dir}/gen/include/seastar/http/response_parser.hh
INSTALL_COMMAND "")
add_dependencies(Seastar::seastar Seastar)
add_dependencies(Seastar::seastar_testing Seastar)
@@ -201,7 +202,6 @@ target_sources(scylla-main
tombstone_gc.cc
reader_concurrency_semaphore.cc
reader_concurrency_semaphore_group.cc
row_cache.cc
schema_mutations.cc
serializer.cc
sstables_loader.cc

View File

@@ -12,7 +12,7 @@ Please use the [issue tracker](https://github.com/scylladb/scylla/issues/) to re
## Contributing code to Scylla
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
If you need help formatting or sending patches, [check out these instructions](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches).
The Scylla C++ source code uses the [Seastar coding style](https://github.com/scylladb/seastar/blob/master/coding-style.md) so please adhere to that in your patches. Note that Scylla code is written with `using namespace seastar`, so should not explicitly add the `seastar::` prefix to Seastar symbols. You will usually not need to add `using namespace seastar` to new source files, because most Scylla header files have `#include "seastarx.hh"`, which does this.

View File

@@ -280,21 +280,45 @@ Once the patch set is ready to be reviewed, push the branch to the public remote
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt` that can be used with development environments so
that they can properly analyze the source code. However, building with CMake is not yet officially supported.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
Good IDEs that have support for CMake build toolchain are [CLion](https://www.jetbrains.com/clion/),
[KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects and its
C++ parser has many issues.
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environmental variable.
#### CLion
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many similar issues as CLion.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice
for code hygiene, though its C++ parser sometimes makes errors and flags false issues. In order to enable proper code
analysis in CLion, the following steps are needed:
1. Get the ScyllaDB source code by following the [Getting the source code](#getting-the-source-code).
2. Follow the steps in [Dependencies](#dependencies) in order to install the required tools natively into your system.
**Don't** follow the *frozen toolchain* part described there, since CMake checks for the build dependencies installed
in the system, not in the container image provided by the toolchain.
3. In CLion, select `File``Open` and select the main ScyllaDB directory in order to open the CMake project there. The
project should open and fail to process the `CMakeLists.txt`. That's expected.
4. In CLion, open `File``Settings`.
5. Find and click on `Toolchains` (type *toolchains* into search box).
6. Select the toolchain you will use, for instance the `Default` one.
7. Type in the following system-installed tools to be used:
- `CMake`: *cmake*
- `Build Tool`: *ninja*
- `C Compiler`: *clang*
- `C++ Compiler`: *clang*
8. On the `CMake` panel/tab, click on `Reload CMake Project`
After that, CLion should successfully initialize the CMake project (marked by `[Finished]` in the console) and the
source code editor should provide code analysis support normally from now on.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilations times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and reuses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.

View File

@@ -49,7 +49,7 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
* **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensors trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.
* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback"). If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it. All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between the You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.
* If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.

View File

@@ -102,7 +102,7 @@ If you are a developer working on Scylla, please read the [developer guidelines]
## Contact
* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.
* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of ScyllaDB.
* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
[Community forum]: https://forum.scylladb.com/

View File

@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=2025.1.12
VERSION=2025.2.0-dev
if test -f version
then

View File

@@ -24,7 +24,7 @@ static constexpr uint64_t KB = 1024ULL;
static constexpr uint64_t RCU_BLOCK_SIZE_LENGTH = 4*KB;
static constexpr uint64_t WCU_BLOCK_SIZE_LENGTH = 1*KB;
bool consumed_capacity_counter::should_add_capacity(const rjson::value& request) {
static bool should_add_capacity(const rjson::value& request) {
const rjson::value* return_consumed = rjson::find(request, "ReturnConsumedCapacity");
if (!return_consumed) {
return false;
@@ -62,22 +62,15 @@ static uint64_t calculate_half_units(uint64_t unit_block_size, uint64_t total_by
rcu_consumed_capacity_counter::rcu_consumed_capacity_counter(const rjson::value& request, bool is_quorum) :
consumed_capacity_counter(should_add_capacity(request)),_is_quorum(is_quorum) {
}
uint64_t rcu_consumed_capacity_counter::get_half_units(uint64_t total_bytes, bool is_quorum) noexcept {
return calculate_half_units(RCU_BLOCK_SIZE_LENGTH, total_bytes, is_quorum);
}
uint64_t rcu_consumed_capacity_counter::get_half_units() const noexcept {
return get_half_units(_total_bytes, _is_quorum);
return calculate_half_units(RCU_BLOCK_SIZE_LENGTH, _total_bytes, _is_quorum);
}
uint64_t wcu_consumed_capacity_counter::get_half_units() const noexcept {
return calculate_half_units(WCU_BLOCK_SIZE_LENGTH, _total_bytes, true);
}
uint64_t wcu_consumed_capacity_counter::get_units(uint64_t total_bytes) noexcept {
return calculate_half_units(WCU_BLOCK_SIZE_LENGTH, total_bytes, true) * HALF_UNIT_MULTIPLIER;
}
wcu_consumed_capacity_counter::wcu_consumed_capacity_counter(const rjson::value& request) :
consumed_capacity_counter(should_add_capacity(request)) {
}

View File

@@ -42,25 +42,21 @@ public:
*/
virtual uint64_t get_half_units() const noexcept = 0;
uint64_t _total_bytes = 0;
static bool should_add_capacity(const rjson::value& request);
protected:
bool _should_add_to_reponse = false;
};
class rcu_consumed_capacity_counter : public consumed_capacity_counter {
virtual uint64_t get_half_units() const noexcept;
bool _is_quorum = false;
public:
rcu_consumed_capacity_counter(const rjson::value& request, bool is_quorum);
rcu_consumed_capacity_counter(): consumed_capacity_counter(false), _is_quorum(false){}
virtual uint64_t get_half_units() const noexcept;
static uint64_t get_half_units(uint64_t total_bytes, bool is_quorum) noexcept;
};
class wcu_consumed_capacity_counter : public consumed_capacity_counter {
virtual uint64_t get_half_units() const noexcept;
public:
wcu_consumed_capacity_counter(const rjson::value& request);
static uint64_t get_units(uint64_t total_bytes) noexcept;
};
}

File diff suppressed because it is too large Load Diff

View File

@@ -241,8 +241,7 @@ public:
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get,
uint64_t& rcu_half_units);
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<managed_bytes_opt>&,

View File

@@ -91,18 +91,6 @@ options {
throw expressions_syntax_error(format("{} at char {}", err,
ex->get_charPositionInLine()));
}
// ANTLR3 tries to recover missing tokens - it tries to finish parsing
// and create valid objects, as if the missing token was there.
// But it has a bug and leaks these tokens.
// We override offending method and handle abandoned pointers.
std::vector<std::unique_ptr<TokenType>> _missing_tokens;
TokenType* getMissingSymbol(IntStreamType* istream, ExceptionBaseType* e,
ANTLR_UINT32 expectedTokenType, BitsetListType* follow) {
auto token = BaseType::getMissingSymbol(istream, e, expectedTokenType, follow);
_missing_tokens.emplace_back(token);
return token;
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {

View File

@@ -606,14 +606,24 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_content_streaming(true);
auto server_creds = creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
slogger.info("Reloaded {}", files);
}
}).get();
_https_server.listen(socket_address{addr, *https_port}, std::move(server_creds)).get();
if (this_shard_id() == 0) {
_credentials = creds->build_reloadable_server_credentials([this](const tls::credentials_builder& b, const std::unordered_set<sstring>& files, std::exception_ptr ep) -> future<> {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
co_await container().invoke_on_others([&b](server& s) {
if (s._credentials) {
b.rebuild(*s._credentials);
}
});
slogger.info("Reloaded {}", files);
}
}).get();
} else {
_credentials = creds->build_server_credentials();
}
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});

View File

@@ -24,7 +24,7 @@ namespace alternator {
using chunked_content = rjson::chunked_content;
class server {
class server : public peering_sharded_service<server> {
static constexpr size_t content_length_limit = 16*MB;
using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
@@ -52,6 +52,8 @@ class server {
semaphore* _memory_limiter;
utils::updateable_value<uint32_t> _max_concurrent_requests;
::shared_ptr<seastar::tls::server_credentials> _credentials;
class json_parser {
static constexpr size_t yieldable_parsing_threshold = 16*KB;
chunked_content _raw_document;

View File

@@ -9,6 +9,7 @@
#include "stats.hh"
#include "utils/histogram_metrics_helper.hh"
#include <seastar/core/metrics.hh>
#include "utils/labels.hh"
namespace alternator {
@@ -21,12 +22,12 @@ stats::stats() : api_operations{} {
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}).set_skip_when_empty(),
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName), alternator_label, basic_level}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName), alternator_label, basic_level}, [this]{return to_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::make_summary("op_latency_summary", \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName))(basic_level)(alternator_label).set_skip_when_empty(),
OPERATION(batch_get_item, "BatchGetItem")
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
@@ -77,39 +78,39 @@ stats::stats() : api_operations{} {
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::description("number of unsupported operations via Alternator API"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::description("number of total operations via Alternator API"))(basic_level)(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::description("number of performed read-before-write operations"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("write_using_lwt", write_using_lwt,
seastar::metrics::description("number of writes that used LWT")),
seastar::metrics::description("number of writes that used LWT"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("shard_bounce_for_lwt", shard_bounce_for_lwt,
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements")),
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("requests_blocked_memory", requests_blocked_memory,
seastar::metrics::description("Counts a number of requests blocked due to memory pressure.")),
seastar::metrics::description("Counts a number of requests blocked due to memory pressure."))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("requests_shed", requests_shed,
seastar::metrics::description("Counts a number of requests shed due to overload.")),
seastar::metrics::description("Counts a number of requests shed due to overload."))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::description("number of rows read during filtering operations"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_counter("rcu_total", [this]{return 0.5 * rcu_half_units_total;},
seastar::metrics::description("total number of consumed read units")).set_skip_when_empty(),
seastar::metrics::make_counter("rcu_total", rcu_total,
seastar::metrics::description("total number of consumed read units, counted as half units"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::PUT_ITEM],
seastar::metrics::description("total number of consumed write units"),{op("PutItem")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("PutItem")})(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::DELETE_ITEM],
seastar::metrics::description("total number of consumed write units"),{op("DeleteItem")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("DeleteItem")})(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::UPDATE_ITEM],
seastar::metrics::description("total number of consumed write units"),{op("UpdateItem")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("UpdateItem")})(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::INDEX],
seastar::metrics::description("total number of consumed write units"),{op("Index")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("Index")})(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
seastar::metrics::description("number of rows read and dropped during filtering operations"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
api_operations.batch_write_item_batch_total).set_skip_when_empty(),
api_operations.batch_write_item_batch_total)(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
api_operations.batch_get_item_batch_total).set_skip_when_empty(),
api_operations.batch_get_item_batch_total)(alternator_label).set_skip_when_empty(),
});
}

View File

@@ -84,7 +84,7 @@ public:
uint64_t shard_bounce_for_lwt = 0;
uint64_t requests_blocked_memory = 0;
uint64_t requests_shed = 0;
uint64_t rcu_half_units_total = 0;
uint64_t rcu_total = 0;
// wcu can results from put, update, delete and index
// Index related will be done on top of the operation it comes with
enum wcu_types {

View File

@@ -808,9 +808,6 @@ future<executor::request_return_type> executor::get_records(client_state& client
if (limit < 1) {
throw api_error::validation("Limit must be 1 or more");
}
if (limit > 1000) {
throw api_error::validation("Limit must be less than or equal to 1000");
}
auto db = _proxy.data_dictionary();
schema_ptr schema, base;

View File

@@ -16,7 +16,6 @@
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <boost/multiprecision/cpp_int.hpp>
#include "exceptions/exceptions.hh"
#include "gms/gossiper.hh"
@@ -49,6 +48,7 @@
#include "dht/sharder.hh"
#include "db/config.hh"
#include "db/tags/utils.hh"
#include "utils/labels.hh"
#include "ttl.hh"
@@ -851,13 +851,13 @@ future<> expiration_service::stop() {
expiration_service::stats::stats() {
_metrics.add_group("expiration", {
seastar::metrics::make_total_operations("scan_passes", scan_passes,
seastar::metrics::description("number of passes over the database")),
seastar::metrics::description("number of passes over the database"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("scan_table", scan_table,
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)")),
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("items_deleted", items_deleted,
seastar::metrics::description("number of items deleted after expiration")),
seastar::metrics::description("number of items deleted after expiration"))(basic_level)(alternator_label).set_skip_when_empty(),
seastar::metrics::make_total_operations("secondary_ranges_scanned", secondary_ranges_scanned,
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down")),
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down"))(alternator_label).set_skip_when_empty(),
});
}

View File

@@ -813,6 +813,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"move_files",
"description":"Move component files instead of copying them",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -976,7 +984,7 @@
]
},
{
"path":"/storage_service/cleanup_all/",
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
@@ -986,30 +994,6 @@
"produces":[
"application/json"
],
"parameters":[
{
"name":"global",
"description":"true if cleanup of entire cluster is requested",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/mark_node_as_clean",
"operations":[
{
"method":"POST",
"summary":"Mark the node as clean. After that the node will not be considered as needing cleanup during automatic cleanup which is triggered by some topology operations",
"type":"void",
"nickname":"reset_cleanup_needed",
"produces":[
"application/json"
],
"parameters":[]
}
]
@@ -3085,22 +3069,6 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/cmd_rpc_status",
"operations":[
{
"method":"GET",
"summary":"Get information about currently running topology cmd rpc",
"type":"string",
"nickname":"raft_topology_get_cmd_status",
"produces":[
"application/json"
],
"parameters":[
]
}
]
}
],
"models":{
@@ -3237,11 +3205,11 @@
"properties":{
"start_token":{
"type":"string",
"description":"The range start token (exclusive)"
"description":"The range start token"
},
"end_token":{
"type":"string",
"description":"The range end token (inclusive)"
"description":"The range start token"
},
"endpoints":{
"type":"array",

View File

@@ -344,6 +344,18 @@
"sequence_number":{
"type":"long",
"description":"The running sequence number of the task"
},
"shard":{
"type":"long",
"description":"The shard the task is running on"
},
"start_time":{
"type":"datetime",
"description":"The start time of the task; unspecified (equal to epoch) when state == created"
},
"end_time":{
"type":"datetime",
"description":"The end time of the task; unspecified (equal to epoch) when the task is not completed"
}
}
},

View File

@@ -42,14 +42,6 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}

View File

@@ -317,13 +317,13 @@ future<> unset_server_commitlog(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_commitlog(ctx, r); });
}
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg) {
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg, sharded<gms::gossiper>& gossiper) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg](routes& r) {
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg, &gossiper](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r, tm, cfg);
set_task_manager(ctx, r, tm, cfg, gossiper);
});
}

View File

@@ -131,7 +131,7 @@ future<> unset_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm);
future<> unset_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg, sharded<gms::gossiper>& gossiper);
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);

View File

@@ -10,7 +10,6 @@
#include "api/api-doc/collectd.json.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include <boost/range/irange.hpp>
#include <ranges>
#include <regex>
#include "api/api_init.hh"

View File

@@ -24,9 +24,6 @@
#include "compaction/compaction_manager.hh"
#include "unimplemented.hh"
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/numeric.hpp>
extern logging::logger apilog;
namespace api {
@@ -51,17 +48,9 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
table_id get_uuid(const sstring& ks, const sstring& cf, const replica::database& db) {
try {
return db.find_uuid(ks, cf);
} catch (replica::no_such_column_family& e) {
throw bad_param_exception(e.what());
}
}
table_id get_uuid(const sstring& name, const replica::database& db) {
table_info parse_table_info(const sstring& name, const replica::database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
return table_info{ .name = cf, .id = validate_table(db, ks, cf) };
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
@@ -78,15 +67,11 @@ future<json::json_return_type> get_cf_stats(http_context& ctx,
}, std::plus<int64_t>());
}
static future<json::json_return_type> for_tables_on_all_shards(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {
if (tables.empty()) {
tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return do_with(keyspace, std::move(tables), [&ctx, set] (const sstring& keyspace, const std::vector<sstring>& tables) {
return ctx.db.invoke_on_all([&keyspace, &tables, set] (replica::database& db) {
return parallel_for_each(tables, [&db, &keyspace, set] (const sstring& table) {
replica::table& t = db.find_column_family(keyspace, table);
static future<json::json_return_type> for_tables_on_all_shards(http_context& ctx, std::vector<table_info> tables, std::function<future<>(replica::table&)> set) {
return do_with(std::move(tables), [&ctx, set] (const std::vector<table_info>& tables) {
return ctx.db.invoke_on_all([&tables, set] (replica::database& db) {
return parallel_for_each(tables, [&db, set] (const table_info& table) {
replica::table& t = db.find_column_family(table.id);
return set(t);
});
});
@@ -113,12 +98,12 @@ public:
}
};
static future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
static future<json::json_return_type> set_tables_autocompaction(http_context& ctx, std::vector<table_info> tables, bool enabled) {
apilog.info("set_tables_autocompaction: enabled={} tables={}", enabled, tables);
return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {
return ctx.db.invoke_on(0, [&ctx, tables = std::move(tables), enabled] (replica::database& db) {
auto g = autocompaction_toggle_guard(db);
return for_tables_on_all_shards(ctx, keyspace, tables, [enabled] (replica::table& cf) {
return for_tables_on_all_shards(ctx, tables, [enabled] (replica::table& cf) {
if (enabled) {
cf.enable_auto_compaction();
} else {
@@ -129,9 +114,9 @@ static future<json::json_return_type> set_tables_autocompaction(http_context& ct
});
}
static future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return for_tables_on_all_shards(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {
static future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, std::vector<table_info> tables, bool enabled) {
apilog.info("set_tables_tombstone_gc: enabled={} tables={}", enabled, tables);
return for_tables_on_all_shards(ctx, std::move(tables), [enabled] (replica::table& t) {
t.set_tombstone_gc_enabled(enabled);
return make_ready_future<>();
});
@@ -146,7 +131,7 @@ static future<json::json_return_type> get_cf_stats_count(http_context& ctx, con
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
auto uuid = parse_table_info(name, ctx.db.local()).id;
return ctx.db.map_reduce0([uuid, f](replica::database& db) {
// Histograms information is sample of the actual load
// so to get an estimation of sum, we multiply the mean
@@ -169,7 +154,7 @@ static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
auto uuid = parse_table_info(name, ctx.db.local()).id;
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
@@ -181,7 +166,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
auto uuid = parse_table_info(name, ctx.db.local()).id;
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
@@ -208,7 +193,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
auto uuid = parse_table_info(name, ctx.db.local()).id;
return ctx.db.map_reduce0([f, uuid](const replica::database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
utils::rate_moving_average_and_histogram(),
@@ -265,48 +250,29 @@ static integral_ratio_holder mean_partition_size(replica::column_family& cf) {
return res;
}
static std::unordered_map<sstring, uint64_t> merge_maps(std::unordered_map<sstring, uint64_t> a,
const std::unordered_map<sstring, uint64_t>& b) {
a.insert(b.begin(), b.end());
return a;
}
static json::json_return_type sum_map(const std::unordered_map<sstring, uint64_t>& val) {
uint64_t res = 0;
for (auto i : val) {
res += i.second;
static auto count_bytes_on_disk(const replica::column_family& cf, bool total) {
uint64_t bytes_on_disk = 0;
auto sstables = (total) ? cf.get_sstables_including_compacted_undeleted() : cf.get_sstables();
for (auto t : *sstables) {
bytes_on_disk += t->bytes_on_disk();
}
return res;
return bytes_on_disk;
}
static future<json::json_return_type> sum_sstable(http_context& ctx, const sstring name, bool total) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, total](replica::database& db) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? db.find_column_family(uuid).get_sstables_including_compacted_undeleted() :
db.find_column_family(uuid).get_sstables();
for (auto t : *sstables) {
m[t->get_filename()] = t->bytes_on_disk();
}
return m;
}, std::unordered_map<sstring, uint64_t>(), merge_maps).
then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
return map_reduce_cf_raw(ctx, name, uint64_t(0), [total](replica::column_family& cf) {
return count_bytes_on_disk(cf, total);
}, std::plus<>()).then([] (uint64_t val) {
return make_ready_future<json::json_return_type>(val);
});
}
static future<json::json_return_type> sum_sstable(http_context& ctx, bool total) {
return map_reduce_cf_raw(ctx, std::unordered_map<sstring, uint64_t>(), [total](replica::column_family& cf) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? cf.get_sstables_including_compacted_undeleted() :
cf.get_sstables();
for (auto t : *sstables) {
m[t->get_filename()] = t->bytes_on_disk();
}
return m;
},merge_maps).then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
return map_reduce_cf_raw(ctx, uint64_t(0), [total](replica::column_family& cf) {
return count_bytes_on_disk(cf, total);
}, std::plus<>()).then([] (uint64_t val) {
return make_ready_future<json::json_return_type>(val);
});
}
@@ -918,75 +884,71 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_auto_compaction.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());
auto uuid = parse_table_info(req.get_path_param("name"), ctx.db.local()).id;
replica::column_family& cf = ctx.db.local().find_column_family(uuid);
return !cf.is_auto_compaction_disabled_by_user();
});
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_auto_compaction: name={}", req->get_path_param("name"));
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_autocompaction(ctx, ks, {std::move(cf)}, true);
auto ti = parse_table_info(req->get_path_param("name"), ctx.db.local());
return set_tables_autocompaction(ctx, {std::move(ti)}, true);
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_auto_compaction: name={}", req->get_path_param("name"));
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_autocompaction(ctx, ks, {std::move(cf)}, false);
auto ti = parse_table_info(req->get_path_param("name"), ctx.db.local());
return set_tables_autocompaction(ctx, {std::move(ti)}, false);
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
auto tables = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, true);
return set_tables_autocompaction(ctx, std::move(tables), true);
});
ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
auto tables = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, false);
return set_tables_autocompaction(ctx, std::move(tables), false);
});
cf::get_tombstone_gc.set(r, [&ctx] (const_req req) {
auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());
auto uuid = parse_table_info(req.get_path_param("name"), ctx.db.local()).id;
replica::table& t = ctx.db.local().find_column_family(uuid);
return t.tombstone_gc_enabled();
});
cf::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_tombstone_gc: name={}", req->get_path_param("name"));
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, true);
auto ti = parse_table_info(req->get_path_param("name"), ctx.db.local());
return set_tables_tombstone_gc(ctx, {std::move(ti)}, true);
});
cf::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_tombstone_gc: name={}", req->get_path_param("name"));
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, false);
auto ti = parse_table_info(req->get_path_param("name"), ctx.db.local());
return set_tables_tombstone_gc(ctx, {std::move(ti)}, false);
});
ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
auto tables = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, true);
return set_tables_tombstone_gc(ctx, std::move(tables), true);
});
ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
auto tables = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, false);
return set_tables_tombstone_gc(ctx, std::move(tables), false);
});
cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {
@@ -1003,7 +965,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
auto uuid = validate_table(ctx.db.local(), ks, cf_name);
replica::column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
@@ -1030,7 +992,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());
auto uuid = parse_table_info(req->get_path_param("name"), ctx.db.local()).id;
return ctx.db.map_reduce(sum_ratio<double>(), [uuid](replica::database& db) {
replica::column_family& cf = db.find_column_family(uuid);
@@ -1053,17 +1015,17 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
});
cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
auto ti = parse_table_info(req->get_path_param("name"), ctx.db.local());
sstring strategy = req->get_query_param("class_name");
apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->get_path_param("name"), strategy);
return for_tables_on_all_shards(ctx, ks, {std::move(cf)}, [strategy] (replica::table& cf) {
return for_tables_on_all_shards(ctx, {std::move(ti)}, [strategy] (replica::table& cf) {
cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));
return make_ready_future<>();
});
});
cf::get_compaction_strategy_class.set(r, [&ctx](const_req req) {
return ctx.db.local().find_column_family(get_uuid(req.get_path_param("name"), ctx.db.local())).get_compaction_strategy().name();
return ctx.db.local().find_column_family(parse_table_info(req.get_path_param("name"), ctx.db.local()).id).get_compaction_strategy().name();
});
cf::set_compression_parameters.set(r, [](std::unique_ptr<http::request> req) {
@@ -1088,7 +1050,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto key = req->get_query_param("key");
auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());
auto uuid = parse_table_info(req->get_path_param("name"), ctx.db.local()).id;
return ctx.db.map_reduce0([key, uuid] (replica::database& db) -> future<std::unordered_set<sstring>> {
auto sstables = co_await db.find_column_family(uuid).get_sstables_by_partition_key(key);

View File

@@ -22,12 +22,12 @@ namespace api {
void set_column_family(http_context& ctx, httpd::routes& r, sharded<db::system_keyspace>& sys_ks);
void unset_column_family(http_context& ctx, httpd::routes& r);
table_id get_uuid(const sstring& name, const replica::database& db);
table_info parse_table_info(const sstring& name, const replica::database& db);
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
auto uuid = parse_table_info(name, ctx.db.local()).id;
using mapper_type = std::function<std::unique_ptr<std::any>(replica::database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](replica::database& db) {

View File

@@ -112,12 +112,12 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction_man
cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto ks_name = validate_keyspace(ctx, req);
auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");
auto tables = parse_table_infos(ks_name, ctx, req->query_parameters, "tables");
auto type = req->get_query_param("type");
co_await ctx.db.invoke_on_all([&] (replica::database& db) {
auto& cm = db.get_compaction_manager();
return parallel_for_each(table_names, [&] (sstring& table_name) {
auto& t = db.find_column_family(ks_name, table_name);
return parallel_for_each(tables, [&] (const table_info& ti) {
auto& t = db.find_column_family(ti.id);
return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {
return cm.stop_compaction(type, &ts);
});

File diff suppressed because it is too large Load Diff

View File

@@ -43,16 +43,9 @@ sstring validate_keyspace(const http_context& ctx, sstring ks_name);
// containing the description of the respective keyspace error.
sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req);
// verify that the table parameter is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective table error.
void validate_table(const http_context& ctx, sstring ks_name, sstring table_name);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns an empty vector if no parameter was found.
// If the parameter is found and empty, returns a list of all table names in the keyspace.
std::vector<sstring> parse_tables(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
// verify that the keyspace:table is found, otherwise a bad_param_exception exception is thrown
// returns the table_id of the table if found
table_id validate_table(const replica::database& db, sstring ks_name, sstring table_name);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown

View File

@@ -14,6 +14,7 @@
#include "api/api.hh"
#include "api/api-doc/task_manager.json.hh"
#include "db/system_keyspace.hh"
#include "gms/gossiper.hh"
#include "tasks/task_handler.hh"
#include "utils/overloaded_functor.hh"
@@ -25,18 +26,23 @@ namespace tm = httpd::task_manager_json;
using namespace json;
using namespace seastar::httpd;
tm::task_status make_status(tasks::task_status status) {
auto start_time = db_clock::to_time_t(status.start_time);
auto end_time = db_clock::to_time_t(status.end_time);
::tm st, et;
::gmtime_r(&end_time, &et);
::gmtime_r(&start_time, &st);
static ::tm get_time(db_clock::time_point tp) {
auto time = db_clock::to_time_t(tp);
::tm t;
::gmtime_r(&time, &t);
return t;
}
tm::task_status make_status(tasks::task_status status, sharded<gms::gossiper>& gossiper) {
std::vector<tm::task_identity> tis{status.children.size()};
std::ranges::transform(status.children, tis.begin(), [] (const auto& child) {
std::ranges::transform(status.children, tis.begin(), [&gossiper] (const auto& child) {
tm::task_identity ident;
gms::inet_address addr{};
if (gossiper.local_is_initialized()) {
addr = gossiper.local().get_address_map().find(child.host_id).value_or(gms::inet_address{});
}
ident.task_id = child.task_id.to_sstring();
ident.node = fmt::format("{}", child.node);
ident.node = fmt::format("{}", addr);
return ident;
});
@@ -47,8 +53,8 @@ tm::task_status make_status(tasks::task_status status) {
res.scope = status.scope;
res.state = status.state;
res.is_abortable = bool(status.is_abortable);
res.start_time = st;
res.end_time = et;
res.start_time = get_time(status.start_time);
res.end_time = get_time(status.end_time);
res.error = status.error;
res.parent_id = status.parent_id ? status.parent_id.to_sstring() : "none";
res.sequence_number = status.sequence_number;
@@ -74,10 +80,13 @@ tm::task_stats make_stats(tasks::task_stats stats) {
res.keyspace = stats.keyspace;
res.table = stats.table;
res.entity = stats.entity;
res.shard = stats.shard;
res.start_time = get_time(stats.start_time);
res.end_time = get_time(stats.end_time);;
return res;
}
void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {
void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg, sharded<gms::gossiper>& gossiper) {
tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
std::vector<std::string> v = tm.local().get_modules() | std::views::keys | std::ranges::to<std::vector>();
co_return v;
@@ -135,7 +144,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
co_return std::move(f);
});
tm::get_task_status.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::get_task_status.set(r, [&tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tasks::task_status status;
try {
@@ -144,7 +153,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
} catch (tasks::task_manager::task_not_found& e) {
throw bad_param_exception(e.what());
}
co_return make_status(status);
co_return make_status(status, gossiper);
});
tm::abort_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
@@ -160,7 +169,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
co_return json_void();
});
tm::wait_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::wait_task.set(r, [&tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tasks::task_status status;
std::optional<std::chrono::seconds> timeout = std::nullopt;
@@ -175,24 +184,24 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
} catch (timed_out_error& e) {
throw httpd::base_exception{e.what(), http::reply::status_type::request_timeout};
}
co_return make_status(status);
co_return make_status(status, gossiper);
});
tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::get_task_status_recursively.set(r, [&_tm = tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& tm = _tm;
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
try {
auto task = tasks::task_handler{tm.local(), id};
auto res = co_await task.get_status_recursively(true);
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res), &gossiper] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& status: res) {
co_await s.write(std::exchange(delim, ", "));
co_await formatter::write(s, make_status(status));
co_await formatter::write(s, make_status(status, gossiper));
}
co_await s.write("]");
co_await s.close();

View File

@@ -18,7 +18,7 @@ namespace tasks {
namespace api {
void set_task_manager(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm, db::config& cfg);
void set_task_manager(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm, db::config& cfg, sharded<gms::gossiper>& gossiper);
void unset_task_manager(http_context& ctx, httpd::routes& r);
}

View File

@@ -37,74 +37,45 @@ static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
};
}
future<tasks::task_manager::task_ptr> force_keyspace_compaction(http_context& ctx, std::unique_ptr<http::request> req) {
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
std::pair("consider_only_existing_data", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
auto consider_only_existing_data = params.get_as<bool>("consider_only_existing_data").value_or(false);
apilog.info("force_keyspace_compaction: keyspace={} tables={}, flush={} consider_only_existing_data={}", keyspace, table_infos, flush, consider_only_existing_data);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<compaction::flush_mode> fmopt;
if (!flush && !consider_only_existing_data) {
fmopt = compaction::flush_mode::skip;
}
return compaction_module.make_and_start_task<compaction::major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt, consider_only_existing_data);
}
future<tasks::task_manager::task_ptr> upgrade_sstables(http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) {
auto& db = ctx.db;
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, table_infos, exclude_current_version);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
return compaction_module.make_and_start_task<compaction::upgrade_sstables_compaction_task_impl>({}, std::move(keyspace), db, table_infos, exclude_current_version);
}
future<tasks::task_manager::task_ptr> force_keyspace_cleanup(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
const auto& rs = db.local().find_keyspace(keyspace).get_replication_strategy();
if (rs.get_type() == locator::replication_strategy_type::local || !rs.is_vnode_based()) {
auto reason = rs.get_type() == locator::replication_strategy_type::local ? "require" : "support";
apilog.info("Keyspace {} does not {} cleanup", keyspace, reason);
co_return nullptr;
}
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
apilog.warn("force_keyspace_cleanup: keyspace={} tables={}: {}", keyspace, table_infos, msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
co_return co_await compaction_module.make_and_start_task<compaction::cleanup_keyspace_compaction_task_impl>(
{}, std::move(keyspace), db, table_infos, compaction::flush_mode::all_tables, tasks::is_user_task::yes);
}
void set_tasks_compaction_module(http_context& ctx, routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {
t::force_keyspace_compaction_async.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto task = co_await force_keyspace_compaction(ctx, std::move(req));
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.debug("force_keyspace_compaction_async: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
t::force_keyspace_cleanup_async.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tasks::task_id id = tasks::task_id::create_null_id();
auto task = co_await force_keyspace_cleanup(ctx, ss, std::move(req));
if (task) {
id = task->get_status().id;
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("force_keyspace_cleanup_async: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
apilog.warn("force_keyspace_cleanup_async: keyspace={} tables={}: {}", keyspace, table_infos, msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
co_return json::json_return_type(id.to_sstring());
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<cleanup_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos, flush_mode::all_tables, tasks::is_user_task::yes);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
t::perform_keyspace_offstrategy_compaction_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
@@ -116,7 +87,14 @@ void set_tasks_compaction_module(http_context& ctx, routes& r, sharded<service::
}));
t::upgrade_sstables_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
auto task = co_await upgrade_sstables(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
auto& db = ctx.db;
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, table_infos, exclude_current_version);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<upgrade_sstables_compaction_task_impl>({}, std::move(keyspace), db, table_infos, exclude_current_version);
co_return json::json_return_type(task->get_status().id.to_sstring());
}));

View File

@@ -15,10 +15,6 @@ namespace seastar::httpd {
class routes;
}
namespace seastar::http {
struct request;
}
namespace service {
class storage_service;
}
@@ -29,8 +25,4 @@ struct http_context;
void set_tasks_compaction_module(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);
void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r);
future<tasks::task_manager::task_ptr> force_keyspace_compaction(http_context& ctx, std::unique_ptr<http::request> req);
future<tasks::task_manager::task_ptr> force_keyspace_cleanup(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req);
future<tasks::task_manager::task_ptr> upgrade_sstables(http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos);
}

View File

@@ -74,9 +74,6 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to
});
ss::get_host_id_map.set(r, [&tm, &g](const_req req) {
if (!g.local().is_enabled()) {
throw std::runtime_error("The gossiper is not ready yet");
}
std::vector<ss::mapper> res;
auto map = tm.local().get()->get_host_ids() |
std::views::transform([&g] (locator::host_id id) { return std::make_pair(g.local().get_address_map().get(id), id); }) |

View File

@@ -26,8 +26,9 @@ namespace audit {
logging::logger logger("audit");
sstring audit_info::category_string() const {
switch (_category) {
static sstring category_to_string(statement_category category)
{
switch (category) {
case statement_category::QUERY: return "QUERY";
case statement_category::DML: return "DML";
case statement_category::DDL: return "DDL";
@@ -38,19 +39,9 @@ sstring audit_info::category_string() const {
return "";
}
audit::audit(locator::shared_token_metadata& token_metadata,
sstring&& storage_helper_name,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories)
: _token_metadata(token_metadata)
, _audited_keyspaces(std::move(audited_keyspaces))
, _audited_tables(std::move(audited_tables))
, _audited_categories(std::move(audited_categories))
, _storage_helper_class_name(std::move(storage_helper_name))
{ }
audit::~audit() = default;
sstring audit_info::category_string() const {
return category_to_string(_category);
}
static category_set parse_audit_categories(const sstring& data) {
category_set result;
@@ -111,6 +102,25 @@ static std::set<sstring> parse_audit_keyspaces(const sstring& data) {
return result;
}
audit::audit(locator::shared_token_metadata& token_metadata,
sstring&& storage_helper_name,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories,
const db::config& cfg)
: _token_metadata(token_metadata)
, _audited_keyspaces(std::move(audited_keyspaces))
, _audited_tables(std::move(audited_tables))
, _audited_categories(std::move(audited_categories))
, _storage_helper_class_name(std::move(storage_helper_name))
, _cfg(cfg)
, _cfg_keyspaces_observer(cfg.audit_keyspaces.observe([this] (sstring const& new_value){ update_config<std::set<sstring>>(new_value, parse_audit_keyspaces, _audited_keyspaces); }))
, _cfg_tables_observer(cfg.audit_tables.observe([this] (sstring const& new_value){ update_config<std::map<sstring, std::set<sstring>>>(new_value, parse_audit_tables, _audited_tables); }))
, _cfg_categories_observer(cfg.audit_categories.observe([this] (sstring const& new_value){ update_config<category_set>(new_value, parse_audit_categories, _audited_categories); }))
{ }
audit::~audit() = default;
future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm) {
sstring storage_helper_name;
if (cfg.audit() == "table") {
@@ -126,18 +136,9 @@ future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_toke
throw audit_exception(fmt::format("Bad configuration: invalid 'audit': {}", cfg.audit()));
}
category_set audited_categories = parse_audit_categories(cfg.audit_categories());
if (!audited_categories) {
return make_ready_future<>();
}
std::map<sstring, std::set<sstring>> audited_tables = parse_audit_tables(cfg.audit_tables());
std::set<sstring> audited_keyspaces = parse_audit_keyspaces(cfg.audit_keyspaces());
if (audited_tables.empty()
&& audited_keyspaces.empty()
&& !audited_categories.contains(statement_category::AUTH)
&& !audited_categories.contains(statement_category::ADMIN)
&& !audited_categories.contains(statement_category::DCL)) {
return make_ready_future<>();
}
logger.info("Audit is enabled. Auditing to: \"{}\", with the following categories: \"{}\", keyspaces: \"{}\", and tables: \"{}\"",
cfg.audit(), cfg.audit_categories(), cfg.audit_keyspaces(), cfg.audit_tables());
@@ -145,7 +146,8 @@ future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_toke
std::move(storage_helper_name),
std::move(audited_keyspaces),
std::move(audited_tables),
std::move(audited_categories));
std::move(audited_categories),
std::cref(cfg));
}
future<> audit::start_audit(const db::config& cfg, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm) {
@@ -260,4 +262,33 @@ bool audit::should_log(const audit_info* audit_info) const {
|| audit_info->category() == statement_category::DCL);
}
template<class T>
void audit::update_config(const sstring & new_value, std::function<T(const sstring&)> parse_func, T& cfg_parameter)
{
try {
cfg_parameter = parse_func(new_value);
} catch (...) {
logger.error("Audit configuration update failed because cannot parse value=\"{}\".", new_value);
return;
}
// If update_config is called with an invalid new_value, this line is not reached.
// But logging the invalid value must be avoided later, when a different configuration parameter is changed to a correct value.
// That's why values from _audited_{categories, keyspaces, tables} are logged instead of _cfg.audit_{categories, keyspaces, tables}
// Each table as "keyspace.table_name" like in the configuration file
auto table_entries = _audited_tables | std::views::transform([](const auto& pair) {
return pair.second | std::views::transform([&](const std::string& table_name) {
return fmt::format("{}.{}", pair.first, table_name);
});
}) | std::views::join;
logger.info(
"Audit configuration is updated. Auditing to: \"{}\", with the following categories: \"{}\", keyspaces: \"{}\", and tables: \"{}\".",
_cfg.audit(),
fmt::join(std::views::transform(_audited_categories, category_to_string), ","),
fmt::join(_audited_keyspaces, ","),
fmt::join(table_entries, ","));
}
}

View File

@@ -9,6 +9,7 @@
#include "seastarx.hh"
#include "utils/log.hh"
#include "utils/observable.hh"
#include "db/consistency_level.hh"
#include "locator/token_metadata_fwd.hh"
#include <seastar/core/sharded.hh>
@@ -96,13 +97,22 @@ class storage_helper;
class audit final : public seastar::async_sharded_service<audit> {
locator::shared_token_metadata& _token_metadata;
const std::set<sstring> _audited_keyspaces;
std::set<sstring> _audited_keyspaces;
// Maps keyspace name to set of table names in that keyspace
const std::map<sstring, std::set<sstring>> _audited_tables;
const category_set _audited_categories;
std::map<sstring, std::set<sstring>> _audited_tables;
category_set _audited_categories;
sstring _storage_helper_class_name;
std::unique_ptr<storage_helper> _storage_helper_ptr;
const db::config& _cfg;
utils::observer<sstring> _cfg_keyspaces_observer;
utils::observer<sstring> _cfg_tables_observer;
utils::observer<sstring> _cfg_categories_observer;
template<class T>
void update_config(const sstring & new_value, std::function<T(const sstring&)> parse_func, T& cfg_parameter);
bool should_log_table(const sstring& keyspace, const sstring& name) const;
public:
static seastar::sharded<audit>& audit_instance() {
@@ -123,7 +133,8 @@ public:
audit(locator::shared_token_metadata& stm, sstring&& storage_helper_name,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories);
category_set&& audited_categories,
const db::config& cfg);
~audit();
future<> start(const db::config& cfg, cql3::query_processor& qp, service::migration_manager& mm);
future<> stop();

View File

@@ -33,6 +33,20 @@ namespace audit {
namespace {
future<> syslog_send_helper(net::datagram_channel& sender,
const socket_address& address,
const sstring& msg) {
return sender.send(address, net::packet{msg.data(), msg.size()}).handle_exception([address](auto&& exception_ptr) {
auto error_msg = seastar::format(
"Syslog audit backend failed (sending a message to {} resulted in {}).",
address,
exception_ptr
);
logger.error("{}", error_msg);
throw audit_exception(std::move(error_msg));
});
}
static auto syslog_address_helper(const db::config& cfg)
{
return cfg.audit_unix_socket_path.is_set()
@@ -42,26 +56,9 @@ static auto syslog_address_helper(const db::config& cfg)
}
future<> audit_syslog_storage_helper::syslog_send_helper(const sstring& msg) {
try {
auto lock = co_await get_units(_semaphore, 1, std::chrono::hours(1));
co_await _sender.send(_syslog_address, net::packet{msg.data(), msg.size()});
}
catch (const std::exception& e) {
auto error_msg = seastar::format(
"Syslog audit backend failed (sending a message to {} resulted in {}).",
_syslog_address,
e
);
logger.error("{}", error_msg);
throw audit_exception(std::move(error_msg));
}
}
audit_syslog_storage_helper::audit_syslog_storage_helper(cql3::query_processor& qp, service::migration_manager&) :
_syslog_address(syslog_address_helper(qp.db().get_config())),
_sender(make_unbound_datagram_channel(AF_UNIX)),
_semaphore(1) {
_sender(make_unbound_datagram_channel(AF_UNIX)) {
}
audit_syslog_storage_helper::~audit_syslog_storage_helper() {
@@ -76,10 +73,10 @@ audit_syslog_storage_helper::~audit_syslog_storage_helper() {
*/
future<> audit_syslog_storage_helper::start(const db::config& cfg) {
if (this_shard_id() != 0) {
co_return;
return make_ready_future();
}
co_await syslog_send_helper("Initializing syslog audit backend.");
return syslog_send_helper(_sender, _syslog_address, "Initializing syslog audit backend.");
}
future<> audit_syslog_storage_helper::stop() {
@@ -109,7 +106,7 @@ future<> audit_syslog_storage_helper::write(const audit_info* audit_info,
audit_info->table(),
username);
co_await syslog_send_helper(msg);
return syslog_send_helper(_sender, _syslog_address, msg);
}
future<> audit_syslog_storage_helper::write_login(const sstring& username,
@@ -128,7 +125,7 @@ future<> audit_syslog_storage_helper::write_login(const sstring& username,
username,
(error ? "true" : "false"));
co_await syslog_send_helper(msg.c_str());
co_await syslog_send_helper(_sender, _syslog_address, msg.c_str());
}
using registry = class_registrator<storage_helper, audit_syslog_storage_helper, cql3::query_processor&, service::migration_manager&>;

View File

@@ -24,9 +24,6 @@ namespace audit {
class audit_syslog_storage_helper : public storage_helper {
socket_address _syslog_address;
net::datagram_channel _sender;
seastar::semaphore _semaphore;
future<> syslog_send_helper(const sstring& msg);
public:
explicit audit_syslog_storage_helper(cql3::query_processor&, service::migration_manager&);
virtual ~audit_syslog_storage_helper();

View File

@@ -83,6 +83,10 @@ public:
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
virtual future<> ensure_superuser_is_created() const override {
return make_ready_future<>();
}
};
}

View File

@@ -159,6 +159,8 @@ public:
virtual const resource_set& protected_resources() const = 0;
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
virtual future<> ensure_superuser_is_created() const = 0;
};
}

View File

@@ -56,6 +56,10 @@ public:
const resource_set& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
virtual future<> ensure_superuser_is_created() const override {
return make_ready_future<>();
}
private:
};

View File

@@ -119,11 +119,6 @@ future<> create_legacy_metadata_table_if_missing(
return qs;
}
::service::raft_timeout get_raft_timeout() noexcept {
auto dur = internal_distributed_query_state().get_client_state().get_timeout_config().other_timeout;
return ::service::raft_timeout{.value = lowres_clock::now() + dur};
}
static future<> announce_mutations_with_guard(
::service::raft_group0_client& group0_client,
std::vector<canonical_mutation> muts,

View File

@@ -17,7 +17,6 @@
#include "types/types.hh"
#include "service/raft/raft_group0_client.hh"
#include "timeout_config.hh"
using namespace std::chrono_literals;
@@ -78,8 +77,6 @@ future<> create_legacy_metadata_table_if_missing(
///
::service::query_state& internal_distributed_query_state() noexcept;
::service::raft_timeout get_raft_timeout() noexcept;
// Execute update query via group0 mechanism, mutations will be applied on all nodes.
// Use this function when need to perform read before write on a single guard or if
// you have more than one mutation and potentially exceed single command size limit.

View File

@@ -16,7 +16,6 @@ extern "C" {
#include <unistd.h>
}
#include <boost/range.hpp>
#include <seastar/core/seastar.hh>
#include <seastar/core/sleep.hh>

View File

@@ -233,9 +233,9 @@ future<role_set> ldap_role_manager::query_granted(std::string_view grantee_name,
}
future<role_to_directly_granted_map>
ldap_role_manager::query_all_directly_granted(::service::query_state& qs) {
ldap_role_manager::query_all_directly_granted() {
role_to_directly_granted_map result;
auto roles = co_await query_all(qs);
auto roles = co_await query_all();
for (auto& role: roles) {
auto granted_set = co_await query_granted(role, recursive_role_query::no);
for (auto& granted: granted_set) {
@@ -247,8 +247,8 @@ ldap_role_manager::query_all_directly_granted(::service::query_state& qs) {
co_return result;
}
future<role_set> ldap_role_manager::query_all(::service::query_state& qs) {
return _std_mgr.query_all(qs);
future<role_set> ldap_role_manager::query_all() {
return _std_mgr.query_all();
}
future<> ldap_role_manager::create_role(std::string_view role_name) {
@@ -311,12 +311,12 @@ future<bool> ldap_role_manager::can_login(std::string_view role_name) {
}
future<std::optional<sstring>> ldap_role_manager::get_attribute(
std::string_view role_name, std::string_view attribute_name, ::service::query_state& qs) {
return _std_mgr.get_attribute(role_name, attribute_name, qs);
std::string_view role_name, std::string_view attribute_name) {
return _std_mgr.get_attribute(role_name, attribute_name);
}
future<role_manager::attribute_vals> ldap_role_manager::query_attribute_for_all(std::string_view attribute_name, ::service::query_state& qs) {
return _std_mgr.query_attribute_for_all(attribute_name, qs);
future<role_manager::attribute_vals> ldap_role_manager::query_attribute_for_all(std::string_view attribute_name) {
return _std_mgr.query_attribute_for_all(attribute_name);
}
future<> ldap_role_manager::set_attribute(
@@ -338,7 +338,8 @@ future<std::vector<cql3::description>> ldap_role_manager::describe_role_grants()
}
future<> ldap_role_manager::ensure_superuser_is_created() {
return _std_mgr.ensure_superuser_is_created();
// ldap is responsible for users
co_return;
}
} // namespace auth

View File

@@ -9,11 +9,9 @@
#pragma once
#include <memory>
#include <seastar/core/abort_source.hh>
#include <stdexcept>
#include "bytes.hh"
#include "ent/ldap/ldap_connection.hh"
#include "standard_role_manager.hh"
@@ -77,9 +75,9 @@ class ldap_role_manager : public role_manager {
future<role_set> query_granted(std::string_view, recursive_role_query) override;
future<role_to_directly_granted_map> query_all_directly_granted(::service::query_state&) override;
future<role_to_directly_granted_map> query_all_directly_granted() override;
future<role_set> query_all(::service::query_state&) override;
future<role_set> query_all() override;
future<bool> exists(std::string_view) override;
@@ -87,9 +85,9 @@ class ldap_role_manager : public role_manager {
future<bool> can_login(std::string_view) override;
future<std::optional<sstring>> get_attribute(std::string_view, std::string_view, ::service::query_state&) override;
future<std::optional<sstring>> get_attribute(std::string_view, std::string_view) override;
future<role_manager::attribute_vals> query_attribute_for_all(std::string_view, ::service::query_state&) override;
future<role_manager::attribute_vals> query_attribute_for_all(std::string_view) override;
future<> set_attribute(std::string_view, std::string_view, std::string_view, ::service::group0_batch& mc) override;

View File

@@ -78,11 +78,11 @@ future<role_set> maintenance_socket_role_manager::query_granted(std::string_view
return operation_not_supported_exception<role_set>("QUERY GRANTED");
}
future<role_to_directly_granted_map> maintenance_socket_role_manager::query_all_directly_granted(::service::query_state&) {
future<role_to_directly_granted_map> maintenance_socket_role_manager::query_all_directly_granted() {
return operation_not_supported_exception<role_to_directly_granted_map>("QUERY ALL DIRECTLY GRANTED");
}
future<role_set> maintenance_socket_role_manager::query_all(::service::query_state&) {
future<role_set> maintenance_socket_role_manager::query_all() {
return operation_not_supported_exception<role_set>("QUERY ALL");
}
@@ -98,11 +98,11 @@ future<bool> maintenance_socket_role_manager::can_login(std::string_view role_na
return make_ready_future<bool>(true);
}
future<std::optional<sstring>> maintenance_socket_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state&) {
future<std::optional<sstring>> maintenance_socket_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name) {
return operation_not_supported_exception<std::optional<sstring>>("GET ATTRIBUTE");
}
future<role_manager::attribute_vals> maintenance_socket_role_manager::query_attribute_for_all(std::string_view attribute_name, ::service::query_state&) {
future<role_manager::attribute_vals> maintenance_socket_role_manager::query_attribute_for_all(std::string_view attribute_name) {
return operation_not_supported_exception<role_manager::attribute_vals>("QUERY ATTRIBUTE");
}

View File

@@ -53,9 +53,9 @@ public:
virtual future<role_set> query_granted(std::string_view grantee_name, recursive_role_query) override;
virtual future<role_to_directly_granted_map> query_all_directly_granted(::service::query_state&) override;
virtual future<role_to_directly_granted_map> query_all_directly_granted() override;
virtual future<role_set> query_all(::service::query_state&) override;
virtual future<role_set> query_all() override;
virtual future<bool> exists(std::string_view role_name) override;
@@ -63,9 +63,9 @@ public:
virtual future<bool> can_login(std::string_view role_name) override;
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state&) override;
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name) override;
virtual future<role_manager::attribute_vals> query_attribute_for_all(std::string_view attribute_name, ::service::query_state&) override;
virtual future<role_manager::attribute_vals> query_attribute_for_all(std::string_view attribute_name) override;
virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) override;

View File

@@ -117,8 +117,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
});
}
future<> password_authenticator::legacy_create_default_if_missing() {
SCYLLA_ASSERT(legacy_mode(_qp));
future<> password_authenticator::create_default_if_missing() {
const auto exists = co_await default_role_row_satisfies(_qp, &has_salted_hash, _superuser);
if (exists) {
co_return;
@@ -128,75 +127,18 @@ future<> password_authenticator::legacy_create_default_if_missing() {
salted_pwd = passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt);
}
const auto query = update_row_query();
co_await _qp.execute_internal(
if (legacy_mode(_qp)) {
co_await _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state(),
{salted_pwd, _superuser},
cql3::query_processor::cache_internal::no);
plogger.info("Created default superuser authentication record.");
}
future<> password_authenticator::maybe_create_default_password() {
auto needs_password = [this] () -> future<bool> {
const sstring query = seastar::format("SELECT * FROM {}.{} WHERE is_superuser = true ALLOW FILTERING", get_auth_ks_name(_qp), meta::roles_table::name);
auto results = co_await _qp.execute_internal(query,
db::consistency_level::LOCAL_ONE,
internal_distributed_query_state(), cql3::query_processor::cache_internal::yes);
// Don't add default password if
// - there is no default superuser
// - there is a superuser with a password.
bool has_default = false;
bool has_superuser_with_password = false;
for (auto& result : *results) {
if (result.get_as<sstring>(meta::roles_table::role_col_name) == _superuser) {
has_default = true;
}
if (has_salted_hash(result)) {
has_superuser_with_password = true;
}
}
co_return has_default && !has_superuser_with_password;
};
if (!co_await needs_password()) {
co_return;
}
// We don't want to start operation earlier to avoid quorum requirement in
// a common case.
::service::group0_batch batch(
co_await _group0_client.start_operation(_as, get_raft_timeout()));
// Check again as the state may have changed before we took the guard (batch).
if (!co_await needs_password()) {
co_return;
}
// Set default superuser's password.
std::string salted_pwd(get_config_value(_qp.db().get_config().auth_superuser_salted_password(), ""));
if (salted_pwd.empty()) {
salted_pwd = passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt);
}
const auto update_query = update_row_query();
co_await collect_mutations(_qp, batch, update_query, {salted_pwd, _superuser});
co_await std::move(batch).commit(_group0_client, _as, get_raft_timeout());
plogger.info("Created default superuser authentication record.");
}
future<> password_authenticator::maybe_create_default_password_with_retries() {
size_t retries = _migration_manager.get_concurrent_ddl_retries();
while (true) {
try {
co_return co_await maybe_create_default_password();
} catch (const ::service::group0_concurrent_modification& ex) {
plogger.warn("Failed to execute maybe_create_default_password due to guard conflict.{}.", retries ? " Retrying" : " Number of retries exceeded, giving up");
if (retries--) {
continue;
}
// Log error but don't crash the whole node startup sequence.
plogger.error("Failed to create default superuser password due to guard conflict.");
co_return;
} catch (const ::service::raft_operation_timeout_error& ex) {
plogger.error("Failed to create default superuser password due to exception: {}", ex.what());
co_return;
}
plogger.info("Created default superuser authentication record.");
} else {
co_await announce_mutations(_qp, _group0_client, query,
{salted_pwd, _superuser}, _as, ::service::raft_timeout{});
plogger.info("Created default superuser authentication record.");
}
}
@@ -205,6 +147,9 @@ future<> password_authenticator::start() {
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
if (legacy_mode(_qp)) {
if (!_superuser_created_promise.available()) {
_superuser_created_promise.set_value();
}
_migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash, _superuser).get()) {
@@ -219,9 +164,12 @@ future<> password_authenticator::start() {
migrate_legacy_metadata().get();
return;
}
legacy_create_default_if_missing().get();
}
maybe_create_default_password_with_retries().get();
utils::get_local_injector().inject("password_authenticator_start_pause", utils::wait_for_message(5min)).get();
create_default_if_missing().get();
if (!legacy_mode(_qp)) {
_superuser_created_promise.set_value();
}
});
});
@@ -420,4 +368,8 @@ const resource_set& password_authenticator::protected_resources() const {
});
}
future<> password_authenticator::ensure_superuser_is_created() const {
return _superuser_created_promise.get_shared_future();
}
}

View File

@@ -11,6 +11,7 @@
#pragma once
#include <seastar/core/abort_source.hh>
#include <seastar/core/shared_future.hh>
#include "db/consistency_level_type.hh"
#include "auth/authenticator.hh"
@@ -40,7 +41,8 @@ class password_authenticator : public authenticator {
::service::migration_manager& _migration_manager;
future<> _stopped;
abort_source _as;
std::string _superuser; // default superuser name from the config (may or may not be present in roles table)
std::string _superuser;
shared_promise<> _superuser_created_promise;
public:
static db::consistency_level consistency_for_user(std::string_view role_name);
@@ -80,15 +82,14 @@ public:
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
virtual future<> ensure_superuser_is_created() const override;
private:
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;
future<> legacy_create_default_if_missing();
future<> maybe_create_default_password();
future<> maybe_create_default_password_with_retries();
future<> create_default_if_missing();
sstring update_row_query() const;
};

View File

@@ -17,17 +17,12 @@
#include <seastar/core/format.hh>
#include <seastar/core/sstring.hh>
#include "auth/common.hh"
#include "auth/resource.hh"
#include "cql3/description.hh"
#include "seastarx.hh"
#include "exceptions/exceptions.hh"
#include "service/raft/raft_group0_client.hh"
namespace service {
class query_state;
};
namespace auth {
struct role_config final {
@@ -172,9 +167,9 @@ public:
/// (role2, role3)
/// }
///
virtual future<role_to_directly_granted_map> query_all_directly_granted(::service::query_state& = internal_distributed_query_state()) = 0;
virtual future<role_to_directly_granted_map> query_all_directly_granted() = 0;
virtual future<role_set> query_all(::service::query_state& = internal_distributed_query_state()) = 0;
virtual future<role_set> query_all() = 0;
virtual future<bool> exists(std::string_view role_name) = 0;
@@ -191,12 +186,12 @@ public:
///
/// \returns the value of the named attribute, if one is set.
///
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state& = internal_distributed_query_state()) = 0;
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name) = 0;
///
/// \returns a mapping of each role's value for the named attribute, if one is set for the role.
///
virtual future<attribute_vals> query_attribute_for_all(std::string_view attribute_name, ::service::query_state& = internal_distributed_query_state()) = 0;
virtual future<attribute_vals> query_attribute_for_all(std::string_view attribute_name) = 0;
/// Sets `attribute_name` with `attribute_value` for `role_name`.
/// \returns an exceptional future with nonexistant_role if the role does not exist.

View File

@@ -55,6 +55,10 @@ public:
const resource_set& protected_resources() const override;
::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
virtual future<> ensure_superuser_is_created() const override {
return make_ready_future<>();
}
};
/// A set of four credential strings that saslauthd expects.

View File

@@ -240,13 +240,6 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s
});
}
co_await _role_manager->start();
if (this_shard_id() == 0) {
// Role manager and password authenticator have this odd startup
// mechanism where they asynchronously create the superuser role
// in the background. Correct password creation depends on role
// creation therefore we need to wait here.
co_await _role_manager->ensure_superuser_is_created();
}
co_await when_all_succeed(_authorizer->start(), _authenticator->start()).discard_result();
_permissions_cache = std::make_unique<permissions_cache>(_loading_cache_config, *this, log);
co_await once_among_shards([this] {
@@ -270,7 +263,8 @@ future<> service::stop() {
}
future<> service::ensure_superuser_is_created() {
return _role_manager->ensure_superuser_is_created();
co_await _role_manager->ensure_superuser_is_created();
co_await _authenticator->ensure_superuser_is_created();
}
void service::update_cache_config() {
@@ -682,7 +676,8 @@ future<permission_set> get_permissions(const service& ser, const authenticated_u
}
bool is_protected(const service& ser, command_desc cmd) noexcept {
if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {
if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS ||
cmd.type_ == command_desc::type::ALTER_SYSTEM_WITH_ALLOWED_OPTS) {
return false; // Table attributes are OK to modify; see #7057.
}
return ser.underlying_role_manager().protected_resources().contains(cmd.resource)

View File

@@ -224,6 +224,7 @@ struct command_desc {
const ::auth::resource& resource; ///< Resource impacted by this command.
enum class type {
ALTER_WITH_OPTS, ///< Command is ALTER ... WITH ...
ALTER_SYSTEM_WITH_ALLOWED_OPTS,
OTHER
} type_ = type::OTHER;
};

View File

@@ -9,7 +9,6 @@
#include "auth/standard_role_manager.hh"
#include <optional>
#include <stdexcept>
#include <unordered_set>
#include <vector>
@@ -29,7 +28,6 @@
#include "cql3/util.hh"
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "utils/error_injection.hh"
#include "utils/log.hh"
#include <seastar/core/loop.hh>
#include <seastar/coroutine/maybe_yield.hh>
@@ -180,8 +178,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
_migration_manager)).discard_result();
}
future<> standard_role_manager::legacy_create_default_role_if_missing() {
SCYLLA_ASSERT(legacy_mode(_qp));
future<> standard_role_manager::create_default_role_if_missing() {
try {
const auto exists = co_await default_role_row_satisfies(_qp, &has_can_login, _superuser);
if (exists) {
@@ -191,12 +188,16 @@ future<> standard_role_manager::legacy_create_default_role_if_missing() {
get_auth_ks_name(_qp),
meta::roles_table::name,
meta::roles_table::role_col_name);
co_await _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state(),
{_superuser},
cql3::query_processor::cache_internal::no).discard_result();
if (legacy_mode(_qp)) {
co_await _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_query_state(),
{_superuser},
cql3::query_processor::cache_internal::no).discard_result();
} else {
co_await announce_mutations(_qp, _group0_client, query, {_superuser}, _as, ::service::raft_timeout{});
}
log.info("Created default superuser role '{}'.", _superuser);
} catch(const exceptions::unavailable_exception& e) {
log.warn("Skipped default role setup: some nodes were not ready; will retry");
@@ -204,60 +205,6 @@ future<> standard_role_manager::legacy_create_default_role_if_missing() {
}
}
future<> standard_role_manager::maybe_create_default_role() {
auto has_superuser = [this] () -> future<bool> {
const sstring query = seastar::format("SELECT * FROM {}.{} WHERE is_superuser = true ALLOW FILTERING", get_auth_ks_name(_qp), meta::roles_table::name);
auto results = co_await _qp.execute_internal(query, db::consistency_level::LOCAL_ONE,
internal_distributed_query_state(), cql3::query_processor::cache_internal::yes);
for (const auto& result : *results) {
if (has_can_login(result)) {
co_return true;
}
}
co_return false;
};
if (co_await has_superuser()) {
co_return;
}
// We don't want to start operation earlier to avoid quorum requirement in
// a common case.
::service::group0_batch batch(
co_await _group0_client.start_operation(_as, get_raft_timeout()));
// Check again as the state may have changed before we took the guard (batch).
if (co_await has_superuser()) {
co_return;
}
// There is no superuser which has can_login field - create default role.
// Note that we don't check if can_login is set to true.
const sstring insert_query = seastar::format("INSERT INTO {}.{} ({}, is_superuser, can_login) VALUES (?, true, true)",
get_auth_ks_name(_qp),
meta::roles_table::name,
meta::roles_table::role_col_name);
co_await collect_mutations(_qp, batch, insert_query, {_superuser});
co_await std::move(batch).commit(_group0_client, _as, get_raft_timeout());
log.info("Created default superuser role '{}'.", _superuser);
}
future<> standard_role_manager::maybe_create_default_role_with_retries() {
size_t retries = _migration_manager.get_concurrent_ddl_retries();
while (true) {
try {
co_return co_await maybe_create_default_role();
} catch (const ::service::group0_concurrent_modification& ex) {
log.warn("Failed to execute maybe_create_default_role due to guard conflict.{}.", retries ? " Retrying" : " Number of retries exceeded, giving up");
if (retries--) {
continue;
}
// Log error but don't crash the whole node startup sequence.
log.error("Failed to create default superuser role due to guard conflict.");
co_return;
} catch (const ::service::raft_operation_timeout_error& ex) {
log.error("Failed to create default superuser role due to exception: {}", ex.what());
co_return;
}
}
}
static const sstring legacy_table_name{"users"};
bool standard_role_manager::legacy_metadata_exists() {
@@ -319,13 +266,10 @@ future<> standard_role_manager::start() {
co_await migrate_legacy_metadata();
co_return;
}
co_await legacy_create_default_role_if_missing();
}
co_await create_default_role_if_missing();
if (!legacy) {
co_await maybe_create_default_role_with_retries();
if (!_superuser_created_promise.available()) {
_superuser_created_promise.set_value();
}
_superuser_created_promise.set_value();
}
};
@@ -652,30 +596,21 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n
});
}
future<role_to_directly_granted_map> standard_role_manager::query_all_directly_granted(::service::query_state& qs) {
future<role_to_directly_granted_map> standard_role_manager::query_all_directly_granted() {
const sstring query = seastar::format("SELECT * FROM {}.{}",
get_auth_ks_name(_qp),
meta::role_members_table::name);
const auto results = co_await _qp.execute_internal(
query,
db::consistency_level::ONE,
qs,
cql3::query_processor::cache_internal::yes);
role_to_directly_granted_map roles_map;
std::transform(
results->begin(),
results->end(),
std::inserter(roles_map, roles_map.begin()),
[] (const cql3::untyped_result_set_row& row) {
return std::make_pair(row.get_as<sstring>("member"), row.get_as<sstring>("role")); }
);
co_await _qp.query_internal(query, [&roles_map] (const cql3::untyped_result_set_row& row) -> future<stop_iteration> {
roles_map.insert({row.get_as<sstring>("member"), row.get_as<sstring>("role")});
co_return stop_iteration::no;
});
co_return roles_map;
}
future<role_set> standard_role_manager::query_all(::service::query_state& qs) {
future<role_set> standard_role_manager::query_all() {
const sstring query = seastar::format("SELECT {} FROM {}.{}",
meta::roles_table::role_col_name,
get_auth_ks_name(_qp),
@@ -684,16 +619,10 @@ future<role_set> standard_role_manager::query_all(::service::query_state& qs) {
// To avoid many copies of a view.
static const auto role_col_name_string = sstring(meta::roles_table::role_col_name);
if (utils::get_local_injector().enter("standard_role_manager_fail_legacy_query")) {
if (legacy_mode(_qp)) {
throw std::runtime_error("standard_role_manager::query_all: failed due to error injection");
}
}
const auto results = co_await _qp.execute_internal(
query,
db::consistency_level::QUORUM,
qs,
internal_distributed_query_state(),
cql3::query_processor::cache_internal::yes);
role_set roles;
@@ -725,11 +654,11 @@ future<bool> standard_role_manager::can_login(std::string_view role_name) {
});
}
future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state& qs) {
future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name) {
const sstring query = seastar::format("SELECT name, value FROM {}.{} WHERE role = ? AND name = ?",
get_auth_ks_name(_qp),
meta::role_attributes_table::name);
const auto result_set = co_await _qp.execute_internal(query, db::consistency_level::ONE, qs, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes);
const auto result_set = co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes);
if (!result_set->empty()) {
const cql3::untyped_result_set_row &row = result_set->one();
co_return std::optional<sstring>(row.get_as<sstring>("value"));
@@ -737,11 +666,11 @@ future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_
co_return std::optional<sstring>{};
}
future<role_manager::attribute_vals> standard_role_manager::query_attribute_for_all (std::string_view attribute_name, ::service::query_state& qs) {
return query_all(qs).then([this, attribute_name, &qs] (role_set roles) {
return do_with(attribute_vals{}, [this, attribute_name, roles = std::move(roles), &qs] (attribute_vals &role_to_att_val) {
return parallel_for_each(roles.begin(), roles.end(), [this, &role_to_att_val, attribute_name, &qs] (sstring role) {
return get_attribute(role, attribute_name, qs).then([&role_to_att_val, role] (std::optional<sstring> att_val) {
future<role_manager::attribute_vals> standard_role_manager::query_attribute_for_all (std::string_view attribute_name) {
return query_all().then([this, attribute_name] (role_set roles) {
return do_with(attribute_vals{}, [this, attribute_name, roles = std::move(roles)] (attribute_vals &role_to_att_val) {
return parallel_for_each(roles.begin(), roles.end(), [this, &role_to_att_val, attribute_name] (sstring role) {
return get_attribute(role, attribute_name).then([&role_to_att_val, role] (std::optional<sstring> att_val) {
if (att_val) {
role_to_att_val.emplace(std::move(role), std::move(*att_val));
}
@@ -786,7 +715,7 @@ future<> standard_role_manager::remove_attribute(std::string_view role_name, std
future<std::vector<cql3::description>> standard_role_manager::describe_role_grants() {
std::vector<cql3::description> result{};
const auto grants = co_await query_all_directly_granted(internal_distributed_query_state());
const auto grants = co_await query_all_directly_granted();
result.reserve(grants.size());
for (const auto& [grantee_role, granted_role] : grants) {

View File

@@ -66,9 +66,9 @@ public:
virtual future<role_set> query_granted(std::string_view grantee_name, recursive_role_query) override;
virtual future<role_to_directly_granted_map> query_all_directly_granted(::service::query_state&) override;
virtual future<role_to_directly_granted_map> query_all_directly_granted() override;
virtual future<role_set> query_all(::service::query_state&) override;
virtual future<role_set> query_all() override;
virtual future<bool> exists(std::string_view role_name) override;
@@ -76,9 +76,9 @@ public:
virtual future<bool> can_login(std::string_view role_name) override;
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state&) override;
virtual future<std::optional<sstring>> get_attribute(std::string_view role_name, std::string_view attribute_name) override;
virtual future<role_manager::attribute_vals> query_attribute_for_all(std::string_view attribute_name, ::service::query_state&) override;
virtual future<role_manager::attribute_vals> query_attribute_for_all(std::string_view attribute_name) override;
virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) override;
@@ -95,10 +95,7 @@ private:
future<> migrate_legacy_metadata();
future<> legacy_create_default_role_if_missing();
future<> maybe_create_default_role();
future<> maybe_create_default_role_with_retries();
future<> create_default_role_if_missing();
future<> create_or_replace(std::string_view role_name, const role_config&, ::service::group0_batch&);

View File

@@ -159,6 +159,10 @@ public:
};
return ::make_shared<sasl_wrapper>(_authenticator->new_sasl_challenge());
}
virtual future<> ensure_superuser_is_created() const override {
return _authenticator->ensure_superuser_is_created();
}
};
class transitional_authorizer : public authorizer {

View File

@@ -1,8 +1,10 @@
#!/bin/bash
#!/bin/bash -e
# Copyright (C) 2023-present ScyllaDB
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
trap 'echo "error $? in $0 line $LINENO"' ERR
here=$(dirname "$0")
exec "$here/../tools/cqlsh/bin/cqlsh.py" "$@"

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/bash -e
#
# Copyright (C) 2024-present ScyllaDB
#
@@ -6,6 +6,8 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
#
trap 'echo "error $? in $0 line $LINENO"' ERR
SCRIPT_PATH=$(dirname $(realpath "$0"))
INSTALLED_SCYLLA_PATH="${SCRIPT_PATH}/scylla"

View File

@@ -365,9 +365,6 @@ cdc::topology_description make_new_generation_description(
const noncopyable_function<std::pair<size_t, uint8_t>(dht::token)>& get_sharding_info,
const locator::token_metadata_ptr tmptr) {
const auto tokens = get_tokens(bootstrap_tokens, tmptr);
if (tokens.empty()) {
on_internal_error(cdc_log, "Attempted to create a CDC generation from an empty list of tokens");
}
utils::chunked_vector<token_range_description> vnode_descriptions;
vnode_descriptions.reserve(tokens.size());
@@ -1028,7 +1025,7 @@ future<> generation_service::legacy_handle_cdc_generation(std::optional<cdc::gen
if (using_this_gen) {
cdc_log.info("Starting to use generation {}", *gen_id);
co_await update_streams_description(*gen_id, _sys_ks.local(), get_sys_dist_ks(),
[tmptr = _token_metadata.get()] { return tmptr->count_normal_token_owners(); },
[&tm = _token_metadata] { return tm.get()->count_normal_token_owners(); },
_abort_src);
}
}
@@ -1045,7 +1042,7 @@ void generation_service::legacy_async_handle_cdc_generation(cdc::generation_id g
if (using_this_gen) {
cdc_log.info("Starting to use generation {}", gen_id);
co_await update_streams_description(gen_id, svc->_sys_ks.local(), svc->get_sys_dist_ks(),
[tmptr = svc->_token_metadata.get()] { return tmptr->count_normal_token_owners(); },
[&tm = svc->_token_metadata] { return tm.get()->count_normal_token_owners(); },
svc->_abort_src);
}
co_return;

View File

@@ -10,7 +10,6 @@
#include <algorithm>
#include <boost/range/irange.hpp>
#include <boost/algorithm/string/predicate.hpp>
#include <seastar/core/thread.hh>
#include <seastar/core/metrics.hh>
@@ -41,6 +40,7 @@
#include "types/listlike_partial_deserializing_iterator.hh"
#include "tracing/trace_state.hh"
#include "stats.hh"
#include "utils/labels.hh"
namespace std {
@@ -56,17 +56,8 @@ using namespace std::chrono_literals;
logging::logger cdc_log("cdc");
namespace {
// When dropping a column from a CDC log table, we set the drop timestamp
// `column_drop_leeway` seconds into the future to ensure that for writes concurrent
// with column drop, the write timestamp is before the column drop timestamp.
constexpr auto column_drop_leeway = std::chrono::seconds(5);
} // anonymous namespace
namespace cdc {
static schema_ptr create_log_schema(const schema&, api::timestamp_type, std::optional<table_id> = {}, schema_ptr = nullptr);
static schema_ptr create_log_schema(const schema&, std::optional<table_id> = {}, schema_ptr = nullptr);
}
static constexpr auto cdc_group_name = "cdc";
@@ -77,7 +68,7 @@ void cdc::stats::parts_touched_stats::register_metrics(seastar::metrics::metric_
metrics.add_group(cdc_group_name, {
sm::make_total_operations(seastar::format("operations_on_{}_performed_{}", part_name, suffix), count[(size_t)part],
sm::description(seastar::format("number of {} CDC operations that processed a {}", suffix, part_name)),
{})
{cdc_label}).set_skip_when_empty()
});
};
@@ -100,23 +91,23 @@ cdc::stats::stats() {
_metrics.add_group(cdc_group_name, {
sm::make_total_operations("operations_" + kind, counters.unsplit_count,
sm::description(format("number of {} CDC operations", kind)),
{split_label(false)}),
{split_label(false), basic_level, cdc_label}).set_skip_when_empty(),
sm::make_total_operations("operations_" + kind, counters.split_count,
sm::description(format("number of {} CDC operations", kind)),
{split_label(true)}),
{split_label(true), basic_level, cdc_label}).set_skip_when_empty(),
sm::make_total_operations("preimage_selects_" + kind, counters.preimage_selects,
sm::description(format("number of {} preimage queries performed", kind)),
{}),
{cdc_label}).set_skip_when_empty(),
sm::make_total_operations("operations_with_preimage_" + kind, counters.with_preimage_count,
sm::description(format("number of {} operations that included preimage", kind)),
{}),
{cdc_label}).set_skip_when_empty(),
sm::make_total_operations("operations_with_postimage_" + kind, counters.with_postimage_count,
sm::description(format("number of {} operations that included postimage", kind)),
{})
{cdc_label}).set_skip_when_empty()
});
counters.touches.register_metrics(_metrics, kind);
@@ -176,7 +167,7 @@ public:
ensure_that_table_uses_vnodes(ksm, schema);
// in seastar thread
auto log_schema = create_log_schema(schema, timestamp);
auto log_schema = create_log_schema(schema);
auto log_mut = db::schema_tables::make_create_table_mutations(log_schema, timestamp);
@@ -214,7 +205,7 @@ public:
ensure_that_table_has_no_counter_columns(new_schema);
ensure_that_table_uses_vnodes(*keyspace.metadata(), new_schema);
auto new_log_schema = create_log_schema(new_schema, timestamp, log_schema ? std::make_optional(log_schema->id()) : std::nullopt, log_schema);
auto new_log_schema = create_log_schema(new_schema, log_schema ? std::make_optional(log_schema->id()) : std::nullopt, log_schema);
auto log_mut = log_schema
? db::schema_tables::make_update_table_mutations(db, keyspace.metadata(), log_schema, new_log_schema, timestamp)
@@ -429,7 +420,7 @@ static const sstring cdc_deleted_column_prefix = cdc_meta_column_prefix + "delet
static const sstring cdc_deleted_elements_column_prefix = cdc_meta_column_prefix + "deleted_elements_";
bool is_log_name(const std::string_view& table_name) {
return boost::ends_with(table_name, cdc_log_suffix);
return table_name.ends_with(cdc_log_suffix);
}
bool is_cdc_metacolumn_name(const sstring& name) {
@@ -505,7 +496,7 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
}
static schema_ptr create_log_schema(const schema& s, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old) {
static schema_ptr create_log_schema(const schema& s, std::optional<table_id> uuid, schema_ptr old) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.with_partitioner(cdc::cdc_partitioner::classname);
b.set_compaction_strategy(sstables::compaction_strategy_type::time_window);
@@ -540,28 +531,6 @@ static schema_ptr create_log_schema(const schema& s, api::timestamp_type timesta
b.with_column(log_meta_column_name_bytes("ttl"), long_type);
b.with_column(log_meta_column_name_bytes("end_of_batch"), boolean_type);
b.set_caching_options(caching_options::get_disabled_caching_options());
auto validate_new_column = [&] (const sstring& name) {
// When dropping a column from a CDC log table, we set the drop timestamp to be
// `column_drop_leeway` seconds into the future (see `create_log_schema`).
// Therefore, when recreating a column with the same name, we need to validate
// that it's not recreated too soon and that the drop timestamp has passed.
if (old && old->dropped_columns().contains(name)) {
const auto& drop_info = old->dropped_columns().at(name);
auto create_time = api::timestamp_clock::time_point(api::timestamp_clock::duration(timestamp));
auto drop_time = api::timestamp_clock::time_point(api::timestamp_clock::duration(drop_info.timestamp));
if (drop_time > create_time) {
throw exceptions::invalid_request_exception(format("Cannot add column {} because a column with the same name was dropped too recently. Please retry after {} seconds",
name, std::chrono::duration_cast<std::chrono::seconds>(drop_time - create_time).count() + 1));
}
}
};
auto add_column = [&] (sstring name, data_type type) {
validate_new_column(name);
b.with_column(to_bytes(name), type);
};
auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
for (const auto& column : columns) {
auto type = column.type;
@@ -583,9 +552,9 @@ static schema_ptr create_log_schema(const schema& s, api::timestamp_type timesta
}
));
}
add_column(log_data_column_name(column.name_as_text()), type);
b.with_column(log_data_column_name_bytes(column.name()), type);
if (is_data_col) {
add_column(log_data_column_deleted_name(column.name_as_text()), boolean_type);
b.with_column(log_data_column_deleted_name_bytes(column.name()), boolean_type);
}
if (column.type->is_multi_cell()) {
auto dtype = visit(*type, make_visitor(
@@ -601,7 +570,7 @@ static schema_ptr create_log_schema(const schema& s, api::timestamp_type timesta
throw std::invalid_argument("Should not reach");
}
));
add_column(log_data_column_deleted_elements_name(column.name_as_text()), dtype);
b.with_column(log_data_column_deleted_elements_name_bytes(column.name()), dtype);
}
}
};
@@ -623,8 +592,7 @@ static schema_ptr create_log_schema(const schema& s, api::timestamp_type timesta
// not super efficient, but we don't do this often.
for (auto& col : old->all_columns()) {
if (!b.has_column({col.name(), col.name_as_text() })) {
auto drop_ts = api::timestamp_clock::now() + column_drop_leeway;
b.without_column(col.name_as_text(), col.type, drop_ts.time_since_epoch().count());
b.without_column(col.name_as_text(), col.type, api::new_timestamp());
}
}
}
@@ -992,12 +960,8 @@ public:
// Given a reference to such a column from the base schema, this function sets the corresponding column
// in the log to the given value for the given row.
void set_value(const clustering_key& log_ck, const column_definition& base_cdef, const managed_bytes_view& value) {
auto log_cdef_ptr = _log_schema.get_column_definition(log_data_column_name_bytes(base_cdef.name()));
if (!log_cdef_ptr) {
throw exceptions::invalid_request_exception(format("CDC log schema for {}.{} does not have base column {}",
_log_schema.ks_name(), _log_schema.cf_name(), base_cdef.name_as_text()));
}
_log_mut.set_cell(log_ck, *log_cdef_ptr, atomic_cell::make_live(*base_cdef.type, _ts, value, _ttl));
auto& log_cdef = *_log_schema.get_column_definition(log_data_column_name_bytes(base_cdef.name()));
_log_mut.set_cell(log_ck, log_cdef, atomic_cell::make_live(*base_cdef.type, _ts, value, _ttl));
}
// Each regular and static column in the base schema has a corresponding column in the log schema
@@ -1005,13 +969,7 @@ public:
// Given a reference to such a column from the base schema, this function sets the corresponding column
// in the log to `true` for the given row. If not called, the column will be `null`.
void set_deleted(const clustering_key& log_ck, const column_definition& base_cdef) {
auto log_cdef_ptr = _log_schema.get_column_definition(log_data_column_deleted_name_bytes(base_cdef.name()));
if (!log_cdef_ptr) {
throw exceptions::invalid_request_exception(format("CDC log schema for {}.{} does not have base column {}",
_log_schema.ks_name(), _log_schema.cf_name(), base_cdef.name_as_text()));
}
auto& log_cdef = *log_cdef_ptr;
_log_mut.set_cell(log_ck, *log_cdef_ptr, atomic_cell::make_live(*log_cdef.type, _ts, log_cdef.type->decompose(true), _ttl));
_log_mut.set_cell(log_ck, log_data_column_deleted_name_bytes(base_cdef.name()), data_value(true), _ts, _ttl);
}
// Each regular and static non-atomic column in the base schema has a corresponding column in the log schema
@@ -1020,12 +978,7 @@ public:
// Given a reference to such a column from the base schema, this function sets the corresponding column
// in the log to the given set of keys for the given row.
void set_deleted_elements(const clustering_key& log_ck, const column_definition& base_cdef, const managed_bytes& deleted_elements) {
auto log_cdef_ptr = _log_schema.get_column_definition(log_data_column_deleted_elements_name_bytes(base_cdef.name()));
if (!log_cdef_ptr) {
throw exceptions::invalid_request_exception(format("CDC log schema for {}.{} does not have base column {}",
_log_schema.ks_name(), _log_schema.cf_name(), base_cdef.name_as_text()));
}
auto& log_cdef = *log_cdef_ptr;
auto& log_cdef = *_log_schema.get_column_definition(log_data_column_deleted_elements_name_bytes(base_cdef.name()));
_log_mut.set_cell(log_ck, log_cdef, atomic_cell::make_live(*log_cdef.type, _ts, deleted_elements, _ttl));
}
@@ -1912,10 +1865,5 @@ bool cdc::cdc_service::needs_cdc_augmentation(const std::vector<mutation>& mutat
future<std::tuple<std::vector<mutation>, lw_shared_ptr<cdc::operation_result_tracker>>>
cdc::cdc_service::augment_mutation_call(lowres_clock::time_point timeout, std::vector<mutation>&& mutations, tracing::trace_state_ptr tr_state, db::consistency_level write_cl) {
if (utils::get_local_injector().enter("sleep_before_cdc_augmentation")) {
return seastar::sleep(std::chrono::milliseconds(100)).then([this, timeout, mutations = std::move(mutations), tr_state = std::move(tr_state), write_cl] () mutable {
return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl);
});
}
return _impl->augment_mutation_call(timeout, std::move(mutations), std::move(tr_state), write_cl);
}

View File

@@ -16,13 +16,13 @@
#include "mutation/mutation_fragment.hh"
#include "mutation/mutation_fragment_v2.hh"
#include <boost/range/iterator_range.hpp>
#include <ranges>
// Utility for in-order checking of overlap with position ranges.
class clustering_ranges_walker {
const schema& _schema;
const query::clustering_row_ranges& _ranges;
boost::iterator_range<query::clustering_row_ranges::const_iterator> _current_range;
std::ranges::subrange<query::clustering_row_ranges::const_iterator> _current_range;
bool _in_current; // next position is known to be >= _current_start
bool _past_current; // next position is known to be >= _current_end
bool _using_clustering_range; // Whether current range comes from _current_range
@@ -40,7 +40,7 @@ private:
if (!_current_range) {
return false;
}
_current_range.advance_begin(1);
_current_range.advance(1);
}
++_change_counter;
_using_clustering_range = true;

View File

@@ -1,16 +1,36 @@
set(LINK_MEM_PER_JOB 4096 CACHE INTERNAL "Maximum memory used by each link job in (in MiB)")
if(NOT DEFINED Scylla_PARALLEL_LINK_JOBS)
if(NOT DEFINED Scylla_RAM_PER_LINK_JOB)
# preserve user-provided value
set(_default_ram_value 4096)
if(Scylla_ENABLE_LTO)
# When ThinLTO optimization is enabled, the linker uses all available CPU threads.
# To prevent excessive memory usage, we limit parallel link jobs based on available RAM,
# as each link job requires significant memory during optimization.
set(_default_ram_value 16384)
endif()
set(Scylla_RAM_PER_LINK_JOB ${_default_ram_value} CACHE STRING
"Maximum amount of memory used by each link job (in MiB)")
endif()
cmake_host_system_information(
RESULT _total_mem_mb
QUERY AVAILABLE_PHYSICAL_MEMORY)
math(EXPR _link_pool_depth "${_total_mem_mb} / ${Scylla_RAM_PER_LINK_JOB}")
# Use 2 parallel link jobs to optimize build throughput. The main executable requires
# LTO (slower link phase) while tests are linked without LTO (faster link phase).
# This allows simultaneous linking of LTO and non-LTO targets, enabling better CPU
# utilization by overlapping the slower LTO link with faster test links.
if(_link_pool_depth LESS 2)
set(_link_pool_depth 2)
endif()
cmake_host_system_information(
RESULT _total_mem
QUERY AVAILABLE_PHYSICAL_MEMORY)
math(EXPR _link_pool_depth "${_total_mem} / ${LINK_MEM_PER_JOB}")
if(_link_pool_depth EQUAL 0)
set(_link_pool_depth 1)
set(Scylla_PARALLEL_LINK_JOBS "${_link_pool_depth}" CACHE STRING
"Maximum number of concurrent link jobs")
endif()
set_property(
GLOBAL
APPEND
PROPERTY JOB_POOLS
link_pool=${_link_pool_depth}
link_pool=${Scylla_PARALLEL_LINK_JOBS}
submodule_pool=1)
set(CMAKE_JOB_POOL_LINK link_pool)

View File

@@ -122,7 +122,9 @@ if(target_arch)
add_compile_options("-march=${target_arch}")
endif()
add_compile_options("SHELL:-Xclang -fexperimental-assignment-tracking=disabled")
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
add_compile_options("SHELL:-Xclang -fexperimental-assignment-tracking=disabled")
endif()
function(maybe_limit_stack_usage_in_KB stack_usage_threshold_in_KB config)
math(EXPR _stack_usage_threshold_in_bytes "${stack_usage_threshold_in_KB} * 1024")

View File

@@ -16,8 +16,6 @@
#include "collection_mutation.hh"
#include <boost/range/numeric.hpp>
bytes_view collection_mutation_input_stream::read_linearized(size_t n) {
managed_bytes_view mbv = ::read_simple_bytes(_src, n);
if (mbv.is_linearized()) {

View File

@@ -769,7 +769,7 @@ private:
}
virtual sstables::sstable_set make_sstable_set_for_input() const {
return _table_s.get_compaction_strategy().make_sstable_set(_table_s);
return _table_s.get_compaction_strategy().make_sstable_set(_schema);
}
const tombstone_gc_state& get_tombstone_gc_state() const {
@@ -1301,7 +1301,7 @@ public:
}
virtual sstables::sstable_set make_sstable_set_for_input() const override {
return sstables::make_partitioned_sstable_set(_schema, _table_s.token_range());
return sstables::make_partitioned_sstable_set(_schema, false);
}
// Unconditionally enable incremental compaction if the strategy specifies a max output size, e.g. LCS.
@@ -1910,11 +1910,7 @@ static future<compaction_result> scrub_sstables_validate_mode(sstables::compacti
using scrub = sstables::compaction_type_options::scrub;
if (validation_errors != 0 && descriptor.options.as<scrub>().quarantine_sstables == scrub::quarantine_invalid_sstables::yes) {
for (auto& sst : descriptor.sstables) {
try {
co_await sst->change_state(sstables::sstable_state::quarantine);
} catch (...) {
clogger.error("Moving {} to quarantine failed due to {}, continuing.", sst->get_filename(), std::current_exception());
}
co_await sst->change_state(sstables::sstable_state::quarantine);
}
}

View File

@@ -29,6 +29,7 @@
#include "db/system_keyspace.hh"
#include "tombstone_gc-internals.hh"
#include <cmath>
#include "utils/labels.hh"
static logging::logger cmlog("compaction_manager");
using namespace std::chrono_literals;
@@ -394,7 +395,7 @@ future<sstables::sstable_set> compaction_task_executor::sstable_set_for_tombston
auto compound_set = t.sstable_set_for_tombstone_gc();
// Compound set will be linearized into a single set, since compaction might add or remove sstables
// to it for incremental compaction to work.
auto new_set = sstables::make_partitioned_sstable_set(t.schema(), t.token_range());
auto new_set = sstables::make_partitioned_sstable_set(t.schema(), false);
co_await compound_set->for_each_sstable_gently([&] (const sstables::shared_sstable& sst) {
auto inserted = new_set.insert(sst);
if (!inserted) {
@@ -997,7 +998,7 @@ void compaction_manager::register_metrics() {
_metrics.add_group("compaction_manager", {
sm::make_gauge("compactions", [this] { return _stats.active_tasks; },
sm::description("Holds the number of currently active compactions.")),
sm::description("Holds the number of currently active compactions."))(basic_level),
sm::make_gauge("pending_compactions", [this] { return _stats.pending_tasks; },
sm::description("Holds the number of compaction tasks waiting for an opportunity to run.")),
sm::make_counter("completed_compactions", [this] { return _stats.completed_tasks; },
@@ -1127,16 +1128,16 @@ future<> compaction_manager::drain() {
// Disable the state so that it can be enabled later if requested.
_state = state::disabled;
}
_compaction_submission_timer.cancel();
// Stop ongoing compactions, if the request has not been sent already and wait for them to stop.
co_await stop_ongoing_compactions("drain");
// Trigger a signal to properly exit from postponed_compactions_reevaluation() fiber
reevaluate_postponed_compactions();
cmlog.info("Drained");
}
future<> compaction_manager::stop() {
do_stop();
if (auto cm = std::exchange(_task_manager_module, nullptr)) {
co_await cm->stop();
}
if (_stop_future) {
co_await std::exchange(*_stop_future, make_ready_future());
}
@@ -1147,15 +1148,14 @@ future<> compaction_manager::really_do_stop() noexcept {
// Reset the metrics registry
_metrics.clear();
co_await stop_ongoing_compactions("shutdown");
co_await _task_manager_module->stop();
if (!_tasks.empty()) {
on_fatal_internal_error(cmlog, format("{} tasks still exist after being stopped", _tasks.size()));
}
co_await coroutine::parallel_for_each(_compaction_state | std::views::values, [] (compaction_state& cs) -> future<> {
if (!cs.gate.is_closed()) {
co_await cs.gate.close();
}
});
if (!_tasks.empty()) {
on_fatal_internal_error(cmlog, format("{} tasks still exist after being stopped", _tasks.size()));
}
reevaluate_postponed_compactions();
co_await std::move(_waiting_reevalution);
_weight_tracker.clear();
@@ -1818,21 +1818,8 @@ future<compaction_manager::compaction_stats_opt> compaction_manager::perform_sst
if (!gh) {
co_return compaction_stats_opt{};
}
// Collect and register all sstables as compacting while compaction is disabled, to avoid a race condition where
// regular compaction runs in between and picks the same files.
std::vector<sstables::shared_sstable> all_sstables;
compacting_sstable_registration compacting(*this, get_compaction_state(&t));
co_await run_with_compaction_disabled(t, [&all_sstables, &compacting, &t] () -> future<> {
// All sstables must be included.
all_sstables = get_all_sstables(t);
compacting.register_compacting(all_sstables);
return make_ready_future<>();
});
if (all_sstables.empty()) {
co_return compaction_stats_opt{};
}
// All sstables must be included, even the ones being compacted, such that everything in table is validated.
auto all_sstables = get_all_sstables(t);
co_return co_await perform_compaction<validate_sstables_compaction_task_executor>(throw_if_stopping::no, info, &t, info.id, std::move(all_sstables), quarantine_sstables);
}

View File

@@ -300,11 +300,6 @@ public:
// unless it is moved back to enabled state.
future<> drain();
// Check if compaction manager is running, i.e. it was enabled or drained
bool is_running() const noexcept {
return _state == state::enabled || _state == state::disabled;
}
using compaction_history_consumer = noncopyable_function<future<>(const db::compaction_history_entry&)>;
future<> get_compaction_history(compaction_history_consumer&& f);

View File

@@ -789,8 +789,8 @@ future<reshape_config> make_reshape_config(const sstables::storage& storage, res
};
}
std::unique_ptr<sstable_set_impl> incremental_compaction_strategy::make_sstable_set(const table_state& ts) const {
return std::make_unique<partitioned_sstable_set>(ts.schema(), ts.token_range());
std::unique_ptr<sstable_set_impl> incremental_compaction_strategy::make_sstable_set(schema_ptr schema) const {
return std::make_unique<partitioned_sstable_set>(std::move(schema), false);
}
}

View File

@@ -105,7 +105,7 @@ public:
return name(type());
}
sstable_set make_sstable_set(const table_state& ts) const;
sstable_set make_sstable_set(schema_ptr schema) const;
compaction_backlog_tracker make_backlog_tracker() const;

View File

@@ -56,7 +56,7 @@ public:
return true;
}
virtual int64_t estimated_pending_compactions(table_state& table_s) const = 0;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(const table_state& ts) const;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(schema_ptr schema) const;
bool use_clustering_key_filter() const {
return _use_clustering_key_filter;

View File

@@ -6,7 +6,6 @@
#include "incremental_backlog_tracker.hh"
#include "sstables/sstables.hh"
#include <boost/range/adaptor/map.hpp>
using namespace sstables;

View File

@@ -11,10 +11,6 @@
#include "compaction_manager.hh"
#include "incremental_compaction_strategy.hh"
#include "incremental_backlog_tracker.hh"
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/range/numeric.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
#include <ranges>
namespace sstables {
@@ -152,7 +148,7 @@ bool incremental_compaction_strategy::is_bucket_interesting(const std::vector<ss
}
bool incremental_compaction_strategy::is_any_bucket_interesting(const std::vector<std::vector<sstables::frozen_sstable_run>>& buckets, size_t min_threshold) const {
return boost::algorithm::any_of(buckets, [&] (const std::vector<sstables::frozen_sstable_run>& bucket) {
return std::ranges::any_of(buckets, [&] (const std::vector<sstables::frozen_sstable_run>& bucket) {
return this->is_bucket_interesting(bucket, min_threshold);
});
}
@@ -282,7 +278,7 @@ incremental_compaction_strategy::find_garbage_collection_job(const compaction::t
};
auto compaction_time = gc_clock::now();
auto can_garbage_collect = [&] (const size_bucket_t& bucket) {
return boost::algorithm::any_of(bucket, [&] (const frozen_sstable_run& r) {
return std::ranges::any_of(bucket, [&] (const frozen_sstable_run& r) {
return worth_dropping_tombstones(*r, compaction_time);
});
};
@@ -418,11 +414,10 @@ int64_t incremental_compaction_strategy::estimated_pending_compactions(table_sta
std::vector<shared_sstable>
incremental_compaction_strategy::runs_to_sstables(std::vector<frozen_sstable_run> runs) {
return boost::accumulate(runs, std::vector<shared_sstable>(), [&] (std::vector<shared_sstable>&& v, const frozen_sstable_run& run_ptr) {
auto& run = *run_ptr;
v.insert(v.end(), run.all().begin(), run.all().end());
return std::move(v);
});
return runs
| std::views::transform([] (auto& run) -> auto& { return run->all(); })
| std::views::join
| std::ranges::to<std::vector>();
}
std::vector<frozen_sstable_run>
@@ -441,8 +436,8 @@ void incremental_compaction_strategy::sort_run_bucket_by_first_key(size_bucket_t
auto sst_first_key_less = [&schema] (const shared_sstable& sst_a, const shared_sstable& sst_b) {
return sst_a->get_first_decorated_key().tri_compare(*schema, sst_b->get_first_decorated_key()) <= 0;
};
auto& a_first = *boost::min_element(a->all(), sst_first_key_less);
auto& b_first = *boost::min_element(b->all(), sst_first_key_less);
auto& a_first = *std::ranges::min_element(a->all(), sst_first_key_less);
auto& b_first = *std::ranges::min_element(b->all(), sst_first_key_less);
return a_first->get_first_decorated_key().tri_compare(*schema, b_first->get_first_decorated_key()) <= 0;
});
}

View File

@@ -7,7 +7,6 @@
#pragma once
#include "compaction_strategy_impl.hh"
#include <boost/range/algorithm.hpp>
class incremental_backlog_tracker;
@@ -99,7 +98,7 @@ public:
virtual compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const override;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(const table_state& ts) const override;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(schema_ptr schema) const override;
friend class ::incremental_backlog_tracker;
};

View File

@@ -11,8 +11,6 @@
#include "compaction_strategy_state.hh"
#include <algorithm>
#include <boost/range/algorithm/remove_if.hpp>
namespace sstables {
leveled_compaction_strategy_state& leveled_compaction_strategy::get_state(table_state& table_s) const {
@@ -50,10 +48,9 @@ compaction_descriptor leveled_compaction_strategy::get_sstables_for_compaction(t
for (auto level = int(manifest.get_level_count()); level >= 0; level--) {
auto& sstables = manifest.get_level(level);
// filter out sstables which droppable tombstone ratio isn't greater than the defined threshold.
auto e = boost::range::remove_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
std::erase_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s);
});
sstables.erase(e, sstables.end());
if (sstables.empty()) {
continue;
}

View File

@@ -70,7 +70,7 @@ public:
virtual compaction_strategy_type type() const override {
return compaction_strategy_type::leveled;
}
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(const table_state& ts) const override;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(schema_ptr schema) const override;
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;

View File

@@ -11,8 +11,6 @@
#include "size_tiered_compaction_strategy.hh"
#include "cql3/statements/property_definitions.hh"
#include <boost/range/algorithm.hpp>
namespace sstables {
static long validate_sstable_size(const std::map<sstring, sstring>& options) {
@@ -242,10 +240,9 @@ size_tiered_compaction_strategy::get_sstables_for_compaction(table_state& table_
// tombstone purge, i.e. less likely to shadow even older data.
for (auto&& sstables : buckets | std::views::reverse) {
// filter out sstables which droppable tombstone ratio isn't greater than the defined threshold.
auto e = boost::range::remove_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
std::erase_if(sstables, [this, compaction_time, &table_s] (const sstables::shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s);
});
sstables.erase(e, sstables.end());
if (sstables.empty()) {
continue;
}

View File

@@ -33,7 +33,6 @@ namespace compaction {
class table_state {
public:
virtual ~table_state() {}
virtual dht::token_range token_range() const noexcept = 0;
virtual const schema_ptr& schema() const noexcept = 0;
// min threshold as defined by table.
virtual unsigned min_compaction_threshold() const noexcept = 0;

View File

@@ -6,8 +6,6 @@
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <boost/range/algorithm/min_element.hpp>
#include <boost/range/numeric.hpp>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/coroutine/parallel_for_each.hh>
@@ -109,7 +107,7 @@ distribute_reshard_jobs(sstables::sstable_directory::sstable_open_info_vector so
// Choose the stable shard owner with the smallest amount of accumulated work.
// Note that for sstables that need cleanup via resharding, owners may contain
// a single shard.
auto shard_it = boost::min_element(info.owners, [&] (const shard_id& lhs, const shard_id& rhs) {
auto shard_it = std::ranges::min_element(info.owners, [&] (const shard_id& lhs, const shard_id& rhs) {
return destinations[lhs].total_size_smaller(destinations[rhs]);
});
auto& dest = destinations[*shard_it];

View File

@@ -13,10 +13,6 @@
#include "sstables/sstables.hh"
#include "compaction_strategy_state.hh"
#include <boost/range/algorithm/find.hpp>
#include <boost/range/algorithm/remove_if.hpp>
#include <boost/range/algorithm/min_element.hpp>
#include <ranges>
namespace sstables {
@@ -167,7 +163,7 @@ public:
explicit classify_by_timestamp(time_window_compaction_strategy_options options) : _options(std::move(options)) { }
int64_t operator()(api::timestamp_type ts) {
const auto window = time_window_compaction_strategy::get_window_for(_options, ts);
if (const auto it = boost::range::find(_known_windows, window); it != _known_windows.end()) {
if (const auto it = std::ranges::find(_known_windows, window); it != _known_windows.end()) {
std::swap(*it, _known_windows.front());
return window;
}
@@ -400,14 +396,13 @@ time_window_compaction_strategy::get_next_non_expired_sstables(table_state& tabl
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
auto e = boost::range::remove_if(non_expiring_sstables, [this, compaction_time, &table_s] (const shared_sstable& sst) -> bool {
std::erase_if(non_expiring_sstables, [this, compaction_time, &table_s] (const shared_sstable& sst) -> bool {
return !worth_dropping_tombstones(sst, compaction_time, table_s);
});
non_expiring_sstables.erase(e, non_expiring_sstables.end());
if (non_expiring_sstables.empty()) {
return {};
}
auto it = boost::min_element(non_expiring_sstables, [] (auto& i, auto& j) {
auto it = std::ranges::min_element(non_expiring_sstables, [] (auto& i, auto& j) {
return i->get_stats_metadata().min_timestamp < j->get_stats_metadata().min_timestamp;
});
return { *it };

View File

@@ -150,7 +150,7 @@ public:
return compaction_strategy_type::time_window;
}
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(const table_state& ts) const override;
virtual std::unique_ptr<sstable_set_impl> make_sstable_set(schema_ptr schema) const override;
virtual std::unique_ptr<compaction_backlog_tracker::impl> make_backlog_tracker() const override;

View File

@@ -255,9 +255,6 @@ public:
// Returns true iff given prefix has no missing components
bool is_full(managed_bytes_view v) const {
SCYLLA_ASSERT(AllowPrefixes == allow_prefixes::yes);
if (_types.size() == 0) {
return v.empty();
}
return std::distance(begin(v), end(v)) == (ssize_t)_types.size();
}
bool is_empty(managed_bytes_view v) const {

View File

@@ -14,7 +14,9 @@
#include "exceptions/exceptions.hh"
#include "utils/class_registrator.hh"
const sstring compressor::namespace_prefix = "org.apache.cassandra.io.compress.";
sstring compressor::make_name(std::string_view short_name) {
return seastar::format("org.apache.cassandra.io.compress.{}", short_name);
}
class lz4_processor: public compressor {
public:
@@ -66,7 +68,7 @@ compressor::ptr_type compressor::create(const sstring& name, const opt_getter& o
return {};
}
qualified_name qn(namespace_prefix, name);
qualified_name qn(make_name(""), name);
for (auto& c : { lz4, snappy, deflate }) {
if (c->name() == static_cast<const sstring&>(qn)) {
@@ -91,9 +93,9 @@ shared_ptr<compressor> compressor::create(const std::map<sstring, sstring>& opti
return {};
}
thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(namespace_prefix + "LZ4Compressor");
thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(namespace_prefix + "SnappyCompressor");
thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(namespace_prefix + "DeflateCompressor");
thread_local const shared_ptr<compressor> compressor::lz4 = ::make_shared<lz4_processor>(make_name("LZ4Compressor"));
thread_local const shared_ptr<compressor> compressor::snappy = ::make_shared<snappy_processor>(make_name("SnappyCompressor"));
thread_local const shared_ptr<compressor> compressor::deflate = ::make_shared<deflate_processor>(make_name("DeflateCompressor"));
const sstring compression_parameters::SSTABLE_COMPRESSION = "sstable_compression";
const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_in_kb";

View File

@@ -69,7 +69,7 @@ public:
static thread_local const ptr_type snappy;
static thread_local const ptr_type deflate;
static const sstring namespace_prefix;
static sstring make_name(std::string_view short_name);
};
template<typename BaseType, typename... Args>

View File

@@ -15,6 +15,7 @@
#include "types/map.hh"
#include "types/set.hh"
#include "types/tuple.hh"
#include "types/vector.hh"
#include "types/user.hh"
#include "utils/big_decimal.hh"
@@ -170,6 +171,7 @@ template <typename Func> concept CanHandleAllTypes = requires(Func f) {
{ f(*static_cast<const utf8_type_impl*>(nullptr)) } -> std::same_as<visit_ret_type<Func>>;
{ f(*static_cast<const uuid_type_impl*>(nullptr)) } -> std::same_as<visit_ret_type<Func>>;
{ f(*static_cast<const varint_type_impl*>(nullptr)) } -> std::same_as<visit_ret_type<Func>>;
{ f(*static_cast<const vector_type_impl*>(nullptr)) } -> std::same_as<visit_ret_type<Func>>;
};
template<typename Func>
@@ -232,6 +234,8 @@ inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
return f(*static_cast<const uuid_type_impl*>(&t));
case abstract_type::kind::varint:
return f(*static_cast<const varint_type_impl*>(&t));
case abstract_type::kind::vector:
return f(*static_cast<const vector_type_impl*>(&t));
}
__builtin_unreachable();
}

View File

@@ -473,6 +473,7 @@ commitlog_total_space_in_mb: -1
# certficate_revocation_list: <not set>
# require_client_auth: False
# priority_string: <not set, use default>
# enable_session_tickets: <default false>
# internode_compression controls whether traffic between nodes is
# compressed.
@@ -662,6 +663,153 @@ maintenance_socket: ignore
# maximum_replication_factor_warn_threshold: -1
# maximum_replication_factor_fail_threshold: -1
#
# System information encryption settings
#
# If enabled, system tables that may contain sensitive information (system.batchlog,
# system.paxos), hints files and commit logs are encrypted with the
# encryption settings below.
#
# When enabling system table encryption on a node with existing data, run
# `nodetool upgradesstables -a` on the listed tables to encrypt existing data.
#
# When tracing is enabled, sensitive info will be written into the tables in the
# system_traces keyspace. Those tables should be configured to encrypt their data
# on disk.
#
# It is recommended to use remote encryption keys from a KMIP server/KMS when using
# Transparent Data Encryption (TDE) features.
# Local key support is provided when a KMIP server/KMS is not available.
#
# See the scylla documentation for more info on available key providers and
# their properties.
#
# system_info_encryption:
# enabled: true
# cipher_algorithm: AES
# secret_key_strength: 128
# key_provider: LocalFileSystemKeyProviderFactory
# secret_key_file: <key file>
#
# system_info_encryption:
# enabled: true
# cipher_algorithm: AES
# secret_key_strength: 128
# key_provider: KmipKeyProviderFactory
# kmip_host: <kmip host group>
# template_name: <kmip key template name> (optional)
# key_namespace: <kmip key namespace> (optional)
#
#
# The directory where system keys are kept
# This directory should have 700 permissions and belong to the scylla user
#
# system_key_directory: /etc/scylla/conf/resources/system_keys
#
#
# KMIP host(s).
#
# The unique name of kmip host/cluster that can be referenced in table schema.
#
# host.yourdomain.com={ hosts=[<host1>, <host2>...], keyfile=/path/to/keyfile, truststore=/path/to/truststore.pem, key_cache_millis=<cache ms>, timeout=<timeout ms> }:...
#
# The KMIP connection management only supports failover, so all requests will go through a
# single KMIP server. There is no load balancing, as no KMIP servers (at the time of this writing)
# support read replication, or other strategies for availability.
#
# Hosts are tried in the order they appear here. Add them in the same sequence they'll fail over in.
#
# KMIP requests will fail over/retry 'max_command_retries' times (default 3)
#
# kmip_hosts:
# <name>:
# hosts: <address1[:port]> [, <address2[:port]>...]
# certificate: <identifying certificate> (optional)
# keyfile: <identifying key> (optional)
# truststore: <truststore for SSL connection> (optional)
# priority_string: <kmip tls priority string> (optional)
# username: <login> (optional>
# password: <password> (optional)
# max_command_retries: <int> (optional; default 3)
# key_cache_expiry: <key cache expiry period>
# key_cache_refresh: <key cache refresh/prune period>
# <name>:
# ...
#
#
# KMS host(s).
#
# The unique name of kms host/account config that can be referenced in table schema.
#
# host.yourdomain.com={ endpoint=<http(s)://host[:port]>, aws_access_key_id=<AWS access id>, aws_secret_access_key=<AWS secret key>, aws_region=<AWS region>, master_key=<alias or id>, keyfile=/path/to/keyfile, truststore=/path/to/truststore.pem, key_cache_millis=<cache ms>, timeout=<timeout ms> }:...
#
# Actual connection can be either an explicit endpoint (<host>:<port>), or selected automatic via aws_region.
#
# Authentication can be explicit with aws_access_key_id and aws_secret_access_key. Either secret or both can be omitted
# in which case the provider will try to read them from AWS credentials in ~/.aws/credentials. If aws_profile is set, the
# credentials in this section is used.
#
# master_key is an AWS KMS key id or alias from which all keys used for actual encryption of scylla data will be derived.
# This key must be pre-created with access policy allowing the above AWS id Encrypt, Decrypt and GenerateDataKey operations.
#
# kms_hosts:
# <name>:
# endpoint: http(s)://<host>(:port) (optional)
# aws_region: <aws region> (optional)
# aws_access_key_id: <aws access key id> (optional)
# aws_secret_access_key: <aws secret access key> (optional)
# aws_profile: <aws credentials profile> (optional)
# aws_use_ec2_credentials: <bool> (default false) If true, KMS queries will use the credentials provided by ec2 instance role metadata as initial access key.
# aws_use_ec2_region: <bool> (default false) If true, KMS queries will use the AWS region indicated by ec2 instance metadata
# aws_assume_role_arn: <aws role arn> (optional) If set, any KMS query will first attempt to assume this role.
# master_key: <named KMS key for encrypting data keys> (required)
# certificate: <identifying certificate> (optional)
# keyfile: <identifying key> (optional)
# truststore: <truststore for SSL connection> (optional)
# priority_string: <kmip tls priority string> (optional)
# key_cache_expiry: <key cache expiry period>
# key_cache_refresh: <key cache refresh/prune period>
# <name>:
# ...
#
#
# Server-global user information encryption settings
#
# If enabled, all user tables are encrypted with the
# encryption settings below, unless the table has local scylla_encryption_options
# specified.
#
# When enabling user table encryption on a node with existing data, run
# `nodetool upgradesstables -a` on all user tables to encrypt existing data.
#
# It is recommended to use remote encryption keys from a KMIP server or KMS when using
# Transparent Data Encryption (TDE) features.
# Local key support is provided when a KMIP server/KMS is not available.
#
# See the scylla documentation for more info on available key providers and
# their properties.
#
# user_info_encryption:
# enabled: true
# cipher_algorithm: AES
# secret_key_strength: 128
# key_provider: LocalFileSystemKeyProviderFactory
# secret_key_file: <key file>
#
# user_info_encryption:
# enabled: true
# cipher_algorithm: AES
# secret_key_strength: 128
# key_provider: KmipKeyProviderFactory
# kmip_host: <kmip host group>
# template_name: <kmip key template name> (optional)
# key_namespace: <kmip key namespace> (optional)
#
# Guardrails to warn about or disallow creating a keyspace with specific replication strategy.
# Each of these 2 settings is a list storing replication strategies considered harmful.
# The replication strategies to choose from are:
@@ -677,9 +825,7 @@ maintenance_socket: ignore
# Guardrail to enable the deprecated feature of CREATE TABLE WITH COMPACT STORAGE.
# enable_create_table_with_compact_storage: false
# Control tablets for new keyspaces.
# Can be set to: disabled|enabled
#
# Enable tablets for new keyspaces.
# When enabled, newly created keyspaces will have tablets enabled by default.
# That can be explicitly disabled in the CREATE KEYSPACE query
# by using the `tablets = {'enabled': false}` replication option.
@@ -688,15 +834,6 @@ maintenance_socket: ignore
# unless tablets are explicitly enabled in the CREATE KEYSPACE query
# by using the `tablets = {'enabled': true}` replication option.
#
# When set to `enforced`, newly created keyspaces will always have tablets enabled by default.
# This prevents explicitly disabling tablets in the CREATE KEYSPACE query
# using the `tablets = {'enabled': false}` replication option.
# It also mandates a replication strategy supporting tablets, like
# NetworkTopologyStrategy
#
# Note that creating keyspaces with tablets enabled or disabled is irreversible.
# The `tablets` option cannot be changed using `ALTER KEYSPACE`.
tablets_mode_for_new_keyspaces: enabled
# Enforce RF-rack-valid keyspaces.
rf_rack_valid_keyspaces: false
enable_tablets: true

View File

@@ -795,7 +795,6 @@ scylla_core = (['message/messaging_service.cc',
'frozen_schema.cc',
'bytes.cc',
'timeout_config.cc',
'row_cache.cc',
'schema_mutations.cc',
'generic_server.cc',
'utils/alien_worker.cc',
@@ -813,10 +812,10 @@ scylla_core = (['message/messaging_service.cc',
'utils/rjson.cc',
'utils/human_readable.cc',
'utils/histogram_metrics_helper.cc',
'utils/io-wrappers.cc',
'utils/on_internal_error.cc',
'utils/pretty_printers.cc',
'utils/stream_compressor.cc',
'utils/labels.cc',
'converting_mutation_partition_applier.cc',
'readers/combined.cc',
'readers/multishard.cc',
@@ -976,44 +975,45 @@ scylla_core = (['message/messaging_service.cc',
'cql3/restrictions/statement_restrictions.cc',
'cql3/result_set.cc',
'cql3/prepare_context.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/virtual_table.cc',
'db/virtual_tables.cc',
'db/system_distributed_keyspace.cc',
'db/size_estimates_virtual_reader.cc',
'db/schema_applier.cc',
'db/schema_tables.cc',
'db/cql_type_parser.cc',
'db/legacy_schema_migrator.cc',
'db/batchlog_manager.cc',
'db/commitlog/commitlog.cc',
'db/commitlog/commitlog_replayer.cc',
'db/commitlog/commitlog_entry.cc',
'db/commitlog/commitlog_replayer.cc',
'db/config.cc',
'db/consistency_level.cc',
'db/cql_type_parser.cc',
'db/data_listeners.cc',
'db/extensions.cc',
'db/functions/function.cc',
'db/heat_load_balance.cc',
'db/hints/host_filter.cc',
'db/hints/internal/hint_endpoint_manager.cc',
'db/hints/internal/hint_sender.cc',
'db/hints/internal/hint_storage.cc',
'db/hints/manager.cc',
'db/hints/resource_manager.cc',
'db/hints/host_filter.cc',
'db/hints/sync_point.cc',
'db/config.cc',
'db/extensions.cc',
'db/heat_load_balance.cc',
'db/large_data_handler.cc',
'db/corrupt_data_handler.cc',
'db/legacy_schema_migrator.cc',
'db/marshal/type_parser.cc',
'db/batchlog_manager.cc',
'db/per_partition_rate_limit_options.cc',
'db/rate_limiter.cc',
'db/row_cache.cc',
'db/schema_applier.cc',
'db/schema_tables.cc',
'db/size_estimates_virtual_reader.cc',
'db/snapshot-ctl.cc',
'db/snapshot/backup_task.cc',
'db/sstables-format-selector.cc',
'db/system_distributed_keyspace.cc',
'db/system_keyspace.cc',
'db/tags/utils.cc',
'db/view/row_locking.cc',
'db/view/view.cc',
'db/view/view_update_generator.cc',
'db/view/row_locking.cc',
'db/sstables-format-selector.cc',
'db/snapshot-ctl.cc',
'db/rate_limiter.cc',
'db/per_partition_rate_limit_options.cc',
'db/snapshot/backup_task.cc',
'db/virtual_table.cc',
'db/virtual_tables.cc',
'db/tablet_options.cc',
'index/secondary_index_manager.cc',
'index/secondary_index.cc',
'utils/UUID_gen.cc',
@@ -1029,10 +1029,14 @@ scylla_core = (['message/messaging_service.cc',
'utils/multiprecision_int.cc',
'utils/gz/crc_combine.cc',
'utils/gz/crc_combine_table.cc',
'utils/http.cc',
'utils/s3/aws_error.cc',
'utils/s3/client.cc',
'utils/s3/retry_strategy.cc',
'utils/s3/credentials_providers/aws_credentials_provider.cc',
'utils/s3/credentials_providers/environment_aws_credentials_provider.cc',
'utils/s3/credentials_providers/instance_profile_credentials_provider.cc',
'utils/s3/credentials_providers/sts_assume_role_credentials_provider.cc',
'utils/s3/credentials_providers/aws_credentials_provider_chain.cc',
'utils/advanced_rpc_compressor.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
@@ -1102,7 +1106,7 @@ scylla_core = (['message/messaging_service.cc',
'utils/lister.cc',
'repair/repair.cc',
'repair/row_level.cc',
'streaming/table_check.cc',
'repair/table_check.cc',
'exceptions/exceptions.cc',
'auth/allow_all_authenticator.cc',
'auth/allow_all_authorizer.cc',
@@ -1324,7 +1328,6 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/replica_exception.idl.hh',
'idl/per_partition_rate_limit_info.idl.hh',
'idl/position_in_partition.idl.hh',
'idl/full_position.idl.hh',
'idl/experimental/broadcast_tables_lang.idl.hh',
'idl/storage_service.idl.hh',
'idl/join_node.idl.hh',
@@ -1332,7 +1335,7 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/gossip.idl.hh',
'idl/migration_manager.idl.hh',
"idl/node_ops.idl.hh",
"idl/tasks.idl.hh"
]
scylla_tests_generic_dependencies = [
@@ -1349,6 +1352,7 @@ scylla_tests_dependencies = scylla_core + alternator + idls + scylla_tests_gener
'test/lib/cql_assertions.cc',
'test/lib/result_set_assertions.cc',
'test/lib/mutation_source_test.cc',
'test/lib/mutation_assertions.cc',
'test/lib/sstable_utils.cc',
'test/lib/data_model.cc',
'test/lib/exception_utils.cc',
@@ -1546,13 +1550,14 @@ deps['test/boost/bytes_ostream_test'] = [
"bytes.cc",
"utils/managed_bytes.cc",
"utils/logalloc.cc",
"utils/labels.cc",
"utils/dynamic_bitset.cc",
"test/lib/log.cc",
]
deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'utils/labels.cc']
deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
deps['test/boost/estimated_histogram_test'] = ['test/boost/estimated_histogram_test.cc']
deps['test/boost/summary_test'] = ['test/boost/summary_test.cc']
@@ -1661,7 +1666,14 @@ defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
total_memory = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
# assuming each link job takes around 7GiB of memory without LTO
link_pool_depth = max(int(total_memory / 7e9), 1)
if args.lto:
# ThinLTO provides its own parallel linking, use 16GiB for RAM size used
# by each link job
depth_with_lto = max(int(total_memory / 16e9), 2)
if depth_with_lto < link_pool_depth:
link_pool_depth = depth_with_lto
selected_modes = args.selected_modes or modes.keys()
default_modes = args.selected_modes or [mode for mode, mode_cfg in modes.items() if mode_cfg["default"]]
@@ -2141,13 +2153,14 @@ def get_extra_cxxflags(mode, mode_config, cxx, debuginfo):
if debuginfo and mode_config['can_have_debug_info']:
cxxflags += ['-g', '-gz']
# Since AssignmentTracking was enabled by default in clang
# (llvm/llvm-project@de6da6ad55d3ca945195d1cb109cb8efdf40a52a)
# coroutine frame debugging info (`coro_frame_ty`) is broken.
#
# It seems that we aren't losing much by disabling AssigmentTracking,
# so for now we choose to disable it to get `coro_frame_ty` back.
cxxflags.append('-Xclang -fexperimental-assignment-tracking=disabled')
if 'clang' in cxx:
# Since AssignmentTracking was enabled by default in clang
# (llvm/llvm-project@de6da6ad55d3ca945195d1cb109cb8efdf40a52a)
# coroutine frame debugging info (`coro_frame_ty`) is broken.
#
# It seems that we aren't losing much by disabling AssigmentTracking,
# so for now we choose to disable it to get `coro_frame_ty` back.
cxxflags.append('-Xclang -fexperimental-assignment-tracking=disabled')
return cxxflags

View File

@@ -709,23 +709,17 @@ batchStatement returns [std::unique_ptr<cql3::statements::raw::batch_statement>
: K_BEGIN
( K_UNLOGGED { type = btype::UNLOGGED; } | K_COUNTER { type = btype::COUNTER; } )?
K_BATCH ( usingClause[attrs] )?
( s=batchStatementObjective ';'?
{
auto&& stmt = *$s.statement;
stmt->add_raw(sstring{$s.text});
statements.push_back(std::move(stmt));
} )*
( s=batchStatementObjective ';'? { statements.push_back(std::move(s)); } )*
K_APPLY K_BATCH
{
$expr = std::make_unique<cql3::statements::raw::batch_statement>(type, std::move(attrs), std::move(statements));
}
;
batchStatementObjective returns [::lw_shared_ptr<std::unique_ptr<cql3::statements::raw::modification_statement>> statement]
@init { using original_ret_type = std::unique_ptr<cql3::statements::raw::modification_statement>; }
: i=insertStatement { $statement = make_lw_shared<original_ret_type>(std::move(i)); }
| u=updateStatement { $statement = make_lw_shared<original_ret_type>(std::move(u)); }
| d=deleteStatement { $statement = make_lw_shared<original_ret_type>(std::move(d)); }
batchStatementObjective returns [std::unique_ptr<cql3::statements::raw::modification_statement> statement]
: i=insertStatement { $statement = std::move(i); }
| u=updateStatement { $statement = std::move(u); }
| d=deleteStatement { $statement = std::move(d); }
;
dropAggregateStatement returns [std::unique_ptr<cql3::statements::drop_aggregate_statement> expr]
@@ -1627,7 +1621,7 @@ collectionLiteral returns [uexpression value]
@init{ std::vector<expression> l; }
: '['
( t1=term { l.push_back(std::move(t1)); } ( ',' tn=term { l.push_back(std::move(tn)); } )* )?
']' { $value = collection_constructor{collection_constructor::style_type::list, std::move(l)}; }
']' { $value = collection_constructor{collection_constructor::style_type::list_or_vector, std::move(l)}; }
| '{' t=term v=setOrMapLiteral[t] { $value = std::move(v); } '}'
// Note that we have an ambiguity between maps and set for "{}". So we force it to a set literal,
// and deal with it later based on the type of the column (SetLiteral.java).
@@ -1771,7 +1765,7 @@ subscriptExpr returns [uexpression e]
;
singleColumnInValuesOrMarkerExpr returns [uexpression e]
: values=singleColumnInValues { e = collection_constructor{collection_constructor::style_type::list, std::move(values)}; }
: values=singleColumnInValues { e = collection_constructor{collection_constructor::style_type::list_or_vector, std::move(values)}; }
| m=marker { e = std::move(m); }
;
@@ -1849,7 +1843,7 @@ relation returns [uexpression e]
| name=cident K_IN in_values=singleColumnInValues
{ $e = binary_operator(unresolved_identifier{std::move(name)}, oper_t::IN,
collection_constructor {
.style = collection_constructor::style_type::list,
.style = collection_constructor::style_type::list_or_vector,
.elements = std::move(in_values)
}); }
| name=cident K_NOT K_IN marker1=marker
@@ -1857,7 +1851,7 @@ relation returns [uexpression e]
| name=cident K_NOT K_IN in_values=singleColumnInValues
{ $e = binary_operator(unresolved_identifier{std::move(name)}, oper_t::NOT_IN,
collection_constructor {
.style = collection_constructor::style_type::list,
.style = collection_constructor::style_type::list_or_vector,
.elements = std::move(in_values)
}); }
| name=cident K_CONTAINS { rt = oper_t::CONTAINS; } (K_KEY { rt = oper_t::CONTAINS_KEY; })?
@@ -1871,7 +1865,7 @@ relation returns [uexpression e]
ids,
oper_t::IN,
collection_constructor {
.style = collection_constructor::style_type::list,
.style = collection_constructor::style_type::list_or_vector,
.elements = std::vector<expression>()
}
);
@@ -1890,7 +1884,7 @@ relation returns [uexpression e]
ids,
oper_t::IN,
collection_constructor {
.style = collection_constructor::style_type::list,
.style = collection_constructor::style_type::list_or_vector,
.elements = std::move(literals)
}
);
@@ -1901,7 +1895,7 @@ relation returns [uexpression e]
ids,
oper_t::IN,
collection_constructor {
.style = collection_constructor::style_type::list,
.style = collection_constructor::style_type::list_or_vector,
.elements = std::move(markers)
}
);
@@ -1957,6 +1951,7 @@ comparator_type [bool internal] returns [shared_ptr<cql3_type::raw> t]
: n=native_or_internal_type[internal] { $t = cql3_type::raw::from(n); }
| c=collection_type[internal] { $t = c; }
| tt=tuple_type[internal] { $t = tt; }
| vt=vector_type { $t = vt; }
| id=userTypeName { $t = cql3::cql3_type::raw::user_type(std::move(id)); }
| K_FROZEN '<' f=comparator_type[internal] '>'
{
@@ -2041,6 +2036,15 @@ tuple_type [bool internal] returns [shared_ptr<cql3::cql3_type::raw> t]
'>' { $t = cql3::cql3_type::raw::tuple(std::move(types)); }
;
vector_type returns [shared_ptr<cql3::cql3_type::raw> pt]
: K_VECTOR '<' t=comparator_type[false] ',' d=INTEGER '>'
{
if ($d.text[0] == '-')
throw exceptions::invalid_request_exception("Vectors must have a dimension greater than 0");
$pt = cql3::cql3_type::raw::vector(t, std::stoul($d.text));
}
;
username returns [sstring str]
: t=IDENT { $str = $t.text; }
| t=STRING_LITERAL { $str = $t.text; }
@@ -2106,6 +2110,7 @@ basic_unreserved_keyword returns [sstring str]
| K_STATIC
| K_FROZEN
| K_TUPLE
| K_VECTOR
| K_FUNCTION
| K_FUNCTIONS
| K_AGGREGATE
@@ -2298,6 +2303,7 @@ K_LIST: L I S T;
K_NAN: N A N;
K_INFINITY: I N F I N I T Y;
K_TUPLE: T U P L E;
K_VECTOR: V E C T O R;
K_TRIGGER: T R I G G E R;
K_STATIC: S T A T I C;

View File

@@ -20,6 +20,7 @@
#include "types/map.hh"
#include "types/set.hh"
#include "types/list.hh"
#include "types/vector.hh"
#include "concrete_types.hh"
namespace cql3 {
@@ -49,6 +50,7 @@ static cql3_type::kind get_cql3_kind(const abstract_type& t) {
cql3_type::kind operator()(const varint_type_impl&) { return cql3_type::kind::VARINT; }
cql3_type::kind operator()(const reversed_type_impl& r) { return get_cql3_kind(*r.underlying_type()); }
cql3_type::kind operator()(const tuple_type_impl&) { SCYLLA_ASSERT(0 && "no kind for this type"); }
cql3_type::kind operator()(const vector_type_impl&) { SCYLLA_ASSERT(0 && "no kind for this type"); }
cql3_type::kind operator()(const collection_type_impl&) { SCYLLA_ASSERT(0 && "no kind for this type"); }
};
return visit(t, visitor{});
@@ -303,6 +305,56 @@ public:
}
};
class cql3_type::raw_vector : public raw {
shared_ptr<raw> _type;
size_t _dimension;
// This limitation is acquired from the maximum number of dimensions in OpenSearch.
static constexpr size_t MAX_VECTOR_DIMENSION = 16000;
virtual sstring to_string() const override {
return seastar::format("vector<{}, {}>", _type, _dimension);
}
public:
raw_vector(shared_ptr<raw> type, size_t dimension)
: _type(std::move(type)), _dimension(dimension) {
}
virtual bool supports_freezing() const override {
return true;
}
virtual bool is_collection() const override {
return false;
}
virtual void freeze() override {
_frozen = true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, const data_dictionary::user_types_metadata& user_types) override {
if (!is_frozen()) {
freeze();
}
if (_dimension == 0) {
throw exceptions::invalid_request_exception("Vectors must have a dimension greater than 0");
}
if (_dimension > MAX_VECTOR_DIMENSION) {
throw exceptions::invalid_request_exception(
format("Vectors must have a dimension less than or equal to {}", MAX_VECTOR_DIMENSION));
}
return vector_type_impl::get_instance(_type->prepare_internal(keyspace, user_types).get_type(), _dimension)->as_cql3_type();
}
bool references_user_type(const sstring& name) const override {
return _type->references_user_type(name);
}
};
bool
cql3_type::raw::is_collection() const {
return false;
@@ -364,6 +416,11 @@ cql3_type::raw::tuple(std::vector<shared_ptr<raw>> ts) {
return ::make_shared<raw_tuple>(std::move(ts));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::vector(shared_ptr<raw> t, size_t dimension) {
return ::make_shared<raw_vector>(std::move(t), dimension);
}
shared_ptr<cql3_type::raw>
cql3_type::raw::frozen(shared_ptr<raw> t) {
t->freeze();

Some files were not shown because too many files have changed in this diff Show More