Commit Graph

46964 Commits

Author SHA1 Message Date
Yaron Kaikov
dcbc6c839d ./github/scripts/auto-backport.py: don't remove backport label when backport process has an error
Today, when the `Fixes` prefix is missing or the developer is not a collaborator with `scylladbbot` we remove the backport labels to prevent the process from starting and notifying the developers.

Developers are worried that removing these backport labels will cause us to forget we need to do these backports. @nyh suggested to add a `scylladbbot/backport_error` label instead

Applied those changes, so when a `Fixes` prefix is missing we will add a `scylladbbot/backport_error` label and stop the process

When a user doesn't accept the invite we will still open the PR but he will not be assigned and will not be able to edit the branch when we have conflicts

Fixes: https://github.com/scylladb/scylla-pkg/issues/4898
Fixes: https://github.com/scylladb/scylla-pkg/issues/4897
2025-03-18 13:58:59 +02:00
Avi Kivity
f18e8edcb7 Merge 'dist/docker: switch to UBI9' from Takuya ASADA
Switch container base image to UBI9, and make it ready for Red Hat
OpenShift Certification.

Fixes https://github.com/scylladb/scylla-pkg/issues/4858

Closes scylladb/scylladb#22910

* github.com:scylladb/scylladb:
  dist/docker: run the container as non-root user
  dist/docker: switch to UBI9
2025-03-10 15:33:30 +02:00
Luis Freitas
09e790d5af .github: Update github action for triggering next gating
Before we were using a marketplace Github action which had some limitations.
With this pull request we are updating the github action using curl option which will gives us full control of the flow instead of relying on pre made github action.

Fixes: scylladb#23088

Closes scylladb/scylladb#23215
2025-03-10 14:38:08 +02:00
Piotr Szymaniak
b6ba573dfe HACKING.md: Provide step-by-step support to enable development with CLion
Claim that building with CMake files is just 'not supported' instead of
not intended, especially that there are attempts to enable this.

Remove the obsolete mention of the `FOR_IDE` flag.

Closes scylladb/scylladb#22890
2025-03-09 16:22:24 +02:00
Ernest Zaslavsky
6a3cef5703 metadata: Correct "DESCRIBE" output for keyspace metadata
Update the "DESCRIBE" command output to accurately display `tablet` settings in keyspace metadata.

Closes scylladb/scylladb#23056
2025-03-09 14:50:08 +02:00
Anna Stuchlik
9ac0aa7bba doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348
2025-03-07 16:39:02 +01:00
Kefu Chai
2a9966a20e gms: Fix fmt formatter for gossip_digest_sync
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.

This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.

Fixes #23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23155
2025-03-07 15:36:03 +01:00
Robert Bindar
27f2d64725 Remove object storage config credentials provider
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to exopose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#22951
2025-03-07 10:40:58 +03:00
Luis Freitas
84b30d11ec .github: trigger Jenkins job using github action
This action will help preventing next-trigger for running every 15 minutes.

This action will run on push for a specific branch (next, next-enterprise, 2024.x, x.x)

Fixes: scylladb#23088

update action

Closes scylladb/scylladb#23141
2025-03-07 06:41:58 +02:00
Avi Kivity
28906c9261 Merge 'scylla-sstable: introduce the query command' from Botond Dénes
The scylla-sstable dump-* command suite has proven invaluable  in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data`  is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.

Select everything from a table:

    $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
     keyspace_name                 | durable_writes | replication
    -------------------------------+----------------+-------------------------------------------------------------------------------------
            system_replicated_keys |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})
                       system_auth |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
                     system_schema |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                system_distributed |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
                            system |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                                ks |           true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
                     system_traces |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
     system_distributed_everywhere |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})

Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):

    $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
    [
      {
        "keyspace_name": "system_schema",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      },
      {
        "keyspace_name": "system",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      }
    ]

Select a specific field in a specific partition using the command-line:

    $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

Select a specific field in a specific partition using ``--query-file``:

    $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
    $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

New functionality: no backport needed.

Closes scylladb/scylladb#22007

* github.com:scylladb/scylladb:
  docs/operating-scylla: document scylla-sstable query
  test/cqlpy/test_tools.py: add tests for scylla-sstable query
  test/cqlpy/test_tools.py: make scylla_sstable() return table name also
  scylla-sstable: introduce the query command
  tools/utils: get_selected_operation(): use std::string for operation_options
  utils/rjson: streaming_writer: add RawValue()
  cql3/type_json: add to_json_type()
  test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
2025-03-06 13:42:45 +02:00
Botond Dénes
1139cf3a98 Merge 'Speed up (and generalize) the way API calculates sstable disk usage' from Pavel Emelyanov
There are several API endpoints that walk a specific list of sstables and sum up their bytes_on_disk() values. All those endpoints accumulate a map of sstable names to their sizes, then squashe the maps together and, finally, sum up the map values to report it back. Maintaining these intermediate collections is the waste of CPU and memory, the usage values can be summed up instantly.

Also add a test for per-cf endpoints to validate the change, and generalize the helper functions while at it.

Closes scylladb/scylladb#23143

* github.com:scylladb/scylladb:
  api: Generalize disk space counting for table and system
  api: Use map_reduce_cf_raw() overload with table name
  api: Don't collect sstables map to count disk space usage
  test: Add unit test for total/live sstable sizes
2025-03-06 11:26:35 +02:00
Raphael S. Carvalho
fedd838b9d replica: Fix race of some operations like cleanup with snapshot
There are two semaphores in table for synchronizing changes to sstable list:

sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.

sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.

they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.

problem:

A = tablet cleanup
B = take_snapshot()

1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk

Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.

But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.

That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.

The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.

So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.

Fixes #23049.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23117
2025-03-06 11:00:48 +02:00
Pavel Emelyanov
86b3e9b50b code: Move checked-file-impl.hh to util/
fixes: #22100

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23123
2025-03-06 10:22:05 +02:00
Petr Hála
f3c3eb6ae3 doc: Fix object_storage_config_file option
It needs to use underscores, not dash

Closes scylladb/scylladb#23161
2025-03-06 10:30:51 +03:00
Pavel Emelyanov
e7d1ea3ab6 commitlog: Use shorter input stream creation overload
There's one that doesn't need the offset argument when it's 0

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23140
2025-03-06 08:06:42 +01:00
Botond Dénes
49d6bf8947 Merge 'main: safely check stop_signal in-between starting services' from Benny Halevy
To simplify aborting scylla while starting the services,
add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

This patch prevents gate_closed_exception to escape handling
when start-up is aborted early with the stop signal,
causing https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c

Fixes https://github.com/scylladb/scylladb/issues/23153

* Requires backport to 2025.1 due to a25c3eaa1c

Closes scylladb/scylladb#23103

* github.com:scylladb/scylladb:
  main: add checkpoints
  main: safely check stop_signal in-between starting services
  main: move prometheus start message
  main: move per-shard database start message
2025-03-06 08:28:29 +02:00
Takuya ASADA
781dec5852 dist/docker: run the container as non-root user
Since it is requirement for Red Hat OpenShift Certification, we need to
run the container as non-root user.

Related scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Takuya ASADA
1abf981a73 dist/docker: switch to UBI9
Switch container base image to UBI9, to prepare for Red Hat OpenShift
Certification.

Fixes scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Benny Halevy
b6705ad48b main: add checkpoints
Before starting significant services that didn't
have a corresponding call to supervisor::notify
before them.

Fixes #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:29:34 +02:00
Benny Halevy
feef7d3fa1 main: safely check stop_signal in-between starting services
To simplify aborting scylla while starting the services,
Add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

Refs #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:15:17 +02:00
Benny Halevy
282ff344db main: move prometheus start message
The `prometheus_server` is started only conditionally
but the notification message is sent and logged
unconditionally.
Move it inside the condtional code block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Benny Halevy
23433f593c main: move per-shard database start message
It is now logged out of place, so move it to right before
calling `start` on every database shard.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Nadav Har'El
e0f24c03e7 Merge 'test.py: merge all 'Topology' suite types int one folder 'cluster'' from Artsiom Mishuta
Now that we support suite subfolders, there is no
need to create an own suite for object_store and auth_cluster, topology, topology_custom.
this PR merge all these folders into one: 'cluster"

this pr also introduce and apply 'prepare_3_nodes_cluster' fixture  that  allow preparing non-dirty 3 nodes cluster
that can be reused between tests(for tests that was in topology folder)

number of tests in master
release -3461
dev       -3472
debug   -3446

number of tests in this PR
release -3460
dev       -3471
debug   -3445

There is a minus one test in each mode because It was 2 test_topology_failure_recovery files(topology and topology_custom) with the same utility functions but different test cases. This PR merged them into one

Closes scylladb/scylladb#22917

* github.com:scylladb/scylladb:
  test.py: merge object_store into cluster folder
  test.py: merge auth_cluster into cluster folter
  test.py: rename topology_custom folder to cluster
  test.py: merge topology test suite into topology_custom
  test.py delete conftest in topology_custom
  test.py apply prepare_3_nodes_cluster in topology
  test.py: introduce prepare_3_nodes_cluster marker
2025-03-04 19:26:32 +02:00
Pavel Emelyanov
c084de1406 api: Generalize disk space counting for table and system
Now when the bodies of both map-reduce reducers are the same, they can
be generalized with each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:56:16 +03:00
Pavel Emelyanov
4e2abba5a1 api: Use map_reduce_cf_raw() overload with table name
The existing helper that counds disk space usage for a table map-reduces
the table object "by hand". Its peer that counts the usage for all
tables uses the map_reduce_cf_raw() helper. The latter exists for
specific table as well, so the first counter can benefit from using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:55:05 +03:00
Pavel Emelyanov
b43e2390db api: Don't collect sstables map to count disk space usage
All the API calls that collect disk usage of sstables accumulate
map<sstable name, disk size>, then merges shard maps into one, then
counts the "disk size" values and drops the map itself on the floor.
This is waste of CPU cycles, disk usage can be just summed up along
cf/sstables iterations, no need to accumulate map with names for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:53:42 +03:00
Pavel Emelyanov
a8fc1d64bc test: Add unit test for total/live sstable sizes
The pair of column_family/metrics/(total|live)_disk_space_used/{name}
reports the disk usage by sstables. The test creates table, populates,
flushes and checks that the size corresonds to what stat(2) reports for
the respective files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:52:33 +03:00
Patryk Jędrzejczak
c13b6c91d3 Merge 'raft topology: drop changing the raft voters config via storage_service' from Emil Maskovsky
For the limited voters feature to work properly we need to make sure that we are only managing the voter status through the topology coordinator. This means that we should not change the node votership from the storage_service module for the raft topology directly.

We can drop the voter status changes from the storage_service module because the topology coordinator will handle the votership changes eventually. The calls in the storage_service module were not essential and were only used for optimization (improving the HA under certain conditions).
Furthermore, the other bundled commit improves the reaction again by reacting to the node `on_up()` and `on_down()` events, which again shortens the reaction time and improves the HA.

The change has effect on the timing in the tablets migration test though, as it previously relied on the node being made non-voter from the service_storage `raft_removenode()` function. The fix is to add another server to the topology to make sure we will keep the quorum.

Previously the test worked because the test waits for an injection to be reached and it was ensured that the injection (log line) has only been triggered after the node has been made non-voter from the `raft_removenode()`. This is not the case anymore. An alternative fix would be to wait for the first node to be made non-voter before stopping the second server, but this would make the test more complex (and it is not strictly required to only use 4 servers in the test, it has been only done for optimization purposes).

Fixes: scylladb/scylladb#22860

Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969

No backport: Part of the limited voters new feature, so this shouldn't to be backported.

Closes scylladb/scylladb#22847

* https://github.com/scylladb/scylladb:
  raft: use direct return of future for `run_op_with_retry`
  raft: adjust the voters interface to allow atomic changes
  raft topology: drop removing the node from raft config via storage_service
  raft topology: drop changing the raft voters config via storage_service
2025-03-04 13:59:47 +01:00
Nadav Har'El
d096aac200 test/cqlpy/run: reduce number of tablets
In commit 2463e524ed, Scylla's default changed
from starting with one tablet per shard to starting 10 per shard. The
functional tests don't need more tablets and it can only slow down the
tests, so the patch added --tablets-initial-scale-factor=1 to test/*/suite.yaml
but forgot to add it to test/cqlpy/run.py (to affect test/cqlpy/run) so
this patch does this now.

This patch should *only* be about making tests faster, although to be
honest, I don't see any measurable improvement in test speed (10 isn't
so many). But, unfortunately, this is only part of the story. Over time
we allowed a few cqlpy tests to be written in a way that relies on having
only a small number of tablets or even exactly one tablet per shard (!).
These tests are buggy and should be fixed - see issues #23115 and #23116
as examples. But adding the option --tablets-initial-scale-factor=1 also
to run.py will make these bugs not affect test/cqlpy/run in the same way
as it doesn't affect test.py.

These buggy tests will still break with `pytest cqlpy` against a Scylla
you ran yourself manually, so eventually will still need to fix those
test bugs.

Refs #23115
Refs #23116

Closes scylladb/scylladb#23125
2025-03-04 15:39:21 +03:00
Asias He
60913312af repair: Enable small table optimization for system_replicated_keys
This enterprise-only system table is replicated and small. It should be
included for small table optimization.

Fixes scylladb/scylla-enterprise#5256

Closes scylladb/scylladb#23135
2025-03-04 12:40:56 +02:00
Artsiom Mishuta
97a620cda9 test.py: merge object_store into cluster folder
Now that we support suite subfolders, there is no
need to create an own suite for object_store
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
a283b391c2 test.py: merge auth_cluster into cluster folter
Now that we support suite subfolders, there is no
need to create an own suite for auth_cluster
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
d1198f8318 test.py: rename topology_custom folder to cluster
rename topology_custom folder to cluster
as it contains not only topology test cases
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
d8e17c4356 test.py: merge topology test suite into topology_custom
Now that we support suite subfolders, there is no
need to create an own suite for topology
2025-03-04 10:32:44 +01:00
Artsiom Mishuta
ef62dfa6a9 test.py delete conftest in topology_custom
delete conftest in the sepatate commi for brtter diff listing during
merge topology_custom and topology
2025-03-04 10:32:43 +01:00
Artsiom Mishuta
cf48444e3b test.py apply prepare_3_nodes_cluster in topology
apply prepare_3_nodes_cluster for all tests in the topology folder
via applying mark at the test module level using pytestmark
https://docs.pytest.org/en/stable/example/markers.html#marking-whole-classes-or-modules

set initial initial_size for topology folder to 0
2025-03-04 10:32:43 +01:00
Artsiom Mishuta
20777d7fc6 test.py: introduce prepare_3_nodes_cluster marker
prepare_3_nodes_cluster marker will allow preparing non-dirty 3 nodes cluster
that can be reused between tests
2025-03-04 10:32:43 +01:00
Nadav Har'El
a56751e71b test/cqlpy: fix test assuming just one tablet
The cqlpy test test_compaction.py::test_compactionstats_after_major_compaction
was written to assume we have just one tablet per shard - if there are many
tablets compaction splitting the data, the test scenario might not need
compaction in the way that the test assumes it does.

Recently (commit 2463e524ed) Scylla's default
was changed to have 10 tablets per shard - not one. This broke this test.
The same commit modified test/cqlpy/suite.yaml, but that affects only test.py
and not test/cqlpy/run, and also not manual runs against a manually-installed
Scylla. If this test absolutely requires a keyspace with 1 and not 10
tablets, then it should create one explicitly. So this is what this test
does (but only if tablets are in use; if vnodes are used that's fine
too).

Before this patch,
  test/cqlpy/run test_compaction.py::test_compactionstats_after_major_compaction
fails. After the patch, it passes.

Fixes #23116

Closes scylladb/scylladb#23121
2025-03-04 10:15:29 +02:00
Kefu Chai
a43072a21e cql3,test: replace boost::range::adjacent_find with std::ranges
to reduce third-party dependencies and modernize the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22998
2025-03-04 10:08:02 +02:00
Artsiom Mishuta
d7f9c5654b test.py: change test uname
This commit change the test uname replacement fron "_" to "." to be able support sub-folders in
scylla-pkg scripts logic

Closes scylladb/scylladb#23130
2025-03-04 09:58:58 +02:00
Wojciech Mitros
dae7221342 rust: update dependencies
The currently used versions of "wasmtime", "idna", "cap-std" and
"cap-primitives" packages had low to moderate security issues.
In this patch we update the dependencies to versions with these
issues fixed.
The update was performed by changing the "wasmtime" (and "wasmtime-wasi")
version in rust/wasmtime_bindings/Cargo.toml and updating rust/Cargo.lock
using the "cargo update" command with the affected package. To fix an
issue with different dependencies having different versions of
sub-dependencies, the package "smallvec" was also updated to "1.13.1".
After the dependency update, the Rust code also needed to be updated
because of the slightly changed API. One Wasm test case needed to be
updated, as it was actually using an incorrect Wat module and not
failing before. The crate also no longer allows multiple tables in
Wasm modules by default - it is now enabled by setting the "gc" crate
feature and configuring the Engine with config.wasm_reference_types(true).

Fixes https://github.com/scylladb/scylladb/issues/23127

Closes scylladb/scylladb#23128
2025-03-04 09:45:23 +02:00
Pavel Emelyanov
e4e15a00b7 Merge 'reader_concurrency_semaphore: register_inactive_read(): handle aborted permit' from Botond Dénes
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.

Fixes: scylladb/scylladb#22919

Bug is present in all live versions, backports are required.

Closes scylladb/scylladb#23044

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
  test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
2025-03-04 10:40:28 +03:00
Botond Dénes
71d8b7aa9f querier: demote tombstone warning for range-scans to debug level
Range scans are expected to go though lots of tombstones, no need to
spam the logs about this. The tombstone warning log is demoted to debug
level, if somebody wants to see it they can bump the logger to debug
level.

Fixes: https://github.com/scylladb/scylladb/issues/23093

Closes scylladb/scylladb#23094
2025-03-04 10:38:06 +03:00
Kefu Chai
a483ff8647 mutation: replace boost::upper_bound with std::ranges::upper_bound
Reduces dependencies on boost/range.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23119
2025-03-04 10:36:57 +03:00
Kefu Chai
a20cd6539c cql3, dht: Remove redundant std::move() calls
These redundant `std::move()` calls were identified by GCC-14.
In general, copy elision applies to these places, so adding
`std::move()` is not only unnecessary but can actually prevent
the compiler from performing copy elision, as it causes the
return statement to fail to satisfy the requirements for
copy elision optimization.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23063
2025-03-04 10:36:49 +03:00
Botond Dénes
6f7a069bce Merge 'Label basic metrics' from Amnon Heiman
This series is part of the effort to reduce the overall overhead originating from metrics reporting, both on the Scylla side and the metrics collecting server (Prometheus or similar)

The idea in this series is to create an equivalent of levels with a label.
First, label a subset of the metrics used by the dashboards.
Second, the per-table metrics that are now off by default will be marked with a different label.
The following specific optional features: CDC, CAS, and Alternator have a dedicated label now.
This will allow users to disable all metrics of features that are not in use.

All the rest of the metrics are left unlabeled.

Without any changes, users would get the same metrics they are getting today.
But you could pass the `__level=1` and get only those metrics the dashboard needs. That reduces between 50% and 70% (many metrics are hidden if not used, so the overall number of metrics varies).

The labels are not reported based on the seastar feature of hiding labels that start with an underscore.

Closes scylladb/scylladb#12246

* github.com:scylladb/scylladb:
  db/view/view.cc: label metrics with basic_level
  transport/server.cc: label metrics with basic_level
  service/storage_proxy.cc: label metrics with basic_level and cas
  main.cc: label metrics with basic_level
  streaming/stream_manager.cc: label metrics with basic_level
  repair/repair.cc: label metrics with basic_level
  service/storage_service.cc: label metrics with basic_level
  gms/gossiper.cc: label metrics with basic_level
  replica/database.cc: label metrics with basic_level
  cdc/log.cc: label metrics with basic_level and cdc
  alternator: label metrics with basic_level and alternator
  row_cache.cc: label metrics with basic_level
  query_processor.cc: label metrics with basic_level
  sstables.cc: label metrics with basic_level
  utils/logalloc.cc label metrics with basic_level
  commitlog.cc: label metrics with basic_level
  compaction_manager.cc: label metrics with basic_level
  Adding the __level and features labels
2025-03-04 09:32:11 +02:00
Calle Wilund
2f10205714 config: Enable optional TLS1.3 session ticket usage in cert setup
Refs #22916

Adds an "enable_session_tickets" option to TLS setup for our server
endpoints (not documented for internode RPC, as we don't handle it
on the client side there), allowing enabling of TLS3 client session
ticket, i.e. quicker reconnect.

Session tickets are valid within a time frame or until a node
restarts, whichever comes first.

v2:
Use "TLS1.3" in help message

Closes scylladb/scylladb#22928
2025-03-04 09:30:53 +02:00
Amnon Heiman
19a414598b db/view/view.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_view_builder_builds_in_progress

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
9518a85ad0 transport/server.cc: label metrics with basic_level
The following metrics will be marked with basic_level label:
scylla_transport_cql_errors_total
scylla_transport_current_connections
scylla_transport_requests_served
scylla_transport_requests_shed

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00
Amnon Heiman
cbae9a4abe service/storage_proxy.cc: label metrics with basic_level and cas
The following metrics will be marked with basic_level label:
scylla_storage_proxy_coordinator_background_reads
scylla_storage_proxy_coordinator_background_writes
scylla_storage_proxy_coordinator_cas_background
scylla_storage_proxy_coordinator_cas_dropped_prune
scylla_storage_proxy_coordinator_cas_failed_read_round_optimization
scylla_storage_proxy_coordinator_cas_foreground
scylla_storage_proxy_coordinator_cas_prune
scylla_storage_proxy_coordinator_cas_read_contention_bucket
scylla_storage_proxy_coordinator_cas_read_contention_count
scylla_storage_proxy_coordinator_cas_read_latency_count
scylla_storage_proxy_coordinator_cas_read_latency_sum
scylla_storage_proxy_coordinator_cas_read_timeouts
scylla_storage_proxy_coordinator_cas_read_unavailable
scylla_storage_proxy_coordinator_cas_read_unfinished_commit
scylla_storage_proxy_coordinator_cas_total_operations
scylla_storage_proxy_coordinator_cas_write_condition_not_met
scylla_storage_proxy_coordinator_cas_write_contention_count
scylla_storage_proxy_coordinator_cas_write_latency_count
scylla_storage_proxy_coordinator_cas_write_latency_sum
scylla_storage_proxy_coordinator_cas_write_timeout_due_to_uncertainty
scylla_storage_proxy_coordinator_cas_write_timeouts
scylla_storage_proxy_coordinator_cas_write_unavailable
scylla_storage_proxy_coordinator_cas_write_unfinished_commit
scylla_storage_proxy_coordinator_current_throttled_base_writes
scylla_storage_proxy_coordinator_foreground_reads
scylla_storage_proxy_coordinator_foreground_writes
scylla_storage_proxy_coordinator_range_timeouts
scylla_storage_proxy_coordinator_range_unavailable
scylla_storage_proxy_coordinator_read_errors_local_node
scylla_storage_proxy_coordinator_read_latency_count
scylla_storage_proxy_coordinator_read_latency_sum
scylla_storage_proxy_coordinator_reads_local_node
scylla_storage_proxy_coordinator_reads_remote_node
scylla_storage_proxy_coordinator_read_timeouts
scylla_storage_proxy_coordinator_read_unavailable
scylla_storage_proxy_coordinator_speculative_data_reads
scylla_storage_proxy_coordinator_speculative_digest_reads
scylla_storage_proxy_coordinator_total_write_attempts_local_node
scylla_storage_proxy_coordinator_write_errors_local_node
scylla_storage_proxy_coordinator_write_latency_bucket
scylla_storage_proxy_coordinator_write_latency_count
scylla_storage_proxy_coordinator_write_latency_sum
scylla_storage_proxy_coordinator_write_timeouts
scylla_storage_proxy_coordinator_write_unavailable
scylla_storage_proxy_replica_received_counter_updates

All cas related metrics are labeled with __cas label.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2025-03-03 16:58:39 +02:00