Commit Graph

46989 Commits

Author SHA1 Message Date
Gleb Natapov
8a747fbc2a treewide: drop endpoint life cycle subscribers that do nothing
Provide default implementation for them instead. Will be easier to rework them later.
2025-03-11 12:09:22 +02:00
Gleb Natapov
525b88f877 load_meter: move to host id
Use host id indexing in load_meter and only convert to ips on api level.
2025-03-11 12:09:22 +02:00
Gleb Natapov
48a1030c91 treewide: use host id directly in endpoint state change subscribers
Now that we have host ids in endpoint state change subscribers some of
them can be simplified by using the id directly instead of locking it up
by ip.
2025-03-11 12:09:22 +02:00
Gleb Natapov
499eb4d17f treewide: pass host id to endpoint state change subscribers 2025-03-11 12:09:22 +02:00
Gleb Natapov
eb59205caf gossiper: drop deprecated unsafe_assassinate_endpoint operation
It was always deprecated.
2025-03-11 12:09:21 +02:00
Gleb Natapov
c17a8b4a76 storage_service: drop unused code in handle_state_removed 2025-03-11 12:09:21 +02:00
Gleb Natapov
696aee3adc treewide: drop endpoint state change subscribers that do nothing
Provide default implementation for them instead. Will be easier to rework them later.
2025-03-11 12:09:21 +02:00
Gleb Natapov
7dcffda6bd gossiper: drop ip address from handle_echo_msg and simplify code since host_id is now mandatory 2025-03-11 12:09:21 +02:00
Gleb Natapov
8425c26462 gossiper: start using host ids to send messages earlier
Send digest ack and ack2 by host ids as well now since the id->ip
mapping is available after receiving digest syn. It allows to convert
more code to host id here.
2025-03-11 12:09:21 +02:00
Gleb Natapov
f0af3f261e messaging_service: add temporary address map entry on incoming connection
We want to move to use host ids as soon as possible. Currently it is
possible only after the full gossiper exchange (because only at this
point gossiper state is added and with it address map entry). To make it
possible to move to host ids earlier this patch adds address map entries
on incoming communication during CLIENT_ID verb processing. The patch
also adds generation to CLIENT_ID to use it when address map is updated.
It is done so that older gossiper entries can be overwritten with newer
mapping in case of IP change.
2025-03-11 12:09:21 +02:00
Gleb Natapov
c3035caeb5 topology_coordinator: notify about IP change from sync_raft_topology_nodes as well
Currently sync_raft_topology_nodes() only send join notification if a
node is new in the topology, but sometimes a node changes IP and the
join notification should be send for the new IP as well. Usually it is
done from ip_address_updater, but topology reload can run first and then
the notification will be missed. The solution is to send notification
during topology reload as well.
2025-03-11 12:09:21 +02:00
Gleb Natapov
0e3dcb7954 treewide: move everyone to use host id based gossiper::is_alive and drop ip based one 2025-03-11 12:09:21 +02:00
Gleb Natapov
56c6e04079 storage_proxy: drop unused template
The storage_proxy::is_alive is called with host_id only.
2025-03-11 12:09:21 +02:00
Gleb Natapov
e47f251178 gossiper: move _live_endpoints and _unreachable_endpoints endpoint to host_id
Index live and dead endpoints by host id. It also allows to simplify
some code that does a translation.
2025-03-11 12:09:21 +02:00
Gleb Natapov
6f05608b5e gossiper: chunk vector using std::views::chunk instead of explicitly code it 2025-03-11 12:09:21 +02:00
Gleb Natapov
0437f558cd idl: generate ip based version of a verb only for verbs that need it
The patch adds new marker for a verb - [[ip]] that means that for this
verb ip version of the verbs needs to be generated. Most of the verbs
do not need it.
2025-03-11 12:09:21 +02:00
Gleb Natapov
3734afe8a5 gossiper: send shutdown notification by host id 2025-03-11 12:09:21 +02:00
Gleb Natapov
ee59baf6fc gossiper: drop old shadow round code
It is no longer used. It was replaced with explicit GOSSIP_GET_ENDPOINT_STATES verb in
cd7d64f588 which is in scylla-4.3.0
2025-03-11 12:09:20 +02:00
Gleb Natapov
f1a82c1d01 gossiper: drop unused get_endpoint_states function 2025-03-11 12:09:20 +02:00
Gleb Natapov
c4a0fbae16 gossiper: check id match inside force_remove_endpoint
Before calling force_remove_endpoint (which works on ip) the code checks
that the ip maps to the correct id (not not remove a new node that
inherited this ip by  mistake). Move the check to the function itself.
2025-03-11 12:09:20 +02:00
Gleb Natapov
52c9217f1b migration_manager: drop unneeded id to ip translation 2025-03-11 12:09:20 +02:00
Gleb Natapov
4420ddaf86 gossiper: move is_gossip_only_member and its users to work on host id 2025-03-11 12:09:20 +02:00
Gleb Natapov
cb2b874942 table: use host id based get_endpoint_state_ptr and skip id->ip translation 2025-03-11 12:09:20 +02:00
Gleb Natapov
2746d391af gossiper: do not ping outdated address
A node may change its IP but some other node in the cluster may still
try to ping it using an old IP because it may receive an outdated gossiper
entry with the old IP. Do not send echo message to the old IP. It will
cause a misusing UP message with old address to be printed.
2025-03-11 12:09:20 +02:00
Gleb Natapov
aaba55073d storage_service: drop outdated code that checks whether raft topology should be used
After raft_topology_change_enabled() was introduced the code does
nothing useful. The function is responsible for the decision if raft topology
is enabled or not.
2025-03-11 12:09:20 +02:00
Gleb Natapov
6952f62869 gossiper: drop unused field from loaded_endpoint_state 2025-03-11 12:09:20 +02:00
Avi Kivity
f18e8edcb7 Merge 'dist/docker: switch to UBI9' from Takuya ASADA
Switch container base image to UBI9, and make it ready for Red Hat
OpenShift Certification.

Fixes https://github.com/scylladb/scylla-pkg/issues/4858

Closes scylladb/scylladb#22910

* github.com:scylladb/scylladb:
  dist/docker: run the container as non-root user
  dist/docker: switch to UBI9
2025-03-10 15:33:30 +02:00
Luis Freitas
09e790d5af .github: Update github action for triggering next gating
Before we were using a marketplace Github action which had some limitations.
With this pull request we are updating the github action using curl option which will gives us full control of the flow instead of relying on pre made github action.

Fixes: scylladb#23088

Closes scylladb/scylladb#23215
2025-03-10 14:38:08 +02:00
Piotr Szymaniak
b6ba573dfe HACKING.md: Provide step-by-step support to enable development with CLion
Claim that building with CMake files is just 'not supported' instead of
not intended, especially that there are attempts to enable this.

Remove the obsolete mention of the `FOR_IDE` flag.

Closes scylladb/scylladb#22890
2025-03-09 16:22:24 +02:00
Ernest Zaslavsky
6a3cef5703 metadata: Correct "DESCRIBE" output for keyspace metadata
Update the "DESCRIBE" command output to accurately display `tablet` settings in keyspace metadata.

Closes scylladb/scylladb#23056
2025-03-09 14:50:08 +02:00
Anna Stuchlik
9ac0aa7bba doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348
2025-03-07 16:39:02 +01:00
Kefu Chai
2a9966a20e gms: Fix fmt formatter for gossip_digest_sync
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.

This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.

Fixes #23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23155
2025-03-07 15:36:03 +01:00
Robert Bindar
27f2d64725 Remove object storage config credentials provider
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to exopose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#22951
2025-03-07 10:40:58 +03:00
Luis Freitas
84b30d11ec .github: trigger Jenkins job using github action
This action will help preventing next-trigger for running every 15 minutes.

This action will run on push for a specific branch (next, next-enterprise, 2024.x, x.x)

Fixes: scylladb#23088

update action

Closes scylladb/scylladb#23141
2025-03-07 06:41:58 +02:00
Avi Kivity
28906c9261 Merge 'scylla-sstable: introduce the query command' from Botond Dénes
The scylla-sstable dump-* command suite has proven invaluable  in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data`  is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.

Select everything from a table:

    $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
     keyspace_name                 | durable_writes | replication
    -------------------------------+----------------+-------------------------------------------------------------------------------------
            system_replicated_keys |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})
                       system_auth |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
                     system_schema |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                system_distributed |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
                            system |           true |                              ({class : org.apache.cassandra.locator.LocalStrategy})
                                ks |           true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
                     system_traces |           true |   ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
     system_distributed_everywhere |           true |                         ({class : org.apache.cassandra.locator.EverywhereStrategy})

Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):

    $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
    [
      {
        "keyspace_name": "system_schema",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      },
      {
        "keyspace_name": "system",
        "durable_writes": true,
        "replication": {
          "class": "org.apache.cassandra.locator.LocalStrategy"
        }
      }
    ]

Select a specific field in a specific partition using the command-line:

    $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

Select a specific field in a specific partition using ``--query-file``:

    $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
    $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
     replication
    -------------------------------------------------------------------------------------
     ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})

New functionality: no backport needed.

Closes scylladb/scylladb#22007

* github.com:scylladb/scylladb:
  docs/operating-scylla: document scylla-sstable query
  test/cqlpy/test_tools.py: add tests for scylla-sstable query
  test/cqlpy/test_tools.py: make scylla_sstable() return table name also
  scylla-sstable: introduce the query command
  tools/utils: get_selected_operation(): use std::string for operation_options
  utils/rjson: streaming_writer: add RawValue()
  cql3/type_json: add to_json_type()
  test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
2025-03-06 13:42:45 +02:00
Botond Dénes
1139cf3a98 Merge 'Speed up (and generalize) the way API calculates sstable disk usage' from Pavel Emelyanov
There are several API endpoints that walk a specific list of sstables and sum up their bytes_on_disk() values. All those endpoints accumulate a map of sstable names to their sizes, then squashe the maps together and, finally, sum up the map values to report it back. Maintaining these intermediate collections is the waste of CPU and memory, the usage values can be summed up instantly.

Also add a test for per-cf endpoints to validate the change, and generalize the helper functions while at it.

Closes scylladb/scylladb#23143

* github.com:scylladb/scylladb:
  api: Generalize disk space counting for table and system
  api: Use map_reduce_cf_raw() overload with table name
  api: Don't collect sstables map to count disk space usage
  test: Add unit test for total/live sstable sizes
2025-03-06 11:26:35 +02:00
Raphael S. Carvalho
fedd838b9d replica: Fix race of some operations like cleanup with snapshot
There are two semaphores in table for synchronizing changes to sstable list:

sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.

sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.

they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.

problem:

A = tablet cleanup
B = take_snapshot()

1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk

Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.

But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.

That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.

The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.

So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.

Fixes #23049.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23117
2025-03-06 11:00:48 +02:00
Pavel Emelyanov
86b3e9b50b code: Move checked-file-impl.hh to util/
fixes: #22100

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23123
2025-03-06 10:22:05 +02:00
Petr Hála
f3c3eb6ae3 doc: Fix object_storage_config_file option
It needs to use underscores, not dash

Closes scylladb/scylladb#23161
2025-03-06 10:30:51 +03:00
Pavel Emelyanov
e7d1ea3ab6 commitlog: Use shorter input stream creation overload
There's one that doesn't need the offset argument when it's 0

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23140
2025-03-06 08:06:42 +01:00
Botond Dénes
49d6bf8947 Merge 'main: safely check stop_signal in-between starting services' from Benny Halevy
To simplify aborting scylla while starting the services,
add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

This patch prevents gate_closed_exception to escape handling
when start-up is aborted early with the stop signal,
causing https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c

Fixes https://github.com/scylladb/scylladb/issues/23153

* Requires backport to 2025.1 due to a25c3eaa1c

Closes scylladb/scylladb#23103

* github.com:scylladb/scylladb:
  main: add checkpoints
  main: safely check stop_signal in-between starting services
  main: move prometheus start message
  main: move per-shard database start message
2025-03-06 08:28:29 +02:00
Takuya ASADA
781dec5852 dist/docker: run the container as non-root user
Since it is requirement for Red Hat OpenShift Certification, we need to
run the container as non-root user.

Related scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Takuya ASADA
1abf981a73 dist/docker: switch to UBI9
Switch container base image to UBI9, to prepare for Red Hat OpenShift
Certification.

Fixes scylladb/scylla-pkg#4858

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2025-03-05 23:39:56 +09:00
Benny Halevy
b6705ad48b main: add checkpoints
Before starting significant services that didn't
have a corresponding call to supervisor::notify
before them.

Fixes #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:29:34 +02:00
Benny Halevy
feef7d3fa1 main: safely check stop_signal in-between starting services
To simplify aborting scylla while starting the services,
Add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

Refs #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:15:17 +02:00
Benny Halevy
282ff344db main: move prometheus start message
The `prometheus_server` is started only conditionally
but the notification message is sent and logged
unconditionally.
Move it inside the condtional code block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Benny Halevy
23433f593c main: move per-shard database start message
It is now logged out of place, so move it to right before
calling `start` on every database shard.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 07:09:09 +02:00
Nadav Har'El
e0f24c03e7 Merge 'test.py: merge all 'Topology' suite types int one folder 'cluster'' from Artsiom Mishuta
Now that we support suite subfolders, there is no
need to create an own suite for object_store and auth_cluster, topology, topology_custom.
this PR merge all these folders into one: 'cluster"

this pr also introduce and apply 'prepare_3_nodes_cluster' fixture  that  allow preparing non-dirty 3 nodes cluster
that can be reused between tests(for tests that was in topology folder)

number of tests in master
release -3461
dev       -3472
debug   -3446

number of tests in this PR
release -3460
dev       -3471
debug   -3445

There is a minus one test in each mode because It was 2 test_topology_failure_recovery files(topology and topology_custom) with the same utility functions but different test cases. This PR merged them into one

Closes scylladb/scylladb#22917

* github.com:scylladb/scylladb:
  test.py: merge object_store into cluster folder
  test.py: merge auth_cluster into cluster folter
  test.py: rename topology_custom folder to cluster
  test.py: merge topology test suite into topology_custom
  test.py delete conftest in topology_custom
  test.py apply prepare_3_nodes_cluster in topology
  test.py: introduce prepare_3_nodes_cluster marker
2025-03-04 19:26:32 +02:00
Pavel Emelyanov
c084de1406 api: Generalize disk space counting for table and system
Now when the bodies of both map-reduce reducers are the same, they can
be generalized with each other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:56:16 +03:00
Pavel Emelyanov
4e2abba5a1 api: Use map_reduce_cf_raw() overload with table name
The existing helper that counds disk space usage for a table map-reduces
the table object "by hand". Its peer that counts the usage for all
tables uses the map_reduce_cf_raw() helper. The latter exists for
specific table as well, so the first counter can benefit from using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-04 19:55:05 +03:00