1229 Commits

Author SHA1 Message Date
Calle Wilund
fd59176a73 main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars
Fixes scylladb/scylla-pkg#3845

Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.

v2:
* Allow environment variables in reading obj storage config - allows CI to
  use real credentials in env without risking putting them into less secure
  files
* Don't write credentials info from miniserver into config, instead use said
  environment vars to propagate creds.

v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.

(cherry picked from commit 5056a98289)

Closes scylladb/scylladb#19330
2024-06-20 18:08:51 +03:00
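
The behaviour described in fd59176a73 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in minio_server.py: the helper name `resolve_credentials` and the default values are hypothetical.

```python
def resolve_credentials(env, default_key, default_secret):
    """Return (key, secret), preferring values already present in env.

    `env` is a dict-like mapping such as os.environ. Existing values are
    never overwritten; the defaults are filled in only for unset variables.
    """
    key = env.setdefault("AWS_ACCESS_KEY_ID", default_key)
    secret = env.setdefault("AWS_SECRET_ACCESS_KEY", default_secret)
    return key, secret


# Hypothetical CI environment that already carries a real access key.
ci_env = {"AWS_ACCESS_KEY_ID": "real-key"}
credentials = resolve_credentials(ci_env, "minio-key", "minio-secret")
```

Because `setdefault` only writes when the variable is missing, a key exported by CI survives, while a missing secret still receives the local default.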
Pavel Emelyanov
cb9d6e080c main: Warn unused features
When seeing an UNUSED feature -- print it into log. This is where the
enum_option::key is in use. The thing is that experimental features map
different unused feature names into the single UNUSED feature enum
value, so once the feature is parsed its configured name only persists
in the option's key member (saved by previous patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit b85a02a3fe)
2024-06-12 18:35:32 +00:00
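
A minimal Python sketch of the idea in cb9d6e080c above: several configured names collapse into one UNUSED enum value, so the configured name has to be kept next to the parsed value for the warning to mention it. The feature names, the `EnumOption` class and the warning text are hypothetical; the real code is C++ and uses `enum_option::key`.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Feature(Enum):
    UNUSED = auto()
    TABLETS = auto()


# Hypothetical name table: several retired names share the UNUSED value.
NAME_TO_FEATURE = {
    "lwt": Feature.UNUSED,
    "cdc": Feature.UNUSED,
    "tablets": Feature.TABLETS,
}


@dataclass
class EnumOption:
    key: str        # the name exactly as written in the configuration
    value: Feature  # the parsed enum value


def parse_features(names):
    return [EnumOption(name, NAME_TO_FEATURE[name]) for name in names]


def unused_warnings(options):
    # Only the key can tell "lwt" and "cdc" apart once both parse to UNUSED.
    return [
        f"Ignoring unused experimental feature '{opt.key}'"
        for opt in options
        if opt.value is Feature.UNUSED
    ]
```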
Botond Dénes
7a6ff12ace Merge '[Backport 6.0] alternator: keep TTL work in the maintenance scheduling group' from ScyllaDB
Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads and negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which lose their maintenance scheduling group when they have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group when statement verbs like reads or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and the system tenant (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.

Fixes: #18719

- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user workloads; this PR has to be backported to fix this.

(cherry picked from commit 5d3f7c13f9)

(cherry picked from commit 1fe8f22d89)

 Refs #18729

Closes scylladb/scylladb#19196

* github.com:scylladb/scylladb:
  alternator, scheduler: test reproducing RPC scheduling group bug
  main: add maintenance tenant to messaging_service's scheduling config
2024-06-10 19:58:38 +03:00
Gleb Natapov
45ff4d2c41 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes #18863

(cherry picked from commit a74fbab99a)

Closes scylladb/scylladb#19175
2024-06-10 10:34:29 +02:00
Botond Dénes
5b546ad4b1 main: add maintenance tenant to messaging_service's scheduling config
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and system (internal) ones. Now there is a need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).

(cherry picked from commit 5d3f7c13f9)
2024-06-10 07:42:22 +00:00
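
The tenant selection described in 5b546ad4b1 above can be modelled as a lookup from the caller's scheduling group to a tenant. The Python below is a toy model, not the messaging_service code. In particular, the fallback to the user tenant for unrecognized groups is an assumption, chosen to match the reported symptom of TTL work landing in the statement group.

```python
# Tenant tables: scheduling group name -> tenant name.
TENANTS_BEFORE = {"statement": "user", "default": "system"}
TENANTS_AFTER = {**TENANTS_BEFORE, "streaming": "maintenance"}


def tenant_for(sched_group, tenants, fallback="user"):
    """Pick the tenant registered for the caller's scheduling group.

    Assumption: a group with no registered tenant falls back to `fallback`.
    """
    return tenants.get(sched_group, fallback)


# Background TTL work runs in the streaming group on the sending node.
before = tenant_for("streaming", TENANTS_BEFORE)
after = tenant_for("streaming", TENANTS_AFTER)
```

With the extra table entry, work that starts in the streaming group keeps a matching tenant on the remote node instead of being treated as user traffic.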
Tomasz Grabiec
ccd441a4de repair_service: Propagate topology_state_machine to repair_service
(cherry picked from commit e97acf4e30)
2024-06-08 16:31:15 +02:00
Tomasz Grabiec
e518bb68b2 main, storage_service: Move topology_state_machine outside storage_service
It will be propagated to repair_service to avoid cyclic dependency:

storage_service <-> repair_service

(cherry picked from commit c45ce41330)
2024-06-06 13:01:19 +00:00
Tomasz Grabiec
0c1b6fed16 test: perf: Add test for tablet load balancer effectiveness
(cherry picked from commit 7b1eea794b)
2024-06-02 22:40:45 +00:00
Piotr Smaron
51b8b04d97 Add storage service to query processor
Query processor needs to access storage service to check if global
topology request is still ongoing and to be able to wait until it
completes.
2024-05-30 08:33:15 +03:00
Piotr Dulikowski
9820472277 main: introduce schema commitlog scheduling group
Currently, we do not explicitly set a scheduling group for the schema
commitlog which causes it to run in the default scheduling group (called
"main"). However:

- It is important and significant enough that it should run in a
  scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
  sometimes need to wait for schema commitlog writes (e.g. read barrier
  done to learn the schema necessary to interpret the user write) and we
  want to avoid priority inversion issues.

Therefore, introduce a new scheduling group dedicated to the schema
commitlog.

Fixes: scylladb/scylladb#15566

Closes scylladb/scylladb#18715
2024-05-21 11:29:57 +02:00
Pavel Emelyanov
634c066c43 service_level_controller: Add dependency on shared_token_metadata
The controller needs to access topology, so it needs the token metadata
at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:43:01 +03:00
Nadav Har'El
9813ec9446 Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz
The code is based on similar idea as perf_simple_query. The main differences are:
  - it starts full scylla process
  - communicates with alternator via http (localhost)
  - uses richer table schema with all dynamoDB types instead of only strings

  Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

  Results on my machine (with 1 vCPU):
  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
  ...
  median 23402.59616090321
  median absolute deviation: 598.77
  maximum: 24014.41
  minimum: 19990.34

  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
  ...
  median 16089.34211320635
  median absolute deviation: 552.65
  maximum: 16915.95
  minimum: 14781.97

  The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).

Related: https://github.com/scylladb/scylladb/issues/12518

Closes scylladb/scylladb#13121

* github.com:scylladb/scylladb:
  test: perf: alternator: add option to skip data pre-population
  perf-alternator-workloads: add operations-per-shard option
  test: perf: add global secondary indexes write workload for alternator
  test: perf: add option to continue after failed request
  test: perf: add read modify write workload for alternator (lwt)
  test: perf: add scan workload for alternator
  test: perf: add end-to-end benchmark for alternator
  test: perf: extract result aggregation logic to a separate struct
2024-05-12 18:15:29 +03:00
Piotr Dulikowski
a3070089de main: initialize scheduling group keys before service levels
Due to scylladb/seastar#2231, creating a scheduling group and a
scheduling group key is not safe to do in parallel. The service level
code may attempt to create scheduling groups while
the cql_transport::cql_sg_stats scheduling group key is being created.

Until the seastar issue is fixed, move initialization of the cql sg
states before service level initialization.

Refs: scylladb/seastar#2231

Closes scylladb/scylladb#18581
2024-05-10 10:35:05 +03:00
Marcin Maliszkiewicz
55030b1550 test: perf: add end-to-end benchmark for alternator
The code is based on similar idea as perf_simple_query. The main differences are:
- it starts full scylla process
- communicates with alternator via http (localhost)
- uses richer table schema with all dynamoDB types instead of only strings

Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).
2024-05-09 13:58:40 +02:00
Botond Dénes
155332ebf8 Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov
Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before" stopping messaging service to "after" it, and view update write handlers in proxy hung for a hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail.

This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.

Closes scylladb/scylladb#18408

* github.com:scylladb/scylladb:
  Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
  view: Abort pending view updates when draining
2024-05-09 08:26:44 +03:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We time out pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which happens with certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
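
The "time out pings that take too long, with a tunable timeout" part of 03818c4aa9 above can be illustrated with Python's asyncio. This sketch is not the direct failure detector: `ping_once` and `slow_ping` are hypothetical names, and the timings are scaled down so the example runs quickly.

```python
import asyncio


async def ping_once(ping, timeout_s):
    """Run one ping; return False if it does not finish within timeout_s."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False


async def slow_ping():
    # Simulated round trip over a high-latency link.
    await asyncio.sleep(0.05)


def demo():
    # The same ping fails with a tight timeout and succeeds with a looser one.
    too_short = asyncio.run(ping_once(slow_ping, 0.01))
    long_enough = asyncio.run(ping_once(slow_ping, 1.0))
    return too_short, long_enough
```

Passing the timeout as a parameter rather than hard-coding it is what makes it tunable for wide-area deployments.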
Benny Halevy
ebff5f5d70 everywhere: include seastar headers using angle brackets
seastar is an external library therefore it should
use the system-include syntax.

Closes scylladb/scylladb#18513
2024-05-06 10:00:31 +03:00
Pavel Emelyanov
67736b5cd3 Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 9c2a836607.
2024-05-02 08:16:14 +03:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Kefu Chai
ad2c26824a main: do not reference moved variable
before this change, we dereference `linfo` after moving it away.
and clang-tidy warns us like

```
[19/171] Building CXX object CMakeFiles/scylla.dir/main.cc.o
/home/kefu/dev/scylladb/main.cc:559:12: warning: 'linfo' used after it was moved [bugprone-use-after-move]
  559 |     return linfo.host_id;
      |            ^
/home/kefu/dev/scylladb/main.cc:558:36: note: move occurred here
  558 |     sys_ks.local().save_local_info(std::move(linfo), snitch.local()->get_location(), broadcast_address, broadcast_rpc_address).get();
      |                                    ^
```

the default-generated move constructor of `local_info` uses the
default-generated move constructor of `locator::host_id`, which in turn
uses the default-generated move constructor of
`utils::tagged_uuid<struct host_id_tag>`, and then `utils::UUID`'s
move constructor. since `UUID` does not contain any moveable resources
(all it has is two `int64_t` member variables), this is a benign
issue. but still, it is distracting.

in this change, we keep the value of `host_id` locally, and return it
instead to silence this warning, and to improve the maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18362
2024-04-23 11:58:58 +03:00
Kefu Chai
ff04375016 main: drop unused namespace alias
`fs` namespace alias was introduced in ff4d8b6e85, but we don't
use it anymore. so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18308
2024-04-22 13:50:28 +03:00
Mikołaj Grzebieluch
65cfb9b4e0 storage_service: skip wait_for_gossip_to_settle if topology changes are based on raft
Waiting for gossip to settle slows down the bootstrap of the cluster.
It is safe to disable it if the topology is based on Raft.

Fixes scylladb/scylladb#16055

Closes scylladb/scylladb#17960
2024-04-20 17:56:51 +02:00
Kefu Chai
5ab527e669 main: do not echo parsed options when calling scylla interactively
in 2f0f53ac, we added logging of parsed command line options so that we
can see how scylla is launched in case it fails to boot. but when scylla
is called interactively in a console, this echo is a little bit annoying.
see the following console session:
```console
$ scylla --help-loggers
Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ...
command used: "./RelWithDebInfo/scylla --help-loggers"
pid: 996503
parsed command line options: [help-loggers]
Available loggers:
    BatchStatement
    LeveledManifest
    alter_keyspace
    alter_table
...
```

so in this change, we check if the stdin is associated with a terminal
device. if that is the case, we don't print the scylla version, parsed
command line and pid. and the interactive session looks like:

```console
$ scylla --help-loggers
Available loggers:
    BatchStatement
    LeveledManifest
    alter_keyspace
    alter_table
```
no more distracting information printed. the original behavior
can be tested like:

```console
$ : | ./RelWithDebInfo/scylla --help-loggers
```

we assume scylla is always launched with systemd, which connects
stdin to /dev/null (see
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output
), so this behavior is preserved with this change.

Refs #4203

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18309
2024-04-19 15:00:05 +03:00
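
The terminal check from 5ab527e669 above, sketched in Python for illustration. The function name `should_print_startup_info` and the `FakeTerminal` helper are hypothetical, and the actual change lives in the C++ main.

```python
import io
import sys


class FakeTerminal:
    """Stand-in for a stream attached to a terminal device."""

    def isatty(self):
        return True


def should_print_startup_info(stdin=None):
    """Return True when stdin is not a terminal (pipe, file, /dev/null)."""
    stream = sys.stdin if stdin is None else stdin
    return not stream.isatty()


# A piped or redirected stdin keeps the startup lines; a terminal drops them.
piped = should_print_startup_info(io.StringIO(""))
interactive = should_print_startup_info(FakeTerminal())
```

This mirrors the `: | ./RelWithDebInfo/scylla --help-loggers` test in the commit message: piping anything into stdin makes it a non-terminal, so the version, command line and pid are printed again.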
Kamil Braun
9c2a836607 Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 298a7fcbf2, reversing
changes made to 5cf53e670d.

The change made CI flaky.

Fixes: scylladb/scylladb#18278
2024-04-18 11:50:41 +02:00
Pavel Emelyanov
1e0d96cfed storage_service: Drain view builder on drain too
This gets rid of dangling deferred drain on stop and makes nodetool drain
more "consistent" by stopping one more unneeded background activity

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
895391fb4b storage_service: Add view_builder& reference
Storage service will need to drain v.b. on its drain. Also on cluster
join it marks existing views as built while it's v.b.'s job to do it.
Both will be fixed by the next patches and this is a prerequisite.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:55:07 +03:00
Pavel Emelyanov
f00f1f117b main,cql_test_env: Move view_builder start up (and make unconditional)
Just starting sharded<view_builder> is lightweight, its constructor does
nothing but initializes on-board variables. Real work takes off on
view_builder::start() which is not moved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:53:33 +03:00
Tomasz Grabiec
1a839bcb36 main: Skip tablet metadata loading in maintenance mode
If system.tablets is corrupted, the node would not boot in maintenance
mode, which is needed to fix system.tablets.

Closes scylladb/scylladb#17990
2024-04-04 09:20:09 +03:00
Piotr Dulikowski
baae811142 Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.

Fixes https://github.com/scylladb/scylladb/issues/17736

Closes scylladb/scylladb#18039

* github.com:scylladb/scylladb:
  auth: keep auth version in scylla_local
  auth: coroutinize service::start
2024-04-03 12:25:56 +02:00
Marcin Maliszkiewicz
562caaf6c6 auth: keep auth version in scylla_local
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.
2024-04-02 19:04:21 +02:00
Piotr Dulikowski
57719ece4f Merge 'main: reload service levels data accessor after join_cluster' from Marcin Maliszkiewicz
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.

Closes scylladb/scylladb#18040

* github.com:scylladb/scylladb:
  main: reload service levels data accessor after join_cluster
  service: qos: create separate function for reloading data accessor
2024-03-29 09:39:11 +01:00
Marcin Maliszkiewicz
e1fea3af6b main: reload service levels data accessor after join_cluster
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.
2024-03-26 17:36:03 +01:00
Marcin Maliszkiewicz
ff17a29b54 service: qos: create separate function for reloading data accessor
Scylla's main is already too long, it's better to contain this logic inside qos service.
2024-03-26 17:26:19 +01:00
Pavel Emelyanov
67c2a06493 api: Rename (un)set_server_load_sstable -> (un)set_server_column_family
The method sets up column family API, not load-sstables one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18022
2024-03-26 12:16:08 +02:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Michał Jadwiszczak
a08918a320 main: create raft dda if sl data was migrated
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from v2
table in the mode.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
2917ec5d51 service:qos: service levels migration
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.

Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
36c9afda99 main: assign standard service level DDA before starting group0
`topology_state_load()` is responsible for upgrading service level DDA,
so the standard DDA has to be assigned before it can be upgraded.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d5fa0747d7 service:qos: add abort_source for group0 operations
Add mechanism to abort ongoing group0 operations while draining
service_level_controller or leaving the cluster.
2024-03-21 23:14:57 +01:00
Petr Gusev
49a4220fea error_injection: pass injection parameters at startup
Injection parameters can be used in the lambda passed to
inject_with_handler method to take some values from
the test. However, there was no way to set values to these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup, it could trigger before we call the api
from the test.

In this commit we solve this problem by allowing these
parameters to be assigned through scylla.yaml config.

The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
defer function is used in error_injection.hh.
2024-03-19 20:17:02 +04:00
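
A toy Python model of the idea in 49a4220fea above: injection parameters are loaded from configuration before any startup code runs, so a handler that fires during startup already sees them. The config key `startup_injections`, its shape and the `Injections` class are hypothetical. Only the name `inject_with_handler` comes from the commit message, and this is not its real signature.

```python
class Injections:
    def __init__(self):
        self._enabled = {}  # injection name -> parameter dict

    def enable(self, name, params=None):
        self._enabled[name] = dict(params or {})

    def inject_with_handler(self, name, handler):
        """Call handler(params) if the injection is enabled, else do nothing."""
        if name in self._enabled:
            return handler(self._enabled[name])
        return None


def load_startup_injections(config, injections):
    # Hypothetical config shape: a list of {"name": ..., <parameters>...}.
    for entry in config.get("startup_injections", []):
        params = {k: v for k, v in entry.items() if k != "name"}
        injections.enable(entry["name"], params)


config = {"startup_injections": [{"name": "delay_boot", "seconds": "2"}]}
injections = Injections()
load_startup_injections(config, injections)
# Fires during "startup", before any REST call could have set parameters.
seen = injections.inject_with_handler("delay_boot", lambda p: p["seconds"])
```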
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Konstantin Osipov
39d882ddca main: print pid (process id) at start
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.

Closes scylladb/scylladb#17582
2024-03-06 10:14:22 +02:00
Marcin Maliszkiewicz
7f204a6e80 auth: add raft_group0_client as dependency
Most auth classes need this to be able to announce
raft commands.

Usage added in subsequent commit.
2024-03-01 16:25:14 +01:00
Tomasz Grabiec
ef9e5e64a3 locator: token_metadata: Introduce topology barrier stall detector
When topology barrier is blocked for longer than configured threshold
(2s), stale versions are marked as stalled and when they get released
they report backtrace to the logs. This should help to identify what
was holding the token metadata pointer for too long.

Example log:

  token_metadata - topology version 30 held for 299.159 [s] past expiry, released at:  0x2397ae1 0x23a36b6 ...

Closes scylladb/scylladb#17427
2024-02-21 15:05:34 +02:00
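
A simplified Python sketch of the stall detector from ef9e5e64a3 above. It folds "mark as stalled" and "report on release" into one check and takes the clock as an argument so the behaviour is deterministic. The class and method names are hypothetical; only the 2s threshold and the log line format come from the commit message.

```python
import traceback


class VersionTracker:
    def __init__(self, version, threshold_s=2.0):
        self.version = version
        self.threshold_s = threshold_s
        self.expired_at = None

    def mark_expired(self, now):
        # A newer topology version exists; holders should let go soon.
        self.expired_at = now

    def release(self, now):
        """Return a log line if held past the threshold, else None."""
        if self.expired_at is None:
            return None
        held = now - self.expired_at
        if held <= self.threshold_s:
            return None
        where = "".join(traceback.format_stack(limit=3))
        return (
            f"topology version {self.version} held for {held:.3f} [s] "
            f"past expiry, released at:\n{where}"
        )
```

Capturing the stack at release time points at the code path that finally dropped the stale version, which is usually the holder being looked for.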
Petr Gusev
4b33ba2894 raft_address_map: add my ip with the new generation
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it as a new node. Then on_join event is
called on the gossiper notification handlers, we receive
such event in raft_ip_address_updater and revert the IP
of the node A back to ip1.

The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.

In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.

Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.

Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.

fixes scylladb/scylladb#17199
2024-02-15 13:21:04 +04:00
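
The generation problem fixed by 4b33ba2894 above can be reproduced with a toy address map in Python. The `AddressMap` class is hypothetical. The rule that an entry is replaced unless the stored generation is strictly newer is an assumption made for illustration, not a statement about `raft_address_map`.

```python
class AddressMap:
    def __init__(self):
        self._entries = {}  # host_id -> (ip, generation)

    def add_or_update_entry(self, host_id, ip, generation=0):
        """Store ip unless the existing entry has a newer generation."""
        current = self._entries.get(host_id)
        if current is not None and current[1] > generation:
            return False
        self._entries[host_id] = (ip, generation)
        return True

    def ip_of(self, host_id):
        return self._entries[host_id][0]


# Before the fix: the local ip2 is added with generation 0, so a stale
# gossip message carrying ip1 with generation 5 wins.
buggy = AddressMap()
buggy.add_or_update_entry("A", "ip2")
buggy.add_or_update_entry("A", "ip1", generation=5)

# After the fix: ip2 is added with the freshly incremented generation,
# and the stale gossip update is rejected.
fixed = AddressMap()
fixed.add_or_update_entry("A", "ip2", generation=6)
stale_accepted = fixed.add_or_update_entry("A", "ip1", generation=5)
```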
Pavel Emelyanov
2b1612aa04 main: Stop lifecycle notifier for real
It wasn't stopped because of storage service; now the latter is stopped
(since e6b34527c1), so the former can be stopped too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17251
2024-02-12 13:59:50 +02:00
Piotr Dulikowski
d04b3338ce cdc/generation_service: in legacy mode, fall back to raft tables
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need for the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.

In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
2024-02-08 19:12:28 +01:00
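
The fallback lookup described in d04b3338ce above, as a minimal Python sketch. The function name and the use of plain dicts as stand-ins for the legacy and raft tables are hypothetical.

```python
def find_cdc_generation(gen_id, legacy_tables, raft_tables):
    """Look in the legacy tables first, then fall back to the raft tables."""
    generation = legacy_tables.get(gen_id)
    if generation is None:
        generation = raft_tables.get(gen_id)
    return generation


# Hypothetical contents: one generation from each era.
legacy = {"gen-1": "created in legacy mode"}
raft = {"gen-2": "created in raft mode"}
found = find_cdc_generation("gen-2", legacy, raft)
```

A node in recovery mode that runs the legacy code path can still resolve a generation created while the cluster was in raft topology mode.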
Piotr Dulikowski
77a8f5e3d6 cdc/generation_service: turn off gossip notifications in raft topo mode
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
07aba3abc4 group0_state_machine: pull snapshot after raft topology feature enabled
Pulling a snapshot of the raft topology is done via new rpc verb
(RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of
scylla and does not understand the verb, sending it will result in an
error. We usually use cluster features to avoid such situations, but in
the case when a node joins the cluster, it doesn't have access to
features yet. Therefore, we need to enable pulling snapshots in two
situations:

- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled,
- in case when starting group 0 server when joining a cluster that uses
  raft-based topology.
2024-02-08 19:12:28 +01:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00