scylladb

Author	SHA1	Message	Date
Calle Wilund	fd59176a73	main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars Fixes scylladb/scylla-pkg#3845 Don't overwrite (or rather change) AWS credentials variables if already set in enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI. v2: * Allow environment variables in reading obj storage config - allows CI to use real credentials in env without risking putting them info less seure files * Don't write credentials info from miniserver into config, instead use said environment vars to propagate creds. v3: * Fix python launch scripts to not clear environment, thus retaining above aws envs. (cherry picked from commit `5056a98289`) Closes scylladb/scylladb#19330	2024-06-20 18:08:51 +03:00
Pavel Emelyanov	cb9d6e080c	main: Warn unused features When seeing an UNUSED feature -- print it into log. This is where the enum_option::key is in use. The thing is that experimental features map different unused feature names into the single UNUSED feature enum value, so once the feature is parsed its configured name only persists in the option's key member (saved by previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `b85a02a3fe`)	2024-06-12 18:35:32 +00:00
Botond Dénes	7a6ff12ace	Merge '[Backport 6.0] alternator: keep TTL work in the maintenance scheduling group' from ScyllaDB Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row have reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be causes by the reads and writes done on behalf of the alternator TTL, which looses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and sytsem (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this. (cherry picked from commit `5d3f7c13f9`) (cherry picked from commit `1fe8f22d89`) Refs #18729 Closes scylladb/scylladb#19196 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-10 19:58:38 +03:00
Gleb Natapov	45ff4d2c41	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes #18863 (cherry picked from commit `a74fbab99a`) Closes scylladb/scylladb#19175	2024-06-10 10:34:29 +02:00
Botond Dénes	5b546ad4b1	main: add maintenance tenant to messaging_service's scheduling config Currently only the user tenant (statement scheduling group) and system (default scheduling group) tenants exist, as we used to have only user-initiated operations and sytem (internal) ones. Now there is need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). (cherry picked from commit `5d3f7c13f9`)	2024-06-10 07:42:22 +00:00
Tomasz Grabiec	ccd441a4de	repair_service: Propagate topology_state_machine to repair_service (cherry picked from commit `e97acf4e30`)	2024-06-08 16:31:15 +02:00
Tomasz Grabiec	e518bb68b2	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service (cherry picked from commit `c45ce41330`)	2024-06-06 13:01:19 +00:00
Tomasz Grabiec	0c1b6fed16	test: perf: Add test for tablet load balancer effectiveness (cherry picked from commit `7b1eea794b`)	2024-06-02 22:40:45 +00:00
Piotr Smaron	51b8b04d97	Add storage service to query processor Query processor needs to access storage service to check if global topology request is still ongoing and to be able to wait until it completes.	2024-05-30 08:33:15 +03:00
Piotr Dulikowski	9820472277	main: introduce schema commitlog scheduling group Currently, we do not explicitly set a scheduling group for the schema commitlog which causes it to run in the default scheduling group (called "main"). However: - It is important and significant enough that it should run in a scheduling group that is separate from the main one, - It should not run in the existing "commitlog" group as user writes may sometimes need to wait for schema commitlog writes (e.g. read barrier done to learn the schema necessary to interpret the user write) and we want to avoid priority inversion issues. Therefore, introduce a new scheduling group dedicated to the schema commitlog. Fixes: scylladb/scylladb#15566 Closes scylladb/scylladb#18715	2024-05-21 11:29:57 +02:00
Pavel Emelyanov	634c066c43	service_level_controller: Add dependency on shared_token_metadata The controller needs to access topology, so it needs the token metadata at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-14 15:43:01 +03:00
Nadav Har'El	9813ec9446	Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz The code is based on similar idea as perf_simple_query. The main differences are: - it starts full scylla process - communicates with alternator via http (localhost) - uses richer table schema with all dynamoDB types instead of only strings Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc). Results on my machine (with 1 vCPU): > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null ... median 23402.59616090321 median absolute deviation: 598.77 maximum: 24014.41 minimum: 19990.34 > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null ... median 16089.34211320635 median absolute deviation: 552.65 maximum: 16915.95 minimum: 14781.97 The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core). Related: https://github.com/scylladb/scylladb/issues/12518 Closes scylladb/scylladb#13121 * github.com:scylladb/scylladb: test: perf: alternator: add option to skip data pre-population perf-alternator-workloads: add operations-per-shard option test: perf: add global secondary indexes write workload for alternator test: perf: add option to continue after failed request test: perf: add read modify write workload for alternator (lwt) test: perf: add scan workload for alternator test: perf: add end-to-end benchmark for alternator test: perf: extract result aggregation logic to a separate struct	2024-05-12 18:15:29 +03:00
Piotr Dulikowski	a3070089de	main: initialize scheduling group keys before service levels Due to scylladb/seastar#2231, creating a scheduling group and a scheduling group key is not safe to do in parallel. The service level code may attempt to create scheduling groups while the cql_transport::cql_sg_stats scheduling group key is being created. Until the seastar issue is fixed, move initialization of the cql sg states before service level initialization. Refs: scylladb/seastar#2231 Closes scylladb/scylladb#18581	2024-05-10 10:35:05 +03:00
Marcin Maliszkiewicz	55030b1550	test: perf: add end-to-end benchmark for alternator The code is based on similar idea as perf_simple_query. The main differences are: - it starts full scylla process - communicates with alternator via http (localhost) - uses richer table schema with all dynamoDB types instead of only strings Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc). Results on my machine (with 1 vCPU): > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null ... median 23402.59616090321 median absolute deviation: 598.77 maximum: 24014.41 minimum: 19990.34 > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null ... median 16089.34211320635 median absolute deviation: 552.65 maximum: 16915.95 minimum: 14781.97 The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).	2024-05-09 13:58:40 +02:00
Botond Dénes	155332ebf8	Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before stopping messaging service to "after" it, and view update write handlers in proxy hanged for hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail. This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop. Closes scylladb/scylladb#18408 * github.com:scylladb/scylladb: Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" view: Abort pending view updates when draining	2024-05-09 08:26:44 +03:00
Kamil Braun	03818c4aa9	direct_failure_detector: increase ping timeout and make it tunable The direct failure detector design is simplistic. It sends pings sequentially and times out listeners that reached the threshold (i.e. didn't hear from a given endpoint for too long) in-between pings. Given the sequential nature, the previous ping must finish so the next ping can start. We timeout pings that take too long. The timeout was hardcoded and set to 300ms. This is too low for wide-area setups -- latencies across the Earth can indeed go up to 300ms. 3 subsequent timed out pings to a given node were sufficient for the Raft listener to "mark server as down" (the listener used a threshold of 1s). Increase the ping timeout to 600ms which should be enough even for pinging the opposite side of Earth, and make it tunable. Increase the Raft listener threshold from 1s to 2s. Without the increased threshold, one timed out ping would be enough to mark the server as down. Increasing it to 2s requires 3 timed out pings which makes it more robust in presence of transient network hiccups. In the future we'll most likely want to decrease the Raft listener threshold again, if we use Raft for data path -- so leader elections start quickly after leader failures. (Faster than 2s). To do that we'll have to improve the design of the direct failure detector. Ref: scylladb/scylladb#16410 Fixes: scylladb/scylladb#16607 --- I tested the change manually using `tc qdisc ... netem delay`, setting network delay on local setup to ~300ms with jitter. Without the change, the result is as observed in scylladb/scylladb#16410: interleaving ``` raft_group_registry - marking Raft server ... as dead for Raft groups raft_group_registry - marking Raft server ... as alive for Raft groups ``` happening once every few seconds. The "marking as dead" happens whenever we get 3 subsequent failed pings, which is happens with certain (high) probability depending on the latency jitter. Then as soon as we get a successful ping, we mark server back as alive. With the change, the phenomenon no longer appears. Closes scylladb/scylladb#18443	2024-05-07 23:40:23 +02:00
Benny Halevy	ebff5f5d70	everywhere: include seastar headers using angle brackets seastar is an external library therefore it should use the system-include syntax. Closes scylladb/scylladb#18513	2024-05-06 10:00:31 +03:00
Pavel Emelyanov	67736b5cd3	Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" This reverts commit `9c2a836607`.	2024-05-02 08:16:14 +03:00
Patryk Jędrzejczak	3a34bb18cd	db: config: make consistent-topology-changes unused We make the `consistent-topology-changes` experimental feature unused and assumed to be true in 6.0. We remove code branches that executed if `consistent-topology-changes` was disabled.	2024-04-25 14:33:21 +02:00
Kefu Chai	ad2c26824a	main: do not reference moved variable before this change, we dereference `linfo` after moving it away. and clang-tidy warns us like ``` [19/171] Building CXX object CMakeFiles/scylla.dir/main.cc.o /home/kefu/dev/scylladb/main.cc:559:12: warning: 'linfo' used after it was moved [bugprone-use-after-move] 559 \| return linfo.host_id; \| ^ /home/kefu/dev/scylladb/main.cc:558:36: note: move occurred here 558 \| sys_ks.local().save_local_info(std::move(linfo), snitch.local()->get_location(), broadcast_address, broadcast_rpc_address).get(); \| ^ ``` the default-generated move constructor of `local_info` uses the default-generated move constructor of `locator::host_id`, which in turn use the default-generated move constructor of `utils::tagged_uuid<struct host_id_tag>`, and then `utils::UUID` 's move constructor. since `UUID` does not contain any moveable resources, what it has is but two `int64_t` member variables. so this is a benign issue. but still, it is distracting. in this change, we keep the value of `host_id` locally, and return it instead to silence this warning, and to improve the maintainability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18362	2024-04-23 11:58:58 +03:00
Kefu Chai	ff04375016	main: drop unused namespace alias `fs` namespace alias was introduced in `ff4d8b6e85`, but we don't use it anymore. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18308	2024-04-22 13:50:28 +03:00
Mikołaj Grzebieluch	65cfb9b4e0	storage_service: skip wait_for_gossip_to_settle if topology changes are based on raft Waiting for gossip to settle slows down the bootstrap of the cluster. It is safe to disable it if the topology is based on Raft. Fixes scylladb/scylladb#16055 Closes scylladb/scylladb#17960	2024-04-20 17:56:51 +02:00
Kefu Chai	5ab527e669	main: do not echo parsed options when calling scylla interactively in `2f0f53ac`, we added logging of parsed command line options so that we can see how scylla is launched in case it fails to boot. but when scylla is called interactively in console. this echo is a little bit annoying. see following console session ```console $ scylla --help-loggers Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ... command used: "./RelWithDebInfo/scylla --help-loggers" pid: 996503 parsed command line options: [help-loggers] Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ... ``` so in this change, we check if the stdin is associated with a terminal device, if that the case, we don't print the scylla version, parsed command line and pid. and the interactive session looks like: ```console $ scylla --help-loggers Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ``` no more distracting information printed. the original behavior can be tested like: ```console $ : \| ./RelWithDebInfo/scylla --help-loggers ``` assuming scylla is always launched with systemd, which connects stdin to /dev/null. see https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output . so this behavior is preserved with this change. Refs #4203 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18309	2024-04-19 15:00:05 +03:00
Kamil Braun	9c2a836607	Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB" This reverts commit `298a7fcbf2`, reversing changes made to `5cf53e670d`. The change made CI flaky. Fixes: scylladb/scylladb#18278	2024-04-18 11:50:41 +02:00
Pavel Emelyanov	1e0d96cfed	storage_service: Drain view builder on drain too This gets rid of dangling deferred drin on stop and makes nodetool drain more "consistent" by stopping one more unneeded background activity Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-05 19:56:12 +03:00
Pavel Emelyanov	895391fb4b	storage_service: Add view_builder& reference Storage service will need to drain v.b. on its drain. Also on cluster join it marks existing views as built while it's v.b.'s job to do it. Both will be fixed by next patching and this is prerequisite. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-05 19:55:07 +03:00
Pavel Emelyanov	f00f1f117b	main,cql_test_env: Move view_builder start up (and make unconditional) Just starting sharded<view_builder> is lightweight, its constructor does nothing but initializes on-board variables. Real work takes off on view_builder::start() which is not moved. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-05 19:53:33 +03:00
Tomasz Grabiec	1a839bcb36	main: Skip tablet metadata loading in maintenance mode If system.tablets is corrupted, the node would not boot in maintenance mode, which is needed to fix system.tablets. Closes scylladb/scylladb#17990	2024-04-04 09:20:09 +03:00
Piotr Dulikowski	baae811142	Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz Before the patch selection of auth version depended on consistent topology feature but during raft recovery procedure this feature is disabled so we need to persist the version somewhere to not switch back to v1 as this is not supported. During recovery auth works in read-only mode, writes will fail. Fixes https://github.com/scylladb/scylladb/issues/17736 Closes scylladb/scylladb#18039 * github.com:scylladb/scylladb: auth: keep auth version in scylla_local auth: coroutinize service::start	2024-04-03 12:25:56 +02:00
Marcin Maliszkiewicz	562caaf6c6	auth: keep auth version in scylla_local Before the patch selection of auth version depended on consistent topology feature but during raft recovery procedure this feature is disabled so we need to persist the version somewhere to not switch back to v1 as this is not supported. During recovery auth works in read-only mode, writes will fail.	2024-04-02 19:04:21 +02:00
Piotr Dulikowski	57719ece4f	Merge 'main: reload service levels data accessor after join_cluster' from Marcin Maliszkiewicz Setting data accessor implicitly depends on node joining the cluster with raft leader elected as only then service level mutation is put into scylla_local table. Calling it after join_cluster avoids starting new cluster with older version only to immediately migrate it to the latest one in the background. Closes scylladb/scylladb#18040 * github.com:scylladb/scylladb: main: reload service levels data accessor after join_cluster service: qos: create separate function for reloading data accessor	2024-03-29 09:39:11 +01:00
Marcin Maliszkiewicz	e1fea3af6b	main: reload service levels data accessor after join_cluster Setting data accessor implicitly depends on node joining the cluster with raft leader elected as only then service level mutation is put into scylla_local table. Calling it after join_cluster avoids starting new cluster with older version only to immediately migrate it to the latest one in the background.	2024-03-26 17:36:03 +01:00
Marcin Maliszkiewicz	ff17a29b54	service: qos: create separate function for reloading data accessor Scylla's main is already too long, it's better to contain this logic inside qos service.	2024-03-26 17:26:19 +01:00
Pavel Emelyanov	67c2a06493	api: Rename (un)set_server_load_sstable -> (un)set_server_column_family The method sets up column family API, not load-sstables one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18022	2024-03-26 12:16:08 +02:00
Piotr Dulikowski	f23f8f81bf	Merge 'Raft-based service levels' from Michał Jadwiszczak This patch introduces raft-based service levels. The difference to the current method of working is: - service levels are stored in `system.service_levels_v2` - reads are executed with `LOCAL_ONE` - writes are done via raft group0 operation Service levels are migrated to v2 in topology upgrade. After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then) Fixes #17926 Closes scylladb/scylladb#16585 * github.com:scylladb/scylladb: test: test service levels v2 works in recovery mode test: add test for service levels migration test: add test for service levels snapshot test:topology: extract `trigger_snapshot` to utils main: create raft dda if sl data was migrated service:qos: store information about sl data migration service:qos: service levels migration main: assign standard service level DDA before starting group0 service:qos: fix `is_v2()` method service:qos: add a method to upgrade data accessor test: add unit_test_raft_service_levels_accessor service:storage_service: add support for service levels raft snapshot service:qos: add abort_source for group0 operations service:qos: raft service level distributed data accessor service:qos: use group0_guard in data accessor cql3:statements: run service level statements on shard0 with raft guard test: fix overrides in unit_test_service_levels_accessor service:qos: fix indentation service:qos: coroutinize some of the methods db:system_keyspace: add `SERVICE_LEVELS_V2` table service:qos: extract common service levels' table functions	2024-03-22 11:51:53 +01:00
Michał Jadwiszczak	a08918a320	main: create raft dda if sl data was migrated Create `raft_service_levels_distributed_data_accessor` if service levels were migrated to v2 table. This supports raft recovery mode, as service levels will be read from v2 table in the mode.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	2917ec5d51	service:qos: service levels migration Migrate data from `system_distributes.service_levels` to `system.service_levels_v2` during raft topology upgrade. Migration process reads data from old table with CL ALL and inserts the data to the new table via raft.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	36c9afda99	main: assign standard service level DDA before starting group0 `topology_state_load()` is responsible for upgrading service level DDA, so the standard DDA has to be assigned before to be upgraded	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	d5fa0747d7	service:qos: add abort_source for group0 operations Add mechanism to abort ongoing group0 operations while draining service_level_controller or leaving the cluster.	2024-03-21 23:14:57 +01:00
Petr Gusev	49a4220fea	error_injection: pass injection parameters at startup Injection parameters can be used in the lambda passed to inject_with_handler method to take some values from the test. However, there was no way to set values to these parameters on node startup, only through the error injection REST api. Therefore, we couldn't rely on this when inject_with_handler is used during node startup, it could trigger before we call the api from the test. In this commit with solve this problem by allowing these parameters to be assigned through scylla.yaml config. The defer.hh header was added to error_injection.hh to fix compilation after adding error_injection.hh to config.hh, defer function is used in error_injection.hh.	2024-03-19 20:17:02 +04:00
Kamil Braun	19b816bb68	Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz This patch series makes all auth writes serialized via raft. Reads stay eventually consistent for performance reasons. To make transition to new code easier data is stored in a newly created keyspace: system_auth_v2. Internally the difference is that instead of executing CQL directly for writes we generate mutations and then announce them via raft group0. Per commit descriptions provide more implementation details. Refs https://github.com/scylladb/scylladb/issues/16970 Fixes https://github.com/scylladb/scylladb/issues/11157 Closes scylladb/scylladb#16578 * github.com:scylladb/scylladb: test: extend auth-v2 migration test to catch stale static test: add auth-v2 migration test test: add auth-v2 snapshot transfer test test: auth: add tests for lost quorum and command splitting test: pylib: disconnect driver before re-connection test: adjust tests for auth-v2 auth: implement auth-v2 migration auth: remove static from queries on auth-v2 path auth: coroutinize functions in password_authenticator auth: coroutinize functions in standard_role_manager auth: coroutinize functions in default_authorizer storage_service: add support for auth-v2 raft snapshots storage_service: extract getting mutations in raft snapshot to a common function auth: service: capture string_view by value alternator: add support for auth-v2 auth: add auth-v2 write paths auth: add raft_group0_client as dependency cql3: auth: add a way to create mutations without executing cql3: run auth DML writes on shard 0 and with raft guard service: don't loose service_level_controller when bouncing client_state auth: put system_auth and users consts in legacy namespace cql3: parametrize keyspace name in auth related statements auth: parametrize keyspace name in roles metadata helpers auth: parametrize keyspace name in password_authenticator auth: parametrize keyspace name in standard_role_manager auth: remove redundant consts auth::meta::*::qualified_name auth: parametrize keyspace name in default_authorizer db: make all system_auth_v2 tables use schema commitlog db: add system_auth_v2 tables db: add system_auth_v2 keyspace	2024-03-06 10:11:33 +01:00
Konstantin Osipov	39d882ddca	main: print pid (process id) at start Print process id to the log at start. It aids debugging/administering the instance if you have multiple instances running on the same machine. Closes scylladb/scylladb#17582	2024-03-06 10:14:22 +02:00
Marcin Maliszkiewicz	7f204a6e80	auth: add raft_group0_client as dependency Most auth classes need this to be able to announce raft commands. Usage added in subsequent commit.	2024-03-01 16:25:14 +01:00
Tomasz Grabiec	ef9e5e64a3	locator: token_metadata: Introduce topology barrier stall detector When topology barrier is blocked for longer than configured threshold (2s), stale versions are marked as stalled and when they get released they report backtrace to the logs. This should help to identify what was holding for token metadata pointer for too long. Example log: token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ... Closes scylladb/scylladb#17427	2024-02-21 15:05:34 +02:00
Petr Gusev	4b33ba2894	raft_address_map: add my ip with the new generation The following scenario is possible: a node A changes its IP from ip1 to ip2 with restart, other nodes are not yet aware of ip2 so they keep gossiping ip1, after restart A receives ip1 in a gossip message and calls handle_major_state_change since it considers it as a new node. Then on_join event is called on the gossiper notification handles, we receive such event in raft_ip_address_updater and reverts the IP of the node A back to ip1. The essence of the problem is that we don't pass the proper generation when we add ip2 as a local IP during initialization when node A restarts, so the zero generation is used in raft_address_map::add_or_update_entry and the gossiper message owerwrites ip2 to ip1. In this commit we fix this problem by passing the new generation. To do that we move the increment_and_get_generation call from join_token_ring to scylla_main, so that we have a new generation value before init_address_map is called. Also we remove the load_initial_raft_address_map function from raft_group0 since it's redundant. The comment above its call site says that it's needed to not miss gossiper updates, but the function storage_service::init_address_map where raft_address_map is now initialized is called before gossiper is started. This function does both - it load the previously persisted host_id<->IP mappings from system.local and subscribes to gossiper notifications, so there is no room for races. Note that this problem reproduces less likely with the 'raft topology: ip change: purge old IP' commit - other nodes remove the old IP before it's send back to the just restarted node. This is also the reason why this problem doesn't occur in gossiper mode. fixes scylladb/scylladb#17199	2024-02-15 13:21:04 +04:00
Pavel Emelyanov	2b1612aa04	main: Stop lifecycle notifier for real It wasn't because of storage service, not the latter is stopped (since `e6b34527c1`), so the former can be stopped to Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17251	2024-02-12 13:59:50 +02:00
Piotr Dulikowski	d04b3338ce	cdc/generation_service: in legacy mode, fall back to raft tables When a node enters recovery after being in raft topology mode, topology operations switch back to legacy mode. We want CDC to keep working when that happens, so we need for the legacy code to be able to access generations created back in raft mode - so that the node can still properly serve writes to CDC log tables. In order to make this possible, modify the legacy logic to also look for a cdc generation in raft tables, if it is not found in legacy tables.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	77a8f5e3d6	cdc/generation_service: turn off gossip notifications in raft topo mode In raft topology mode CDC information is propagated through group 0. Prevent the generation service from reacting to gossiper notifications after we made the switch to raft mode.	2024-02-08 19:12:28 +01:00
Piotr Dulikowski	07aba3abc4	group0_state_machine: pull snapshot after raft topology feature enabled Pulling a snapshot of the raft topology is done via new rpc verb (RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of scylla and does not understand the verb, sending it will result in an error. We usually use cluster features to avoid such situations, but in the case when a node joins the cluster, it doesn't have access to features yet. Therefore, we need to enable pulling snapshots in two situations: - when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled, - in case when starting group 0 server when joining a cluster that uses raft-based topology.	2024-02-08 19:12:28 +01:00
Avi Kivity	7cb1c10fed	treewide: replace seastar::future::get0() with seastar::future::get() get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it. Replace with seastar::future::get(), which does the same thing.	2024-02-02 22:12:57 +08:00

1 2 3 4 5 ...

1229 Commits