The stream_manager will keep track of the streaming bandwidth option; to
subscribe to its changes it needs the config reference. It would be
better if it were stream_manager::config, but currently subscription to
db::config::<stuff> updates is not very shard-friendly, so we need to
carry the config reference itself around.
Similar trouble exists for compaction_manager. The option is passed
through its own config, but that config is created on each shard by
database code. The stream manager config would be created once by main code
on shard 0.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Recently we noticed a regression where, with certain versions of the fmt
library,
SELECT value FROM system.config WHERE name = 'experimental_features'
returns stringified numbers, like "5", instead of feature names like "raft".
It turns out that the fmt library keeps changing its overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have two conflicting ways to print it:
1. We have an explicit operator<<.
2. We have an *implicit* conversion to the type held by T.
We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic whereby, if the type is convertible to an int, that conversion
is used before operator<<()! For experimental_features_t, the type held in it
was an old-style enum, so it is indeed convertible to int.
The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.
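To illustrate the difference, here is a minimal standalone sketch (invented names and values, not the actual enum_option code):
```
// Old-style enums implicitly convert to int, which is what let fmt 8.1
// pick the int-printing path (yielding "5") over our operator<<.
enum old_feature { RAFT = 5 };        // implicitly convertible to int
enum class new_feature { RAFT = 5 };  // no implicit conversion to int

int main() {
    int a = RAFT;                     // OK for an old-style enum
    // int b = new_feature::RAFT;     // error: no implicit conversion
    int b = static_cast<int>(new_feature::RAFT); // must be explicit
    return a + b;
}
```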
I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit conversion would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).
Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.
This patch also adds to the existing experimental_features test a
check that the feature names are words, not numbers.
Fixes #11003.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11004
Currently, applying schema mutations involves flushing all schema
tables so that on restart commit log replay is performed on top of
latest schema (for correctness). The downside is that schema merge is
very sensitive to fdatasync latency. Flushing a single memtable
involves many syncs, and we flush several of them. It was observed to
take as long as 30 seconds on GCE disks under some conditions.
This patch changes the schema merge to rely on a separate commit log
to replay the mutations on restart. This way it doesn't have to wait
for memtables to be flushed. It has to wait for the commitlog to be
synced, but this cost is well amortized.
We put the mutations into a separate commit log so that schema can be
recovered before replaying user mutations. This is necessary because
regular writes have a dependency on the schema version, and replaying on
top of the latest schema satisfies all dependencies. Without this, we
could lose writes if we replayed a write which depends on the
latest schema on top of an old schema.
Also, if we have a separate commit log for schema, we can delay schema
parsing until after the replay and avoid the complexity of recognizing
schema transactions in the log and invoking the schema merge logic.
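A minimal sketch of the boot-time ordering this implies (names invented for illustration, not the actual Scylla code):
```
struct commitlog;
extern commitlog schema_log, data_log;
void replay(commitlog&);       // re-applies logged mutations
void load_schema_tables();     // parses schema tables into live schemas

void boot_replay() {
    replay(schema_log);        // 1. recover schema mutations first
    load_schema_tables();      // 2. the latest schema is now available
    replay(data_log);          // 3. user writes replay on top of it
}
```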
One complication with this change is that replay_position markers are
commitlog-domain specific and cannot cross domains. They are recorded
in various places which survive node restart: sstables are annotated
with the maximum replay position, and they are present inside
truncation records. The former annotation is used by "truncate"
operation to drop sstables. To prevent old replay positions from being
interpreted in the context of the new schema commitlog domain, the
change refuses to boot if there are truncation records, and also
prohibits truncation of schema tables.
The boot sequence needs to know whether the cluster feature associated
with this change was enabled on all nodes. Features are stored in
system.scylla_local. Because we need to read it before initializing
schema tables, the initialization of tables now has to be split into
two phases. The first phase initializes all system tables except
schema tables, and later we initialize schema tables, after reading
stored cluster features.
The commitlog domain is switched only when all nodes are upgraded, and
only after the node is restarted. This is so that we don't have to add
risky code to deal with hot-switching of the commitlog domain. Cold
switching is safer. This means that after the upgrade there is a need for
yet another rolling restart round.
Fixes #8272
Fixes #8309
Fixes #1459
The commits here were extracted from PR https://github.com/scylladb/scylla/pull/10835 which implements upgrade procedure for Raft group 0.
They are mostly refactors which don't affect the behavior of the system, except one: the commit 4d439a16b3 causes all schema changes to be bounced to shard 0. Previously, they would only be bounced when the local Raft feature was enabled. I do that because:
1. eventually, we want this to be the default behavior
2. in the upgrade PR I remove the `is_raft_enabled()` function - the function was basically created with the mindset "Raft is either enabled or not" - which was right when we didn't support upgrade, but will be incorrect when we introduce intermediate states (when we upgrade from non-raft-based to raft-based operations); the upgrade PR introduces another mechanism to dispatch based on the upgrade state, but for the case of bouncing to shard 0, dispatching is simply not necessary.
Closes #10864
* github.com:scylladb/scylla:
service/raft: raft_group_registry: add assertions when fetching servers for groups
service/raft: raft_group_registry: remove `_raft_support_listener`
service/raft: raft_group0: log adding/removing servers to/from group 0 RPC map
service/raft: raft_group0: move group 0 RPC handlers from `storage_service`
service/raft: messaging: extract raft_addr/inet_addr conversion functions
service: storage_service: initialize `raft_group0` in `main` and pass a reference to `join_cluster`
treewide: remove unnecessary `migration_manager::is_raft_enabled()` calls
test/boost: memtable_test: perform schema operations on shard 0
test/boost: cdc_test: remove test_cdc_across_shards
message: rename `send_message_abortable` to `send_message_cancellable`
message: change parameter order in `send_message_oneway_timeout`
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass), having to restart
the service every time they make a change related to permissions or the
prepared_statements cache (e.g. adding a user and changing their permissions)
can become pretty annoying.
This patch series makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is no longer necessary for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.
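As a hedged sketch of the rough shape of the change (hypothetical member names; the real loading_cache differs):
```
#include <chrono>
#include <cstddef>

struct loading_cache_config {
    std::chrono::milliseconds expiry;   // permissions_validity_in_ms
    std::chrono::milliseconds refresh;  // permissions_update_interval_in_ms
    size_t max_size;                    // permissions_cache_max_entries
};

class loading_cache {
    loading_cache_config _cfg;
public:
    // Applied without a restart; picked up by the next expiry/refresh cycle.
    void update_config(loading_cache_config cfg) { _cfg = cfg; }
    // Drops all cached entries, backing the flush API mentioned above.
    void reset() { /* evict everything */ }
};
```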
branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache
* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
loading_cache_test: Test loading_cache::reset and loading_cache::update_config
api: Add API for resetting authorization cache
authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
auth_prep_statements_cache: Make auth_prep_statements_cache accept a config struct
utils/loading_cache.hh: Add update_config method
utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
utils/loading_cache.hh: Add reset method
For cases where we have very high values set for the permissions_cache validity
and update interval (e.g. 1 day), whenever a change to permissions is made it's
necessary to update the scylla config and decrease these values, since waiting for
all this time to pass wouldn't be viable.
This patch adds an API for resetting the authorization cache so that changing
the config won't be mandatory for these cases.
Usage:
$ curl -X POST http://localhost:10000/authorization_cache/reset
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass), having to restart
the service every time they make a change related to permissions or the
prepared_statements cache (e.g. adding a user) can become pretty annoying.
This patch makes permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is no longer necessary for these cases.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch makes authorized_prepared_statements_cache accept a config struct,
similarly to permissions_cache. This will make it easier to make this cache
live updateable in the next patch.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch renames the permissions_cache_config struct to loading_cache_config
and moves it to utils/loading_cache.hh. This will make it easier to handle
config updates to the authorization caches in the next patches.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
The API uses the http server to serve two directories: the api_ui_dir
where the swagger-ui directory is found and the api_doc_dir where the
swagger definition files are found.
Internally, the API uses the httpd::directory_handler, which appends the
files it gets from the path to the base directory name.
A user can override the default configuration and set a directory name
that does not end with a slash. This results in files not being
found.
This patch checks whether that trailing slash is missing and, if it is, adds it
to the API configuration.
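In effect, the fix amounts to a normalization like this sketch (illustrative only, invented function name):
```
#include <string>

// Ensure the configured directory ends with '/', since directory_handler
// builds file paths by simple concatenation: base + relative path.
std::string normalize_dir(std::string dir) {
    if (!dir.empty() && dir.back() != '/') {
        dir += '/';
    }
    return dir;
}
```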
Fixes #10700
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes #10877
Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode.
This PR introduces a per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension:
```
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
'max_writes_per_second': 100,
'max_reads_per_second': 200
};
```
Reads and writes which are detected to go over that quota are rejected with a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't really fit the rate limit error, so a new one is added. This code is implemented as part of a CQL protocol extension and returned to clients only if they requested the extension - if not, the existing CONFIG_ERROR will be used instead.
Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting that the rate limit was reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation against the rate limit early and return an error when the limit is exceeded, before sending any messages to other replicas at all.
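To illustrate the replica-side accounting, here is a deliberately simplified sketch (not the actual db::rate_limiter, which is more refined): operations are counted per partition in one-second windows and rejected once over the limit.
```
#include <cstdint>
#include <unordered_map>

class per_partition_limiter {
    uint64_t _window = 0;                            // current 1s window id
    std::unordered_map<uint64_t, uint32_t> _counts;  // partition token -> ops
public:
    // Returns false when the partition exceeded `limit` ops this second,
    // in which case the operation is rejected with RATE_LIMIT_ERROR.
    bool try_account(uint64_t token, uint32_t limit, uint64_t now_sec) {
        if (now_sec != _window) {   // new window: forget the old counts
            _counts.clear();
            _window = now_sec;
        }
        return ++_counts[token] <= limit;
    }
};
```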
The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here.
Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`:
- Write mode:
```
8f690fdd47 (PR base):
129644.11 tps ( 56.2 allocs/op, 13.2 tasks/op, 49785 insns/op)
This PR:
125564.01 tps ( 56.2 allocs/op, 13.2 tasks/op, 49825 insns/op)
```
- Read mode:
```
8f690fdd47 (PR base):
150026.63 tps ( 63.1 allocs/op, 12.1 tasks/op, 42806 insns/op)
This PR:
151043.00 tps ( 63.1 allocs/op, 12.1 tasks/op, 43075 insns/op)
```
Manual upgrade test:
- Start 3 nodes, 4 shards each, Scylla version 8f690fdd47
- Create a keyspace with scylla-bench, RF=3
- Start reading and writing with scylla-bench with CL=QUORUM
- Manually upgrade nodes one by one to the version from this PR
- Upgrade succeeded; apart from a small number of operations which failed while each node was being brought down, all reads/writes succeeded
- Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected
Fixes: #4703
Closes #9810
* github.com:scylladb/scylla:
storage_proxy: metrics for per-partition rate limiting of reads
storage_proxy: metrics for per-partition rate limiting of writes
database: add stats for per partition rate limiting
tests: add per_partition_rate_limit_test
config: add add_per_partition_rate_limit_extension function for testing
cf_prop_defs: guard per-partition rate limit with a feature
query-request: add allow_limit flag
storage_proxy: add allow rate limit flag to get_read_executor
storage_proxy: resultize return type of get_read_executor
storage_proxy: add per partition rate limit info to read RPC
storage_proxy: add per partition rate limit info to query_result_local(_digest)
storage_proxy: add allow rate limit flag to mutate/mutate_result
storage_proxy: add allow rate limit flag to mutate_internal
storage_proxy: add allow rate limit flag to mutate_begin
storage_proxy: choose the right per partition rate limit info in write handler
storage_proxy: resultize return types of write handler creation path
storage_proxy: add per partition rate limit to mutation_holders
storage_proxy: add per partition rate limit info to write RPC
storage_proxy: add per partition rate limit info to mutate_locally
database: apply per-partition rate limiting for reads/writes
database: move and rename: classify_query -> classify_request
schema: add per_partition_rate_limit schema extension
db: add rate_limiter
storage_proxy: propagate rate_limit_exception through read RPC
gms: add TYPED_ERRORS_IN_READ_RPC cluster feature
storage_proxy: pass rate_limit_exception through write RPC
replica: add rate_limit_exception and a simple serialization framework
docs: design doc for per-partition rate limiting
transport: add rate_limit_error
It did nothing.
It will be re-added in `raft_group0` and it will do something, stay
tuned.
With this we can remove the `feature_service` reference from
`raft_group_registry`.
`raft_group0` was constructed at the beginning of `join_cluster`, which
required passing references to 3 additional services to `join_cluster`
used only for that purpose (group 0 client, raft group registry, and
query processor).
Now we initialize `raft_group0` in main - like all other services - and
pass a reference to `join_cluster` so `storage_service` can store a
pointer to group 0.
We initialize `raft_group0` before we start listening for RPCs in
`messaging_service`. In a later commit we'll move the initialization
of group 0 related verbs to the constructor of `raft_group0` from
`storage_service`, so they will be initialized before we start
listening for RPCs.
Adds the new `per_partition_rate_limit` schema extension. It has two
parameters: `max_writes_per_second` and `max_reads_per_second`.
In future commits they will control how many operations of a given
type are allowed for each partition in the given table.
Reduce the storage_service's dependency on the raft group manager. The
group manager is needed only during bootstrap and in an rpc handler, so
pass it to those functions directly.
The group0_client uses group0 internally and cannot be destroyed
until group0 is stopped, to guarantee there are no ongoing calls into it
from the group0_client.
Storage service needs it to calculate schema version on join. The proxy
at this point can be passed as an argument to the joining helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service in question is only needed at join_cluster time, so there's no
need to keep it in the dependencies list. This also solves the dependency
trouble -- the distributed keyspace is sharded::start-ed after it's
passed to storage_service initialization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This service is only needed at join time, so it's better to pass it as an
argument to join_cluster(). This solves the current reversed dependency
issue -- the cdc_gen_svc is now started after it's passed to storage
service initialization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now they always follow one another both in main and cql-test-env.
Also, despite the name, init-server does join the cluster when it's
just a normal node restarting, so join-cluster is called when the
cluster is already joined. This merge makes the function named after
what it really does.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And configure cql-test-env to skip it, so as not to slow down tests in
vain. Another side effect is that cql-test-env now triggers feature
enabling at this point, but that's OK, they are enabled anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Storage service already has a vector of random subscription scope
holders; this becomes yet another one. This partially reverts
e4f35e2139, which is a half-step backwards, but so far I have no better
ideas where to track that scope guard.
It happens right after the prepare-to-join, so moving it to the end of the
latter call doesn't change the code logic. A side effect -- this removes
a silly join_group0() one-line helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Writing into the group0 raft group on the client side involves locking
the state machine, choosing a state id and checking for its presence
after the operation completes. The code that does this currently resides
in the migration manager, since it is at present the only user of group0. In
the near future we will have more clients for group0, and they will all
need the same logic, so the patch moves it to a separate class,
raft_group0_client, that any future user of group0 can use to write
into it.
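A sketch of the write sequence described above (invented names; the real raft_group0_client is asynchronous):
```
#include <mutex>
#include <stdexcept>

struct state_id { unsigned long v; };
state_id new_state_id();             // unique id chosen for this change
void append_group0_entry(state_id);  // writes the entry into group 0
bool state_id_applied(state_id);     // did our entry reach the state machine?

std::mutex group0_lock;              // serializes client-side writers

void group0_client_write() {
    std::lock_guard<std::mutex> guard(group0_lock);  // lock the state machine
    auto id = new_state_id();
    append_group0_entry(id);
    if (!state_id_applied(id)) {     // lost a race with another writer
        throw std::runtime_error("group0 write not applied, retry");
    }
}
```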
Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
The feature listener callbacks are waited upon to finish in the
middle of the cluster joining process. In particular -- before
actually joining the cluster, the format should have been selected.
For that there's a .sync() method that locks the semaphore, thus
making sure that any update is finished; it's called right after
wait_for_gossip_to_settle() finishes.
However, features are enabled inside wait_for_gossip_to_settle()
in a seastar::async() context that's also waited upon to finish. This
waiting makes it possible for any feature listener to .get() any of
its futures that should be resolved by the time gossip is settled.
That said, the format selection barrier can be moved -- instead of
waiting on the semaphore, the respective part of the selection code
can be .get()-ed (it all runs in async context). One thing to take care
of -- the remainder should continue running with the gate held.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We connect the group 0 raft server rpc implementation to the new direct
failure detector service, so that when servers are added to or removed from
the group 0 configuration, corresponding endpoints are added to the
direct failure detector service. Thus the set of detected endpoints will
be equal to the group 0 configuration.
This causes the failure detector service to start pinging endpoints,
but no listeners are registered yet. The following commit changes that.
We add the new direct failure detector to the list of services started
in the Scylla process.
To start the service, we need an implementation of `pinger` and `clock`.
`pinger` is implemented using existing GOSSIP_ECHO verb. The gossip echo
message requires the node's gossip generation number. We handle this by
embedding the pinger implementation inside `gossiper`, and making
`gossiper` update the generation number (cached inside the pinger class)
periodically.
`clock` is a simple implementation which uses `std::chrono::steady_clock`
and `seastar::sleep_until` underneath. Translating `steady_clock`
durations to `direct_failure_detector::clock` durations happens by taking
the number of ticks.
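A minimal sketch of such a clock translation (the tick granularity here is an assumption for illustration; the actual unit may differ):
```
#include <chrono>
#include <cstdint>

using fd_ticks = int64_t;

// Convert a steady_clock duration into failure-detector ticks by
// taking the number of (here: millisecond) ticks it spans.
fd_ticks to_ticks(std::chrono::steady_clock::duration d) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}
```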
The service is currently not used, just initialized; no endpoints are
added and no listeners are registered yet, but the following commits
change that.
No code uses the global gossiper instance, so it can be removed. The main and
cql-test-env code now have their own real local instances.
This change also requires adding the debug:: pointer and fixing
scylla-gdb.py to find the correct global location.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is needed so as not to mess with the removed global gossiper in the next
patch. Other than this, it's better to access services by their own
debug:: pointers, not via under-the-hood dependency chains.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some places in the code have a function-local gossiper reference but
continue to use the global instance. Re-use the local reference (it's going
to become a sharded<> instance soon).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference is put on the snitch_ptr because it is the sharded<>
thing and because the gossiper reference is the same for different snitch
drivers. Also, getting the gossiper from snitch_ptr by a driver will look
simpler than getting it from any base class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Snitch depends on gossiper and system keyspace, so it needs to be
started after those two do.
Fixes #10402
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We start the memory threshold guard (that enables large memory allocation
warnings post-boot) but don't wait for it. I can't imagine it can hurt,
but it does carry a FIXME label.
Closes #10375
* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
main: allow joining raft group0 before waiting for gossiper to settle
service: raft_group0: make `join_group0` re-entrant
service: storage_service: add `join_group0` method
raft_group_registry: update gossiper state only on shard 0
raft: don't update gossiper state if raft is enabled early or not enabled at all
gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
db: system_keyspace: add `bootstrap_needed()` method
db: system_keyspace: mark getter methods for bootstrap state as "const"
"
There's a generic way to start-stop services in scylla, that includes
5 "actions" (some are optional and/or implicit though)
service_config cfg = ...
sharded<service>.start(cfg)
service.invoke_on_all(&service::start)
service.invoke_on_all(&service::shutdown)
service.invoke_on_all(&service::stop)
sharded<service>.stop()
and most of the services out there conform to that scheme. Not snitch
(spoiler: and not tracing), for which there's a couple of helpers that
do all that magic behind the scenes; "configuring" snitch is done with
the help of overloaded constructors. The latter is extra complicated
by the need to register snitch drivers in the class-registry for each
constructor overload. Also there's external shard synchronization
on stop.
This set brings the snitch start/stop code to the described standard: the
create/stop helpers are removed, creation accepts the config structure,
and per-shard start/stop (snitch has no drain for now) happens in the
simple invoke-on-all manner.
The intended side effect of this change is the ability to add explicit
dependencies to snitch (in the future, not in this set).
tests: unit(dev)
"
* 'br-snitch-config' of https://github.com/xemul/scylla:
snitch: Remove create_snitch/stop_snitch
snitch: Simplify stop (and pause_io)
snitch: Move io_is_stopped to property-file driver
snitch: Remove init_snitch_obj()
snitch: Move instance creation into snitch_ptr constructor
snitch: Make config-based construction of all drivers
snitch: Declare snitch_ptr peering and rework container() method
snitch: Introduce container() method
A node can join group0 without waiting for the gossiper if
it is either a fresh node, or an existing node which
is already part of some group0 (i.e. has `group0_id` persisted
in the system tables).
In that case the second `join_group0()` call inside the
`storage_service::join_token_ring` will be a no-op.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Repair updates (and queries on start) the system.repair_history table
and thus depends on the system_keyspace object.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After the previous patches, both create_snitch() and stop_snitch() now look
like the classical sharded service start/stop sequence. Finally, both
helpers can be removed and the remaining users can just call start/stop
on locally obtained sharded references.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently snitch drivers register themselves in the class-registry with all
sorts of construction options possible. All those different constructors
are in fact "config options".
When snitch later declares its dependencies (gossiper and system
keyspace), this would require patching all these registrations, which is very
inconvenient.
This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
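Schematically, the change looks like this (hypothetical fields; the real snitch_config carries the driver-specific options):
```
#include <string>

struct snitch_config {
    std::string name;       // which driver to instantiate
    std::string conf_file;  // e.g. for the property-file driver
};

class snitch_driver {
public:
    // One constructor shape for every driver, replacing the per-driver
    // constructor overloads previously registered in the class-registry.
    explicit snitch_driver(snitch_config cfg);
};
```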
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The gossiper reads peer features from the system keyspace. The snitch
code also needs the system keyspace, and since for now it gets all its
dependencies from the gossiper (this will be fixed some day, but not now),
it will do the same for the system keyspace. Thus it's worth having an
explicit gossiper->system_keyspace dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service uses the system keyspace to, e.g., manage the generation id,
thus it depends on the system_keyspace instance and deserves an
explicit reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Prior to the change, `USES_RAFT_CLUSTER_MANAGEMENT` feature wasn't
properly advertised upon enabling `SUPPORTS_RAFT_CLUSTER_MANAGEMENT`
raft feature.
This small series consists of 3 parts to fix the handling of supported
features for raft:
1. Move subscription for `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` to the
`raft_group_registry`.
2. Update `system.local#supported_features` directly in the
`feature_service::support()` method.
3. Re-advertise gossiper state for `SUPPORTED_FEATURES` gossiper
value in the support callback within `raft_group_registry`.
* manmanson/track_supported_set_recalculation_v7:
raft: re-advertise gossiper features when raft feature support changes
raft: move tracking `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` feature to raft
gms: feature_service: update `system.local#supported_features` when feature support changes
test: cql_test_env: enable features in a `seastar::thread`