scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 16:22:15 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	282a1880a5	forward service: Re-use proxy's helper with duplicated code The get_live_endpoints matches the same method on the proxy side. Since the forward service carries proxy reference, it can use its method (which needs to be made public for that sake). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:34:51 +03:00
Pavel Emelyanov	11c99fc41b	table: Don't use global gossiper The table::get_hit_rate needs gossiper to get hitrates state from. There's no way to carry gossiper reference on the table itself, so it's up to the callers of that method to provide it. Fortunately, there's only one caller -- the proxy -- but the call chain to carry the reference is not very short ... oh, well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:33:08 +03:00
Avi Kivity	a2901a376d	Merge 'Coroutinize some `storage_service` member functions' from Pavel Solodovnikov These trivial changes are mostly intended to reduce the use of `seastar::async`. Closes #10416 * github.com:scylladb/scylla: service: storage_service: coroutinize `start_gossiping()` service: storage_service: coroutinize `node_ops_cmd_heartbeat_updater()` service: storage_service: coroutinize `node_ops_abort_thread()` service: storage_service: coroutinize `node_ops_abort()` service: storage_service: coroutinize `node_ops_done()` service: storage_service: coroutinize `node_ops_update_heartbeat()` service: storage_service: coroutinize `force_remove_completion()` service: storage_service: coroutinize `start_leaving()` service: storage_service: coroutinize `start_sys_dist_ks()` service: storage_service: coroutinize `prepare_to_join()` service: storage_service: coroutinize `removenode_add_ranges()` service: storage_service: coroutinize `unbootstrap()` service: storage_service: coroutinize `get_changed_ranges_for_leaving()`	2022-05-02 12:59:36 +03:00
Pavel Solodovnikov	1031a9fa09	service: storage_service: coroutinize `start_gossiping()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-05-01 12:12:30 +03:00
Pavel Solodovnikov	4af27ca653	service: storage_service: coroutinize `node_ops_cmd_heartbeat_updater()` Also, pass `node_ops_cmd` by value to get rid of lifetime issues when converting to coroutine. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-05-01 12:07:36 +03:00
Eliran Sinvani	e0c7178e75	query_processor: remove default internal query caching behavior When executing internal queries, it is important that the developer decides whether to cache the query internally, since internal queries are cached indefinitely. It is also important that the programmer is aware of whether caching is going to happen. The code contained two "groups" of `query_processor::execute_internal` overloads: one group cached by default and the other didn't. Here we add overloads to eliminate default values for caching behaviour, forcing an explicit parameter for the caching values. All the call sites were changed to reflect the original caching default that was there. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2022-05-01 08:33:55 +03:00
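The idea in the commit above (no default for the caching argument, so every call site states its choice) can be sketched as follows. This is a minimal illustration, not Scylla's code: `toy_query_processor`, `cache_policy`, and `cached_entries()` are hypothetical names invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical flag type; the real parameter in Scylla may look different.
enum class cache_policy { no, yes };

// Hypothetical stand-in for a query processor with an internal cache.
class toy_query_processor {
    std::unordered_set<std::string> _cached;  // entries are never evicted
public:
    // `cache` deliberately has no default value, so a caller cannot
    // opt into indefinite caching by accident.
    void execute_internal(const std::string& query, cache_policy cache) {
        if (cache == cache_policy::yes) {
            _cached.insert(query);
        }
        // Executing the query itself is out of scope for this sketch.
    }

    std::size_t cached_entries() const { return _cached.size(); }
};
```

With this shape, `qp.execute_internal("select ...")` no longer compiles; the caller has to write `cache_policy::yes` or `cache_policy::no` explicitly.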
Tomasz Grabiec	dbef83af71	Merge 'raft: fix startup hangs' from Kamil Braun Fix hangs on Scylla node startup with Raft enabled that were caused by: - a deadlock when enabling the USES_RAFT feature, - a non-voter server forgetting who the leader is and not being able to forward a `modify_config` entry to become a voter. Read the commit messages for details. Fixes: #10379 Refs: #10355 Closes #10380 * github.com:scylladb/scylla: raft: actively search for a leader if it is not known for a tick duration raft: server: return immediately from `wait_for_leader` if leader is known service: raft: don't support/advertise USES_RAFT feature	2022-04-29 19:47:10 +02:00
Avi Kivity	de0ee13f45	schema_tables: forward-declare user_function and user_aggerates These bring in wasm.hh (though they really shouldn't) and make everyone suffer. Forward declare instead and add missing includes where needed. Closes #10444	2022-04-28 07:22:02 +03:00
Benny Halevy	e88871f4ec	replica: database: move shard_of implementation to mutation layer We don't need the database to determine the shard of the mutation, only its schema. So move the implementation to the respective definitions of mutation and frozen_mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10430	2022-04-27 14:40:24 +03:00
Avi Kivity	582802825a	treewide: use system-#include (angle brackets) for seastar Seastar is an external library from Scylla's point of view so we should use the angle bracket #include style. Most of the source follows this; this patch fixes a few stragglers. Also fix cases of #include which reached out to seastar's directory tree directly, via #include "seastar/include/seastar/..." to just refer to <seastar/...>. Closes #10433	2022-04-26 14:46:42 +03:00
Botond Dénes	bf1b6ced3c	Merge "Make storage_service::bootstrap less if-y" from Pavel Emelyanov " The method in question performs node bootstrap in several different modes (regular, replacing, RBNO) and several subsequent if-else branches just duplicate each other. This set merges them making the code easier to read. " * 'br-less-branchy-bootstrap' of https://github.com/xemul/scylla: storage_service: Remove pointless check in replace-bootstrap storage_service: Generalize wait for range setup storage_service: Merge common if-else branches in bootstrap storage_service: Move tables bootstrap-ON upwards	2022-04-26 10:58:30 +03:00
Pavel Solodovnikov	654e6726d1	service: storage_service: coroutinize `node_ops_abort_thread()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:11:20 +03:00
Pavel Solodovnikov	b27c989e62	service: storage_service: coroutinize `node_ops_abort()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:11:14 +03:00
Pavel Solodovnikov	f7e84c6138	service: storage_service: coroutinize `node_ops_done()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:11:08 +03:00
Pavel Solodovnikov	6936dbea49	service: storage_service: coroutinize `node_ops_update_heartbeat()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:11:04 +03:00
Pavel Solodovnikov	1c03d01927	service: storage_service: coroutinize `force_remove_completion()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:10:58 +03:00
Pavel Solodovnikov	fc1dfb0ae1	service: storage_service: coroutinize `start_leaving()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:10:54 +03:00
Pavel Solodovnikov	0a3a7534d6	service: storage_service: coroutinize `start_sys_dist_ks()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:10:49 +03:00
Pavel Solodovnikov	15ea74e41f	service: storage_service: coroutinize `prepare_to_join()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:10:43 +03:00
Pavel Solodovnikov	c739fad5d6	service: storage_service: coroutinize `removenode_add_ranges()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:10:05 +03:00
Pavel Solodovnikov	e392fdda96	service: storage_service: coroutinize `unbootstrap()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:09:56 +03:00
Pavel Solodovnikov	8fa7f47a74	service: storage_service: coroutinize `get_changed_ranges_for_leaving()` Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-25 09:09:04 +03:00
Pavel Emelyanov	41392a59bb	storage_service: Remove pointless check in replace-bootstrap The method in question is called in the branch where the replace address is checked to be present, no need in extra explicit check. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-19 13:27:52 +03:00
Pavel Emelyanov	49481b1a21	storage_service: Generalize wait for range setup Both the if is_replacing()/else branches call gossiper waiting method as their first steps. Can be done once. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-19 13:27:52 +03:00
Pavel Emelyanov	d213e6ffd1	storage_service: Merge common if-else branches in bootstrap There are three modes in there -- bootstrap, b.s. with RBNO and b.s. for replacing. All three are checked two times in a row, but can be done once. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-19 13:27:52 +03:00
Pavel Emelyanov	b0df3a32b4	storage_service: Move tables bootstrap-ON upwards This call just places a boolean flag on all. It won't hurt if it lasts while the node is performing pre-bootstrap checks, but it allows making the whole method less branchy. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-19 13:27:52 +03:00
Avi Kivity	469bca5369	storage_proxy: coroutinize mutate_locally (vector overload) The do_with() means we have an unconditional allocation, so we can justify the coroutine's allocation (replacing it). Meanwhile, coroutine::parallel_for_each() reduces an allocation if mutate_locally() blocks. Closes #10387	2022-04-19 10:59:16 +03:00
Botond Dénes	3051fc3cbc	Merge 'Fix some errors and issues found by gcc 12' from Avi Kivity gcc 12 checks some things that clang doesn't, resulting in compile errors. This series fixes some of these issues, but still builds (and tests) with clang. Unfortunately, we still don't have a clean gcc build due to an outstanding bug [1]. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056 Closes #10386 * github.com:scylladb/scylla: build: disable warnings that cause false-positive errors with gcc 12 utils: result_loop: remove invalid and incorrect constraint service: forward_service: avoid using deprecated std::bind1st and std::not1 repair: explicityl ignore tombstone gc update response treewide: abort() after switch in formatters db: view: explicitly ignore unused result compaction: leveled_compaction_strategy: avoid compares between signed and unsigned compaction_manager: compaction_reenabler: disambiguate compaction_state api: avoid function specialization in req_param alternator: ttl: avoid specializing class templates in non-namespace scope alternator: executor: fix signed/unsigned comparison in is_big()	2022-04-19 10:25:38 +03:00
Avi Kivity	e55f5fab53	service: forward_service: avoid using deprecated std::bind1st and std::not1 Switch to newer alternatives std::bind_front, std::not_fn.	2022-04-18 12:27:18 +03:00
Avi Kivity	36aee57978	storage_proxy: convert rpc handlers from lambdas to member functions Currently, rpc handlers are all lambdas inside storage_proxy::init_messaging_service(). This means any stack trace refers to storage_proxy::init_messaging_service::lambda#n instead of a meaningful function name, and it makes init_messaging_service() very intimidating. Fix that by moving all such lambdas to regular member functions. This is easy now that they don't capture anything except `this`, which we provide during registration via std::bind_front(). A few #includes and forward declarations had to be added to storage_proxy.hh. This is unfortunate, but can only be solved by splitting storage_proxy into a client part and a server part.	2022-04-17 19:03:06 +03:00
Avi Kivity	f7e8109b16	storage_proxy: don't capture messaging_service in server callbacks We'd like to make the server callbacks member functions, rather than lambdas, so we need to eliminate their captures. This patch eliminates 'ms' by referring to the already existing member '_messaging' instead.	2022-04-17 17:55:05 +03:00
Avi Kivity	4cac2eb43e	storage_proxy: don't capture migration_manager in server callbacks We'd like to make the server callbacks member functions, rather than lambdas, so we need to eliminate their captures. This patch eliminates 'mm' by making it a member variable and capturing 'this' instead. In one case 'mm' was used by a handle_write() intermediate lambda so we have to make that non-static and capture it too. uninit_messaging_service() clears the member variable to preserve the same lifetime 'mm' had before, in case that's important.	2022-04-17 17:54:51 +03:00
Kamil Braun	b1b22f2c2b	service: raft: don't support/advertise USES_RAFT feature The code would advertise the USES_RAFT feature when the SUPPORTS_RAFT feature was enabled through a listener registered on the SUPPORTS_RAFT feature. This would cause a deadlock: 1. `gossiper::add_local_application_state(SUPPORTED_FEATURES, ...)` locks the gossiper (it's called for the first time from sstables format selector). 2. The function calls `on_change` listeners. 3. One of the listeners is the one for SUPPORTS_RAFT. 4. The listener calls `gossiper::add_local_application_state(SUPPORTED_FEATURES, ...)`. 5. This tries to lock the gossiper. In turn, depending on timing, this could hang the startup procedure, which calls `add_local_application_state` multiple times at various points, trying to take the lock inside gossiper. This prevents us from testing raft / group 0, new schema change procedures that use group 0, etc. For now, simply remove the code that advertises the USES_RAFT feature. Right now the feature has no other effect on the system than just becoming enabled. In fact, it's possible that we don't need this second feature at all (SUPPORTS_RAFT may be enough), but that's work-in-progress. If needed, it will be easy to bring the enabling code back (in a fixed form that doesn't cause a deadlock). We don't remove the feature definitions yet just in case. Refs: #10355	2022-04-15 16:08:25 +02:00
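The five-step sequence above is a re-entrant acquisition of a non-recursive lock from inside a listener. The toy model below reproduces that shape in a single thread. It is only a sketch: `toy_gossiper` and its methods are hypothetical, and the real gossiper would wait on its lock forever where this model throws.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical model of a component that runs listeners while holding a lock.
class toy_gossiper {
    bool _locked = false;  // models a non-recursive lock
    std::vector<std::function<void()>> _listeners;
public:
    void on_change(std::function<void()> listener) {
        _listeners.push_back(std::move(listener));
    }

    void add_local_application_state() {
        if (_locked) {
            // The real code would block here and never make progress.
            throw std::logic_error("re-entrant lock attempt: would deadlock");
        }
        _locked = true;
        try {
            for (auto& listener : _listeners) {
                listener();  // listeners run while the lock is held
            }
        } catch (...) {
            _locked = false;
            throw;
        }
        _locked = false;
    }
};
```

Registering a listener that itself calls `add_local_application_state()` triggers the exception on the first state update, which corresponds to steps 3 to 5 of the commit message.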
Kamil Braun	41f5b7e69e	Merge branch 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla into next * 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla: main: allow joining raft group0 before waiting for gossiper to settle service: raft_group0: make `join_group0` re-entrant service: storage_service: add `join_group0` method raft_group_registry: update gossiper state only on shard 0 raft: don't update gossiper state if raft is enabled early or not enabled at all gms: feature_service: add `cluster_uses_raft_mgmt` accessor method db: system_keyspace: add `bootstrap_needed()` method db: system_keyspace: mark getter methods for bootstrap state as "const"	2022-04-14 16:42:20 +02:00
Piotr Sarna	61057446f7	Merge 'forward_service: retry failed forwarder call' from Michał Sala This pull request adds support for retrying failed forwarder calls (currently used to parallelize `select count() from ...` queries). Failed-to-forward sub-queries will be executed locally (on a super-coordinator). This local execution is meant as a fallback for forward_requests that could not be sent to their destined coordinator (e.g. due to the gossiper not reacting fast enough). Local execution was chosen as the safest one - it does not require sending data to another coordinator. Due to problems with miscompilations, some parts of the `forward_service` were uncoroutinized. Fixes: #10131 Closes #10329 github.com:scylladb/scylla: forward_service: uncoroutinize dispatch method forward_service: uncoroutinize retrying_dispatcher forward_service: rety a failed forwarder call forward_service: copy arguments/captured vars to local variables	2022-04-13 09:41:35 +02:00
Gleb Natapov	a3e8ae0979	storage_proxy: fix silencing of remote read errors Filtering remote rpc errors based on exception type did not work because the remote errors were reported as std::runtime_error and all rpc exceptions inherit from it. New rpc propagates remote errors using special type rpc::remote_verb_error now, so we can filter on that instead. Fixes #10339 Message-Id: <YlQYV5G6GksDytGp@scylladb.com>	2022-04-11 18:53:25 +03:00
Piotr Sarna	58529591a9	database,cql3: add STORAGE option to keyspaces The STORAGE option is designed to hold a map of options used for customizing storage for given keyspace. The option is kept in a system_schema.scylla_keyspaces table. The option is only available if the whole cluster is aware of it - guarded by a cluster feature. Example of the table contents: ``` cassandra@cqlsh> select * from system_schema.scylla_keyspaces; keyspace_name \| storage_options \| storage_type ---------------+------------------------------------------------+-------------- ksx \| {'bucket': '/tmp/xx', 'endpoint': 'localhost'} \| S3 ```	2022-04-08 09:17:01 +02:00
Pavel Solodovnikov	293c5f39ee	service: raft_group0: make `join_group0` re-entrant Detect if we have already finished joining group0 before and do nothing in that case. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-07 12:36:40 +03:00
Pavel Solodovnikov	057a12e213	service: storage_service: add `join_group0` method Just delegates work to `service::raft_group0::join_group0()` so that it can be used in `main` to activate raft group0 early in some cases (before waiting for gossiper to settle). Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-07 12:36:33 +03:00
Pavel Solodovnikov	0d5e2157e1	raft_group_registry: update gossiper state only on shard 0 Since `gossiper::add_local_application_state` is not safe to call concurrently from multiple shards (which will cause a deadlock inside the method), call this only on shard 0 in `_raft_support_listener`. This fixes sporadic hangs during startup of a fresh node in an empty cluster. Tests: unit(dev), manual Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-07 12:33:40 +03:00
Pavel Solodovnikov	7903d2afa8	raft: don't update gossiper state if raft is enabled early or not enabled at all There is a listener in the `raft_group_registry`, which makes the gossiper re-publish supported features app state to the cluster. We don't need to do this in case `USES_RAFT_CLUSTER_MANAGEMENT` feature is enabled before the usual time, i.e. before the gossiper settles. So, short-circuit the listener logic in that case and do nothing. Also, don't do anything if raft group registry is not enabled at all; this is just a generic safeguard. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-04-07 12:31:29 +03:00
Botond Dénes	18be2e9faf	Merge "Remove gossiper->snitch kicking" from Pavel Emelyanov " Gossiper calls snitch->gossiper_starting() when being enabled. This generates a dependency loop -- snitch needs gossiper to gossip its states and get DC/RACK, gossiper needs snitch to do this kick. This set removes this notification. The new approach is to kick the snitch to gossip its states in the same places where gossiper is enabled() so that only the snitch->gossiper dependency remains. As a side effect the set ditches a bunch of references to global snitch instance. tests: unit(dev) " * 'br-snitch-gossiper-starting' of https://github.com/xemul/scylla: snitch: Remove gossiper_starting() snitch: Remove gossip_snitch_info() property-file snitch: Re-gossip states with the help of .get_app_states() property-file snitch: Reload state in .start() ec2 multi-region snitch: Register helper in .start() snitch, storage service: Gossip snitch info once snitch: Introduce get_app_states() method property-file snitch: Use _my_distributed to re-shard storage service: Shuffle snitch name gossiping	2022-04-06 17:41:36 +03:00
Michał Sala	28970389bc	forward_service: uncoroutinize dispatch method Done to mitigate potential miscompilations.	2022-04-06 15:01:31 +02:00
Michał Sala	edc32a7118	forward_service: uncoroutinize retrying_dispatcher Done to mitigate potential miscompilations.	2022-04-06 14:52:59 +02:00
Michał Sala	59ff51c824	forward_service: rety a failed forwarder call Failed-to-forward sub-queries will be executed locally (on a super-coordinator). This local execution is meant as a fallback for forward_requests that could not be sent to their destined coordinator (e.g. due to the gossiper not reacting fast enough). Local execution was chosen as the safest one - it does not require sending data to another coordinator.	2022-04-06 14:44:55 +02:00
Gleb Natapov	7bf557332f	storage_service: remove maybe from maybe_start_sys_dist_ks There is nothing "maybe" about it now. Message-Id: <Ykv/bj8MvKh0UU23@scylladb.com>	2022-04-05 12:49:56 +03:00
Michał Sala	e170961b4d	forward_service: copy arguments/captured vars to local variables Copying captured variables into local variables (that live in a coroutine's frame) is a mitigation of suspected lifetime issues. Arguments of forward_service::dispatch are also copied (to prevent potential undefined behavior or miscompilation triggered by referencing the arguments in a capture list of a lambda that produces a coroutine).	2022-04-04 16:58:08 +02:00
Pavel Emelyanov	9fdb49c86a	Merge 'fix hang on shutdown while ddl query is running and there is no quorum' from Gleb A node that runs a DDL query while its cluster does not have a quorum cannot be shut down since the query is not abortable. The series makes it abortable and also fixes the order in which components are shut down to avoid the deadlock. * gleb/raft_shutdown_v4 of git@github.com:scylladb/scylla-dev.git: migration_manager: drain migration manager before stopping protocol servers on shutdown migration_manager: pass abort source to raft primitives storage_proxy: relax some read error reporting	2022-04-04 17:25:13 +03:00
Pavel Emelyanov	f9af6fb430	snitch, storage service: Gossip snitch info once Nowadays snitch states are put into gossiper via .gossiper_starting() call by gossiper. This, in turn, happens in two places -- on node ring join code and on re-enabling gossiper via the API call. The former can be performed by the ring joining code with the help of recently introduced snitch.get_app_states() helper. The latter call is in fact not needed. Re-gossiped are DC, RACK and for some drivers the INTERNAL_IP states that don't change throughout snitch lifetime and are preserved in the gossiper pre-loaded states. Thus, once the snitch states are applied by storage service ring join code, the respective states update can be removed from the snitch gossiper_starting() implementations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-01 13:16:05 +03:00
Pavel Emelyanov	b8e876681d	storage service: Shuffle snitch name gossiping No functional changes, just have the local snitch reference in the ring joining code. This simplifies next patching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-04-01 13:16:05 +03:00


2732 Commits