scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 03:30:49 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	da571ed93b	message: get rid of throws in send_message{,_timeout,_abortable} Now, those functions don't rethrow existing exceptions or throw new ones.	2022-07-04 19:27:06 +02:00
Pavel Emelyanov	85033ea6ae	Merge 'A bunch of refactors related to Raft group 0' from Kamil Braun The commits here were extracted from PR https://github.com/scylladb/scylla/pull/10835 which implements upgrade procedure for Raft group 0. They are mostly refactors which don't affect the behavior of the system, except one: the commit `4d439a16b3` causes all schema changes to be bounced to shard 0. Previously, they would only be bounced when the local Raft feature was enabled. I do that because: 1. eventually, we want this to be the default behavior 2. in the upgrade PR I remove the `is_raft_enabled()` function - the function was basically created with the mindset "Raft is either enabled or not" - which was right when we didn't support upgrade, but will be incorrect when we introduce intermediate states (when we upgrade from non-raft-based to raft-based operations); the upgrade PR introduces another mechanism to dispatch based on the upgrade state, but for the case of bouncing to shard 0, dispatching is simply not necessary. Closes #10864 * github.com:scylladb/scylla: service/raft: raft_group_registry: add assertions when fetching servers for groups service/raft: raft_group_registry: remove `_raft_support_listener` service/raft: raft_group0: log adding/removing servers to/from group 0 RPC map service/raft: raft_group0: move group 0 RPC handlers from `storage_service` service/raft: messaging: extract raft_addr/inet_addr conversion functions service: storage_service: initialize `raft_group0` in `main` and pass a reference to `join_cluster` treewide: remove unnecessary `migration_manager::is_raft_enabled()` calls test/boost: memtable_test: perform schema operations on shard 0 test/boost: cdc_test: remove test_cdc_across_shards message: rename `send_message_abortable` to `send_message_cancellable` message: change parameter order in `send_message_oneway_timeout`	2022-06-29 16:51:54 +03:00
Avi Kivity	3131cbea62	Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into query result set, this mandates the query to be stopped on a live row on the replica, lest any dead rows or tombstones processed after the live rows, would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the data-set becomes effectively unreadable until compaction gets rid of the tombstones. This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters. Refs: https://github.com/scylladb/scylla/issues/3672 Refs: https://github.com/scylladb/scylla/issues/7689 Refs: https://github.com/scylladb/scylla/issues/7933 Tests: manual upgrade test. 
I wrote a data set with: ``` ./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000 ``` This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload: ``` ./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100 ``` I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs. Closes #10829 * github.com:scylladb/scylla: query: have replica provide the last position idl/query: add last_position to query_result mutlishard_mutation_query: propagate compaction state to result builder multishard_mutation_query: defer creating result builder until needed querier: use full_position instead of ad-hoc struct querier: rely on compactor for position tracking mutation_compactor: add current_full_position() convenience accessor mutation_compactor: s/_last_clustering_pos/_last_pos/ mutation_compactor: add state accessor to compact_mutation introduce full_position idl: move position_in_partition into own header service/paging: use position_in_partition instead of clustering_key for last row alternator/serialization: extract value object parsing logic service/pagers/query_pagers.cc: fix indentation position_in_partition: add to_string(partition_region) and parse_partition_region() mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh	2022-06-27 12:23:21 +03:00
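The fallback described in the merge message above (use the replica-supplied position when present, otherwise look at the last row of the result set) can be sketched roughly as follows. This is a minimal illustration, not the actual Scylla code: `position`, `query_result_sketch` and `next_page_start` are hypothetical names, and a position is reduced to a pair of integers.

```cpp
#include <optional>
#include <vector>

// Hypothetical, heavily simplified stand-in for a position inside a partition.
struct position {
    int partition;
    int row;
    bool operator==(const position& o) const {
        return partition == o.partition && row == o.row;
    }
};

// Hypothetical query result: the live rows plus an optional position
// supplied by the replica (absent if the replica does not send the field).
struct query_result_sketch {
    std::vector<position> live_rows;
    std::optional<position> last_position;
};

// Decide where the next page should continue from.
std::optional<position> next_page_start(const query_result_sketch& r) {
    if (r.last_position) {
        // New way: trust the replica; this may point at a dead row or tombstone.
        return r.last_position;
    }
    if (!r.live_rows.empty()) {
        // Old way: continue from the last live row in the result set.
        return r.live_rows.back();
    }
    // Nothing to continue from.
    return std::nullopt;
}
```

While the replica still reports the same position as the last live row, both branches yield the same answer, which is why mixed clusters are unaffected.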
Kamil Braun	8e907cbf57	service/raft: raft_group0: move group 0 RPC handlers from `storage_service` And generate the boilerplate from IDL declarations. Simplifies the code, and the code now resides where it belongs.	2022-06-23 16:14:41 +02:00
Kamil Braun	c030d03893	message: rename `send_message_abortable` to `send_message_cancellable` It's not possible to abort an RPC call entirely, since the remote part continues running (if the message got out). Calling the provided abort source does the following: 1. if the message is still in the outgoing queue, drop it, 2. resolve waiter callbacks exceptionally. Using the word "cancellable" is more appropriate. Also write a small comment at `send_message_cancellable`.	2022-06-23 16:14:41 +02:00
Kamil Braun	07fe3e4a99	message: change parameter order in `send_message_oneway_timeout` Make it consistent with the other 'send message' functions. Simplify code generation logic in idl-compiler. Interestingly this function is not used anywhere so I didn't have to fix any call sites.	2022-06-23 16:14:41 +02:00
Botond Dénes	009d2fe2f7	idl/query: add last_position to query_result To be used to allow the replica to specify the last position in the stream, where the query was left off. Currently this is always the same as the implicit position -- the last row in the result-set -- but this requires only stopping the read on a live row, which is a requirement we want to lift: we want to be able to stop on a tombstone. As tombstones are not included in the query result, we have to allow the replica to overwrite the last seen position explicitly. This patch introduces the new field in the query-result IDL but it is not written to yet, nor is it read, that is left for the next patches.	2022-06-23 13:36:24 +03:00
Botond Dénes	119be5d5db	idl: move position_in_partition into own header So it can be used without pulling in all of partition_checksum.idl.hh.	2022-06-23 13:36:24 +03:00
Piotr Dulikowski	02469e0b15	storage_proxy: add per partition rate limit info to write RPC Adds db::per_partition_rate_limit::info parameter to the write RPC. The rate limit info controls the behavior of the rate limiter on the replica.	2022-06-22 20:16:48 +02:00
Piotr Dulikowski	51546b0609	storage_proxy: pass rate_limit_exception through write RPC This commit modifies the storage_proxy logic so that the coordinator knows whether a write operation failed due to rate limit being exceeded, and returns `exceptions::rate_limit_exception` when that happens.	2022-06-22 20:16:48 +02:00
Avi Kivity	ee2420ff43	messaging: add boilerplate to rpc_protocol_impl.hh License, copyright, #pragma once. The copyright is set to 2021 since that was when the file was created. Closes #10778	2022-06-13 07:29:32 +02:00
Avi Kivity	afc06f0017	messaging: forward-declare types in messaging_service.hh messaging_service.hh is a switchboard - it includes many things, and many things include it. Therefore, changes in the things it includes affect many translation units. Reduce the dependencies by forward-declaring as much as possible. This isn't pretty, but it reduces compile time and recompilations. Other headers adjusted as needed so everything (including `ninja dev-headers`) still compile. Closes #10755	2022-06-09 15:52:12 +03:00
Michael Livshin	029508b77c	flat_mutation_reader is dead Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Avi Kivity	c83393e819	messaging: do isolate default tenants In `10dd08c9` ("messaging_service: supply and interpret rpc isolation_cookies", 4.2), we added a mechanism to perform rpc calls in remote scheduling groups based on the connection identity (rather than the verb), so that connection processing itself can run in the correct group (not just verb processing), and so that one verb can run in different groups according to need. In `16d8cdadc` ("messaging_service: introduce the tenant concept", 4.2), we changed the way isolation cookies are sent: scheduling_group messaging_service::scheduling_group_for_verb(messaging_verb verb) const { return _scheduling_info_for_connection_index[get_rpc_client_idx(verb)].sched_group; @@ -665,11 +694,14 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge if (must_compress) { opts.compressor_factory = &compressor_factory; } opts.tcp_nodelay = must_tcp_nodelay; opts.reuseaddr = true; - opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie; + // We send cookies only for non-default statement tenant clients. + if (idx > 3) { + opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie; + } This effectively disables the mechanism for the default tenant. As a result, some verbs will be executed in whatever group the messaging service listener was started in. This used to be the main group, but in `554ab03` ("main: Run init_server and join_cluster inside maintenance scheduling group", 4.5), this was changed to the maintenance group. As a result, normal reads/writes now compete with maintenance operations, raising their latency significantly. Fix by sending the isolation cookie for all connections. With this, a 2-node cassandra-stress load sees its 99th percentile latency increase by just 3ms during repair, compared to 10ms+ before. Fixes #9505. Closes #10673	2022-05-27 16:36:57 +02:00
Benny Halevy	1308b45c58	messaging_service: do_make_sink_source: handle failed source future I've stumbled upon this with version `a2901a376d` in debug mode when testing repair_additional_test.py::TestRepairAdditional::test_repair_kill_3: WARN 2022-05-17 07:26:12,581 [shard 0] seastar - Exceptional future ignored: seastar::rpc::closed_error (connection is closed), backtrace: 0x137c33d0 0x1ad14d0d 0x1ad149cd 0x1ad16fc3 0x1ad17e52 0x19d8a809 0x19d8ab6a 0x139165a9 0x17be0d21 0x17bdcfb0 0x17bf3611 0x17bf39f0 0x17bf3c62 0x17bf3958 0x17bf57d8 0x17bf5468 0x19efe44e 0x19f04ac6 0x19f09732 0x19f072a1 0x19cca281 0x19cc7de5 0x13859cbf 0x13d309d6 0x13d3090b 0x13d30775 0x1391364d 0x13858521 /lib64/libc.so.6+0x27b74 0x137774ad Decode: ``` seastar::report_failed_future(seastar::future_state_base::any&&) at //./seastar/src/core/future.cc:218 seastar::future_state_base::any::check_failure() at //./seastar/include/seastar/core/future.hh:573 seastar::future_state<seastar::rpc::source<repair_row_on_wire_with_cmd> >::clear() at ././seastar/include/seastar/core/future.hh:615 ~future_state at ././seastar/include/seastar/core/future.hh:620 (inlined by) ~future at ././seastar/include/seastar/core/future.hh:1343 ~ at ./message/messaging_service.cc:841 ``` Looks like if sink.close() fails after source.failed() then source gets abandoned. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-05-19 11:47:38 +03:00
Kamil Braun	9551256e81	messaging_service: abortable version of `send_gossip_echo` Use the new `send_message_abortable` function to implement an abortable version of `send_gossip_echo`. These echo messages will be used for direct failure detection.	2022-05-09 13:14:41 +02:00
Kamil Braun	f2548fc3fa	message: abortable version of `send_message` I want to be able to timeout `send_message`, but not through the existing `send_message_timeout` API which forces me to use a particular clock/duration/timepoint type. Introduce a more general `send_message_abortable` API which gets an `abort_source&`, subscribes to it, and uses the `rpc::cancellable` interface to cancel the RPC on abort. The function is 90% copy-pasta from `send_message{_timeout}`, only the abort part is new.	2022-05-09 13:14:41 +02:00
Pavel Solodovnikov	95c8d65949	treewide: fix compilation issues with fmtlib 8.1.0+ Due to `fd62fba985` scoped enums are not automatically converted to integers anymore, this is the intended behavior, according to the fmtlib devs. A bit nicer solution would be to use `std::to_underlying` instead of a direct `static_cast`, but it's not available until C++23 and some compilers are still missing the support for it. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-03-16 12:31:50 +03:00
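The two options mentioned in the commit above (a direct `static_cast` versus `std::to_underlying`) can be illustrated with a small stand-alone sketch. It deliberately avoids fmtlib and uses `std::to_string` instead; `verb_sketch`, `to_underlying_sketch` and `describe` are hypothetical names, and the helper is a hand-written pre-C++23 equivalent of `std::to_underlying`.

```cpp
#include <string>
#include <type_traits>

// Hypothetical scoped enum standing in for any enum that gets formatted.
enum class verb_sketch : int { read = 3, write = 7 };

// Hand-rolled equivalent of C++23 std::to_underlying for older compilers.
template <typename E>
constexpr std::underlying_type_t<E> to_underlying_sketch(E e) noexcept {
    return static_cast<std::underlying_type_t<E>>(e);
}

// Scoped enums do not convert to integers implicitly, so the conversion
// has to be spelled out before the value is formatted.
std::string describe(verb_sketch v) {
    return "verb " + std::to_string(to_underlying_sketch(v));
}
```

The helper keeps the underlying type in one place, so changing the enum's base type later does not require touching every cast.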
Mikołaj Sielużycki	1d84a254c0	flat_mutation_reader: Split readers by file and remove unnecessary includes. The flat_mutation_reader files were conflated and contained multiple readers, which was not strictly necessary. Splitting improves incremental compilation times, as touching rarely used readers no longer recompiles large chunks of the codebase. Total compilation times are also improved, as the sizes of flat_mutation_reader.hh and flat_mutation_reader_v2.hh have been reduced and those files are included by many files in the codebase. With changes real 29m14.051s user 168m39.071s sys 5m13.443s Without changes real 30m36.203s user 175m43.354s sys 5m26.376s Closes #10194	2022-03-14 13:20:25 +02:00
Michał Sala	fff454761a	messaging_service: add verb for count() request forwarding Except for the verb addition, this commit also defines forward_request and forward_result structures, used as an argument and result of the new rpc. forward_request is used to forward information about select statement that does count() (or other aggregating functions such as max, min, avg in the future). Due to the inability to serialize cql3::statements::select_statement, I chose to include query::read_command, dht::partition_range_vector and some configuration options in forward_request. They can be serialized and are sufficient enough to allow creation of service::pager::query_pagers::pager.	2022-02-01 21:14:41 +01:00
Kamil Braun	cc0c54ea15	service: migration_manager: allow using MIGRATION_REQUEST verb to fetch group 0 history table The MIGRATION_REQUEST verb is currently used to pull the contents of schema tables (in the form of mutations) when nodes synchronize schemas. We will (ab)use the verb to fetch additional data, such as the contents of the group 0 history table, for purposes of group 0 snapshot transfer. We extend `schema_pull_options` with a flag specifying that the puller requests the additional data associated with group 0 snapshots. This flag is `false` by default, so existing schema pulls will do what they did before. If the flag is `true`, the migration request handler will include the contents of group 0 history table. Note that if a request is set with the flag set to `true`, that means the entire cluster must have enabled the Raft feature, which also means that the handler knows of the flag.	2022-01-24 15:20:37 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Gleb Natapov	b1fea20d36	raft: move raft verbs to the IDL	2022-01-13 13:14:46 +02:00
Gleb Natapov	8a25b740df	raft: split idl to rpc and storage Storage uses only small part of the IDL, so it can include only the part that is relevant to it.	2022-01-13 13:14:46 +02:00
Gleb Natapov	1db151bd75	storage_proxy: move all verbs to the IDL Define all verbs in the IDL instead of manually coding them.	2022-01-10 14:58:28 +02:00
Gleb Natapov	ff6a0fffaf	storage_proxy: convert more address vectors to inet_address_vector_replica_set	2022-01-10 13:48:20 +02:00
Avi Kivity	57188de09e	Merge 'Make dc/rack encryption work for some cases where NAT hides endpoint ips' from Eliran Sinvani This is a consolidation of the #9714 and #9709 PRs by @elcallio that were reviewed by @asias. The last comment on those was that they should be consolidated in order not to create a security degradation for ec2 setups. In some cases it is impossible to determine the dc or rack association for nodes on outgoing connections. One example is when some IPs are hidden behind a NAT layer. In some cases this creates problems where one side of the connection is aware of the rack/dc association while the other isn't. The solution here is a two-stage one: 1. First add a gossip reverse lookup that will help us determine the rack/dc association for a broader (hopefully all) range of setups and NAT situations. 2. When this fails - be more strict about downgrading a node, which tries to ensure that both sides of the connection will at least downgrade the connection instead of just failing to start when it is not possible for one side to determine the rack/dc association. Fixes #9653 /cc @elcallio @asias Closes #9822 * github.com:scylladb/scylla: messaging_service: Add reverse mapping of private ip -> public endpoint production_snitch_base: Do reverse lookup of endpoint for info messaging_service: Make dc/rack encryption check for connection more strict	2022-01-09 16:40:49 +02:00
Asias He	a8ad385ecd	repair: Get rid of the gc_grace_seconds The gc_grace_seconds is a very fragile and broken design inherited from Cassandra. Deleted data can be resurrected if cluster wide repair is not performed within gc_grace_seconds. This design pushes the job of making the database consistency to the user. In practice, it is very hard to guarantee repair is performed within gc_grace_seconds all the time. For example, repair workload has the lowest priority in the system which can be slowed down by the higher priority workload, so that there is no guarantee when a repair can finish. A gc_grace_seconds value that is used to work might not work after data volume grows in a cluster. Users might want to avoid running repair during a specific period where latency is the top priority for their business. To solve this problem, an automatic mechanism to protect data resurrection is proposed and implemented. The main idea is to remove the tombstone only after the range that covers the tombstone is repaired. In this patch, a new table option tombstone_gc is added. The option is used to configure tombstone gc mode. For example: 1) GC a tombstone after gc_grace_seconds cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ; This is the default mode. If no tombstone_gc option is specified by the user. The old gc_grace_seconds based gc will be used. 2) Never GC a tombstone cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'}; 3) GC a tombstone immediately cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'}; 4) GC a tombstone after repair cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'}; In addition to the 'mode' option, another option 'propagation_delay_in_seconds' is added. It defines the max time a write could possibly delay before it eventually arrives at a node. A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc option can only be used after the whole cluster supports the new feature. 
A mixed cluster works with no problem. Tests: compaction_test.py, ninja test Fixes #3560 [avi: resolve conflicts vs data_dictionary]	2022-01-04 19:48:14 +02:00
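The four `tombstone_gc` modes listed in the commit above can be summarized as a small decision function. This is only a sketch under stated assumptions: `tombstone_gc_mode` and `can_gc` are hypothetical names, times are plain seconds, `last_repair` stands for the time the range covering the tombstone was last repaired, and `propagation_delay_in_seconds` is left out for brevity.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical mirror of the four modes named in the commit message.
enum class tombstone_gc_mode { timeout, disabled, immediate, repair };

// Decide whether a tombstone may be garbage-collected.
// All times are seconds since an arbitrary epoch (an assumption of this sketch).
bool can_gc(tombstone_gc_mode mode,
            std::int64_t deletion_time,
            std::int64_t now,
            std::int64_t gc_grace_seconds,
            std::optional<std::int64_t> last_repair) {
    switch (mode) {
    case tombstone_gc_mode::timeout:
        // Default mode: GC once gc_grace_seconds have passed since deletion.
        return deletion_time + gc_grace_seconds <= now;
    case tombstone_gc_mode::disabled:
        // Never GC.
        return false;
    case tombstone_gc_mode::immediate:
        // GC right away.
        return true;
    case tombstone_gc_mode::repair:
        // GC only if the covering range was repaired after the deletion.
        return last_repair.has_value() && deletion_time < *last_repair;
    }
    return false;
}
```

In `repair` mode the elapsed time does not matter at all; a tombstone whose range has never been repaired is kept indefinitely.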
Calle Wilund	73c4a2f42b	messaging_service: Add reverse mapping of private ip -> public endpoint For quick reverse lookup. (cherry picked from commit `c86296f2a8`)	2022-01-04 15:14:58 +02:00
Calle Wilund	4778770814	messaging_service: Make dc/rack encryption check for connection more strict Fixes #9653 When doing an outgoing connection, in a internode_encryption=dc/rack situation we should not use endpoint/local broadcast solely to determine if we can downgrade a connection. If gossip/message_service determines that we will connect to a different address than the "official" endpoint address, we should use this to determine association of target node, and similarly, if we bind outgoing connection to interface != bc we need to use this to decide local one. Note: This will effectively _disable_ internode_encryption=dc/rack on ec2 etc until such time that gossip can give accurate info on dc/rack for "internal" ip addresses of nodes.	2021-12-20 06:20:46 +02:00
Konstantin Osipov	c22f945f11	raft: (service) manage Raft configuration during topology changes Operations of adding or removing a node to Raft configuration are made idempotent: they do nothing if already done, and they are safe to resume after a failure. However, since topology changes are not transactional, if a bootstrap or removal procedure fails midway, Raft group 0 configuration may go out of sync with topology state as seen by gossip. In future we must change gossip to avoid making any persistent changes to the cluster: all changes to persistent topology state will be done exclusively through Raft Group 0. Specifically, instead of persisting the tokens by advertising them through gossip, the bootstrap will commit a change to a system table using Raft group 0. nodetool will switch from looking at gossip-managed tables to consulting with Raft Group 0 configuration or Raft-managed tables. Once this transformation is done, naturally, adding a node to Raft configuration (perhaps as a non-voting member at first) will become the first persistent change to ring state applied when a node joins; removing a node from the Raft Group 0 configuration will become the last action when removing a node. Until this is done, do our best to avoid a cluster state when a removed node or a node which addition failed is stuck in Raft configuration, but the node is no longer present in gossip-managed system tables. In other words, keep the gossip the primary source of truth. For this purpose, carefully chose the timing when we join and leave Raft group 0: Join the Raft group 0 only after we've advertised our tokens, so the cluster is aware of this node, it's visible in nodetool status, but before node state jumps to "normal", i.e. before it accepts queries. Since the operation is idempotent, invoke it on each restart. Remove the node from Group 0 before its tokens are removed from gossip-managed system tables. 
This guarantees that if removal from Raft group 0 fails for whatever reason, the node stays in the ring, so nodetool removenode and friends are re-tried. Add tracing.	2021-11-25 12:35:42 +03:00
Konstantin Osipov	e3751068fe	raft: (server) allow adding entries/modify config on a follower Implement an RPC to forward add_entry calls from the follower to leader. Bounce & retry in case of not_a_leader. Do not retry in case of uncertainty - this can lead to adding duplicate entries. The feature is added to core Raft since it's needed by all current clients - both topology and schema changes. When forwarding an entry to a remote leader we may get back a term/index pair that conflicts (has the same index, but is with a higher term) with a local entry we're still waiting on. This can happen, e.g. because there was a leader change and the log was truncated, but we still haven't got the append_entries RPC from the new leader, still haven't truncated the log locally, still haven't aborted all the local waits for truncated entries. Only remove the offending entry from the wait list and abort it. There may be entries labeled with an older term to the right (with higher commit index) of the conflicting entry. However, finding them, would require a linear scan. If we allow it, we may end up doing this linear scan for every conflicting entry during the transition period, which brings us to N^2 complexity of this step. At the same time, as soon as append_entries that commits a higher-term entry with the same index reaches the follower, the waits for the respective truncated entry will be aborted anyway (see notify_waiters() which sets dropped_entry exception), so the scan is unnecessary. Similarly to being able to add entries, allow to modify Raft group configuration on a follower. The implementation works the same way as adding entries - forwards the command to the leader. Now that add_entry() or modify_config never throws not_a_leader, it's more likely to throw timed_out_error, e.g. in case the network is partitioned. Previously it was only possible due to a semaphore wait timeout, and this scenario was not tested. 
Handle timed_out_error on RPC level to let the existing tests (specifically the randomized nemesis test) pass.	2021-11-25 11:50:38 +03:00
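The "bounce & retry in case of not_a_leader" behaviour described above can be sketched as a simple loop. This is an illustrative model only, not the Raft implementation: `add_entry_reply`, `send_fn` and `forward_add_entry` are hypothetical names, nodes are plain integers, and the RPC is modelled as a synchronous callback. Any outcome other than an explicit not_a_leader reply with a hint ends the loop, mirroring the rule that uncertain results are not retried.

```cpp
#include <functional>
#include <optional>

// Hypothetical reply: either the entry was accepted, or the target reports
// that it is not the leader, optionally hinting who the leader is.
struct add_entry_reply {
    bool accepted;
    std::optional<int> leader_hint;
};

// Hypothetical synchronous stand-in for the forwarding RPC.
using send_fn = std::function<add_entry_reply(int)>;

// Forward an entry, following not_a_leader hints up to max_bounces times.
// Returns the node that accepted the entry, or nothing on failure.
std::optional<int> forward_add_entry(int target, const send_fn& send, int max_bounces) {
    for (int i = 0; i <= max_bounces; ++i) {
        add_entry_reply r = send(target);
        if (r.accepted) {
            return target;
        }
        if (!r.leader_hint) {
            // No known leader: give up rather than guess.
            return std::nullopt;
        }
        target = *r.leader_hint; // bounce to the hinted leader and retry
    }
    return std::nullopt;
}
```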
Benny Halevy	ff18c0c14c	messaging_service: remove unused include of db/system_keyspace.hh As a followup to `eba20c7e5d` "messaging_service: init_local_preferred_ip_cache: get preferred ips from caller". Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211123080457.1247970-1-bhalevy@scylladb.com>	2021-11-23 11:12:36 +03:00
Benny Halevy	ce9836e2fd	messaging_service: init_local_preferred_ip_cache: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211119143523.3424773-2-bhalevy@scylladb.com>	2021-11-22 13:29:21 +03:00
Benny Halevy	eba20c7e5d	messaging_service: init_local_preferred_ip_cache: get preferred ips from caller To avoid back-calling the system_keyspace from the messaging layer let the system_keyspace get the preferred ips vector and pass it down to the messaging_service. This is part of the effort to deglobalize the system keyspace and query context. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211119143523.3424773-1-bhalevy@scylladb.com>	2021-11-22 13:29:17 +03:00
Avi Kivity	0ea79559a6	Merge 'IDL: support generating boilerplate code for RPC verbs' from Pavel Solodovnikov Introduce new syntax in IDL compiler to allow generating registration/sending code for RPC verbs: ``` verb [[attr1, attr2...] my_verb (args...) -> return_type; ``` `my_verb` RPC verb declaration corresponds to the `netw::messaging_verb::MY_VERB` enumeration value to identify the new RPC verb. For a given `idl_module.idl.hh` file, a registrator class named `idl_module_rpc_verbs` will be created if there are any RPC verbs registered within the IDL module file. These are the methods being created for each RPC verb: ``` static void register_my_verb(netw::messaging_service* ms, std::function<return_type(args...)>&&); static future<> unregister_my_verb(netw::messaging_service* ms); static future<> send_my_verb(netw::messaging_service* ms, netw::msg_addr id, args...); ``` Each method accepts a pointer to an instance of `messaging_service` object, which contains the underlying seastar RPC protocol implementation, that is used to register verbs and pass messages. There is also a method to unregister all verbs at once: ``` static future<> unregister(netw::messaging_service* ms); ``` The following attributes are supported when declaring an RPC verb in the IDL: * `[[with_client_info]]` - the handler will contain a const reference to an `rpc::client_info` as the first argument. * `[[with_timeout]]` - an additional `time_point` parameter is supplied to the handler function and `send` method uses `send_message__timeout` variant of internal function to actually send the message. * `[[one_way]]` - the handler function is annotated by `future<rpc::no_wait_type>` return type to designate that a client doesn't need to wait for an answer. The `-> return_type` clause is optional for two-way messages. If omitted, the return type is set to be `future<>`. For one-way verbs, the use of return clause is prohibited and the signature of `send` function always returns `future<>`. 
No existing code is affected. Ref: #1456 Closes #9359 github.com:scylladb/scylla: idl: support generating boilerplate code for RPC verbs idl: allow specifying multiple attributes in the grammar message: messaging_service: extract RPC protocol details and helpers into a separate header	2021-10-05 18:05:24 +03:00
Asias He	1657e7be14	gossiper: Send generation number with shutdown message Consider: - n1, n2 in the cluster - n2 shutdown - n2 sends gossip shutdown message to n1 - n1 delays processing of the handler of shutdown message - n2 restarts - n1 learns new gossip state of n2 - n1 resumes to handle the shutdown message - n1 will mark n2 as shutdown status incorrectly until n2 restarts again To prevent this, we can send the gossip generation number along with the shutdown message. If the generation number does not match the local generation number for the remote node, the shutdown message will be ignored. Since we use the rpc::optional to send the generation number, it works with mixed cluster. Fixes #8597 Closes #9381	2021-09-27 11:08:43 +03:00
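The generation check described in the commit above comes down to a single comparison. The sketch below uses `std::optional` as a stand-in for `rpc::optional`; `should_apply_shutdown` is a hypothetical name, and treating a missing generation as "apply as before" is an assumption about how mixed clusters keep their old behaviour.

```cpp
#include <cstdint>
#include <optional>

// Decide whether a received shutdown message should be applied.
// local_generation: the generation this node currently knows for the sender.
// msg_generation:   the generation carried by the message, if the sender
//                   is new enough to include one.
bool should_apply_shutdown(std::int64_t local_generation,
                           std::optional<std::int64_t> msg_generation) {
    if (!msg_generation) {
        // Older sender: no generation attached (assumed: behave as before).
        return true;
    }
    // A stale message from a previous incarnation of the node is ignored.
    return *msg_generation == local_generation;
}
```

In the scenario from the commit message, n1 has already learned a newer generation for n2 by the time it resumes the delayed handler, so the comparison fails and the stale shutdown is dropped.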
Pavel Emelyanov	598841a5dd	code: Expel gossiper.hh from other headers This requires adding forward declarations of the gossiper class and re-including some other headers here and there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-22 13:13:06 +03:00
Pavel Emelyanov	a4118a70ee	database, messaging: Delete old connection drop notification Database no longer needs it. Since the only user of the old-style notification is gone -- remove it as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-15 17:49:06 +03:00
Pavel Emelyanov	dd498273dc	messaging, proxy: Notify connection drops with boost signal The messaging_service keeps track of a list of connection-drop listeners. This list is not auto-removing and is thus not safe on stop (fortunately there's only 1 non-stopping client of it so far). This patch adds a safer notification based on boost/signals. Also, storage_proxy is subscribed to it in advance to demonstrate what it looks like altogether and to make the next patch shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-15 17:49:06 +03:00
Pavel Solodovnikov	7a8cadcca8	message: messaging_service: extract RPC protocol details and helpers into a separate header Introduce a new header `message/rpc_protocol_impl.hh`, move here the following things from `message/messaging_service.cc`: * RPC protocol wrappers implementation * Serialization thunks * `register_handler` and `send_message*` functions This code will be used later for IDL-generated RPC verbs implementation. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-09-15 17:47:11 +03:00
Avi Kivity	705f957425	Merge "Generalize TLS creds builder configuration" from Pavel E " There are 4 places out there that do the same steps parsing "client_\|server_encryption_options" and configuring the seastar::tls::creds_builder with the values (messaging, redis, alternator and transport). Also to make redis and transport look slimmer main() cleans the client_encryption_options by ... parsing it too. This set introduces a (coroutinized) helper to configure the creds_builder with map<string, string> and removes the options beautification from main. tests: unit(dev), dtest.internode_ssl_test(dev) " * 'br-generalize-tls-creds-builder-configuration' of https://github.com/xemul/scylla: code: Generalize tls::credentials_builder configuration transport, redis: Do not assume fixed encryption options messaging: Move encryption options parsing to ms main: Open-code internode encryption misconfig warning main, config: Move options parsing helpers	2021-09-01 14:19:19 +03:00
Gleb Natapov	03a266d73b	raft: make read_barrier work on a follower as well as on a leader This patch implements a Raft extension that allows performing linearisable reads by accessing the local state machine. The extension is described in section 6.4 of the PhD. To sum it up: to perform a read barrier, a follower needs to ask the leader for the last committed index that it knows about. The leader must make sure that it is still a leader before answering by communicating with a quorum. When the follower gets the index back it waits for it to be applied and by that completes the read_barrier invocation. The patch adds three new RPCs: read_barrier, read_barrier_reply and execute_read_barrier_on_leader. The last one is the one a follower uses to ask a leader about the safe index it can read. The first two are used by a leader to communicate with a quorum.	2021-08-25 08:57:13 +03:00
Pavel Emelyanov	e02b39ca3d	code: Generalize tls::credentials_builder configuration All the places in code that configure the mentioned creds builder from client_\|server_encryption_options now do it the same way. This patch generalizes it all in the utils:: helper. The alternator code "ignores" require_client_auth and truststore keys, but it's easy to make the generalized helper be compatible. Also make the new helper coroutinized from the beginning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 18:05:41 +03:00
Pavel Emelyanov	2f5941ca6f	messaging: Move encryption options parsing to ms Main collects a bunch of local variables from config and passes them as arguments to messaging service initialization helper. This patch replaces all these args with const config reference. The motivation is to facilitate next patching by providing the server encryption options k:v set right in the m.s. init code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 17:56:16 +03:00
Piotr Dulikowski	0d74dee683	Revert "messaging_service: add verbs for hint sync points" This reverts commit `82c419870a`. This commit removes the HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK rpc verbs. The upcoming HTTP API for waiting for hint replay will be restricted to waiting for hints on the node handling the request, so there is no need for new verbs.	2021-08-09 09:24:36 +02:00
Calle Wilund	b8b5f69111	messaging_service: Bind to listen address, not broadcast Refs #8418 Broadcast can (apparently) be an address not actually on the machine, but on the other side of NAT. Thus binding the local side of an outgoing connection there will fail. Bind instead to listen_address (or broadcast, if listen_to_broadcast); this will require routing + NAT to make the connection look, to the node connected to, like it comes from broadcast, to allow the connection (if using partial encryption). Note: this has only been verified in a limited way. I would suggest verifying various multi rack/dc setups before relying on it. Closes #8974	2021-07-15 13:18:10 +03:00
Avi Kivity	9059514335	build, treewide: enable -Wpessimizing-move warning This warning prevents using std::move() where it can hurt - on an unnamed temporary or a named automatic variable being returned from a function. In both cases the value could be constructed directly in its final destination, but std::move() prevents it. Fix the handful of cases (all trivial), and enable the warning. Closes #8992	2021-07-08 17:52:34 +03:00
Benny Halevy	51bc6c8b5a	messaging_service: do_start_listen: improve info log accuracy Make sure to log the info message when we actually start listening. Also, print a log message when listening on the broadcast address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-30 16:25:21 +03:00
Benny Halevy	df442d4d24	messaging_service: never listen on port 0 We never want to listen on port 0, even if configured so. When the listen port is set to 0, the OS will choose the port randomly, which makes it useless for communicating with other nodes in the cluster, since we don't support that. Also, it causes the listen_ports_conf_test internode_ssl_test to fail since it expects to disable listening on storage_port or ssl_storage_port when set to 0, as seen in https://github.com/scylladb/scylla-dtest/issues/2174. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-30 16:24:54 +03:00

392 Commits