scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-20 08:30:35 +00:00

Author	SHA1	Message	Date
Eliran Sinvani	14520e843a	messaging service: fix execution order in messaging_service constructor The messaging service constructor's body does two main things in this order: 1. it registers the CLIENT_ID verb with rpc. 2. it initializes the scheduling mechanism in charge of locating the right scheduling group for each verb. The registration function uses the scheduling mechanism to determine the verb's scheduling group, so the mechanism must be initialized before the registration runs. This commit simply reverses the order of execution. Fixes #6628	2020-06-11 12:14:10 +03:00
Pavel Emelyanov	67d5fad65f	storage_service: Remove some inclusions of its header GC pass over .cc files. Some really do not need it, and some need it for features/gossiper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Botond Dénes	16d8cdadc9	messaging_service: introduce the tenant concept Tenants get their own connections for statement verbs and are further isolated from each other by different scheduling groups. A tenant is identified by a scheduling group and a name. When selecting the client index for a statement verb, we look up the tenant whose scheduling group matches the current one. This scheduling group is persisted across the RPC call, using the name to identify the tenant on the remote end, where a reverse lookup (name -> scheduling group) happens. Instead of a single scheduling group to be used for all statement verbs, messaging_service::scheduling_config now contains a list of tenants. The first among these is the default tenant, the one we use when the current scheduling group doesn't match that of any configured tenant. To make this mapping easier, we reshuffle the client index assignment, such that statement and statement-ack verbs have indexes 2 and 3 respectively, instead of 0 and 3. The tenant configuration is set at messaging service construction time and cannot be changed afterwards. Adding such a capability should be easy but is not needed for query classification, the current user of the tenant concept. Currently two tenants are configured: $user (default tenant) and $system.	2020-05-28 11:34:32 +03:00
Avi Kivity	db8974fef3	messaging_service: de-static-ify _scheduling_info_for_connection_index Per-user SLA means we have connection classifications determined dynamically, as SLAs are added or removed. This means the classification information cannot be static. Fix by making it a non-static vector (instead of a static array), allowing it to be extended. The scheduling group member pointer is replaced by a scheduling group, since a member pointer won't work anymore: we won't have a member to refer to.	2020-05-28 10:40:08 +03:00
Avi Kivity	10dd08c9b0	messaging_service: supply and interpret rpc isolation_cookies On the client side, we supply an isolation cookie based on the connection index. On the server side, we convert an isolation cookie back to a scheduling_group. This has two advantages: - rpc processes the entire connection using the scheduling group, so that code is also isolated and accounted for - we can later add per-user connections; the previous approach of looking at the verb to decide the scheduling_group doesn't help because we don't have a set of verbs per user. With this, the main group sees <0.1% usage under simple read and write loads.	2020-05-28 10:40:08 +03:00
Avi Kivity	dbce57fa3c	messaging_service: extract connection_index -> scheduling_group translation Move it from a function-local static to a class static variable. We will want to extend it in two ways: - add more information per connection index (like the rpc isolation cookie) - support adding more connections for per-user SLA As a first step, make it an array of structures and make it accessible to all of messaging_service.	2020-05-28 10:40:08 +03:00
Asias He	c02fea5f04	repair: Ignore table removed in sync_data_using_repair Commit `75cf255c67` (repair: Ignore keyspace that is removed in sync_data_using_repair) is not enough to fix the issue because when the repair master checks if the table is dropped, the table might not be dropped yet on the repair master. To fix this, the repair master should check the error returned from the follower to determine whether the follower failed the repair because the table was dropped. With this patch, we would see WARN 2020-04-14 11:19:00,417 [shard 0] repair - repair id 1 on shard 0 completed successfully, keyspace=ks, ignoring dropped tables={cf} when the table is dropped during bootstrap. Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_test Fixes: #5942	2020-05-24 13:39:59 +03:00
Calle Wilund	08d069f78d	messaging_service: Use reloadable TLS certificates Changes messaging service rpc to use reloadable TLS certificates iff TLS is enabled. Note that this means that the service cannot start listening at construction time if TLS is active, and the user needs to call start_listen_ex to initialize and actually start the service. Since the "normal" messaging service is actually started from gms, this route is also made a continuation.	2020-05-04 11:32:21 +00:00
Botond Dénes	7dabf75682	service: messaging_service: resolve rpc set_logger deprecation warning Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200407091413.310764-1-bdenes@scylladb.com>	2020-04-22 10:05:35 +03:00
Asias He	13a9c5eaf7	repair: Send reason for node operations Since `956b092012` (Merge "Repair based node operation" from Asias), repair is used by other node operations like bootstrap, decommission and so on. Send the reason for the repair, so that we can handle the materialized view update correctly according to the reason for the operation. We want to trigger the view update only if repair is used by an actual repair operation. Otherwise, the view table will be handled twice: 1) when the view table is synced using repair 2) when the base table is synced using repair and the view table update is triggered. Fixes #5930 Fixes #5998	2020-04-13 13:47:26 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of the PAXOS protocol leaves behind an entry in the system.paxos table with the last learned value (which can be large). If not all participants learned it successfully, the next round on the same key may complete the learning using this info. But if all nodes learned the value, the entry no longer serves a useful purpose. The patch adds another round, "prune", which is executed in the background (limited to 1000 simultaneous instances) and removes the entry if all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so as not to interfere with the next round. Since the deletion happens very close to the previous writes, it will likely happen in the memtable and never reach an sstable, which reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Gleb Natapov	5753ab7195	lwt: drop invoke_on in paxos_state prepare and accept Since lwt requests now run on the owning shard, there is no longer a need to invoke a cross-shard call at the paxos_state level. RPC calls may still arrive at the wrong shard, so we still need to make a cross-shard call there.	2020-01-13 10:26:02 +02:00
Benny Halevy	9ec98324ed	messaging_service: unregister_handler: return rpc unregister_handler future Now that seastar returns it. Fixes https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191212143214.99328-1-bhalevy@scylladb.com>	2019-12-12 16:38:36 +02:00
Benny Halevy	105c8ef5a9	messaging_service: wait on unregister_handler Prepare for returning future<> from seastar rpc unregister_handler. Refs https://github.com/scylladb/scylla/issues/5228 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191208153924.1953-1-bhalevy@scylladb.com>	2019-12-11 14:17:41 +02:00
Piotr Dulikowski	adfa7d7b8d	messaging_service: don't move `unsigned` values in handlers Performing std::move on integral types is pointless. This commit gets rid of moves of values of `unsigned` type in rpc handlers.	2019-12-05 00:58:31 +01:00
Piotr Dulikowski	2e802ca650	hh: add HINT_MUTATION verb Introduce a new verb dedicated to receiving and sending hints: HINT_MUTATION. It is handled on the streaming connection, which is separate from the one used for handling mutations sent by the coordinator during a write. The intent of using a separate connection is to increase fairness while handling hints and user requests: this way, we avoid a situation in which one type of request saturates the connection, negatively impacting the other.	2019-12-05 00:51:49 +01:00
Vladimir Davydov	bf5f864d80	paxos: piggyback result query on prepare response The current LWT implementation uses at least three network round trips: - first, execute the PAXOS prepare phase - second, query the current value of the updated key - third, propose the change to participating replicas (there's also a learn phase, but we don't wait for it to complete). The idea behind the optimization implemented by this patch is simple: piggyback the current value of the updated key on the prepare response to eliminate one round trip. To generate less network traffic, only the replica closest to the coordinator sends data, while the other participating replicas send digests, which are used to check data consistency. Note, this patch changes the API of some RPC calls used by PAXOS, but this should be okay as long as the feature is in an early development stage and marked experimental. To assess the impact of this optimization on LWT performance, I ran a simple benchmark that starts a number of concurrent clients, each of which updates its own key (uncontended case), stored in a cluster of three AWS i3.2xlarge nodes located in the same region (us-west-1), and measures the aggregate bandwidth and latency. The test uses the shard-aware gocql driver. Here are the results:
clients  latency 99% (ms)  bandwidth (rq/s)  timeouts (rq/s)
         before  after     before  after     before  after
1        2       2         626     637       0       0
5        4       3         2616    2843      0       0
10       3       3         4493    4767      0       0
50       7       7         10567   10833     0       0
100      15      15        12265   12934     0       0
200      48      30        13593   14317     0       0
400      185     60        14796   15549     0       0
600      290     94        14416   15669     0       0
800      568     118       14077   15820     2       0
1000     710     118       13088   15830     9       0
2000     1388    232       13342   15658     85      0
3000     1110    363       13282   15422     233     0
4000     1735    454       13387   15385     329     0
That is, this optimization improves max LWT bandwidth by about 15% and allows running 3-4x more clients while maintaining the same level of system responsiveness.	2019-11-24 11:35:29 +02:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that captured objects won't be destroyed until the future returned by the invoked function is resolved, so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Gleb Natapov	8d6201a23b	lwt: Add RPC verbs needed for paxos implementation The Paxos protocol has three stages: prepare, accept, learn. This patch adds an rpc verb for each of those stages. To keep terminology compatible with Cassandra, the patch calls those stages: prepare, propose, commit.	2019-10-27 23:21:51 +03:00
Avi Kivity	ba64ec78cf	messaging_service: use rpc::tuple instead of variadic futures for rpc Since variadic future<> is deprecated, switch to rpc::tuple for multiple return values in rpc calls. This is more or less mechanical translation.	2019-09-26 12:09:31 +02:00
Gleb Natapov	73e3d0a283	messaging_service: enable reuseaddr on messaging service rpc Fixes #4943 Message-Id: <20190918152405.GV21540@scylladb.com>	2019-09-19 11:43:03 +03:00
Gleb Natapov	9e9f64d90e	messaging_service: configure different streaming domain for each rpc server A streaming domain identifies a server across shards. Each server should have a different one. Fixes: #4953 Message-Id: <20190908085327.GR21540@scylladb.com>	2019-09-08 14:05:40 +03:00
Botond Dénes	7adc764b6e	messaging_service: add canonical_support to schema pull and push verbs The verbs are: * DEFINITIONS_UPDATE (push) * MIGRATION_REQUEST (pull) Support was added in a backward-compatible way. The push verb sends both the old frozen mutation parameter and the new optional canonical mutation parameter. It is expected that new nodes will use the latter, while old nodes will fall back to the former. The pull verb has a new optional `options` parameter, which for now contains a single flag: `remote_supports_canonical_mutation_retval`. This flag, if set, means that the remote node supports the new canonical mutation return value, thus the old frozen mutations return value can be left empty.	2019-09-04 10:32:44 +03:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Asias He	49a73aa2fc	streaming: Move stream_mutation_fragments_cmd to a new file (#4812) Avoid including the lengthy stream_session.hh in messaging_service. More importantly, fix the build, because currently messaging_service.cc and messaging_service.hh do not include stream_mutation_fragments_cmd. I am not sure why it builds on my machine. Spotted this when backporting "streaming: Send error code from the sender to receiver" to the 3.0 branch. Refs: #4789	2019-08-07 14:59:46 +02:00
Asias He	bac987e32a	streaming: Send error code from the sender to receiver In case of an error on the sender side, the sender does not propagate the error to the receiver. The sender will close the stream. As a result, the receiver will get nullopt from the source in get_next_mutation_fragment and pass a mutation_fragment_opt with no value to the generating_reader. In turn, the generating_reader generates end of stream. However, the last element that the generating_reader has generated can be any type of mutation_fragment. This makes the sstable that consumes the generating_reader violate the mutation_fragment stream rule. To fix, we need to propagate the error. However, RPC streaming does not support propagating the error in the framework, so the user has to send an error code explicitly. Fixes: #4789	2019-08-06 16:54:56 +02:00
Asias He	5d3e4d7b73	messaging_service: Check if messaging_service is stopped before get_rpc_client get_rpc_client assumes the messaging_service is not stopped. We should check is_stopping() before we call get_rpc_client. We do such a check in existing code, e.g., send_message and friends. Do the same check in the newly introduced make_sink_and_source_for_stream_mutation_fragments() and friends for row level repair. Fixes: #4767	2019-07-31 11:44:57 +03:00
Calle Wilund	c540e36fe2	gms::inet_address: Make serialization ipv6 aware Because inet_address was initially hardcoded to ipv4, its wire format is not very forward compatible. Since we potentially need to communicate with older version nodes, we manually define the new serial format for inet_address to be: ipv4: 4 bytes of address; ipv6: a 4-byte marker 0xffffffff (an invalid address), followed by 16 bytes of address data.	2019-07-08 14:13:09 +00:00
Calle Wilund	e9816efe06	Remove usage of inet_address::raw_addr()	2019-07-08 14:13:09 +00:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
Asias He	37b3de4ea0	messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	a7c7ba9765	messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	dc92bda93b	messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support	2019-07-02 21:18:55 +08:00
Asias He	f312c95b74	messaging_service: Add do_make_sink_source helper It is used by the row level repair rpc stream verbs to make sink and source object.	2019-07-02 21:18:55 +08:00
Asias He	bc295a00a6	messaging_service: Add rpc stream verb for row level repair - REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM: Get repair rows from follower nodes - REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM: Put repair rows to follower nodes - REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM: Get full hashes from follower nodes	2019-07-02 21:18:55 +08:00
Asias He	3db136f81e	repair: Use the same schema version for repair master and followers Before this patch, the repair master and followers each use their own schema version, taken independently at the point repair starts. The schemas can be different due to schema changes. Repair uses the schema to serialize mutation_fragment and deserialize the mutation_fragment received from peer nodes. Using different schema versions to serialize and deserialize causes undefined behaviour. To fix, we use the schema the repair master decides on for all the repair nodes involved. On top of this patch, we could take another step to make sure all nodes have the latest schema. But let's do it in a separate patch. Fixes: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Asias He	b463d7039c	repair: Introduce get_combined_row_hash_response Currently, the REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the repair_hash object. In the future, we will use a set reconciliation algorithm to decode the full row hashes in the working row buf. It is useful to return the number of rows inside the working row buf in addition to the combined row hashes to make sure the decode is successful. It is also better to use a wrapper class for the verb response so we can extend the return values later more easily with IDL. Fixes #4526 Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>	2019-06-12 13:51:29 +03:00
Gleb Natapov	1d851a3892	messaging: catch an error that sending of CLIENT_ID may return Avoid a warning about unhandled exception. Message-Id: <20190506122718.GL21208@scylladb.com>	2019-05-06 18:13:51 +03:00
Paweł Dziepak	d47ea66ec6	messaging_service: add lz4_fragmented RPC compressor Seastar now supports two RPC compression algorithms: the original LZ4 one and LZ4_FRAGMENTED. The latter uses the lz4 stream interface, which allows it to process large messages without fully linearising them. Since RPC requests used by Scylla often contain user-provided data that could potentially be very large, LZ4_FRAGMENTED is a better choice for the default compression algorithm. Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>	2019-04-18 19:07:14 +03:00
Gleb Natapov	1abc50ad8a	messaging_service: make sure a client is unique for a destination Function messaging_service::get_rpc_client() is supposed to either return an existing client or create one and return it. The function is supposed to be atomic, so after checking that the requested client does not exist, it is safe to assume emplace() will succeed. But we saw bugs that made the function non-atomic. Let's add an assert that will make such bugs easier to catch if they happen in the future. Message-Id: <20190326115741.GX26144@scylladb.com>	2019-03-26 14:19:08 +02:00
Gleb Natapov	bb93d990ad	messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream The current code captures a reference to rpc::client in a continuation, but there is no guarantee that the reference will be valid when the continuation runs. Capture a shared pointer to rpc::client instead. Fixes #4350. Message-Id: <20190314135538.GC21521@scylladb.com>	2019-03-21 12:46:01 -03:00
Gleb Natapov	a70374d982	messaging_service: do not forget to close stream when sending it to another side failed Fixes #4124 Message-Id: <20190131091857.GC3172@scylladb.com>	2019-01-31 12:01:56 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges into sub ranges, each containing around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksums match, the data in this sub range is synced. - If the checksums mismatch, the repair master fetches the data from all the peers and sends the merged data back to the peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes all 100 partitions to be transferred. A single partition can be very large, not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) read the same data twice. === Row level repair Row level checksum and synchronization: detect row level mismatches and transfer only the mismatched rows. === How the row level repair works - To solve the problem of reading data twice: Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory, find the mismatches, and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data: We need to find the mismatched rows between nodes and send only the delta. This is called the set reconciliation problem, which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = {row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2. Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate the sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position; } Each node reads rows from disk into row buffers until the size is larger than N bytes and returns the repair_sync_boundary of the last mutation_fragment it read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that the repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, the data is synced; go to Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly which rows are missing. Request the missing rows from peer nodes. Now the local node contains all the rows. - Step C: Send missing rows to the peer nodes Since the local node also knows what the peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows into each of the 3 nodes. Each node has 241 GiB of data. We tested the 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx = 162 GiB, rx = 90 GiB new: tx = 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, the repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is, 1 million rows. So the repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. The repair master sends 1 million rows of its own and 1 million rows received from repair slave 1 to repair slave 2. The repair master sends 1 million rows of its own and 1 million rows received from repair slave 2 to repair slave 1. In the results, the numbers of rows on the wire were as expected.
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
Duarte Nunes	ede5742f9b	service/storage_proxy: Send view update backlog from replicas Change the inter-node protocol so we can propagate the view update backlog from a base replica to the coordinator through the mutation_done and mutation_failed verbs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Asias He	acc9ff8dce	messaging_service: Add RPC verbs for row level repair This patch adds the RPC verbs that are needed by row level repair. Those verbs are used in the following patches. All the verbs for row level repair are sent by the repair master. The repair master asks repair slaves to create repair meta objects, a.k.a. repair_meta objects, to store the repair metadata needed by the row level repair algorithm. A repair meta object is identified by the IP address of the repair master and a uint32 number, repair_meta_id, chosen by the repair master. When the repair master restarts or leaves the cluster, the repair slaves detect it and remove all existing repair_meta objects for that repair master. When a repair slave restarts, the existing repair_meta objects on the slave are gone. The sync boundary used in the verbs is the position_in_partition of the last mutation_fragment. In each repair round, peers work on (last_sync_boundary, current_sync_boundary].	2018-12-12 16:49:01 +08:00
Asias He	063dfcda26	messaging_service: Add constructor for msg_addr The new constructor takes the IP address and shard id.	2018-12-12 16:49:01 +08:00
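Commit 14520e843a only swaps two steps in the messaging_service constructor. The C++ sketch below illustrates why the order matters. It is not Scylla code: messaging_service_sketch, init_scheduling() and register_verb() are hypothetical, and scheduling groups are modelled as plain ints. Registering a verb consults the scheduling lookup, so the lookup has to be populated first.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model: verbs are strings, scheduling groups are ints.
class messaging_service_sketch {
    std::map<std::string, int> _verb_to_group; // lookup used to pick a verb's group
    std::map<std::string, int> _registered;    // verb -> group chosen at registration
public:
    messaging_service_sketch() {
        // Fixed order: build the scheduling lookup first...
        init_scheduling();
        // ...then register CLIENT_ID, which consults that lookup.
        register_verb("CLIENT_ID");
    }

    void init_scheduling() {
        _verb_to_group["CLIENT_ID"] = 1;
    }

    void register_verb(const std::string& verb) {
        auto it = _verb_to_group.find(verb);
        if (it == _verb_to_group.end()) {
            // In this sketch, the old order would end up here: the lookup is still empty.
            throw std::logic_error("no scheduling group for verb " + verb);
        }
        _registered[verb] = it->second;
    }

    int group_of(const std::string& verb) const {
        return _registered.at(verb);
    }
};
```

Swapping the two calls in the constructor makes register_verb() throw, which is this sketch's stand-in for the verb being registered before its scheduling group can be determined.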
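The tenant lookup described in commit 16d8cdadc9 can be modelled roughly as follows. This is a minimal sketch under assumptions: scheduling groups are ints, and scheduling_config_sketch, tenant_for() and group_for_name() are hypothetical names, not the actual messaging_service::scheduling_config interface. The client side picks the tenant whose scheduling group matches the current one and otherwise falls back to the first (default) tenant; the server side performs the reverse name -> scheduling group lookup.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a scheduling group is identified by an int.
using sched_group_id = int;

struct tenant_sketch {
    sched_group_id group;
    std::string name;
};

struct scheduling_config_sketch {
    std::vector<tenant_sketch> tenants; // tenants.front() is the default tenant

    // Client side: find the tenant whose group matches the current one,
    // or fall back to the default tenant. Assumes tenants is non-empty.
    const tenant_sketch& tenant_for(sched_group_id current) const {
        for (const auto& t : tenants) {
            if (t.group == current) {
                return t;
            }
        }
        return tenants.front();
    }

    // Server side: reverse lookup from the tenant name sent with the RPC.
    std::optional<sched_group_id> group_for_name(const std::string& name) const {
        for (const auto& t : tenants) {
            if (t.name == name) {
                return t.group;
            }
        }
        return std::nullopt;
    }
};
```

With tenants {$user, $system}, a call made from an unrelated scheduling group is attributed to $user, matching the commit's description of the default tenant.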
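In commit bf5f864d80, only the replica closest to the coordinator returns data on the prepare response, while the others return digests used to check consistency. A rough sketch of that coordinator-side check follows. The type and function names are hypothetical, and std::hash is only a placeholder digest. The commit does not say what the coordinator does when digests disagree; this sketch simply reports that no consistent value was obtained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical prepare response: data only from the closest replica,
// a digest of the current value from every replica.
struct prepare_response_sketch {
    std::optional<std::string> data;
    std::size_t digest = 0;
};

std::size_t digest_of(const std::string& value) {
    return std::hash<std::string>{}(value); // placeholder digest
}

prepare_response_sketch make_response(const std::string& current_value, bool is_closest) {
    prepare_response_sketch r;
    r.digest = digest_of(current_value);
    if (is_closest) {
        r.data = current_value;
    }
    return r;
}

// Coordinator: take the value from the response that carries data and
// check every digest against it. nullopt means no consistent value.
std::optional<std::string> resolve(const std::vector<prepare_response_sketch>& responses) {
    const std::string* data = nullptr;
    for (const auto& r : responses) {
        if (r.data) {
            data = &*r.data;
            break;
        }
    }
    if (!data) {
        return std::nullopt;
    }
    const std::size_t expected = digest_of(*data);
    for (const auto& r : responses) {
        if (r.digest != expected) {
            return std::nullopt;
        }
    }
    return *data;
}
```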
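Commit bac987e32a makes the sender transmit an explicit error code, because RPC streaming cannot propagate errors by itself. The sketch below models the stream as a sequence of fragments and command markers. stream_cmd_sketch is a hypothetical stand-in for Scylla's stream_mutation_fragments_cmd, whose actual values are not listed here. Treating a stream that closes without an explicit end marker as an error is an extra assumption of this sketch, chosen so that a truncated stream cannot be mistaken for a complete one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical command marker sent in-band with the fragments.
enum class stream_cmd_sketch { error, end_of_stream };

// Each stream item is either a fragment (modelled as a string) or a command.
using stream_item = std::variant<std::string, stream_cmd_sketch>;

// Receiver: collect fragments until end_of_stream; throw on an error code.
std::vector<std::string> receive_all(const std::vector<stream_item>& items) {
    std::vector<std::string> fragments;
    for (const auto& item : items) {
        if (const auto* fragment = std::get_if<std::string>(&item)) {
            fragments.push_back(*fragment);
            continue;
        }
        if (std::get<stream_cmd_sketch>(item) == stream_cmd_sketch::error) {
            throw std::runtime_error("sender reported an error");
        }
        return fragments; // explicit end_of_stream
    }
    // Assumption of this sketch: closing without end_of_stream is an error too.
    throw std::runtime_error("stream closed without end_of_stream");
}
```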
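The inet_address wire format from commit c540e36fe2 can be sketched with plain byte vectors. The functions below are hypothetical and operate on raw address bytes. The byte order Scylla uses for the IPv4 case is not stated in the commit, so this sketch simply assumes the address bytes are written as given. The IPv6 marker is four 0xff bytes, so detecting it does not depend on byte order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using byte_buffer = std::vector<std::uint8_t>;

// IPv4: just the 4 address bytes (byte order is an assumption of this sketch).
byte_buffer serialize_v4(const std::array<std::uint8_t, 4>& addr) {
    return byte_buffer(addr.begin(), addr.end());
}

// IPv6: a 4-byte 0xffffffff marker (an invalid address), then the 16 address bytes.
byte_buffer serialize_v6(const std::array<std::uint8_t, 16>& addr) {
    byte_buffer out(4, 0xff);
    out.insert(out.end(), addr.begin(), addr.end());
    return out;
}

// Returns the raw address bytes: 4 for IPv4, 16 for IPv6.
byte_buffer deserialize(const byte_buffer& in) {
    if (in.size() < 4) {
        throw std::invalid_argument("truncated address");
    }
    const bool ipv6_marker = in[0] == 0xff && in[1] == 0xff && in[2] == 0xff && in[3] == 0xff;
    if (!ipv6_marker) {
        return byte_buffer(in.begin(), in.begin() + 4);
    }
    if (in.size() < 20) {
        throw std::invalid_argument("truncated ipv6 address");
    }
    return byte_buffer(in.begin() + 4, in.begin() + 20);
}
```

The IPv4 encoding stays 4 bytes long, which is what lets the new format coexist with older nodes that only know IPv4.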
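Commit 1abc50ad8a adds an assert to the get-or-create path of messaging_service::get_rpc_client(). A minimal single-threaded sketch of that pattern follows. It uses a std::map and hypothetical class names in place of the real client table. Because the lookup has just shown the key is absent, emplace() must succeed, and the assert documents that expectation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct rpc_client_sketch {
    std::string destination;
};

class client_table_sketch {
    std::map<std::string, std::shared_ptr<rpc_client_sketch>> _clients;
public:
    // Return the existing client for dest, or create and store a new one.
    std::shared_ptr<rpc_client_sketch> get_rpc_client(const std::string& dest) {
        auto it = _clients.find(dest);
        if (it != _clients.end()) {
            return it->second;
        }
        auto client = std::make_shared<rpc_client_sketch>(rpc_client_sketch{dest});
        [[maybe_unused]] auto [pos, inserted] = _clients.emplace(dest, client);
        // dest was just found to be absent, so the insert must succeed; if it
        // does not, something broke the atomicity of find-then-emplace.
        assert(inserted);
        return pos->second;
    }

    std::size_t size() const { return _clients.size(); }
};
```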
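Commit bb93d990ad replaces a captured reference to rpc::client with a captured shared pointer. The sketch below illustrates the lifetime difference, using std::function callbacks as a stand-in for future continuations. connection_sketch, open_stream_later() and run_pending() are hypothetical. Capturing the shared_ptr by value keeps the connection alive until the deferred work runs, even if its owner has already dropped it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct connection_sketch {
    int streams_opened = 0;
};

// Deferred work, standing in for continuations that run later.
std::vector<std::function<void()>> pending;

// The lambda holds its own shared_ptr copy, so the connection outlives the
// caller's reference. Capturing a plain reference here could dangle.
std::weak_ptr<connection_sketch> open_stream_later(std::shared_ptr<connection_sketch> conn) {
    std::weak_ptr<connection_sketch> observer = conn;
    pending.push_back([conn] { ++conn->streams_opened; });
    return observer;
}

// Run all deferred work, then drop it (releasing any captured pointers).
void run_pending() {
    for (auto& task : pending) {
        task();
    }
    pending.clear();
}
```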
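The set reconciliation example from merge c96fc1d585 (Node1 as repair master, Node2 and Node3 as peers) can be reproduced with std::set. The hypothetical plan_repair() below only computes which rows move in each direction. It does not model hashing, sync boundaries, or the RPC verbs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

using row_set = std::set<std::string>;

// Rows in a that are not in b.
row_set minus(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out;
}

struct repair_plan {
    std::vector<row_set> fetch_from_peer; // rows the master fetches from each peer
    std::vector<row_set> send_to_peer;    // rows the master sends to each peer
};

// Master-driven reconciliation: fetch (peer_i - master) from each peer,
// then send (union of all sets - peer_i) to each peer.
repair_plan plan_repair(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    row_set all = master;
    for (const auto& peer : peers) {
        row_set missing = minus(peer, master);
        all.insert(missing.begin(), missing.end());
        plan.fetch_from_peer.push_back(std::move(missing));
    }
    for (const auto& peer : peers) {
        plan.send_to_peer.push_back(minus(all, peer));
    }
    return plan;
}
```

For set1 = {row1, row2, row3}, set2 = {row2, row3} and set3 = {row1, row2, row4}, this fetches nothing from Node2 and row4 from Node3, then sends row1 and row4 to Node2 and row3 to Node3, as the merge message describes.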
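Step A in merge c96fc1d585 negotiates a sync boundary. The sketch below makes several simplifying assumptions. A boundary is a (token, position) std::pair standing in for repair_sync_boundary's decorated_key and position_in_partition. Rows are already sorted. All names are hypothetical. Each node buffers rows until it holds more than N bytes and reports the position of the last row read, and the master takes the smallest report as current_sync_boundary. Empty inputs are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical boundary: (partition token, row position), ordered lexicographically.
using boundary = std::pair<long, int>;

struct row_sketch {
    boundary pos;
    std::size_t size_bytes;
};

// One node: buffer sorted rows until more than max_bytes are held,
// then report the position of the last row read. Assumes rows is non-empty.
boundary local_sync_boundary(const std::vector<row_sketch>& rows, std::size_t max_bytes) {
    std::size_t buffered = 0;
    boundary last{};
    for (const auto& r : rows) {
        buffered += r.size_bytes;
        last = r.pos;
        if (buffered > max_bytes) {
            break;
        }
    }
    return last;
}

// Master: the smallest reported boundary becomes current_sync_boundary.
// Assumes at least one report.
boundary current_sync_boundary(const std::vector<boundary>& reported) {
    return *std::min_element(reported.begin(), reported.end());
}
```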
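Commit acc9ff8dce explains that followers key repair_meta objects by the repair master's IP address and a master-chosen uint32 repair_meta_id, and drop all of a master's objects when it restarts or leaves the cluster. Below is a minimal follower-side registry following those rules. The names are hypothetical, and a std::string stands in for the IP address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in for the per-repair state a follower keeps.
struct repair_meta_sketch {
    std::string range;
};

class repair_meta_registry {
    // Key: (repair master address, repair_meta_id chosen by that master).
    std::map<std::pair<std::string, std::uint32_t>, repair_meta_sketch> _metas;
public:
    void insert(const std::string& master, std::uint32_t id, repair_meta_sketch meta) {
        _metas[{master, id}] = std::move(meta);
    }

    bool contains(const std::string& master, std::uint32_t id) const {
        return _metas.count({master, id}) != 0;
    }

    // Called when a master restarts or leaves the cluster: drop all of its metas.
    void remove_all_for(const std::string& master) {
        for (auto it = _metas.begin(); it != _metas.end();) {
            if (it->first.first == master) {
                it = _metas.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t size() const { return _metas.size(); }
};
```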


296 Commits