scylladb

Author	SHA1	Message	Date
Asias He	49a73aa2fc	streaming: Move stream_mutation_fragments_cmd to a new file (#4812) Avoid including the lengthy stream_session.hh in messaging_service. More importantly, fix the build because currently messaging_service.cc and messaging_service.hh do not include stream_mutation_fragments_cmd. I am not sure why it builds on my machine. Spotted this when backporting the "streaming: Send error code from the sender to receiver" to the 3.0 branch. Refs: #4789	2019-08-07 14:59:46 +02:00
Asias He	bac987e32a	streaming: Send error code from the sender to receiver In case of error on the sender side, the sender does not propagate the error to the receiver. The sender will close the stream. As a result, the receiver will get nullopt from the source in get_next_mutation_fragment and pass mutation_fragment_opt with no value to the generating_reader. In turn, the generating_reader generates end of stream. However, the last element that the generating_reader has generated can be any type of mutation_fragment. This makes the sstable that consumes the generating_reader violate the mutation_fragment stream rule. To fix, we need to propagate the error. However, RPC streaming does not support propagating the error in the framework; the user has to send an error code explicitly. Fixes: #4789	2019-08-06 16:54:56 +02:00
Asias He	5d3e4d7b73	messaging_service: Check if messaging_service is stopped before get_rpc_client get_rpc_client assumes the messaging_service is not stopped. We should check is_stopping() before we call get_rpc_client. We do such check in existing code, e.g., send_message and friends. Do the same check in the newly introduced make_sink_and_source_for_stream_mutation_fragments() and friends for row level repair. Fixes: #4767	2019-07-31 11:44:57 +03:00
Calle Wilund	c540e36fe2	gms::inet_address: Make serialization ipv6 aware Because inet_address was initially hardcoded to ipv4, its wire format is not very forward compatible. Since we potentially need to communicate with older version nodes, we manually define the new serial format for inet_address to be: ipv4: 4 bytes address ipv6: 4 bytes marker 0xffffffff (invalid address) 16 bytes data -> address	2019-07-08 14:13:09 +00:00
Calle Wilund	e9816efe06	Remove usage of inet_address::raw_addr()	2019-07-08 14:13:09 +00:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
Asias He	37b3de4ea0	messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	a7c7ba9765	messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support It is used by row level repair.	2019-07-02 21:18:55 +08:00
Asias He	dc92bda93b	messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support	2019-07-02 21:18:55 +08:00
Asias He	f312c95b74	messaging_service: Add do_make_sink_source helper It is used by the row level repair rpc stream verbs to make sink and source object.	2019-07-02 21:18:55 +08:00
Asias He	bc295a00a6	messaging_service: Add rpc stream verb for row level repair - REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM Get repair rows from follower nodes - REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM Put repair rows to follower nodes - REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM: Get full hashes from follower nodes	2019-07-02 21:18:55 +08:00
Asias He	3db136f81e	repair: Use the same schema version for repair master and followers Before this patch, repair master and followers use their own schema version at the point repair starts independently. The schemas can be different due to schema change. Repair uses the schema to serialize mutation_fragment and deserialize the mutation_fragment received from peer nodes. Using different schema versions to serialize and deserialize causes undefined behaviour. To fix, we use the schema the repair master decides for all the repair nodes involved. On top of this patch, we could do another step to make sure all nodes have the latest schema. But let's do it in a separate patch. Fixes: #4549 Backports: 3.1	2019-06-18 18:27:21 +08:00
Asias He	b463d7039c	repair: Introduce get_combined_row_hash_response Currently, REPAIR_GET_COMBINED_ROW_HASH RPC verb returns only the repair_hash object. In the future, we will use set reconciliation algorithm to decode the full row hashes in working row buf. It is useful to return the number of rows inside working row buf in addition to the combined row hashes to make sure the decode is successful. It is also better to use a wrapper class for the verb response so we can extend the return values later more easily with IDL. Fixes #4526 Message-Id: <93be47920b523f07179ee17e418760015a142990.1559771344.git.asias@scylladb.com>	2019-06-12 13:51:29 +03:00
Gleb Natapov	1d851a3892	messaging: catch an error that sending of CLIENT_ID may return Avoid a warning about unhandled exception. Message-Id: <20190506122718.GL21208@scylladb.com>	2019-05-06 18:13:51 +03:00
Paweł Dziepak	d47ea66ec6	messaging_service: add lz4_fragmented RPC compressor Seastar now supports two RPC compression algorithms: the original LZ4 one and LZ4_FRAGMENTED. The latter uses the lz4 stream interface, which allows it to process large messages without fully linearising them. Since RPC requests used by Scylla often contain user-provided data that potentially could be very large, LZ4_FRAGMENTED is a better choice for the default compression algorithm. Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>	2019-04-18 19:07:14 +03:00
Gleb Natapov	1abc50ad8a	messaging_service: make sure a client is unique for a destination Function messaging_service::get_rpc_client() is supposed to either return an existing client or create one and return it. The function is supposed to be atomic, so after checking that the requested client does not exist it is safe to assume emplace() will succeed. But we saw bugs that made the function non-atomic. Let's add an assert that will help catch such bugs more easily if they happen in the future. Message-Id: <20190326115741.GX26144@scylladb.com>	2019-03-26 14:19:08 +02:00
Gleb Natapov	bb93d990ad	messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream Current code captures a reference to rpc::client in a continuation, but there is no guarantee that the reference will be valid when the continuation runs. Capture a shared pointer to rpc::client instead. Fixes #4350. Message-Id: <20190314135538.GC21521@scylladb.com>	2019-03-21 12:46:01 -03:00
Gleb Natapov	a70374d982	messaging_service: do not forget to close stream when sending it to another side failed Fixes #4124 Message-Id: <20190131091857.GC3172@scylladb.com>	2019-01-31 12:01:56 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges to sub ranges which contain around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory. Find the mismatch and send the mismatch rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. The problem is called the set reconciliation problem, which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = {row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. 
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2. Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Read rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === What the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - request_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. 
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx = 162 GiB, rx = 90 GiB new: tx = 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the results, we saw the rows on wire were as expected. 
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
Duarte Nunes	ede5742f9b	service/storage_proxy: Send view update backlog from replicas Change the inter-node protocol so we can propagate the view update backlog from a base replica to the coordinator through the mutation_done and mutation_failed verbs. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Botond Dénes	1865e5da41	treewide: remove include database.hh from headers where possible Many headers don't really need to include database.hh, the include can be replaced by forward declarations and/or including the actually needed headers directly. Some headers don't need this include at all. Each header was verified to be compilable on its own after the change, by including it into an empty `.cc` file and compiling it. `.cc` files that used to get `database.hh` through headers that no longer include it were changed to include it themselves.	2018-12-14 08:03:57 +02:00
Asias He	acc9ff8dce	messaging_service: Add RPC verbs for row level repair This patch adds the RPC verbs that are needed by the row level repair. The usage of those verbs is in the following patches. All the verbs for row level repair are sent by the repair master. Repair master asks repair slaves to create repair meta objects, a.k.a. repair_meta objects, to store the repair meta data needed by the row level repair algorithm. The repair meta object is identified by the IP address of the repair master and a uint32 number repair_meta_id chosen by repair master. When repair master restarts or is out of the cluster, repair slaves will detect it and remove all existing repair_meta for the repair master. When repair slave restarts, the existing repair_meta on the slave will be gone. The sync boundary used in the verbs is the position_in_partition of the last mutation_fragment. In each repair round, peers work on (last_sync_boundary, current_sync_boundary]	2018-12-12 16:49:01 +08:00
Asias He	063dfcda26	messaging_service: Add constructor for msg_addr Which takes the ip address and shard id.	2018-12-12 16:49:01 +08:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Gleb Natapov	d144e6ceac	messaging_service: enable port load balancing algorithm for RPC server In a homogeneous cluster this will reduce number of internal cross-shard hops per request since RPC calls will arrive to correct shard. Message-Id: <20181118150817.GF2062@scylladb.com>	2018-11-20 16:15:12 +00:00
Asias He	7f826d3343	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>	2018-10-15 22:03:28 +01:00
Calle Wilund	3cb50c861d	messaging_service: Make rpc streaming sink respect tls connection Fixes #3787 Message service streaming sink was created using direct call to rpc::client::make_sink. This in turn needs a new socket, which it creates completely ignoring what underlying transport is active for the client in question. Fix by retaining the tls credential pointer in the client wrapper, and using this in a sink method to determine whether to create a new tls socket, or just go ahead with a plain one. Message-Id: <20181010003249.30526-1-calle@scylladb.com>	2018-10-10 12:55:28 +03:00
Avi Kivity	4553238653	messaging: fix unbounded allocation in TLS RPC server The non-TLS RPC server has an rpc::resource_limits configuration that limits its memory consumption, but the TLS server does not. That means a many-node TLS configuration can OOM if all nodes gang up on a single replica. Fix by passing the limits to the TLS server too. Fixes #3757. Message-Id: <20180907192607.19802-1-avi@scylladb.com>	2018-09-10 12:11:16 +01:00
Duarte Nunes	a025bf6a7d	Merge seastar upstream Seastar introduced a "compat" namespace, which conflicts with Scylla's own "compat" namespaces. The merge thus includes changes to scope uses of Scylla's "compat" namespaces. * seastar 8ad870f...9bb1611 (5): > util/variant_utils: Ensure variant_cast behaves well with rvalues > util/std-compat: Fix infinite recursion > doc/tutorial: Undo namespace changes > util/variant_utils: Add cast_variant() > Add compatbility with C++17's library types Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-14 13:07:09 +01:00
Avi Kivity	c4013f6fe1	messaging: categorize more streaming/repair verbs as streaming Since the messaging service will assign a scheduling group based on the client index, it's more important now to get the verbs categorized correctly. Re-categorize REPLICATION_FINISHED, REPAIR_CHECKSUM_RANGE, and most importantly STREAM_MUTATION_FRAGMENTS to the repair/streaming oriented connections so we get the correct scheduling.	2018-07-15 15:44:10 +03:00
Avi Kivity	ff3d7839ab	messaging: remove default when computing rpc client index A default means that when adding new verbs, we may forget to categorize a verb correctly. Without the default, the compiler will complain due to -Wswitch.	2018-07-15 15:40:29 +03:00
Avi Kivity	fe2db68be8	messaging: convert do_get_rpc_client_idx into a switch A switch is more readable for multiple choice with no clearly preferred choice.	2018-07-15 15:26:50 +03:00
Avi Kivity	3b1e04091c	messaging: choose connection index via a look-up table Looking up is faster than a bunch of if()s.	2018-07-15 15:21:06 +03:00
Avi Kivity	8ee807321f	Merge "scylla streaming with rpc streaming" from Asias " This work is on top of Gleb's rpc streaming which was merged recently. What this series does is to replace scylla streaming service's data plane to use the new rpc streaming instead of the old rpc verb to send the mutations for scylla streaming. Other parts of scylla streaming, the control plane, are not changed. In my test, to bootstrap a new node to the existing one node cluster, smp 2, scylla stores data on ramdisk to minimize disk io impact. I saw x2 improvement in streaming bandwidth. Before: [shard 0] stream_session - [Stream #2ae92320-5fc8-11e8-911a-000000000000] Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1570312 KiB, 109521.02 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 14.338 seconds After: [shard 0] stream_session - [Stream #e5589ac0-5fc7-11e8-b463-000000000000] Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1546875 KiB, 220415.36 KiB/s [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 7.018 seconds Tests: dtest update_cluster_layout_tests.py Fixes: #3591 " * tag 'asias/scylla_streaming_with_rpc_streaming_v8' of github.com:scylladb/seastar-dev: streaming: Add rpc streaming support storage_service: Introduce STREAM_WITH_RPC_STREAM feature streaming: Add estimate_partitions to send_info messaging_service: Add streaming with rpc streaming support messaging_service: Add streaming_domain database: Add add_sstable_and_update_cache database: Add make_streaming_sstable_for_write	2018-07-15 12:36:52 +03:00
Avi Kivity	8c993e0728	messaging: tag RPC services with scheduling groups Assign a scheduling_group for each RPC service. Assignment is done by connection (get_rpc_client_idx()) - all verbs on the same connection are assigned the same group. While this may seem arbitrary, it avoids priority inversion; if two verbs on the same connection have different scheduling groups, the verb with the low shares may cause a backlog and stall the connection, including following requests from verbs that ought to have higher shares. The scheduling_group parameters are encapsulated in different classes as they are passed around to avoid adding dependencies. Message-Id: <20180708140433.6426-1-avi@scylladb.com>	2018-07-13 13:57:08 +02:00
Asias He	ddfb4590ce	messaging_service: Add streaming with rpc streaming support Preparation for adding rpc streaming in scylla streaming. - register_stream_mutation_fragments is used to register the rpc streaming verb - make_sink_and_source_for_stream_mutation_fragments is used to get the sink and source object for the sender - make_sink_for_stream_mutation_fragments is used to get a sink object for the receiver	2018-07-13 08:36:46 +08:00
Asias He	671e1b08fe	messaging_service: Add streaming_domain The rpc streaming needs a streaming_domain id for the same logical server. Chose one for our messaging service.	2018-07-13 08:36:46 +08:00
Gleb Natapov	646e400918	Provide available memory size to messaging_service object during creation	2018-06-11 15:34:13 +03:00
Avi Kivity	dd12214628	messaging_service: move msg_addr into its own header file Make it possible to use msg_addr without depending on messaging_service.hh.	2018-03-12 20:05:23 +02:00
Avi Kivity	cd668061fc	storage_service: remove system_keyspace.hh include Re-distribute include among the files that really need it.	2018-03-11 18:53:49 +02:00
Duarte Nunes	440ea56010	message/messaging_service: Specify algorithm when requesting digest While not strictly needed, specify which algorithm to use when requesting a digest from a remote node. This is more flexible than relying on a cluster wide feature, although that's what we'll do in subsequent patches. It also makes the verb more consistent with the data request. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Glauber Costa	08a0c3714c	allow request-specific read timeouts in storage proxy reads Timeouts are a global property. However, for tables in keyspaces like the system keyspace, we don't want to uphold that timeout--in fact, we want no timeout there at all. We already apply such configuration for requests waiting in the queued sstable queue: system keyspace requests won't be removed. However, the storage proxy will insert its own timeouts in those requests, causing them to fail. This patch changes the storage proxy read layer so that the timeout is applied based on the column family configuration, which is in turn inherited from the keyspace configuration. This matches our usual way of passing db parameters down. In terms of implementation, we can either move the timeout inside the abstract read executor or keep it external. The former is a bit cleaner, but the latter has the nice property that all executors generated will share the exact same timeout point. In this patch, we chose the latter. We are also careful to propagate the timeout information to the replica. So even if we are talking about the local replica, when we add the request to the concurrency queue, we will do it in accordance with the timeout specified by the storage proxy layer. After this patch, Scylla is able to start just fine with very low timeouts--since read timeouts in the system keyspace are now ignored. Fixes #2462 Implementation notes, and general comments about open discussion in 2462: * Because we are not bypassing the timeout, just setting it high enough, I consider the concerns about the batchlog moot: if we fail for any other reason that will be propagated. Last case, because the timeout is per-CF, we could do what we do for the dirty memory manager and move the batchlog alone to use a different timeout setting. * Storage proxy likes specifying its timeouts as a time_point, whereas when we get low enough as to deal with the read_concurrency_config, we are talking about deltas. 
So at some point we need to convert time_points to durations. We do that in the database query functions. v2: - use per-request instead of per-table timeouts. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Vlad Zolotarov	be6f8be9cb	messaging_service: fix multi-NIC support Don't enforce the outgoing connections from the 'listen_address' interface only. If 'local_address' is given to connect() it will enforce it to use a particular interface to connect from, even if the destination address should be accessed from a different interface. If we don't specify the 'local_address' the source interface will be chosen according to the routing configuration. Fixes #3066 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1513372688-21595-1-git-send-email-vladz@scylladb.com>	2017-12-17 10:51:20 +02:00
Gleb Natapov	16964de1f3	storage_proxy: fail read/write requests early if it cannot be completed due to errors If errors make reaching CL impossible a request can be aborted earlier without waiting for timeout.	2017-12-05 16:46:25 +02:00
Duarte Nunes	1fbe9dc851	message/messaging_service: Close all server sockets We were stopping the loop prematurely. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20171127181417.8167-1-duarte@scylladb.com>	2017-11-28 11:08:08 +02:00
Asias He	8fa35d6ddf	messaging_service: Get rid of timeout and retry logic for streaming verb With the "Use range_streamer everywhere" (`7217b7ab36`) series, all the users of streaming now do streaming with relatively small ranges and can retry streaming at a higher level. There are problems with timeout and retry at RPC verb level in streaming: 1) Timeout can be false negative. 2) We can not cancel the send operations which are already called. When user aborts the streaming, the retry logic keeps running for a long time. This patch removes all the timeout and retry logic for streaming verbs. After this, the timeout is the job of TCP, the retry is the job of the upper layer. Message-Id: <df20303c1fa728dcfdf06430417cf2bd7a843b00.1503994267.git.asias@scylladb.com>	2017-08-29 17:20:00 +03:00
Avi Kivity	3edec66903	Revert "repair: Make send_repair_checksum_range timeout" This reverts commit `98757069a5`. We have the failure detector which will detect an unresponsive node and fail the RPC. Adding a timeout can just introduce false positives.	2017-08-06 13:09:36 +03:00
Asias He	98757069a5	repair: Make send_repair_checksum_range timeout If the verb never returns, the repair will hang forever. Make it use the timeout version of the send_message. Fixes #2662	2017-08-02 21:41:50 +08:00
Duarte Nunes	85e85ec72e	Don't catch polymorphic exceptions by value It makes gcc a very sad compiler. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170726172053.5639-2-duarte@scylladb.com>	2017-07-27 09:39:58 +03:00
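
Note on c96fc1d585: the three-node set reconciliation example in that merge message can be worked through in a few lines of C++. This is only a sketch under stated assumptions: rows are modeled as plain strings rather than hashed mutation fragments, and `row_set`, `rows_missing_from`, `repair_plan` and `plan_repair` are hypothetical names chosen for illustration, not part of Scylla's repair code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a node's rows; real repair compares row hashes.
using row_set = std::set<std::string>;

// Rows that are present in `a` but absent from `b` (the set difference a - b).
row_set rows_missing_from(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

struct repair_plan {
    std::vector<row_set> fetch_from_peer; // peer_i - master
    std::vector<row_set> send_to_peer;    // (master + all peers) - peer_i
};

// The master first pulls what it lacks from each peer, then pushes to each
// peer whatever that peer lacks from the union of all rows.
repair_plan plan_repair(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    row_set all = master;
    for (const auto& peer : peers) {
        plan.fetch_from_peer.push_back(rows_missing_from(peer, master));
        all.insert(peer.begin(), peer.end());
    }
    for (const auto& peer : peers) {
        plan.send_to_peer.push_back(rows_missing_from(all, peer));
    }
    assert(plan.fetch_from_peer.size() == peers.size());
    return plan;
}
```

Called with set1 as the master and set2, set3 as the peers, this yields the plan from the message: nothing fetched from Node2, row4 fetched from Node3, row1 and row4 sent to Node2, and row3 sent to Node3.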

270 Commits
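
Note on c540e36fe2: the wire format described in that commit (4 bytes for an IPv4 address; a 4-byte 0xffffffff marker followed by 16 bytes for an IPv6 address) can be sketched as below. This illustrates the layout only. The `address` variant and the `encode_address` / `decode_address` helpers are hypothetical names, and the actual serializer in Scylla may differ in byte order handling and error reporting.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

using ipv4_bytes = std::array<std::uint8_t, 4>;
using ipv6_bytes = std::array<std::uint8_t, 16>;
using address = std::variant<ipv4_bytes, ipv6_bytes>;

// IPv4: the 4 address bytes as-is.
// IPv6: 0xffffffff marker (an invalid IPv4 address), then 16 address bytes.
std::vector<std::uint8_t> encode_address(const address& a) {
    std::vector<std::uint8_t> out;
    if (const auto* v4 = std::get_if<ipv4_bytes>(&a)) {
        out.assign(v4->begin(), v4->end());
    } else {
        const auto& v6 = std::get<ipv6_bytes>(a);
        out.assign(std::size_t{4}, std::uint8_t{0xff});
        out.insert(out.end(), v6.begin(), v6.end());
    }
    assert(out.size() == 4 || out.size() == 20);
    return out;
}

// Returns std::nullopt when the buffer is too short for the indicated format.
std::optional<address> decode_address(const std::vector<std::uint8_t>& in) {
    if (in.size() < 4) {
        return std::nullopt;
    }
    const bool has_marker = std::all_of(in.begin(), in.begin() + 4,
        [](std::uint8_t b) { return b == 0xff; });
    if (!has_marker) {
        // An older node only ever sends this 4-byte form.
        ipv4_bytes v4{};
        std::copy_n(in.begin(), 4, v4.begin());
        return address{v4};
    }
    if (in.size() < 20) {
        return std::nullopt;
    }
    ipv6_bytes v6{};
    std::copy_n(in.begin() + 4, 16, v6.begin());
    return address{v6};
}
```

The design point the commit relies on is that 255.255.255.255 is never a valid peer address, so using it as a marker keeps the IPv4 encoding unchanged for older nodes.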