Streaming currently has a single I/O class, used to contain the read
operations generated by the streaming process. Those reads come from two
places:
- checksums (if doing repair)
- reading mutations to be sent over the wire.
Depending on the amount of data we're dealing with, streaming can generate a
significant volume of I/O, with seconds' worth of backlog. If incoming writes
have to be interleaved with those reads, the writes can take a long time to
complete.
Even a node that is only acting as a receiver may still read a lot of data
for checksums - in the case of repairs, those reads come from the checksum
computation.
However, in more complicated failure scenarios, it is not hard to imagine a
node that will be both sending and receiving a lot of data.
The best way to guarantee progress on both fronts is to put the two kinds of
operations into different classes.
This patch introduces a new write class, and renames the old read class to
something more meaningful.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
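The split can be illustrated with a toy proportional-share scheduler. This is a minimal sketch, not Scylla's or Seastar's actual API; the names (io_class, pick_next) are made up for illustration. With each kind of operation in its own class, neither streaming reads nor streaming writes can starve the other:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of proportional-share I/O classes. Each class
// accumulates "virtual time" inversely proportional to its shares; the
// scheduler always dispatches from the class with the lowest virtual
// time, so both classes keep making progress.
struct io_class {
    std::string name;
    unsigned shares;
    double vtime = 0;         // accumulated cost / shares
    unsigned dispatched = 0;  // how many I/O units this class got
};

// Pick the class to dispatch the next unit of I/O from.
io_class* pick_next(std::vector<io_class>& classes) {
    io_class* best = &classes[0];
    for (auto& c : classes) {
        if (c.vtime < best->vtime) {
            best = &c;
        }
    }
    best->vtime += 1.0 / best->shares;
    best->dispatched += 1;
    return best;
}
```

With equal shares, reads and writes each get half of the dispatch slots; giving one class more shares shifts the ratio without ever starving the other class.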
Fix bootstrap_test.py:TestBootstrap.failed_bootstap_wiped_node_can_join_test
Logs on node 1:
INFO 2016-03-11 15:53:43,287 [shard 0] gossip - FatClient 127.0.0.2 has been silent for 30000ms, removing from gossip
INFO 2016-03-11 15:53:43,287 [shard 0] stream_session - stream_manager: Close all stream_session with peer = 127.0.0.2 in on_remove
WARN 2016-03-11 15:53:43,498 [shard 0] stream_session - [Stream #4e411ba0-e75e-11e5-81f8-000000000000] stream_transfer_task: Fail to send STREAM_MUTATION_DONE to 127.0.0.2:0: std::runtime_error ([Stream #4e411ba0-e75e-11e5-81f8-000000000000] GOT STREAM_MUTATION_DONE 127.0.0.1: Can not find stream_manager)
terminate called without an active exception
Backtrace on node 1:
#0 0x00007fb74723da98 in raise () from /lib64/libc.so.6
#1 0x00007fb74723f69a in abort () from /lib64/libc.so.6
#2 0x00007fb74ab84aed in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007fb74ab82936 in ?? () from /lib64/libstdc++.so.6
#4 0x00007fb74ab82981 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007fb74ab82be9 in __cxa_rethrow () from /lib64/libstdc++.so.6
#6 0x0000000000f3521e in streaming::stream_transfer_task::<lambda()>::<lambda(auto:44)>::operator()<std::__exception_ptr::exception_ptr> (ep=..., __closure=0x7ffce74d8630) at streaming/stream_transfer_task.cc:169
#7 do_void_futurize_apply<const streaming::stream_transfer_task::start()::<lambda()>::<lambda(auto:44)>&, std::__exception_ptr::exception_ptr> (func=...) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1142
#8 futurize<void>::apply<const streaming::stream_transfer_task::start()::<lambda()>::<lambda(auto:44)>&, std::__exception_ptr::exception_ptr> (func=...) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1190
#9 future<>::<lambda(auto:7&&)>::operator()<future<> > ( fut=fut@entry=<unknown type in /home/asias/src/cloudius-systems/scylla/build/release/scylla, CU 0xec84d00, DIE 0xee2561d>, __closure=__closure@entry=0x7ffce74d8630) at /home/asias/src/cloudius-systems/scylla/seastar/core/future.hh:1014
Message-Id: <1457684884-4776-2-git-send-email-asias@scylladb.com>
In the preparation phase of streaming, we check that the remote node has all
the cf_ids needed for the entire streaming process, including the cf_ids the
local node will send to the remote node and vice versa.
So if a cf_id is missing at a later time, it must be that the cf was deleted,
and it is fine to ignore the no_such_column_family exception. In this patch,
we change the code to ignore it on the server side, so that we avoid sending
the exception back and having to handle it in an IDL-compatible way.
One thing we could improve: the sender might learn that the cf was deleted
later than the receiver does. In that case, the sender will send a few more
mutations than it would if we sent no_such_column_family back to it. However,
since we do not throw exceptions in the receiver's stream mutation handler,
this does not cause much overhead; the receiver will simply ignore the
mutations it receives.
Fixes #979
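The server-side behaviour can be sketched as a handler that swallows only this one exception. This is an illustrative sketch, not the actual Scylla code; the type and function names are made up:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for Scylla's no_such_column_family exception.
struct no_such_column_family : std::runtime_error {
    no_such_column_family() : std::runtime_error("no such column family") {}
};

// Receiver-side wrapper: returns true if the mutation was applied,
// false if it was dropped because the cf no longer exists. A missing
// cf at this point can only mean it was dropped after the preparation
// phase, so it is safe to ignore. Any other error still propagates.
bool apply_or_ignore(const std::function<void()>& apply_mutation) {
    try {
        apply_mutation();
        return true;
    } catch (const no_such_column_family&) {
        return false;  // cf dropped mid-stream: silently ignore
    }
}
```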
It is possible that a cf is deleted after we create the cf reader. Skip such
mutations, to avoid the unnecessary overhead of sending them over the wire
only for the peer node to drop them.
Currently, only the shard the stream_plan is created on sends streaming
mutations. To utilize all the available cores, we make each shard send the
mutations it is responsible for. On the receiver side, we do not forward
mutations to the shard where the stream_session was created, so that we
avoid unnecessary forwarding.
Note: the downside is that it is now harder to:
1) track the number of bytes sent and received
2) update the keep-alive timer upon receipt of a STREAM_MUTATION
To fix this, we now store the sent/received byte counts on all shards. When
the keep-alive timer expires, we check whether any progress has been made.
Hopefully, this patch will make streaming much faster and in turn make
repair, decommission and adding a node faster.
Refs: https://github.com/scylladb/scylla/issues/849
Tested with decommission/repair dtest.
Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
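The per-shard accounting plus the keep-alive progress check from the patch above might look roughly like this sketch (names are illustrative, not the actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: each shard keeps its own byte counter; the
// keep-alive timer sums all of them and compares the total with the
// total seen at the previous expiry to decide whether the session is
// still making progress.
struct progress_tracker {
    std::vector<size_t> bytes_per_shard;  // one slot per shard
    size_t last_total = 0;

    explicit progress_tracker(unsigned smp_count)
        : bytes_per_shard(smp_count, 0) {}

    // Called on the shard that sent/received the mutation.
    void add(unsigned shard, size_t bytes) {
        bytes_per_shard[shard] += bytes;
    }

    // Called when the keep-alive timer expires.
    bool made_progress() {
        size_t total = std::accumulate(bytes_per_shard.begin(),
                                       bytes_per_shard.end(), size_t(0));
        bool progressed = (total != last_total);
        last_total = total;
        return progressed;
    }
};
```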
There are only two messages: prepare_message and outgoing_file_message, and
only prepare_message is actually sent on the wire.
Flatten the namespace.
- int connections_per_host
Scylla does not create connections per stream_session; it uses rpc instead,
so connections_per_host is not relevant to Scylla.
- bool keep_ss_table_level
- int repaired_at
Scylla does not stream sstable files, so these are not relevant to Scylla.
- Add debug for the peer address info
- Add debug in stream_transfer_task and stream_receive_task
- Add debug when cancelling the keep_alive timer
- Add debug for has_active_sessions in stream_result_future::maybe_complete
messaging_service automatically uses the private IP address to connect to a
peer node when possible. There is no need for an upper layer like streaming
to worry about it. Dropping it simplifies things a bit.
If the session is idle for 10 minutes, close the session. This detects the
following hangs:
1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is gone,
the peer node will wait forever
Fixes simple_kill_streaming_node_while_bootstrapping_test.
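The idle detection can be sketched as a watchdog that compares the time of the last stream activity against the 10-minute timeout (illustrative names, not the actual code):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical sketch: record the time of the last stream activity;
// when a periodic timer fires, close the session if nothing has
// happened for the configured timeout (10 minutes in the patch).
struct idle_watchdog {
    std::chrono::steady_clock::time_point last_activity;
    std::chrono::minutes timeout{10};

    explicit idle_watchdog(std::chrono::steady_clock::time_point now)
        : last_activity(now) {}

    // Called whenever a stream message is sent or received.
    void on_activity(std::chrono::steady_clock::time_point now) {
        last_activity = now;
    }

    // Called when the periodic check timer fires.
    bool should_close(std::chrono::steady_clock::time_point now) const {
        return now - last_activity >= timeout;
    }
};
```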
When a node gains or regains responsibility for certain token ranges,
streaming is performed; upon receipt of the stream data, the row cache is
invalidated for those ranges.
Refs #484.
When we start sending mutations for a cf_id to the remote node, the remote
node might not have the cf_id anymore, for instance because the cf was
dropped.
We should not fail the streaming when this happens; since the cf no longer
exists, there is no point in streaming it.
Fixes #566
Many mutation_reader implementations capture 'this', which becomes invalid
if the reader is copied. Protect against this error by making mutation_reader
a non-copyable object.
Fix inadvertent copies around the code base.
The problem is that in start_streaming_files we iterate over the _transfers
map, but task.start() can delete the task from _transfers:
stream_transfer_task::start() -> stream_transfer_task::complete ->
stream_session::task_completed -> _transfers.erase(completed_task.cf_id)
To fix, we advance the iterator before we start the task.
std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from
/lib64/libstdc++.so.6
/usr/include/c++/5.1.1/bits/stl_tree.h:205
(this=this@entry=0x6000000dc290) at streaming/stream_transfer_task.cc:55
streaming::stream_session::start_streaming_files
(this=this@entry=0x6000000ab500) at streaming/stream_session.cc:526
(this=0x6000000ab500, requests=std::vector of length 1, capacity 1 =
{...}, summaries=std::vector of length 1, capacity 1 = {...})
at streaming/stream_session.cc:356
streaming/stream_session.cc:83
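The fix is the classic advance-before-erase idiom. A minimal self-contained reproduction (with simplified stand-in types, not the real stream_transfer_task) looks like this:

```cpp
#include <cassert>
#include <map>

// Simplified stand-in: starting a task erases its own entry from the
// owning map, just as task_completed erases from _transfers.
struct fake_task {
    bool* started;
    std::map<int, fake_task>* owner;
    int key;
    void start() {
        *started = true;
        owner->erase(key);  // mimics _transfers.erase(completed_task.cf_id)
    }
};

// Safe iteration: copy the iterator and advance it BEFORE calling
// start(), so erasing the current entry cannot invalidate the loop.
void start_all(std::map<int, fake_task>& transfers) {
    for (auto it = transfers.begin(); it != transfers.end();) {
        auto cur = it++;       // advance first: start() may erase *cur
        cur->second.start();
    }
}
```

In std::map, erase() invalidates only iterators to the erased element, so advancing first is sufficient; incrementing the iterator after the erase would be undefined behavior, which is exactly the crash in the backtrace above.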
At the moment, when the local node sends a mutation to the remote node, it
waits for the remote node to apply the mutation and send back a response
before sending the next mutation. This means the sender sends mutations one
by one. To optimize, we can make the sender send more mutations in parallel
without waiting for the responses. In order to apply back pressure from the
remote node, a per-shard mutation send limiter is introduced so that the
sender does not overwhelm the receiver.
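The back-pressure mechanism can be sketched as a unit-counting limiter. In the real code this would be an asynchronous semaphore, so the synchronous try_acquire here (and the type name) is only an illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-shard send limiter: a fixed budget of units. Each
// in-flight mutation holds one unit until the receiver's response
// releases it. When no units remain the sender must wait, which is how
// the receiver's back pressure reaches the sender.
struct send_limiter {
    size_t free_units;

    explicit send_limiter(size_t limit) : free_units(limit) {}

    // Called before sending a mutation; false means the sender must
    // wait for a response to come back first.
    bool try_acquire() {
        if (free_units == 0) {
            return false;
        }
        --free_units;
        return true;
    }

    // Called when the receiver's response for one mutation arrives.
    void release() { ++free_units; }
};
```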
I tried our lw_shared_ptr, but the compiler complained endlessly about use
of the incomplete type stream_session. I cannot include stream_session.hh
everywhere due to a circular dependency.
For now, I'm using std::shared_ptr, which works fine.
In the streaming code, we need a core-to-core connection (the second
connection, from B to A). That is, when node A initiates a stream to node B,
it is possible that node A will transfer data to node B and vice versa, so we
need two connections. When node A creates a tcp connection (within the
messaging_service) to node B, we have a connection from ip_a:core_a to
ip_b:core_b. When node B creates a connection to node A, we cannot guarantee
it is from ip_b:core_b to ip_a:core_a.
The current messaging_service does not support core-to-core connections yet,
although we use shard_id{ip, cpu_id} as the destination of the message.
We can solve the issue in the upper layer, by passing the extra cpu_id as
part of a user message:
Node A sends stream_init_message with my_cpu_id = current_cpu_id.
Node B receives stream_init_message; it runs on whatever cpu this connection
goes to, then sends a response back with node B's current_cpu_id.
After this, each node knows which cpu_id on the other node to send to.
TODO: we need to handle the case when peer node reboots with different
number of cpus.
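The cpu_id exchange described above can be sketched as follows (message and type names are illustrative, not the actual wire format):

```cpp
#include <cassert>

// Hypothetical message types for the cpu_id handshake.
struct stream_init_message { unsigned my_cpu_id; };
struct stream_init_response { unsigned my_cpu_id; };

// Simplified stand-in for a node taking part in the handshake.
struct stream_node {
    unsigned current_cpu_id;
    unsigned peer_cpu_id = 0;  // learned from the handshake

    // Node A: build the initial message carrying our cpu_id.
    stream_init_message make_init() const { return {current_cpu_id}; }

    // Node B: record A's cpu_id and answer with our own.
    stream_init_response handle_init(const stream_init_message& m) {
        peer_cpu_id = m.my_cpu_id;
        return {current_cpu_id};
    }

    // Node A: record B's cpu_id from the response.
    void handle_response(const stream_init_response& r) {
        peer_cpu_id = r.my_cpu_id;
    }
};
```

After the exchange, each side addresses subsequent streaming messages to the peer_cpu_id it learned, which sidesteps the lack of core-to-core connections in messaging_service.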
Each outgoing_file_message might contain multiple mutations. Send them one
mutation per RPC call (using frozen_mutation), instead of one big
outgoing_file_message per RPC call.