If the peer node of a stream_session is restarted or removed we should
abort the streaming. It is better to hook gossip callback in the stream
manager than in each streamm_session.
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".
stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. It adds overhead if
we update progress info in stream_session object whenever we receive a
streaming mutation. So, what we do now is when we really need the
progress info, we update the progress info in stream_session object.
With http://127.0.0.$i:10000/stream_manager/, it looks like below when
decommission node 3 in a 3 nodes cluster.
=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 2
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
Currently, only the shard where the stream_plan is created on will send
streaing mutations. To utilize all the available cores, we can make each
shard send mutations which it is responsbile for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.
Note: the downside is that it is now harder to:
1) to track number of bytes sent and received
2) to update the keep alive timer upon receive of the STREAM_MUTATION
To fix, we now store the sent/recieved bytes info on all shards. When
the keep alive timer expires, we check if any progress has been made.
Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.
Refs: https://github.com/scylladb/scylla/issues/849
Tested with decommission/repair dtest.
Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
The idea behind the current 10 stream_mutations per core limitation is
to avoid streaming overwhelms the TCP connection and starves normal cql
verbs if the streaming mutations are big and takes long time to
complete.
Now that we use a standalone connection for streaming verbs, we can
increase the limitation.
Hopefully, this will fix#849.
This expose the initiated_streams and the receiving_streams in the
stream_manager so the API would have access to it.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
At the moment, when local node send a mutation to remote node, it will
wait for remote node to apply the mutation and send back a response,
then it will send the next mutation. This means the sender are sending
mutations one by one. To optimize, we can make the sender send more
mutations in parallel without waiting for the response. In order to
apply back pressure from remote node, a per shard mutation send limiter
is introduced so that the sender will not overwhelm the receiver.