- int connections_per_host
Scylla does not create connections per stream_session; instead it uses
rpc, so connections_per_host is not relevant to scylla.
- bool keep_ss_table_level
- int repaired_at
Scylla does not stream sstable files, so these options are not relevant to scylla.
messaging_service will automatically use the private ip address to
connect to a peer node if possible. There is no need for an upper
layer like streaming to worry about it. Dropping it simplifies things
a bit.
"When a node gain or regain responsibility for certain token ranges, streaming
will be performed, upon receiving of the stream data, the row cache
is invalidated for that range.
Refs #484."
The problem is that we set the session state to WAIT_COMPLETE in
send_complete_message's continuation. The peer node might send
COMPLETE_MESSAGE before we run the continuation, so we set the wrong
state in COMPLETE_MESSAGE's handler and never close the session.
Before:
GOT STREAM_MUTATION_DONE
receive task_completed
SEND COMPLETE_MESSAGE to 127.0.0.2:0
GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
complete: PREPARING -> WAIT_COMPLETE
GOT COMPLETE_MESSAGE Reply
maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE
After:
GOT STREAM_MUTATION_DONE
receive task_completed
maybe_completed: PREPARING -> WAIT_COMPLETE
SEND COMPLETE_MESSAGE to 127.0.0.2:0
GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
complete: WAIT_COMPLETE -> COMPLETE
Session with 127.0.0.2 is complete
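The fix, as a minimal self-contained sketch (the state names,
maybe_completed() and send_complete_message() mirror the logs above;
everything else is illustrative, not the actual implementation):

enum class session_state { PREPARING, WAIT_COMPLETE, COMPLETE };

struct session_sketch {
    session_state state = session_state::PREPARING;

    void maybe_completed() {
        // Transition synchronously, before any message is sent, so the
        // peer's COMPLETE_MESSAGE can never observe a stale PREPARING.
        state = session_state::WAIT_COMPLETE;
    }
    void send_complete_message() {
        // stand-in for the rpc call; the peer replies with COMPLETE_MESSAGE
    }
    void on_all_tasks_completed() {
        maybe_completed();       // PREPARING -> WAIT_COMPLETE first (the fix)
        send_complete_message(); // only then let the peer reply
    }
};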
If the session is idle for 10 minutes, close the session. This can
detect the following hangs:
1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is
gone, the peer node will wait forever
Fixes simple_kill_streaming_node_while_bootstrapping_test.
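A minimal sketch of such an inactivity timeout, assuming Seastar's
timer API (the member names and the reset-on-traffic policy are
assumptions, not the actual implementation):

#include <seastar/core/lowres_clock.hh>
#include <seastar/core/timer.hh>
#include <chrono>

class session_sketch {
    seastar::timer<seastar::lowres_clock> _keep_alive;
public:
    session_sketch() {
        _keep_alive.set_callback([this] {
            // Nothing heard for 10 minutes: the peer is likely gone,
            // so tear the session down instead of waiting forever.
            close_session();
        });
    }
    void on_message_received() {
        // Any traffic resets the idle window.
        _keep_alive.rearm(seastar::lowres_clock::now() +
                          std::chrono::minutes(10));
    }
    void close_session() { /* fail outstanding tasks, end the session */ }
};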
When a node gains or regains responsibility for certain token ranges,
streaming will be performed. Upon receiving the stream data, the
row cache is invalidated for that range.
Refs #484.
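The idea, as a hedged sketch with a hypothetical cache interface (the
real row cache and range types differ):

#include <cstdint>

struct token_range { int64_t start, end; }; // hypothetical range type

struct row_cache_sketch {
    void invalidate(token_range) {
        // evict any cached entries falling inside the range
    }
};

void on_range_streamed(row_cache_sketch& cache, token_range r) {
    // ... streamed mutations for r have been applied ...
    cache.invalidate(r); // so reads observe the newly streamed data
}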
1) Node A sends prepare message (msg1) to Node B
2) Node B sends prepare message (msg2) back to Node A
3) Node A prepares what to receive according to msg2
The issue is that Node B might start sending before Node A is prepared
to receive. To fix this, we send a PREPARE_DONE_MESSAGE after step 3
to notify Node B to start sending.
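To make the race concrete, here is a runnable sketch of the
synchronization PREPARE_DONE_MESSAGE provides, written with plain C++
threads; none of these names are the real Scylla code:

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool prepare_done = false; // set when Node A's PREPARE_DONE_MESSAGE arrives

void node_b_sender() {
    std::unique_lock<std::mutex> lk(m);
    // Without this wait, Node B could start sending while Node A is
    // still setting up its receivers (the race described above).
    cv.wait(lk, [] { return prepare_done; });
    std::printf("Node B: start sending stream data\n");
}

int main() {
    std::thread b(node_b_sender);
    // ... Node A finishes step 3 (prepares what to receive) ...
    {
        std::lock_guard<std::mutex> lk(m);
        prepare_done = true; // PREPARE_DONE_MESSAGE arrives at Node B
    }
    cv.notify_one();
    b.join();
}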
We use storage_proxy::mutate_locally() to apply the mutations when we
receive them. mutate_locally() will ignore the mutation if the cf does not
exist. We check in the prepare phase to make sure all the cfs exist.
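A hedged sketch of that prepare-phase check; the lookup interface is
an assumption, not the actual database API:

#include <stdexcept>
#include <string>
#include <vector>

struct cf_name { std::string keyspace, table; };

bool cf_exists(const cf_name&) { return true; } // stub for the sketch

void verify_prepare_request(const std::vector<cf_name>& requested) {
    for (const auto& cf : requested) {
        // Fail the session up front rather than letting
        // mutate_locally() silently drop mutations later.
        if (!cf_exists(cf)) {
            throw std::runtime_error("unknown column family " +
                                     cf.keyspace + "." + cf.table);
        }
    }
}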
stream_session::stream_session(inet_address peer_, inet_address connecting_,
int index_, bool keep_ss_table_level_)
: peer(peer_)
, connecting(connecting_)
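    // BUG: shared_from_this() cannot be used here; no shared_ptr owns
    // the object during construction, so it throws std::bad_weak_ptr.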
, conn_handler(shared_from_this())
Calling shared_from_this() inside stream_session's constructor is
problematic. I got
Exiting on unhandled exception of type 'std::bad_weak_ptr': bad_weak_ptr
exceptions, with
auto session = std::make_shared<stream_session>(peer, connecting, size, _keep_ss_table_level)
Also, the logic in connection_handler is not very useful for us. The
sending and receiving of messages are handled using messaging_service.
There is no need to add another layer.
I tried our lw_shared_ptr, but the compiler complained endlessly about
use of the incomplete type stream_session. I cannot include
stream_session.hh everywhere due to circular dependencies.
For now, I'm using std::shared_ptr, which works fine.
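For reference, one standard workaround (shown here as a hedged sketch;
this change instead drops conn_handler entirely) is to defer anything
that needs shared_from_this() until after construction:

#include <memory>

class session_sketch : public std::enable_shared_from_this<session_sketch> {
public:
    static std::shared_ptr<session_sketch> create() {
        auto s = std::make_shared<session_sketch>();
        s->init(); // safe: a shared_ptr now owns the object
        return s;
    }
    void init() {
        auto self = shared_from_this(); // would throw std::bad_weak_ptr
                                        // inside the constructor
        // ... capture `self` in handlers to keep the session alive ...
        (void)self;
    }
};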
In streaming code, we need a core to core connection (the second
connection from B to A). That is, when node A initiates a stream to
node B, it is possible that node A will transfer data to node B and
vice versa, so we need two connections. When node A creates a tcp
connection (within the messaging_service) to node B, we have a
connection from ip_a:core_a to ip_b:core_b. When node B creates a
connection back to node A, we cannot guarantee it is ip_b:core_b to
ip_a:core_a.
Current messaging_service does not support core to core connection yet,
although we use shard_id{ip, cpu_id} as the destination of the message.
We can solve the issue in the upper layer by passing an extra cpu_id
as part of the user message:
Node A sends stream_init_message with my_cpu_id = current_cpu_id
Node B receives stream_init_message; it runs on whatever cpu this
connection goes to, then sends a response back with Node B's
current_cpu_id.
After this, each node knows which cpu_id to send to on the other node.
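A hedged sketch of the exchange; the message struct and the
bookkeeping are illustrative stand-ins for the real messaging_service
types:

#include <cstdint>
#include <map>

struct stream_init_message {
    uint32_t from_cpu_id; // sender's shard id, carried as user data
    // ... stream plan details elided ...
};

// Hypothetical per-shard bookkeeping: peer ip -> peer cpu_id.
std::map<uint32_t, uint32_t> peer_cpus;

// Runs on whatever cpu the connection happens to land on at the
// receiver; the return value tells the sender which cpu_id to address.
uint32_t handle_stream_init(uint32_t peer_ip, const stream_init_message& msg,
                            uint32_t this_cpu_id) {
    peer_cpus[peer_ip] = msg.from_cpu_id; // learn the peer's cpu_id
    return this_cpu_id;                   // reply with our own cpu_id
}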
TODO: we need to handle the case when peer node reboots with different
number of cpus.
This is a bit different from Origin. We always send back a
prepare_message even if the initiator requested no data from the
follower, to unify the handling.
It is used to register the streaming message handlers once; we don't
need to do that in every stream_session instance. A side effect is
that stream_session becomes copyable, which is needed in
get_all_stream_sessions.
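A minimal sketch of that once-only registration; the function name and
handler list are illustrative:

// Called from every stream_session, but registers only once per shard.
void maybe_init_messaging_handlers() {
    static bool registered = false;
    if (registered) {
        return;
    }
    registered = true;
    // register PREPARE_MESSAGE / STREAM_MUTATION / COMPLETE_MESSAGE /
    // etc. handlers with messaging_service here ...
}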