There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed in the deferring point, we need to go and find it
again.
If we use lw_shared_ptr, however, we'll be able to at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of re-finding it just fine.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
Wrapping ranges are a pain, so we are moving wrap handling to the edges.
Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.
Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.
Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
If the peer node of a stream_session is restarted or removed we should
abort the streaming. It is better to hook gossip callback in the stream
manager than in each streamm_session.
When the first time the keep alive timer fires, the _last_stream_bytes
btyes will be zero since it is the first time we update it. The keep
alive timer will be rearmed and fired again. The second time, we find
there is no progress, we close the session. The total idle time will be
2 * keep alive timer.
To make the idle time to close the session be more precise, we reduce
the interval to check the progess and close the session by checking last
time the progress is made.
Message-Id: <c959cffce0cc738a3d73caaf71d2adb709d46863.1456831616.git.asias@scylladb.com>
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".
stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. It adds overhead if
we update progress info in stream_session object whenever we receive a
streaming mutation. So, what we do now is when we really need the
progress info, we update the progress info in stream_session object.
With http://127.0.0.$i:10000/stream_manager/, it looks like below when
decommission node 3 in a 3 nodes cluster.
=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 2
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]
=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
Currently, only the shard where the stream_plan is created on will send
streaing mutations. To utilize all the available cores, we can make each
shard send mutations which it is responsbile for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.
Note: the downside is that it is now harder to:
1) to track number of bytes sent and received
2) to update the keep alive timer upon receive of the STREAM_MUTATION
To fix, we now store the sent/recieved bytes info on all shards. When
the keep alive timer expires, we check if any progress has been made.
Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.
Refs: https://github.com/scylladb/scylla/issues/849
Tested with decommission/repair dtest.
Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
It is 1:1 mapping between session_info and stream_session. Putting
session_info inside stream_session, we can get rid of the
stream_coordinator::host_streaming_data class.
There are only two messages: prepare_message and outgoing_file_message.
Actually only the prepare_message is the message we send on wire.
Flatten the namespace.
- int connections_per_host
Scylla does not create connections per stream_session, instead it uses
rpc, thus connections_per_host is not relevant to scylla.
- bool keep_ss_table_level
- int repaired_at
Scylla does not stream sstable files. They are not relevant to scylla.
messaging_service will use private ip address automatically to connect a
peer node if possible. There is no need for the upper level like
streaming to worry about it. Drop it simplifies things a bit.
"When a node gain or regain responsibility for certain token ranges, streaming
will be performed, upon receiving of the stream data, the row cache
is invalidated for that range.
Refs #484."
The problem is that we set the session state to WAIT_COMPLETE in
send_complete_message's continuation, the peer node might send
COMPLETE_MESSAGE before we run the continuation, thus we set the wrong
status in COMPLETE_MESSAGE's handler and will not close the session.
Before:
GOT STREAM_MUTATION_DONE
receive task_completed
SEND COMPLETE_MESSAGE to 127.0.0.2:0
GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
complete: PREPARING -> WAIT_COMPLETE
GOT COMPLETE_MESSAGE Reply
maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE
After:
GOT STREAM_MUTATION_DONE
receive task_completed
maybe_completed: PREPARING -> WAIT_COMPLETE
SEND COMPLETE_MESSAGE to 127.0.0.2:0
GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
complete: WAIT_COMPLETE -> COMPLETE
Session with 127.0.0.2 is complete
If the session is idle for 10 minutes, close the session. This can
detect the following hangs:
1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is
gone, the peer node will wait forever
Fixes simple_kill_streaming_node_while_bootstrapping_test.
When a node gain or regain responsibility for certain token ranges,
streaming will be performed, upon receiving of the stream data, the
row cache is invalidated for that range.
Refs #484.
1) Node A sends prepare message (msg1) to Node A
2) Node B sends prepare message (msg2) back to Node A
3) Node A prepares what to receive according to msg2
The issue is that, Node B might sends before Node A prepares to receive.
To fix, we send a PREPARE_DONE_MESSAGE after step 3 to notify
node B to start sending.
We use storage_proxy::mutate_locally() to apply the mutations when we
receive them. mutate_locally() will ignore the mutation if the cf does not
exist. We check in the prepare phase to make sure all the cf's exist.
stream_session::stream_session(inet_address peer_, inet_address connecting_,
int index_, bool keep_ss_table_level_)
: peer(peer_)
, connecting(connecting_)
, conn_handler(shared_from_this())
Calling shared_from_this() inside stream_session's constructor is
problematic. I got
Exiting on unhandled exception of type 'std::bad_weak_ptr': bad_weak_ptr
exceptions, with
auto session = std::make_shared<stream_session>(peer, connecting, size, _keep_ss_table_level)
Also, the logic in connection_handler is not very useful for us. The
sending and receiving of messages are handled using messaging_service.
There is no need to add another layer.