1) Node A sends prepare message (msg1) to Node A
2) Node B sends prepare message (msg2) back to Node A
3) Node A prepares what to receive according to msg2
The issue is that, Node B might sends before Node A prepares to receive.
To fix, we send a PREPARE_DONE_MESSAGE after step 3 to notify
node B to start sending.
The problem is that in start_streaming_files we iterate the _transfers
map, however in task.start() we can delete the task from _transfers:
stream_transfer_task::start() -> stream_transfer_task::complete ->
stream_session::task_completed -> _transfers.erase(completed_task.cf_id)
To fix, we advance the iterator before we start the task.
std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from
/lib64/libstdc++.so.6
/usr/include/c++/5.1.1/bits/stl_tree.h:205
(this=this@entry=0x6000000dc290) at streaming/stream_transfer_task.cc:55
streaming::stream_session::start_streaming_files
(this=this@entry=0x6000000ab500) at streaming/stream_session.cc:526
(this=0x6000000ab500, requests=std::vector of length 1, capacity 1 =
{...}, summaries=std::vector of length 1, capacity 1 = {...})
at streaming/stream_session.cc:356
streaming/stream_session.cc:83
At the moment, when local node send a mutation to remote node, it will
wait for remote node to apply the mutation and send back a response,
then it will send the next mutation. This means the sender are sending
mutations one by one. To optimize, we can make the sender send more
mutations in parallel without waiting for the response. In order to
apply back pressure from remote node, a per shard mutation send limiter
is introduced so that the sender will not overwhelm the receiver.
Otherwise code in stream_session.cc
_transfers.emplace(cf_id, stream_transfer_task(shared_from_this(), cf_id)).first;
will copy the stream_transfer_task and in turn copy the
outgoing_file_message which is not copyable due to the semaphore member.
Thanks avi for figuring this out.
Now, make_local_reader does not need partition_range to be alive when we
read the mutation reader. No need to store it in stream_detail for its
lifetime.
We use storage_proxy::mutate_locally() to apply the mutations when we
receive them. mutate_locally() will ignore the mutation if the cf does not
exist. We check in the prepare phase to make sure all the cf's exist.
Thanks to the new mutation reader (storage_proxy::make_local_reader), we
can read mutations for a cf on all shard. This simplifies the sharding
handling a lot. When user of streaming creates a stream_plan on any
shard, it will send data from all shards to remote node and receive
data from all shards on remote node.
It should be moved to i_partitioner.hh, but to do that range<> has to
be first moved out of query-request.hh to break cyclic dependency.
I didn't want to cause conflicts with in-flight patches to range<>.
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).
range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.
Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:
(1) ]A; B]
(2) [A; B]
For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.
I think the simplest soution is to define a total ordering on
ring_position by making token positions positioned either before or
after all keys with that token.
vector(size_type count) constructs the container with count
default-inserted instances of T. So, current code will end up with 2*num
elements which is wrong.