In each gossip round, i.e., gossiper::run(), we do:
1) send syn message
2) peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
         on_change
4) send back ack2 message
5) peer node: process ack2 message
   apply_state_locally
At the moment, syn is a "wait" message that times out after 3 seconds. In
step 3, all the registered gossip callbacks are called, which might take
a significant amount of time to complete.
In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we no
longer get an ack message as the return value of a syn message, so a
GOSSIP_DIGEST_ACK message verb is introduced.
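The no-wait exchange described above can be sketched in standard C++. This is only an illustration under stated assumptions, not the gossiper code: toy_network, toy_node, toy_message and toy_gossip_verb are hypothetical names, and an in-process queue stands in for the RPC layer. The point it shows is that starting a round only enqueues the syn, and the ack is handled later by its own verb handler.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical verbs mirroring the syn / ack / ack2 exchange.
enum class toy_gossip_verb { SYN, ACK, ACK2 };

struct toy_message {
    toy_gossip_verb verb;
    int from;
    int to;
};

// Toy in-process "network". send() only enqueues, so a sender never blocks
// on the receiver; this stands in for a no-wait (one-way) message.
class toy_network {
    using handler = std::function<void(const toy_message&)>;
    std::deque<toy_message> _queue;
    std::map<std::pair<int, toy_gossip_verb>, handler> _handlers;
public:
    void register_handler(int node, toy_gossip_verb v, handler h) {
        _handlers[std::make_pair(node, v)] = std::move(h);
    }
    void send(toy_message m) { _queue.push_back(m); }
    // Deliver queued messages until none remain. Messages addressed to a
    // node without a handler (a "down" node) are dropped.
    std::size_t deliver_all() {
        std::size_t delivered = 0;
        while (!_queue.empty()) {
            toy_message m = _queue.front();
            _queue.pop_front();
            auto it = _handlers.find(std::make_pair(m.to, m.verb));
            if (it != _handlers.end()) {
                it->second(m);
                ++delivered;
            }
        }
        return delivered;
    }
};

struct toy_node {
    int id;
    std::vector<std::string> log;  // records which verbs this node handled

    void attach(toy_network& net) {
        net.register_handler(id, toy_gossip_verb::SYN,
                [this, &net](const toy_message& m) {
            log.push_back("syn");
            net.send({toy_gossip_verb::ACK, id, m.from});
        });
        // The ack arrives through its own verb instead of being the return
        // value of the syn; this is where handle_ack_msg would run.
        net.register_handler(id, toy_gossip_verb::ACK,
                [this, &net](const toy_message& m) {
            log.push_back("ack");
            net.send({toy_gossip_verb::ACK2, id, m.from});
        });
        net.register_handler(id, toy_gossip_verb::ACK2,
                [this](const toy_message&) {
            log.push_back("ack2");
        });
    }

    // One gossip round: enqueue the syn and return immediately.
    void run_round(toy_network& net, int peer) {
        assert(peer != id);
        net.send({toy_gossip_verb::SYN, id, peer});
    }
};
```

In this sketch, calling run_round() against a peer that never registered handlers simply results in a dropped message; the caller is not held up waiting for a timeout.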
With this patch, the gossip message exchange is now async. This is useful
when some nodes in the cluster are down: the gossip round, which is
supposed to run every second, is no longer delayed by 3*n seconds
(n = 1-3, since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).
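The 3*n figure can be made concrete with a small helper. This is a sketch, assuming peers are contacted one after another and that every contacted peer is down; worst_case_syn_wait is a hypothetical name and callback time is not modelled.

```cpp
#include <cassert>
#include <chrono>

// Worst-case time a synchronous gossip round can spend waiting on syn
// timeouts, assuming peers are contacted sequentially and every one of
// them is down. Time spent in gossip callbacks is not included.
inline std::chrono::seconds worst_case_syn_wait(
        int peers_contacted,
        std::chrono::seconds syn_timeout = std::chrono::seconds(3)) {
    assert(peers_contacted >= 0);
    return syn_timeout * peers_contacted;
}
```

With the 3-second timeout, worst_case_syn_wait(1) is 3 seconds and worst_case_syn_wait(3) is 9 seconds, against a round that is supposed to repeat every second.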
Later, we can talk to the 1-3 peer nodes in parallel to reduce latency
even more.
Refs: #900
* seastar 10e09b0...2e041c2 (7):
> Merge "Change app_template::run() to terminate when callback is done" from Tomasz
> resource: Fix compilation for hwloc version 1.8.0
> memory: Fix infinite recursion when throwing std::bad_alloc
> core/reactor: Throw the right error code when connect() fails
> future: improve exception safety
> xen: add missing virtual destructors
> circular_buffer: do not destroy uninitialized object
app_template::run() users updated to call app_template::run_deprecated().
The messaging_service is built on top of the seastar rpc infrastructure.
I've sorted out all the message VERBs which Origin uses. All of them can
be implemented using this messaging_service.
Each verb has a handler. There are two types of handlers: one returns a
message back to the sender, the other does not. The former can be
registered using ms.register_handler(), the latter can be registered
using ms.register_handler_oneway().
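The difference between the two handler kinds can be sketched without seastar. The class below is a hypothetical stand-in written in standard C++17, not the messaging_service implementation: toy_messaging_service, toy_verb and dispatch() are names made up for illustration, and payloads are plain strings. It only shows that a replying handler produces a value for the sender while a one-way handler produces none.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical verbs, named after ECHO and GOSSIP_SHUTDOWN.
enum class toy_verb { echo, gossip_shutdown };

class toy_messaging_service {
    using replying_handler = std::function<std::string(const std::string&)>;
    using oneway_handler = std::function<void(const std::string&)>;
    std::map<toy_verb, replying_handler> _replying;
    std::map<toy_verb, oneway_handler> _oneway;
public:
    // Handler whose return value is sent back to the sender.
    void register_handler(toy_verb v, replying_handler h) {
        assert(h != nullptr);
        _replying[v] = std::move(h);
    }
    // Handler that runs for its side effects only; nothing is sent back.
    void register_handler_oneway(toy_verb v, oneway_handler h) {
        assert(h != nullptr);
        _oneway[v] = std::move(h);
    }
    // Simulates an incoming message. Returns the reply for a replying verb,
    // and std::nullopt for a one-way or unregistered verb.
    std::optional<std::string> dispatch(toy_verb v, const std::string& payload) {
        auto r = _replying.find(v);
        if (r != _replying.end()) {
            return r->second(payload);
        }
        auto o = _oneway.find(v);
        if (o != _oneway.end()) {
            o->second(payload);
        }
        return std::nullopt;
    }
};
```

Keeping the two registrations separate lets the sender know up front whether to wait for a reply, which is the distinction the usage example relies on.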
Usage example:
To use messaging_service to send a message, all you need is:
    messaging_service& ms = get_local_messaging_service();
1) To register a message handler:
    // Handler that returns a reply to the sender
    ms.register_handler(messaging_verb::ECHO, [] (int x, long y) {
        print("Server got echo msg = (%d, %ld) \n", x, y);
        std::tuple<int, long> ret(x*x, y*y);
        return make_ready_future<decltype(ret)>(std::move(ret));
    });
    // One-way handler: no reply is sent back
    ms.register_handler_oneway(messaging_verb::GOSSIP_SHUTDOWN, [] (empty_msg msg) {
        print("Server got shutdown msg = %s\n", msg);
        return messaging_service::no_wait();
    });
2) To send a message:
    // Request with a reply: the continuation receives the RetMsg
    using RetMsg = std::tuple<int, long>;
    return ms.send_message<RetMsg>(messaging_verb::ECHO, id, msg1, msg2).then([] (RetMsg msg) {
        print("Client sent echo got reply = (%d , %ld)\n", std::get<0>(msg), std::get<1>(msg));
        return sleep(100ms).then([]{
            return make_ready_future<>();
        });
    });
    // One-way message: the continuation takes no reply value
    return ms.send_message_oneway<void>(messaging_verb::GOSSIP_SHUTDOWN, std::move(id), std::move(msg)).then([] () {
        print("Client sent gossip_shutdown got reply = void\n");
        return make_ready_future<>();
    });
Tests:
send to cpu 0
$ ./message --server 127.0.0.1 --cpuid 0 --smp 2
send to cpu 1
$ ./message --server 127.0.0.1 --cpuid 1 --smp 2