For unknown reasons, I saw gossip syn messages hit rpc timeout errors when
the cluster is under heavy cassandra-stress load.
Using a standalone tcp connection seems to fix the issue.
When the cluster is under heavy load, exchanging a gossip message might
take longer than 1s. Let's make the timeout longer for now, until we can
solve the underlying gossip message delay issue.
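As a rough sketch of what the change amounts to (not the actual code; the
send_gossip_digest_syn() call, its timeout parameter, and the 5 second value
below are illustrative only), the gossip syn exchange is sent with a larger
rpc timeout:

    // sketch only; messaging_service, shard_id and send_gossip_digest_syn()
    // stand in for the real interfaces and may not match them exactly
    #include <chrono>
    using namespace std::chrono_literals;

    static constexpr auto gossip_syn_timeout = 5s;   // illustrative value, previously ~1s

    seastar::future<> send_syn(messaging_service& ms, shard_id to, gossip_digest_syn syn) {
        // the timeout bounds the whole request/response round trip
        return ms.send_gossip_digest_syn(to, gossip_syn_timeout, std::move(syn));
    }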
We should ignore the cpu id in the equal and less-than operators for shard_id as well.
Within a 3-node cluster where each node has 4 cpus, on the first node:
Before:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:36998 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:36772 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:40125 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:60182 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:38013 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:51997 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:56532 172.30.0.100:7000 ESTABLISHED
After:
[fedora@ip-172-30-0-99 ~]$ netstat -nt|grep 100\:7000
tcp 0 0 172.30.0.99:45661 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:57395 172.30.0.100:7000 ESTABLISHED
tcp 0 0 172.30.0.99:37807 172.30.0.100:7000 ESTABLISHED
tcp 0 36 172.30.0.99:50567 172.30.0.100:7000 ESTABLISHED
Each shard of a node is supposed to have 1 connection to a peer node, so
each node will have #cpu connections to a peer node (4 here, matching the
four connections in the After output).
With this patch, the cluster is much more stable than before on AWS. So
far, I have seen no timeouts in the gossip syn message exchange.
A pointer to the messaging_service object is stored in each request
continuation, so the object must not be destroyed while any of these
continuations is scheduled. Use a shared pointer instead.
Fixes #273.
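A minimal sketch of the idea, with do_send_message() as a made-up helper:
every continuation captures a shared pointer to messaging_service, so the
object cannot go away while a request is still in flight:

    #include <seastar/core/shared_ptr.hh>

    // sketch: the service is owned through a shared pointer
    seastar::future<> send_one(seastar::shared_ptr<messaging_service> ms, shard_id to) {
        return ms->do_send_message(to).then([ms] {
            // the captured "ms" keeps messaging_service alive until the reply is handled
        });
    }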
A pointer to the messaging_service object is stored in each request
continuation, so the object must not be destroyed while any of these
continuations is scheduled. Use a gate object to ensure that.
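Sketched with seastar::gate (names other than the gate itself are
illustrative): each request runs under the gate, and shutdown closes the
gate and waits before the object can be destroyed:

    #include <seastar/core/gate.hh>

    class messaging_service {
        seastar::gate _gate;
        seastar::future<> do_send(shard_id to);   // hypothetical helper
    public:
        seastar::future<> send(shard_id to) {
            // every outstanding request holds the gate open
            return seastar::with_gate(_gate, [this, to] {
                return do_send(to);
            });
        }
        seastar::future<> stop() {
            // resolves only after all in-flight continuations have finished
            return _gate.close();
        }
    };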
There can be multiple sends underway when the first one detects an error
and destroys the rpc client, even though the client is still in use by the
other sends. Fix this by making the rpc client pointer shared and holding
on to it for each send operation.
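A sketch of the fix (rpc_client stands in for the real client type and
do_send() is a made-up helper): every send holds its own shared reference,
so erasing the client from the map on error cannot destroy it under the
other sends:

    #include <memory>
    #include <unordered_map>

    std::unordered_map<shard_id, std::shared_ptr<rpc_client>, shard_id::hash> _clients;

    seastar::future<> send(shard_id to) {
        auto client = _clients[to];   // this send's own reference
        return do_send(*client).handle_exception([this, to, client] (std::exception_ptr ep) {
            _clients.erase(to);       // drop the map's reference on error...
            return seastar::make_exception_future<>(ep);   // ..."client" keeps the object alive
        });
    }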
Since we do not support shard-to-shard connections at the moment, the ip
address should fully decide whether a connection to a remote node exists
or not. messaging_service maintains connections to remote nodes using

    std::unordered_map<shard_id, shard_info, shard_id::hash> _clients;

With this patch, we can possibly reduce the number of tcp connections
between two nodes.
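Concretely, that amounts to something like the sketch below, where
shard_id's equality, ordering, and hash look only at the address (the
raw_addr() accessor and the exact field layout are assumed, not taken from
the patch):

    // sketch: key connections by ip only, so all shards of a peer share one entry
    struct shard_id {
        gms::inet_address addr;
        uint32_t cpu_id;
        bool operator==(const shard_id& o) const { return addr == o.addr; }  // cpu_id ignored
        bool operator<(const shard_id& o) const { return addr < o.addr; }    // cpu_id ignored
        struct hash {
            size_t operator()(const shard_id& id) const {
                return std::hash<uint32_t>()(id.addr.raw_addr());   // raw_addr() assumed
            }
        };
    };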
Instead of just destroying the client, stop() it first and wait for that
to complete. Otherwise any connect or request still in progress will
access a destroyed object (in practice it will fail earlier, when the
_stopped promise is destroyed without having been fulfilled).
Depends on rpc client stop patch.
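In outline (remove_client() is a hypothetical wrapper, and _clients holds
shared pointers as in the earlier patch):

    // sketch: stop the client and wait before letting the last reference go
    seastar::future<> remove_client(shard_id to) {
        auto client = _clients[to];
        _clients.erase(to);
        return client->stop().finally([client] {
            // "client" stays alive until stop() has completed, so in-flight
            // connects/requests never see a destroyed object
        });
    }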
This patch introduces an init.cc file which hosts all the initialization
code. The benefits: 1) we can share the initialization code with the test
code; 2) all the service startup dependency/ordering code is in one single
place instead of being scattered everywhere.
It is used when a stream_transfer_task has sent all the mutations inside
the messages::outgoing_file_message's it contains, to notify the remote
peer that this stream_transfer_task is done.
config.hh changes rapidly, so don't force lots of recompiles by including
it. We need to place seed_provider_type at namespace scope so that we can
forward-declare it for that purpose.
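A minimal sketch of the pattern (the consuming header and class are
hypothetical): with seed_provider_type at namespace scope it can be
forward-declared, so headers that only reference it no longer need to
include config.hh:

    // some_service.hh (illustrative) -- no #include "config.hh" here
    namespace db {
    class config;               // forward declaration is enough for references/pointers
    class seed_provider_type;   // possible now that it lives at namespace scope
    }

    class some_service {
        const db::seed_provider_type& _seeds;   // reference only, no full definition needed
    public:
        explicit some_service(const db::seed_provider_type& seeds) : _seeds(seeds) {}
    };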
The "v" parameter points to uninitialized object in RPC deserializer,
but db::serializer<>::read() variant with output parameter assumes it
is initialized.
Spotted by Gleb. Should fix#21.