215 Commits

Author SHA1 Message Date
Asias He
3a36ec33db gossip: Wait longer for seed node during boot up
When starting a cluster on AWS, the seed node might become ready after
the non-seed nodes are ready to contact it. Wait longer for the seed node
to make the boot-up process more robust.
2015-09-28 11:11:11 +08:00
Asias He
817c138034 gossip: Add get_current_heart_beat_version interface
HTTP API will use it.
2015-09-28 09:38:22 +08:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Asias He
c44afca3d8 gossip: Make is_dead_state take const reference 2015-09-11 15:43:27 +08:00
Asias He
1f0542931e gossip: Fix handle_major_state_change
Modify the state in the map of endpoint_state_map instead of the local
variable.
2015-09-11 15:43:27 +08:00
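The difference between modifying a local variable and modifying the entry in the map can be sketched as follows. This is a minimal illustration in standard C++, not the actual gossiper code: endpoint_state here is a hypothetical stand-in with a single field, and both function names are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the real endpoint state type.
struct endpoint_state {
    bool alive = false;
};

using state_map = std::map<std::string, endpoint_state>;

// Buggy pattern: the state is copied out of the map, so the change
// is applied to the local copy and lost when the function returns.
inline void mark_alive_on_copy(state_map& m, const std::string& ep) {
    endpoint_state local = m[ep];
    local.alive = true;
}

// Fixed pattern: bind a reference to the map entry and modify it in place.
inline void mark_alive_in_map(state_map& m, const std::string& ep) {
    endpoint_state& st = m[ep];
    st.alive = true;
}
```

Calling mark_alive_on_copy leaves the map entry unchanged, while mark_alive_in_map updates it.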
Asias He
a31d3aa7ee gossip: Pass reference in mark_alive and real_mark_alive
We need to modify the state.
2015-09-11 15:43:27 +08:00
Asias He
89f2959536 gossip: Rework stop() and shutdown()
Consolidate stop() and shutdown() into one function.

Fix crash:

scylla: urchin/seastar/core/future.hh:315: void
future_state<>::set(): Assertion `_u.st == state::future' failed.

=== stop gossip
$ curl -X DELETE --header "Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"

=== start gossip
$ curl -X POST --header "Content-Type: application/json" --header
"Accept: application/json"
"http://127.0.0.1:10000/storage_service/gossiping"
2015-09-08 12:20:53 +08:00
Asias He
247e9109d9 gossip: Introduce uninit_messaging_service_handler
It is useful in gossip shutdown process.
2015-09-08 12:19:06 +08:00
Asias He
7cc768a864 gossip: Fix wrong cluster name and partitioner name
Right now, gossip returns hard-coded cluster and partitioner names.

  sstring get_cluster_name() {
      // FIXME: DatabaseDescriptor.getClusterName()
      return "my_cluster_name";
  }
  sstring get_partitioner_name() {
      // FIXME: DatabaseDescriptor.getPartitionerName()
      return "my_partitioner_name";
  }

Fix it by setting the correct names from the configuration options.

With this

   cqlsh 127.0.0.$i -e "SELECT * from system.local;"

returns correct cluster_name.

Fixes #291
2015-09-07 09:21:18 +03:00
Asias He
8cff2318dc gossip: Add timeout support for send_echo 2015-09-06 16:35:11 +08:00
Asias He
2a06214306 gossip: Switch to use rpc timeout for send_gossip_digest_syn
Timeout support was added to gossip message by using semaphore's
timeout support, now that rpc has timeout support, switch to it.
2015-09-06 16:34:41 +08:00
Asias He
16522bc2da failure_detector: Protect failure_detector from being destroyed while in use
failure_detector::{interpret, force_conviction} will call into the convict
callback, which might start an async operation. Protect it by ref count.

Fixes #269
2015-09-06 11:25:23 +08:00
Asias He
5bec8cba82 gossip: Kill one async::thread for mark_dead
We have this call chain,

  gossiper::run -> do_status_check -> interpret -> convict -> mark_dead

Since gossiper::run is executed inside a seastar thread, we can be sure
that all functions above run inside a seastar thread.
2015-09-06 11:04:41 +08:00
Asias He
2ba8497399 gossip: Protect gossiper from being destroyed while in use
There are three places where async operations can be scheduled:

- gossiper timer handler
- API called by user
- messaging service handler

Use the reference tracking infrastructure to protect the gossiper.

Fixes #268
2015-09-06 09:56:38 +08:00
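The general idea of keeping an object alive while an async operation still refers to it can be sketched in standard C++. This is an analogy only: the commit above relies on the project's own reference tracking infrastructure, whereas this sketch swaps in std::shared_ptr, and the service type and schedule_round function are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical service whose deferred tasks must not outlive it.
struct service : std::enable_shared_from_this<service> {
    int rounds = 0;

    // Returns a deferred task. The task captures a shared_ptr to this
    // object, so the object stays alive until the task is destroyed.
    std::function<void()> schedule_round() {
        auto self = shared_from_this();
        return [self] { ++self->rounds; };
    }
};
```

Even if the original owner drops its pointer, the pending task holds a reference, so the object is destroyed only after the task has been released.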
Asias He
ad4008d50e gms: Fix release_version
With this patch, the release_version column in system.peers is now correct.
2015-08-27 11:01:08 +03:00
Asias He
8415218dba gossip: Save one seastar thread inside remove_endpoint
Now all the callers are inside a seastar thread. Kill one thread inside
remove_endpoint.
2015-08-19 14:46:44 +08:00
Asias He
b560a50ea7 gossip: Fix a name typo
evict_from_membershipg -> evict_from_membership
2015-08-19 14:22:54 +08:00
Asias He
54a42b4549 gossip: Fix an iterate-and-delete issue in do_status_check
evict_from_membership might delete an element inside endpoint_state_map
while iterating over it.

Fixes #162
2015-08-19 14:21:11 +08:00
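One common way to avoid an iterate-and-delete problem on a standard map is to advance with the iterator returned by erase. The sketch below shows that pattern only; it is not necessarily how the commit above fixed do_status_check, and evict_dead and the bool-valued map are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: removes every endpoint whose value is false
// ("dead") and returns how many entries were removed.
inline std::size_t evict_dead(std::map<std::string, bool>& alive_by_endpoint) {
    std::size_t removed = 0;
    for (auto it = alive_by_endpoint.begin(); it != alive_by_endpoint.end(); ) {
        if (!it->second) {
            // erase() returns the iterator following the removed element,
            // so the loop never touches an invalidated iterator.
            it = alive_by_endpoint.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

An alternative is to collect the keys to delete during the loop and erase them afterwards.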
Asias He
016dfdc8e1 gossip: Gossip error messages
We are printing out error messages when a remote connection is closed:

   ERROR   [shard 0] gossip - Fail to send GossipDigestACK2 to 127.0.0.2:0: rpc::closed_error (connection is closed)
   ERROR   [shard 0] gossip - Fail to handle GOSSIP_DIGEST_ACK: rpc::closed_error (connection is closed)
   WARN    [shard 0] unimplemented

This is causing issues with DTEST, which validates after finishing a run
that there are no ERRORs in the log.

The rule is:
   If we can handle the error correctly when it occurs -> log warn
   If we cannot handle the error correctly when it occurs -> log error

Fixes #144
2015-08-19 14:21:05 +08:00
Asias He
009f9e7f21 gossip: Add timeout for send_gossip_digest_syn in do_shadow_round
Fixes #134
2015-08-19 14:20:52 +08:00
Asias He
0f2ea6d7c0 gossip: Remove one TODO for SHUTDOWN handling 2015-08-12 17:35:46 +08:00
Asias He
c72c96f8aa gossip: Remove an outdated TODO
It is fixed already.
2015-08-12 17:35:46 +08:00
Asias He
5831dcba28 gossip: Add error handling for GOSSIP_SHUTDOWN and GOSSIP_DIGEST_ACK verb 2015-08-12 17:35:46 +08:00
Asias He
0b475a5173 gossip: Dump endpoint_state_map in debug mode
This is very useful for debugging.
2015-08-10 09:48:32 +08:00
Asias He
5f7628da12 gossip: Run real_mark_alive under seastar::async context
on_dead is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
d15c8289a2 gossip: Run remove_endpoint inside seastar::async context
on_remove is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
56615a8a29 gossip: Make real_mark_alive run inside seastar::async context
on_alive callbacks are now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
4eedd417b1 gossip: Run code inside seastar::async context for add_local_application_state
So that do_before_change_notifications and do_on_change_notifications
are under seastar::async.

Now, before_change callbacks are inside seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
825f6d141d gossip: Run code inside seastar::async context for apply_state_locally
It is easier to futurize apply_new_states and handle_major_state_change.

Now, on_change, on_join and on_restart callbacks are inside
seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
c0aae33991 gossip: Futurize apply_state_locally 2015-08-10 09:48:32 +08:00
Asias He
802b3fdf19 gossip: Add timeout to send_gossip
Otherwise, when a node tries to send to a node that was just killed, it
will block for a long time, and the gossip round will be blocked.
2015-08-10 09:48:32 +08:00
Asias He
6ee2b138a4 gossip: Futurize handle_ack_msg 2015-08-10 09:48:32 +08:00
Asias He
c6509dad42 gossip: Make send_gossip and friends return future 2015-08-10 09:48:32 +08:00
Asias He
3b064c528e gossip: Make gossiper::run execute in a seastar::thread
Prepare to futurize gossiper.
2015-08-10 09:48:32 +08:00
Asias He
baec9e3449 gossip: Fix is_enabled
It is not correct to use _scheduled_gossip_task.armed() to tell whether
gossip is enabled, since the timer sets _armed = false before calling
the timer callback.

It was working correctly because we did not actually check is_enabled()
flag inside the timer callback but inside the send_gossip_digest_syn()'s
continuation and at that time the timer is armed again.

Use a standalone flag to do so.
2015-08-10 09:48:32 +08:00
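The problem described in the commit above can be reproduced with a toy model. The sketch below assumes a hypothetical one-shot timer that clears its armed flag before invoking the callback, as the commit message describes. toy_timer and toy_gossiper are illustrative names, not the real classes.

```cpp
#include <cassert>
#include <functional>

// Hypothetical one-shot timer: clears its armed flag before the callback runs.
struct toy_timer {
    bool armed_ = false;
    std::function<void()> callback;

    void arm() { armed_ = true; }
    bool armed() const { return armed_; }
    void fire() {
        armed_ = false;
        if (callback) {
            callback();
        }
    }
};

// Hypothetical gossiper that records what each check reports inside the callback.
struct toy_gossiper {
    toy_timer timer;
    bool enabled = false;        // standalone flag
    bool seen_by_armed = false;  // what timer.armed() reported in the callback
    bool seen_by_flag = false;   // what the standalone flag reported

    void start() {
        enabled = true;
        timer.callback = [this] {
            seen_by_armed = timer.armed();  // false: timer is not armed here
            seen_by_flag = enabled;         // true: flag is independent of the timer
            timer.arm();                    // re-arm for the next round
        };
        timer.arm();
    }
};
```

Inside the callback, the armed() check reports that gossip is disabled even though it is running, while the standalone flag reports the correct state.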
Asias He
1da37796bc gossip: Fix do_shadow_round
We sleep storage_service_ring_delay until we abort due to failing to
talk to a seed node. We should retry sending GossipDigestSyn message,
instead of sending it once.

With this, we can start the seed node and a normal node in a script like
the one below, without any sleep in between.

./scylla --listen-address 127.0.0.1
./scylla --listen-address 127.0.0.2

This is useful for testing.
2015-08-03 10:05:42 +03:00
Asias He
eb79a119bf gossip: Move code from gms/gossip_digest_syn.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
175ccfe49d gossip: Move code from gms/gossip_digest_ack2.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
3ad6d1309f gossip: Move code from gms/gossip_digest_ack.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
6398bb4bdc gossip: Move code from gms/endpoint_state.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
b1a8353e61 gossip: Drop gms/token_serializer.hh
It is unused.
2015-07-31 10:43:40 +08:00
Asias He
a95213e81e gossip: Kill gms/gms.cc
All headers of gms/* are included. No need to include them all in gms.cc now.
2015-07-31 10:43:40 +08:00
Asias He
e074b1b7f8 gossip: Move operator<< of gossip_digest_ack2 to gossip_digest_ack2.cc 2015-07-31 10:43:39 +08:00
Asias He
ca5eea7fad gossip: Move operator<< of gossip_digest_ack to gossip_digest_ack.cc 2015-07-31 10:43:39 +08:00
Asias He
76efae87b5 gossip: Move operator<< of gossip_digest_syn to gossip_digest_syn.cc 2015-07-31 10:43:39 +08:00
Asias He
d850e4ef31 gossip: Move _the_gossiper to gossiper.cc 2015-07-31 10:43:39 +08:00
Asias He
4390b448a2 gossip: Move _the_failure_detector to failure_detector.cc
We will kill gms/gms.cc soon.
2015-07-31 10:43:39 +08:00
Asias He
d4dce4aa43 gossip: Fix is_enabled 2015-07-31 10:43:39 +08:00
Asias He
5d08bd030c gossip: Kill one more unimplemented in shutdown 2015-07-31 10:43:39 +08:00
Asias He
efbdf428fc gossip: Remove commented code of resetVersion and destroyConnectionPool
Our messaging service is completely different. No need to reset version
or destroy connection pool.
2015-07-31 10:43:39 +08:00