200 Commits

Author SHA1 Message Date
Asias He
8415218dba gossip: Save one seastar thread inside remove_endpoint
Now all the callers are inside a seastar thread, so kill the extra
thread inside remove_endpoint.
2015-08-19 14:46:44 +08:00
Asias He
b560a50ea7 gossip: Fix a name typo
evict_from_membershipg -> evict_from_membership
2015-08-19 14:22:54 +08:00
Asias He
54a42b4549 gossip: Fix an iterate-and-delete issue in do_status_check
evict_from_membership might delete an element inside endpoint_state_map
while iterating it.

Fixes #162
2015-08-19 14:21:11 +08:00
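
The iterate-and-delete hazard named in this commit can be illustrated with a minimal standard C++ sketch. It is not the gossiper's code: `endpoint_map`, `evict_expired`, and the integer "silence" value are hypothetical stand-ins, and the actual fix in do_status_check may take a different approach. The sketch collects the keys to evict first and erases them only after the loop has finished, so no live iterator is invalidated.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for endpoint_state_map:
// endpoint address -> seconds since the endpoint was last heard from.
using endpoint_map = std::map<std::string, int>;

// Collect the endpoints to evict while iterating, then erase them after
// the loop, so the map is never modified while it is being walked.
inline std::vector<std::string> evict_expired(endpoint_map& endpoints, int max_silence) {
    std::vector<std::string> expired;
    for (const auto& entry : endpoints) {
        if (entry.second > max_silence) {
            expired.push_back(entry.first);
        }
    }
    for (const auto& addr : expired) {
        endpoints.erase(addr);
    }
    return expired;
}
```

Deferring the erase costs one extra pass over the evicted keys, but it stays correct even if the eviction step removes more than the current element.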
Asias He
016dfdc8e1 gossip: Gossip error messages
We are printing out error messages when a remote connection is closed:

   ERROR   [shard 0] gossip - Fail to send GossipDigestACK2 to 127.0.0.2:0: rpc::closed_error (connection is closed)
   ERROR   [shard 0] gossip - Fail to handle GOSSIP_DIGEST_ACK: rpc::closed_error (connection is closed)
   WARN    [shard 0] unimplemented

This causes issues with DTEST, which validates after finishing a run
that there are no ERRORs in the log.

The rule is:
   If we can handle the error correctly -> log warn
   If we cannot handle the error correctly -> log error

Fixes #144
2015-08-19 14:21:05 +08:00
Asias He
009f9e7f21 gossip: Add timeout for send_gossip_digest_syn in do_shadow_round
Fixes #134
2015-08-19 14:20:52 +08:00
Asias He
0f2ea6d7c0 gossip: Remove one TODO for SHUTDOWN handling 2015-08-12 17:35:46 +08:00
Asias He
c72c96f8aa gossip: Remove an outdated TODO
It is fixed already.
2015-08-12 17:35:46 +08:00
Asias He
5831dcba28 gossip: Add error handling for GOSSIP_SHUTDOWN and GOSSIP_DIGEST_ACK verb 2015-08-12 17:35:46 +08:00
Asias He
0b475a5173 gossip: Dump endpoint_state_map in debug mode
This is very useful for debugging.
2015-08-10 09:48:32 +08:00
Asias He
5f7628da12 gossip: Run real_mark_alive under seastar::async context
on_dead is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
d15c8289a2 gossip: Run remove_endpoint inside seastar::async context
on_remove is now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
56615a8a29 gossip: Make real_mark_alive run inside seastar::async context
on_alive callbacks are now under seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
4eedd417b1 gossip: Run code inside seastar::async context for add_local_application_state
So that do_before_change_notifications and do_on_change_notifications
are under seastar::async.

Now, before_change callbacks are inside seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
825f6d141d gossip: Run code inside seastar::async context for apply_state_locally
It is easier to futurize apply_new_states and handle_major_state_change.

Now, on_change, on_join and on_restart callbacks are inside
seastar::async context.
2015-08-10 09:48:32 +08:00
Asias He
c0aae33991 gossip: Futurize apply_state_locally 2015-08-10 09:48:32 +08:00
Asias He
802b3fdf19 gossip: Add timeout to send_gossip
Otherwise, when a node tries to send to a just-killed node, it blocks
for a long time, and the gossip round is blocked with it.
2015-08-10 09:48:32 +08:00
Asias He
6ee2b138a4 gossip: Futurize handle_ack_msg 2015-08-10 09:48:32 +08:00
Asias He
c6509dad42 gossip: Make send_gossip and friends return future 2015-08-10 09:48:32 +08:00
Asias He
3b064c528e gossip: Make gossiper::run execute in a seastar::thread
Prepare to futurize gossiper.
2015-08-10 09:48:32 +08:00
Asias He
baec9e3449 gossip: Fix is_enabled
It is not correct to use _scheduled_gossip_task.armed() to tell whether
gossip is enabled, since the timer sets _armed = false before calling
the timer callback.

It was working correctly because we did not actually check the
is_enabled() flag inside the timer callback but inside
send_gossip_digest_syn()'s continuation, and by then the timer is armed
again.

Use a standalone flag to do so.
2015-08-10 09:48:32 +08:00
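
The reasoning in this commit can be sketched with a toy model. Everything below is hypothetical: `toy_timer` is not seastar's timer, and `toy_gossiper` only mimics the behavior described above, namely a timer that clears its armed flag before running the callback. It shows why armed() is a poor proxy for "enabled" from inside the callback, and how a standalone flag avoids the problem.

```cpp
#include <functional>
#include <utility>

// Toy one-shot timer (hypothetical, not seastar's timer). As described
// above, it clears its armed flag before invoking the callback.
class toy_timer {
    std::function<void()> _callback;
    bool _armed = false;
public:
    void set_callback(std::function<void()> cb) { _callback = std::move(cb); }
    void arm() { _armed = true; }
    bool armed() const { return _armed; }
    // Simulates the timer expiring.
    void fire() {
        if (!_armed) {
            return;
        }
        _armed = false;  // cleared first, so the callback sees "not armed"
        _callback();
    }
};

// Toy gossiper that tracks "enabled" with its own flag instead of
// asking the timer whether it is armed.
class toy_gossiper {
    toy_timer _timer;
    bool _enabled = false;
public:
    // Observations recorded by the callback, for illustration only.
    bool seen_armed_in_callback = true;
    bool seen_enabled_in_callback = false;

    toy_gossiper() {
        _timer.set_callback([this] {
            seen_armed_in_callback = _timer.armed();
            seen_enabled_in_callback = is_enabled();
            if (is_enabled()) {
                _timer.arm();  // schedule the next round
            }
        });
    }
    void start() { _enabled = true; _timer.arm(); }
    void stop() { _enabled = false; }
    bool is_enabled() const { return _enabled; }
    void tick() { _timer.fire(); }
};
```

After `start()` and one `tick()`, the callback has observed the timer as not armed while the standalone flag still reports the gossiper as enabled.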
Asias He
1da37796bc gossip: Fix do_shadow_round
We sleep storage_service_ring_delay until we abort due to failing to
talk to a seed node. We should retry sending the GossipDigestSyn
message, instead of sending it once.

With this, we can start the seed node and a normal node in a script like
the one below, without any sleep in between.

./scylla --listen-address 127.0.0.1
./scylla --listen-address 127.0.0.2

This is useful for testing.
2015-08-03 10:05:42 +03:00
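
The retry idea in this commit can be sketched as a small helper. This is an illustration under stated assumptions, not the code in do_shadow_round: `retry_send`, its callbacks, and the attempt limit are hypothetical, and the commit does not say how often or how many times the real code retries.

```cpp
#include <functional>

// Hypothetical helper: call try_send up to max_attempts times, calling
// pause between failed attempts. Returns the 1-based number of the
// attempt that succeeded, or 0 if every attempt failed.
inline int retry_send(const std::function<bool()>& try_send,
                      const std::function<void()>& pause,
                      int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_send()) {
            return attempt;
        }
        if (attempt < max_attempts) {
            pause();  // e.g. sleep before the next attempt
        }
    }
    return 0;
}
```

Passing the pause step as a callback keeps the sketch independent of any particular sleep mechanism, so a caller can plug in a real delay or a no-op.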
Asias He
eb79a119bf gossip: Move code from gms/gossip_digest_syn.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
175ccfe49d gossip: Move code from gms/gossip_digest_ack2.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
3ad6d1309f gossip: Move code from gms/gossip_digest_ack.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
6398bb4bdc gossip: Move code from gms/endpoint_state.hh to source file 2015-07-31 10:43:40 +08:00
Asias He
b1a8353e61 gossip: Drop gms/token_serializer.hh
It is unused.
2015-07-31 10:43:40 +08:00
Asias He
a95213e81e gossip: Kill gms/gms.cc
All headers of gms/* are included. No need to include them all in gms.cc now.
2015-07-31 10:43:40 +08:00
Asias He
e074b1b7f8 gossip: Move operator<< of gossip_digest_ack2 to gossip_digest_ack2.cc 2015-07-31 10:43:39 +08:00
Asias He
ca5eea7fad gossip: Move operator<< of gossip_digest_ack to gossip_digest_ack.cc 2015-07-31 10:43:39 +08:00
Asias He
76efae87b5 gossip: Move operator<< of gossip_digest_syn to gossip_digest_syn.cc 2015-07-31 10:43:39 +08:00
Asias He
d850e4ef31 gossip: Move _the_gossiper to gossiper.cc 2015-07-31 10:43:39 +08:00
Asias He
4390b448a2 gossip: Move _the_failure_detector to failure_detector.cc
We will kill gms/gms.cc soon.
2015-07-31 10:43:39 +08:00
Asias He
d4dce4aa43 gossip: Fix is_enabled 2015-07-31 10:43:39 +08:00
Asias He
5d08bd030c gossip: Kill one more unimplemented in shutdown 2015-07-31 10:43:39 +08:00
Asias He
efbdf428fc gossip: Remove commented code of resetVersion and destroyConnectionPool
Our messaging service is completely different. No need to reset version
or destroy connection pool.
2015-07-31 10:43:39 +08:00
Asias He
43c7ff217d gossip: Kill two warn unimplemented
It is implemented now.
2015-07-31 10:43:39 +08:00
Asias He
f326704240 gossip: Drop unused code 2015-07-29 15:45:46 +08:00
Asias He
782b78a4c7 gossip: Introduce handle_ack_msg helper
It is shared by shadow-round and non-shadow-round.
2015-07-29 15:16:01 +08:00
Asias He
2ab3033d5d gossip: Fix do_shadow_round
Add the sleep logic.
2015-07-29 15:16:01 +08:00
Asias He
d38deef499 gossip: Kill a FIXME for knows_version 2015-07-29 11:25:38 +08:00
Asias He
471a964d8e gossip: Kill a FIXME for clone_with_higher_version 2015-07-29 11:15:52 +08:00
Asias He
d799d1aa8f gossip: Implement the missing sleep logic 2015-07-29 11:13:35 +08:00
Asias He
1a6f7cf2aa gossip: Fix assassinate
- Use seastar::async to simplify the sleep logic.
- Futurize assassinate so that future is ready when assassination finishes.
2015-07-29 10:57:23 +08:00
Asias He
acd6e6268f gossip: Fix storage_service_value_factory
We have value_factory in storage_service now. Use it.
2015-07-29 10:06:43 +08:00
Asias He
33d3fcf7db gossip: Add set_last_processed_message_at() helper
Set time_point to now by default.
2015-07-29 09:54:29 +08:00
Asias He
65232edfe8 gossip: Move swagger API to source file 2015-07-29 09:49:28 +08:00
Asias He
0ce3d89a85 gossip: Switch to use chrono for time operation
This is a long-awaited cleanup. Gossiper code runs every second and is
not performance sensitive, so it does not make much sense to stick to
the lowres db_clock. Use high_resolution_clock instead.
2015-07-28 14:59:44 +03:00
Asias He
74b281b92a gossip: Fix QUARANTINE_DELAY initialization
Dependencies between static variables don't work if they're in different
translation units.

In gossiper's constructor, I see that QUARANTINE_DELAY is still 0.

Make it a function. It would be nicer to make it inline, but I don't
want to pull storage_service.hh into gossiper.hh.
2015-07-27 11:29:13 +03:00
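
The initialization-order problem described here, and the "make it a function" remedy, can be sketched as follows. The names and numbers are assumptions chosen for illustration: `ring_delay`, the 30000 ms value, and the factor of 2 are not taken from the source. The point is only that a value computed inside a function is evaluated when the function is called, not at some unspecified moment during static initialization.

```cpp
#include <chrono>

// Hypothetical stand-in for a delay owned by another component.
// The 30000 ms value is an assumption for illustration.
inline std::chrono::milliseconds ring_delay() {
    static const std::chrono::milliseconds delay{30000};
    return delay;
}

// Computed on each call, so it never observes ring_delay's value
// before that value has been initialized. The factor of 2 is also
// an assumption.
inline std::chrono::milliseconds quarantine_delay() {
    return ring_delay() * 2;
}
```

A namespace-scope `static const` initialized from a variable in another translation unit could instead read that variable before its initializer runs, which matches the zero value observed in the constructor.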
Asias He
1547fa05a5 failure_detector: Simplify get_initial_value and get_max_interval 2015-07-24 19:01:49 +08:00
Asias He
64f8c6e498 failure_detector: Switch to use std::chrono::steady_clock
Instead of a naked integer-based time point value.
2015-07-24 18:55:21 +08:00
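
A minimal sketch of what typed time points buy over naked integers. `arrival_tracker` is a hypothetical type for illustration, not the failure detector's actual data structure. With std::chrono::steady_clock, the unit travels with the value and intervals are obtained by subtracting time points.

```cpp
#include <chrono>

using clk = std::chrono::steady_clock;

// Hypothetical record of when an endpoint was last heard from.
struct arrival_tracker {
    clk::time_point last;
    bool has_last = false;

    // Records a new arrival and returns the interval since the previous
    // one, or zero for the first arrival.
    clk::duration report(clk::time_point now) {
        clk::duration interval = has_last ? now - last : clk::duration::zero();
        last = now;
        has_last = true;
        return interval;
    }
};
```

Because the result is a `clk::duration`, comparing it against `std::chrono::seconds` or `std::chrono::milliseconds` converts units at compile time instead of relying on a convention about what an integer means.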