Commit Graph

153 Commits

Author SHA1 Message Date
Asias He
74b281b92a gossip: Fix QUARANTINE_DELAY initialization
Dependencies between static variables don't work if they're in different
translation units.

I see in gossiper's constructor, QUARANTINE_DELAY is still 0.

Make it a function. It is nicer to make it inline, but I don't want to
pull storage_service.hh into gossiper.hh.
2015-07-27 11:29:13 +03:00
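
The initialization-order problem described in this commit can be sketched as follows. This is a minimal illustration, not the code from the patch: `ring_delay()` and `quarantine_delay()` are hypothetical names, the 30 second value is an assumed placeholder, and both functions are inline in one file for brevity (the commit keeps its function non-inline to avoid pulling storage_service.hh into gossiper.hh).

```cpp
#include <chrono>

// Hypothetical stand-in for the ring delay owned by the storage service.
// A function-local static is initialized on first use, so callers never
// observe it before it has been set.
inline std::chrono::milliseconds ring_delay() {
    static const std::chrono::milliseconds value{30000}; // assumed value
    return value;
}

// Computing the dependent value inside a function, instead of in a
// namespace-scope static, means it is evaluated at call time. A static
// defined as `ring delay * 2` in another translation unit could be
// initialized before the ring delay itself and end up as 0.
inline std::chrono::milliseconds quarantine_delay() {
    return ring_delay() * 2;
}
```

With this shape, a constructor that calls `quarantine_delay()` sees the doubled ring delay regardless of the order in which translation units are initialized.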
Asias He
1547fa05a5 failure_detector: Simplify get_initial_value and get_max_interval 2015-07-24 19:01:49 +08:00
Asias He
64f8c6e498 failure_detector: Switch to use std::chrono::steady_clock
Instead of naked integer based time point value.
2015-07-24 18:55:21 +08:00
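
As a rough sketch of what switching from a naked integer to `std::chrono::steady_clock` buys, the class below records arrival times as typed time points. The class and member names are hypothetical and are not taken from failure_detector.

```cpp
#include <chrono>

using clk = std::chrono::steady_clock;

// Hypothetical, simplified arrival tracker. Storing clk::time_point
// instead of a raw integer lets the compiler reject mixed-up units
// (for example, adding milliseconds to a nanosecond counter).
class arrival_tracker_sketch {
    clk::time_point _last{};
    clk::duration _last_interval{};
    bool _seen = false;
public:
    // Record a heartbeat arrival and remember the gap since the previous one.
    void add(clk::time_point now) {
        if (_seen) {
            _last_interval = now - _last;
        }
        _last = now;
        _seen = true;
    }

    clk::duration last_interval() const { return _last_interval; }
};
```

Intervals come out as `clk::duration`, so a caller can compare them directly against values such as `std::chrono::milliseconds(250)` without manual unit conversion.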
Asias He
bb2c30ed82 gossip: Fix QUARANTINE_DELAY
We have StorageService.RING_DELAY now, switch to use it.
2015-07-24 18:55:21 +08:00
Asias He
b01eeede1d gossip: Add more debug printouts
- for convict
- for send_gossip
2015-07-24 15:56:05 +08:00
Asias He
73bb690b40 failure_detector: Fix now unit in report 2015-07-24 15:56:05 +08:00
Asias He
9f1dc2877e failure_detector: Fix INITIAL_VALUE_NANOS 2015-07-24 15:56:05 +08:00
Asias He
1c2f5d5997 failure_detector: Add more log printout 2015-07-24 15:56:05 +08:00
Asias He
557f193737 gossip: Fix convict is_dead_state check
The logic in Origin is state.is_alive() && !is_dead_state(state).
2015-07-24 15:56:04 +08:00
Asias He
4e72b2a6b1 gossip: Fix mark_dead
We should change the state stored in endpoint_state_map, not local
variable.
2015-07-24 15:56:04 +08:00
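
The kind of bug this commit describes can be illustrated with a small sketch. The types and function names below are hypothetical simplifications; the real endpoint_state carries much more than one flag.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, reduced endpoint state: just an alive flag.
struct endpoint_state_sketch {
    bool alive = true;
};

using state_map_sketch = std::unordered_map<std::string, endpoint_state_sketch>;

// Buggy shape: the state is copied out of the map, so the update is
// applied to a local variable and lost when the function returns.
void mark_dead_on_copy(state_map_sketch& states, const std::string& ep) {
    auto it = states.find(ep);
    if (it == states.end()) {
        return;
    }
    endpoint_state_sketch local = it->second; // copy, not a reference
    local.alive = false;
    (void)local;
}

// Fixed shape: the entry stored in the map is modified in place.
void mark_dead_in_map(state_map_sketch& states, const std::string& ep) {
    auto it = states.find(ep);
    if (it == states.end()) {
        return;
    }
    it->second.alive = false;
}
```

Only the second function changes what later readers of the map observe.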
Asias He
c3b77f499b failure_detector: Enable logger 2015-07-24 15:56:04 +08:00
Asias He
1eef40e9ae gossip: Add error handling for sending message 2015-07-24 15:56:04 +08:00
Asias He
b856d03338 gossip: Use logger.debug in dump_endpoint_state_map 2015-07-24 15:56:04 +08:00
Pekka Enberg
7fe99de608 gms/gossiper: Enable more token_metadata::is_member() checks
Translate more is_member() checks like in commit 67f4b55 ("gms/gossiper:
Fix is_gossip_only_member() logic").

This hopefully cures #36.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-23 19:50:11 +02:00
Pekka Enberg
67f4b55b16 gms/gossiper: Fix is_gossip_only_member() logic
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Pekka Enberg
91e601ed11 gms/gossiper: Make is_gossip_only_member() public
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Pekka Enberg
bc4c04a2ab gms: Make endpoint_state::get_application_state() const
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Asias He
ff457b018a gossip: Enable logger in gossiper class 2015-07-21 22:56:24 +08:00
Asias He
b16eb27c58 gms/versioned_value: Fix rpcaddress 2015-07-21 17:00:15 +08:00
Asias He
ba04f0bc6b gms/versioned_value: Fix network_version 2015-07-21 17:00:15 +08:00
Asias He
fa2aee57ac utils: Move util/serialization.hh to utils/serialization.hh
Now we will not have the ugly utils and util directories, only utils.
2015-07-21 16:12:54 +08:00
Asias He
47503d0eaf messaging_service: Add wrapper for verbs used by gossip
Tested with tests/urchin/gossip.cc.
2015-07-16 17:19:51 +08:00
Glauber Costa
d43933e642 gms: add addr method to inet_addr
Because the cql types deal with a raw inet address and not the gms container, we need
a method to fetch it

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-09 11:27:45 -04:00
Glauber Costa
6b8d823c82 gms: allow the construction of the object from a net address
That is what is going to be stored in the data_type(), so provide the conversion

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-09 09:34:22 -04:00
Gleb Natapov
d8dcceea09 stop storage and messaging services during exit 2015-06-18 15:13:02 +03:00
Asias He
13f2292596 storage_service: Use fb_utilities::get_broadcast_address 2015-06-16 15:08:44 +08:00
Asias He
1d0b78d80f gossip: Fix capture by ref on a stack variable 2015-06-15 10:08:17 +03:00
Vlad Zolotarov
ab14716ce8 gossiper: "Start" gossiper on all CPUs and initialize its services only on CPU0
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
4703987faf gossiper: replicate the endpoint_state_map and _live_endpoints on all shards
For all replicated maps:
   - Keep the shadow copy on CPU0 and if at the end of a gossiper task execution
     it differs from the current contents of the map replicate it on all shards
     and update the shadow copy on CPU0.
   - Ensure that gossiper task is restarted 1 second AFTER the current iteration
     is over and not 1 second after it started.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Rename: _live_endpoints_shadow -> _shadow_live_endpoints
   - s/inly/only/
   - Clean up the things that don't belong to this patch.
   - Replicate _live_endpoints as well
   - gossiper: copy _shadow_endpoint_state_map
2015-06-09 15:33:38 +03:00
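
The shadow-copy scheme described in this commit can be sketched in plain C++. This is a single-threaded illustration only: the per-shard copies are modeled as a vector, whereas the real code replicates across shards asynchronously, and all names here are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using endpoint_map_sketch = std::unordered_map<std::string, int>;

// Hypothetical model: `live` is the map the gossiper task mutates on
// CPU0, `shadow` is the last state that was replicated, and
// `shard_copies` stands in for the per-shard replicas.
struct replicated_map_sketch {
    endpoint_map_sketch live;
    endpoint_map_sketch shadow;
    std::vector<endpoint_map_sketch> shard_copies;

    explicit replicated_map_sketch(std::size_t shards) : shard_copies(shards) {}

    // Called at the end of a gossiper task iteration. Replicates only
    // when the map differs from the shadow copy, which is why the map
    // type needs operator==. Returns true if a replication happened.
    bool replicate_if_changed() {
        if (live == shadow) {
            return false;
        }
        for (auto& copy : shard_copies) {
            copy = live;
        }
        shadow = live;
        return true;
    }
};
```

Comparing against the shadow copy avoids copying the map to every shard on iterations where nothing changed.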
Vlad Zolotarov
1e32bdf090 gms: added missing operator==() required for endpoint_state_map comparison.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:18:46 +03:00
Asias He
c95364fe31 failure_detector: Start on all cpus
Code calls failure_detector::is_alive on all cpus, so we start
failure_detector on all cpus. However, the internal data of failure_detector
is modified on cpu zero and it is not replicated to non-zero cpus.
This is fine since the user of failure_detector (the gossiper) accesses
it on cpu0 only.
2015-06-04 17:25:20 +08:00
Asias He
26cd039005 gossip: Add is_alive helper
failure_detector::is_alive asks gossiper if a node is up or down.
2015-06-04 17:16:58 +08:00
Asias He
abad1520ad gossip: Fix get_host_id
Return a real UUID.
2015-06-04 17:12:10 +08:00
Asias He
9c5cd2bca8 storage_service: Switch to use unordered_set for tokens
We do not care about the order of the tokens.

Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
2015-06-04 17:12:09 +08:00
Amnon Heiman
711fe64208 Expose the failure_detector functionality
The failure detector runs on CPU 0. For external users, this is an
irrelevant implementation detail.

This adds wrapper functions for the functions that are defined in
FailureDetectorMBean, which map the request to the correct CPU.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:13:03 +03:00
Asias He
1e289018ea gossip: Implement versioned_value for tokens
Tokens are stored in versioned_value as hex strings separated by semicolons, e.g.:

9f6fd6dd5149e39c;59068b2415190651;63684ccb1b73c1e3
2015-06-01 11:24:38 +08:00
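
The semicolon-separated format shown in this commit could be produced and parsed roughly as below. The function names are hypothetical, tokens are kept as already-formatted hex strings, and a vector is used instead of a set so the order is deterministic for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join hex token strings with ';' (no trailing separator).
std::string join_tokens_sketch(const std::vector<std::string>& tokens) {
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) {
            out += ';';
        }
        out += tokens[i];
    }
    return out;
}

// Split a ';'-separated value back into token strings, skipping empty pieces.
std::vector<std::string> split_tokens_sketch(const std::string& value) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(';', start);
        if (end == std::string::npos) {
            end = value.size();
        }
        if (end > start) {
            out.push_back(value.substr(start, end - start));
        }
        start = end + 1;
    }
    return out;
}
```

Splitting the example string from the commit message yields three tokens, and joining them again reproduces the original string.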
Asias He
ff099c44ce gossip: Convert more versioned_value 2015-05-27 15:03:29 +08:00
Asias He
9736dc6d9c storage_service: Convert should_bootstrap 2015-05-26 17:45:29 +08:00
Asias He
efd879297e gossip: Add debug_show helper
This starts a periodic timer to dump gossip state every second. It is useful for
debugging the internals of the gossiper.
2015-05-26 17:45:29 +08:00
Asias He
87a8d1f77e gossip: Convert more versioned_value factory functions 2015-05-26 16:16:52 +08:00
Amnon Heiman
588fb4fdcd Gossiper: Add global function
The current gossiper implementation runs the gossiper on CPU 0. This is
irrelevant to users of the gossiper who may want to query it.

This adds a globally available API for get_unreachable_members,
get_live_members, get_endpoint_downtime, get_current_generation_number,
unsafe_assassinate_endpoint and assassinate_endpoint. Each function returns a
future and runs on the correct CPU.

The intended user is the API layer, which would use these functions to expose
the gossiper.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-05-25 18:28:38 +03:00
Asias He
898233ddcf Remove redundant const in static constexpr const
From http://en.cppreference.com/w/cpp/language/constexpr:

  A constexpr specifier used in an object declaration implies const.
2015-05-25 13:09:23 +03:00
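
The cppreference statement quoted in this commit can be checked at compile time. The struct and member names below are hypothetical.

```cpp
#include <type_traits>

// Hypothetical constants declared both ways.
struct limits_sketch {
    static constexpr int with_constexpr_only = 1;
    static constexpr const int with_redundant_const = 1;
};

// Both declarations produce the same type: constexpr on an object
// declaration already implies const.
static_assert(std::is_same<decltype(limits_sketch::with_constexpr_only),
                           const int>::value,
              "constexpr object is const");
static_assert(std::is_same<decltype(limits_sketch::with_constexpr_only),
                           decltype(limits_sketch::with_redundant_const)>::value,
              "the extra const changes nothing");
```

Since both static assertions hold, dropping the extra `const` is a pure cleanup with no change in meaning.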
Shlomi Livne
0ad0a02d93 Change failure_detector registration of listeners to accept a ptr
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:18 +08:00
Shlomi Livne
89b9443127 Adding gossiper stop and internal handler::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:10 +08:00
Shlomi Livne
a73adc39f3 Rename gossiper stop to shutdown to allow creation of stop() needed for distributed<>::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:01 +08:00
Shlomi Livne
fbeafa67cb Add failure_detector stop() that will be called by distributed<>::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:00:46 +08:00
Asias He
979bb60d78 gms: Resolve localhost in inet_address
In db:config, "localhost" is used as the default IP address for
listen_address, rpc_address. We do not have a name resolver at the
moment.

Add a minimal resolver for localhost for now.
2015-05-14 10:37:26 +03:00
Asias He
827300ebe1 gossip: Fix sending ECHO message
The msg parameter is missing.

Fix a bug where Node B does not recognize Node A.

Node A
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.1

Node B
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.2

The issue is that in gossiper::mark_alive(), the parameter for the ECHO
message is wrong. After commit 1a8c4b75f5 (message: do not erase client's
rpc call type), we deduce the rpc handler from the parameters supplied
to messaging_service::send_message(), so the wrong handler is used for
the ECHO message, the message never gets a reply, and we never mark the
peer node alive.

empty_msg was used as a placeholder when messaging_service did not handle void
return types correctly. Since we support them now, drop it.
2015-05-14 10:37:25 +03:00
Gleb Natapov
8d9fb8a96c message: consolidate send_message() and send_message_oneway()
send_message() and send_message_oneway() are almost identical, so implement
the latter in terms of the former. The patch also fixes send_message() to
work properly with MsgIn = void.

Reviewed-by: Asias He <asias@cloudius-systems.com>
2015-05-13 13:41:24 +03:00
Asias He
bbb4b90542 gms: Use unordered_map for endpoint_state_map
We do not really care about order.
2015-05-11 11:27:06 +03:00