132 Commits

Author SHA1 Message Date
Asias He
47503d0eaf messaging_service: Add wrapper for verbs used by gossip
Tested with tests/urchin/gossip.cc.
2015-07-16 17:19:51 +08:00
Glauber Costa
d43933e642 gms: add addr method to inet_addr
Because the cql types deal with a raw inet address and not the gms container, we need
a method to fetch it

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-09 11:27:45 -04:00
Glauber Costa
6b8d823c82 gms: allow the construction of the object from a net address
That is what is going to be stored in the data_type(), so provide the conversion

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-09 09:34:22 -04:00
Gleb Natapov
d8dcceea09 stop storage and messaging services during exit 2015-06-18 15:13:02 +03:00
Asias He
13f2292596 storage_service: Use fb_utilities::get_broadcast_address 2015-06-16 15:08:44 +08:00
Asias He
1d0b78d80f gossip: Fix capture by ref on a stack variable 2015-06-15 10:08:17 +03:00
Vlad Zolotarov
ab14716ce8 gossiper: "Start" gossiper on all CPUs and initialize its services only on CPU0
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
4703987faf gossiper: replicate the endpoint_state_map and _live_endpoints on all shards
For all replicated maps:
   - Keep the shadow copy on CPU0 and if at the end of a gossiper task execution
     it differs from the current contents of the map replicate it on all shards
     and update the shadow copy on CPU0.
   - Ensure that gossiper task is restarted 1 second AFTER the current iteration
     is over and not 1 second after it started.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Rename: _live_endpoints_shadow -> _shadow_live_endpoints
   - s/inly/only/
   - Clean up the things that don't belong to this patch.
   - Replicate _live_endpoints as well
   - gossiper: copy _shadow_endpoint_state_map
2015-06-09 15:33:38 +03:00
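The shadow-copy scheme described in commit 4703987faf above could be sketched roughly as follows. This is only an illustration under stated assumptions: `replicated_map_sketch`, `state_map_sketch`, and the use of a plain vector to stand in for per-shard copies are hypothetical and are not taken from the actual patch, which replicates across real shards.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the gossiper's endpoint state map.
using state_map_sketch = std::map<std::string, int>;

struct replicated_map_sketch {
    state_map_sketch live;                 // working copy, modified on "CPU0"
    state_map_sketch shadow;               // last replicated snapshot, kept on "CPU0"
    std::vector<state_map_sketch> shards;  // per-shard copies, modelled as a vector

    explicit replicated_map_sketch(std::size_t shard_count) : shards(shard_count) {}

    // Meant to be called at the end of one task iteration.
    // Returns true if the map had changed and was copied to every shard.
    bool replicate_if_changed() {
        if (live == shadow) {
            return false;  // nothing changed, skip the copy
        }
        for (auto& shard_copy : shards) {
            shard_copy = live;
        }
        shadow = live;  // remember what was replicated
        return true;
    }
};
```

Comparing against the shadow copy first means an iteration that changes nothing does no cross-shard copying.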
Vlad Zolotarov
1e32bdf090 gms: added missing operator==() required for endpoint_state_map comparison.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:18:46 +03:00
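To illustrate why commit 1e32bdf090 above needs an `operator==()`: comparing two standard maps with `==` only compiles if the mapped type is itself equality-comparable. The sketch below uses hypothetical names (`state_eq_sketch`, `state_eq_map`) and fields that are assumptions, not the real endpoint_state layout.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical value type; the fields are assumptions for illustration.
struct state_eq_sketch {
    int generation;
    int version;

    // Without this operator, comparing two maps of this type does not compile.
    bool operator==(const state_eq_sketch& other) const {
        return generation == other.generation && version == other.version;
    }
};

using state_eq_map = std::unordered_map<std::string, state_eq_sketch>;

// Map comparison delegates to the element comparison defined above.
bool maps_equal(const state_eq_map& a, const state_eq_map& b) {
    return a == b;
}
```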
Asias He
c95364fe31 failure_detector: Start on all cpus
Code calls failure_detector::is_alive on all cpus, so we start
failure_detector on all cpus. However, the internal data of failure_detector
is modified on cpu zero and it is not replicated to non-zero cpus.
This is fine since the user of failure_detector (the gossiper) accesses
it on cpu0 only.
2015-06-04 17:25:20 +08:00
Asias He
26cd039005 gossip: Add is_alive helper
failure_detector::is_alive asks gossiper if a node is up or down.
2015-06-04 17:16:58 +08:00
Asias He
abad1520ad gossip: Fix get_host_id
Return a real UUID.
2015-06-04 17:12:10 +08:00
Asias He
9c5cd2bca8 storage_service: Switch to use unordered_set for tokens
We do not care about the order of the tokens.

Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
2015-06-04 17:12:09 +08:00
Amnon Heiman
711fe64208 Expose the failure_detector functionality
The failure detector runs on CPU 0. For external users, this is an
implementation detail which is irrelevant.

This adds wrapper functions for the functions that are defined in
FailureDetectorMBean, which map each request to the correct CPU.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-03 19:13:03 +03:00
Asias He
1e289018ea gossip: Implement versioned_value for tokens
Tokens are stored in versioned_value as hex strings separated by semicolons, e.g.:

9f6fd6dd5149e39c;59068b2415190651;63684ccb1b73c1e3
2015-06-01 11:24:38 +08:00
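The semicolon-separated token encoding from commit 1e289018ea above can be sketched as a pair of helpers. The names `join_tokens` and `split_tokens` are hypothetical, and tokens are kept as ready-made hex strings here; how the real code converts tokens to and from hex is not shown in the commit message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join hex token strings with ';' (hypothetical helper).
std::string join_tokens(const std::vector<std::string>& tokens) {
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) {
            out += ';';
        }
        out += tokens[i];
    }
    return out;
}

// Split a ';'-separated string back into tokens, skipping empty fields.
std::vector<std::string> split_tokens(const std::string& value) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(';', start);
        if (end == std::string::npos) {
            end = value.size();
        }
        if (end > start) {
            tokens.push_back(value.substr(start, end - start));
        }
        start = end + 1;
    }
    return tokens;
}
```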
Asias He
ff099c44ce gossip: Convert more versioned_value 2015-05-27 15:03:29 +08:00
Asias He
9736dc6d9c storage_service: Convert should_bootstrap 2015-05-26 17:45:29 +08:00
Asias He
efd879297e gossip: Add debug_show helper
This starts a periodic timer to dump gossip state every second. It is useful for
debugging the internals of the gossiper.
2015-05-26 17:45:29 +08:00
Asias He
87a8d1f77e gossip: Convert more versioned_value factory functions 2015-05-26 16:16:52 +08:00
Amnon Heiman
588fb4fdcd Gossiper: Add global function
The current gossiper implementation runs the gossiper on CPU 0. This is
irrelevant to users of the gossiper, who may want to query it.

This adds a globally available API for get_unreachable_members,
get_live_members, get_endpoint_downtime, get_current_generation_number,
unsafe_assassinate_endpoint and assassinate_endpoint. Each function returns a
future and executes on the correct CPU.

The target user is the API that would use these functions to expose the
gossiper.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-05-25 18:28:38 +03:00
Asias He
898233ddcf Remove redundant const in static constexpr const
From http://en.cppreference.com/w/cpp/language/constexpr:

  A constexpr specifier used in an object declaration implies const.
2015-05-25 13:09:23 +03:00
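The rule quoted in commit 898233ddcf above can be checked at compile time. In this minimal sketch, `timing_sketch` and its members are hypothetical names chosen for illustration.

```cpp
#include <type_traits>

struct timing_sketch {
    static constexpr int interval_ms = 1000;                // const is implied
    static constexpr const int interval_ms_verbose = 1000;  // const is redundant
};

// A constexpr object already has a const-qualified type.
static_assert(std::is_same<decltype(timing_sketch::interval_ms), const int>::value,
              "constexpr object declaration implies const");

// Both spellings declare exactly the same type.
constexpr bool constexpr_implies_const() {
    return std::is_same<decltype(timing_sketch::interval_ms),
                        decltype(timing_sketch::interval_ms_verbose)>::value;
}
```

Note that this applies to the object itself. In a declaration such as `static constexpr const char* name`, the `const` qualifies the pointed-to characters rather than the pointer, so it is not redundant there.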
Shlomi Livne
0ad0a02d93 Change failure_detector registration of listeners to accept a ptr
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:18 +08:00
Shlomi Livne
89b9443127 Adding gossiper stop and internal handler::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:10 +08:00
Shlomi Livne
a73adc39f3 Rename gossiper stop to shutdown to allow creation of stop() needed for distributed<>::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:01:01 +08:00
Shlomi Livne
fbeafa67cb Add failure_detector stop() that will be called by distributed<>::stop
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-05-14 17:00:46 +08:00
Asias He
979bb60d78 gms: Resolve localhost in inet_address
In db:config, "localhost" is used as the default IP address for
listen_address, rpc_address. We do not have a name resolver at the
moment.

Add a minimal resolver for localhost for now.
2015-05-14 10:37:26 +03:00
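A minimal resolver of the kind commit 979bb60d78 above describes might look like the sketch below. The function name `resolve_minimal_sketch` and the choice to pass any other input through unchanged are assumptions; the actual patch may behave differently.

```cpp
#include <string>

// Hypothetical helper: map "localhost" to the loopback address and leave
// everything else (assumed to already be an IP address string) untouched.
std::string resolve_minimal_sketch(const std::string& host) {
    if (host == "localhost") {
        return "127.0.0.1";
    }
    return host;
}
```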
Asias He
827300ebe1 gossip: Fix sending ECHO message
The msg parameter is missing.

Fix a bug where Node B does not recognize Node A.

Node A
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.1

Node B
$ ./gossip --seed 127.0.0.1  --listen-address 127.0.0.2

The issue is that in gossiper::mark_alive(), the parameter for the ECHO
message is wrong. After commit 1a8c4b75f5 (message: do not erase client's
rpc call type), we deduce the rpc handler from the parameters supplied
to messaging_service::send_message(), so we use the wrong handler for
the ECHO message, the message never gets a reply, and we never mark the
peer node alive.

empty_msg was used as a placeholder when messaging_service did not handle the void
return type correctly. Since we support it now, drop it.
2015-05-14 10:37:25 +03:00
Gleb Natapov
8d9fb8a96c message: consolidate send_message() and send_message_oneway()
send_message() and send_message_oneway() are almost identical, so implement
the latter in terms of the former. The patch also fixes send_message() to
work properly with MsgIn = void.

Reviewed-by: Asias He <asias@cloudius-systems.com>
2015-05-13 13:41:24 +03:00
Asias He
bbb4b90542 gms: Use unordered_map for endpoint_state_map
We do not really care about order.
2015-05-11 11:27:06 +03:00
Asias He
f12e955b4e message: Drop register_handler_oneway 2015-05-07 13:16:18 +03:00
Asias He
177498375d gossip: Add storage_service_value_factory helper
Switch to use real version when storage_service is ready.
2015-05-07 15:29:46 +08:00
Asias He
7ac5277822 gossip: Fix link error with versioned_value
With the next patch in this series, "gossip: Add storage_service_value_factory helper",
the build fails at link time:

[asias@hjpc urchin]$ ninja-build

[8/10] LINK build/release/seastar

build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removing_nonlocal(utils::UUID
const&)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:201:
undefined reference to `gms::versioned_value::DELIMITER_STR'
build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removal_coordinator(utils::UUID
const&)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:211:
undefined reference to `gms::versioned_value::DELIMITER_STR'
build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removed_nonlocal(utils::UUID
const&, long)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:206:
undefined reference to `gms::versioned_value::DELIMITER_STR'
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:206:
undefined reference to `gms::versioned_value::DELIMITER_STR'
collect2: error: ld returned 1 exit status

Fix by defining the symbol in gms/versioned_value.cc.
2015-05-07 15:29:13 +08:00
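The link error in commit 7ac5277822 above is typical of a static constexpr data member that is odr-used without an out-of-class definition (required before C++17). The sketch below shows the general pattern only: `versioned_value_link_sketch` is a hypothetical type, and the "," delimiter value is an assumption.

```cpp
#include <string>

struct versioned_value_link_sketch {
    // In-class declaration with an initializer. The "," value is an assumption.
    static constexpr const char DELIMITER_STR[] = ",";

    // Concatenation converts the array to a pointer, which odr-uses the member.
    static std::string join_fields(const std::string& a, const std::string& b) {
        return a + DELIMITER_STR + b;
    }
};

// Out-of-class definition. In a multi-file project this line would live in
// exactly one .cc file; since C++17 it is redundant but still accepted.
constexpr const char versioned_value_link_sketch::DELIMITER_STR[];
```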
Asias He
f689ef705a gossip: Forward gossip message to cpu0
There is one gossiper instance per node and it runs on cpu0 only. We cannot
guarantee there will always be a core-to-core tcp connection within the
messaging service, so the messaging service needs to listen on all cpus.
When a remote node connects to the local node with a connection bound to a cpu
other than cpu0, we need to forward the message to cpu0.
2015-05-05 10:56:55 +03:00
Asias He
bf3d6a4c06 gossip: Disable sleep and retry logic in do_status_check
We do not have the ThreadPoolExecutor logic. Disable the sleep and retry
logic.
2015-04-23 14:55:26 +08:00
Asias He
0060eac413 gossip: Set last processed time when receiving gossip message 2015-04-23 14:55:26 +08:00
Asias He
622ec0111d gossip: Fix apply_new_states
We should take a reference, otherwise remote's endpoint_state will not
be updated locally.
2015-04-23 14:55:26 +08:00
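The copy-versus-reference bug fixed by commit 622ec0111d above can be reproduced in a few lines. All type and function names below are hypothetical stand-ins, not the gossiper's real ones.

```cpp
#include <map>
#include <string>

using app_states_sketch = std::map<std::string, std::string>;

struct ep_state_sketch {
    app_states_sketch states;
};

using ep_map_sketch = std::map<std::string, ep_state_sketch>;

// Buggy variant: `local` is a copy, so the entry stored in the map is never updated.
void apply_new_states_by_copy(ep_map_sketch& map, const std::string& addr,
                              const app_states_sketch& remote) {
    ep_state_sketch local = map[addr];
    for (const auto& kv : remote) {
        local.states[kv.first] = kv.second;
    }
}

// Fixed variant: `local` refers to the entry stored in the map.
void apply_new_states_by_ref(ep_map_sketch& map, const std::string& addr,
                             const app_states_sketch& remote) {
    ep_state_sketch& local = map[addr];
    for (const auto& kv : remote) {
        local.states[kv.first] = kv.second;
    }
}
```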
Asias He
5f0050dc97 gossip: Fix add_application_state
If the key exists, we should update the new value.
2015-04-23 14:55:26 +08:00
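One common way the problem named in commit 5f0050dc97 above arises is that `insert` on a standard map leaves an existing key untouched. Whether the original bug had exactly this form is an assumption; the function names here are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

using state_values_sketch = std::map<std::string, std::string>;

// insert() does nothing if the key already exists, so the old value survives.
void add_state_keep_existing(state_values_sketch& m, const std::string& key,
                             const std::string& value) {
    m.insert(std::make_pair(key, value));
}

// operator[] followed by assignment inserts the key or overwrites its value.
void add_state_update(state_values_sketch& m, const std::string& key,
                      const std::string& value) {
    m[key] = value;
}
```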
Asias He
7b75df6bd4 gossip: Update heart beat 2015-04-23 14:55:26 +08:00
Asias He
f2e840de54 gossip: Switch from fail to warn
Warn is enough for now.
2015-04-23 14:55:26 +08:00
Asias He
a800fbfe64 gossip: Set get_phi_convict_threshold to 8
It is the default value.
2015-04-23 14:55:26 +08:00
Asias He
b38dae4a2b gossip: Dump failure detector info 2015-04-20 15:49:27 +08:00
Asias He
7e0a0c381f gossip: Remove debug print message 2015-04-20 15:49:27 +08:00
Gleb Natapov
c39af6dda0 gossip: store regular pointer to subscribers instead of shared one
Some subscribers are allocated statically, so it is a chore to make
shared pointers from them. And since registered subscribers have to be
unregistered before being destroyed anyway, there is no lifetime issue here
that requires the use of a smart pointer.
2015-04-20 09:18:23 +03:00
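The design choice in commit c39af6dda0 above can be sketched as a registry of non-owning pointers. `subscriber_sketch` and `notifier_sketch` are hypothetical names; the contract that callers unregister before destruction is taken from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct subscriber_sketch {
    int notified = 0;
    void on_change() { ++notified; }
};

class notifier_sketch {
    // Non-owning pointers: the registry never extends a subscriber's lifetime.
    std::vector<subscriber_sketch*> _subscribers;
public:
    void add(subscriber_sketch* s) { _subscribers.push_back(s); }

    // Callers must remove a subscriber before it is destroyed.
    void remove(subscriber_sketch* s) {
        _subscribers.erase(std::remove(_subscribers.begin(), _subscribers.end(), s),
                           _subscribers.end());
    }

    void notify() {
        for (auto* s : _subscribers) {
            s->on_change();
        }
    }

    std::size_t size() const { return _subscribers.size(); }
};
```

With raw pointers, a statically allocated subscriber can be registered by taking its address, with no need to wrap it in a shared pointer first.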
Asias He
02f8c9d965 gossip: Add dump_endpoint_state_map for debug 2015-04-16 17:44:20 +08:00
Asias He
4abee75c04 gossip: Drop fail guard in mark_alive and apply_state_locally 2015-04-16 17:44:20 +08:00
Asias He
4cffb5513d gossip: Drop unnecessary FIXME 2015-04-16 17:44:20 +08:00
Asias He
7f98644742 gossip: Fix send_gossip
Insert only when local_ep_state_ptr is engaged, not otherwise.
2015-04-16 17:08:19 +08:00
Asias He
eeafdf5815 gossip: Make gms::versioned_value::load static
We are supposed to call it without an instance.
We will convert other similar functions in follow up patches.
2015-04-16 17:03:46 +08:00
Asias He
d661827045 gossip: Fix get_broadcast_address
It defaults to listen_address.
2015-04-16 17:01:52 +08:00
Asias He
adff3b9c79 gossip: Drop redundant print in heart_beat_state 2015-04-16 16:59:53 +08:00