Commit Graph

320 Commits

Author SHA1 Message Date
Paweł Dziepak
0d3d0a3c08 gossiper: handle failures in gossiper thread creation
seastar::async() creates a seastar thread and to do that allocates
memory. That allocation, obviously, may fail so the error handling code
needs to be moved so that it also catches errors from thread creation.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-11 23:54:47 +01:00
Pekka Enberg
47a904c0f6 Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias
"We need the ability to detect whether a feature is supported by the entire
cluster. The way to do it is to advertise feature availability over gossip, so
that each node can check whether all other nodes have the feature in question.

The idea is to have a new application state, SUPPORTED_FEATURES, that contains
a set of strings, each string holding a feature name.

This series adds an API to do so.

The following patch on top of this series demonstrates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2, which succeeds.
Since FEATURE3 is not supported, the wait for it will not succeed; it will time out.

   --- a/service/storage_service.cc
   +++ b/service/storage_service.cc
   @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
        // Add features supported by this local node. When a new feature is
        // introduced in scylla, update it here, e.g.,
        // return sstring("FEATURE1,FEATURE2")
   -    return sstring("");
   +    return sstring("FEATURE1,FEATURE2");
    }

    std::set<inet_address> get_seeds() {
   @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
        // gossip snitch infos (local DC and rack)
        gossip_snitch_info().get();

   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
   +    logger.info("Wait for FEATURE1 and FEATURE2 done");
   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
   +    logger.info("Wait for FEATURE3 done");
   +

We can query the supported_features:

    cqlsh> SELECT supported_features from system.peers;

     supported_features
    --------------------
      FEATURE1,FEATURE2
      FEATURE1,FEATURE2

    (2 rows)
    cqlsh> SELECT supported_features from system.local;

     supported_features
    --------------------
      FEATURE1,FEATURE2

    (1 rows)"
2016-04-08 09:22:50 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Asias He
e0a82a1107 gossip: Add supported_features helper in versioned_value
Given a supported-features sstring, return a versioned_value for it.
2016-04-06 07:12:34 +08:00
Asias He
04e8727793 gossip: Introduce wait_for_feature_on_{all}_node
API to wait until features are available on a node or on all the nodes in the
cluster.

$timeout specifies how long we want to wait. If the features are not
available yet, sleep 2 seconds and retry.
2016-04-06 07:12:34 +08:00
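The retry behaviour described in commit 04e8727793 might be sketched as follows. This is a minimal standalone illustration, not the gossiper code: `wait_for` and its parameters are hypothetical names, and it uses blocking standard-library calls where the real implementation would presumably use Seastar futures.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll `ready` until it returns true or `timeout` elapses.
// `retry_interval` defaults to the 2 seconds mentioned in the commit message.
// Returns true if the condition was met, false if the timeout expired first.
inline bool wait_for(const std::function<bool()>& ready,
                     std::chrono::milliseconds timeout,
                     std::chrono::milliseconds retry_interval = std::chrono::seconds(2)) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + timeout;
    while (!ready()) {
        if (clock::now() >= deadline) {
            return false;  // condition still not met when the timeout expired
        }
        std::this_thread::sleep_for(retry_interval);
    }
    return true;
}
```

A caller would pass a predicate that checks whether every required feature is currently advertised, plus the timeout it is willing to tolerate.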
Asias He
1e437e925c gossip: Introduce get_supported_features
- Get features supported by this particular node

  std::set<sstring> get_supported_features(inet_address endpoint) const;

- Get features supported by all the nodes this node knows about

  std::set<sstring> get_supported_features() const;
2016-04-06 07:12:34 +08:00
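One way to read the second overload in commit 1e437e925c is as an intersection of the per-node feature sets. The sketch below assumes that reading and the comma-separated format shown earlier ("FEATURE1,FEATURE2"); `parse_features` and `common_features` are hypothetical names, and `std::string` stands in for `sstring`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a comma-separated feature string such as "FEATURE1,FEATURE2".
inline std::set<std::string> parse_features(const std::string& csv) {
    std::set<std::string> out;
    std::istringstream in(csv);
    std::string name;
    while (std::getline(in, name, ',')) {
        if (!name.empty()) {
            out.insert(name);
        }
    }
    return out;
}

// Features common to every node: the intersection of the per-node sets.
// An empty input yields an empty result.
inline std::set<std::string> common_features(
        const std::vector<std::set<std::string>>& per_node) {
    if (per_node.empty()) {
        return {};
    }
    std::set<std::string> common = per_node.front();
    for (std::size_t i = 1; i < per_node.size(); ++i) {
        std::set<std::string> next;
        std::set_intersection(common.begin(), common.end(),
                              per_node[i].begin(), per_node[i].end(),
                              std::inserter(next, next.begin()));
        common = std::move(next);
    }
    return common;
}
```

Under this reading, a node that advertises no features at all makes the cluster-wide set empty, which is the conservative outcome during a rolling upgrade.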
Asias He
a6080773b3 gossip: Add SUPPORTED_FEATURES application_state
It is used to negotiate cluster wide features.
2016-04-06 07:12:34 +08:00
Asias He
7acc9816d2 gossip: Handle unknown application_state when printing
In case an unknown application_state is received, we should be able to
handle it when printing.

Message-Id: <98d2307359292e90c8925f38f67a74b69e45bebe.1458553057.git.asias@scylladb.com>
2016-03-21 11:59:04 +02:00
Asias He
16af12ca47 gossip: Add comments on external runtime dependency needed by gossip 2016-03-15 16:13:13 +08:00
Asias He
1034dd0aff gossip: Ignore ack2 message if gossip is not enabled yet 2016-03-15 16:09:43 +08:00
Asias He
1bf0412e7a gossip: Introduce handle_shutdown_msg helper 2016-03-15 16:09:43 +08:00
Asias He
54d8ac16b5 gossip: Introduce handle_echo_msg helper 2016-03-15 16:09:42 +08:00
Asias He
1f64f4bfcb gossip: Introduce handle_ack2_msg helper 2016-03-15 16:09:42 +08:00
Asias He
9f64c36a08 storage_service: Fix pending_range_calculator_service
Since calculate_pending_ranges modifies token_metadata, we need to
replicate it to the other shards. With this patch, when we call
calculate_pending_ranges, token_metadata is replicated to the
non-zero shards.

In addition, pending_range_calculator_service is not useful as a standalone
class, so merge it into storage_service. This kills one singleton class.

Fixes #1033
Refs #962
Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>
2016-03-14 10:14:22 +02:00
Asias He
134b814cde gossip: Log status info when stopping gossip 2016-03-10 10:56:48 +08:00
Asias He
ed723665df gossip: Do not stop gossip more than once
If we do
   - Decommission a node
   - Stop a node
we will shut down gossip more than once in:
   - storage_service::decommission
   - storage_service::drain_on_shutdown

Fix by checking if it is already stopped and back off if so.
2016-03-10 10:56:48 +08:00
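The guard described in commit ed723665df can be illustrated with a small standalone class. This is a sketch only: `stoppable_service` and its members are hypothetical and do not mirror the gossiper's actual fields.

```cpp
#include <cassert>

// Hypothetical stand-in for a service whose stop path can be reached twice,
// e.g. once from a decommission path and once from a shutdown path.
class stoppable_service {
    bool _running = true;
    int _shutdown_count = 0;
public:
    // Returns true if this call performed the shutdown,
    // false if the service was already stopped and the call backed off.
    bool stop() {
        if (!_running) {
            return false;
        }
        _running = false;
        ++_shutdown_count;  // the real shutdown work would happen here
        return true;
    }

    int shutdown_count() const { return _shutdown_count; }
};
```

Calling `stop()` twice on the same object performs the shutdown work once; the second call is a no-op.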
Vlad Zolotarov
87e6efcdab storage_service: distribute gossiper::endpoint_state_map together with token_metadata
If storage_service::token_metadata is not distributed together with
gossiper::endpoint_state_map, a non-zero shard may see a new value in
token_metadata (e.g. a newly added node's token ranges) while still seeing
the old contents of gossiper::endpoint_state_map (e.g. the newly added node
may not be present yet, causing gossiper::is_alive() to return false for
that node while the node is actually alive and kicking).

To avoid this discrepancy, we always update token_metadata together
with endpoint_state_map when we distribute new token_metadata data
among shards.

Fixes #909

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 13:15:19 +02:00
Vlad Zolotarov
3a72ef87f2 gossiper: make _shadow_endpoint_state_map public and rename
We will need to access it from the storage_service class when replicating
token_metadata.

Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Vlad Zolotarov
4a21d48cc5 gossiper: use a semaphore instead of a future<> for serializing a timer callback
Use a semaphore to allow serializing with a gossiper's timer callback.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-03-06 11:16:44 +02:00
Asias He
01cb6b0d42 gossip: Send syn message in parallel and do not wait for it
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3 peer
nodes in parallel to reduce the latency of the gossip round.

2) The gossip syn message uses a one-way rpc message, but the returned
future of the one-way message becomes ready only when the message is
dequeued for some reason (sent or dropped). If we wait for the one-way syn
message to return, it might block the gossip round for an unbounded time. To
fix, do not wait for it in the gossip round. The downside is that there is no
back pressure to bound the syn messages; however, since the messages are sent
once per second, I think it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>
2016-03-03 11:17:50 +02:00
Paweł Dziepak
b5eee2e5d4 gms: add inet_address::to_sstring()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-02 12:49:55 +00:00
Asias He
32eaaecf36 gossip: Get rid of assert
Log the error and throw an exception instead of aborting the whole
process. This makes the code more robust.
2016-02-25 21:19:52 +08:00
Asias He
59564591d5 storage_service: Use get_gossip_status to get status
The helper was introduced recently; use it instead of open-coding it.
2016-02-25 21:19:52 +08:00
Asias He
94cb7f22d4 gossip: Make add_local_application_state safe to call on any cpu
add_local_application_state is used in various places. Before this
patch, it could only be called on cpu zero. To make it safer to use, use
invoke_on() to forward the code to run on cpu zero, so that the caller can
call it on any cpu.

Refs: #795
Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:54 +02:00
Asias He
4e931c2453 gossip: Log the error when fails to add local application state
Gleb saw once:

scylla: gms/gossiper.cc:1393:
gms::gossiper::add_local_application_state(gms::application_state,
gms::versioned_value):: mutable: Assertion
`endpoint_state_map.count(ep_addr)' failed.

The assert fires when we cannot find the node's own entry in
endpoint_state_map. I cannot find any place where we could call
add_local_application_state before we call gossiper::start_gossiping(),
which inserts the broadcast address into endpoint_state_map.

I cannot reproduce the issue, so let's log the error so that we can narrow down
which application state triggered the assert.

Refs: #795
Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>
2016-02-25 12:45:17 +02:00
Asias He
697b16414a gossip: Make gossip message handling async
In each gossip round, i.e., gossiper::run(), we do:

1) send syn message
2)                           peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
   apply_state_locally
     mark_alive
       send_gossip_echo
     handle_major_state_change
       on_restart
       mark_alive
         send_gossip_echo
       mark_dead
         on_dead
       on_join
     apply_new_states
       do_on_change_notifications
          on_change
4) send back ack2 message
5)                           peer node: process ack2 message
                                apply_state_locally

At the moment, syn is a "wait" message that times out after 3 seconds. In step
3, all the registered gossip callbacks are called, which might take a
significant amount of time to complete.

In order to reduce the gossip round latency, we make syn "no-wait" and
do not run handle_ack_msg inside gossiper::run(). As a result, we
no longer get an ack message as the return value of a syn message,
so a GOSSIP_DIGEST_ACK message verb is introduced.

With this patch, the gossip message exchange is async. This is useful
when some nodes in the cluster are down: the gossip round, which is
supposed to run every second, is no longer delayed by 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run the gossip callbacks).

Later, we can talk to the 1-3 peer nodes in parallel to reduce
latency even more.

Refs: #900
2016-02-24 19:33:39 +08:00
Asias He
022c7e50a1 failure_detector: Fix false alarm of "Not marking nodes down due to local pause of"
The problem is that we initialize _last_interpret when the failure_detector
object is constructed. When interpret() runs for the first time,
_last_interpret is not the last time we ran interpret() but the
time we constructed the failure_detector object.

Fix by initializing _last_interpret inside interpret().

[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] storage_service - NORMAL: node is now in normal status
[Thu Feb 18 02:40:04 2016] INFO  [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - No gossip backlog; proceeding
Starting listening for CQL clients on 127.0.0.1:9042...
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - Node 127.0.0.2 is now part of the cluster
[Thu Feb 18 02:40:12 2016] INFO  [shard 0] gossip - InetAddress 127.0.0.2 is now UP
[Thu Feb 18 02:40:13 2016] INFO  [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
[Thu Feb 18 02:40:13 2016] WARN  [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)
2016-02-24 19:31:14 +08:00
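The fix in commit 022c7e50a1 can be sketched as a detector that leaves its timestamp unset until the first run. This is a standalone illustration under assumptions: `pause_detector`, the `std::optional` member, and the explicit `now` parameter are hypothetical and chosen to make the behaviour easy to test; the actual failure_detector may initialize `_last_interpret` differently.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical detector that reports a local pause when too much time has
// passed between two consecutive interpret() calls.
class pause_detector {
    using clock = std::chrono::steady_clock;
    std::optional<clock::time_point> _last_interpret;  // unset until first run
    std::chrono::milliseconds _max_pause;
public:
    explicit pause_detector(std::chrono::milliseconds max_pause)
        : _max_pause(max_pause) {}

    // Returns true if the gap since the previous interpret() exceeds the limit.
    // On the first call there is no previous run, so the gap is treated as zero
    // instead of being measured from object construction.
    bool interpret(clock::time_point now) {
        const auto last = _last_interpret.value_or(now);
        _last_interpret = now;
        return (now - last) > _max_pause;
    }
};
```

With a 5000 ms limit, a first call made 9 seconds after construction no longer reports a pause, which corresponds to the false alarm shown in the log above.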
Erich Keane
e87019843f Fix PHI_FACTOR definition to be spec compliant
PHI_FACTOR is a constexpr variable that is defined using std::log.
Though G++ has a constexpr version of std::log, this itself is not spec
compliant (in fact, Clang enforces this).  See C++ Spec 26.8 for the
definition of std::log and 17.6.5.6 for the rule regarding adding
constexpr where it isn't specified.

This patch replaces the std::log statement with a version from math.h
that contains the exact value (M_LOG10El).

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454603285-32677-1-git-send-email-erich.keane@verizon.net>
2016-02-04 18:33:44 +02:00
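The idea behind commit e87019843f can be shown in a few lines, assuming the original expression was the reciprocal of ln(10) (the commit message does not spell it out). Since 1 / ln(10) equals log10(e), a math.h constant can replace the non-constexpr std::log call. The sketch uses the double-precision macro M_LOG10E rather than the long-double M_LOG10El named in the commit; both are extensions to standard C++ provided by POSIX/GNU headers, and `phi_factor` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// log10(e) as a compile-time constant; no call to std::log is needed,
// so the initializer is a valid constant expression on any compiler
// whose <cmath> exposes the M_LOG10E macro.
constexpr double phi_factor = M_LOG10E;

// Sanity check at compile time: log10(e) is approximately 0.4342945.
static_assert(phi_factor > 0.434294 && phi_factor < 0.434295,
              "phi_factor should be log10(e)");
```

At run time the constant agrees with `1.0 / std::log(10.0)` to within floating-point rounding.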
Gleb Natapov
4e440ebf8e Remove old inet_address and uuid serializers 2016-02-02 12:15:50 +02:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
Time a node waits after sending gossip shutdown message in milliseconds.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578
2016-01-27 11:19:38 +08:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not live with this
inconsistency. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Asias He
7b633ad127 gossip: Drop unused serialization code
- heart_beat_state
2016-01-25 11:28:29 +08:00
Asias He
d7c7994f37 gossip: Drop unused serialization code
- versioned_value
2016-01-25 11:28:29 +08:00
Asias He
8098ba10b7 gossip: Drop unused serialization code
- endpoint_state
2016-01-25 11:28:29 +08:00
Asias He
6660658742 gossip: Drop unused serialization code
- gossip_digest_serialization_helper
- gossip_digest
2016-01-25 11:28:29 +08:00
Asias He
736d21a912 gossip: Drop unused serialization code
- gossip_digest_syn
- gossip_digest_ack
- gossip_digest_ack2
2016-01-25 11:28:29 +08:00
Asias He
d94b7e49d2 idl: Add gossip_digest_syn
Added get_partioner and get_cluster_id
2016-01-25 11:28:28 +08:00
Avi Kivity
b415f87324 Merge "Serializer Deserializer code generation" from Amnon
"The series does the following:
It adds the code generation.
It performs the needed changes in the current classes so that each has a getter
for each of its serializable values and a constructor from the serialized values.
It adds a schema definition that covers gossip_digest_ack.
It changes the messaging_service to use the generated code.

An overall explanation of the solution with a description of the schema IDL can
be found on the wiki page:

https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation
"
2016-01-24 12:56:42 +02:00
Amnon Heiman
d27734b9be Add a constructor to inet_address from uint32_t
inet_address uses uint32_t to store the IP address, but its constructor
takes int32_t.
This patch adds a constructor that takes uint32_t.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
8a4d211a99 Changes the versioned_value to make it serializable
This patch contains two changes: it makes the constructor with parameters
public, and it removes the dependency on messaging_service.hh from the
header file by moving some of the code to the .cc file.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
ddc3fe1328 endpoint_state adds a constructor for all serialized parameters
An external deserialize function needs a constructor with all the
serialized parameters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also do not rearm the timer if we stopped the gossip.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Asias He
cc3073b42d gossip: cleanup application_state
Drop the unused one.

Message-Id: <4cc45164d55742951b618d2c7b1e8bdb997f005a.1452771260.git.asias@scylladb.com>
2016-01-14 19:01:51 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convict
Kill one FIXME in convict

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs one extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait until the cql connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will go through the bootstrap process since its data,
including the system table, is wiped. It will consider itself a completely
new node and can possibly stream from the wrong node and violate
consistency.

To fix, we reject the boot if we find that the node was in SHUTDOWN or
STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
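The rule from commit 02b04e5907 can be sketched as a small predicate. This is an illustration under assumptions: the `node_status` enum and the `std::optional` parameter are hypothetical simplifications, whereas the real check reads the status that other nodes report for this endpoint over gossip.

```cpp
#include <cassert>
#include <optional>

// Hypothetical status values; only normal and shutdown matter for this check.
enum class node_status { bootstrapping, normal, shutdown, left };

// `previously_seen` is the status the rest of the cluster holds for this
// endpoint, or empty if the cluster has never heard of it.
inline bool is_safe_for_bootstrap(std::optional<node_status> previously_seen) {
    if (!previously_seen) {
        return true;  // never seen before: a genuinely new node
    }
    // A node the cluster knows as NORMAL or SHUTDOWN already owned data,
    // so bootstrapping it again with wiped disks is rejected.
    return *previously_seen != node_status::normal &&
           *previously_seen != node_status::shutdown;
}
```

In the wiped-node test scenario above, the cluster still remembers node2 as NORMAL or SHUTDOWN, so the predicate returns false and the boot is refused.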
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is confusing,
since shard_id is used in the context of cpu id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait until the cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, I sometimes saw that in the shadow round of node2, it gets node2's
status as BOOT from other nodes in the cluster instead of NORMAL. The
problem is that we do not wait for gossip to settle before we start the cql
server; as a result, when we stop node2 in step 3), other nodes in the cluster
have not yet received node2's status update to NORMAL.
2016-01-07 10:09:25 +02:00
Avi Kivity
f3980f1fad Merge seastar upstream
* seastar 51154f7...8b2171e (9):
  > memcached: avoid a collision of an expiration with time_point(-1).
  > tutorial: minor spelling corrections etc.
  > tutorial: expand semaphores section
  > Merge "Use steady_clock where monotonic clock is required" from Vlad
  > Merge "TLS fixes + RPC adaption" from Calle
  > do_with() optimization
  > tutorial: explain limiting parallelism using semaphores
  > submit_io: change pending flushes criteria
  > apps: remove defunct apps/seastar

Adjust code to use steady_clock instead of high_resolution_clock.
2015-12-27 14:40:20 +02:00