Commit Graph

386 Commits

Author SHA1 Message Date
Asias He
cc18da5640 Revert "gossip: Make bootstrap more robust"
This reverts commit b56ba02335.

After commit 8fa35d6ddf (messaging_service: Get rid of timeout and retry
logic for streaming verb), streaming verb in rpc does not check if a
node is in gossip memebership since all the retry logic is removed.

Remove the extra wait before removing the joining node from gossip
membership.

Message-Id: <a416a735bb8aad533bbee190e3324e6b16799415.1504063598.git.asias@scylladb.com>
2017-08-30 10:14:11 +03:00
Asias He
a36141843a gossip: Switch to seastar::lowres_system_clock
The newly added lowres_system_clock is good enough for gossip
resolution. Switch to use it.

Message-Id: <fe0e7a9ef1ea0caffaa8364afe5c78b6988613bf.1503971833.git.asias@scylladb.com>
2017-08-29 10:16:25 +03:00
Asias He
2701bfd1f8 gossip: Use unordered_map for _unreachable_endpoints and _shadow_unreachable_endpoints
The _unreachable_endpoints will be accessed in fast path soon by the
hinted hand off code.

Message-Id: <500d9cbb2117ab7b070fd1bd111c5590f46c3c3a.1503971826.git.asias@scylladb.com>
2017-08-29 10:15:55 +03:00
Tomasz Grabiec
16c1b0fb6b Merge "Reduce dependencies on types.hh" from Avi
* 'deps1/v1' of https://github.com/avikivity/scylla:
  types.hh: extract marshal_exception from types.hh into a new file
  utils: remove dependency on types.hh
  locator: add missing include "log.hh"
  supervisor: remove dependency on init.hh
  tracing: add missing include "log.hh"
  gms: remove unneeded #include "types.hh"
2017-08-28 13:58:46 +02:00
Tzach Livyatan
12fb975282 Fix typos in metrics description
Fixes #2658

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170803121732.19640-1-tzach@scylladb.com>
2017-08-28 10:48:28 +03:00
Avi Kivity
171fe67a64 gms: remove unneeded #include "types.hh" 2017-08-27 15:18:57 +03:00
Asias He
65912dd1ac gossip: Add is_normal helper
It will be used by repair to check if a node is in NORMAL status.
2017-08-23 14:40:04 +08:00
Asias He
cf6f4a5185 gossip: Introduce the shadow_round_ms option
It specifies the maximum gossip shadow round time. It can be used to
reduce the gossip feature check time during node boot up.
For instance, when the first node in the cluster, which listed both
itself and other node as seed in the yaml config, boots up, it will try
to talk to other seed nodes which are not started yet. The gossip shadow
round will be used to fetch the feature info of the cluster. Since there
is no other seed node in the cluster, the shadow round will fail. User
can reduce the default shadow_round_ms option to reduce the boot time.

Fixes #2615
Message-Id: <10916ce9059f3c7f1a1fb465919ae57de3b67d59.1500540297.git.asias@scylladb.com>
2017-08-02 09:52:35 +03:00
Asias He
515a744303 gossip: Fix nr_live_nodes calculation
We need to consider the _live_endpoints size. The nr_live_nodes should
not be larger than _live_endpoints size, otherwise the loop to collect
the live node can run forever.

It is a regression introduced in commit 437899909d
(gossip: Talk to more live nodes in each gossip round).

Fixes #2637

Message-Id: <863ec3890647038ae1dfcffc73dde0163e29db20.1501026478.git.asias@scylladb.com>
2017-07-26 16:48:30 +03:00
Asias He
ed7e6974d5 gms: Add is_shutdown helper for endpoint_state class
It will be used by streaming manager to check if a node is in shutdown
status.
2017-07-19 10:11:05 +08:00
Asias He
adc5f0bd21 gossip: Implement the missing fd_max_interval_ms and fd_initial_value_ms option
It is useful for larger cluster with larger gossip message latency. By
default the fd_max_interval_ms is 2 seconds which means the
failure_detector will ignore any gossip message update interval larger
than 2 seconds. However, in larger cluster, the gossip message udpate
interval can be larger than 2 seconds.

Fixes #2603.

Message-Id: <49b387955fbf439e49f22e109723d3a19d11a1b9.1500278434.git.asias@scylladb.com>
2017-07-17 13:29:16 +03:00
Tomasz Grabiec
18a9e1762c service: Advertise schema tables format version through gossip
Will be needed to inhibit schema exchange on per-peer basis.
2017-07-07 19:07:59 +02:00
Asias He
e31d4a3940 gossip: Use vector for _live_endpoints
To speed up the random access in get_random_node. Switch to use vector
instead of set.
2017-06-26 22:49:59 +08:00
Asias He
437899909d gossip: Talk to more live nodes in each gossip round
In large clusters with multiple DC deployment, it is observed that it
takes long delay for gossip update to disseminate in the cluster.

To speed up, talk to more live nodes in each gossip round.

Fixes #2528
2017-06-26 22:49:59 +08:00
Avi Kivity
236a8370e4 Remove use of std::random_shuffle()
It was removed in C++17. Replace with std::shuffle().
Message-Id: <20170626063809.7563-1-avi@scylladb.com>
2017-06-26 09:36:38 +02:00
Gleb Natapov
8ca1432b04 Distribute cache temperature over gossiper.
When a node start it does not have any information about cache temperature
of other nodes in the cluster and it is hard (if not impossible) to make
right guess. During cluster startup all nodes have cold caches, so there
is no point to redirect reads to other nodes even though local cache it
cold, but if only that node restarted than other nodes have populated
cache and reads should be redirected.

The node will get up-to-date information about other nodes caches,
but only after receiving first reply, until then it does not have the
information to make right decisions which may cause unwanted spikes
immediately after restart. Having cache temperature in gossiper helps
to solve the problem.
2017-06-13 09:57:14 +03:00
Asias He
b56ba02335 gossip: Make bootstrap more robust
The bootstrapping node will be a gossip only member, until the streaming
finishes and the node becomes NORMAL state. If during this time, the
bootstrapping node is overwhelmed with streaming, it is possible the node will
delay the update the gossip heartbeat. Be forgiving for the bootstrapping node
and do not remove it from gossip too fast. Otherwise, streaming rpc verbs will
not be resent becasue the node is not in gossip membership anymore.

Fixes #2150

Message-Id: <286d7035d854f2a48abf4e1e2e3bfcb8b22b9ca2.1494553580.git.asias@scylladb.com>
2017-05-21 19:25:40 +03:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Asias He
3bd9840c01 gossip: Ignore callbacks and mark alive operation in shadow round
In shadow round, we only interested in the peer's endpoint_state, e.g., gossip
features, host_id, tokens. No need to call the on_restart or on_join callbacks
or to go through the mark alive procedure with EchoMessage gossip message. We
will do them during normal gossip runs anyway.
2017-05-03 07:24:21 +08:00
Asias He
1441ae5cac gossip: Ingore the duplicated mark alive operation
If a node is being marked as alive with EchoMessage, ignore the future
duplicated mark alive opeariton.
2017-05-03 07:24:21 +08:00
Asias He
d682fbfa28 gossip: Fix user after free in mark_alive
After sending echo message, the Node might not be in the
endpoint_state_map anymore, use the reference of local_state
might cause user after free.

Fixes #2341
2017-05-03 07:24:20 +08:00
Avi Kivity
c885c468a9 gms: expose gms::inet_address streaming operator
The standard says, and clang enforces, that declaring a function via
a friend declaration is not sufficient for ADL to kick in.  Add a namespace
level declaration so ADL works.
2017-04-17 23:03:15 +03:00
Asias He
d27b47595b gossip: Fix possible use-after-free of entry in endpoint_state_map
We take a reference of endpoint_state entry in endpoint_state_map. We
access it again after code which defers, the reference can be invalid
after the defer if someone deletes the entry during the defer.

Fix this by checking take the reference again after the defering code.

I also audited the code to remove unsafe reference to endpoint_state_map entry
as much as possible.

Fixes the following SIGSEGV:

Core was generated by `/usr/bin/scylla --log-to-syslog 1 --log-to-stdout
0 --default-log-level info --'.
Program terminated with signal SIGSEGV, Segmentation fault.
(this=<optimized out>) at /usr/include/c++/5/bits/stl_pair.h:127
127     in /usr/include/c++/5/bits/stl_pair.h
[Current thread is 1 (Thread 0x7f1448f39bc0 (LWP 107308))]

Fixes #2271

Message-Id: <529ec8ede6da884e844bc81d408b93044610afd2.1491960061.git.asias@scylladb.com>
2017-04-13 13:18:17 +03:00
Calle Wilund
0a740b5ccf gms::inet_address: Add lookup functionality.
To find addresses by name.
2017-02-06 09:45:37 +00:00
Tomasz Grabiec
ddfee57c97 Replace iostream include with iosfwd in headers
Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>
2017-01-17 14:52:44 +02:00
Vlad Zolotarov
eb4fbb3949 gms::gossiper: move collectd counters registration to the metrics registration layer
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-01-10 16:24:55 -05:00
Asias He
e578e65103 gossip: Log feature enabled message on shard zero only
Feature is per node. No need to log them number of shards times.
2016-12-15 16:33:11 +08:00
Asias He
4137fab91b gossip: Make log in check_features debug level
We saw the message twice for the same feature check. This is a bit
confusing.

INFO  2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {RANGE_TOMBSTONES} in features {}
INFO  2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {RANGE_TOMBSTONES} in features {}
INFO  2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {LARGE_PARTITIONS} in features {}
INFO  2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {LARGE_PARTITIONS} in features {}

This is because

   ss._range_tombstones_feature = gms::feature(RANGE_TOMBSTONES_FEATURE);
   ss._large_partitions_feature = gms::feature(LARGE_PARTITIONS_FEATURE);

The first message is printed when gms::feature(RANGE_TOMBSTONES_FEATURE)
is constructed. The second message is printed when the
ss._range_tombstones_feature is copy-constructed.
2016-12-15 16:33:10 +08:00
Asias He
2b1ebc4719 gossip: Introduce gms:features::enable helper
Add the helper function to enable the a feature and log the feature is
enabled.

When a feature is enabled, we see

INFO  2016-12-15 11:29:32,443 [shard 0] gossip - Feature LARGE_PARTITIONS is enabled
INFO  2016-12-15 11:29:32,443 [shard 0] gossip - Feature RANGE_TOMBSTONES is enabled

in the log.
2016-12-15 16:33:10 +08:00
Asias He
86c2620b7a gossip: Skip stopping if it is not started
If exception is triggered early in boot when doing an I/O operation,
scylla will fail because io checker calls storage service to stop
transport services, and not all of them were initialized yet.

Scylla was failing as follow:
scylla: ./seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local()
[with Service = gms::gossiper]: Assertion `local_is_initialized()' failed.
Aborting on shard 0.
Backtrace:
  0x000000000048a2ca
  0x000000000048a3d3
  0x00007fc279e739ff
  0x00007fc279ad6a27
  0x00007fc279ad8629
  0x00007fc279acf226
  0x00007fc279acf2d1
  0x0000000000c145f8
  0x000000000110d1bc
  0x000000000041bacd
  0x00000000005520f1
  0x00007fc279aeaf1f
Aborted (core dumped)

Refs #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
Message-Id: <963f7b0f5a7a8a1405728b414a7d7a6dccd70581.1479172124.git.asias@scylladb.com>
2016-12-05 09:42:37 +02:00
Calle Wilund
218df55349 failure_detector: add accessor and api shortcut for arrival samples 2016-11-08 12:22:04 +00:00
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Asias He
511f8aeb91 gossip: Do not remove failure_detector history on remove_endpoint
Otherwise a node could wrongly think the decommissioned node is still
alive and not evict it from the gossip membership.

Backport: CASSANDRA-10371

7877d6f Don't remove FailureDetector history on removeEndpoint

Fixes #1714
Message-Id: <f7f6f1eec2aab1b97a2e568acfd756cca7fc463a.1475112303.git.asias@scylladb.com>
2016-09-29 13:00:47 +03:00
Asias He
774d16306f gossip: Use lowres_clock for scheduled_gossip_task
The timer is fired once per second. Using low resolution clock is enough.
Message-Id: <1f21514e975afea6ac5c9dde18a881a41561da70.1475130948.git.asias@scylladb.com>
2016-09-29 10:03:14 +03:00
Asias He
1292341d77 gossip: Improve the expire time logging
Print when the node will be removed from gossip membership, e.g.,

INFO  2016-09-27 08:54:49,262 [shard 0] gossip - Node 127.0.0.3 will be
removed from gossip at [2016-09-30 08:54:48]: (expire = 1475196888294489339,
now = 1474937689262295270, diff = 259199 seconds)
2016-09-27 16:42:35 +08:00
Asias He
f0d3084c8b gossip: Switch to use system_clock
The expire time which is used to decide when to remove a node from
gossip membership is gossiped around the cluster. We switched to steady
clock in the past. In order to have a consistent time_point in all the
nodes in the cluster, we have to use wall clock. Switch to use
system_clock for gossip.

Fixes #1704
2016-09-27 16:42:13 +08:00
Asias He
830f4ee353 gossip: Make two log items debug level
It is duplciated with "InetAddresss x.x.x.x is now UP" message.

INFO  2016-09-23 10:35:15,512 [shard 0] gossip - Node 127.0.0.1 has restarted, now UP, status = NORMAL
INFO  2016-09-23 10:35:15,513 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL

Make the log a bit cleaner.
2016-09-25 07:17:19 +08:00
Asias He
a26a26963c gossip: Print node status when node is UP or DOWN
For example:

gossip - InetAddress 127.0.0.4 is now UP, status = NORMAL
gossip - InetAddress 127.0.0.3 is now DOWN, status = LEFT
gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
2016-09-25 07:17:19 +08:00
Asias He
1d9401d080 gossip: Ignore the node which is decommissioned in gossip round
If the node is decommissioned, there is no point to try to contact it
again in the gossip round.
2016-09-25 07:17:19 +08:00
Asias He
4b73443222 gossip: Print convict debug info only when the node is alive 2016-09-25 07:17:19 +08:00
Asias He
99a2ae0fb5 gossip: Add more timing log in add_expire_time_for_endpoint
It tells when the node is expected to expire and how many seconds are
left.
2016-09-25 07:17:19 +08:00
Asias He
aa47265381 gossip: Fix std::out_of_range in setup_collectd
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
2016-09-20 19:38:16 +03:00
Asias He
4ffd867ad0 gossip: Add log when cluster or partioner mismatch
It is easier for user to figure out the configuration error.

The log looks like:

   WARN  2016-08-22 15:04:56,214 [shard 0] gossip - ClusterName mismatch
   from 127.0.0.2 test2!=test

   WARN  2016-08-22 15:06:16,106 [shard 0] gossip - Partitioner mismatch from 127.0.0.2
   org.apache.cassandra.dht.RandomPartitioner!=org.apache.cassandra.dht.Murmur3Partitioner

Fixes: #1587

Message-Id: <745ed8857da6f70745735b94eef7b226d2f22e10.1471849834.git.asias@scylladb.com>
2016-08-22 11:06:31 +03:00
Asias He
ef782f0335 gossip: Add heart_beat_version to collectd
$ tools/scyllatop/scyllatop.py '*gossip*'

node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0

Gossip heart beat version changes every second. If everyting is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
the gauge-heart_beat_version output should be less than 1.0.

Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
2016-08-15 12:32:00 +03:00
Asias He
d8bff4f745 gossip: Fix debug log in wait_for_gossip_to_settle
There is an extra '{}' in the logger format string.

Fixes:

gossip - Gossip looks settled. 8 gossip round completed: ???

Message-Id: <1470278008-29914-2-git-send-email-asias@scylladb.com>
2016-08-08 16:38:21 +03:00
Asias He
0c56bbe793 gossip: Make get_supported_features and wait_for_feature_on{_all}_node private
They are used only inside gossiper itself. Also make the helper
get_supported_features(std::unordered_map<gms::inet_address, sstring>) static.

Message-Id: <f434c145ad9138084708b60c1d959b84360e47b2.1467775291.git.asias@scylladb.com>
2016-07-06 09:54:56 +03:00
Asias He
bb80362c3f gossip: Insert with result.end() in get_supported_features
It is faster than result.begin(), suggested by Avi.
2016-07-05 10:09:54 +08:00
Asias He
72cb4a228b gossip: Add to_feature_set helper
To convert a "," split feature string to a feature set.
2016-07-05 10:09:54 +08:00
Asias He
1d6c57fb40 gossip: Reduce timeout in shadow round
In 3a36ec33db (gossip: Wait longer for seed node during boot up), we
increased the timeout by the factor of 60, i.e., ring_dealy * 60 = 5
seconds * 60 = 5 minutes.

In 57ee9676c2 (storage_service: Fix default ring_delay time), we fixed
the default ring_dealy to 30 seconds. Now the timeout is 30 * 60 seconds
= 30 minutes, which is too long.

Make it 5 minues.
2016-07-05 10:09:54 +08:00
Asias He
88f0bb3a7b gossip: Add check_knows_remote_features
To check if this node knows features in
std::unordered_map<inet_address, sstring> peer_features_string
2016-07-05 10:09:54 +08:00