The standard says, and clang enforces, that declaring a function only via
a friend declaration is not sufficient for ADL to kick in. Add a
namespace-level declaration so ADL works.
We take a reference to an endpoint_state entry in endpoint_state_map and
access it again after code which defers. The reference can be invalid
after the defer if someone deletes the entry during the defer.
Fix this by taking the reference again after the deferring code.
I also audited the code to remove unsafe references to endpoint_state_map
entries as much as possible.
Fixes the following SIGSEGV:
Core was generated by `/usr/bin/scylla --log-to-syslog 1 --log-to-stdout
0 --default-log-level info --'.
Program terminated with signal SIGSEGV, Segmentation fault.
(this=<optimized out>) at /usr/include/c++/5/bits/stl_pair.h:127
127 in /usr/include/c++/5/bits/stl_pair.h
[Current thread is 1 (Thread 0x7f1448f39bc0 (LWP 107308))]
Fixes #2271
Message-Id: <529ec8ede6da884e844bc81d408b93044610afd2.1491960061.git.asias@scylladb.com>
We saw the message twice for the same feature check. This is a bit
confusing.
INFO 2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {RANGE_TOMBSTONES} in features {}
INFO 2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {RANGE_TOMBSTONES} in features {}
INFO 2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {LARGE_PARTITIONS} in features {}
INFO 2016-12-15 11:26:23,993 [shard 0] gossip - Checking if need_features {LARGE_PARTITIONS} in features {}
This is because
ss._range_tombstones_feature = gms::feature(RANGE_TOMBSTONES_FEATURE);
ss._large_partitions_feature = gms::feature(LARGE_PARTITIONS_FEATURE);
The first message is printed when gms::feature(RANGE_TOMBSTONES_FEATURE)
is constructed. The second message is printed when the
ss._range_tombstones_feature is copy-constructed.
Add a helper function to enable a feature and log that the feature is
enabled.
When a feature is enabled, we see
INFO 2016-12-15 11:29:32,443 [shard 0] gossip - Feature LARGE_PARTITIONS is enabled
INFO 2016-12-15 11:29:32,443 [shard 0] gossip - Feature RANGE_TOMBSTONES is enabled
in the log.
If an exception is triggered early in boot while doing an I/O operation,
scylla will fail because the io checker calls storage service to stop
the transport services, and not all of them were initialized yet.
Scylla was failing as follows:
scylla: ./seastar/core/sharded.hh:439: Service& seastar::sharded<Service>::local()
[with Service = gms::gossiper]: Assertion `local_is_initialized()' failed.
Aborting on shard 0.
Backtrace:
0x000000000048a2ca
0x000000000048a3d3
0x00007fc279e739ff
0x00007fc279ad6a27
0x00007fc279ad8629
0x00007fc279acf226
0x00007fc279acf2d1
0x0000000000c145f8
0x000000000110d1bc
0x000000000041bacd
0x00000000005520f1
0x00007fc279aeaf1f
Aborted (core dumped)
Refs #883.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Asias He <asias@scylladb.com>
Message-Id: <963f7b0f5a7a8a1405728b414a7d7a6dccd70581.1479172124.git.asias@scylladb.com>
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.
Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
Print when the node will be removed from gossip membership, e.g.,
INFO 2016-09-27 08:54:49,262 [shard 0] gossip - Node 127.0.0.3 will be
removed from gossip at [2016-09-30 08:54:48]: (expire = 1475196888294489339,
now = 1474937689262295270, diff = 259199 seconds)
The expire time which is used to decide when to remove a node from
gossip membership is gossiped around the cluster. We switched to steady
clock in the past. In order to have a consistent time_point in all the
nodes in the cluster, we have to use wall clock. Switch to use
system_clock for gossip.
Fixes #1704
It is duplicated by the "InetAddress x.x.x.x is now UP" message.
INFO 2016-09-23 10:35:15,512 [shard 0] gossip - Node 127.0.0.1 has restarted, now UP, status = NORMAL
INFO 2016-09-23 10:35:15,513 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL
Make the log a bit cleaner.
For example:
gossip - InetAddress 127.0.0.4 is now UP, status = NORMAL
gossip - InetAddress 127.0.0.3 is now DOWN, status = LEFT
gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.
Fixes the issue:
Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)
Fixes #1656
Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
This makes it easier for users to figure out the configuration error.
The log looks like:
WARN 2016-08-22 15:04:56,214 [shard 0] gossip - ClusterName mismatch
from 127.0.0.2 test2!=test
WARN 2016-08-22 15:06:16,106 [shard 0] gossip - Partitioner mismatch from 127.0.0.2
org.apache.cassandra.dht.RandomPartitioner!=org.apache.cassandra.dht.Murmur3Partitioner
Fixes: #1587
Message-Id: <745ed8857da6f70745735b94eef7b226d2f22e10.1471849834.git.asias@scylladb.com>
$ tools/scyllatop/scyllatop.py '*gossip*'
node-1/gossip-0/gauge-heart_beat_version 1.0
node-2/gossip-0/gauge-heart_beat_version 1.0
node-3/gossip-0/gauge-heart_beat_version 1.0
Gossip heart beat version changes every second. If everything is working
correctly, the gauge-heart_beat_version output should be 1.0. If not,
it will be less than 1.0.
Message-Id: <cbdaa1397cdbcd0dc6a67987f8af8038fd9b2d08.1470712861.git.asias@scylladb.com>
In 3a36ec33db (gossip: Wait longer for seed node during boot up), we
increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5
seconds * 60 = 5 minutes.
In 57ee9676c2 (storage_service: Fix default ring_delay time), we fixed
the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds
= 30 minutes, which is too long.
Make it 5 minutes.
We want to prevent an older version of scylla, which has fewer features,
from joining a cluster running a newer version of scylla with more
features, because when scylla sees that a feature is enabled on all other
nodes, it will start to use the feature and assume that existing and
future nodes will always have it.
In order to support downgrade during rolling upgrade, we need to support
mixed old and new nodes case.
1) All old nodes
O O O O O <- N OK
O O O O O <- O OK
2) All new nodes
N N N N N <- N OK
N N N N N <- O FAIL
3) Mixed old and new nodes
O N O N O <- N OK
O N O N O <- O OK
(O == old node, N == new node, <- == joining the cluster)
With this patch, I tested:
1.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}
1.2) Add old node to old node cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}
2.1) Add new node to new node cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES}
2.2) Add old node to new node cluster
seastar - Exiting on unhandled exception: std::runtime_error (Feature
check failed. This node can not join the cluster because it does not
understand the feature. Local node 127.0.0.4 features = {}, Remote
common_features = {RANGE_TOMBSTONES})
3.1) Add new node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features =
{RANGE_TOMBSTONES}, Remote common_features = {}
3.2) Add old node to mixed cluster
gossip - Feature check passed. Local node 127.0.0.4 features = {},
Remote common_features = {}
Fixes #1253
If the feature string is empty, boost::split will return
std::set<sstring> = {""} instead of std::set<sstring> = {},
which will make a node with a feature, e.g. std::set<sstring> =
{"RANGE_TOMBSTONES"}, think it does not understand the feature of
a node with no features at all.
This patch changes the gms::feature destructor so it
checks whether the gossiper has been stopped before trying
to unregister the feature.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a node starts up, a peer node can send a gossip syn message to it
before the gossip message handlers are registered in messaging_service.
We can see:
scylla[123]: [shard 0] rpc - client a.b.c.d: unknown verb exception 6 ignored
To fix, we delay the listening of messaging_service to the point when
gossip message handlers are registered.
Message-Id: <9b20d85e199ef0e44cdcde2920123a301a88f3d7.1464254400.git.asias@scylladb.com>
This class encapsulates the waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the sleep-based mechanism of detecting new features
by instead registering waiters with a condition variable that is
signaled whenever a new endpoint information is received.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch fixes an inadvertent change to the shadow endpoint state
map in gossiper::run, done by calling get_heart_beat_state() which
also updates the endpoint state's timestamp. This did not happen for
the normal map, but did happen for the shadow map. As a result, every
time gossiper::run() was scheduled, endpoint_map_changed would always
be true and all the shards would make superfluous copies of the
endpoint state maps.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>
_live_endpoints_just_added tracks peer nodes which have just become live.
When a down node comes back, the peer nodes can receive multiple messages
that would mark the node up, e.g., messages piled up in the sender's
tcp stack after a node was blocked with gdb and released. Each such
message triggers an echo message, and when the reply of the echo
message is received (real_mark_alive), the same node is added to
_live_endpoints_just_added more than once. Thus, we see the
same node favored more than once:
INFO 2016-04-12 12:09:57,399 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:09:58,412 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:09:59,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:00,429 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:01,430 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:02,442 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO 2016-04-12 12:10:03,454 [shard 0] gossip -
do_gossip_to_live_member: Favor newly added node 127.0.0.2
To fix, do not insert the node if it is already in
_live_endpoints_just_added.
Fixes #1178
Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>
In a perf flame graph, I saw in
service::storage_proxy::create_write_response_handler (2.66% cpu):
gossiper::is_alive takes 0.72% cpu
locator::token_metadata::pending_endpoints_for takes 1.2% cpu
After this patch:
service::storage_proxy::create_write_response_handler (2.17% cpu)
gossiper::is_alive does not show up at all
locator::token_metadata::pending_endpoints_for takes 1.3% cpu
There is no need to copy the endpoint_state from the endpoint_state_map
to check if a node is alive. Optimize it since gossiper::is_alive is
called in the fast path.
Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>
seastar::async() creates a seastar thread, and to do that it allocates
memory. That allocation, obviously, may fail, so the error handling code
needs to be moved so that it also catches errors from thread creation.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.
The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.
This series adds API to do so.
The following patch on top of this series demonstrates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, the wait will not succeed and will time out.
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
// Add features supported by this local node. When a new feature is
// introduced in scylla, update it here, e.g.,
// return sstring("FEATURE1,FEATURE2")
- return sstring("");
+ return sstring("FEATURE1,FEATURE2");
}
std::set<inet_address> get_seeds() {
@@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
// gossip snitch infos (local DC and rack)
gossip_snitch_info().get();
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
+ logger.info("Wait for FEATURE1 and FEATURE2 done");
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
+ logger.info("Wait for FEATURE3 done");
+
We can query the supported_features:
cqlsh> SELECT supported_features from system.peers;
supported_features
--------------------
FEATURE1,FEATURE2
FEATURE1,FEATURE2
(2 rows)
cqlsh> SELECT supported_features from system.local;
supported_features
--------------------
FEATURE1,FEATURE2
(1 rows)"
Add APIs to wait for features to become available on a node or on all the
nodes in the cluster.
$timeout specifies how long we want to wait. If the features are not
available yet, sleep 2 seconds and retry.