When a node start it does not have any information about cache temperature
of other nodes in the cluster and it is hard (if not impossible) to make
right guess. During cluster startup all nodes have cold caches, so there
is no point to redirect reads to other nodes even though local cache it
cold, but if only that node restarted than other nodes have populated
cache and reads should be redirected.
The node will get up-to-date information about other nodes caches,
but only after receiving first reply, until then it does not have the
information to make right decisions which may cause unwanted spikes
immediately after restart. Having cache temperature in gossiper helps
to solve the problem.
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.
The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.
This series adds API to do so.
The following patch on top of this series demostreates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout.
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
// Add features supported by this local node. When a new feature is
// introduced in scylla, update it here, e.g.,
// return sstring("FEATURE1,FEATURE2")
- return sstring("");
+ return sstring("FEATURE1,FEATURE2");
}
std::set<inet_address> get_seeds() {
@@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
// gossip snitch infos (local DC and rack)
gossip_snitch_info().get();
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
+ logger.info("Wait for FEATURE1 and FEATURE2 done");
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
+ logger.info("Wait for FEATURE3 done");
+
We can query the supported_features:
cqlsh> SELECT supported_features from system.peers;
supported_features
--------------------
FEATURE1,FEATURE2
FEATURE1,FEATURE2
(2 rows)
cqlsh> SELECT supported_features from system.local;
supported_features
--------------------
FEATURE1,FEATURE2
(1 rows)"
This patch contains two changes, it make the constructor with parameters
public. And it removes the dependency in messaging_service.hh from the
header file by moving some of the code to the .cc file.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The only reason we needed it is to make
_application_state[key] = value
work.
With the current default constructor, we increase the version number
needlessly. To fix and to be safe, remove the default constructor
completely.
Backported: CASSANDRA-8336 and CASSANDRA-9871
84b2846 remove redundant state
b2c62bb Add shutdown gossip state to prevent timeouts during rolling restarts
8f9ca07 Cannot replace token does not exist - DN node removed as Fat Client
Fixes:
When X is shutdown, X sends SHUTDOWN message to both Y and Z, but for
some reason, only Y receives the message and Z does not receive the
message. If Z has a higher gossip version for X than Y has for
X, Z will initiate a gossip with Y and Y will mark X alive again.
X ------> Y
\ /
\ /
Z
We do not care about the order of the tokens.
Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
With the next patch "gossip: Add storage_service_value_factory helper"
in this series.
[asias@hjpc urchin]$ ninja-build
[8/10] LINK build/release/seastar
build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removing_nonlocal(utils::UUID
const&)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:201:
undefined reference to `gms::versioned_value::DELIMITER_STR'
build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removal_coordinator(utils::UUID
const&)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:211:
undefined reference to `gms::versioned_value::DELIMITER_STR'
build/release/gms/gossiper.o: In function
`gms::versioned_value::versioned_value_factory::removed_nonlocal(utils::UUID
const&, long)':
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:206:
undefined reference to `gms::versioned_value::DELIMITER_STR'
/home/asias/src/cloudius-systems/urchin/./gms/versioned_value.hh:206:
undefined reference to `gms::versioned_value::DELIMITER_STR'
collect2: error: ld returned 1 exit status
Fix by defining the symbol in gms/versioned_value.cc.
Left some of the stuff I got tired of converting in #if 0. Most likely
I'll need them later, and convert them then.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>