scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 12:06:44 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	6db002163f	gms/feature: Introduce a more convenient when_enabled() It can be invoked with a lambda without the ceremony of creating a class deriving from gms::feature::listener. The returned registration object controls the listener's scope.	2019-04-28 12:33:10 +02:00
Tomasz Grabiec	22c07b9183	gms/feature: Mark all when_enabled() overloads as const	2019-04-28 12:33:10 +02:00
Piotr Jastrzebski	9934740c39	Register feature listeners in storage_service Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:36:58 +02:00
Piotr Jastrzebski	460fb260cb	feature: add when_enabled callbacks Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Asias He	b2c110699e	gms: Remove i_failure_detector.hh It is not used any more.	2019-03-22 09:08:51 +08:00
Asias He	af579a055b	gossip: Get rid of the gms::get_local_failure_detector static object Store the failure_detector object inside gossiper object. - No more the global object sharded<failure_detector> - No need to initialize sharded<failure_detector> manually which simplifies the code in tests/cql_test_env.cc and init.cc.	2019-03-22 09:08:51 +08:00
Asias He	967794798a	gossiper: Do not use value_factory from storage_service object Avoid using value_factory from storage_service inside gossiper.	2019-03-22 08:26:47 +08:00
Asias He	4a55617c6c	gossiper: Use cfg options from _cfg instead of get_local_storage_service Gossiper has db::config _cfg now, avoid using the get_local_storage_service() to get config options.	2019-03-22 08:26:44 +08:00
Asias He	ee1227b3ae	gossiper: Pass db::config object to gossiper class Gossiper calls service::get_local_storage_service() to get cfg options. To avoid cyclic dependency, pass the cfg object to gossiper directly.	2019-03-22 08:25:16 +08:00
Asias He	71bf757b2c	gossiper: Enable features only after gossip is settled n1, n2, n3 in the cluster, shutdown n1, n2, n3 start n1, n2 start n3, we saw features are enabled using the system table while n1 and n2 are already up and running in the cluster. INFO 2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} INFO 2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH} The problem is we enable the features too early in the start up process. We should enable features after gossip is settled. Fixes #4289 Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>	2019-03-18 18:25:29 +01:00
Asias He	1d59f26c11	gossiper: Fix empty remote common_features in check_knows_remote_features Three nodes in the cluster node1, node2, node3 Shutdown the whole cluster Start node1 Start node2, node2 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.2 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't started yet, node1 sees node3 has empty features. In get_supported_features(), an empty common features will be returned if an empty features of a node is seen. To fix, we should fallback to use the features saved in system table. Start node3, node3 sees empty remote common_features. gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {} The problem is node3 hasn't inserted its own features into gossip endpoint_state_map. get_supported_features() returns the common features of all nodes in endpoint_state_map. To fix, we should fallback to use the features stored in the system table for such node in this case. Fixes #4225	2019-03-18 10:56:10 +01:00
Asias He	acb4badbc3	gossiper: Log feature is enabled only if the feature is not enabled previously We saw the log "Feature FOO is enabled" more than once like below. It is better to log it only when the feature is not enabled previously. gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - Feature CORRECT_COUNTER_ORDER is enabled gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled gossip - Feature COUNTERS is enabled gossip - Feature DIGEST_MULTIPARTITION_READ is enabled gossip - Feature INDEXES is enabled gossip - Feature LARGE_PARTITIONS is enabled gossip - Feature LA_SSTABLE_FORMAT is enabled gossip - Feature MATERIALIZED_VIEWS is enabled gossip - Feature MC_SSTABLE_FORMAT is enabled gossip - Feature RANGE_TOMBSTONES is enabled gossip - Feature ROLES is enabled gossip - Feature ROW_LEVEL_REPAIR is enabled gossip - Feature SCHEMA_TABLES_V3 is enabled gossip - Feature STREAM_WITH_RPC_STREAM is enabled gossip - Feature TRUNCATION_TABLE is enabled gossip - Feature WRITE_FAILURE_REPLY is enabled gossip - Feature XXHASH is enabled gossip - InetAddress 127.0.0.2 is now UP, status = NORMAL	2019-03-18 10:56:10 +01:00
Asias He	f32f08c91e	gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node Remove unused check_features helper as well.	2019-03-18 10:56:09 +01:00
Asias He	6dbcb2e0c9	gossiper: Remove unused register_feature and unregister_feature They are not used any more.	2019-03-18 10:56:09 +01:00
Jesse Haber-Kucharsky	b39eac653d	Switch to the CMake-ified Seastar Committer: Avi Kivity <avi@scylladb.com> Branch: next Switch to the CMake-ified Seastar This change allows Scylla to be compiled against the `master` branch of Seastar. The necessary changes: - Add `-Wno-error` to prevent a Seastar warning from terminating the build - The new Seastar build system generates the pkg-config files (for example, `seastar.pc`) at configure time, so we don't need to invoke Ninja to generate them - The `-march` argument is no longer inherited from Seastar (correctly), so it needs to be provided independently - Define `SEASTAR_TESTING_MAIN` so that the definition of an entry point is included for all unit test compilation units - Independently link Scylla against Seastar's compiled copy of fmt in its build directory - All test files use the (now public) Seastar testing headers - Add some missing Seastar headers to source files [avi: regenerate frozen toolchain, adjust seastar submodule] Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>	2019-01-30 11:17:38 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	f02c64cadf	streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh This header, which is easily replaced with a forward declaration, introduces a dependency on database.hh everywhere. Remove it and scatter includes of database.hh in source files that really need it.	2019-01-05 17:33:25 +02:00
Duarte Nunes	8da6a31e75	service: Advertise view update backlog over gossip This lays the groundwork for brokering a node's view update backlog across the whole cluster. This is needed for when a coordinator does not contact a given replica for a long time, and uses a backlog view that is outdated and causes requests to be unnecessarily delayed. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Tomasz Grabiec	538e041f22	Merge "Remove some dependencies on db::config" from Avi db::config is a global class; changes in any module can cause changes in db::config. Therefore, it is a cause of needless recompilation. Remove some of these dependencies by having consumers of db::config declare an intermediate config struct that contains only configuration of interest to them, and have their caller fill it out (in the case of auth, it already followed this scheme and the patchset only moves the translation function). In addition, some outright pointless inclusions of db/config.hh are removed. The result is somewhat shorter compile times, and fewer needless recompiles. * https://github.com/avikivity/scylla unconfig-1/v1: config: remove inclusions of db/config.hh from header files repair: remove unneeded config.hh inclusion batchlog_manager: remove dependency on db::config auth: remove permissions_cache dependency on db::config auth: remove auth::service dependency on db::config auth: remove unneeded db/config.hh includes	2018-12-10 14:53:14 +01:00
Asias He	e07150166a	gossip: Add gossiper::is_cql_ready - A new Scylla node always sends application_state::RPC_READY = false when it boots and application_state::RPC_READY = true once the CQL server is up - An old Scylla node that does not support application_state::RPC_READY never has it in its endpoint_state; we can only assume its CQL server is up, so we return true if application_state::RPC_READY is not present	2018-12-10 19:16:44 +08:00
Asias He	2737654c75	gms: Add endpoint_state::is_cql_ready Return whether the endpoint_state has the RPC_READY application_state.	2018-12-10 19:16:44 +08:00
Asias He	67093324ad	gms: Add application_state::RPC_READY It is used to tell peer nodes that the CQL server is ready and can accept client requests. The name follows the one Cassandra uses.	2018-12-10 19:16:44 +08:00
Asias He	4ed2ef23e9	gms: Introduce cql_ready in versioned_value	2018-12-10 19:16:43 +08:00
Avi Kivity	864f55e745	config: remove inclusions of db/config.hh from header files Instead, distribute those inclusions to .cc files that require them. This reduces rebuilds when config.hh changes, and makes it easier to locate files that need config disaggregation.	2018-12-09 20:11:38 +02:00
Tomasz Grabiec	6012a63660	Merge "Fix window during init where waiting for a feature can be ignored" from Avi storage_service keeps a bunch of "feature" variables, indicating cluster-wide supported features, and has the ability to wait until the entire cluster supports a given feature. The propagation of features depends on gossip, but gossip is initialized after storage_service, so the current code late-initializes the features. However, that means that whoever waits on a feature between storage_service initialization and gossip initialization loses their wait entry. In #3952, we have proof that this in fact happens. Fix this by removing the circular dependency. We now store features in a new service, feature_service, that is started before both gossip and storage_service. Gossip updates feature_service while storage_service reads from it. Fixes #3953. * https://github.com/avikivity/3953/v4.1: storage_service: deinline enable_all_features() gossiper: keep features registered tests/gossip: switch to seastar::thread storage_service: deinline init/deinit functions gossiper: split feature storage into a new feature_service gossiper: maybe enable features after start_gossiping() storage_service: fix gap when feature::when_enabled() doesn't work	2018-12-06 15:42:26 +01:00
Avi Kivity	587fd9b6c0	gossiper: maybe enable features after start_gossiping() Since we may now start with features already registered, we need to enable features immediately after gossip is started. This case happens in a cluster that already is fully upgraded on startup. Before this series, features were only added after this point.	2018-12-06 16:31:04 +02:00
Avi Kivity	4e553b692e	gossiper: split feature storage into a new feature_service Feature lifetime is tied to storage_service lifetime, but features are now managed by gossip. To avoid circular dependency, add a new feature_service service to manage feature lifetime. To work around the problem, the current code re-initializes features after gossip is initialized. This patch does not fix this problem; it only makes it possible to solve it by untying features from gossip.	2018-12-06 16:31:04 +02:00
Avi Kivity	1215512e98	gossiper: keep features registered Gossiper unregisters enabled features as an optimization. However that makes decoupling features from gossiper harder. Disable this optimization; since the number of features is small and normal access is to a single feature at a time, there is no significant performance or memory loss.	2018-12-06 16:31:04 +02:00
Asias He	eeeb2da7bb	gossip: Fix race in real_mark_alive and shutdown msg In dtest, we have self.check_rows_on_node(node1, 2000) self.check_rows_on_node(node2, 2000) which introduce the following cluster operations: 1) Initially: - node1 up - node2 up 2) self.check_rows_on_node(node1, 2000) - node2 down - node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots up to mark node1 up) 3) self.check_rows_on_node(node2, 2000) - node1 down (B: node1 will send shutdown gossip message to node2, node2 will mark node1 down) - node1 up (C: when node1 is up, node2 will call gossiper::real_mark_alive) Since there is no guarantee the order of Operation A and Operation B, it is possible node2 will mark node1 as status=shutdown and mark node1 is UP. In Operation C, node2 will call gossiper::real_mark_alive to mark node1 up, but since node2 might think node1 is already up, node2 will exit early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is now UP, status={}" As a result, dtest fails to see node2 reports node1 is up when it boots node1 and fail the test. TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP'] In the log we can see node1 marked as DOWN and UP almost at the same time on node2: INFO 2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown INFO 2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown Fixes #3940 Tests: dtest with 20 consecutive successful runs Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com>	2018-12-05 21:51:01 +02:00
Asias He	a5d8b66f2c	gossip: Make favor newly added node log debug level It is not very useful for user to know this. Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>	2018-12-05 10:45:03 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Avi Kivity	e096fa2fde	gms: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Duarte Nunes	e46ef6723b	Merge seastar upstream * seastar d152f2d...c1e0e5d (6): > scripts: perftune.py: properly merge parameters from the command line and the configuration file > fmt: update to 5.2.1 > io_queue: only increment statistics when request is admitted > Adds `read_first_line.cc` and `read_first_line.hh` to CMake. > fstream: remove default extent allocation hint > core/semaphore: Change the access of semaphore_units main ctor Due to a compile-time fight between fmt and boost::multiprecision, a lexical_cast was added to mediate. sprint("%s", var) no longer accepts numeric values, so some sprint()s were converted to format() calls. Since more may be lurking we'll need to remove all sprint() calls. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-25 12:53:30 +03:00
Duarte Nunes	48ebe6552c	Merge 'Fix issues with endpoint state replication to other shards' from Tomasz Fixes #3798 Fixes #3694 Tests: unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test) * tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla: gms/gossiper: Replicate endpoint states in add_saved_endpoint() gms/gossiper: Make reset_endpoint_state_map() have effect on all shards gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards gms/gossiper: Always override states from older generations	2018-10-08 14:19:19 +01:00
Tomasz Grabiec	3c7de9fee9	gms/gossiper: Replicate endpoint states in add_saved_endpoint()	2018-10-04 12:54:00 +02:00
Tomasz Grabiec	ddf3a61bcf	gms/gossiper: Make reset_endpoint_state_map() have effect on all shards	2018-10-04 12:53:56 +02:00
Tomasz Grabiec	9e3f744603	gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards Lack of this may result in non-zero shards on some nodes still seeing STATUS as NORMAL for a node which shut down, in some cases. mark_as_shutdown() is invoked in reaction to an RPC call initiated by the node which is shutting down. Another way a node can learn about other node shutting down is via gossiping with a node which knows this. In that case, the states will be replicated to non-zero shards. The node which learnt via mark_as_shutdown() may also eventually propagate this to non-zero shards, e.g. when it gossips about it with other nodes, and its local version number at the time of mark_as_shutdown() was smaller than the one used to set the STATE by the shutting down node.	2018-10-04 12:51:42 +02:00
Tomasz Grabiec	c4ec81e126	gms/gossiper: Always override states from older generations Application states of each node are versioned per-node with a pair of generation number (more significant) and value version. Generation number uniquely identifies the life time of a scylla process. Generation number changes after restart. Value versions start from 0 on each restart. When a node gets updates for application states, it merges them with its view on given node. Value updates with older versions are ignored. Gossiper processes updates only on shard 0, and replicates value updates to other shards. When it sees a value with a new generation, it correctly forgets all previous values. However, non-zero shards don't forget values from previous generations. As a result, replication will fail to override the values on non-zero shards when generation number changes until their value version exceeds the version prior to the restart. This will result in incorrect STATUS for non-seed nodes on non-zero shards. When restarting a non-seed node, it will do a shadow gossip round before setting its STATUS to NORMAL. In the shadow round it will learn from other nodes about itself, and set its STATUS to shutdown on all shards with a high value version. Later, when it sets its status to NORMAL, it will override it only on shard 0, because on other shards the version of STATUS is higher. This will cause CQL truncate to skip current node if the coordinator runs on non-zero shards. The fix is to override the entries on remote shards in the same way we do on shard 0. All updates to endpoint states should be already serialized on shard 0, and remote shards should see them in the same order. Introduced in `2d5fb9d` Fixes #3798 Fixes #3694	2018-10-04 12:47:27 +02:00
Tomasz Grabiec	9c57abcce7	gossiper: Fix shutdown_announce_in_ms not being respected shutdown_announce_in_ms specifies a period of time that a node which is shutting down waits to allow its state to propagate to other nodes. However, we were setting _enabled to false before waiting, which will make the current node ignore gossip messages. Message-Id: <1538576996-26283-1-git-send-email-tgrabiec@scylladb.com>	2018-10-03 15:43:00 +01:00
Asias He	02befb6474	gossip: Log seeds seen It is useful for debugging bootstrap issues, especially for large clusters. Also do not use the `_seeds` as the set_seeds function parameter since there is a class member called _seeds. Refs #3417 Message-Id: <15e6bdf06376949ced1bdb845f810da09266783d.1532474820.git.asias@scylladb.com>	2018-08-01 10:57:56 +03:00
Nadav Har'El	25bd139508	cross-tree: clean up use of std::random_device() std::random_device() uses the relatively slow /dev/urandom, and we rarely if ever intend to use it directly - we normally want to use it to seed a faster random_engine (a pseudo-random number generator). In many places in the code, we first created a random_device variable, and then using it created a random_engine variable. However, this practice created the risk of a programmer accidentally using the random_device object, instead of the random_engine object, because both have the same API; this hurts performance. This risk materialized in just two places in the code, utils/uuid.cc and gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is not included in this patch, and the fix for gossiper.{cc,hh} is included here. To avoid risking the same mistake in the future, this patch switches across the code to an idiom where the random_device object is not named, so cannot be accidentally used. We use the following idiom: std::default_random_engine _engine{std::random_device{}()}; Here std::random_device{}() creates the random device (/dev/urandom) and pulls a random integer from it. It then uses this seed to create the random_engine (the pseudo-random number generator). The std::random_device{} object is temporary and unnamed, and cannot be unintentionally used directly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180726154958.4405-1-nyh@scylladb.com>	2018-07-26 16:54:58 +01:00
Asias He	fd71c5718f	gossip: Reduce contiguous memory usage Gossip SYN and ACK use std::vector to store a list of gossip_digest; the larger the cluster, the more contiguous memory is needed. To reduce the memory pressure which might cause std::bad_alloc, switch the std::vector to chunked_vector. In addition, change add_local_application_state to use std::list instead of std::vector. Refs #2782	2018-07-17 20:15:32 +08:00
Asias He	c3b5a2ecd5	gossip: Fix tokens assignment in assassinate_endpoint The tokens vector is defined a few lines above and is needed outside the if block. Do not redefine it again in the if block, otherwise the tokens will be empty. Found by code inspection. Fixes #3551. Message-Id: <c7a06375c65c950e94236571127f533e5a60cbfd.1530002177.git.asias@scylladb.com>	2018-06-26 16:38:12 +01:00
Asias He	059ec89ad1	gms: Add is_normal helper to endpoint_state It is faster than gossiper::is_normal because it avoids searching the std::map<application_state, versioned_value>. It is useful for the code in the fast path which needs to query if a node is in NORMAL status. Fixes #3500 Message-Id: <42db91fa4108f9f4fcf94fed3ec403ccf35d15e9.1528354644.git.asias@scylladb.com>	2018-06-10 19:21:03 +03:00
Duarte Nunes	b1dd1876e5	gms/gossiper: Prevent duplicate processing of EchoMessage reply We make multiple attempts to mark a node as alive. We do that by sending an EchoMessage, and marking the node as alive upon receiving a successful answer. In case there's a network partition and the nodes can't reach each other, multiple messages may be delivered and processed. We can avoid processing duplicate EchoMessage replies by checking whether we had already marked the node as alive. Fixes #1184 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180428191942.31990-1-duarte@scylladb.com>	2018-04-29 14:20:01 +03:00
Calle Wilund	b1edf75c8b	types: Make seastar::inet_address the "native" type for CQL inet. Fixes #3187 Requires seastar "inet_address: Add constructor and conversion function from/to IPv4" Implements IPv6 support for CQL inet data. The actual data stored will now vary between 4 and 16 bytes. gms::inet_address has been augmented to interop with seastar::inet_address, though of course actually trying to use an IPv6 address there or in any of its tables will throw badly. Tests assuming ipv4 changed. Storing an ipv4_address should be transparent, as it now "widens". However, since all ipv4 is inet_address, but not vice versa, there is no implicit overloading on the read paths. I.e. tests and system_keyspace (where we read ip addresses from tables explicitly) are modified to use the proper type. Message-Id: <20180424161817.26316-1-calle@scylladb.com>	2018-04-24 23:12:07 +01:00
Asias He	d71a94a08b	gossip: Add tokens and host_id in add_saved_endpoint Problem: Start node 1 2 3 Shutdown node2 Shutdown node1 node3 Start node1 node3 Try to replace_address for node 2 The replace operation fails with the error: seastar - Exiting on unhandled exception: std::runtime_error (Cannot replace_address node2 because it doesn't exist in gossip) This is because after all nodes shutdown, the other nodes do not have the tokens and host_id info of node2 until node2 boots up and talks to the cluster. If node2 cannot boot up for whatever reason, currently the only way to recover node2 is to `nodetool removenode` and bootstrap node2 again. This will change tokens in the cluster and cause more data movement than just replacing node2. To fix, we add the tokens and host_id gossip application state in add_saved_endpoint during boot up. This is pretty safe because the generation for application state added by add_saved_endpoint is zero, if node2 actually boots, other nodes will update with node2's version. Before: $ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "generation": 0, "is_alive": false, "update_time": 1523344828953, "version": 0 } Node 2 can not be replaced. After: $ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "application_state": [ { "application_state": 12, "value": "31284090-2557-4036-9367-7bb4ef49c35a", "version": 2 }, { "application_state": 13, "value": "... a lot of tokens ...", "version": 1 } ], "generation": 0, "is_alive": false, "update_time": 1523344828953, "version": 0 } Node 2 can be replaced. Tests: dtest/replace_address_test.py Fixes: #3347 Message-Id: <117fd6649939e0505847335791be8d7a96e7d273.1523346805.git.asias@scylladb.com>	2018-04-10 13:14:31 +02:00
Asias He	f539e993d3	gossip: Relax generation max difference check start node 1 2 3 shutdown node2 shutdown node1 and node3 start node1 and node3 nodetool removenode node2 clean up all scylla data on node2 bootstrap node2 as a new node I saw node2 could not bootstrap stuck at waiting for schema information to complete forever: On node1, node3 [shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704 On node2 [shard 0] storage_service - JOINING: waiting for schema information to complete This is because in nodetool removenode operation, the generation of node1 was increased from 0 to 2. gossiper::advertise_removing () calls eps.get_heart_beat_state().force_newer_generation_unsafe(); gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe(); Each force_newer_generation_unsafe increases the generation by 1. Here is an example, Before nodetool removenode: ``` curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "generation": 0, "is_alive": false, "update_time": 1521778757334, "version": 0 }, ``` After nodetool removenode: ``` curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "application_state": [ { "application_state": 0, "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246", "version": 214 }, { "application_state": 6, "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a", "version": 153 } ], "generation": 2, "is_alive": false, "update_time": 1521779276246, "version": 0 }, ``` In gossiper::apply_state_locally, we have this check: ``` if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) { // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself) logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation); } ``` to skip the gossip update. To fix, we relax generation max difference check to allow the generation of a removed node. After this patch, the removed node bootstraps successfully. Tests: dtest:update_cluster_layout_tests.py Fixes #3331 Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com>	2018-03-29 12:09:49 +03:00
Duarte Nunes	9cadfb27f1	gms/gossiper: Remove superfluous check Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-19 13:08:53 +00:00
Duarte Nunes	69b28a4f2b	gms/gossiper: Check for shadow round completion before throwing For values of `shadow_round_ms` lower than 1 second, this was assuming failure without checking. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-19 13:08:53 +00:00
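
The generation check quoted in commit f539e993d3 above can be replayed in isolation with the numbers from its log line. This is a minimal sketch, not the gossiper code: `is_rejected` and `replay_commit_example` are hypothetical names, and the value of `MAX_GENERATION_DIFFERENCE` (one year in seconds) is an assumption, since the commit message names the constant but does not state its value. It shows only the pre-fix condition; how the check was relaxed is not described in the message.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one year expressed in seconds. The commit message names
// MAX_GENERATION_DIFFERENCE but does not give its value.
constexpr std::int64_t MAX_GENERATION_DIFFERENCE = 86400LL * 365;

// Mirrors the condition quoted in the commit message (before the fix):
// returns true when the remote generation would be logged as invalid
// and the gossip update skipped.
constexpr bool is_rejected(std::int64_t local_generation,
                           std::int64_t remote_generation) {
    return local_generation != 0
        && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE;
}

// Hypothetical helper that replays the numbers from the commit message.
inline void replay_commit_example() {
    // After removenode raised the local generation from 0 to 2, the
    // timestamp-like generation of the re-bootstrapped node is rejected.
    assert(is_rejected(2, 1521779704));
    // With a local generation of 0 the check is skipped, so the update
    // is applied.
    assert(!is_rejected(0, 1521779704));
}
```

Under these assumptions, a local generation of 2 fails the check against 1521779704 while a local generation of 0 bypasses it, which matches the stuck bootstrap described in the commit.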


492 Commits