scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Pekka Enberg	815c91a1b8	service/storage_service: Add feature flag for secondary indices	2017-05-04 14:59:11 +03:00
Paweł Dziepak	67ca6959bd	storage_service: add COUNTERS feature	2017-02-02 10:35:14 +00:00
Paweł Dziepak	c66db213d3	storage_service: allow getting local host id without futures<>	2017-02-02 10:35:13 +00:00
Paweł Dziepak	e03868c226	tests: run with all features enabled Since `ce083308a1` "random_mutation_generator: Generate RTs by default" random mutation generator produces range tombstones. However, so far the tests were run with all features disabled (because of incomplete initialization of all services) which meant that RANGE_TOMBSTONE feature was not enabled and the code couldn't handle range tombstones that weren't just prefixes. This patch solves the problem by forcing all features to be enabled when tests are run. Message-Id: <20170116103324.22956-1-pdziepak@scylladb.com>	2017-01-16 11:38:45 +01:00
Nadav Har'El	d49aa7abd2	storage_service: make is_joined() an immediate function Commit `d41cd48a` made the is_joined() method a future<bool> because only cpu 0 knows its real value. This makes this function inconvenient to use. So this patch reverts commit `d41cd48a`, and instead sets this flag's value on all shards, so each shard can read its value locally (and immediately). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20161228160450.5831-1-nyh@scylladb.com>	2016-12-28 18:37:22 +02:00
Duarte Nunes	02bc0d2ab3	create_view_statement: Require MV feature This patch adds the MATERIALIZED_VIEWS_FEATURE to the set of cluster features and requires its presence to allow creating a view. This ensures view schemas can be safely propagated across nodes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Duarte Nunes	22d8aa9bb6	migration_listener: Listen for view schema changes Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Asias He	1f06eedb58	dht: Rename token_range to token_range_endpoints It is a helper class used in storage_service only. Rename it so we can use it for the real dht::token_range.	2016-12-19 08:04:29 +08:00
Duarte Nunes	c0d450c57d	storage_service: get_local_tokens() returns a future This patch changes the get_local_tokens() function in storage_service to return a future instead of requiring running under a seastar::thread. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 11:15:04 +00:00
Avi Kivity	2670e46f3e	storage_service: deinline most methods Most inline methods in storage_service are too large to be inlined, and just increase compile time. De-inline them.	2016-11-12 21:12:28 +02:00
Avi Kivity	767cfb4fe9	storage_service: fix range wrapping in describe_ring even more Commit `8fca1887c2` ("storage_service: fix range wrapping in describe_ring") fixed incorrect range wrapping code for describe_ring, but fails when the number of endpoints for a token is greater than one, because the endpoints are stored in an unordered vector. Fix by comparing the endpoints in a way that ignores their order. Message-Id: <1478460826-15923-1-git-send-email-avi@scylladb.com>	2016-11-07 16:18:20 +01:00
Avi Kivity	8fca1887c2	storage_service: fix range wrapping in describe_ring describe_ring() tries to re-wrap the ranges, but fails because the ranges are not sorted. Adjust the code not to rely on sorting. Message-Id: <1478198630-27483-1-git-send-email-avi@scylladb.com>	2016-11-04 10:48:14 +00:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Duarte Nunes	01ab2081cd	storage_service: Implement get_splits() function This patch implements the get_splits() function in storage_service, used to split a particular token range in slices of approximately the specified size, using the sample keys and estimates of the CF's sstables. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 22:32:08 +02:00
Duarte Nunes	a36888f3cb	storage_service: Convert token through partitioner This patch ensures we use the partitioner to convert a token to sstring instead of casting. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1475179683-28552-1-git-send-email-duarte@scylladb.com>	2016-09-30 10:54:26 +02:00
Asias He	f377a3b7ac	streaming: Fail streaming sessions during shutdown Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test The test does: - Insert data on node1 only - Insert data on node2 only - Run repair on node1 and stop node1 once "starting user-requested repair" is seen The repair shutdown code may wait for the stream session to complete for a very long time if node 1 finishes sending data to node2 and is waiting for node2 to send data to it, when node1 is stopped. The stream session will not be closed in this case until stream session _keep_alive_timeout (10 minutes) expires. Instead of waiting for the stream_session keep alive timer to expire, we can fail all the stream sessions during shutdown. Before 1 - The bad case (repair shutdown will last for 10 minutes): INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Starting streaming to 127.0.0.2 INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:23:56,618 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Prepare completed with 127.0.0.2. 
Receiving 1, sending 0 INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Tracing is down INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:23:58,669 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:25:56,624 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] The session 0x600021516c00 made no progress with peer 127.0.0.2 Before 2 - The good case: INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Starting 
streaming to 127.0.0.2 INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0 INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:18:34,155 [shard 0] messaging_service - Retry verb=19 to 127.0.0.2:0, retry=10: rpc::closed_error (connection is closed) WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] COMPLETE_MESSAGE for 127.0.0.2 has failed: rpc::closed_error (connection is closed) WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Streaming error occurred INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Session with 127.0.0.2 is complete, state=FAILED INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] bytes_sent = 0, bytes_received = 245000 WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Stream failed, peers={127.0.0.2} WARN 2016-09-21 16:18:34,155 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed) INFO 2016-09-21 16:18:34,155 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed) INFO 2016-09-21 
16:18:34,155 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Tracing is down INFO 2016-09-21 16:18:34,156 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:18:34,199 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:18:34,199 [shard 0] repair - Completed shutdown of repair INFO 2016-09-21 16:18:34,199 [shard 0] compaction_manager - Asked to stop INFO 2016-09-21 16:18:34,199 [shard 1] compaction_manager - Asked to stop After: INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Starting streaming to 127.0.0.2 INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:06:21,685 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Prepare completed with 127.0.0.2. 
Receiving 1, sending 0 INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Session with 127.0.0.2 is complete, state=FAILED INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - stream_manager stopped INFO 2016-09-21 16:06:23,688 [shard 1] storage_service - stream_manager stopped INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] bytes_sent = 0, bytes_received = 25725 INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown stream_manager done WARN 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Stream failed, peers={127.0.0.2} WARN 2016-09-21 16:06:23,688 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed) INFO 2016-09-21 16:06:23,688 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed) INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:06:23,688 [shard 1] tracing - 
Tracing is down INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:06:23,774 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:06:23,774 [shard 0] repair - Completed shutdown of repair INFO 2016-09-21 16:06:23,774 [shard 0] compaction_manager - Asked to stop INFO 2016-09-21 16:06:23,774 [shard 1] compaction_manager - Asked to stop	2016-09-26 06:29:40 +08:00
Duarte Nunes	7d1b7e8da3	storage_service: Fix get_range_to_address_map_in_local_dc This patch fixes a couple of bugs in get_range_to_address_map_in_local_dc. Fixes #1517 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>	2016-07-29 11:11:07 +02:00
Paweł Dziepak	85c092c56c	storage_service: add LARGE_PARTITIONS_FEATURE Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Pekka Enberg	bcba45f546	Merge "Prevent old node to join new cluster" from Asias Fixes #1253	2016-06-23 10:25:38 +03:00
Asias He	4f3ce42163	storage_service: Prevent old version node to join a new version cluster We want to prevent an older version of scylla, which has fewer features, from joining a cluster with a newer version of scylla, which has more features, because when scylla sees a feature is enabled on all other nodes, it will start to use the feature and assume existing nodes and future nodes will always have this feature. In order to support downgrade during rolling upgrade, we need to support the mixed old and new nodes case. 1) All old nodes O O O O O <- N OK O O O O O <- O OK 2) All new nodes N N N N N <- N OK N N N N N <- O FAIL 3) Mixed old and new nodes O N O N O <- N OK O N O N O <- O OK (O == old node, N == new node, <- == joining the cluster) With this patch, I tested: 1.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 1.2) Add old node to old node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} 2.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 2.2) Add old node to new node cluster seastar - Exiting on unhandled exception: std::runtime_error (Feature check failed. This node can not join the cluster because it does not understand the feature. Local node 127.0.0.4 features = {}, Remote common_features = {RANGE_TOMBSTONES}) 3.1) Add new node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {} 3.2) Add old node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} Fixes #1253	2016-06-17 10:49:45 +08:00
Pekka Enberg	d72c608868	service/storage_service: Make do_isolate_on_error() more robust Currently, we only stop the CQL transport server. Extract a stop_transport() function from drain_on_shutdown() and call it from do_isolate_on_error() to also shut down the inter-node RPC transport, Thrift, and other communications services. Fixes #1353	2016-06-16 13:34:09 +03:00
Duarte Nunes	e46537b7d3	storage_service: Include range tombstones feature This patch adds the range tombstones feature, which is not enabled yet, to the storage_service, so that consumers can query for it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Asias He	891e947314	storage_service: Rename remove_node to removenode nodetool uses removenode command to remove a node. Rename the implementation in storage_service to match the command.	2016-04-13 14:53:28 +08:00
Asias He	9ffb95216d	storage_service: Add force_remove_completion It is needed by the $ nodetool removenode force command.	2016-04-13 14:53:28 +08:00
Asias He	7c7e5967f6	storage_service: Add get_removal_status It is needed by the $ nodetool removenode status command.	2016-04-13 14:53:28 +08:00
Pekka Enberg	47a904c0f6	Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias "There is a need to have an ability to detect whether a feature is supported by the entire cluster. The way to do it is to advertise feature availability over gossip and then each node will be able to check if all other nodes have a feature in question. The idea is to have a new application state SUPPORTED_FEATURES that will contain a set of strings, each string holding a feature name. This series adds API to do so. The following patch on top of this series demonstrates how to wait for features during boot up. FEATURE1 and FEATURE2 are introduced. We use wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully. Since FEATURE3 is not supported, the wait will not succeed and will time out. --- a/service/storage_service.cc +++ b/service/storage_service.cc @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() { // Add features supported by this local node. When a new feature is // introduced in scylla, update it here, e.g., // return sstring("FEATURE1,FEATURE2") - return sstring(""); + return sstring("FEATURE1,FEATURE2"); } std::set<inet_address> get_seeds() { @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() { // gossip snitch infos (local DC and rack) gossip_snitch_info().get(); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get(); + logger.info("Wait for FEATURE1 and FEATURE2 done"); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get(); + logger.info("Wait for FEATURE3 done"); + We can query the supported_features: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)"	2016-04-08 09:22:50 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	b710a5f9ee	storage_service: Introduce get_config_supported_features It reports the features supported by the local node. When a new feature is introduced in scylla, update the features returned by get_config_supported_features, e.g., return sstring("FEATURE1,FEATURE2")	2016-04-06 07:12:34 +08:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Asias He	83ffae1568	storage_service: Drop block_until_update_pending_ranges_finished It is a legacy API from c*. Since we can wait for the update_pending_ranges to complete, we can wait for it directly instead of calling block_until_update_pending_ranges_finished to do so. Also, change do_update_pending_ranges to be private. Message-Id: <ac79b2879ec08fdcd3b2278ff68962cc71492f12.1458040608.git.asias@scylladb.com>	2016-03-15 15:18:45 +02:00
Pekka Enberg	917ed4adbe	Merge "verb init/handler for gossip and storage_service" from Asias "- ignore ack2 msg if gossip is not enabled - move REPLICATION_FINISHED to where it belongs - add comments for gossip runtime dependency"	2016-03-15 11:12:10 +02:00
Asias He	883d8cb8fd	storage_service: Move REPLICATION_FINISHED verb to storage_service It belongs to storage_service not storage_proxy.	2016-03-15 16:13:22 +08:00
Asias He	d63281b256	storage_service: Update pending ranges when keyspace is changed If a keyspace is created after we calculate the pending ranges during bootstrap, we will ignore that keyspace in the pending ranges when handling write requests for it, which will cause data loss if rf = 1. Fixes #1000	2016-03-15 15:41:23 +08:00
Asias He	9f64c36a08	storage_service: Fix pending_range_calculator_service Since calculate_pending_ranges will modify token_metadata, we need to replicate to other shards. With this patch, when we call calculate_pending_ranges, token_metadata will be replicated to other non-zero shards. In addition, it is not useful as a standalone class. We can merge it into the storage_service. Kill one singleton class. Fixes #1033 Refs #962 Message-Id: <fb5b26311cafa4d315eb9e72d823c5ade2ab4bda.1457943074.git.asias@scylladb.com>	2016-03-14 10:14:22 +02:00
Asias He	138c5f5834	storage_service: Do not stop messaging_service more than once If we do - Decommission a node - Stop a node we will shut down messaging_service more than once in: - storage_service::decommission - storage_service::drain_on_shutdown Fixes #1005 Refs #1013 This fixes a dtest failure in debug build. update_cluster_layout_tests.TestUpdateClusterLayout.simple_decommission_node_1_test/ /data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:802:35: runtime error: member call on null pointer of type 'struct future_state' core/future.hh:334:49: runtime error: member access within null pointer of type 'const struct future_state' ASAN:SIGSEGV ================================================================= ==4557==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000065923e bp 0x7fbf6ffac430 sp 0x7fbf6ffac420 T0) #0 0x65923d in future_state<>::available() const /data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:334 #1 0x41458f1 in future<>::available() /data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:802 #2 0x41458f1 in then_wrapped<parallel_for_each(Iterator, Iterator, Func&&)::<lambda(parallel_for_each_state&)> [with Iterator = std::__detail::_Node_iterator<std::pair<const net::msg_addr, net::messaging_service::shard_info>, false, true>; Func = net::messaging_service::stop()::<lambda(auto:39&)> [with auto:39 = std::unordered_map<net::msg_addr, net::messaging_service::shard_info, net::msg_addr::hash>]::<lambda(std::pair<const net::msg_addr, net::messaging_service::shard_info>&)>]::<lambda(future<>)>, future<> > /data/jenkins/workspace/urchin-dtest/label/monster/mode/debug/scylla/seastar/core/future.hh:878	2016-03-10 10:56:48 +08:00
Vlad Zolotarov	87e6efcdab	storage_service: distribute gossiper::endpoint_state_map together with token_metadata If storage_service::token_metadata is not distributed together with gossiper::endpoint_state_map there may be a situation when a non-zero shard sees a new value in token_metadata (e.g. newly added node's token ranges) while still seeing an old gossiper::endpoint_state_map contents (e.g. a mentioned above newly added node may not be present, thus causing gossiper::is_alive() to return FALSE for that node, while the node is actually alive and kicking). To avoid this discrepancy we will always update a token_metadata together with an endpoint_state_map when we distribute new token_metadata data among shards. Fixes #909 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-03-06 13:15:19 +02:00
Pekka Enberg	6d7e14a53a	Merge "Implement describe_schema_versions" from Paweł "This series implements describe_schema_versions so that nodetool describecluster can return proper schema information for the whole cluster. It involves adding a new verb, SCHEMA_CHECK, which is used to get the schema version for a given node, and a simple map-reduce that uses that verb to get info from the whole cluster. This fixes #677, fixes #684, and fixes #472."	2016-03-02 16:02:53 +02:00
Paweł Dziepak	723b3ae7ed	storage_service: implement describe_schema_versions Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 12:49:55 +00:00
Asias He	a41bcad585	storage_service: Fix run with api lock Start with coarse control: 1) converting the run_with_write_api_lock operations: join_ring, start_gossiping, stop_gossiping, start_rpc_server, stop_rpc_server, start_native_transport, stop_native_transport, decommission, remove_node, drain, move, rebuild to use run_with_api_lock which uses a flag to indicate the current operation in progress. If one of the above operations is in progress when the admin issues another operation, we return a "try again" exception to avoid running two operations in parallel. 2) converting the run_with_read_api_lock to use no lock. Fixes #850. Message-Id: <00782b601028ed87437e5decae382f72dff634f6.1456758391.git.asias@scylladb.com>	2016-03-02 11:32:02 +02:00
Asias He	abafec99a5	system_keyspace: Implement increment_and_get_generation	2016-02-29 16:31:42 +08:00
Raphael S. Carvalho	d54c77d5d0	change abstract_replication_strategy::get_ranges to not return wrap-arounds The main motivation behind this change is to make the ranges returned by get_ranges() easier for consumers to work with, e.g. binary search to find a range in which a token is contained. In addition, a wrap-around range introduces corner cases, so we should avoid it altogether. Suppose that a node owns three tokens: -5, 6, 8 get_ranges() would return the following ranges: (8, -5], (-5, 6], (6, 8] get_ranges() will now return the following ranges: (-inf, -5], (-5, 6], (6, 8], (8, +inf) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <4bda1428d1ebbe7c8af25aa65119edc5b97bc2eb.1453827605.git.raphaelsc@scylladb.com>	2016-01-27 09:48:31 +01:00
Asias He	b2f2c1c28c	storage_service: Add drain on shutdown logic We register engine().at_exit() callbacks when we initialize the services. We do not really call the callbacks at the moment due to #293. It is pretty hard to see the whole picture of the order in which the services are shut down. Instead of having each service register its own at_exit() callback, I propose a single at_exit() callback which does the shutdown for all the services. In cassandra, the shutdown work is done in storage_service::drain_on_shutdown callbacks. In this patch, the drain_on_shutdown is executed during shutdown. As a result, the proper gossip shutdown is executed and fixes #790. With this patch, when Ctrl-C on a node, it looks like: INFO [shard 0] storage_service - Drain on shutdown: starts INFO [shard 0] gossip - Announcing shutdown INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal INFO [shard 0] storage_service - Drain on shutdown: stop_gossiping done INFO [shard 0] storage_service - CQL server stopped INFO [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done INFO [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done INFO [shard 0] storage_service - Drain on shutdown: flush column_families done INFO [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO [shard 0] storage_service - Drain on shutdown: done	2016-01-27 11:45:52 +08:00
Asias He	d6e352706a	storage_service: Drop duplicated print We have done that in the logger.	2016-01-01 10:15:17 +08:00
Amnon Heiman	71905081b1	API: report the load map as an unformatted double In origin, the storage_service reports the load map as a formatted string. As an API, a better option is to report the load map as a double and let the JMX proxy do the formatting. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-29 11:55:34 +02:00
Amnon Heiman	2c79fe1488	storage_service: describe_ring return full data The describe_ring method in storage_service did not report the start and end tokens. Also, for rpc addresses that are not the local address, it returned the value representation (including the version) and not just the address. Fixes #695 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-28 09:56:12 +02:00
Asias He	f57ba6902b	storage_service: Introduce ring_delay_ms option It is hard-coded as 30 seconds at the moment. Usage: $ scylla --ring-delay-ms 5000 Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.	2015-12-25 15:08:22 +08:00
Paweł Dziepak	8ee1a44720	storage_service: implement get_drain_progress() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-17 14:06:40 +01:00
Asias He	57ee9676c2	storage_service: Fix default ring_delay time It is 30 seconds instead of 5 seconds by default. To align with c*. Please note that after this a node will take at least 30 seconds to complete a bootstrap.	2015-12-11 09:05:19 +02:00
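
The join-time feature check described in commit 4f3ce42163 behaves, judging by the test cases listed in that message, like a subset test: the joining node is accepted when it understands every feature common to the existing nodes. The block below is a minimal sketch under that assumption, not the code from the patch; the function name feature_check_passes and the use of std::set<std::string> are hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical helper (not from the Scylla source): returns true when every
// feature in remote_common_features is also present in local_features,
// i.e. the joining node understands all features the cluster already shares.
bool feature_check_passes(const std::set<std::string>& local_features,
                          const std::set<std::string>& remote_common_features) {
    // std::set iterates in sorted order, which std::includes requires.
    return std::includes(local_features.begin(), local_features.end(),
                         remote_common_features.begin(), remote_common_features.end());
}
```

Under this sketch, an old node with features {} is rejected by a cluster whose common features are {RANGE_TOMBSTONES} (case 2.2 in the commit message), while any node may join a mixed cluster whose common feature set is empty (cases 3.1 and 3.2).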


211 Commits