storage_service: Stop cql server before gossip

We saw a failure in the dtest
concurrent_schema_changes_test.py:TestConcurrentSchemaChanges.changes_while_node_down_test:

======================================================================
ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test
    self.make_schema_changes(session, namespace='ns2')
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes
    session.execute('USE ks_%s' % namespace)
  File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
  File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result
    raise self._final_exception
ConnectionShutdown: Connection to 127.0.0.1 is closed

The test:

   session = self.patient_cql_connection(node2)
   self.prepare_for_changes(session, namespace='ns2')
   node1.stop()
   self.make_schema_changes(session, namespace='ns2') --> throws ConnectionShutdown

The problem is that, after receiving the DOWN event, the Python
Cassandra driver calls Cluster:on_down, which checks whether this client
has any connections to the node being shut down. If there are any
connections, the Cluster:on_down handler exits early, so the session
to the node being shut down is not removed.

If we shut down the cql server first, the connection count will be zero
by the time the DOWN event arrives, and the session will be removed.
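
The ordering issue can be illustrated with a toy model. This is only a
sketch of the behaviour described above, not the driver's actual code:
ToyCluster, close_connections and shutdown_node are hypothetical names
made up for this example, and the early exit in on_down is modelled
purely from the description in this message.

```python
class ToyCluster:
    """Hypothetical stand-in for the driver's cluster object (not cassandra.cluster.Cluster)."""

    def __init__(self):
        self.open_connections = {}  # host -> number of open connections
        self.sessions = set()       # hosts that still have a session attached

    def connect(self, host):
        self.open_connections[host] = self.open_connections.get(host, 0) + 1
        self.sessions.add(host)

    def close_connections(self, host):
        # Models the cql server on `host` shutting down and closing its sockets.
        self.open_connections[host] = 0

    def on_down(self, host):
        # Modelled on the description above: exit early while connections exist.
        if self.open_connections.get(host, 0) > 0:
            return False
        self.sessions.discard(host)
        return True


def shutdown_node(cluster, host, cql_first):
    """Replay the two shutdown orderings; return True if a stale session remains."""
    if cql_first:
        cluster.close_connections(host)  # cql server stops first
        cluster.on_down(host)            # gossip stops, DOWN event arrives
    else:
        cluster.on_down(host)            # DOWN event arrives while connections are open
        cluster.close_connections(host)  # cql server stops afterwards
    return host in cluster.sessions


old_order = ToyCluster()
old_order.connect("127.0.0.1")
stale_before_fix = shutdown_node(old_order, "127.0.0.1", cql_first=False)

new_order = ToyCluster()
new_order.connect("127.0.0.1")
stale_after_fix = shutdown_node(new_order, "127.0.0.1", cql_first=True)
```

In this model the gossip-first ordering leaves a stale session behind,
while the cql-first ordering removes it.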

Fixes: #4013
Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>
Asias He
2018-12-24 15:02:42 +08:00
committed by Pekka Enberg
parent 2f69ba2844
commit 4d3c463536


@@ -1298,12 +1298,12 @@ future<> storage_service::stop_transport() {
     return seastar::async([&ss] {
         slogger.info("Stop transport: starts");
-        gms::stop_gossiping().get();
-        slogger.info("Stop transport: stop_gossiping done");
         ss.shutdown_client_servers().get();
         slogger.info("Stop transport: shutdown rpc and cql server done");
+        gms::stop_gossiping().get();
+        slogger.info("Stop transport: stop_gossiping done");
         ss.do_stop_ms().get();
         slogger.info("Stop transport: shutdown messaging_service done");