mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-31 12:06:44 +00:00
Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test
The test does:
- Insert data on node1 only
- Insert data on node2 only
- Run repair on node1 and stop node1
once "starting user-requested repair" is seen
The repair shutdown code may wait for the stream session to complete for
a very long time if node 1 finishes sending data to node2 and is waiting
for node2 to send data to it, when node1 is stopped. The stream session
will not be closed in this case until stream session _keep_alive_timeout
(10 minutes) expires. Instead of waiting for the stream_session keep
alive timer to expire, we can fail all the stream sessions during
shutdown.
Before 1 - The bad case (repair shutdown will last for 10 minutes):
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:23:56,618 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:23:58,669 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:25:56,624 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] The session 0x600021516c00 made no progress with peer 127.0.0.2
Before 2 - The good case:
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:18:34,155 [shard 0] messaging_service - Retry verb=19 to 127.0.0.2:0, retry=10: rpc::closed_error (connection is closed)
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] COMPLETE_MESSAGE for 127.0.0.2 has failed: rpc::closed_error (connection is closed)
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Streaming error occurred
INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Session with 127.0.0.2 is complete, state=FAILED
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] bytes_sent = 0, bytes_received = 245000
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Stream failed, peers={127.0.0.2}
WARN 2016-09-21 16:18:34,155 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:18:34,155 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:18:34,156 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:18:34,199 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:18:34,199 [shard 0] repair - Completed shutdown of repair
INFO 2016-09-21 16:18:34,199 [shard 0] compaction_manager - Asked to stop
INFO 2016-09-21 16:18:34,199 [shard 1] compaction_manager - Asked to stop
After:
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:06:21,685 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Session with 127.0.0.2 is complete, state=FAILED
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - stream_manager stopped
INFO 2016-09-21 16:06:23,688 [shard 1] storage_service - stream_manager stopped
INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] bytes_sent = 0, bytes_received = 25725
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown stream_manager done
WARN 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Stream failed, peers={127.0.0.2}
WARN 2016-09-21 16:06:23,688 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:06:23,688 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:06:23,774 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:06:23,774 [shard 0] repair - Completed shutdown of repair
INFO 2016-09-21 16:06:23,774 [shard 0] compaction_manager - Asked to stop
INFO 2016-09-21 16:06:23,774 [shard 1] compaction_manager - Asked to stop