scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 10:00:35 +00:00

Author	SHA1	Message	Date
Asias He	27cf758f12	streaming: Get rid of the keep alive timer in streaming There is no guarantee that rpc streaming makes progress within any given time period. Remove the keep alive timer in streaming to avoid killing the session when the rpc streaming is just slow. The keep alive timer is used to close the session in the following case: n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver) kill -9 n2 We need this because we do not kill the session when gossip thinks a node is down, because we think the node being down might only be temporary and it is a waste to drop the previous work that has been done, especially when the stream session takes a long time. Since in range_streamer we do not stream all data in a single stream session (we stream 10% of the data at a time) and we have retry logic, I think it is fine to kill a stream session when gossip thinks a node is down. This patch changes to close all stream sessions with the node that gossip thinks is down. Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com> (cherry picked from commit `b8158dd65d`)	2019-03-20 19:47:11 +01:00
Duarte Nunes	9776a048e7	Merge 'Generating view updates during streaming' from Piotr During streaming, there are cases when we should invoke the view write path. In particular, if we're streaming because of repair or if a view has not yet finished building and we're bootstrapping a new node. The design constraints are: 1) The streamed writes should be visible to new writes, but the sstable should not participate in compaction, or we would lose the ability to exclude the streamed writes on a restart; 2) The streamed writes must not be considered when generating view updates for them; 3) Resilient to node restarts; 4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges. We achieve this by writing the streamed writes to an sstable in a different folder, call it "staging". We achieve 1) by publishing the sstable to the column family sstable set, but excluding it from compactions. We do these steps upon boot, by looking at the staging directory, thus achieving 3). 
Fixes #3275 * 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits) tests: add materialized views test tests: add view update generator to cql test env main: add registering staging sstables read from disk database: add a check if loaded sstable is already staging database: add get_staging_sstable method streaming: stream tables with views through staging sstables streaming: add system distributed keyspace ref to streaming streaming: add view update generator reference to streaming main: add generating missed mv updates from staging sstables storage_service: move initializing sys_dist_ks before bootstrap db/view: add view_update_from_staging_generator service db/view: add view updating consumer table: add stream_view_replica_updates table: split push_view_replica_updates table: add as_mutation_source_excluding table: move push_view_replica_updates to table.cc database: add populating tables with staging sstables database: add creating /staging directory for sstables database: add sstable-excluding reader table: add move_sstable_from_staging_in_thread function ... (cherry picked from commit `a38f6078fb`)	2018-11-15 17:46:20 +02:00
Asias He	10cf97375e	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com> (cherry picked from commit `7f826d3343`)	2018-11-15 17:45:31 +02:00
Asias He	ad7b132188	Revert "streaming: Do not abort session too early in idle detection" This reverts commit `f792c78c96`. With the "Use range_streamer everywhere" (`7217b7ab36`) series, all the users of streaming now stream relatively small ranges and can retry streaming at a higher level. Reduce the time-to-recover from 5 hours to 10 minutes per stream session. Even if the 10 minutes idle detection might cause more false positives, it is fine, since we can retry the "small" stream session anyway. In the long term, we should replace the whole idle detection logic with a simpler rule: whenever the stream initiator goes away, the stream slave goes away. Message-Id: <75f308baf25a520d42d884c7ef36f1aecb8a64b0.1520992219.git.asias@scylladb.com>	2018-03-14 10:11:00 +02:00
Asias He	774307b3a7	streaming: Do not send failed message for uninitialized session The uninitialized session has no peer associated with it yet. There is no point sending the failed message when aborting the session. Sending the failed message in this case will send to a peer with uninitialized dst_cpu_id, which will cause the receiver to pass a bogus shard id to smp::submit_to, which causes a segfault. In addition, to be safe, initialize the dst_cpu_id to zero, so that an uninitialized session will send the message to shard zero instead of a random bogus shard id. Fixes the segfault issue found by repair_additional_test.py:RepairAdditionalTest.repair_abort_test Fixes #3115 Message-Id: <9f0f7b44c7d6d8f5c60d6293ab2435dadc3496a9.1515380325.git.asias@scylladb.com>	2018-01-08 15:04:06 +02:00
Asias He	a9dab60b6c	streaming: One cf per time on sender When there is a large number of column families, the sender will send all the column families in parallel. We allow 20% of shard memory for streaming on the receiver, so each column family gets 1/N of that memory for its memtable, where N is the number of in-flight column families. Large N causes a lot of small sstables to be generated. It is possible that there are multiple senders to a single receiver, e.g., when a new node joins the cluster, so the maximum number of in-flight column families is the number of peer nodes. The column families are sent in the order of cf_id. It is not guaranteed that all peers have the same speed and therefore send the same cf_id at the same time, though. We still have a chance that some of the peers are sending the same cf_id. Fixes #3065 Message-Id: <46961463c2a5e4f1faff232294dc485ac4f1a04e.1513159678.git.asias@scylladb.com>	2017-12-13 12:32:41 +02:00
Avi Kivity	85a6a2b3cb	streaming: remove unneeded includes	2017-09-12 10:43:39 +03:00
Asias He	fad34801bf	streaming: Introduce streaming::abort() It will be used soon by stream_plan::abort() to abort a stream session.	2017-08-30 15:19:50 +08:00
Asias He	eace5fc6e8	streaming: Introduce received_failed_complete_message It is the handler for the failed complete message. Add a flag to remember whether we received such a message from the peer; if so, do not send the failed complete message back to the peer when running close_session with failed status.	2017-08-30 15:18:27 +08:00
Asias He	ca5248cd58	streaming: Introduce send_failed_complete_message Currently, send_complete_message is not used. We will use it shortly in case the local session is failed. Send a complete message with failed flag to notify peer node that the session is failed so that peer can close the session. This can speed up the closing of failed session. Also rename it to send_failed_complete_message.	2017-07-19 10:11:04 +08:00
Asias He	7599c1524d	streaming: Remove unused session_failed function It is never used. Get rid of it.	2017-07-18 11:22:09 +08:00
Asias He	f792c78c96	streaming: Do not abort session too early in idle detection Streaming usually takes a long time to complete. Aborting it on a false positive idle detection can be very wasteful. Increase the abort timeout from 10 minutes to a very large timeout, 300 minutes. A really idle session will be aborted eventually if other mechanisms (e.g., the streaming manager's gossip callbacks for on_remove and on_restart events) do not abort the session first. Fixes #2197 Message-Id: <57f81bfebfdc6f42164de5a84733097c001b394e.1494552921.git.asias@scylladb.com>	2017-05-24 12:29:50 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Paweł Dziepak	f2ae31711e	streaming: inform CF when streaming fails Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	dca9e594cc	streaming: Remove the unused test code It is introduced in the early development of streaming. We have dtest for streaming now, drop it. Message-Id: <1457499303-21163-1-git-send-email-asias@scylladb.com>	2016-03-09 10:31:42 +02:00
Asias He	1f3928c321	streaming: Hook streaming with gossip callback If the peer node of a stream_session is restarted or removed we should abort the streaming. It is better to hook the gossip callback in the stream manager than in each stream_session.	2016-03-09 07:35:20 +08:00
Asias He	50bf65db8d	streaming: Fix keep alive timer progress checking The first time the keep alive timer fires, _last_stream_bytes will be zero since it is the first time we update it. The keep alive timer will be rearmed and fire again. The second time, if we find there is no progress, we close the session. The total idle time will be 2 * keep alive timer. To make the idle time before closing the session more precise, we reduce the interval at which we check the progress, and close the session by checking the last time progress was made. Message-Id: <c959cffce0cc738a3d73caaf71d2adb709d46863.1456831616.git.asias@scylladb.com>	2016-03-01 16:46:08 +02:00
Asias He	fd5f3cff47	streaming: Fix stream_manager progress api For each stream_session, we pretend we are sending/receiving one file, to make it compatible with nodetool. For receiving_files, the file name is "rxnofile". For sending_files, the file name is "txnofile". stream_manager::update_all_progress_info is introduced to update the progress info of all the stream_sessions in the node. We need this because streaming mutations are received on all the cores, but the stream_session object is only on one of the cores. It adds overhead if we update progress info in stream_session object whenever we receive a streaming mutation. So, what we do now is when we really need the progress info, we update the progress info in stream_session object. With http://127.0.0.$i:10000/stream_manager/, it looks like below when decommission node 3 in a 3 nodes cluster. =========== GET NODE 1 [{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description": "Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction": "IN", "file_name": "rxnofile", "session_index": 0, "total_bytes": 16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key": "rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0, "cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}] =========== GET NODE 2 [{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description": "Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction": "IN", "file_name": "rxnofile", "session_index": 0, "total_bytes": 16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key": "rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0, "cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}] =========== GET NODE 3 [{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description": "Unbootstrap", "sessions": 
[{"sending_files": [{"value": {"direction": "OUT", "file_name": "txnofile", "session_index": 0, "total_bytes": 16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key": "txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0, "cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state": "PREPARING", "connecting": "127.0.0.1", "peer": "127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT", "file_name": "txnofile", "session_index": 0, "total_bytes": 16755552, "peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0, "cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state": "PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]	2016-02-26 17:38:37 +08:00
Asias He	37f52d632f	streaming: Remove unused progress() function	2016-02-26 17:38:37 +08:00
Asias He	d146045bc5	Revert "Revert "streaming: Send mutations on all shards"" This brings back streaming on all shards. The bug in locator/abstract_replication_strategy is now fixed. This reverts commit `9f3061ade8`. Message-Id: <a79ce9cdd6f4af1c6088b89e1911b4b2ed1c10ae.1455589460.git.asias@scylladb.com>	2016-02-16 11:16:51 +02:00
Avi Kivity	9f3061ade8	Revert "streaming: Send mutations on all shards" This reverts commit `31d439213c`. Fixes #894. Conflicts: streaming/stream_manager.cc (may have undone part of `63a5aa6122`)	2016-02-09 18:26:14 +02:00
Asias He	31d439213c	streaming: Send mutations on all shards Currently, only the shard where the stream_plan is created will send streaming mutations. To utilize all the available cores, we can make each shard send the mutations it is responsible for. On the receiver side, we do not forward the mutations to the shard where the stream_session is created, so that we can avoid unnecessary forwarding. Note: the downside is that it is now harder to: 1) track the number of bytes sent and received 2) update the keep alive timer upon receipt of the STREAM_MUTATION To fix, we now store the sent/received bytes info on all shards. When the keep alive timer expires, we check if any progress has been made. Hopefully, this patch will make the streaming much faster and in turn make the repair/decommission/adding a node faster. Refs: https://github.com/scylladb/scylla/issues/849 Tested with decommission/repair dtest. Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>	2016-02-07 10:57:51 +02:00
Asias He	360df6089c	streaming: Remove unused stream_session::retry	2016-01-29 16:31:07 +08:00
Asias He	2f48d402e2	streaming: Remove unused commented code	2016-01-29 16:31:07 +08:00
Asias He	ed3da7b04c	streaming: Drop flush_tables option for add_transfer_ranges We do not stream sstable files. No need to flush it.	2016-01-29 16:31:07 +08:00
Asias He	46bec5980b	streaming: Put session_info inside stream_session It is 1:1 mapping between session_info and stream_session. Putting session_info inside stream_session, we can get rid of the stream_coordinator::host_streaming_data class.	2016-01-29 16:31:07 +08:00
Asias He	c4bdb6f782	streaming: Wire up session progress The progress info is needed by JMX api.	2016-01-29 16:31:07 +08:00
Asias He	03aced39c4	streaming: Account number of bytes sent and received per session The API will consume it soon.	2016-01-27 18:16:58 +08:00
Asias He	e8b8b454df	streaming: Flatten streaming messages class namespace There are only two messages: prepare_message and outgoing_file_message. Actually only the prepare_message is the message we send on wire. Flatten the namespace.	2016-01-26 13:04:29 +08:00
Asias He	eba9820b22	streaming: Remove stream_session::file_sent It is the callback after sending file_message_header. In scylla, we do not send the file_message_header. Drop it.	2016-01-25 17:25:34 +08:00
Asias He	fa4e94aa27	streaming: Get rid of keep_ss_table_level We stream mutation instead of files, so keep_ss_table_level is not relevant for us.	2016-01-25 16:58:57 +08:00
Asias He	2cc31ac977	streaming: Get rid of the stream_index It is always zero.	2016-01-25 16:58:57 +08:00
Asias He	2a04e8d70e	streaming: Drop streaming/messages/incoming_file_message It is not used.	2016-01-25 11:38:13 +08:00
Asias He	bdd6a69af7	streaming: Drop unused parameters - int connections_per_host Scylla does not create connections per stream_session, instead it uses rpc, thus connections_per_host is not relevant to scylla. - bool keep_ss_table_level - int repaired_at Scylla does not stream sstable files. They are not relevant to scylla.	2016-01-25 11:38:13 +08:00
Asias He	767e25a686	streaming: Remove the _handlers helper It is introduced to help to run the invoke_on_all, we can reuse the distributed<database> db for it. Message-Id: <1453283955-23691-1-git-send-email-asias@scylladb.com>	2016-01-20 13:58:44 +02:00
Asias He	2345cda42f	messaging_service: Rename shard_id to msg_addr Use shard_id as the destination of the messaging_service is confusing, since shard_id is used in the context of cpu id. Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>	2016-01-07 10:36:35 +02:00
Asias He	1b3d2dee8f	streaming: Drop src_cpu_id parameter Now that we can get the src_cpu_id from rpc::client_info. No need to pass it as verb parameter.	2015-12-31 11:25:09 +01:00
Asias He	89b79d44de	streaming: Get rid of the _connecting_ parameter messaging_service will use private ip address automatically to connect a peer node if possible. There is no need for the upper level like streaming to worry about it. Drop it simplifies things a bit.	2015-12-31 11:25:08 +01:00
Avi Kivity	827a4d0010	Merge "streaming: Invalidate cache upon receiving of stream" from Asias "When a node gains or regains responsibility for certain token ranges, streaming will be performed; upon receiving the stream data, the row cache is invalidated for that range. Refs #484."	2015-12-28 10:24:46 +02:00
Asias He	20c258f202	streaming: Fix session hang with maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE The problem is that we set the session state to WAIT_COMPLETE in send_complete_message's continuation, the peer node might send COMPLETE_MESSAGE before we run the continuation, thus we set the wrong status in COMPLETE_MESSAGE's handler and will not close the session. Before: GOT STREAM_MUTATION_DONE receive task_completed SEND COMPLETE_MESSAGE to 127.0.0.2:0 GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0 complete: PREPARING -> WAIT_COMPLETE GOT COMPLETE_MESSAGE Reply maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE After: GOT STREAM_MUTATION_DONE receive task_completed maybe_completed: PREPARING -> WAIT_COMPLETE SEND COMPLETE_MESSAGE to 127.0.0.2:0 GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0 complete: WAIT_COMPLETE -> COMPLETE Session with 127.0.0.2 is complete	2015-12-24 20:34:44 +08:00
Asias He	c971fad618	streaming: Introduce keep alive timer for each stream_session If the session is idle for 10 minutes, close the session. This can detect the following hangs: 1) if the sending node is gone, the receiving peer will wait forever 2) if the node which should send COMPLETE_MESSAGE to the peer node is gone, the peer node will wait forever Fixes simple_kill_streaming_node_while_bootstrapping_test.	2015-12-24 20:34:44 +08:00
Asias He	2d32195c32	streaming: Invalidate cache upon receiving of stream When a node gains or regains responsibility for certain token ranges, streaming will be performed; upon receiving the stream data, the row cache is invalidated for that range. Refs #484.	2015-12-21 14:44:13 +08:00
Asias He	517fd9edd4	streaming: Add helper to get distributed<database> db	2015-12-21 14:42:47 +08:00
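
The progress check described in commit `50bf65db8d` (poll more often than the idle timeout and close the session based on the time of the last observed progress) can be illustrated with a small model. This is only a sketch under stated assumptions: `IdleDetector`, `check`, the 600 second timeout and the 60 second polling interval are hypothetical names and values chosen for illustration, not ScyllaDB code.

```python
from dataclasses import dataclass


@dataclass
class IdleDetector:
    """Hypothetical model of a per-session idle check; not ScyllaDB code."""

    idle_timeout: float              # seconds without progress before closing
    last_bytes: int = 0              # byte count seen at the previous check
    last_progress_at: float = 0.0    # time of the most recent observed progress

    def check(self, now: float, total_bytes: int) -> bool:
        """Return True when the session should be closed as idle."""
        if total_bytes != self.last_bytes:
            # Progress was made since the previous check: remember when.
            self.last_bytes = total_bytes
            self.last_progress_at = now
            return False
        return now - self.last_progress_at >= self.idle_timeout


detector = IdleDetector(idle_timeout=600.0)
closed_at = None
# Poll every 60 s, which is much shorter than the 600 s idle timeout.
for tick in range(0, 1500, 60):
    # In this made-up scenario the peer stops sending after t = 120 s.
    streamed = min(tick, 120) * 1000
    if detector.check(float(tick), streamed):
        closed_at = tick
        break

print(closed_at)  # 720: exactly 600 s after the last progress at t = 120 s
```

Because the check runs on a short interval and compares against the time of the last progress, the session in this model is closed one timeout after progress stops, rather than up to twice the timer period as in the behaviour the commit message describes.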


102 Commits
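
The ordering problem fixed in commit `20c258f202` can be reproduced in a simplified model. The sketch below assumes a two-step handshake in which a session closes only if it is already in WAIT_COMPLETE when the peer's COMPLETE_MESSAGE arrives; the `Session` class and its `set_state_before_send` and `peer_replies_first` parameters are hypothetical and exist only to contrast the two orderings shown in the commit's "Before" and "After" logs.

```python
from enum import Enum


class State(Enum):
    PREPARING = "PREPARING"
    WAIT_COMPLETE = "WAIT_COMPLETE"
    COMPLETE = "COMPLETE"


class Session:
    """Hypothetical, simplified stream session; not ScyllaDB code."""

    def __init__(self, set_state_before_send: bool):
        self.state = State.PREPARING
        self.set_state_before_send = set_state_before_send

    def on_complete_message(self) -> None:
        # Handler for the peer's COMPLETE_MESSAGE: close only if we have
        # already recorded that our own side is done.
        if self.state is State.WAIT_COMPLETE:
            self.state = State.COMPLETE
        else:
            self.state = State.WAIT_COMPLETE

    def maybe_completed(self, peer_replies_first: bool) -> None:
        if self.set_state_before_send:
            # Fixed ordering: record the state, then send.
            self.state = State.WAIT_COMPLETE
            if peer_replies_first:
                self.on_complete_message()
        else:
            # Old ordering: send first; the peer's message may be handled
            # before the continuation that records the state runs.
            if peer_replies_first:
                self.on_complete_message()
            self.state = State.WAIT_COMPLETE


old = Session(set_state_before_send=False)
old.maybe_completed(peer_replies_first=True)

new = Session(set_state_before_send=True)
new.maybe_completed(peer_replies_first=True)

print(old.state.value, new.state.value)
```

In this model the old ordering leaves the session stuck in WAIT_COMPLETE, matching the "maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE" line in the commit message, while the fixed ordering reaches COMPLETE.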