Commit Graph

202 Commits

Author SHA1 Message Date
Asias He
d47d46e1a8 streaming: Use streaming_write_priority for the sstable writer
Use the streaming io priority otherwise it uses the default io priority.

Message-Id: <e1836a9a93e7204d4bc9bba9c841d57c8b24aff8.1533715438.git.asias@scylladb.com>
2018-08-08 11:08:06 +03:00
Asias He
deff5e7d60 streaming: Add rpc streaming support
This patch changes scylla streaming to use the recently added rpc
streaming feature provided by seastar to send mutation fragments for
scylla streaming instead of the rpc verbs.

It also changes the receiver to write to the sstable file directly,
skipping writing to memtable.
2018-07-13 08:36:47 +08:00
Asias He
e20038eb84 streaming: Handle stream_mutation rpc handler on all shards
In streaming, the sender sends the mutations on all the local shards in
parallel, it is possible that the receiver handle more than one such
connection on the same shard. It is determined by where the tcp
connection goes. Current rpc ignores the dest shard id when sending the
rpc message.

For instance, say node1 has 2 shards, node2 has 2 shards. Currently, we
can end up with like this:

   Node 1 shard 0 -> Node 2 shard 1
   Node 1 shard 1 -> Node 2 shard 1

It is better if we do:

   Node 1 shard 0 -> Node 2 shard 0
   Node 1 shard 1 -> Node 2 shard 1

This patch solves this problem by let the handler always handle on
shard = src_cpu_id % smp::count.

If sender and receiver have the same shard config, it is completely
distributed the work evenly.

If sender and receiver do not have the same shard config, it is
unavoidable some of the shard will do more work than the others.

Tests: dtest update_cluster_layout_tests.py

Message-Id: <911827bcf67459a07ec92623a9ed4c4fbba195ca.1524622375.git.asias@scylladb.com>
2018-05-19 21:08:25 +03:00
Asias He
774307b3a7 streaming: Do send failed message for uninitialized session
The uninitialized session has no peer associated with it yet. There is
no point sending the failed message when abort the session. Sending the
failed message in this case will send to a peer with uninitialized
dst_cpu_id which will casue the receiver to pass a bogus shard id to
smp::submit_to which cases segfault.

In addition, to be safe, initialize the dst_cpu_id to zero. So that
uninitialized session will send message to shard zero instead of random
bogus shard id.

Fixes the segfault issue found by
repair_additional_test.py:RepairAdditionalTest.repair_abort_test

Fixes #3115
Message-Id: <9f0f7b44c7d6d8f5c60d6293ab2435dadc3496a9.1515380325.git.asias@scylladb.com>
2018-01-08 15:04:06 +02:00
Raphael S. Carvalho
95d1995876 fix compilation of stream_session.cc
stream_session.cc:417:62: error: cannot call member function ‘utils::UUID streaming::stream_session::plan_id()’ without object
         sslog.warn("[Stream #{}] Failed to send: {}", plan_id(), ep);

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171214022621.19442-1-raphaelsc@scylladb.com>
2017-12-14 10:57:33 +01:00
Asias He
a9dab60b6c streaming: One cf per time on sender
In the case there are large number of column families, the sender will
send all the column families in parallel. We allow 20% of shard memory
for streaming on the receiver, so each column family will have 1/N, N is
the number of in-flight column families, memory for memtable. Large N
causes a lot of small sstables to be generated.

It is possible there are multiple senders to a single receiver, e.g.,
when a new node joins the cluster, the maximum in-flight column families
is number of peer node. The column families are sent in the order of
cf_id. It is not guaranteed that all peers has the same speed so they
are sending the same cf_id at the same time, though. We still have
chance some of the peers are sending the same cf_id.

Fixes #3065

Message-Id: <46961463c2a5e4f1faff232294dc485ac4f1a04e.1513159678.git.asias@scylladb.com>
2017-12-13 12:32:41 +02:00
Asias He
fad34801bf streaming: Introduce streaming::abort()
It will be used soon by stream_plan::abort() to abort a stream session.
2017-08-30 15:19:50 +08:00
Asias He
7fba7cca01 streaming: Make stream_manager and coordinator message debug level
When we abort a session, it is possible that:

node 1 abort the session by user request
node 1 send the complete_message to node 2
node 2 abort the session upon receive of the complete_message
node 1 sends one more stream message to node 2 and the stream_manager
for the session can not be found.

It is fine for node 2 to not able to find the stream_manager, make the
log on node 2 less verbose to confuse user less.
2017-08-30 15:19:50 +08:00
Asias He
be573bcafb streaming: Check if _stream_result is valid
If on_error() was called before init() was executed, the
_stream_result can be invalid.
2017-08-30 15:19:44 +08:00
Asias He
8a3f6acdd2 streaming: Log peer address in on_error 2017-08-30 15:18:27 +08:00
Asias He
eace5fc6e8 streaming: Introduce received_failed_complete_message
It is the handler for the failed complete message. Add a flag to
remember if we received a such message from peer, if so, do not send
back the failed complete message back to the peer when running
close_session with failed status.
2017-08-30 15:18:27 +08:00
Duarte Nunes
85e85ec72e Don't catch polymorphic exceptions by value
It makes gcc a very sad compiler.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170726172053.5639-2-duarte@scylladb.com>
2017-07-27 09:39:58 +03:00
Asias He
aa87429e67 streaming: Send complete message with failed flag when session is failed
To notify peer node the session is failed.
2017-07-19 10:11:05 +08:00
Asias He
03b838705c streaming: Handle failed flag in complete message
Fail the current session if the failed flag is on in the complete
message handler.
2017-07-19 10:11:05 +08:00
Asias He
12d18cfab4 streaming: Do not fail the session when failed to send complete message
Since the complete message is not mandatary, no point to fail the session
in case failed to send the complete message.
2017-07-19 10:11:04 +08:00
Asias He
ca5248cd58 streaming: Introduce send_failed_complete_message
Currently, send_complete_message is not used. We will use it shortly in
case the local session is failed. Send a complete message with failed
flag to notify peer node that the session is failed so that peer can
close the session. This can speed up the closing of failed session.

Also rename it to send_failed_complete_message.
2017-07-19 10:11:04 +08:00
Asias He
f21cb75cdb streaming: Do not send complete message when session is successful
The complete_message is not needed and the handler of this rpc message
does nothing but returns a ready future. The patch to remove it did not
make into the Scylla 1.0 release so it was left there.
2017-07-18 15:29:42 +08:00
Asias He
0ba4e73068 streaming: Introduce the failed parameter for complete message
Use this flag to notify the peer that the session is failed so that the
peer can close the failed session more quickly.

The flag is used as a rpc::optional so it is compatible use old
version of the verb.
2017-07-18 11:24:31 +08:00
Asias He
7599c1524d streaming: Remove unused session_failed function
It is never used. Get rid of it.
2017-07-18 11:22:09 +08:00
Asias He
caad7ced23 streaming: Less verbose in logging
Now, we will have large number of small streaming. Make the
not very important logging message debug level.
2017-07-18 11:17:09 +08:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Tomasz Grabiec
c1a7e2090e Revert "database: change find_column_families signature so it returns a lw_shared_ptr"
This reverts commit f3528ede65.
2016-11-04 10:48:21 +01:00
Glauber Costa
f3528ede65 database: change find_column_families signature so it returns a lw_shared_ptr
There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed in the deferring point, we need to go and find it
again.

If we use lw_shared_ptr, however, we'll be able to at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of re-finding it just fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
2016-11-03 13:27:31 +01:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Asias He
a0020fdad2 stream_session: Allow adding ranges to a cf more than once
Append the ranges to a stream_transfer_task if the cf is already added to
_transfers in add_transfer_ranges.
2016-09-26 06:28:50 +08:00
Duarte Nunes
aaa76d58ba query: Move to_partition_range to dht namespace
This patch moves to_partition_range, from the query namespace
to the dht namespace, where it is a more natural fit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>
2016-07-15 10:41:52 +02:00
Paweł Dziepak
3fe1aec29d streaming: avoid word "ERROR" in non-error messages
Some tools (e.g. ccm) get confused and consider messages containing word
"ERROR" as error level messagess irrespectively of their actual severity
level.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1468399752-5228-1-git-send-email-pdziepak@scylladb.com>
2016-07-13 12:06:33 +03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming a special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complte. Also, in case the streaming fails all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
f2ae31711e streaming: inform CF when streaming fails
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4031c0ed8f streaming: pass plan_id to column family for apply and flush
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.

Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Asias He
94c9211b0e streaming: Switch log level to warn instead of error
dtest takes error level log as serious error. It is not a serious error
for streaming to fail to send a verb and fail a streaming session, for
example, the peer node is gone or stopped. Switch to use log level warn
instead of level error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>
2016-06-15 13:01:22 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Asias He
62d443a07d streaming: Fix log of plan_id and session address in stream_session
They are get swapped. Fix it up. Spotted by looking at the log.
Message-Id: <d163d71e9a96d1a45c3a4c529519790eeff7c486.1459172778.git.asias@scylladb.com>
2016-03-29 09:01:06 +03:00
Asias He
6fd6e57e80 streaming: Harden keep alive timer
- Do nothing in case the session is closed, to prevent we fire up the
  timer again

- Print log info when no progress has been made if the time expires, it
  is very useful to debug a idle session

- Grab a reference when the keep alive timer is running

Message-Id: <9f2cc3164696905a6a39c0d072a980765d598dfd.1458782956.git.asias@scylladb.com>
2016-03-24 11:58:54 +02:00
Asias He
fe263e5436 Revert "Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE""
This reverts commit 1f29a698d5.
2016-03-24 08:43:17 +08:00
Asias He
a6dd6e6d55 Revert "Revert "streaming: Simplify session completion logic""
This reverts commit 354fca9d56.
2016-03-24 07:48:27 +08:00
Asias He
c2eff7e824 streaming: Complete receive task after the flush
A STREAM_MUTATION_DONE message will signal the receiver that the sender
has completed the sending of streams mutations. When the receiver finds
it has zero task to send and zero task to receive, it will finish the
stream_session, and in turn finish the stream_plan if all the
stream_sessions are finished. We should call receive_task_completed only
after the flush finishes so that when stream_plan is finshed all the
data is on disk.

Fixes repair_disjoint_data_test issue with Glauber's "[PATCH v4 0/9] Make
sure repairs do not cripple incoming load" serries

======================================================================
FAIL: repair_disjoint_data_test
(repair_additional_test.RepairAdditionalTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "scylla-dtest/repair_additional_test.py",
line 102, in repair_disjoint_data_test
    self.check_rows_on_node(node1, 3000)
  File "scylla-dtest/repair_additional_test.py",
line 33, in check_rows_on_node
    self.assertEqual(len(result), rows, len(result))
AssertionError: 2461
2016-03-23 09:40:49 -04:00
Glauber Costa
5fa866223d streaming: add incoming streaming mutations to a different sstable
Keeping the mutations coming from the streaming process as mutations like any
other have a number of advantages - and that's why we do it.

However, this makes it impossible for Seastar's I/O scheduler to differentiate
between incoming requests from clients, and those who are arriving from peers
in the streaming process.

As a result, if the streaming mutations consume a significant fraction of the
total mutations, and we happen to be using the disk at its limits, we are in no
position to provide any guarantees - defeating the whole purpose of the
scheduler.

To implement that, we'll keep a separate set of memtables that will contain
only streaming mutations. We don't have to do it this way, but doing so
makes life a lot easier. In particular, to write an SSTable, our API requires
(because the filter requires), that a good estimate on the number of partitions
is informed in advance. The partitions also need to be sorted.

We could write mutations directly to disk, but the above conditions couldn't be
met without significant effort. In particular, because mutations can be
arriving from multiple peer nodes, we can't really sort them without keeping a
staging area anyway.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:13:00 -04:00
Pekka Enberg
354fca9d56 Revert "streaming: Simplify session completion logic"
This reverts commit 208b7fa7ba. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:50 +02:00
Pekka Enberg
1f29a698d5 Revert "streaming: Start to send mutations after PREPARE_DONE_MESSAGE"
This reverts commit 4c06221766. It breaks
Glauber's upcoming repair series.
2016-03-22 20:37:22 +02:00
Asias He
4c06221766 streaming: Start to send mutations after PREPARE_DONE_MESSAGE
Below are 3 possible cases in a stream session, after commit
208b7fa7ba (streaming: Simplify session completion logic) We might
close the session before the exchange of the PREPARE_DONE_MESSAGE
message in case 1). To fix, we defer the sending of mutations after
PREPARE_DONE_MESSAGE is sent at the initiator node.

1)
Initiator         Follower
tx rx              tx rx
1  0               0  1
send prepare
                   send back prepare
recev prepare
send mutations (close the session before prepare_done msg is sent)
                  recv mutations (close session before prepare_done msg is received)
send prepare_done
                   recv prepare_done and send no mutations
2)
Initiator         Follower
tx rx              tx rx
0  1               1  0
send prepare
                    send back prepare
recv prepare
nothing to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)

3)
Initiator         Follower
tx rx              tx rx
1  1               1  1
send prepare
                    send back prepare
recv prepare
send mutations
                    recv mutations, can not close session since we have mutations to send
send prepare_done
                    recv prepare_done and send mutations (close session)
recv mutations (close session)
Message-Id: <d6510b558565db23202164fa491b883ef3796e58.1458634037.git.asias@scylladb.com>
2016-03-22 15:05:57 +02:00
Asias He
208b7fa7ba streaming: Simplify session completion logic
Both the initiator and follower of a stream session knows how many
transfer task and receive task the stream session contains in the
preparation phase. They use the _transfers and _receivers map to track
the tasks, like below:

       std::map<UUID, stream_transfer_task> _transfers;
       std::map<UUID, stream_receive_task> _receivers;

A stream_transfer_task will send STREAM_MUTATION verb to transfer data
with frozen_mutation, when all the STREAM_MUTATIONs are sent, it will
send STREAM_MUTATION_DONE to tell the peer the stream_transfer_task is
completed and remove the stream_transfer_task from _transfers map.  The
peer will remove the corresponding stream_receive_task in _receivers.

We do not really need the COMPLETE_MESSAGE verb to notify the peer we
have completed sending. It makes the session completion logic much
simpler and cleaner if we do not depend on COMPLETE_MESSAGE verb.

However, to be compatible with older version, we always send a
COMPLETE_MESSAGE message and do nothing in the COMPLETE_MESSAGE handler
and replies a ready future even if the stream_session is closed already.
This way, node with older version will get a COMPLETE_MESSAGE message
and manage to send a COMPLETE_MESSAGE message to new node as before.

Message-Id: <1458540564-34277-2-git-send-email-asias@scylladb.com>
2016-03-21 16:58:03 +02:00
Glauber Costa
e52b869b25 fix small typo
will sent -> will send

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20eaf0cea6fe14b03332547b7c4a3b85e9b619e7.1458325926.git.glauber@scylladb.com>
2016-03-18 20:34:22 +02:00
Glauber Costa
a3ebf640c6 stream_session: print debug message for STREAM_MUTATION
For this verb(), we don't call get_session - and it doesn't look like we will.
We currently have no debug message for this one, which makes it harder to debug
the stream of messages. Print it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:09:46 -04:00
Glauber Costa
0ab4275893 stream_session: remove duplicated debug message
Whenever we call get_session, that will print a debug message about the arrival
of this new verb. Because we also print that explicitly in PREPARE_DONE, that
message gets duplicated.

That confuses poor developers who are, for a while, left wondering why is it that
the sender is sender the message twice.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-16 22:04:25 -04:00
Asias He
2d50c71ca3 streaming: Handle cf is deleted after the deletion check
The cf can be deleted after the cf deletion check. Handle this case as
well.

Use "warn" level to log if cf is missing. Although we can handle the
case, but it is good to distingush where the receiver of streaming
applied all the stream mutations or not. We believe that the cf is
missing because it was dropped, but it could be missing because of a bug
or something we didn't anticipated here.

Related patch: "streaming: Handle cf is deleted when sending
STREAM_MUTATION_DONE"

Fixes simple_add_new_node_while_schema_changes_test failure.
Message-Id: <c4497e0500f50e0a3422efb37e73130765c88c57.1458090598.git.asias@scylladb.com>
2016-03-16 09:46:41 +01:00
Asias He
7c4c99d7c7 streaming: Fix a log level in get_column_family_stores
It is supposed to be debug level instead of info level.
2016-03-10 10:56:48 +08:00