Commit Graph

59 Commits

Author SHA1 Message Date
Tomasz Grabiec
fdcaaea91a tablets, streaming: Implement tablet streaming for intra-node migration 2024-05-16 00:28:46 +02:00
Aleksandra Martyniuk
219e1eda09 streaming: handle no_such_column_family from remote node gracefully
If no_such_column_family is thrown on remote node, then streaming
operation fails as the type of exception cannot be determined.

Use repair::with_table_drop_silenced in streaming to continue
operation if a table was dropped.
2024-02-15 12:06:47 +01:00
Kefu Chai
f86a5ae87a streaming: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16947
2024-01-23 19:38:30 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
1d07a596bf everywhere: drop before_change subscription
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Benny Halevy
a1acf6854b everywhere: reduce dependencies on i_partitioner.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:47:44 +02:00
Benny Halevy
c16ec870da gms: pass endpoint_state_ptr to endpoint_state change subscribers
Now that the endpoint_state isn't change in place
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
f74d154fe3 gossiper: use permit_id to serialize state changes while preventing deadlocks
Pass permit_id to subscribers when we acquire one
via lock_endpoint.  The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-01 17:41:57 +03:00
Pavel Emelyanov
678f8fb1b7 stream_manager: Add streaming sched group copy
The manager in question is responsible for maintaining the streaming
class IO bandwidth update. Nowadays it does it via priority manager's
global streaming IO priority class field, but it will need to switch to
streaming sched group.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-23 14:31:23 +03:00
Gleb Natapov
7caf1d26fb migration manager: Make schema pull abortable.
Now which schema pull may issues raft read barrier it may stuck if
majority is not available. Make the operation abortable and abort it
during queries if timeout is reached.
2023-05-11 16:31:23 +03:00
Kefu Chai
a34e417069 streaming: remove unused operator==
since this operator is used nowhere, let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13697
2023-04-28 12:39:17 +03:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, compiler generate operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.

fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generated the
operator==() if the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all memeber variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and mark the function as
`default`. moreover, if the class happen to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.

please note, because in C++20, both operator= and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of the another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Asias He
9ed401c4b2 streaming: Add finished percentage metrics for node ops using streaming
We have added the finished percentage for repair based node operations.

This patch adds the finished percentage for node ops using the old
streaming.

Example output:

scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000

In addition to the metrics, log shows the percentage is added.

[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646

Fixes #11600

Closes #11601
2022-09-22 14:19:34 +03:00
Benny Halevy
314e45d957 streaming: define plan_id as a strong tagged_uuid type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-22 19:45:30 +03:00
Benny Halevy
3554533e2c stream_manager: update_progress: rename cf_id param to plan_id
Before changing its type to streaming::plan_id
this patch clarifies that the parameter actually represents
the plan id and not the table id as its name suggests.

For reference, see the call to update_progress in
`stream_transfer_task::execute`, as well as the function
using _stream_bytes which map key is the plan id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-21 16:56:41 +03:00
Benny Halevy
c1fc0672a5 streaming: add forward declarations in stream_fwd.hh
To be used for defining streaming::plan_id
in the next patcvh.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-21 16:00:02 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Pavel Emelyanov
96d6be7daf streaming: Maintain class bandwidth
Same as was done in b112a983 for compaction manager

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-19 12:19:56 +03:00
Pavel Emelyanov
a246b6d3eb streaming: Pass db::config& to manager constructor
The stream_manager will bookkeep the streaming bandwidth option, to
subscribe on its changes it needs the config reference. It would be
better if it was stream_manager::config, but currently subscription on
db::config::<stuff> updates is not very shard-friendly, so we need to
carry the config reference itself around.

Similar trouble is there for compaction_manager. The option is passed
through its own config, but the config is created on each shard by
database code. Stream manager config would be created once by main code
on shard 0.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-19 12:18:08 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Pavel Solodovnikov
5dcfb94d5a gms: i_endpoint_state_change_subscriber: make callbacks to return futures
Coroutinize a few simple callbacks in the process.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Pavel Emelyanov
4a34226aa6 streaming, main: Remove global stream_manager
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:37 +03:00
Pavel Emelyanov
8ab96a8362 streaming: Move get_session into stream_manager
This makes the code a bit shorter and helps removing one more call
for global stream manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:59 +03:00
Pavel Emelyanov
228b4520a6 streaming: Use container.invoke_on in rpc handlers
This will help to reduce the usage of global manager instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:59 +03:00
Pavel Emelyanov
c2c676784a streaming: Fix interaction with gossiper
Streaming manager registers itself in gossiper, so it needs an explicit
dependency reference. Also it forgets to unregister itself, so do it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:59 +03:00
Pavel Emelyanov
73e10c7aed streaming: Move start/stop onto common rails
In case of streaming this mostly means dropping the global
init/uninit calls and replacing them with sharded<stream_manager>
instance. It's still global, but it's being fixed atm.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
08818ffe75 streaming: Rename .stop() into .shutdown()
The start/stop standard is becoming like

    sharded<foo> foo;
    foo.start();
    defer([] { foo.stop() });
    foo.invoke_on_all(&foo::start);
    ...
    defer([] { foo.shutdown() });
    wait_for_stop_signal();
    /* quit making the above defers self-unroll */

where .shutdown() for a service would mean "do whatever is
appropriate to start stopping, the real synchronous .stop() will
come some time later".

According to that, rename .stop() as it's really the mentioned
preparation, not real stopping.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
6d7eb76fad streaming: Use get_stream_manager to get dependencies
Currently streaming uses global pointers to save and get a
dependency. Now all the dependencies live on the manager,
this patch changes all the places in streaming/ to get the
needed dependencies from it, not from global pointer (next
patch will remove those globals).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
e448774588 streaming: Move rpc verbs reg/unreg into manager
As a part of streaming start/stop unification.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Pavel Emelyanov
165971fb7f streaming: Initialize stream manager with proper deps
The stream manager is going to become central point of control
for the streaming subsys. This patch makes its dependencies
explicit and prepares the gound for further patching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:15:58 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base 2019-11-26 02:24:49 +03:00
Asias He
d90836a2d3 streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic
Currently, they increases and decreases as the stream sessions are
created and destroyed. Make them prometheus monotonically increasing
counter for easier monitoring.

Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>
2018-12-30 16:52:17 +02:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Asias He
d6cebd1341 streaming: Listen on shutdown gossip callback
When a node shutdown itself, it will send a shutdown status to peer
nodes. When peer nodes receives the shtudown status update, they are
supposed to close all the sessions with that node becasue the node is
shutdown, no need to wait and timeout, then fail the session.

This change can speed up the closing of sessions.
2017-07-19 10:11:06 +08:00
Vlad Zolotarov
a850bea820 streaming::stream_manager: move a collectd counters registration to the metrics registration layer
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-01-10 16:24:54 -05:00
Asias He
a6d6341627 streaming: Add total_{incoming,outgoing}_bytes collectd metrics
It reflects number of bytes sent or received per second in streaming.

To use it:

$ tools/scyllatop/scyllatop.py "*streaming*"

Refs #1655
Message-Id: <5f7943cb2b459db5ed4bd8d7365532ea201ad2d9.1475116963.git.asias@scylladb.com>
2016-09-29 11:54:32 +02:00
Asias He
f377a3b7ac streaming: Fail streaming sessions during shutdown
Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

The test does:

- Insert data on node1 only
- Insert data on node2 only
- Run repair on node1 and stop node1
  once "starting user-requested repair" is seen

The repair shutdown code may wait for the stream session to complete for
a very long time if node 1 finishes sending data to node2 and is waiting
for node2 to send data to it, when node1 is stopped. The stream session
will not be closed in this case until stream session _keep_alive_timeout
(10 minutes) expires. Instead of waiting for the stream_session keep
alive timer to expire, we can fail all the stream sessions during
shutdown.

Before 1 - The bad case (repair shutdown will last for 10 minutes):

  INFO  2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Executing streaming plan for repair-in
  INFO  2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Starting streaming to 127.0.0.2
  INFO  2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Beginning stream session with 127.0.0.2
  INFO  2016-09-21 16:23:56,618 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
  INFO  2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: stop_gossiping done
  INFO  2016-09-21 16:23:58,625 [shard 0] storage_service - Thrift server stopped
  INFO  2016-09-21 16:23:58,625 [shard 0] storage_service - CQL server stopped
  INFO  2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - messaging_service stopped
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: shutdown messaging_service done
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: auth shutdown
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: done
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: stop_transport done
  INFO  2016-09-21 16:23:58,626 [shard 0] tracing - Asked to shut down
  INFO  2016-09-21 16:23:58,626 [shard 0] tracing - Tracing is down
  INFO  2016-09-21 16:23:58,626 [shard 1] tracing - Asked to shut down
  INFO  2016-09-21 16:23:58,626 [shard 1] tracing - Tracing is down
  INFO  2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: tracing is stopped
  INFO  2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: flush column_families done
  INFO  2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
  INFO  2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: done
  INFO  2016-09-21 16:23:58,669 [shard 0] repair - Starting shutdown of repair
  INFO  2016-09-21 16:25:56,624 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] The session 0x600021516c00 made no progress with peer 127.0.0.2

Before 2 - The good case:

  INFO  2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Executing streaming plan for repair-in
  INFO  2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Starting streaming to 127.0.0.2
  INFO  2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Beginning stream session with 127.0.0.2
  INFO  2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
  INFO  2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: stop_gossiping done
  INFO  2016-09-21 16:18:34,098 [shard 0] storage_service - Thrift server stopped
  INFO  2016-09-21 16:18:34,098 [shard 0] storage_service - CQL server stopped
  INFO  2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
  INFO  2016-09-21 16:18:34,155 [shard 0] messaging_service - Retry verb=19 to 127.0.0.2:0, retry=10: rpc::closed_error (connection is closed)
  WARN  2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] COMPLETE_MESSAGE for 127.0.0.2 has failed: rpc::closed_error (connection is closed)
  WARN  2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Streaming error occurred
  INFO  2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Session with 127.0.0.2 is complete, state=FAILED
  INFO  2016-09-21 16:18:34,155 [shard 0] storage_service - messaging_service stopped
  INFO  2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: shutdown messaging_service done
  INFO  2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] bytes_sent = 0, bytes_received = 245000
  WARN  2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Stream failed, peers={127.0.0.2}
  WARN  2016-09-21 16:18:34,155 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
  INFO  2016-09-21 16:18:34,155 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
  INFO  2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: auth shutdown
  INFO  2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: done
  INFO  2016-09-21 16:18:34,155 [shard 0] storage_service - Drain on shutdown: stop_transport done
  INFO  2016-09-21 16:18:34,155 [shard 0] tracing - Asked to shut down
  INFO  2016-09-21 16:18:34,155 [shard 0] tracing - Tracing is down
  INFO  2016-09-21 16:18:34,156 [shard 1] tracing - Asked to shut down
  INFO  2016-09-21 16:18:34,156 [shard 1] tracing - Tracing is down
  INFO  2016-09-21 16:18:34,156 [shard 0] storage_service - Drain on shutdown: tracing is stopped
  INFO  2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: flush column_families done
  INFO  2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
  INFO  2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: done
  INFO  2016-09-21 16:18:34,199 [shard 0] repair - Starting shutdown of repair
  INFO  2016-09-21 16:18:34,199 [shard 0] repair - Completed shutdown of repair
  INFO  2016-09-21 16:18:34,199 [shard 0] compaction_manager - Asked to stop
  INFO  2016-09-21 16:18:34,199 [shard 1] compaction_manager - Asked to stop

After:

  INFO  2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Executing streaming plan for repair-in
  INFO  2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Starting streaming to 127.0.0.2
  INFO  2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Beginning stream session with 127.0.0.2
  INFO  2016-09-21 16:06:21,685 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
  INFO  2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: stop_gossiping done
  INFO  2016-09-21 16:06:23,687 [shard 0] storage_service - Thrift server stopped
  INFO  2016-09-21 16:06:23,687 [shard 0] storage_service - CQL server stopped
  INFO  2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - messaging_service stopped
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown messaging_service done
  INFO  2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Session with 127.0.0.2 is complete, state=FAILED
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - stream_manager stopped
  INFO  2016-09-21 16:06:23,688 [shard 1] storage_service - stream_manager stopped
  INFO  2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] bytes_sent = 0, bytes_received = 25725
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown stream_manager done
  WARN  2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Stream failed, peers={127.0.0.2}
  WARN  2016-09-21 16:06:23,688 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
  INFO  2016-09-21 16:06:23,688 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: auth shutdown
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: done
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: stop_transport done
  INFO  2016-09-21 16:06:23,688 [shard 0] tracing - Asked to shut down
  INFO  2016-09-21 16:06:23,688 [shard 0] tracing - Tracing is down
  INFO  2016-09-21 16:06:23,688 [shard 1] tracing - Asked to shut down
  INFO  2016-09-21 16:06:23,688 [shard 1] tracing - Tracing is down
  INFO  2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: tracing is stopped
  INFO  2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: flush column_families done
  INFO  2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
  INFO  2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: done
  INFO  2016-09-21 16:06:23,774 [shard 0] repair - Starting shutdown of repair
  INFO  2016-09-21 16:06:23,774 [shard 0] repair - Completed shutdown of repair
  INFO  2016-09-21 16:06:23,774 [shard 0] compaction_manager - Asked to stop
  INFO  2016-09-21 16:06:23,774 [shard 1] compaction_manager - Asked to stop
2016-09-26 06:29:40 +08:00
Asias He
2ac4ce77a9 streaming: Introduce has_peer in stream_manager
It is used to query if a streaming peer with inet_address exists.
2016-09-25 07:17:13 +08:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Asias He
1f3928c321 streaming: Hook streaming with gossip callback
If the peer node of a stream_session is restarted or removed we should
abort the streaming. It is better to hook gossip callback in the stream
manager than in each streamm_session.
2016-03-09 07:35:20 +08:00
Asias He
fd5f3cff47 streaming: Fix stream_manager progress api
For each stream_session, we pretend we are sending/receiving one file,
to make it compatible with nodetool. For receiving_files, the file name
is "rxnofile". For sending_files, the file name is "txnofile".

stream_manager::update_all_progress_info is introduced to update the
progress info of all the stream_sessions in the node. We need this
because streaming mutations are received on all the cores, but the
stream_session object is only on one of the cores. It adds overhead if
we update progress info in stream_session object whenever we receive a
streaming mutation. So, what we do now is when we really need the
progress info, we update the progress info in stream_session object.

With http://127.0.0.$i:10000/stream_manager/, it looks like below when
decommission node 3 in a 3 nodes cluster.

=========== GET NODE 1
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.3", "current_bytes": 16876296}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 2

[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"receiving_files": [{"value": {"direction":
"IN", "file_name": "rxnofile", "session_index": 0, "total_bytes":
16755552, "peer": "127.0.0.3", "current_bytes": 16755552}, "key":
"rxnofile"}], "receiving_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.3", "peer": "127.0.0.3"}]}]

=========== GET NODE 3
[{"plan_id": "935a2cc0-dc6b-11e5-bdbf-000000000000", "description":
"Unbootstrap", "sessions": [{"sending_files": [{"value": {"direction":
"OUT", "file_name": "txnofile", "session_index": 0, "total_bytes":
16876296, "peer": "127.0.0.1", "current_bytes": 16876296}, "key":
"txnofile"}], "sending_summaries": [{"files": 1, "total_size": 0,
"cf_id": "869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0,
"state": "PREPARING", "connecting": "127.0.0.1", "peer":
"127.0.0.1"},{"sending_files": [{"value": {"direction": "OUT",
"file_name": "txnofile", "session_index": 0, "total_bytes": 16755552,
"peer": "127.0.0.2", "current_bytes": 16755552}, "key": "txnofile"}],
"sending_summaries": [{"files": 1, "total_size": 0, "cf_id":
"869d8630-dc6b-11e5-bdbf-000000000000"}], "session_index": 0, "state":
"PREPARING", "connecting": "127.0.0.2", "peer": "127.0.0.2"}]}]
2016-02-26 17:38:37 +08:00
Asias He
9dede89e07 streaming: Add get_progress_on_all_shards for plan_id
Get stream_bytes for a specific plan_id.
2016-02-26 17:38:37 +08:00
Asias He
d146045bc5 Revert "Revert "streaming: Send mutations on all shards""
This brings back streaming on all shards. The bug in
locator/abstract_replication_strategy is now fixed.

This reverts commit 9f3061ade8.

Message-Id: <a79ce9cdd6f4af1c6088b89e1911b4b2ed1c10ae.1455589460.git.asias@scylladb.com>
2016-02-16 11:16:51 +02:00
Avi Kivity
9f3061ade8 Revert "streaming: Send mutations on all shards"
This reverts commit 31d439213c.

Fixes #894.

Conflicts:
    streaming/stream_manager.cc

(may have undone part of 63a5aa6122)
2016-02-09 18:26:14 +02:00
Asias He
31d439213c streaming: Send mutations on all shards
Currently, only the shard where the stream_plan is created on will send
streaing mutations. To utilize all the available cores, we can make each
shard send mutations which it is responsbile for. On the receiver side,
we do not forward the mutations to the shard where the stream_session is
created, so that we can avoid unnecessary forwarding.

Note: the downside is that it is now harder to:

1) to track number of bytes sent and received
2) to update the keep alive timer upon receive of the STREAM_MUTATION

To fix, we now store the sent/recieved bytes info on all shards. When
the keep alive timer expires, we check if any progress has been made.

Hopefully, this patch will make the streaming much faster and in turn
make the repair/decommission/adding a node faster.

Refs: https://github.com/scylladb/scylla/issues/849

Tested with decommission/repair dtest.

Message-Id: <96b419ab11b736a297edd54a0b455ffdc2511ac5.1454645370.git.asias@scylladb.com>
2016-02-07 10:57:51 +02:00
Asias He
c618c699b3 streaming: Increase mutation_send_limiter
The idea behind the current 10 stream_mutations per core limitation is
to avoid streaming overwhelms the TCP connection and starves normal cql
verbs if the streaming mutations are big and takes long time to
complete.

Now that we use a standalone connection for streaming verbs, we can
increase the limitation.

Hopefully, this will fix #849.
2016-02-01 11:01:56 +08:00
Asias He
2f48d402e2 streaming: Remove unused commented code 2016-01-29 16:31:07 +08:00