The handler of raft_topology_cmd::command::stream_ranges switches to
the streaming scheduling group to perform data streaming in it. It grabs
the group from the database db_config, which is not great. The streaming
manager is at hand in storage service handlers, and since the handler
uses its functionality, it should use _its_ scheduling group.
This will help split the streaming scheduling group into more
elaborate groups under the maintenance supergroup: SCYLLADB-351
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#28363
The latter is recommended in seastar, and the former was left as a
compatibility alias. The latest seastar explicitly marks it as deprecated,
so once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
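The first sed expression can be sanity-checked in isolation. Below is a minimal sketch using std::regex. The helper name `rename_distributed` is hypothetical, and ECMAScript `\b` stands in for sed's `\<`, which std::regex does not support.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the first sed expression:
//   s/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g
// \b (word boundary) replaces sed's \< (start of word).
std::string rename_distributed(const std::string& line) {
    static const std::regex re(R"(\bdistributed<([A-Za-z0-9:_]*)>)");
    // $1 re-inserts the captured template argument unchanged.
    return std::regex_replace(line, re, "sharded<$1>");
}
```

The character class has no `<` or `>`, so a nested template argument such as `distributed<std::vector<int>>` is left untouched by this pattern.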
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26136
Change the return type of `check_needs_view_update_path()`. Previously it
returned a bool telling whether to use the staging directory (and register
to `view_update_generator`) or the normal directory.
Now the function returns an enum with these possible values:
- `normal_directory` - use normal directory for the sstable
- `staging_directly_to_generator` - use staging directory and register
to `view_update_generator`
- `staging_managed_by_vbc` - use staging directory but don't register it
to `view_update_generator` but create view building tasks for
later
The third option is new. It is used when the table has any view that is
currently being built. In this case, registering the sstable to `view_update_generator`
prematurely may lead to base-view inconsistency
(for example when a replica is in a pending state).
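A rough sketch of the three-way decision, for illustration only. The enumerator names are taken from the list above. The enum name `sstable_destination`, the `view_update_inputs` struct and `pick_destination()` are hypothetical, and the real function takes more inputs than the two flags assumed here.

```cpp
#include <cassert>

// Enumerator names come from the commit message. The enum name, the
// input struct and the decision function are hypothetical stand-ins.
enum class sstable_destination {
    normal_directory,
    staging_directly_to_generator,
    staging_managed_by_vbc,
};

struct view_update_inputs {
    bool needs_view_updates = false;   // what the old bool result expressed (assumed input)
    bool has_view_being_built = false; // some view of the table is still being built
};

sstable_destination pick_destination(const view_update_inputs& in) {
    if (!in.needs_view_updates) {
        return sstable_destination::normal_directory;
    }
    if (in.has_view_being_built) {
        // Stage the sstable, but leave registration to later view building tasks.
        return sstable_destination::staging_managed_by_vbc;
    }
    return sstable_destination::staging_directly_to_generator;
}
```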
Continuation of the previous patch -- view builder is started early
enough and construction of stream manager can happen with a non-sharded
reference to it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The following is observed in pytest:
1) node1, stream master, tried to pull data from node3
2) node3, stream follower, found node1 restarted
3) node3 killed the rpc stream
4) node1 did not get the stream session failure message from node3. This
failure message was supposed to kill the stream plan on node1. That's the
reason node1 failed the stream session much later at "2024-08-19 21:07:45,539".
Note, node3 failed the stream on its side, so it should have sent the stream
session failure message.
```
$ cat node1.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO 2024-08-19 20:24:01,162 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers={127.0.34.3}, master
ERROR 2024-08-19 20:24:01,190 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=127.0.34.3: seastar::nested_exception: seastar::rpc::stream_closed (rpc stream was closed by peer) (while cleaning up after seastar::rpc::stream_closed (rpc stream was closed by peer))
WARN 2024-08-19 21:07:45,539 [shard 0:main] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.3}, tx=0 KiB, 0.00 KiB/s, rx=484 KiB, 0.18 KiB/s
$ cat node3.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO 2024-08-19 20:24:01,163 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers=127.0.34.1, slave
INFO 2024-08-19 20:24:01,164 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Start sending ks=ks, cf=cf, estimated_partitions=2560, with new rpc streaming
WARN 2024-08-19 20:24:01,187 [shard 0: gms] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.1}, tx=633 KiB, 26506.81 KiB/s, rx=0 KiB, 0.00 KiB/s
WARN 2024-08-19 20:24:01,188 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] stream_transfer_task: Fail to send to 127.0.34.1:0: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to send: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming error occurred, peer=127.0.34.1
```
To be safe in case the stream fail message is not received, node1 could fail
the stream plan as soon as the rpc stream is aborted in the
stream_mutation_fragments handler.
Fixes #20227
Closes scylladb/scylladb#21960
In Scylla there are two options that control IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in nodetool.
This set implements the missing bits and adds tests for the new functionality.
Closes scylladb/scylladb#21877
* github.com:scylladb/scylladb:
nodetool: Implement [gs]etstreamthroughput commands
nodetool: Implement [gs]etcompactionthroughput commands
test: Add validation of how IO-updating endpoints work
api: Implement /storage_service/(stream|compaction)_throughput endpoints
api: Disqualify const config reference
api: Implement /storage_service/stream_throughput endpoint
api: Move stream throughput set/get endpoints from storage service block
api: Move set_compaction_throughput_mb_per_sec to config block
util: Include fmt/ranges.h in config_file.hh
The `reader_consumer_v2` type
(`std::function<future<> (mutation_reader)>`) is defined alongside
`mutation_reader` in `mutation_reader.hh`.
before this change, we sometimes use
`std::function<future<> (mutation_reader)>` directly when defining a
consumer parameter or a consumer variable.
in this change, we improve maintainability by:
- Reducing duplicate function type declarations
- Centralizing the consumer type definition
- Making future signature updates easier to implement
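A reduced illustration of the idea, not the actual declarations: `mutation_reader` and the future type are empty stand-ins so the snippet compiles without seastar, and `feed()` is a hypothetical call site.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-ins for the real types (assumptions for this sketch only).
struct mutation_reader { int id = 0; };
struct ready_future {};

// The single, central definition of the consumer type.
using reader_consumer_v2 = std::function<ready_future(mutation_reader)>;

// Before: ready_future feed(std::function<ready_future(mutation_reader)> consumer, ...)
// After: the alias is used, so a signature change touches one line only.
ready_future feed(const reader_consumer_v2& consumer, mutation_reader reader) {
    return consumer(std::move(reader));
}
```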
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21369
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:
e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"
as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.
The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit
026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"
In turn, flat_mutation_reader was introduced in 2017 in commit
748205ca75 "Introduce flat_mutation_reader"
To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.
Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.
Note that mutation_fragment_v2 remains since we still sometimes use
the original for compatibility.
Some notes about the transition:
- files were also renamed. In one case (flat_mutation_reader_test.cc), the
rename target already existed, so we rename to
mutation_reader_another_test.cc.
- a namespace 'mutation_reader' with two definitions existed (in
mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
class. As a result, a few #includes had to be adjusted.
Closes scylladb/scylladb#19356
If no_such_column_family is thrown on a remote node, then the streaming
operation fails as the type of the exception cannot be determined.
Use repair::with_table_drop_silenced in streaming to continue
operation if a table was dropped.
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, this allows storage_service::on_change
to update_peer_info once for all changed states.
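The effect of batching can be sketched with reduced stand-in types. The `application_state` values, `state_map` and the `subscriber` struct below are hypothetical simplifications, and the counter only models how often a costly per-notification step such as update_peer_info would run.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, reduced stand-ins for the gossiper types.
enum class application_state { status, host_id, dc, rack };
using state_map = std::map<application_state, std::string>;

struct subscriber {
    int peer_info_updates = 0;  // models one costly update per notification
    state_map seen;

    // Old shape: one call, and one update, per changed state.
    void on_change(application_state key, const std::string& value) {
        seen[key] = value;
        ++peer_info_updates;
    }

    // New shape: all changed states arrive as one batch.
    void on_change(const state_map& changed) {
        for (const auto& [key, value] : changed) {
            seen[key] = value;
        }
        if (!changed.empty()) {
            ++peer_info_updates;
        }
    }
};
```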
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Now that the endpoint_state isn't changed in place,
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.
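A small sketch of sharing one immutable snapshot among subscribers. std::shared_ptr stands in for seastar's lw_shared_ptr, and `endpoint_state`, `subscriber` and `notify()` are simplified, hypothetical versions of the real types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Reduced stand-in for the real endpoint_state.
struct endpoint_state {
    int generation = 0;
    std::string status;
};

// std::shared_ptr replaces lw_shared_ptr in this sketch; const makes the
// snapshot read-only for every holder.
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;

struct subscriber {
    endpoint_state_ptr last;
    void on_change(endpoint_state_ptr state) { last = std::move(state); }
};

// Every subscriber receives the same snapshot; only the pointer is copied.
void notify(std::vector<subscriber>& subscribers, const endpoint_state_ptr& snapshot) {
    for (auto& s : subscribers) {
        s.on_change(snapshot);
    }
}
```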
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass permit_id to subscribers when we acquire one
via lock_endpoint. The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The manager in question is responsible for maintaining the streaming
class IO bandwidth update. Nowadays it does it via priority manager's
global streaming IO priority class field, but it will need to switch to
streaming sched group.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now that schema pull may issue a raft read barrier, it may get stuck if
a majority is not available. Make the operation abortable and abort it
during queries if the timeout is reached.
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generate the
operator==() as a member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we fail to mark the operator== with the `const`
specifier. in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case. in the
class of `version`, we use `version` as the parameter type; to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass a non-trivial struct as a function parameter.
please note, because in C++20, both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of another variant. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13687
We have added the finished percentage for repair based node operations.
This patch adds the finished percentage for node ops using the old
streaming.
Example output:
scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000
In addition to the metrics, the finished percentage is added to the log:
[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646
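The value in the log line is simply finished ranges divided by total ranges. A minimal sketch follows. The helper name `finished_percentage` is hypothetical, and returning 1 when there are no ranges at all is an assumption of this sketch, not something the patch description states.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fraction of ranges already streamed, in [0, 1].
// Treating "nothing to stream" as fully finished is an assumption.
float finished_percentage(std::size_t finished, std::size_t total) {
    if (total == 0) {
        return 1.0f;
    }
    return static_cast<float>(finished) / static_cast<float>(total);
}
```

With the numbers from the log line above, `finished_percentage(2698, 2817)` evaluates to roughly 0.9578.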
Fixes #11600
Closes #11601
Before changing its type to streaming::plan_id
this patch clarifies that the parameter actually represents
the plan id and not the table id as its name suggests.
For reference, see the call to update_progress in
`stream_transfer_task::execute`, as well as the function
using _stream_bytes, whose map key is the plan id.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
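The tagging technique can be sketched as follows. This is not the utils::tagged_uuid implementation: a 64-bit integer stands in for the UUID payload, and the template and tag names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Simplified stand-in: the Tag parameter is never used at run time, it only
// makes each instantiation a distinct, non-interchangeable type.
template <typename Tag>
struct tagged_id {
    std::uint64_t value = 0;  // a real implementation would hold a UUID here
    bool operator==(const tagged_id& o) const { return value == o.value; }
    bool operator!=(const tagged_id& o) const { return !(*this == o); }
};

struct table_id_tag {};
struct table_schema_version_tag {};

using table_id = tagged_id<table_id_tag>;
using table_schema_version = tagged_id<table_schema_version_tag>;

// Mixing the two up is a compile-time error rather than a silent bug.
static_assert(!std::is_same_v<table_id, table_schema_version>);
static_assert(!std::is_convertible_v<table_id, table_schema_version>);
```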
Fixes #11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The stream_manager will keep track of the streaming bandwidth option; to
subscribe to its changes it needs the config reference. It would be
better if it was stream_manager::config, but currently subscription to
db::config::<stuff> updates is not very shard-friendly, so we need to
carry the config reference itself around.
Similar trouble is there for compaction_manager. The option is passed
through its own config, but the config is created on each shard by
database code. Stream manager config would be created once by main code
on shard 0.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
Streaming manager registers itself in gossiper, so it needs an explicit
dependency reference. Also it forgets to unregister itself, so do it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case of streaming this mostly means dropping the global
init/uninit calls and replacing them with sharded<stream_manager>
instance. It's still global, but it's being fixed atm.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The start/stop standard is becoming like
sharded<foo> foo;
foo.start();
defer([] { foo.stop() });
foo.invoke_on_all(&foo::start);
...
defer([] { foo.shutdown() });
wait_for_stop_signal();
/* quit making the above defers self-unroll */
where .shutdown() for a service would mean "do whatever is
appropriate to start stopping, the real synchronous .stop() will
come some time later".
According to that, rename .stop() as it's really the mentioned
preparation, not real stopping.
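The unrolling order of the defers can be checked with a plain C++ model. Here `deferred` is a minimal scope guard standing in for seastar's defer(), the `service` struct is hypothetical and only records calls, and the sharded start is reduced to a single start() call.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard standing in for seastar's defer().
class deferred {
    std::function<void()> _fn;
public:
    explicit deferred(std::function<void()> fn) : _fn(std::move(fn)) {}
    deferred(const deferred&) = delete;
    deferred& operator=(const deferred&) = delete;
    ~deferred() { _fn(); }
};

// Hypothetical service that only records the order of lifecycle calls.
struct service {
    std::vector<std::string>& log;
    void start()    { log.push_back("start"); }
    void shutdown() { log.push_back("shutdown"); }
    void stop()     { log.push_back("stop"); }
};

std::vector<std::string> run_lifecycle() {
    std::vector<std::string> log;
    {
        service foo{log};
        foo.start();
        deferred stop_foo([&foo] { foo.stop(); });
        deferred shutdown_foo([&foo] { foo.shutdown(); });
        log.push_back("running");  // stands in for wait_for_stop_signal()
    }  // guards unroll in reverse order: shutdown first, then stop
    return log;
}
```

Because destructors run in reverse declaration order, shutdown() is always called before the final stop().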
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently streaming uses global pointers to save and get a
dependency. Now all the dependencies live on the manager,
this patch changes all the places in streaming/ to get the
needed dependencies from it, not from global pointer (next
patch will remove those globals).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stream manager is going to become central point of control
for the streaming subsys. This patch makes its dependencies
explicit and prepares the ground for further patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>