The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26136
As requested in #22120, moved the files and fixed other includes and build system.
Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh
Fixes: #22120
This is a cleanup, no need to backport
Closesscylladb/scylladb#25105
The primary goal of this change is to reduce the time during which the
Effective Replication Map (ERM) is retained by the mapreduce service.
This ensures that long aggregate queries do not block topology
operations. As ScyllaDB transitions towards tablets, which simplify
work dispatching, the new algorithm is designed specifically for
tablets.
The algorithm divides work so that each `tablet_replica` (a <host,
shard> pair) processes two tablets at a time. After processing of each
`tablet_replica`, the ERM is released and re-acquired.
The new algorithm can be summarized as follows:
1. Prepare a set of exclusive `partition_ranges`, where each range
represents one tablet. This set is called `ranges_left`, because it
contains ranges that still need processing.
2. Loop until `ranges_left` is empty:
I. Create `tablet_replica` -> `ranges` mapping for the current ERM
and `ranges_left`. Store this mapping and the number
representing current ERM version as `ranges_per_replica`.
II. In parallel, for each tablet_replica, iterate through
ranges_per_tablet_replica. Select independently up to two ranges
that are still existing in ranges_left. Remove each range
selected for processing from ranges_left. Before each iteration,
verify that ERM version has not changed. If it has,
return to Step I.
Steps I and II are exclusive to simplify maintaining `ranges_left` and
`ranges_per_replica`:
- Step I iterates through `ranges_left` and creates
`ranges_per_replica`
- Step II iterates through `ranges_per_replica` and remove processed
ranges from `ranges_left`
To maintain the exclusivity, the algorithm uses `parallel_for_each` in
Step II, requiring all ongoing `tablet_replica` processing to finish
before returning to Step I.
Currently, each node can handle any partition range, even if the
mapreduce supercoordinator does not retain the ERM and the range is
absent locally. This is because `execute_on_this_shard` creates a new
pager to coordinate the partition range read, including obtaining its
own ERM. However, absent ranges are handled by shard 0, so proper
routing is necessary to avoid overloading shard 0. Thus, in Step II,
the ERM is retained during each `tablet_replica` processing.
The tablet split scenario is not well-handled in this implementation.
After a split, the entire pre-split range is sent to a node hosting
the `tablet_replica` containing the range's `end_token`. The node
will typically not have other tablets in the range, and as
aforementioned, absent ranges are handled by shard 0. As a result,
in such scenario, shard 0 handles a significant portion of the range.
This issue is addressed later in this patch series by introducing
`shard_id` in `mapreduce_request`.
Ref. scylladb#21831
Before this change, `mapreduce_service` used `_shared_token_metadata`
to get the topology. However, the token was used in a part of the code
that already had its own ERM with its own metadata token. Moreover,
as mapreduce_service's token and ERM's token are not guaranteed to be
the same, inconsistencies could occur.
Therefore, this commit removes `_shared_token_metadata` and its usage.
This commit moves the current dispatching logic of the mapreduce service
to a new dispatch_to_vnodes function. The moved code was written before
tablets were introduced, and although it works with tablets,
the variable naming still refers to vnodes (e.g., vnodes_per_addr,
vnodes_generator).
The motivation for this change is that later in this patch series,
a new algorithm for tablets is introduced, and both algorithms
need to coexist.
Ref. scylladb#21831
This commit removes unnecessary underscores from tr_state_ and
dispatcher_ variable names, that were left after moving code to
a separate function in the previous commit.
The motivation for this change is to enable code reuse when
a new implementation of the mapreduce algorithm for tablets
is introduced later in this patch series.
Ref. scylladb#21831
Before these changes, shutting down a node could be prolonged because of
mapreduce_service. `mapreduce_service::stop()` uninitializes messaging
service, which includes waiting for all ongoing RPC handlers. We already
had a mechanism for cancelling local mapreduce tasks, but we were missing
one for cancelling external queries.
In this commit, we modify the signature of the request so it supports
cancelling via an abort source. We also provide a reproducer test
for the problem.
Fixesscylladb/scylladb#22337Closesscylladb/scylladb#22651
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.
The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.
Closesscylladb/scylladb#19444