Commit Graph

11 Commits

Author SHA1 Message Date
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Pavel Emelyanov
a1ea553fe1 code: Replace distributed<> with sharded<>
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.

Most of the patch is generated with

    for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
    for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done

and a small manual change in test/perf/perf.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26136
2025-09-19 12:22:51 +02:00
Ernest Zaslavsky
d624413ddd treewide: Move query related files to a new query directory
As requested in #22120, moved the files and fixed other includes and build system.

Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh

Fixes: #22120

This is a cleanup, no need to backport

Closes scylladb/scylladb#25105
2025-09-16 23:40:47 +03:00
Andrzej Jackowski
ea2bdae45a mapreduce: add tablet-aware dispatching algorithm
The primary goal of this change is to reduce the time during which the
Effective Replication Map (ERM) is retained by the mapreduce service.
This ensures that long aggregate queries do not block topology
operations. As ScyllaDB transitions towards tablets, which simplify
work dispatching, the new algorithm is designed specifically for
tablets.

The algorithm divides work so that each `tablet_replica` (a <host,
shard> pair) processes two tablets at a time. After processing of each
`tablet_replica`, the ERM is released and re-acquired.

The new algorithm can be summarized as follows:
1. Prepare a set of exclusive `partition_ranges`, where each range
   represents one tablet. This set is called `ranges_left`, because it
   contains ranges that still need processing.
2. Loop until `ranges_left` is empty:
   I.  Create `tablet_replica` -> `ranges` mapping for the current ERM
       and `ranges_left`. Store this mapping and the number
       representing current ERM version as `ranges_per_replica`.
   II. In parallel, for each tablet_replica, iterate through
       ranges_per_tablet_replica. Select independently up to two ranges
       that are still existing in ranges_left. Remove each range
       selected for processing from ranges_left. Before each iteration,
       verify that ERM version has not changed. If it has,
       return to Step I.

Steps I and II are exclusive to simplify maintaining `ranges_left` and
`ranges_per_replica`:
 - Step I iterates through `ranges_left` and creates
   `ranges_per_replica`
 - Step II iterates through `ranges_per_replica` and remove processed
   ranges from `ranges_left`

To maintain the exclusivity, the algorithm uses `parallel_for_each` in
Step II, requiring all ongoing `tablet_replica` processing to finish
before returning to Step I.

Currently, each node can handle any partition range, even if the
mapreduce supercoordinator does not retain the ERM and the range is
absent locally. This is because `execute_on_this_shard` creates a new
pager to coordinate the partition range read, including obtaining its
own ERM. However, absent ranges are handled by shard 0, so proper
routing is necessary to avoid overloading shard 0. Thus, in Step II,
the ERM is retained during each `tablet_replica` processing.

The tablet split scenario is not well-handled in this implementation.
After a split, the entire pre-split range is sent to a node hosting
the `tablet_replica` containing the range's `end_token`. The node
will typically not have other tablets in the range, and as
aforementioned, absent ranges are handled by shard 0. As a result,
in such scenario, shard 0 handles a significant portion of the range.
This issue is addressed later in this patch series by introducing
`shard_id` in `mapreduce_request`.

Ref. scylladb#21831
2025-06-25 10:18:02 +02:00
Andrzej Jackowski
9dbb1468b4 mapreduce: remove _shared_token_metadata from mapreduce_service
Before this change, `mapreduce_service` used `_shared_token_metadata`
to get the topology. However, the token was used in a part of the code
that already had its own ERM with its own metadata token. Moreover,
as mapreduce_service's token and ERM's token are not guaranteed to be
the same, inconsistencies could occur.

Therefore, this commit removes `_shared_token_metadata` and its usage.
2025-06-25 08:42:16 +02:00
Andrzej Jackowski
94ce5a0ed6 mapreduce: move dispatching logic to dispatch_to_vnodes
This commit moves the current dispatching logic of the mapreduce service
to a new dispatch_to_vnodes function. The moved code was written before
tablets were introduced, and although it works with tablets,
the variable naming still refers to vnodes (e.g., vnodes_per_addr,
vnodes_generator).

The motivation for this change is that later in this patch series,
a new algorithm for tablets is introduced, and both algorithms
need to coexist.

Ref. scylladb#21831
2025-06-25 08:42:03 +02:00
Andrzej Jackowski
48aced87f5 mapreduce: remove underscores from variable names
This commit removes unnecessary underscores from tr_state_ and
dispatcher_ variable names, that were left after moving code to
a separate function in the previous commit.
2025-06-25 08:41:21 +02:00
Andrzej Jackowski
d238a2f73e mapreduce: move req_with_modified_pr handling to a new function
The motivation for this change is to enable code reuse when
a new implementation of the mapreduce algorithm for tablets
is introduced later in this patch series.

Ref. scylladb#21831
2025-06-25 08:40:02 +02:00
Dawid Mędrek
cd50152522 service/mapreduce_service: Cancel query when stopping
Before these changes, shutting down a node could be prolonged because of
mapreduce_service. `mapreduce_service::stop()` uninitializes messaging
service, which includes waiting for all ongoing RPC handlers. We already
had a mechanism for cancelling local mapreduce tasks, but we were missing
one for cancelling external queries.

In this commit, we modify the signature of the request so it supports
cancelling via an abort source. We also provide a reproducer test
for the problem.

Fixes scylladb/scylladb#22337

Closes scylladb/scylladb#22651
2025-02-10 20:12:59 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
3fc4e23a36 forward_service: rename to mapreduce_service
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.

The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.

Closes scylladb/scylladb#19444
2024-07-03 19:29:47 +03:00