scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 00:02:37 +00:00

Author	SHA1	Message	Date
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Pavel Emelyanov	a1ea553fe1	code: Replace distributed<> with sharded<> The latter is recommended in seastar, and the former was left as compatibility alias. Latest seastar explicitly marks it as deprecated so once the submodule is updated, compilation logs will explode. Most of the patch is generated with for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done and a small manual change in test/perf/perf.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26136	2025-09-19 12:22:51 +02:00
Ernest Zaslavsky	d624413ddd	treewide: Move query related files to a new `query` directory As requested in #22120, moved the files and fixed other includes and build system. Moved files: - query.cc - query-request.hh - query-result.hh - query-result-reader.hh - query-result-set.cc - query-result-set.hh - query-result-writer.hh - query_id.hh - query_result_merger.hh Fixes: #22120 This is a cleanup, no need to backport Closes scylladb/scylladb#25105	2025-09-16 23:40:47 +03:00
Andrzej Jackowski	ea2bdae45a	mapreduce: add tablet-aware dispatching algorithm The primary goal of this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB transitions towards tablets, which simplify work dispatching, the new algorithm is designed specifically for tablets. The algorithm divides work so that each `tablet_replica` (a <host, shard> pair) processes two tablets at a time. After processing of each `tablet_replica`, the ERM is released and re-acquired. The new algorithm can be summarized as follows: 1. Prepare a set of exclusive `partition_ranges`, where each range represents one tablet. This set is called `ranges_left`, because it contains ranges that still need processing. 2. Loop until `ranges_left` is empty: I. Create `tablet_replica` -> `ranges` mapping for the current ERM and `ranges_left`. Store this mapping and the number representing current ERM version as `ranges_per_replica`. II. In parallel, for each tablet_replica, iterate through ranges_per_tablet_replica. Select independently up to two ranges that still exist in ranges_left. Remove each range selected for processing from ranges_left. Before each iteration, verify that ERM version has not changed. If it has, return to Step I. Steps I and II are exclusive to simplify maintaining `ranges_left` and `ranges_per_replica`: - Step I iterates through `ranges_left` and creates `ranges_per_replica` - Step II iterates through `ranges_per_replica` and removes processed ranges from `ranges_left` To maintain the exclusivity, the algorithm uses `parallel_for_each` in Step II, requiring all ongoing `tablet_replica` processing to finish before returning to Step I. Currently, each node can handle any partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. This is because `execute_on_this_shard` creates a new pager to coordinate the partition range read, including obtaining its own ERM. However, absent ranges are handled by shard 0, so proper routing is necessary to avoid overloading shard 0. Thus, in Step II, the ERM is retained during each `tablet_replica` processing. The tablet split scenario is not well-handled in this implementation. After a split, the entire pre-split range is sent to a node hosting the `tablet_replica` containing the range's `end_token`. The node will typically not have other tablets in the range, and as mentioned above, absent ranges are handled by shard 0. As a result, in such a scenario, shard 0 handles a significant portion of the range. This issue is addressed later in this patch series by introducing `shard_id` in `mapreduce_request`. Ref. scylladb#21831	2025-06-25 10:18:02 +02:00
Andrzej Jackowski	9dbb1468b4	mapreduce: remove _shared_token_metadata from mapreduce_service Before this change, `mapreduce_service` used `_shared_token_metadata` to get the topology. However, the token was used in a part of the code that already had its own ERM with its own metadata token. Moreover, as mapreduce_service's token and ERM's token are not guaranteed to be the same, inconsistencies could occur. Therefore, this commit removes `_shared_token_metadata` and its usage.	2025-06-25 08:42:16 +02:00
Andrzej Jackowski	94ce5a0ed6	mapreduce: move dispatching logic to dispatch_to_vnodes This commit moves the current dispatching logic of the mapreduce service to a new dispatch_to_vnodes function. The moved code was written before tablets were introduced, and although it works with tablets, the variable naming still refers to vnodes (e.g., vnodes_per_addr, vnodes_generator). The motivation for this change is that later in this patch series, a new algorithm for tablets is introduced, and both algorithms need to coexist. Ref. scylladb#21831	2025-06-25 08:42:03 +02:00
Andrzej Jackowski	48aced87f5	mapreduce: remove underscores from variable names This commit removes the unnecessary trailing underscores from the tr_state_ and dispatcher_ variable names, which were left over after the code was moved to a separate function in the previous commit.	2025-06-25 08:41:21 +02:00
Andrzej Jackowski	d238a2f73e	mapreduce: move req_with_modified_pr handling to a new function The motivation for this change is to enable code reuse when a new implementation of the mapreduce algorithm for tablets is introduced later in this patch series. Ref. scylladb#21831	2025-06-25 08:40:02 +02:00
Dawid Mędrek	cd50152522	service/mapreduce_service: Cancel query when stopping Before these changes, shutting down a node could be prolonged because of mapreduce_service. `mapreduce_service::stop()` uninitializes messaging service, which includes waiting for all ongoing RPC handlers. We already had a mechanism for cancelling local mapreduce tasks, but we were missing one for cancelling external queries. In this commit, we modify the signature of the request so it supports cancelling via an abort source. We also provide a reproducer test for the problem. Fixes scylladb/scylladb#22337 Closes scylladb/scylladb#22651	2025-02-10 20:12:59 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Avi Kivity	3fc4e23a36	forward_service: rename to mapreduce_service forward_service is nondescriptive and misnamed, as it does more than forward requests. It's a classic map/reduce algorithm (and in fact one of its parameters is "reducer"), so name it accordingly. The name "forward" leaked into the wire protocol for the messaging service RPC isolation cookie, so it's kept there. It's also maintained in the name of the logger (for "nodetool setlogginglevel") for compatibility with tests. Closes scylladb/scylladb#19444	2024-07-03 19:29:47 +03:00

11 Commits
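
The tablet-aware dispatching loop described in commit ea2bdae45a (Steps I and II) can be illustrated with a small sequential sketch. This is not the ScyllaDB implementation: `range_id`, `erm_snapshot`, `build_ranges_per_replica`, `dispatch_to_tablets` and the `get_erm`/`process` callbacks are hypothetical stand-ins invented for the example, and the per-replica loop runs sequentially here, whereas the commit message says the real code uses `parallel_for_each`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are ScyllaDB types.
using range_id = int;                        // one partition range == one tablet
using tablet_replica = std::pair<int, int>;  // <host, shard>

struct erm_snapshot {
    long version = 0;                          // number representing the ERM version
    std::map<range_id, tablet_replica> owner;  // replica that owns each range
};

// Step I: group the ranges that still need processing by their owning replica.
std::map<tablet_replica, std::vector<range_id>>
build_ranges_per_replica(const erm_snapshot& erm, const std::set<range_id>& ranges_left) {
    std::map<tablet_replica, std::vector<range_id>> out;
    for (range_id r : ranges_left) {
        out[erm.owner.at(r)].push_back(r);
    }
    return out;
}

// Repeats Steps I and II until ranges_left is empty; returns how many times
// Step I ran. get_erm() returns the current snapshot, process(replica, batch)
// handles a batch of at most two ranges.
template <typename GetErm, typename Process>
std::size_t dispatch_to_tablets(std::set<range_id> ranges_left, GetErm get_erm, Process process) {
    std::size_t rounds = 0;
    while (!ranges_left.empty()) {
        const erm_snapshot erm = get_erm();  // acquire the ERM for this round
        auto ranges_per_replica = build_ranges_per_replica(erm, ranges_left);
        ++rounds;
        // Step II, sequential in this sketch.
        for (auto& [replica, ranges] : ranges_per_replica) {
            std::size_t next = 0;
            while (next < ranges.size()) {
                // Before each iteration, verify that the ERM version is unchanged.
                if (get_erm().version != erm.version) {
                    break;  // stale mapping: fall back to Step I after this round
                }
                std::vector<range_id> batch;
                while (next < ranges.size() && batch.size() < 2) {
                    const range_id r = ranges[next++];
                    // Select only ranges that still exist in ranges_left.
                    if (ranges_left.erase(r) > 0) {
                        batch.push_back(r);
                    }
                }
                if (!batch.empty()) {
                    process(replica, batch);
                }
            }
        }
    }
    return rounds;
}
```

With a stable ERM, every range is dispatched within a single pass of Step I. If the version changes mid-round, the remaining ranges stay in `ranges_left` and are regrouped against the new mapping on the next pass.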