Commit Graph

918 Commits

Author SHA1 Message Date
Gleb Natapov
491b7232de locator: drop inet_address usage to figure out per dc/rack replication
This makes it possible to calculate the replication map correctly even
without knowing the IPs of the nodes.
2025-01-02 18:44:19 +02:00
Kefu Chai
6acc5294a4 treewide: migrate from boost::copy_range to std::ranges::to
now that we are allowed to use C++23, we have the luxury of using
`std::ranges::to`.

in this change, we:

- replace `boost::copy_range` with `std::ranges::to`
- remove unused `#include` of boost headers

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21880
2024-12-26 11:46:26 +02:00
Benny Halevy
d1490bb7bf locator/topology: do_sort_by_proximity: shuffle equal-distance replicas
To improve balancing when reading in 1 < CL < ALL

This implementation has only a moderate impact on
the function's performance, compared with a full
std::shuffle of the vector before stable-sorting it
(especially with a large number of nodes to sort).
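
One possible way to shuffle only equal-distance replicas can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `keyed_replica` and `sort_and_shuffle_ties` are hypothetical names, and the distances are assumed to be already computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical (distance, replica id) pair; the distance is precomputed.
using keyed_replica = std::pair<int, int>;

// Sort by distance, then shuffle each run of equal distances so that
// equally close replicas do not always come out in the same order.
template <typename Rng>
void sort_and_shuffle_ties(std::vector<keyed_replica>& v, Rng& rng) {
    std::stable_sort(v.begin(), v.end(),
                     [](const keyed_replica& a, const keyed_replica& b) {
                         return a.first < b.first;
                     });
    auto run_begin = v.begin();
    while (run_begin != v.end()) {
        // Find the end of the run that shares run_begin's distance.
        auto run_end = std::find_if(run_begin, v.end(),
                                    [d = run_begin->first](const keyed_replica& r) {
                                        return r.first != d;
                                    });
        std::shuffle(run_begin, run_end, rng);
        run_begin = run_end;
    }
}
```

Only elements inside a run are permuted, so the ordering by distance is preserved while ties are randomized.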

Before:
test                                               iterations      median         mad         min         max      allocs       tasks        inst      cycles
sort_by_proximity_topology.perf_sort_by_proximity    25541973    39.225ns     0.114ns    38.966ns    39.339ns       0.000       0.000       588.5       116.6

After:
sort_by_proximity_topology.perf_sort_by_proximity    19689561    50.195ns     0.119ns    50.076ns    51.145ns       0.000       0.000       622.5       150.6

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-24 13:00:17 +02:00
Benny Halevy
0fe8bdd0db locator/topology: sort_by_proximity: calculate distance only once
And use a temporary vector to use the precalculated distances.
A later patch will add some randomization to shuffle nodes
at the same distance from the reference node.
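
The idea of calculating each distance only once can be sketched as below. This is a minimal illustration, not the actual locator code: `node`, `distance` and the four-level metric are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a topology node; not the real locator::node.
struct node {
    int id;
    std::string dc;
    std::string rack;
};

// Assumed metric: 0 = same node, 1 = same rack, 2 = same DC, 3 = elsewhere.
int distance(const node& from, const node& to) {
    if (from.id == to.id) return 0;
    if (from.dc != to.dc) return 3;
    return from.rack == to.rack ? 1 : 2;
}

// Compute each distance once into a temporary vector, sort the
// (distance, node) pairs, then move the nodes back in sorted order.
// The comparator only compares precomputed integers.
void sort_by_precomputed_distance(const node& ref, std::vector<node>& nodes) {
    std::vector<std::pair<int, node>> keyed;
    keyed.reserve(nodes.size());
    for (auto& n : nodes) {
        int d = distance(ref, n);
        keyed.emplace_back(d, std::move(n));
    }
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        nodes[i] = std::move(keyed[i].second);
    }
}
```

A comparator that calls `distance` directly would recompute it on every comparison; the temporary vector trades one extra pass for integer-only comparisons.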

This improves the function's performance by 50% for 3 replicas,
from 77.4 ns to 39.2 ns. Larger replica sets show greater improvement
(over 4X for 15 nodes):

Before:
test                                               iterations      median         mad         min         max      allocs       tasks        inst      cycles
sort_by_proximity_topology.perf_sort_by_proximity    12808773    77.368ns     0.062ns    77.300ns    77.873ns       0.000       0.000      1194.2       231.6

After:
sort_by_proximity_topology.perf_sort_by_proximity    25541973    39.225ns     0.114ns    38.966ns    39.339ns       0.000       0.000       588.5       116.6

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-24 12:27:03 +02:00
Benny Halevy
75da99ce8b test/perf: add perf_sort_by_proximity benchmark
benchmark sort_by_proximity

Baseline results on my desktop for sorting 3 nodes:

single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          1
random seed:              20241224

test                                               iterations      median         mad         min         max      allocs       tasks        inst      cycles
sort_by_proximity_topology.perf_sort_by_proximity    12808773    77.368ns     0.062ns    77.300ns    77.873ns       0.000       0.000      1194.2       231.6

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-24 12:18:24 +02:00
Benny Halevy
68b0b442fd locator: refactor sort_by_proximity
Extract can_sort_by_proximity() out so it can be used
later by storage_proxy, and introduce do_sort_by_proximity
that sorts unconditionally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-23 16:42:55 +02:00
Avi Kivity
eb62593f2c treewide: use angle brackets when including seastar headers
We treat Seastar as a "system" library, and those are included
with angle brackets.

Closes scylladb/scylladb#21959
2024-12-20 16:16:28 +02:00
Benny Halevy
67b7015ced test: network_topology_strategy_test: add test_topology_sort_by_proximity
Before further changes are made to sort_by_proximity,
add a unit test for it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-19 09:45:02 +02:00
Benny Halevy
1c5b0eca41 locator/topology: retire sort_by_proximity/compare_endpoints for inet_address
Those are not used anymore now that the last call
site for compare_endpoints by inet_address is converted
to use host_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-19 09:44:41 +02:00
Benny Halevy
dcdc60fffd test: test_topology_compare_endpoints: use host_id:s
This is the last call site requiring the compare_endpoints
flavour for inet_address.

Once this test is converted to use host_id:s instead,
compare_endpoints and sort_by_proximity can be simplified
to support only host_id:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-19 09:44:26 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
5a849b0a6a Merge "Move more subsystems to use host ids instead of ips" from Gleb
"
This series converts repair, streaming and node_ops (and some parts of
alternator) to work on host ids instead of ips. This makes it possible to remove
a lot of (but not all) functions that work on ips from the effective
replication map.

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/

Refs: scylladb/scylladb#21777
"

* 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev:
  locator: topology: remove no longer used get_all_ips()
  gossiper: change get_unreachable_nodes to host ids
  locator: drop no longer used ip based functions from effective replication map and friends
  test: move network_topology_strategy_test and token_metadata_test to use host id based APIs
  replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges
  replica/mutation_dump: use host ids instead of ips
  alternator: move ttl to work with host ids instead of ips
  storage_service: move node_ops code to use host ids instead of host ips
  streaming: move streaming code to use host ids instead of host ips
  repair: move repair code to use host ids instead of host ips
  gossiper: add get_unreachable_host_ids() function
  locator: topology: add more functions that return host ids to effective replication map
  locator: add more functions that return host ids to effective replication map
2024-12-18 13:48:22 +02:00
Botond Dénes
e6447f60c2 Merge 'db,auth,locator: Remove unused member variables' from Kefu Chai
this issue was identified by clang-20.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#21835

* github.com:scylladb/scylladb:
  locator: remove unused member variable
  auth: remove unused member variable
  db: remove unused member variable
2024-12-16 15:16:17 +02:00
Gleb Natapov
6890281486 locator: topology: remove no longer used get_all_ips() 2024-12-15 11:31:11 +02:00
Gleb Natapov
c39474cc7e locator: drop no longer used ip based functions from effective replication map and friends 2024-12-15 11:31:11 +02:00
Gleb Natapov
1751791b53 locator: topology: add more functions that return host ids to effective replication map
Add host id variants of the functions alongside the ip based ones. We will
need them to move more code to host ids.
2024-12-15 11:31:10 +02:00
Gleb Natapov
3b8345ee44 locator: add more functions that return host ids to effective replication map
Add host id variants of the functions alongside the ip based ones. We will
need them to move more code to host ids.
2024-12-15 11:16:45 +02:00
Botond Dénes
5880a1b90b Merge 'tasks: add tablet migration virtual task' from Aleksandra Martyniuk
In this change, tablet_virtual_task starts supporting tablet
migration, in addition to tablet repair. Both tablet operations
reuse the same virtual_task because their task data is retrieved
similarly. However, it changes nothing from the task manager
API users' perspective. They can list running migrations or check
their statuses all the same as if migration had its own virtual_task.

Users can see running migration tasks - finished tasks are not
presented with the task manager API. However, the result
of the migration (whether it succeeded or failed) would be
presented to users, if they use wait API.

If a migration was reverted, it will appear to users as failed.
We assume that the migration was reverted, when its destination
does not contain a tablet replica.

Fixes: https://github.com/scylladb/scylladb/issues/21365.

No backport, new feature

Closes scylladb/scylladb#21729

* github.com:scylladb/scylladb:
  test: boost: check migration_task_info in tablet_test.cc
  replica: add repair related fields to tablet_map_to_mutation
  test: add tests to check the failed migration virtual tasks
  test: add tests to check the list of migration virtual tasks
  test: add tests to check migration virtual tasks status
  test: topology_tasks: generalize repair task functions
  service: extend tablet_virtual_task::abort
  service: extend tablet_virtual_task::wait
  service: extend tablet_virtual_task::get_status_helper
  service: extend tablet_virtual_task::contains
  service: extend tablet_virtual_task::get_stats
  service: tasks: make get_table_id a method of virtual_task_hint
  service: tasks: extend virtual_task_hint
  replica: service: add migration_task_info column to system.tablets
  locator: extend tablet_task_info to cover migration tasks
  locator: rename tablet_task_info methods
2024-12-13 10:54:03 +02:00
muthu90tech
e49381119d locator: topology: use node& instead of node*
This change goes through locator::topology and uses node&
instead of node* wherever nullptr is not possible. Where
the node object is stored in an unordered_set, the node
is wrapped in std::reference_wrapper.
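
Storing references in an unordered_set can be sketched as follows. This is a minimal illustration: `node`, `node_ref_hash`, `node_ref_equal` and `add_node` are hypothetical, and hashing by object address (identity) is an assumption rather than what the patch does.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for locator::node.
struct node {
    int id;
};

// std::reference_wrapper has no std::hash specialization, so the set needs
// explicit hash and equality functors. Identity semantics are assumed here.
struct node_ref_hash {
    std::size_t operator()(std::reference_wrapper<const node> n) const noexcept {
        return std::hash<const node*>{}(&n.get());
    }
};

struct node_ref_equal {
    bool operator()(std::reference_wrapper<const node> a,
                    std::reference_wrapper<const node> b) const noexcept {
        return &a.get() == &b.get();
    }
};

using node_set = std::unordered_set<std::reference_wrapper<const node>,
                                    node_ref_hash, node_ref_equal>;

// Takes a reference: callers cannot pass nullptr, so no null check is needed.
bool add_node(node_set& s, const node& n) {
    return s.insert(std::cref(n)).second;
}
```

The referenced nodes must outlive the set, exactly as with raw pointers; the wrapper only removes the null state.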

Fixes scylladb/scylladb#20357

Closes scylladb/scylladb#21863
2024-12-12 13:22:55 +01:00
Aleksandra Martyniuk
9fad3a621a replica: service: add migration_task_info column to system.tablets
Add migration_task_info column to system.tablets. Set migration_task_info
value on migration request if the feature is enabled in the cluster.
Reflect the column content in tablet_metadata.
2024-12-11 12:07:36 +01:00
Aleksandra Martyniuk
332347490c locator: extend tablet_task_info to cover migration tasks 2024-12-11 12:07:36 +01:00
Aleksandra Martyniuk
dee6404aa4 locator: rename tablet_task_info methods 2024-12-11 12:07:36 +01:00
Gleb Natapov
fbfee9666e locator: put real host id into the replication map for everywhere replication strategy
The everywhere replication strategy returns a zero host id in the replica set
instead of the real one if no tokens are configured yet in token metadata. It
worked because the code that translates ids to ips knows that the zero host id
is a special one, so putting zero there was equivalent to allowing local
access. But now we use host ids directly, so we need to return the real host
id here to allow local access before token metadata is populated.

Message-ID: <Z1hBHsEo4wYzzgvJ@scylladb.com>
2024-12-10 15:36:00 +02:00
Kefu Chai
259ab6dee7 locator: remove unused member variable
this issue was identified by clang-20:

```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Debug/seastar/gen/include -isystem /usr/include/p11-kit-1 -isystem /home/kefu/dev/scylladb/abseil -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -std=gnu++23 -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -DWITH_GZFILEOP -MD -MT locator/CMakeFiles/scylla_locator.dir/Debug/ec2_multi_region_snitch.cc.o -MF locator/CMakeFiles/scylla_locator.dir/Debug/ec2_multi_region_snitch.cc.o.d -o locator/CMakeFiles/scylla_locator.dir/Debug/ec2_multi_region_snitch.cc.o -c /home/kefu/dev/scylladb/locator/ec2_multi_region_snitch.cc
In file included from /home/kefu/dev/scylladb/locator/ec2_multi_region_snitch.cc:11:
/home/kefu/dev/scylladb/locator/ec2_multi_region_snitch.hh:31:10: error: private field '_broadcast_rpc_address_specified_by_user' is not used [-Werror,-Wunused-private-field]
   31 |     bool _broadcast_rpc_address_specified_by_user;
      |          ^
1 error generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-12-09 10:31:09 +08:00
Tomasz Grabiec
7e2875d648 Merge 'Add tablet merge support' from Raphael Raph Carvalho
The goal of merge is to reduce the tablet count for a shrinking table, similar to how split increases the count while the table is growing. The load balancer's decision to merge is already implemented (it came with the infrastructure introduced for split), but it wasn't handled until now.

Initial tablet count is respected while the table is in "growing mode". For example, the table leaves it if there was a need to split above the initial tablet count. After the table leaves the mode, the average size can be trusted to determine that the table is shrinking. Merge decision is emitted if the average tablet size is 50% of the target. Hysteresis is applied to avoid oscillations between split and merges.

Similar to split, the decision to merge is recorded in the tablet map's resize_type field with the string "merge". This is important in case of coordinator failover, so that the new coordinator continues from where the old one left off.

Unlike split, the preparation phase during merge is not done by the replica (with split compactions), but rather by the coordinator by co-locating sibling tablets in the same node's shard. We can define sibling tablets as tablets that have contiguous range and will become one after merge. The concept is based on the power-of-two constraint and token contiguity. For example, in a table with 4 tablets, tablets of ids 0 and 1 are siblings, 2 and 3 are also siblings.
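
The sibling relation described above can be sketched as follows. The helper names are hypothetical and tablet ids are treated as plain indices; the mapping of a sibling pair to id / 2 after the merge is an assumption that follows from token contiguity, not something stated in this series.

```cpp
#include <cstddef>
#include <optional>
#include <utility>

// With a power-of-two tablet count, tablets 2k and 2k+1 are siblings.
// Returns the (even, odd) pair that tablet_id belongs to.
std::optional<std::pair<std::size_t, std::size_t>>
sibling_pair(std::size_t tablet_id, std::size_t tablet_count) {
    if (tablet_count < 2 || tablet_id >= tablet_count) {
        return std::nullopt;  // a single-tablet table has no sibling
    }
    std::size_t even = tablet_id & ~std::size_t(1);
    return std::make_pair(even, even + 1);
}

// Assumed id of the tablet that a sibling pair becomes once the tablet
// map is halved.
std::size_t merged_tablet_id(std::size_t tablet_id) {
    return tablet_id / 2;
}
```

For a table with 4 tablets this yields the pairs (0, 1) and (2, 3), matching the example in the text.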

The algorithm for co-locating sibling tablets is very simple. The balancer is responsible for it, and it will emit migrations so that the "odd" tablet follows the "even" one. For example, tablet 1 will be migrated to where tablet 0 lives. Co-location is low in priority: it's not the end of the world to delay a merge, but it's not ideal to delay e.g. decommission or even regular load balancing, as that can translate into temporary imbalance that impacts user activity. So co-location migrations will happen when there is no more important work to do.
While regular balancing is higher in priority, it will not undo the co-location work done so far. It does that by treating co-located tablets as if they were already merged. The load inversion convergence check was adjusted so that the balancer understands when two tablets are being migrated instead of one, to avoid oscillations.

When the balancer completes the co-location work for a table undergoing merge, it puts the id of the table into the resize_plan, which tells the topology coordinator that the table is ready for merge. With all sibling tablets co-located, the coordinator can resize the tablet map (reduce it by a factor of 2) and record the new map into group0. All the replicas will react to it (on token metadata update) by merging the storage (memtable(s) + sstables) of sibling tablets into one.

Fixes #18181.

system test details:

test: https://github.com/pehala/scylla-cluster-tests/blob/tablets_split_merge/tablets_split_merge_test.py
yaml file: https://github.com/pehala/scylla-cluster-tests/blob/tablets_split_merge/test-cases/features/tablets/tablets-split-merge-test.yaml

instance type: i3.8xlarge
nodes: 3
target tablet size: 0.5G (scaled down by 10, to make it easier to trigger splits and merges)
description: multiple cycles of growing and shrinking the data set in order to trigger splits and merges.
data_set_size: ~100G
initial_tablets: 64, so it grew to 128 tablets on split, and back to 64 on merge.

latency of reads and writes that happened in parallel to split and merge:
```
$ for i in scylla-bench*; do cat $i | grep "Mode\|99th:\|99\.9th:"; done
Mode:			 write
  99.9th:	 3.145727ms
  99th:		 1.998847ms
  99.9th:	 3.145727ms
  99th:		 2.031615ms
Mode:			 read
  99.9th:	 3.145727ms
  99th:		 2.031615ms
  99.9th:	 3.145727ms
  99th:		 2.031615ms
Mode:			 write
  99.9th:	 3.047423ms
  99th:		 1.933311ms
  99.9th:	 3.047423ms
  99th:		 1.933311ms
Mode:			 read
  99.9th:	 3.145727ms
  99th:		 1.900543ms
  99.9th:	 3.145727ms
  99th:		 1.900543ms
Mode:			 write
  99.9th:	 5.079039ms
  99th:		 3.604479ms
  99.9th:	 35.389439ms
  99th:		 25.624575ms
Mode:			 write
  99.9th:	 3.047423ms
  99th:		 1.998847ms
  99.9th:	 3.047423ms
  99th:		 1.998847ms
Mode:			 read
  99.9th:	 3.080191ms
  99th:		 2.031615ms
  99.9th:	 3.112959ms
  99th:		 2.031615ms
```

Closes scylladb/scylladb#20572

* github.com:scylladb/scylladb:
  docs: Document tablet merging
  tests/boost: Add test to verify correctness of balancer decisions during merge
  tests/topology_experimental_raft: Add tablet merge test
  service: Handle exception when retrying split
  service: Co-locate sibling tablets for a table undergoing merge
  gms: Add cluster feature for tablet merge
  service: Make merge of resize plan commutative
  replica: Implement merging of compaction groups on merge completion
  replica: Handle tablet merge completion
  service: Implement tablet map resize for merge
  locator: Introduce merge_tablet_info()
  service: Rename topology::transition_state::tablet_split_finalization
  service: Respect initial_tablet_count if table is in growing mode
  service: Wire migration_tablet_set into the load balancer
  locator: Add tablet_map::sibling_tablets()
  service: Introduce sorted_replicas_for_tablet_load()
  locator/tablets: Extend tablet_replica equality comparator to three-way
  service: Introduce alias to per-table candidate map type
  service: Add replication constraint check variant for migration_tablet_set
  service: Add convergence check variant for migration_tablet_set
  service: Add migration helpers for migration_tablet_set
  service/tablet_allocator: Introduce migration_tablet_set
  service: Introduce migration_plan::add(migrations_vector)
  locator/tablets: Introduce tablet_map::for_each_sibling_tablets()
  locator/tablets: Introduce tablet_map::needs_merge()
  locator/tablets: Introduce resize_decision::initial_decision()
  locator/tablets: Fix return type of three-way comparison operators
  service: Extract update of node load on migrations
  service: Extract converge check for intra-node migration
  service: Extract erase of tablet replicas from candidate list
  scripts/tablet-mon: Allow visualization of tablet id
2024-12-06 18:06:20 +01:00
Kefu Chai
9f5e2488dd locator,service: correct the misspellings
these misspellings were identified by codespell. in this change,
they are corrected.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21796
2024-12-06 11:10:51 +02:00
Raphael S. Carvalho
014e1c9a0f locator: Introduce merge_tablet_info()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 21:47:00 -03:00
Raphael S. Carvalho
a5cc6fb297 locator: Add tablet_map::sibling_tablets()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Raphael S. Carvalho
fd6bf7b357 locator/tablets: Extend tablet_replica equality comparator to three-way
Will be needed later for sorting tablet replicas.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Raphael S. Carvalho
3082ff992c locator/tablets: Introduce tablet_map::for_each_sibling_tablets()
Adding interface to iterate through sibling tablets for a given table,
one pair at a time.

Initially I thought of having for_each_sibling_tablet do nothing for single
tablet tables. But later I bumped into complications when wiring it into
load balancer for building candidate list, since single-tablet tables
have to be special cased.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Raphael S. Carvalho
47c8237de0 locator/tablets: Introduce tablet_map::needs_merge()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Raphael S. Carvalho
93990eb162 locator/tablets: Introduce resize_decision::initial_decision()
Knowing whether a resize (e.g. split) decision was needed above the initial
tablet count will help guide the merge decision, since we don't
want a merge to happen while the table is still growing but hasn't left
the merge threshold yet.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Raphael S. Carvalho
61f694acf5 locator/tablets: Fix return type of three-way comparison operators
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-12-03 20:45:20 -03:00
Avi Kivity
841481c202 Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb
"
This rather large patch series moves storage proxy and some adjacent
services (like migration manager) to use host ids to identify nodes rather
than ips. Messaging service gains a capability to address nodes by host
ids (which allows dropping translations from topology coordinator code
that worked on host ids already) and also makes sure that a node with
incorrect host id will reject a message (can happen during address
changes).

The series gets rid of the raft address map completely and replaces it with
the gossiper address map which is managed by the gossiper since translation
is now done in the layer below raft.

Fixes: scylladb/scylladb#6403

perf-simple-query -- smp 1 -m 1G output

Before:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64336.82 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41291 insns/op,   24485 cycles/op,        0 errors)
62669.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41277 insns/op,   24695 cycles/op,        0 errors)
69172.12 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41326 insns/op,   24463 cycles/op,        0 errors)
56706.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41143 insns/op,   24513 cycles/op,        0 errors)
56416.65 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41186 insns/op,   24851 cycles/op,        0 errors)

         throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65
instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80
  cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70

After:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
65237.35 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   40733 insns/op,   23145 cycles/op,        0 errors)
59283.09 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40624 insns/op,   23948 cycles/op,        0 errors)
70851.03 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40625 insns/op,   23027 cycles/op,        0 errors)
70549.61 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40650 insns/op,   23266 cycles/op,        0 errors)
68634.96 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40622 insns/op,   22935 cycles/op,        0 errors)

         throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09
instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33
  cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/
SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/

Tested mixed cluster manually.
"

* 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits)
  group0: drop unused field from replace_info struct
  test: rename raft_address_map_test to address_map_test and move it from raft tests
  raft_address_map: remove raft address map
  topology coordinator: do not modify expire state for left/new nodes any more in raft address map
  topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used
  group0: drop raft address map dependency from raft_rpc
  group0: move raft_ticker_type definition from raft_address_map.hh
  storage_service: do not update raft address map on gossiper events
  group0: drop raft address map dependency from raft_server_with_timeouts
  group0: move group0 upgrade code to host ids
  repair: drop raft address map dependency
  group0: remove unused raft address map getter from raft_group0
  group0: drop raft address map from group0_state_machine dependency since it is not used there any more
  group0: remove dependency on raft address map from group0_state_id_handler
  gossiper: add get_application_state_ptr that searches by host_id
  gossiper: change get_live_token_owners to return host ids
  view: move view building to host id
  hints: use host id to send hints
  storage_proxy: remove id_vector_to_addr since it is no longer used
  db: consistency_level: change is_sufficient_live_nodes to work on host ids
  ...
2024-12-03 18:18:48 +02:00
Avi Kivity
b99d4ec055 abstract_replication_strategy.hh: apply pimpl to boost::icl::interval_map
interval_map is a heavyweight header, hide it behind the pimpl idiom
to reduce #include load.
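
The pimpl idiom can be sketched in a single file as below. This is a hypothetical illustration, not the code from this commit: `range_owner_map` is an invented class, and std::map stands in for boost::icl::interval_map to keep the sketch dependency-free.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// --- what would live in the header (needs only <memory> and <string>) ---
class range_owner_map {
public:
    range_owner_map();
    ~range_owner_map();  // defined where impl is a complete type
    range_owner_map(range_owner_map&&) noexcept;
    range_owner_map& operator=(range_owner_map&&) noexcept;

    void set_owner(int token, std::string owner);
    const std::string* find_owner(int token) const;

private:
    struct impl;  // forward declaration only; the heavy type stays hidden
    std::unique_ptr<impl> _impl;
};

// --- what would live in the .cc file (the only user of the heavy header) ---
struct range_owner_map::impl {
    std::map<int, std::string> owners;
};

range_owner_map::range_owner_map() : _impl(std::make_unique<impl>()) {}
range_owner_map::~range_owner_map() = default;
range_owner_map::range_owner_map(range_owner_map&&) noexcept = default;
range_owner_map& range_owner_map::operator=(range_owner_map&&) noexcept = default;

void range_owner_map::set_owner(int token, std::string owner) {
    _impl->owners[token] = std::move(owner);
}

const std::string* range_owner_map::find_owner(int token) const {
    auto it = _impl->owners.find(token);
    return it == _impl->owners.end() ? nullptr : &it->second;
}
```

Files that include only the header part never see the container's definition, which is where the reduction in #include load comes from.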

Ref #1
2024-12-03 13:59:45 +01:00
Kefu Chai
4bc7e068ff locator: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21754
2024-12-03 11:05:35 +02:00
Gleb Natapov
eb3d2307ce replication_strategy: move sanity_check_read_replicas to host id
It is called from storage proxy which works on host ids now.
2024-12-02 10:31:12 +02:00
Gleb Natapov
b7402af872 locator: topology: add sort_by_proximity function that works on host ids 2024-12-02 10:31:11 +02:00
Gleb Natapov
0882f2024c locator: topology: make topology object always contain local node
Currently the locator::topology object, when created, does not contain
the local node, but it is already used to access the local database. It
sort of works now because there are explicit checks in the code to handle
this special case, for instance in topology::get_location. We do not
want to hack around it and would rather rely on an invariant that the local
node is always there. To do that we add the local node during
locator::topology creation. There is a catch though. Unlike the IP, the host
ID is not known during startup. We actually need to read from the
database to know it, so the topology starts with host ID zero and then
changes once to the real one. This is not a problem though. As long as
the (one node) topology is consistent (_cfg.this_host_id is equal to the
node's id) local access will work.
2024-12-02 10:31:11 +02:00
Gleb Natapov
9cda32af92 locator: put real host id into the replication map for local replication strategy
The local replication strategy returns a zero host id in the replica set
instead of the real one. It mostly works now because the code that translates
ids to ips knows that the zero host id is a special one. But we want to use host
ids directly, so we need to return the real one (or handle the zero special case
everywhere).
2024-12-02 10:31:11 +02:00
Gleb Natapov
faef04e688 replication_strategy: add host id versions of get_natural_endpoints/get_pending_endpoints/get_endpoints_for_reading functions
Those functions will return host ids instead of ips.
2024-12-02 10:31:11 +02:00
Kefu Chai
f436edfa22 mutation: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-29 14:01:44 +08:00
Kefu Chai
a5ee0c896b treewide: migrate from boost::adaptors::filtered to std::views::filter
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.

Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views

Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices

Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
   - `std::ranges::to` adaptor requires rvalue references
   - Necessitated updates to variable and parameter constness
   - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
     from `common` to enable efficient range conversion

2. Range Iteration and Mutation
   - Range views may mutate internal state during iteration
   - Cannot pass ranges by const reference in some scenarios
   - Solution: Pass ranges by rvalue reference to explicitly indicate
     state invalidation

Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
  due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change

This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21648
2024-11-26 14:26:50 +02:00
Nadav Har'El
cb6c55209a Merge 'locator: token_metadata: replace boost range with std range' from Avi Kivity
Reduce dependency load by standardizing on std::ranges. This is a little
involved since we use a custom iterator.

Code cleanup; no backport.

Closes scylladb/scylladb#21421

* github.com:scylladb/scylladb:
  locator: token_metadata: switch from boost ranges to std ranges
  locator: token_metadata: make iterator support std::input_iterator concept
  locator: tokens_metadata: move tokens_iterator to namespace scope
2024-11-25 14:58:45 +02:00
Tomasz Grabiec
0d2583600d Merge 'Add tablet repair scheduler support' from Asias He
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.

The current repair scheduler implementation does the following:

- A tablet is picked to be repaired when it is requested by the user

- The tablet repair can be scheduled along with tablet migration and
  rebuild. It runs in the tablet_migration track.

- Repair jobs are scheduled in a smart way so that at any point in time,
  there are no more than configured jobs per shard, which is similar to
  scylla manager's control.

New feature. No backport is needed.

Closes scylladb/scylladb#21088

* github.com:scylladb/scylladb:
  test: Add tests for tablet repair scheduler
  repair: Add restful API for tablet repair
  repair: Add tablet repair scheduler internal API support
  docs: Update system_keyspace.md for tablet repair related info
  docs: Add docs for tablet repair migration
  repair: Add core tablet repair scheduler support
  messaging_service: Introduce TABLET_REPAIR verb
  tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
  network_topology_strategy: Preserve fields of task_info in reallocate_tablets
2024-11-20 13:28:17 +01:00
Asias He
b71a563030 repair: Add core tablet repair scheduler support
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.

The current repair scheduler implementation does the following:

- A tablet is picked to be repaired when the time since last repair is
  bigger than a threshold (auto repair mode) or it is requested by user
  (manual repair mode)

- The tablet repair can be scheduled along with tablet migration and
  rebuild. It runs in the tablet_migration track.

- Repair jobs are scheduled in a smart way so that at any point in time,
  there are no more than configured jobs per shard, which is similar to
  scylla manager's control.

In this patch, both the manual repair and the auto repair are not
enabled yet.
2024-11-20 09:42:41 +08:00
Kefu Chai
33a0e5b892 treewide: replace boost::find_if with std::ranges::find_if
now that we are allowed to use C++23, we have the luxury of using
`std::ranges::find_if`.

in this change, we:

- replace `boost::find_if` with `std::ranges::find_if`
- remove all `#include <boost/range/algorithm/find_if.hpp>`

to reduce the dependency on boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-19 10:50:01 +08:00
Asias He
82a10eca55 tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
The stream_weight for repair migration is set to 2, because it requires
more work than just moving the tablet around. The stream_weight for all
other migrations is set to 1.
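
The weighting can be sketched as follows. Only the values 2 and 1 come from this message; the `migration_kind` enumerators, `can_schedule` and the per-shard limit are hypothetical and only show how a weight could feed an admission check.

```cpp
#include <vector>

// Hypothetical migration kinds; the real enum may differ.
enum class migration_kind { migration, rebuild, repair };

// Per the commit message: repair counts as 2, every other kind as 1.
constexpr int stream_weight(migration_kind k) {
    return k == migration_kind::repair ? 2 : 1;
}

// Hypothetical admission check: the summed weight of running migrations
// plus the candidate must stay within the limit.
bool can_schedule(const std::vector<migration_kind>& running,
                  migration_kind next, int limit) {
    int load = 0;
    for (auto k : running) {
        load += stream_weight(k);
    }
    return load + stream_weight(next) <= limit;
}
```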
2024-11-19 10:04:41 +08:00
Asias He
c975882e03 network_topology_strategy: Preserve fields of task_info in reallocate_tablets
So other fields will not be dropped when the new tablet is created.
2024-11-19 10:04:41 +08:00
Nadav Har'El
e639434a89 change remaining sstring_view to std::string_view
Our "sstring_view" is an historic alias for the standard std::string_view.
The patch changes the last remaining random uses of this old alias across
our source directory to the standard type name.

After this patch, there are no more uses of the "sstring_view" alias.
It will be removed in the following patch.

Refs #4062.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-11-18 16:48:57 +02:00