When repair master and followers have different shard count, the repair
followers need to create multi-shard readers. Each multi-shard reader
will create one local reader on each shard, N (smp::count) local readers
in total.
There is a hard limit on the number of readers who can work in parallel.
When there are more readers than this limit. The readers will start to
evict each other, causing buffers already read from disk to be dropped
and recreating of readers, which is not very efficient.
To optimize and reduce reader eviction overhead, a global reader permit
is introduced which considers the multi-shard reader bloats.
With this patch, at any point in time, the number of readers created by
repair will not exceed the reader limit.
Test Results:
1) with stream sem 10, repair global sem 10, 5 ranges in parallel, n1=2
shards, n2=8 shards, memory wanted =1
1.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2)
[2022-11-23 17:45:24,770] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:45:53,869] Repair session 1
[2022-11-23 17:45:53,869] Repair session 1 finished
real 0m30.212s
user 0m1.680s
sys 0m0.222s
1.2)
[asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1)
[2022-11-23 17:46:07,507] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:46:30,608] Repair session 1
[2022-11-23 17:46:30,608] Repair session 1 finished
real 0m24.241s
user 0m1.731s
sys 0m0.213s
2) with stream sem 10, repair global sem no_limit, 5 ranges in
parallel, n1=2 shards, n2=8 shards, memory wanted =1
2.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2)
[2022-11-23 17:49:49,301] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:01,414] Repair session 1
[2022-11-23 17:52:01,415] Repair session 1 finished
real 2m13.227s
user 0m1.752s
sys 0m0.218s
2.2)
[asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1)
[2022-11-23 17:52:19,280] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:42,387] Repair session 1
[2022-11-23 17:52:42,387] Repair session 1 finished
real 0m24.196s
user 0m1.689s
sys 0m0.184s
Comparing 1.1) and 2.1), it shows the eviction played a major role here.
The patch gives 73s / 30s = 2.5X speed up in this setup.
Comparing 1.1 and 1.2, it shows even if we limit the readers, starting
on the lower shard is faster 30s / 24s = 1.25X (the total number of
multishard readers is lower)
Fixes#12157Closes#12158