43718 Commits

Author SHA1 Message Date
Yaron Kaikov
8221a178d8 Revert "dist: support nonroot and offline mode for scylla-housekeeping"
This reverts commit c3bea539b6.

The reverted commit breaks the offline-installer artifact-tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for offline-installer.

Closes scylladb/scylladb#19976
2024-08-04 10:55:26 +03:00
Piotr Dulikowski
39b49a41cc Merge 'mv: delete a partition in a single operation when applicable' from Michael Litvak
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.

When the partition key in the view is the same as the base, maybe in a
different order, this can be done more efficiently: the whole corresponding
view partition can be deleted with one partition tombstone update.

With this commit, when generating view updates, if the update mutation has a
partition tombstone then for the views which have the same partition key
we will generate a partition tombstone update, and skip the individual
row tombstone updates.
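
The rule described above can be modeled roughly as follows. This is a Python sketch for illustration only, not the ScyllaDB implementation; `Tombstone`, `view_updates_for_partition_delete` and the shape of their arguments are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    kind: str    # "partition" or "row"
    key: tuple   # view partition key, plus clustering key for rows


def view_updates_for_partition_delete(base_pk_cols, view_pk_cols, base_pk, view_rows):
    """Model of view update generation for a base partition tombstone.

    base_pk_cols / view_pk_cols: partition key column names of base and view.
    base_pk: dict mapping column name to value for the deleted base partition.
    view_rows: list of (view_partition_key, view_clustering_key) tuples
               derived from that base partition.
    """
    if set(base_pk_cols) == set(view_pk_cols):
        # Same partition key columns, possibly reordered: one tombstone
        # covers the whole view partition, so the rows need not be read.
        view_pk = tuple(base_pk[c] for c in view_pk_cols)
        return [Tombstone("partition", view_pk)]
    # Otherwise fall back to one row tombstone per view row.
    return [Tombstone("row", pk + ck) for pk, ck in view_rows]
```

With base key `(a, b)` and view key `(b, a)` the function returns a single partition tombstone regardless of how many view rows exist.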

Fixes scylladb/scylladb#8199

Closes scylladb/scylladb#19338

* github.com:scylladb/scylladb:
  mv: skip reading rows when generating partition tombstone update
  mv: delete a partition in a single operation when applicable
  cql-pytest: move ScyllaMetrics to util file to allow reuse
2024-08-02 11:00:18 +02:00
Avi Kivity
99d0aaa7d2 Merge 'tablets: load_balancer: Improve per-table balance' from Tomasz Grabiec
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.

In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.

This change improves candidate selection to not rely on randomness
but rather evaluate candidates with respect to the impact on load
imbalance.  Also, if there is no good candidate, we consider picking
other source shards, not the most-loaded one. This is helpful because
when finishing node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.

We also consider other destinations, not the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.

The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.

For example, for the following parameters:

  params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}

The results are:

Before:

  Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
  Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}

After:

  Overcommit       : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit       : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
  Overcommit       : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}

So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
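
The "init" figures in the parameter line can be reproduced with a small calculation. The sketch below assumes that `shards=32` is a per-node count and that load is counted in tablet replicas (tablets times RF); both are inferences from the printed numbers, not statements from the commit, and the function names are made up for illustration.

```python
import math


def avg_shard_load(tablets, rf, nodes, shards_per_node):
    """Average number of tablet replicas per shard across the cluster."""
    return tablets * rf / (nodes * shards_per_node)


def overcommit(max_load, avg_load):
    """Ratio of the most-loaded unit to the average load."""
    return max_load / avg_load


def best_shard_overcommit(tablets, rf, nodes, shards_per_node):
    """Lower bound on shard overcommit: replicas are indivisible, so the
    fullest shard holds at least ceil(average) replicas."""
    avg = avg_shard_load(tablets, rf, nodes, shards_per_node)
    return math.ceil(avg) / avg


print(round(avg_shard_load(128, 3, 5, 32), 1))         # table1 average per shard
print(round(best_shard_overcommit(128, 3, 5, 32), 2))  # table1 "best"
print(round(best_shard_overcommit(512, 3, 5, 32), 2))  # table2 "best"
```

Under these assumptions table1 averages 2.4 replicas per shard, which matches the "2.4/sh" annotation, and the bound evaluates to 1.25 and 1.04, matching the reported "best" values. As a purely hypothetical illustration of the definition, a shard holding 8 replicas when the average is 2 has an overcommit of 4.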

Also, node overcommit for table1 was reduced from 1.81 to 1.02.

The magnitude of improvement depends greatly on test configuration, that is, on topology and tablet distribution.

The algorithm is not perfect; it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).

One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.

Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we start in good shape.

Fixes https://github.com/scylladb/scylladb/issues/16824

Closes scylladb/scylladb#19779

* github.com:scylladb/scylladb:
  test: perf: tablet_load_balancing: Test with higher shard and tablet counts
  tablets: load_balancer: Avoid quadratic complexity when finding best candidate
  tablets: load_balancer: Maintain load sketch properly during intra-node migration
  tablets: load_balancer: Use "drained" flag
  test: perf: tablet_load_balancing: Report load balancer stats
  tablets: load_balancer: Move load_balancer_stats_manager to header file
  tablets: load_balancer: Split evaluate_candidate() into src and dst part
  tablets: load_balancer: Optimize evaluate_candidate()
  tablets: load_balancer: Add more statistics
  tablets: load_balancer: Track load per table on cluster level
  tablets: load_balancer: Track load per table on node level
  tablets: load_balancer: Use a single load sketch for tracking all nodes
  locator: load_sketch: Introduce populate_dc()
  tablets: load_balancer: Modify target load sketch only when emitting migration
  locator: load_sketch: Introduce get_most_loaded_shard()
  locator: load_sketch: Introduce get_least_loaded_shard()
  locator: load_sketch: Optimize pick()/unload()
  locator: load_sketch: Introduce load_type
  test: perf: tablet_load_balancing: Report total tablet counts
  test: perf: tablet_load_balancing: Print run parameters in the single simulation case too
  test: perf: tablet_load_balancing: Report time it took to schedule migrations
  tablets: load_balancer: Log table load stats after each migration
  tablets: load_balancer: Log per-shard load distribution in debug level
  tablets: load_balancer: Improve per-table balance
  tablets: load_balancer: Extract check_convergence()
  tablets: load_balancer: Extract nodes_by_load_cmp
  tablets: load_balancer: Maintain tablet count per table
  tablets: load_balancer: Reuse src_node_info
  test: perf: tablet_load_balancing: Print warnings about bad overcommit
  test: perf: tablet_load_balancing: Allow running a single simulation
  test: perf: tablet_load_balancing: Report best possible shard overcommit
  test: perf: tablet_load_balancing: Report global shard overcommit
2024-08-01 21:12:14 +03:00
Piotr Dulikowski
44f327675d Merge 'Remove gossiper argument from storage_service::join_cluster()' from Pavel Emelyanov
It's only needed to start hints via proxy, but proxy can do it without gossiper argument

Closes scylladb/scylladb#19894

* github.com:scylladb/scylladb:
  storage_service: Remove gossiper argument from join_cluster()
  proxy: Use remote gossiper to start hints resource manager
  hints: Const-ify gossiper references and anchor pointers
2024-08-01 10:18:14 +02:00
Michael Litvak
c944e28e43 db: fix waiting for counter update operations on table stop
When a table is dropped it should wait for all pending operations in the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.
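
The lifetime issue can be illustrated with a Python asyncio analogy. This is not Seastar code and does not use `do_with`; `PendingOps` and `Table` are hypothetical stand-ins. The point is only that the guard marking a write as in progress must be held across the whole operation, so that stopping the table waits for it.

```python
import asyncio


class PendingOps:
    """Counts in-flight operations so that a stop can wait for them."""

    def __init__(self):
        self._count = 0
        self._idle = asyncio.Event()
        self._idle.set()

    def start(self):
        self._count += 1
        self._idle.clear()
        return self

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._count -= 1
        if self._count == 0:
            self._idle.set()

    async def wait_idle(self):
        await self._idle.wait()


class Table:
    def __init__(self):
        self.pending_writes = PendingOps()
        self.alive = True

    async def counter_update(self, log):
        # The guard spans the whole operation, including every await.
        with self.pending_writes.start():
            await asyncio.sleep(0.01)  # stands in for the read-modify-write I/O
            assert self.alive, "table destroyed while update in progress"
            log.append("update done")

    async def stop(self, log):
        await self.pending_writes.wait_idle()  # wait for in-flight updates
        self.alive = False
        log.append("table stopped")


async def demo():
    table, log = Table(), []
    update = asyncio.create_task(table.counter_update(log))
    await asyncio.sleep(0)  # let the update start
    await table.stop(log)
    await update
    return log
```

If the guard were released before the `await` inside `counter_update`, `stop()` could mark the table as destroyed while the update was still running, which is the kind of failure the commit describes.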

Fixes scylladb/scylla-enterprise#4475

Closes scylladb/scylladb#19948
2024-08-01 09:39:49 +02:00
Nadav Har'El
5411559a94 test/cql-pytest: test ALLOW FILTERING in intersection of two indexes
A user complained that ScyllaDB is incompatible with Cassandra when it
requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where
x and y are two columns with secondary indexes.

In the tests added in this patch we show that:

1. Scylla *is* compatible with Cassandra when the traditional "CREATE
   INDEX" is used - ALLOW FILTERING *is* required in this case in both
   Cassandra and Scylla.

2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'),
   indeed ALLOW FILTERING becomes optional. I believe this is incorrect
   so I opened CASSANDRA-19795.

These two tests combined show that we're not incompatible with Cassandra,
rather Cassandra's two index implementations are incompatible between
themselves, and Scylla is in fact compatible in this case with Cassandra's
traditional index and not with SAI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19909
2024-07-31 14:01:29 +03:00
Laszlo Ersek
e67eb0ccc1 test/sstable: coroutinize do_write_sst()
Make do_write_sst() easier to read by coroutinizing it.

Closes #19803.

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#19937
2024-07-31 13:59:26 +03:00
Kefu Chai
020333fcf1 sstables: fix a typo in comment
s/guranteed/guaranteed/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19946
2024-07-31 13:58:09 +03:00
Tomasz Grabiec
28de5231f4 test: perf: tablet_load_balancing: Test with higher shard and tablet counts
We have up to 200 shards in production, so test this to catch
performance issues.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
19b7fb3a4d tablets: load_balancer: Avoid quadratic complexity when finding best candidate
If the source and destination shards picked for migration based on
global tablet balance do not have a good candidate in terms of effect
on per-table balance, the algorithm explores other source shards and
destinations. This has quadratic complexity in terms of shard count in
the worst case, when there are no good candidates.

Since we can have up to ~200 shards, this can slow down scheduling
significantly. I saw total scheduling time of 5 min in the following run:

 scylla perf-load-balancing -c1 -m1G  --iterations=8 \
    --nodes=4 --tablets1=1024 --tablets2=8096 \
    --rf1=2 --rf2=3 --shards=256

To improve, change the approach to first find the best source shard and
then best target shard, sequentially. So it's now linear in terms of
shard count.

After the change, the total scheduling time in that run is down to 4s.

Minimizing source and destination metrics piece-wise minimizes the
combined metric, so badness of the best candidate doesn't suffer after
this change.
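
The piece-wise argument can be checked with a small Python sketch. It assumes the combined badness is a monotone combination of independent source and destination terms (here `max`); the commit does not spell out the metric, so this shape and all names are assumptions.

```python
from itertools import product


def best_pair_quadratic(src_badness, dst_badness, combine=max):
    """Evaluate every (source, destination) pair: O(S * D)."""
    return min(product(src_badness, dst_badness),
               key=lambda p: combine(src_badness[p[0]], dst_badness[p[1]]))


def best_pair_linear(src_badness, dst_badness):
    """Pick the best source, then the best destination: O(S + D)."""
    src = min(src_badness, key=src_badness.get)
    dst = min(dst_badness, key=dst_badness.get)
    return src, dst


# Hypothetical badness values keyed by shard id.
src = {0: 0.4, 1: 0.1, 2: 0.3}
dst = {7: 0.5, 8: 0.2, 9: 0.25}
print(best_pair_quadratic(src, dst), best_pair_linear(src, dst))
```

Because each term can be minimized independently, the sequential search reaches the same combined badness as the exhaustive one while visiting each shard only once.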
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
93df82032f tablets: load_balancer: Maintain load sketch properly during intra-node migration
Affects only intra-node migration. The code was recording destination
shard as taken and did not un-take it in case we skipped the migration
due to lack of candidates.

Noticed during code review. Impact is minor, since even if this leads
to suboptimal balance, the next scheduling round should fix it.

Also, the source shard was not unloaded, but that should have no
impact on decisions. But to be future-proof, better to maintain the
load accurately in case the algorithm is extended with more steps.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
88988ce0db tablets: load_balancer: Use "drained" flag
Cleanup / optimization.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
56801b7cb7 test: perf: tablet_load_balancing: Report load balancer stats
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
90c9934099 tablets: load_balancer: Move load_balancer_stats_manager to header file
So that stats can be accessed outside tablet allocator.
2024-07-31 12:57:15 +02:00
Anna Stuchlik
ae28880fc8 doc: enable publishing docs for branch-6.1
This commit enables publishing documentation from branch-6.1. The docs will be published as UNSTABLE (the warning about version 6.1 being unstable will be displayed).

Fixes https://github.com/scylladb/scylladb/issues/19926

No backport is required.

Closes scylladb/scylladb#19931
2024-07-31 12:48:51 +02:00
Kamil Braun
c05e077a13 Merge 'raft: fix the shutdown phase being stuck' from Emil Maskovsky
Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it.

This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered.

This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3

Fixes scylladb/scylladb#19223

Closes scylladb/scylladb#19860

* github.com:scylladb/scylladb:
  raft: fix the shutdown phase being stuck
  raft: use the abort source reference in raft group0 client interface
2024-07-31 12:10:30 +02:00
Tomasz Grabiec
94cce4b7d3 tablets: load_balancer: Split evaluate_candidate() into src and dst part
Those parts will be used separately later.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
4df2abe47a tablets: load_balancer: Optimize evaluate_candidate()
Moves load computation out of the hot path by relying on
data structures maintained globally during plan making.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
5e7facd543 tablets: load_balancer: Add more statistics
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
be055977c9 tablets: load_balancer: Track load per table on cluster level
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
81fcee2040 tablets: load_balancer: Track load per table on node level
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
e7ef7419dc tablets: load_balancer: Use a single load sketch for tracking all nodes
This is code simplification and optimization.

Avoids multiple passes of tablet metadata to construct load sketch for
each target node.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
352b8e0ddd locator: load_sketch: Introduce populate_dc()
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
9a7afd334b tablets: load_balancer: Modify target load sketch only when emitting migration
This avoids the need to unpick() a replica when the candidate is not
selected. Optimization.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
b78657ce7d locator: load_sketch: Introduce get_most_loaded_shard()
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
de404471b7 locator: load_sketch: Introduce get_least_loaded_shard()
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
8fbfd595bb locator: load_sketch: Optimize pick()/unload()
They are executed frequently during tablet scheduling. Currently, they
have time complexity of O(N*log(N)) in terms of shard count. With
large shard counts, that has significant overhead.

This patch optimizes them down to O(log(N)).
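
One way to get logarithmic cost for these operations is a heap with lazy invalidation, sketched below in Python. This is an assumption about a possible technique, not a description of the actual `load_sketch` code; the class and its internals are hypothetical, and the bound here is amortized.

```python
import heapq


class LoadSketch:
    """Tracks per-shard load; pick() and unload() are O(log N) amortized."""

    def __init__(self, shard_count):
        self._load = [0] * shard_count
        # Heap of (load, shard); stale entries are skipped lazily.
        self._heap = [(0, s) for s in range(shard_count)]
        heapq.heapify(self._heap)

    def _set(self, shard, load):
        self._load[shard] = load
        heapq.heappush(self._heap, (load, shard))

    def pick(self):
        """Return the least-loaded shard and charge one tablet to it."""
        while True:
            load, shard = heapq.heappop(self._heap)
            if self._load[shard] == load:  # entry is still current
                self._set(shard, load + 1)
                return shard

    def unload(self, shard):
        """Remove one tablet from the given shard."""
        if self._load[shard] == 0:
            raise ValueError("shard has no load to remove")
        self._set(shard, self._load[shard] - 1)
```

Re-sorting all shards on every call would cost O(N*log(N)); here each call does a constant number of heap pushes, plus pops of stale entries that are each paid for by an earlier push.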
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
d0b0f95849 locator: load_sketch: Introduce load_type
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
8f3b623144 test: perf: tablet_load_balancing: Report total tablet counts
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
662a0ff038 test: perf: tablet_load_balancing: Print run parameters in the single simulation case too
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
a040404875 test: perf: tablet_load_balancing: Report time it took to schedule migrations
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
ae7fd80554 tablets: load_balancer: Log table load stats after each migration
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
b8996a0f59 tablets: load_balancer: Log per-shard load distribution in debug level
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
469e2f3f90 tablets: load_balancer: Improve per-table balance
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.

In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.

This change improves candidate selection to not rely on randomness
but rather evaluate candidates with respect to the impact on load
imbalance.  Also, if there is no good candidate, we consider picking
other source shards, not the most-loaded one. This is helpful because
when finishing node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.

We also consider other destinations, not the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.
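
The idea of evaluating candidates by their effect on per-table balance, instead of picking at random, can be sketched in Python as below. The badness formula, the `max_badness` threshold and every name are hypothetical; the commit does not define the metric it uses.

```python
def pick_candidate(candidates, dst_table_load, avg_table_load, max_badness=None):
    """Choose the tablet whose move hurts per-table balance the least.

    candidates: list of (tablet_id, table) pairs on the source shard.
    dst_table_load: table -> tablet count on the destination shard.
    avg_table_load: table -> average per-shard tablet count for that table.
    max_badness: optional limit; if the best candidate exceeds it, return
                 None so the caller can try another source or destination.
    """
    def badness(table):
        # Relative load of the table on the destination after the move.
        return (dst_table_load.get(table, 0) + 1) / avg_table_load[table]

    best = min(candidates, key=lambda c: badness(c[1]), default=None)
    if best is None:
        return None
    if max_badness is not None and badness(best[1]) > max_badness:
        return None
    return best


# Hypothetical state: the destination already holds many tablets of table A.
print(pick_candidate([("t1", "A"), ("t2", "B")],
                     dst_table_load={"A": 4, "B": 1},
                     avg_table_load={"A": 2.0, "B": 2.0}))
```

Returning `None` models the case described above where no good candidate exists on the chosen shards and other sources or destinations are explored.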

The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.

For example, for the following parameters:

  params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}

The results are:

After:

  Overcommit       : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit       : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
  Overcommit       : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}

Before:

  Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
  Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}

So shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.

Also, node overcommit for table1 was reduced from 1.81 to 1.02.

The magnitude of improvement depends greatly on test configuration, that is, on topology and tablet distribution.

The algorithm is not perfect; it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).

One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.

Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we start in good shape.

Fixes #16824
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
b7661aa6c9 tablets: load_balancer: Extract check_convergence()
Will be reused when evaluating different targets for migration in later
stages.

The refactoring drops updating of _stats.for_dc(dc).stop_no_candidates
and we update _stats.for_dc(dc).stop_load_inversion in both cases
where convergence check may fail. The reason is that stat updates must
be outside check_convergence(), since the new use case should not
update those stats (it doesn't stop balancing, just drops
candidates). Propagating the information for distinguishing the two
cases would be a burden. But it's not necessary, since both cases are
actually load inversion cases, one pre-migration the other
post-migration, so we don't need the distinction.

It's actually wrong to increment stop_no_candidates, since there may
still be candidates, it's the load which is inverted.
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
41e643ddb9 tablets: load_balancer: Extract nodes_by_load_cmp
Will be reused in a different place.
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
8a7257971d tablets: load_balancer: Maintain tablet count per table
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
4e4f13ac9d tablets: load_balancer: Reuse src_node_info
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
71b8d6b7aa test: perf: tablet_load_balancing: Print warnings about bad overcommit
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
0d50a028a5 test: perf: tablet_load_balancing: Allow running a single simulation
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
3f3660c3fe test: perf: tablet_load_balancing: Report best possible shard overcommit
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
c89a320925 test: perf: tablet_load_balancing: Report global shard overcommit
Rather than maximum per-node shard overcommit. Global shard overcommit
is a better metric since we want to equalize global load not just
per-node load.
2024-07-31 11:26:11 +02:00
Emil Maskovsky
5dfc50d354 raft: fix the shutdown phase being stuck
Some of the calls inside the `raft_group0_client::start_operation()`
method were missing the abort source parameter. This caused the repair
test to be stuck in the shutdown phase - the abort source has been
triggered, but the operations were not checking it.

This was in particular the case of operations that try to take the
ownership of the raft group semaphore (`get_units(semaphore)`) - these
waits should be cancelled when the abort source is triggered.
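
The effect of passing an abort source to a semaphore wait can be shown with a Python asyncio analogy. This is not the Seastar API: the `get_units` below only borrows the name, and `AbortRequested` is a hypothetical exception type.

```python
import asyncio


class AbortRequested(Exception):
    pass


async def get_units(sem, abort):
    """Acquire the semaphore, or raise AbortRequested if abort fires first."""
    acquire = asyncio.ensure_future(sem.acquire())
    aborted = asyncio.ensure_future(abort.wait())
    done, pending = await asyncio.wait({acquire, aborted},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if acquire in done:
        return  # caller now holds one unit and must release it
    raise AbortRequested()


async def demo():
    sem = asyncio.Semaphore(0)  # never available: models a stuck wait
    abort = asyncio.Event()
    waiter = asyncio.ensure_future(get_units(sem, abort))
    await asyncio.sleep(0)      # let the waiter block on the semaphore
    abort.set()                 # shutdown requested
    try:
        await waiter
    except AbortRequested:
        return "aborted"
    return "acquired"
```

Without the abort argument the waiter would block on the semaphore forever, which corresponds to the stuck shutdown phase described in this commit.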

This should fix the following tests that were failing in some percentage
of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3

Fixes scylladb/scylladb#19223
2024-07-31 09:18:54 +02:00
Emil Maskovsky
2dbe9ef2f2 raft: use the abort source reference in raft group0 client interface
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
2024-07-31 09:18:54 +02:00
Pavel Emelyanov
9214aecbe7 storage_service: Remove orphan forward declaration of a method
The start_sys_dist_ks() itself was removed by bc051387c5

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19928
2024-07-30 16:17:49 +03:00
Benny Halevy
e58ca8c44b service_level_controller: stop: always call subscription on_abort
We want to call `service_level_controller::do_abort()` in all cases.
The current code (introduced in
535e5f4ae7)
calls do_abort if abort was not requested, however, since
it does so by checking the subscription bool operator,
it would miss the case where abort was already requested
before the subscription took place (in service_level_controller
ctor).

With scylladb/seastar@470b539b1c and
scylladb/seastar@8ecce18c51
we can just unconditionally call the subscription `on_abort`
method, that ensures only-once semantics, even if abort
was already requested at subscription time.

Fixes scylladb/scylladb#19075

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19929
2024-07-30 13:23:17 +03:00
Kefu Chai
35394c3f9a docs/dev: fix a typo
remove the extraneous "is".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19902
2024-07-30 10:46:25 +03:00
Pavel Emelyanov
97154b0671 Merge 'mapreduce_service: complete coroutinization' from Avi Kivity
mapreduce_server was previously coroutinized, but only partially. This
series completes coroutinization and eliminates remaining continuation chains.

None of this code is performance sensitive as it runs at the super-coordinator level
and is amortized over a full scan of the entire table.

No backport needed as this is a cleanup.

Closes scylladb/scylladb#19913

* github.com:scylladb/scylladb:
  mapreduce_service: reindent
  mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node()
  mapreduce_service: coroutinize dispatch() inner lambda
2024-07-30 10:44:34 +03:00
Nadav Har'El
d293a5787f alternator: exclude CDC log table from ListTables
The Alternator command ListTables is supposed to list actual tables
created with CreateTable, and should not list things like materialized views
(created for GSI or LSI) or CDC log tables.

We already properly excluded materialized views from the list - and
had the tests to prove it - but forgot both the exclusion and the testing
for CDC log tables - so creating a table xyz with streams enabled would
cause ListTables to also list "xyz_scylla_cdc_log".

This patch fixes both oversights: It adds the code to exclude CDC logs
from the output of ListTables, and adds a test which reproduces the bug
before this fix, and verifies the fix works.
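
The intended filtering can be sketched in Python as follows. The schema dictionaries and the suffix check are assumptions made for illustration; the suffix comes from the "xyz_scylla_cdc_log" example above, and the real code may well identify CDC log tables differently.

```python
CDC_LOG_SUFFIX = "_scylla_cdc_log"  # taken from the example in the commit message


def list_tables(schemas):
    """Return names of user-created tables only.

    schemas: iterable of dicts with 'name' and 'is_view' keys (hypothetical shape).
    """
    return [s["name"] for s in schemas
            if not s["is_view"] and not s["name"].endswith(CDC_LOG_SUFFIX)]


schemas = [
    {"name": "xyz", "is_view": False},
    {"name": "xyz_scylla_cdc_log", "is_view": False},
    {"name": "xyz_gsi_view", "is_view": True},  # hypothetical view name
]
print(list_tables(schemas))
```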

Fixes #19911.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19914
2024-07-30 10:43:29 +03:00
Nadav Har'El
ca8b91f641 test: increase timeouts for /localnodes test
In commit bac7c33313 we introduced a new
test for the Alternator "/localnodes" request, checking that a node
that is still joining does not get returned. The tests used what I
thought were "very high" timeouts - we had a timeout of 10 seconds
for starting a single node, and injected a 20 second sleep to leave
us 10 seconds after the first sleep.

But the test failed in one extremely slow run (a debug build on
aarch64), where starting just a single node took more than 15 seconds!

So in this patch I increase the timeouts significantly: We increase
the wait for the node to 60 seconds, and the sleeping injection to
120 seconds. These should definitely be enough for anyone (famous
last words...).

The test doesn't actually wait for these timeouts, so the ridiculously
high timeouts shouldn't affect the normal runtime of this test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19916
2024-07-30 10:41:48 +03:00