scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 14:03:06 +00:00

Author	SHA1	Message	Date
Yaron Kaikov	8221a178d8	Revert "dist: support nonroot and offline mode for scylla-housekeeping" This reverts commit `c3bea539b6`, since it breaks offline-installer artifact-tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for offline-installer Closes scylladb/scylladb#19976	2024-08-04 10:55:26 +03:00
Piotr Dulikowski	39b49a41cc	Merge 'mv: delete a partition in a single operation when applicable' from Michael Litvak Currently when a partition is deleted from the base table, we generate a row tombstone update for each one of the view rows in the partition. When the partition key in the view is the same as the base, maybe in a different order, this can be done more efficiently - The whole corresponding view partition can be deleted with one partition tombstone update. With this commit, when generating view updates, if the update mutation has a partition tombstone then for the views which have the same partition key we will generate a partition tombstone update, and skip the individual row tombstone updates. Fixes scylladb/scylladb#8199 Closes scylladb/scylladb#19338 * github.com:scylladb/scylladb: mv: skip reading rows when generating partition tombstone update mv: delete a partition in a single operation when applicable cql-pytest: move ScyllaMetrics to util file to allow reuse	2024-08-02 11:00:18 +02:00
Avi Kivity	99d0aaa7d2	Merge 'tablets: load_balancer: Improve per-table balance' from Tomasz Grabiec Tablet load balancer tries to equalize tablet load between shards by moving tablets. Currently, the tablet load balancer assumes that each tablet has the same hotness. This may not be true, and some tables may be hotter than others. If some nodes end up getting more tablets of the hot table, we can end up with request load imbalance and reduced performance. In `79d0711c7e` we implemented a mitigation for the problem by randomly choosing the table whose tablet replica should be moved. This should improve fairness of movement. However, this proved to not be enough to get a good distribution of tablets. This change improves candidate selection to not rely on randomness but rather evaluate candidates with respect to the impact on load imbalance. Also, if there is no good candidate, we consider picking other source shards, not the most-loaded one. This is helpful because when finishing node drain we get just a few candidates per shard, all of which may belong to a single table, and the destination may already be overloaded with that table. Another shard may contain tablets of another table which is not yet overloaded on the destination. And shards may be of similar load, so it doesn't matter much which shard we choose to unload. We also consider other destinations, not the least-loaded one. This helps when draining nodes and the source node has few shard candidates. Shards on the destination may have similar load so there is more than one good destination candidate. By limiting ourselves to a single shard, we increase the chance that we overload the table on that shard. The algorithm was evaluated using "scylla perf-load-balancing", which simulates a sequence of 8 node bootstraps and decommissions for different node and shard counts, RF, and tablet counts. 
For example, for the following parameters: params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32} The results are: Before: Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}} Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}} After: Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}} Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit of 4 means that the most-loaded shard has 4 times more tablets than the average per-shard load in the cluster. Also, node overcommit for table1 was reduced from 1.81 to 1.02. The magnitude of improvement depends greatly on test configuration, so on topology and tablet distribution. The algorithm is not perfect, it finds a local optimum. In the above test, overcommit of 1.5 is not the best possible (1.25). One reason why the current algorithm doesn't achieve the best distribution is that it works with a single movement at a time and replication constraints limit the choice of destinations. Viable destinations for remaining candidates may be only on nodes which are not least-loaded, and we won't be able to fill the least loaded node. Doing so would require more complex movement involving moving a tablet from one of the destination nodes which doesn't have a replica on the least loaded node and then replacing it with the candidate from the source node. 
Another limitation is that the algorithm can only fix balance by moving tablets away from most loaded nodes, and it does so due to imbalance between nodes. So it cannot fix the imbalance which is already present on the nodes if there is not much to move due to similar load between nodes. It is designed to not make the imbalance worse, so it works well if we started in good shape. Fixes https://github.com/scylladb/scylladb/issues/16824 Closes scylladb/scylladb#19779 * github.com:scylladb/scylladb: test: perf: tablet_load_balancing: Test with higher shard and tablet counts tablets: load_balancer: Avoid quadratic complexity when finding best candidate tablets: load_balancer: Maintain load sketch properly during intra-node migration tablets: load_balancer: Use "drained" flag test: perf: tablet_load_balancing: Report load balancer stats tablets: load_balancer: Move load_balancer_stats_manager to header file tablets: load_balancer: Split evaluate_candidate() into src and dst part tablets: load_balancer: Optimize evaluate_candidate() tablets: load_balancer: Add more statistics tablets: load_balancer: Track load per table on cluster level tablets: load_balancer: Track load per table on node level tablets: load_balancer: Use a single load sketch for tracking all nodes locator: load_sketch: Introduce populate_dc() tablets: load_balancer: Modify target load sketch only when emitting migration locator: load_sketch: Introduce get_most_loaded_shard() locator: load_sketch: Introduce get_least_loaded_shard() locator: load_sketch: Optimize pick()/unload() locator: load_sketch: Introduce load_type test: perf: tablet_load_balancing: Report total tablet counts test: perf: tablet_load_balancing: Print run parameters in the single simulation case too test: perf: tablet_load_balancing: Report time it took to schedule migrations tablets: load_balancer: Log table load stats after each migration tablets: load_balancer: Log per-shard load distribution in debug level tablets: load_balancer: 
Improve per-table balance tablets: load_balancer: Extract check_convergence() tablets: load_balancer: Extract nodes_by_load_cmp tablets: load_balancer: Maintain tablet count per table tablets: load_balancer: Reuse src_node_info test: perf: tablet_load_balancing: Print warnings about bad overcommit test: perf: tablet_load_balancing: Allow running a single simulation test: perf: tablet_load_balancing: Report best possible shard overcommit test: perf: tablet_load_balancing: Report global shard overcommit	2024-08-01 21:12:14 +03:00
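Note on the overcommit figures quoted in the merge message above: the message defines shard overcommit as the load of the most-loaded shard divided by the average per-shard load. The following is a minimal sketch of that metric, not code from the repository. The function names are hypothetical, and the "best" formula (round the average up to a whole tablet) is an assumption that happens to reproduce the reported best=1.25 and best=1.04 when the replica count is taken as tablets times RF spread over nodes times shards.

```python
def shard_overcommit(tablets_per_shard):
    """Most-loaded shard relative to the average per-shard load."""
    average = sum(tablets_per_shard) / len(tablets_per_shard)
    return max(tablets_per_shard) / average


def best_shard_overcommit(replica_count, shard_count):
    """Lowest achievable overcommit (assumed formula, see note above).

    Tablets are indivisible, so the fullest shard holds at least
    ceil(replica_count / shard_count) replicas.
    """
    fullest = -(-replica_count // shard_count)  # integer ceiling division
    return fullest * shard_count / replica_count


# Parameters from the quoted run: nodes=5, shards=32, rf=3.
print(best_shard_overcommit(128 * 3, 5 * 32))  # table1, 2.4 replicas per shard
print(best_shard_overcommit(512 * 3, 5 * 32))  # table2, 9.6 replicas per shard
```

Under these assumptions, an overcommit of 1.0 means perfectly even load, and the reported worst-case value of 4.00 corresponds to one shard holding four times the cluster-wide average.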
Piotr Dulikowski	44f327675d	Merge 'Remove gossiper argument from storage_service::join_cluster()' from Pavel Emelyanov It's only needed to start hints via proxy, but proxy can do it without gossiper argument Closes scylladb/scylladb#19894 * github.com:scylladb/scylladb: storage_service: Remote gossiper argument from join_cluster() proxy: Use remote gossiper to start hints resource manager hints: Const-ify gossiper references and anchor pointers	2024-08-01 10:18:14 +02:00
Michael Litvak	c944e28e43	db: fix waiting for counter update operations on table stop When a table is dropped it should wait for all pending operations in the table before the table is destroyed, because the operations may use the table's resources. With counter update operations, currently this is not the case. The table may be destroyed while there is a counter update operation in progress, causing an assert to be triggered due to a resource being destroyed while it's in use. The reason the operation is not waited for is a mistake in the lifetime management of the object representing the write in progress. The commit fixes it so the object lives for the duration of the entire counter update operation, by moving it to the `do_with` list. Fixes scylladb/scylla-enterprise#4475 Closes scylladb/scylladb#19948	2024-08-01 09:39:49 +02:00
Nadav Har'El	5411559a94	test/cql-pytest: test ALLOW FILTERING in intersection of two indexes A user complained that ScyllaDB is incompatible with Cassandra when it requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where x and y are two columns with secondary indexes. In the tests added in this patch we show that: 1. Scylla is compatible with Cassandra when the traditional "CREATE INDEX" is used - ALLOW FILTERING is required in this case in both Cassandra and Scylla. 2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'), indeed ALLOW FILTERING becomes optional. I believe this is incorrect so I opened CASSANDRA-19795. These two tests combined show that we're not incompatible with Cassandra, rather Cassandra's two index implementations are incompatible between themselves, and Scylla is in fact compatible in this case with Cassandra's traditional index and not with SAI. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19909	2024-07-31 14:01:29 +03:00
Laszlo Ersek	e67eb0ccc1	test/sstable: coroutinize do_write_sst() Make do_write_sst() easier to read by coroutinizing it. Closes #19803. Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19937	2024-07-31 13:59:26 +03:00
Kefu Chai	020333fcf1	sstables: fix a typo in comment s/guranteed/guaranteed/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19946	2024-07-31 13:58:09 +03:00
Tomasz Grabiec	28de5231f4	test: perf: tablet_load_balancing: Test with higher shard and tablet counts We have up to 200 shards in production, so test this to catch performance issues.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	19b7fb3a4d	tablets: load_balancer: Avoid quadratic complexity when finding best candidate If the source and destination shards picked for migration based on global tablet balance do not have a good candidate in terms of effect on per-table balance, the algorithm explores other source shards and destinations. This has quadratic complexity in terms of shard count in the worst case, when there are no good candidates. Since we can have up to ~200 shards, this can slow down scheduling significantly. I saw total scheduling time of 5 min in the following run: scylla perf-load-balancing -c1 -m1G --iterations=8 \ --nodes=4 --tablets1=1024 --tablets2=8096 \ --rf1=2 --rf2=3 --shards=256 To improve, change the approach to first find the best source shard and then the best target shard, sequentially. So it's now linear in terms of shard count. After the change, the total scheduling time in that run is down to 4s. Minimizing source and destination metrics piece-wise minimizes the combined metric, so badness of the best candidate doesn't suffer after this change.	2024-07-31 12:57:15 +02:00
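Note on the piece-wise minimization claim in `19b7fb3a4d`: the following is a minimal sketch, not the load balancer's code. It assumes the combined badness of a (source, destination) pair is the sum of an independent source term and an independent destination term. The function names and the badness values are hypothetical. Under that assumption, picking the best source and then the best destination finds the same minimum as scanning every pair, in linear rather than quadratic time.

```python
from itertools import product


def best_pair_quadratic(src_badness, dst_badness):
    """Scan every (source, destination) pair: O(S * D) evaluations."""
    return min(
        product(src_badness, dst_badness),
        key=lambda pair: src_badness[pair[0]] + dst_badness[pair[1]],
    )


def best_pair_linear(src_badness, dst_badness):
    """Pick the best source, then the best destination: O(S + D)."""
    src = min(src_badness, key=src_badness.get)
    dst = min(dst_badness, key=dst_badness.get)
    return src, dst


# Hypothetical per-shard badness values, keyed by shard id.
src_badness = {0: 0.5, 1: 0.2, 2: 0.9}
dst_badness = {3: 0.4, 4: 0.1}

print(best_pair_quadratic(src_badness, dst_badness))
print(best_pair_linear(src_badness, dst_badness))
```

If the combined metric had a term coupling source and destination, the two searches could disagree, so the equivalence depends on the separability assumption stated above.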
Tomasz Grabiec	93df82032f	tablets: load_balancer: Maintain load sketch properly during intra-node migration Affects only intra-node migration. The code was recording destination shard as taken and did not un-take it in case we skipped the migration due to lack of candidates. Noticed during code review. Impact is minor, since even if this leads to suboptimal balance, the next scheduling round should fix it. Also, the source shard was not unloaded, but that should have no impact on decisions. But to be future-proof, better to maintain the load accurately in case the algorithm is extended with more steps.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	88988ce0db	tablets: load_balancer: Use "drained" flag Cleanup / optimization.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	56801b7cb7	test: perf: tablet_load_balancing: Report load balancer stats	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	90c9934099	tablets: load_balancer: Move load_balancer_stats_manager to header file So that stats can be accessed outside tablet allocator.	2024-07-31 12:57:15 +02:00
Anna Stuchlik	ae28880fc8	doc: enable publishing docs for branch-6.1 This commit enables publishing documentation from branch-6.1. The docs will be published as UNSTABLE (the warning about version 6.1 being unstable will be displayed). Fixes https://github.com/scylladb/scylladb/issues/19926 No backport is required. Closes scylladb/scylladb#19931	2024-07-31 12:48:51 +02:00
Kamil Braun	c05e077a13	Merge 'raft: fix the shutdown phase being stuck' from Emil Maskovsky Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 Closes scylladb/scylladb#19860 * github.com:scylladb/scylladb: raft: fix the shutdown phase being stuck raft: use the abort source reference in raft group0 client interface	2024-07-31 12:10:30 +02:00
Tomasz Grabiec	94cce4b7d3	tablets: load_balancer: Split evaluate_candidate() into src and dst part Those parts will be used separately later.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	4df2abe47a	tablets: load_balancer: Optimize evaluate_candidate() Moves load computation out of the hot path by relying on data structures maintained globally during plan making.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	5e7facd543	tablets: load_balancer: Add more statistics	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	be055977c9	tablets: load_balancer: Track load per table on cluster level	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	81fcee2040	tablets: load_balancer: Track load per table on node level	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	e7ef7419dc	tablets: load_balancer: Use a single load sketch for tracking all nodes This is code simplification and optimization. Avoids multiple passes of tablet metadata to construct load sketch for each target node.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	352b8e0ddd	locator: load_sketch: Introduce populate_dc()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	9a7afd334b	tablets: load_balancer: Modify target load sketch only when emitting migration This avoids the need to unpick() a replica when the candidate is not selected. Optimization.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	b78657ce7d	locator: load_sketch: Introduce get_most_loaded_shard()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	de404471b7	locator: load_sketch: Introduce get_least_loaded_shard()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	8fbfd595bb	locator: load_sketch: Optimize pick()/unload() They are executed frequently during tablet scheduling. Currently, they have time complexity of O(N*log(N)) in terms of shard count. With large shard counts, that has significant overhead. This patch optimizes them down to O(log(N)).	2024-07-31 11:38:17 +02:00
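Note on `8fbfd595bb`: the commit message gives only the complexity change for pick()/unload(), from O(N*log(N)) to O(log(N)) in shard count, not the technique. The following is one possible way to reach logarithmic cost, a binary heap with lazily discarded stale entries, offered purely as an illustrative assumption. The class and its methods are hypothetical and are not the interface of locator::load_sketch.

```python
import heapq


class LoadSketchModel:
    """Toy per-node model: tracks tablet count per shard (hypothetical)."""

    def __init__(self, shard_count):
        self._load = [0] * shard_count
        # (load, shard) entries; a sorted list is already a valid heap.
        self._heap = [(0, shard) for shard in range(shard_count)]

    def pick(self):
        """Return the least-loaded shard and count one more tablet on it."""
        # Discard entries whose recorded load no longer matches reality.
        while self._heap[0][0] != self._load[self._heap[0][1]]:
            heapq.heappop(self._heap)
        _, shard = heapq.heappop(self._heap)
        self._load[shard] += 1
        heapq.heappush(self._heap, (self._load[shard], shard))
        return shard

    def unload(self, shard):
        """Count one tablet fewer on the given shard."""
        self._load[shard] -= 1
        # The old entry becomes stale and is dropped lazily by pick().
        heapq.heappush(self._heap, (self._load[shard], shard))


sketch = LoadSketchModel(3)
picked = [sketch.pick() for _ in range(3)]  # fills shards 0, 1, 2 in order
sketch.unload(2)
print(picked, sketch.pick())
```

Each operation pushes at most one heap entry, so the cost is amortized logarithmic in the number of entries. A production implementation would also need to bound the number of stale entries.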
Tomasz Grabiec	d0b0f95849	locator: load_sketch: Introduce load_type	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	8f3b623144	test: perf: tablet_load_balancing: Report total tablet counts	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	662a0ff038	test: perf: tablet_load_balancing: Print run parameters in the single simulation case too	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	a040404875	test: perf: tablet_load_balancing: Report time it took to schedule migrations	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	ae7fd80554	tablets: load_balancer: Log table load stats after each migration	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	b8996a0f59	tablets: load_balancer: Log per-shard load distribution in debug level	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	469e2f3f90	tablets: load_balancer: Improve per-table balance Tablet load balancer tries to equalize tablet load between shards by moving tablets. Currently, the tablet load balancer assumes that each tablet has the same hotness. This may not be true, and some tables may be hotter than others. If some nodes end up getting more tablets of the hot table, we can end up with request load imbalance and reduced performance. In `79d0711c7e` we implemented a mitigation for the problem by randomly choosing the table whose tablet replica should be moved. This should improve fairness of movement. However, this proved to not be enough to get a good distribution of tablets. This change improves candidate selection to not rely on randomness but rather evaluate candidates with respect to the impact on load imbalance. Also, if there is no good candidate, we consider picking other source shards, not the most-loaded one. This is helpful because when finishing node drain we get just a few candidates per shard, all of which may belong to a single table, and the destination may already be overloaded with that table. Another shard may contain tablets of another table which is not yet overloaded on the destination. And shards may be of similar load, so it doesn't matter much which shard we choose to unload. We also consider other destinations, not the least-loaded one. This helps when draining nodes and the source node has few shard candidates. Shards on the destination may have similar load so there is more than one good destination candidate. By limiting ourselves to a single shard, we increase the chance that we overload the table on that shard. The algorithm was evaluated using "scylla perf-load-balancing", which simulates a sequence of 8 node bootstraps and decommissions for different node and shard counts, RF, and tablet counts. 
For example, for the following parameters: params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32} The results are: After: Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}} Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Before: Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}} Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}} So shard overcommit for table1 was reduced from 4 to 1.5. Overcommit of 4 means that the most-loaded shard has 4 times more tablets than the average per-shard load in the cluster. Also, node overcommit for table1 was reduced from 1.81 to 1.02. The magnitude of improvement depends greatly on test configuration, so on topology and tablet distribution. The algorithm is not perfect, it finds a local optimum. In the above test, overcommit of 1.5 is not the best possible (1.25). One reason why the current algorithm doesn't achieve the best distribution is that it works with a single movement at a time and replication constraints limit the choice of destinations. Viable destinations for remaining candidates may be only on nodes which are not least-loaded, and we won't be able to fill the least loaded node. Doing so would require more complex movement involving moving a tablet from one of the destination nodes which doesn't have a replica on the least loaded node and then replacing it with the candidate from the source node. 
Another limitation is that the algorithm can only fix balance by moving tablets away from most loaded nodes, and it does so due to imbalance between nodes. So it cannot fix the imbalance which is already present on the nodes if there is not much to move due to similar load between nodes. It is designed to not make the imbalance worse, so it works well if we started in good shape. Fixes #16824	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	b7661aa6c9	tablets: load_balancer: Extract check_convergence() Will be reused when evaluating different targets for migration in later stages. The refactoring drops updating of _stats.for_dc(dc).stop_no_candidates and we update _stats.for_dc(dc).stop_load_inversion in both cases where convergence check may fail. The reason is that stat updates must be outside check_convergence(), since the new use case should not update those stats (it doesn't stop balancing, just drops candidates). Propagating the information for distinguishing the two cases would be a burden. But it's not necessary, since both cases are actually load inversion cases, one pre-migration the other post-migration, so we don't need the distinction. It's actually wrong to increment stop_no_candidates, since there may still be candidates, it's the load which is inverted.	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	41e643ddb9	tablets: load_balancer: Extract nodes_by_load_cmp Will be reused in a different place.	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	8a7257971d	tablets: load_balancer: Maintain tablet count per table	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	4e4f13ac9d	tablets: load_balancer: Reuse src_node_info	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	71b8d6b7aa	test: perf: tablet_load_balancing: Print warnings about bad overcommit	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	0d50a028a5	test: perf: tablet_load_balancing: Allow running a single simulation	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	3f3660c3fe	test: perf: tablet_load_balancing: Report best possible shard overcommit	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	c89a320925	test: perf: tablet_load_balancing: Report global shard overcommit Rather than maximum per-node shard overcommit. Global shard overcommit is a better metric since we want to equalize global load not just per-node load.	2024-07-31 11:26:11 +02:00
Emil Maskovsky	5dfc50d354	raft: fix the shutdown phase being stuck Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223	2024-07-31 09:18:54 +02:00
Emil Maskovsky	2dbe9ef2f2	raft: use the abort source reference in raft group0 client interface Most callers of the raft group0 client interface are passing a real source instance, so we can use the abort source reference in the client interface. This change makes the code simpler and more consistent.	2024-07-31 09:18:54 +02:00
Pavel Emelyanov	9214aecbe7	storage_service: Remove orphan forward declaration of a method The start_sys_dist_ks() itself was removed by `bc051387c5` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19928	2024-07-30 16:17:49 +03:00
Benny Halevy	e58ca8c44b	service_level_controller: stop: always call subscription on_abort We want to call `service_level_controller::do_abort()` in all cases. The current code (introduced in `535e5f4ae7`) calls do_abort if abort was not requested, however, since it does so by checking the subscription bool operator, it would miss the case where abort was already requested before the subscription took place (in service_level_controller ctor). With scylladb/seastar@470b539b1c and scylladb/seastar@8ecce18c51 we can just unconditionally call the subscription `on_abort` method, that ensures only-once semantics, even if abort was already requested at subscription time. Fixes scylladb/scylladb#19075 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19929	2024-07-30 13:23:17 +03:00
Kefu Chai	35394c3f9a	docs/dev: fix a typo remove the extraneous "is". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19902	2024-07-30 10:46:25 +03:00
Pavel Emelyanov	97154b0671	Merge 'mapreduce_service: complete coroutinization' from Avi Kivity mapreduce_server was previously coroutinized, but only partially. This series completes coroutinization and eliminates remaining continuation chains. None of this code is performance sensitive as it runs at the super-coordinator level and is amortized over a full scan of the entire table. No backport needed as this is a cleanup. Closes scylladb/scylladb#19913 * github.com:scylladb/scylladb: mapreduce_service: reindent mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node() mapreduce_service: coroutinize dispatch() inner lambda	2024-07-30 10:44:34 +03:00
Nadav Har'El	d293a5787f	alternator: exclude CDC log table from ListTables The Alternator command ListTables is supposed to list actual tables created with CreateTable, and should not list things like materialized views (created for GSI or LSI) or CDC log tables. We already properly excluded materialized views from the list - and had the tests to prove it - but forgot both the exclusion and the testing for CDC log tables - so creating a table xyz with streams enabled would cause ListTables to also list "xyz_scylla_cdc_log". This patch fixes both oversights: It adds the code to exclude CDC logs from the output of ListTables, and adds a test which reproduces the bug before this fix, and verifies the fix works. Fixes #19911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19914	2024-07-30 10:43:29 +03:00
Nadav Har'El	ca8b91f641	test: increase timeouts for /localnodes test In commit `bac7c33313` we introduced a new test for the Alternator "/localnodes" request, checking that a node that is still joining does not get returned. The tests used what I thought were "very high" timeouts - we had a timeout of 10 seconds for starting a single node, and injected a 20 second sleep to leave us 10 seconds after the first sleep. But the test failed in one extremely slow run (a debug build on aarch64), where starting just a single node took more than 15 seconds! So in this patch I increase the timeouts significantly: We increase the wait for the node to 60 seconds, and the sleeping injection to 120 seconds. These should definitely be enough for anyone (famous last words...). The test doesn't actually wait for these timeouts, so the ridiculously high timeouts shouldn't affect the normal runtime of this test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19916	2024-07-30 10:41:48 +03:00


43718 Commits