mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-29 11:10:40 +00:00
A recent commit370707b111(re)introduced a timeout for every group0 Raft operation. This timeout was set to 60 seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody". However, one of the things we do as a group0 operation is schema changes, and we already noticed a few years ago, see commit0b2cf21932, that in some extremely overloaded test machines where tests run hundreds of times (!) slower than usual, a single big schema operation - such as Alternator's DeleteTable deleting a table and multiple of its CDC or view tables - sometimes takes more than 60 seconds. The above fix changed the client's timeout to wait for 300 seconds instead of 60 seconds, but now we also need to increase our Raft timeout, or the server can time out. We've seen this happening recently making some tests flaky in CI (issue #23543). So let's make this timeout configurable, as a new configuration option group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e, 60 seconds), the same as the existing default. The test framework overrides this default with a a higher 300 second timeout, matching the client-side timeout. Before this patch, this timeout was already configurable in a strange way, using injections. But this was a misstep: We already have more than a dozen timeouts configurable through the normal configration, and this one should have been configured in the same way. There is nothing "holy" about the default of 60 seconds we chose, and who knows maybe in the future we might need to tweek it in the field, just like we made the other timeouts tweakable. Injections cannot be used in release mode, but configuration options can. Fixes #23543 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23717