From b4c5b79f5eb91aebf4b470596c306ace94cd0803 Mon Sep 17 00:00:00 2001 From: Kamil Braun Date: Thu, 11 Aug 2022 11:56:29 +0200 Subject: [PATCH] db: system_distributed_keyspace: don't call `on_internal_error` in `check_exists` The function `check_exists` checks whether a given table exists, giving an error otherwise. It previously used `on_internal_error`. `check_exists` is used in some old functions that insert CDC metadata to CDC tables. These tables are no longer used in newer Scylla versions (they were replaced with other tables with different schema), and this function is no longer called. The table definitions were removed and these tables are no longer created. They will only exists in clusters that were upgraded from old versions of Scylla (4.3) through a sequence of upgrades. If you tried to upgrade from a very old version of Scylla which had neither the old or the new tables to a modern version, say from 4.2 to 5.0, you would get `on_internal_error` from this `check_exists` function. Fortunately: 1. we don't support such upgrade paths 2. `on_internal_error` in production clusters does not crash the system, only throws. The exception would be catched, printed, and the system would run (just without CDC - until you finished upgrade and called the propoer nodetool command to fix the CDC module). Unfortunately, there is a dtest (`partitioner_tests.py`) which performs an unsupported upgrade scenario - it starts Scylla from Cassandra (!) work directories, which is like upgrading from a very old version of Scylla. This dtest was not failing due to another bug which masked the problem. When we try to fix the bug - see #11225 - the dtest starts hitting the assertion in `check_exists`. Because it's a test, we configure `on_internal_error` to crash the system. The point of this commit is to not crash the system in this rare scenario which happens only in some weird tests. We now throw `std::runtime_error` instead of calling `on_internal_error`. In the dtest, we already ignore the resulting CDC error appearing in the logs (see scylladb/scylla-dtest#2804). Together with this change, we'll be able to fix the #11225 bug and pass this test. Closes #11287 --- db/system_distributed_keyspace.cc | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/db/system_distributed_keyspace.cc b/db/system_distributed_keyspace.cc index fc892d3eb0..257889993f 100644 --- a/db/system_distributed_keyspace.cc +++ b/db/system_distributed_keyspace.cc @@ -175,7 +175,13 @@ static std::vector ensured_tables() { // Precondition: `ks_name` is either "system_distributed" or "system_distributed_everywhere". static void check_exists(std::string_view ks_name, std::string_view cf_name, const replica::database& db) { if (!db.has_schema(ks_name, cf_name)) { - on_internal_error(dlogger, format("expected {}.{} to exist but it doesn't", ks_name, cf_name)); + // Throw `std::runtime_error` instead of calling `on_internal_error` due to some dtests which + // 'upgrade' Scylla from Cassandra work directories (which is an unsupported upgrade path) + // on which this check does not pass. We don't want the node to crash in these dtests, + // but throw an error instead. In production clusters we don't crash on `on_internal_error` anyway. + auto err = format("expected {}.{} to exist but it doesn't", ks_name, cf_name); + dlogger.error(err.c_str()); + throw std::runtime_error{std::move(err)}; } }