scylladb/service
Piotr Dulikowski 3edfd29c86 storage_service: repeat the feature check after read barrier
We would like to guarantee the following property: if all nodes have
some feature X in their `supported_features` column in
`system.topology`, then it's no longer possible for anybody to revoke
support for it. Currently, it is not guaranteed because the following
can happen:

1. A node commits a command that updates its `supported_features`,
   marking feature X as supported. It is the last node to do so and now
   all nodes support X.
2. The node crashes before applying the command locally.
3. The node is downgraded to a version that does not support X and is
   restarted.
4. The feature check in `enable_features_on_startup` passes because it
   happens before starting the group 0 server.
5. The `supported_features` column is updated in
   `update_topology_with_local_metadata`, removing support for X.

Even though the guarantee does not hold, this is not a problem in
practice: `barrier_after_metadata_update` is required to succeed on all
nodes before the topology coordinator moves to enable a feature, and - as
the name suggests - it requires `update_topology_with_local_metadata` to
finish.

However, choosing to give this guarantee makes it simpler to reason
about how cluster features on raft work and removes some pathological
cases (e.g. trying to downgrade some other node after step 1 will fail,
but will become possible again after step 5). Therefore, this commit
adds a second check to `update_topology_with_local_metadata` which
disallows removing support for a feature that is supported by all
nodes - and stops the boot process if necessary.
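
The second check can be illustrated with a minimal, self-contained
sketch. This is not the code from the commit: the names `feature_set`,
`features_supported_by_all` and `check_no_feature_revoked` are
hypothetical, and the map of node name to feature set only stands in
for the `supported_features` column of `system.topology` as read after
the read barrier.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for one node's `supported_features` value.
using feature_set = std::set<std::string>;

// Returns the features listed as supported by every node in the map.
feature_set features_supported_by_all(
        const std::map<std::string, feature_set>& supported_by_node) {
    if (supported_by_node.empty()) {
        return {};
    }
    auto it = supported_by_node.begin();
    feature_set common = it->second;
    for (++it; it != supported_by_node.end(); ++it) {
        // Drop every feature that this node does not list.
        for (auto f = common.begin(); f != common.end();) {
            if (it->second.count(*f) == 0) {
                f = common.erase(f);
            } else {
                ++f;
            }
        }
    }
    return common;
}

// Throws if the locally supported set lacks a feature that all nodes
// (including this node's committed entry) already list as supported.
// In the scenario above, the caller would abort the boot on this error.
void check_no_feature_revoked(
        const std::map<std::string, feature_set>& supported_by_node,
        const feature_set& local_supported) {
    for (const auto& f : features_supported_by_all(supported_by_node)) {
        if (local_supported.count(f) == 0) {
            throw std::runtime_error(
                "cannot revoke support for feature " + f +
                ": it is supported by all nodes");
        }
    }
}
```

In the five-step scenario, the committed topology already lists X for
every node, so a downgraded node whose local set lacks X would fail
this check at step 5 instead of overwriting its entry.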
2023-08-31 16:46:11 +02:00