When the cluster is changed (nodes added or removed), ranges of tokens are moved between nodes. Scylla initiates a streaming process between the old and the new owner of the range, which can take a long time. During that streaming time, the new owner of the range is known as a "pending node" for this range, and all updates must go to both the old owner (in case the movement fails!) and the pending node (in case the movement succeeds).

For materialized views, because they are ordinary tables, streaming moves all the view's data that existed before the streaming started. But we did not send updates done to the view *during* the streaming. A dtest demonstrates that the new node will miss some of the view updates, and will require a repair of the view tables immediately after the cluster change ends, which is not good.

To fix that, we need to send every new update that happens during the streaming also to the "pending node". We already did this properly for base-table updates, but not for view updates: each base table replica wrote to only one paired view table replica, and nobody wrote to the new pending node (in the case where there is one for the particular view token involved).

In this patch, we make sure that all view updates go also to the "pending nodes" when there are any. We do the same thing that Cassandra does, which is that *all* base replicas write the update to the pending node(s).

Arguably, it is inefficient that all replicas send the update to the same node. In most cases it is enough to send it from just one base replica: the one that is slated to be the new node's pair. I opened https://issues.apache.org/jira/browse/CASSANDRA-14262 about this idea. But that is an optimization. The patch as-is already fixes the bug.

Fixes #3211

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180313171853.17283-1-nyh@scylladb.com>
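
As a rough illustration of the rule described in this message (not the code in this patch), the set of nodes that one base replica sends a view update to can be sketched as its paired view replica plus every pending node for the view token. The names `endpoint` and `view_update_targets` are hypothetical and are not Scylla identifiers; the sketch assumes the paired replica and the pending nodes have already been looked up.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a node address; not a Scylla type.
using endpoint = std::string;

// Hypothetical helper: the nodes that one base replica sends a view update to.
//  - paired_view_replica: the single view replica paired with this base
//    replica, or empty if this base replica has no pair.
//  - pending_endpoints: nodes that are currently being streamed the token
//    range containing the view token (empty when the cluster is stable).
std::vector<endpoint> view_update_targets(
        const std::optional<endpoint>& paired_view_replica,
        const std::vector<endpoint>& pending_endpoints) {
    std::vector<endpoint> targets;
    // Behaviour before the fix: only the paired view replica was written to.
    if (paired_view_replica) {
        targets.push_back(*paired_view_replica);
    }
    // Behaviour added by the fix: every base replica also writes to all
    // pending nodes, so the new owner sees updates made during streaming.
    for (const auto& pending : pending_endpoints) {
        targets.push_back(pending);
    }
    return targets;
}
```

Because every base replica runs the same logic, each pending node receives the update once per base replica, which is the inefficiency discussed above.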