test: propagate view update backlog before partition delete

In the test_delete_partition_rows_from_table_with_mv case we delete a
large partition to verify that the deletion self-throttles when it
generates many view updates.
Before the deletion we first build the materialized view, which causes
the view update backlog to grow. The backlog should be back to empty
when the view building finishes, and we do wait for that to happen, but
the information about the backlog drop may not be propagated to the
coordinator of the DELETE in time: the gossip interval is 1s and we
perform no other writes between the nodes in the meantime, so we don't
benefit from the "piggyback" mechanism that propagates the view backlog
along write responses either. If the coordinator still thinks the
backlog on the replica is high, it may reject the delete, failing this
test.
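
The failure mode can be pictured with a small, purely illustrative
sketch (hypothetical names, not ScyllaDB's actual classes or code
paths): the coordinator admits a write based on the last backlog value
it has heard of for the replica, and that value is refreshed either by
gossip (roughly once per second) or by the backlog piggybacked on write
responses.

    # Illustrative sketch only - hypothetical names, not ScyllaDB's real code.
    from dataclasses import dataclass, field

    @dataclass
    class Coordinator:
        limit: int
        # last backlog value this coordinator has heard of, per replica
        backlog: dict = field(default_factory=dict)

        def on_gossip(self, replica, value):
            # refreshed roughly once per gossip interval (~1s)
            self.backlog[replica] = value

        def on_write_response(self, replica, value):
            # the "piggyback" path: a write response carries the replica's
            # current backlog, so a single write refreshes it immediately
            self.backlog[replica] = value

        def admits(self, replica):
            return self.backlog.get(replica, 0) <= self.limit

    coord = Coordinator(limit=100)
    coord.on_gossip("replica-1", 5000)       # stale value from the view build
    assert not coord.admits("replica-1")     # the DELETE would be rejected here
    coord.on_write_response("replica-1", 0)  # an extra write propagates the drop
    assert coord.admits("replica-1")         # now the DELETE can proceed
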
We change this in this patch: after the view is built, we perform an
extra write from the coordinator. When the write finishes, the
coordinator has the up-to-date view backlog and can proceed with the
DELETE (the resulting flow is condensed in the sketch further below).
Additionally, we enable the "update_backlog_immediately" injection,
which makes the node backlog (the highest backlog across shards) update
immediately after each change.

Fixes: SCYLLADB-1795

Closes scylladb/scylladb#29775
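
Condensed from the diff below, and assuming the ManagerClient/pytest
fixtures of the surrounding test (manager, wait_for_view, ks,
node_count), the patched flow is roughly:

    # Condensed sketch of the patched test flow; relies on the test's fixtures.
    servers = await manager.servers_add(node_count, config={
        'error_injections_at_startup': [
            'view_update_limit',                # pre-existing injection
            'delay_before_remote_view_update',  # pre-existing injection
            'update_backlog_immediately']})     # new: node backlog updates at once
    cql, hosts = await manager.get_ready_cql(servers)
    ...
    await wait_for_view(cql, "mv_cf_view", node_count)
    # The response to this write carries the replica's current (now empty)
    # view backlog, so the coordinator stops treating the replica as loaded.
    await cql.run_async(f"INSERT INTO {ks}.tab (key, c) VALUES (0, 999)",
                        host=hosts[0], timeout=300)
    # Send the DELETE through the same coordinator that just saw the fresh backlog.
    await cql.run_async(f"DELETE FROM {ks}.tab WHERE key = 0",
                        host=hosts[0], timeout=300)
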
Author:       Wojciech Mitros
Date:         2026-05-06 15:28:39 +02:00
Committed by: Avi Kivity
Parent:       454a8e6966
Commit:       ab12083525

@@ -60,8 +60,8 @@ async def insert_with_concurrency(cql, table, value_count, concurrency):
 @pytest.mark.skip_mode(mode='release', reason="error injections aren't enabled in release mode")
 async def test_delete_partition_rows_from_table_with_mv(manager: ManagerClient) -> None:
     node_count = 2
-    await manager.servers_add(node_count, config={'error_injections_at_startup': ['view_update_limit', 'delay_before_remote_view_update']})
-    cql = manager.get_cql()
+    servers = await manager.servers_add(node_count, config={'error_injections_at_startup': ['view_update_limit', 'delay_before_remote_view_update', 'update_backlog_immediately']})
+    cql, hosts = await manager.get_ready_cql(servers)
     async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1}") as ks:
         await cql.run_async(f"CREATE TABLE {ks}.tab (key int, c int, PRIMARY KEY (key, c))")
         await insert_with_concurrency(cql, f"{ks}.tab", 200, 100)
@@ -71,8 +71,13 @@ async def test_delete_partition_rows_from_table_with_mv(manager: ManagerClient)
         await wait_for_view(cql, "mv_cf_view", node_count)
+        # The view building process elevates the view update backlog, potentially above the limit.
+        # When the view is built it should drop back down to 0, but this information may not reach
+        # the coordinator before the delete, so we perform an additional write on the same host before
+        # the delete - the current view update backlog will be propagated along the write response.
+        await cql.run_async(f"INSERT INTO {ks}.tab (key, c) VALUES (0, 999)", host=hosts[0], timeout=300)
         logger.info(f"Deleting all rows from partition with key 0")
-        await cql.run_async(f"DELETE FROM {ks}.tab WHERE key = 0", timeout=300)
+        await cql.run_async(f"DELETE FROM {ks}.tab WHERE key = 0", host=hosts[0], timeout=300)
         # Test deleting a large partition when there is a view with the same partition
         # key, and verify that view updates metrics is increased by exactly 1. Deleting