From 14cdd034eea65fffd8875d7a283f1ddcd0e83496 Mon Sep 17 00:00:00 2001
From: Nadav Har'El
Date: Mon, 13 Feb 2023 12:21:24 +0200
Subject: [PATCH] test/alternator: fix flaky test for partition-tombstone scan

The test test_scan.py::test_scan_long_partition_tombstone_string checks
that a full-table Scan operation ends a page in the middle of a very long
string of partition tombstones, and does NOT scan the entire table in one
page (if we did that, getting a single page could take an unbounded
amount of time).

The test is currently flaky, having failed in CI runs three times in the
past two months. The reason for the flakiness is that we don't know
exactly how long we need to make the sequence of partition tombstones in
the test before we can be absolutely sure a single page will not read
this entire sequence.

For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to make
the sequence of row tombstones. But for a full-table scan of partition
tombstones, the situation is more complicated - because the scan is done
in parallel on several vnodes, and each of them needs to read
query_tombstone_page_limit tombstones before it stops.

In my experiments, using query_tombstone_page_limit * 4 consecutive
tombstones was always enough - I ran this test hundreds of times and it
didn't fail once. But since it did fail on Jenkins very rarely (3 times
in the last two months), maybe the multiplier 4 isn't enough. So this
patch doubles it to 8. Hopefully this would be enough for anyone (TM).

This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (not use LWT). This keeps the test's total
run time similar to what it was before this patch - around 0.5 seconds
in dev build mode on my laptop.
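
As a rough illustration of the reasoning above, here is a toy model (not
the real paging code) of why the multiplier matters. It assumes that a
page can consume at most query_tombstone_page_limit tombstones per
parallel vnode subscan; the function names, the example limit of 1000 and
the subscan counts are all hypothetical and chosen only for illustration.

```python
def max_tombstones_per_page(page_limit: int, parallel_subscans: int) -> int:
    """Toy upper bound: every parallel subscan may read up to
    page_limit tombstones before it stops (an assumption of this sketch)."""
    return page_limit * parallel_subscans


def run_is_long_enough(n_tombstones: int, page_limit: int,
                       parallel_subscans: int) -> bool:
    """True if a run of n_tombstones cannot fit in one page under the
    toy bound, so the page must end somewhere inside the run."""
    return n_tombstones > max_tombstones_per_page(page_limit, parallel_subscans)


if __name__ == "__main__":
    page_limit = 1000  # hypothetical query_tombstone_page_limit
    for multiplier in (4, 8):
        n = int(page_limit * multiplier)
        for subscans in (2, 4, 6):  # hypothetical degrees of parallelism
            ok = run_is_long_enough(n, page_limit, subscans)
            print(f"multiplier={multiplier} subscans={subscans} split_guaranteed={ok}")
```

Under this model a multiplier of 4 gives no guarantee once four subscans
run in parallel, while a multiplier of 8 still does. The real degree of
parallelism is not known here, which is why 8 is a guess too.
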

Fixes #12817

Signed-off-by: Nadav Har'El

Closes #12819
---
 test/alternator/test_scan.py | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/test/alternator/test_scan.py b/test/alternator/test_scan.py
index d38bbeaf01..8c5f7c42b9 100644
--- a/test/alternator/test_scan.py
+++ b/test/alternator/test_scan.py
@@ -368,12 +368,15 @@ def test_scan_long_partition_tombstone_string(dynamodb, query_tombstone_page_lim
     # Unfortunately, unlike strings of row-tombstones which end a page after
     # exactly query_tombstone_page_limit, in the case of partition tombstones
     # the limit applies to separate vnode subscans, so we need to have
-    # significantly more before the split. Experimentally "* 4" works here
+    # significantly more before the split. Experimentally "* 8" works here
     # with test/alternator/run, but may need to be changed in the future.
-    N = int(query_tombstone_page_limit * 4)
+    N = int(query_tombstone_page_limit * 8)
     with new_test_table(dynamodb,
         KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
-        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'N' }]
+        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'N' }],
+        # This test does a lot of writes, so let's do them without LWT
+        # to make the test less slow.
+        Tags=[{'Key': 'system:write_isolation', 'Value': 'forbid_rmw'}]
         ) as table:
         # We want to have two live partitions with a lot of partition
         # tombstones between them. But the hash function is pseudo-random