test/alternator: fix flaky test for partition-tombstone scan

The test test_scan.py::test_scan_long_partition_tombstone_string
checks that a full-table Scan operation ends a page in the middle of
a very long string of partition tombstones, and does NOT scan the
entire table in one page (if we did that, getting a single page could
take an unbounded amount of time).

The test is currently flaky, having failed in CI runs three times in
the past two months.

The reason for the flakiness is that we don't know exactly how long
we need to make the sequence of partition tombstones in the test before
we can be absolutely sure a single page will not read this entire sequence.
For single-partition scans we have the "query_tombstone_page_limit"
configuration parameter, which tells us exactly how long we need to
make the sequence of row tombstones. But for a full-table scan of
partition tombstones, the situation is more complicated - because the
scan is done in parallel on several vnodes, and each of them needs to
read query_tombstone_page_limit tombstones before it stops.
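
To illustrate the reasoning, here is a toy model of it. It is only a sketch, not Scylla's actual reader logic: the function names and the numbers (a page limit of 1000 and 6 parallel subscans) are hypothetical, and the model simply assumes every subscan stops independently after reading the limit.

```python
def worst_case_tombstones_per_page(page_limit: int, parallel_subscans: int) -> int:
    # Toy-model assumption: each subscan stops on its own only after reading
    # page_limit tombstones, so one page may consume up to this many in total.
    return page_limit * parallel_subscans


def page_must_end_early(n_tombstones: int, page_limit: int, parallel_subscans: int) -> bool:
    # True when the tombstone run is longer than anything a single page could
    # consume in the toy model, so the page is forced to end inside the run.
    return n_tombstones > worst_case_tombstones_per_page(page_limit, parallel_subscans)


# Hypothetical numbers, chosen only for illustration.
LIMIT = 1000
SUBSCANS = 6
with_multiplier_4 = page_must_end_early(LIMIT * 4, LIMIT, SUBSCANS)
with_multiplier_8 = page_must_end_early(LIMIT * 8, LIMIT, SUBSCANS)
```

In this model the multiplier is only safe if it exceeds the number of subscans that contribute to one page, which is exactly the number the test cannot know in advance.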

In my experiments, using query_tombstone_page_limit * 4 consecutive tombstones
was always enough - I ran this test hundreds of times and it didn't fail
once. But since it did fail on Jenkins very rarely (3 times in the last
two months), maybe the multiplier 4 isn't enough. So this patch doubles
it to 8. Hopefully this would be enough for anyone (TM).

This makes this test even bigger and slower than it was. To make it
faster, I changed this test's write isolation mode from the default
always_use_lwt to forbid_rmw (which does not use LWT). This keeps the
test's total run time similar to what it was before this patch - around
0.5 seconds in dev build mode on my laptop.
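
As a minimal sketch of what this means for the table definition: the write isolation mode is selected per table with the system:write_isolation tag. The helper name create_table_kwargs below is hypothetical and exists only for illustration; the key schema, tag key and tag value are the ones the test uses.

```python
def create_table_kwargs(write_isolation: str) -> dict:
    # Hypothetical helper: collects the CreateTable arguments for a table
    # with a single numeric hash key 'p' and the requested isolation mode.
    return {
        "KeySchema": [{"AttributeName": "p", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "p", "AttributeType": "N"}],
        # 'forbid_rmw' makes writes skip LWT, which is what speeds up the test.
        "Tags": [{"Key": "system:write_isolation", "Value": write_isolation}],
    }


kwargs = create_table_kwargs("forbid_rmw")
```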

Fixes #12817

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #12819
Nadav Har'El
2023-02-13 12:21:24 +02:00
committed by Botond Dénes
parent 310638e84d
commit 14cdd034ee

@@ -368,12 +368,15 @@ def test_scan_long_partition_tombstone_string(dynamodb, query_tombstone_page_lim
     # Unfortunately, unlike strings of row-tombstones which end a page after
     # exactly query_tombstone_page_limit, in the case of partition tombstones
     # the limit applies to separate vnode subscans, so we need to have
-    # significantly more before the split. Experimentally "* 4" works here
+    # significantly more before the split. Experimentally "* 8" works here
     # with test/alternator/run, but may need to be changed in the future.
-    N = int(query_tombstone_page_limit * 4)
+    N = int(query_tombstone_page_limit * 8)
     with new_test_table(dynamodb,
         KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
-        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'N' }]
+        AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'N' }],
+        # This test does a lot of writes, so let's do them without LWT
+        # to make the test less slow.
+        Tags=[{'Key': 'system:write_isolation', 'Value': 'forbid_rmw'}]
         ) as table:
         # We want to have two live partitions with a lot of partition
         # tombstones between them. But the hash function is pseudo-random