Three test cases in multishard_query_test.cc set the querier_cache entry
TTL to 2s and then assert, between pages of a stateful paged query, that
cached queriers are still present (population >= 1) and that
time_based_evictions stays 0.
The 2s TTL is not load-bearing for what these tests exercise — they are
checking the paging-cache handoff, not TTL semantics. But on busy CI
runners (SCYLLADB-1642 was observed on aarch64 release), scheduling
jitter between saving a reader and sampling the population can exceed
2s. When that happens, the TTL fires, both saved queriers are
time-evicted, population drops to 0, and the assertion
`require_greater_equal(saved_readers, 1u)` fails. The trailing
`require_equal(time_based_evictions, 0)` check never runs because the
earlier assertion has already aborted the iteration — which is why the
Jenkins failure surfaces only as a bare "C++ failure at seastar_test.cc:93".
Reproduced deterministically in test_read_with_partition_row_limits by
injecting a `seastar::sleep(2500ms)` between the save and the sample:
the hook then reports
population=0 inserts=2 drops=0 time_based_evictions=2 resource_based_evictions=0
and the assertion fires — matching the Jenkins symptoms exactly.
Bump the TTL to 60s in all three affected tests:
- test_read_with_partition_row_limits (confirmed repro for SCYLLADB-1642)
- test_read_all (same pattern, same invariants — suspect)
- test_read_all_multi_range (same pattern, same invariants — suspect)
Leave test_abandoned_read (1s TTL, actually tests TTL-driven eviction)
and test_evict_a_shard_reader_on_each_page (tests manual eviction via
evict_one(); its TTL is not load-bearing but the fix is deferred for a
separate review) unchanged.
Fixes: SCYLLADB-1642
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closesscylladb/scylladb#29564
Remove the unused timeout field from fuzzy_test_config. It was
declared, initialized per build mode, and logged, but never actually
enforced anywhere.
Document the intentionally small max_size (1024 bytes) passed to
read_partitions_with_paged_scan in run_fuzzy_test_scan: it forces
many pages per scan to stress the paging and result-merging logic.
The multishard_query_test/fuzzy_test was timing out (SIGKILL after
15 minutes) in release mode CI.
In release mode the test generates up to 64 partitions with up to
1000 clustering rows and 1000 range tombstones each. With deeply
nested randomly-generated types (e.g. frozen<map<varint,
frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed
the 15-minute CI timeout.
Reduce the release-mode clustering-row and range-tombstone
distributions from 0-1000 to 0-200. This caps the worst case at
~12,800 rows -- still 2x the devel-mode maximum (0-100) and
sufficient to exercise multi-partition paged scanning with many
pages.
Fixes: SCYLLADB-1270
And switch to std::source_location.
Upcoming seastar update will deprecate its compatibility layer.
The patch is
for f in $(git grep -l 'seastar::compat::source_location'); do
sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f;
done
and removal of few header includes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#27309
The query namespace is used for symbols which span the coordinator and
replica, or that are mostly coordinator side. The querier is mainly in
this namespace due to its similar name, but this is a mistake which
confuses people. Now that the code was moved to replica/, also fix the
namespace to be namespace replica.