Consider the following: 1) single-key read starts, blocks on replica e.g. waiting for memory. 2) the same replica is migrated away 3) single-key read expires, coordinator abandons it, releases erm. 4) migration advances to cleanup stage, barrier doesn't wait on timed-out read 5) compaction group of the replica is deallocated on cleanup 6) that single-key resumes, but doesn't find sstable set (post cleanup) 7) with abort-on-internal-error turned on, node crashes It's fine for abandoned (= timed out) reads to fail, since the coordinator is gone. For active reads (non timed out), the barrier will wait for them since their coordinator holds erm. This solution consists of failing reads which underlying tablet replica has been cleaned up, by just converting internal error to plain exception. Fixes #26229. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#27078
210 KiB
210 KiB