Scylla implements `LWT` in the` storage_proxy::cas` method. This method expects to be called on a specific shard, represented by the `cas_shard` parameter. Clients must create this object before calling `storage_proxy::cas`, check its `this_shard()` method, and jump to `cas_shard.shard()` if it returns false.
The nuance is that by the time the request reaches the destination shard, the tablet may have already advanced in its migration state machine. For example, a client may acquire a `cas_shard` at the `streaming` tablet state, then submit a request to another shard via `smp::submit_to(cas_shard.shard())`. However, the new `cas_shard` created on that other shard might already be in the `write_both_read_new` state, and its `cas_shard.shard()` would not be equal to `this_shard_id()`. Such broken invariant results in an `on_internal_error` in `storage_proxy::cas`.
Clients of `storage_proxy::cas` are expected to check` cas_shard.this_shard()` and recursively jump to another shard if it returns false. Most calls to `storage_proxy::cas` already implement this logic. The only exception is `executor::do_batch_write`, which currently checks `cas_shard.this_shard()` only once. This can break the invariant if the tablet state changes more than once during the operation.
This PR fixes the issue by implementing recursive `cas_shard.this_shard()` checks in `executor::do_batch_write`. It also adds a test that reproduces the problem.
Fixes: scylladb/scylladb#27353
backport: need to be backported to 2025.4
Closesscylladb/scylladb#27396
* github.com:scylladb/scylladb:
alternator/executor.cc: eliminate redundant dk copy
alternator/executor.cc: release cas_shard on the original shard
alternator/executor.cc: move shard check into cas_write
alternator/executor.cc: make cas_write a private method
alternator/executor.cc: make do_batch_write a private method
alternator/executor.cc: fix indent
test_alternator: add test_alternator_invalid_shard_for_lwt