transport: move requests_serving decrement to after response is sent

The requests_serving metric was decremented right after query processing
completed, but before the response was written to the client. As a
result, requests whose responses were still queued in the write pipeline
were no longer counted as in-flight, and the metric understated the
actual load.

Move the decrement into the 'leave' defer block, which fires after the
response is fully sent via _ready_to_respond. This makes the shedding
check (max_concurrent_requests_per_shard) more accurate: requests that
have finished processing but are still waiting in the response queue now
correctly count toward the in-flight limit.
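
The effect of the move can be sketched in isolation. The following is a minimal, synchronous illustration and not the actual cql_server code: the `deferred_action` class and `defer` helper are hypothetical stand-ins for seastar::defer, and `stats` and `in_flight_while_response_queued` are names invented for this example. It assumes the response is "sent" when the enclosing scope ends.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for seastar::defer: runs the callable at scope exit.
template <typename Fn>
class deferred_action {
    Fn _fn;
    bool _active = true;
public:
    explicit deferred_action(Fn fn) : _fn(std::move(fn)) {}
    deferred_action(deferred_action&& o) : _fn(std::move(o._fn)), _active(o._active) {
        o._active = false;  // the moved-from guard must not fire
    }
    deferred_action(const deferred_action&) = delete;
    deferred_action& operator=(const deferred_action&) = delete;
    ~deferred_action() { if (_active) { _fn(); } }
};

template <typename Fn>
deferred_action<Fn> defer(Fn fn) {
    return deferred_action<Fn>(std::move(fn));
}

// Illustrative stats holder; only the counter name comes from the commit.
struct stats {
    int requests_serving = 0;
};

// Returns the value of requests_serving observed while the response has been
// produced but not yet sent. `decrement_after_send` selects the new placement
// (inside the deferred block) versus the old one (right after processing).
int in_flight_while_response_queued(stats& s, bool decrement_after_send) {
    ++s.requests_serving;  // request admitted
    int observed = 0;
    {
        auto leave = defer([&] {
            if (decrement_after_send) {
                --s.requests_serving;  // new placement: fires at scope exit
            }
        });
        // ... query processing completes here ...
        if (!decrement_after_send) {
            --s.requests_serving;  // old placement: before the response is sent
        }
        // The response is queued but not yet written; sample the counter.
        observed = s.requests_serving;
        // The response is written here; leaving the scope fires `leave`.
    }
    return observed;
}
```

With the old placement the queued response is invisible to the counter (the function returns 0), while with the new placement it is still counted (the function returns 1). In both cases the counter returns to 0 once the scope exits.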
Piotr Smaron
2026-04-17 15:05:29 +02:00
parent 0ae22a09d4
commit 4988077249

@@ -1001,8 +1001,6 @@ future<foreign_ptr<std::unique_ptr<cql_server::response>>>
auto stop_trace = defer([&] {
tracing::stop_foreground(trace_state);
});
- --_server._stats.requests_serving;
return seastar::futurize_invoke([&] () {
if (f.failed()) {
return make_exception_future<foreign_ptr<std::unique_ptr<cql_server::response>>>(std::move(f).get_exception());
@@ -1240,6 +1238,7 @@ future<> cql_server::connection::process_request() {
_pending_requests_gate.enter();
auto leave = defer([this] {
+ --_server._stats.requests_serving;
_shedding_timer.cancel();
_shed_incoming_requests = false;
_pending_requests_gate.leave();