transport: move requests_serving decrement to after response is sent

The requests_serving metric was decremented right after query processing completed, but before the response was written to the client. This means requests whose responses were queued in the write pipeline were no longer counted as in-flight, understating the actual load.

Move the decrement into the 'leave' defer block, which fires after the response is fully sent via _ready_to_respond.

This makes the shedding check (max_concurrent_requests_per_shard) more accurate: requests that have finished processing but are still waiting in the response queue now correctly count toward the in-flight limit.
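
The effect of the move can be illustrated with a minimal, self-contained sketch. It is not the server code: scope_guard is a hypothetical stand-in for seastar's defer(), stats is a toy struct, and the point at which the counter is incremented is assumed for illustration. Each function returns the in-flight count that a shedding check would observe while the response is still waiting to be written.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for seastar::defer(): runs a callable on scope exit.
class scope_guard {
    std::function<void()> _f;
public:
    explicit scope_guard(std::function<void()> f) : _f(std::move(f)) {}
    ~scope_guard() { _f(); }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
};

// Toy counterpart of the server statistics (assumed layout).
struct stats {
    uint64_t requests_serving = 0;
};

// Old placement: the counter drops as soon as processing is done,
// so the request is invisible while its response is still queued.
uint64_t in_flight_during_write_old(stats& s) {
    ++s.requests_serving;          // request admitted (assumed location)
    // ... query processing would happen here ...
    --s.requests_serving;          // decremented before the response is sent
    uint64_t seen = s.requests_serving;  // what a shedding check sees now
    // ... response write would happen here ...
    return seen;
}

// New placement: the decrement lives in the scope guard, which fires
// only after the response has been written.
uint64_t in_flight_during_write_new(stats& s) {
    ++s.requests_serving;          // request admitted (assumed location)
    uint64_t seen = 0;
    {
        scope_guard leave([&s] { --s.requests_serving; });
        // ... query processing would happen here ...
        seen = s.requests_serving; // still counted while the response is queued
        // ... response write would happen here ...
    }                              // guard runs: counter decremented
    return seen;
}
```

With a single request, the old placement reports 0 in-flight requests during the write phase and the new placement reports 1; in both cases the counter returns to 0 once the request is fully finished.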
2026-04-23 10:00:35 +00:00 · 2026-04-17 15:05:29 +02:00
parent 0ae22a09d4
commit 4988077249
1 changed files with 1 additions and 2 deletions
--- a/transport/server.cc
+++ b/transport/server.cc
@@ -1001,8 +1001,6 @@ future<foreign_ptr<std::unique_ptr<cql_server::response>>>
        auto stop_trace = defer([&] {
            tracing::stop_foreground(trace_state);
        });
-        --_server._stats.requests_serving;
-
        return seastar::futurize_invoke([&] () {
            if (f.failed()) {
                return make_exception_future<foreign_ptr<std::unique_ptr<cql_server::response>>>(std::move(f).get_exception());
@@ -1240,6 +1238,7 @@ future<> cql_server::connection::process_request() {

            _pending_requests_gate.enter();
            auto leave = defer([this] {
+                --_server._stats.requests_serving;
                _shedding_timer.cancel();
                _shed_incoming_requests = false;
                _pending_requests_gate.leave();