The concurrency limit check in the Alternator server was positioned after memory acquisition (get_units), request body reading (read_entire_stream), signature verification, and decompression. This allowed unlimited requests to pile up consuming memory before being rejected, exhausting LSA memory and causing logalloc::bad_alloc errors that cascade into Raft applier and topology coordinator failures, breaking subsequent operations.

Without this fix, test_limit_concurrent_requests on a 1GB node produces 50 logalloc::bad_alloc errors and cascading failures: reads from system.scylla_local fail, the Raft applier fiber stops, the topology coordinator stops, and all subsequent CreateTable operations fail with InternalServerError (500). With this fix, the cascade is eliminated -- admitted requests may still cause LSA pressure on a memory-constrained node, but the server remains functional.

Move the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. This mirrors the CQL transport which correctly checks concurrency before memory acquisition (transport/server.cc).

The concurrency check was originally added in 1b8c946ad7 (Sep 2020) *before* memory acquisition, which at the time lived inside with_gate (after the concurrency gate). The ordering was inverted by f41dac2a3a (Mar 2021, "avoid large contiguous allocation for request body"), which moved get_units() earlier in the function to reserve memory before reading the newly-introduced content stream -- but inadvertently also moved it before the concurrency check. c3593462a4 (Mar 2025) further worsened the situation by adding a 16MB fallback reservation for requests without Content-Length and ungzip/deflate decompression steps -- all before the concurrency check -- greatly increasing the memory consumed by requests that would ultimately be rejected.
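
The effect of the ordering can be illustrated with a small standalone model. This is only a sketch, not the Alternator code: server_model, simulate_burst and the ordering enum are hypothetical names, it does not use Seastar, and it assumes that every request in a burst reserves its memory before any of them reaches the concurrency check (which is roughly what the old ordering allowed).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the server state; not the real Alternator server.
struct server_model {
    std::size_t max_concurrent = 0;  // concurrency limit
    std::size_t in_flight = 0;       // requests admitted and not yet finished
    std::size_t reserved_bytes = 0;  // memory currently reserved for requests
    std::size_t peak_reserved = 0;   // high-water mark of reserved_bytes
    std::size_t rejected = 0;        // requests turned away by the limit

    void reserve(std::size_t n) {
        reserved_bytes += n;
        if (reserved_bytes > peak_reserved) {
            peak_reserved = reserved_bytes;
        }
    }
    void release(std::size_t n) {
        assert(n <= reserved_bytes);
        reserved_bytes -= n;
    }
};

enum class ordering { check_first, check_late };

// Model a burst of n requests of body_size bytes each, none of which
// completes during the burst.
server_model simulate_burst(ordering ord, std::size_t n,
                            std::size_t body_size, std::size_t max_concurrent) {
    server_model s;
    s.max_concurrent = max_concurrent;
    if (ord == ordering::check_late) {
        // Old ordering: every request reserves memory (and reads its body)
        // before the limit is consulted, so all reservations overlap.
        for (std::size_t i = 0; i < n; ++i) {
            s.reserve(body_size);
        }
        for (std::size_t i = 0; i < n; ++i) {
            if (s.in_flight >= s.max_concurrent) {
                ++s.rejected;
                s.release(body_size);  // given back only after the damage is done
            } else {
                ++s.in_flight;
            }
        }
    } else {
        // New ordering: consult the limit first, reserve only if admitted.
        for (std::size_t i = 0; i < n; ++i) {
            if (s.in_flight >= s.max_concurrent) {
                ++s.rejected;
                continue;
            }
            ++s.in_flight;
            s.reserve(body_size);
        }
    }
    return s;
}
```

In this model both orderings reject the same number of requests, but with check_late the peak reservation grows with the size of the burst, while with check_first it is bounded by max_concurrent * body_size.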