Per-route / per-client concurrency limits
As of 0.37.0, DaloyJS ships concurrencyLimit(): parity with HAProxy's maxconn plus request queue, but inside the app, where the framework already owns routing and client identity. Where the Node adapter's maxConnections caps sockets at accept time and loadShedding() rejects traffic under process pressure, concurrencyLimit() bounds the number of requests in flight through a given surface.
Each request:
- tries to acquire a slot from a per-bucket semaphore (maxConcurrent);
- if all slots are busy, waits in a bounded FIFO queue (maxQueue) for up to queueTimeoutMs;
- is rejected with a fast 503 Service Unavailable (+ Retry-After) once the queue is full or the wait times out;
- releases its slot when the response is finalized (on success, error, and short-circuit paths alike), so a slot is never leaked.
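The lifecycle above can be sketched as a small semaphore with a bounded FIFO wait list. This is an illustration of the mechanism only, not DaloyJS's implementation: SlotBucket, SlotRejected, and withSlot are hypothetical names, and mapping a rejection to a 503 response is left to the caller.

```typescript
// Illustrative sketch only; not DaloyJS source. All names are hypothetical.
type RejectReason = "queue-full" | "queue-timeout";

class SlotRejected extends Error {
  readonly reason: RejectReason;
  constructor(reason: RejectReason) {
    super(`concurrency limit: ${reason}`);
    this.reason = reason;
  }
}

interface Waiter {
  grant: () => void;
  timer: ReturnType<typeof setTimeout>;
}

class SlotBucket {
  active = 0;
  private readonly waiters: Waiter[] = [];
  private readonly maxConcurrent: number;
  private readonly maxQueue: number;
  private readonly queueTimeoutMs: number;

  constructor(maxConcurrent: number, maxQueue: number, queueTimeoutMs: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueue = maxQueue;
    this.queueTimeoutMs = queueTimeoutMs;
  }

  get queued(): number {
    return this.waiters.length;
  }

  // Resolves with a release function once the caller holds a slot.
  acquire(): Promise<() => void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve(this.makeRelease());
    }
    if (this.waiters.length >= this.maxQueue) {
      return Promise.reject(new SlotRejected("queue-full"));
    }
    return new Promise<() => void>((resolve, reject) => {
      const waiter: Waiter = {
        grant: () => {
          clearTimeout(waiter.timer);
          resolve(this.makeRelease());
        },
        timer: setTimeout(() => {
          // Timed out while waiting: leave the queue and reject.
          const i = this.waiters.indexOf(waiter);
          if (i !== -1) this.waiters.splice(i, 1);
          reject(new SlotRejected("queue-timeout"));
        }, this.queueTimeoutMs),
      };
      this.waiters.push(waiter);
    });
  }

  private makeRelease(): () => void {
    let released = false;
    return () => {
      if (released) return; // releasing twice must not free two slots
      released = true;
      const next = this.waiters.shift(); // FIFO: oldest waiter first
      if (next) next.grant(); // slot is handed over, so active is unchanged
      else this.active--;
    };
  }
}

// Runs work while holding a slot; finally releases on every exit path.
async function withSlot<T>(bucket: SlotBucket, work: () => Promise<T>): Promise<T> {
  const release = await bucket.acquire();
  try {
    return await work();
  } finally {
    release();
  }
}
```

Handing a freed slot directly to the oldest waiter, rather than decrementing and re-incrementing the counter, keeps a newly arriving request from jumping ahead of the queue.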
Quick start
Scopes
scope decides how the concurrency budget is partitioned:
- "global" (default): one shared budget across the whole mount.
- "route": a separate budget per method + path, so one hot endpoint can't starve the others mounted under the same guard.
- "client": a separate budget per client identity (requires trustProxyHeaders or a keyGenerator), so a heavy client can't consume everyone else's slots.
- a function: return a custom bucket key, or undefined to skip limiting for that request (fail-open).
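One way to picture the partitioning is as a function from a request to a bucket key, where each distinct key gets its own budget. The sketch below is an assumption-laden illustration: the Req shape, the clientId field (standing in for whatever trustProxyHeaders or a keyGenerator would resolve), and the key formats are all hypothetical, not the library's internals.

```typescript
// Hypothetical request shape, used only for this illustration.
interface Req {
  method: string;
  path: string;
  clientId: string;
}

type ScopeFn = (req: Req) => string | undefined;
type Scope = "global" | "route" | "client" | ScopeFn;

// Maps a request to the key of the bucket whose budget it draws from.
// Returning undefined means "do not limit this request" (fail-open).
function bucketKey(scope: Scope, req: Req): string | undefined {
  if (typeof scope === "function") return scope(req);
  switch (scope) {
    case "global":
      return "global"; // every request shares one bucket
    case "route":
      return `route:${req.method} ${req.path}`; // one bucket per method + path
    case "client":
      return `client:${req.clientId}`; // one bucket per client identity
  }
}

// A custom scope: limit only the expensive export endpoints, per client.
const exportsOnly: ScopeFn = (req) =>
  req.path.startsWith("/exports") ? `exports:${req.clientId}` : undefined;
```

With the custom scope shown, a request to any other path yields undefined and bypasses the limiter entirely.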
No queue vs. queue
With the default maxQueue: 0, an overflowing request is rejected immediately with 503, which is useful when you prefer fast failure over added latency. Set maxQueue to absorb short bursts, and pair it with queueTimeoutMs to bound tail latency so a waiting request doesn't hang indefinitely.
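To make the trade-off concrete, the following sketch counts what happens to a burst of simultaneous requests arriving at an idle bucket. It is plain arithmetic over the documented options, not a call into the library; classifyBurst and the Limits type are hypothetical helpers.

```typescript
// Hypothetical helper: arithmetic only, assuming the bucket starts idle
// and the whole burst arrives before any request finishes.
interface Limits {
  maxConcurrent: number;
  maxQueue: number;
}

interface BurstOutcome {
  run: number; // admitted immediately
  wait: number; // parked in the queue, subject to queueTimeoutMs
  reject: number; // turned away with 503
}

function classifyBurst(burst: number, limits: Limits): BurstOutcome {
  const run = Math.min(burst, limits.maxConcurrent);
  const wait = Math.min(burst - run, limits.maxQueue);
  const reject = burst - run - wait;
  return { run, wait, reject };
}

// Five simultaneous requests against two slots:
const noQueue = classifyBurst(5, { maxConcurrent: 2, maxQueue: 0 });
// noQueue is { run: 2, wait: 0, reject: 3 }
const smallQueue = classifyBurst(5, { maxConcurrent: 2, maxQueue: 2 });
// smallQueue is { run: 2, wait: 2, reject: 1 }
```

The two queued requests in the second case are not guaranteed to run: each still fails with a timeout if no slot frees up within queueTimeoutMs.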
Observability
onReject fires whenever a request is turned away, with the bucket key, the reason ("queue-full" or "queue-timeout"), and the live active / queued counts.
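A handler for this hook might tally rejections per bucket and reason before forwarding them to a metrics system. The sketch below assumes the details arrive as a single object with fields named key, reason, active, and queued; those names and the RejectInfo type are guesses made for illustration, so check the actual callback signature before relying on them.

```typescript
// Assumed payload shape; the real field names may differ.
type RejectReason = "queue-full" | "queue-timeout";

interface RejectInfo {
  key: string;
  reason: RejectReason;
  active: number;
  queued: number;
}

// In-memory tally keyed by "bucket|reason".
const rejections = new Map<string, number>();

function recordReject(info: RejectInfo): void {
  const label = `${info.key}|${info.reason}`;
  rejections.set(label, (rejections.get(label) ?? 0) + 1);
}

// Simulated calls, as the limiter might make them:
recordReject({ key: "route:GET /reports", reason: "queue-full", active: 8, queued: 16 });
recordReject({ key: "route:GET /reports", reason: "queue-full", active: 8, queued: 16 });
recordReject({ key: "route:GET /reports", reason: "queue-timeout", active: 8, queued: 3 });
```

Keeping the two reasons separate is useful in practice: a rise in "queue-full" suggests the queue is too small for the burst size, while "queue-timeout" suggests handlers are holding slots longer than the wait budget allows.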
Customizing the 503
How it complements the rest of the stack
- maxConnections (Node adapter): rejects surplus sockets at accept time (L4 admission).
- loadShedding(): sheds traffic when the process is under pressure (event-loop delay, heap, RSS).
- concurrencyLimit(): bounds in-flight requests per route / client with queueing (L7 fairness + backpressure).
- rateLimit(): bounds request rate over time per client.
They stack cleanly: admission cap → process shedding → concurrency fairness → rate limiting.