Per-route / per-client concurrency limits

DaloyJS ships concurrencyLimit(), which offers parity with HAProxy's maxconn plus request queue, but inside the app, where the framework already owns routing and client identity. The Node adapter's maxConnections caps sockets at accept time and loadShedding() rejects traffic under process pressure; concurrencyLimit() instead bounds the number of requests in flight through a given surface.

Each request:

tries to acquire a slot from a per-bucket semaphore (maxConcurrent);
if all slots are busy, waits in a bounded FIFO queue (maxQueue) for up to queueTimeoutMs;
is rejected with a fast 503 Service Unavailable (+ Retry-After) once the queue is full or the wait times out;
releases its slot when the response is finalized: on success, error, and short-circuit paths alike, so a slot is never leaked.

Acquire, queue, or shed

Incoming request: bucket key from scope (global / route / client / fn)
Slot free? Per-bucket semaphore (maxConcurrent)
Run handler: slot released on success, error, and short-circuit
Full -> wait in FIFO queue: up to maxQueue, for up to queueTimeoutMs
Queue full / timed out -> shed: 503 + Retry-After; onReject(queue-full | queue-timeout)

A request acquires a slot from the per-bucket semaphore, waits in a bounded FIFO queue if every slot is busy, and is shed with a fast 503 once the queue is full or the wait times out. The slot is always released when the response finalizes, so a slot is never leaked.
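The acquire / queue / shed flow described above can be illustrated with a small standalone semaphore. This is a minimal sketch, not DaloyJS's actual implementation: the Bucket class, its acquire and release methods, and the withSlot helper are hypothetical names introduced only for illustration. Only maxConcurrent, maxQueue, queueTimeoutMs, and the two rejection reasons come from the text.

```typescript
// Hypothetical sketch of a per-bucket semaphore with a bounded FIFO queue.
// Bucket, acquire, release, and withSlot are illustrative names, not DaloyJS APIs.
interface Waiter {
  grant: () => void;
  timer: ReturnType<typeof setTimeout>;
}

class Bucket {
  private active = 0;
  private readonly waiters: Waiter[] = [];
  private readonly maxConcurrent: number;
  private readonly maxQueue: number;
  private readonly queueTimeoutMs: number;

  constructor(maxConcurrent: number, maxQueue: number, queueTimeoutMs: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueue = maxQueue;
    this.queueTimeoutMs = queueTimeoutMs;
  }

  // Resolves once a slot is held; rejects with "queue-full" or "queue-timeout".
  acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    if (this.waiters.length >= this.maxQueue) {
      return Promise.reject(new Error("queue-full"));
    }
    return new Promise<void>((resolve, reject) => {
      const waiter: Waiter = {
        grant: () => {
          clearTimeout(waiter.timer);
          resolve();
        },
        timer: setTimeout(() => {
          // Leave the queue before rejecting so the slot is never handed to us later.
          const i = this.waiters.indexOf(waiter);
          if (i !== -1) this.waiters.splice(i, 1);
          reject(new Error("queue-timeout"));
        }, this.queueTimeoutMs),
      };
      this.waiters.push(waiter);
    });
  }

  // Hands the slot to the oldest waiter if there is one, otherwise frees it.
  release(): void {
    const next = this.waiters.shift();
    if (next) next.grant();
    else this.active--;
  }
}

// try/finally releases the slot on success and on error alike.
async function withSlot<T>(bucket: Bucket, handler: () => Promise<T>): Promise<T> {
  await bucket.acquire();
  try {
    return await handler();
  } finally {
    bucket.release();
  }
}
```

In this sketch, a maxQueue of 0 makes every overflowing acquire() reject immediately with "queue-full", which matches the fail-fast behavior of the default configuration.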

Quick start

import { App, concurrencyLimit } from "@daloyjs/core";

const app = new App();

// At most 100 in flight per route, queue up to 50 more, wait at most 2s.
app.use(concurrencyLimit({
  maxConcurrent: 100,
  maxQueue: 50,
  queueTimeoutMs: 2000,
  scope: "route",
}));

Scopes

scope decides how the concurrency budget is partitioned:

"global" (default): one shared budget across the whole mount.
"route": a separate budget per method + path, so one hot endpoint can't starve the others mounted under the same guard.
"client": a separate budget per client identity (requires trustProxyHeaders or a keyGenerator), so a heavy client can't consume everyone else's slots.
a function: return a custom bucket key, or undefined to skip limiting for that request (fail-open).

// Per-client fairness behind a trusted proxy.
app.use(concurrencyLimit({
  maxConcurrent: 10,
  maxQueue: 20,
  queueTimeoutMs: 1000,
  scope: "client",
  trustProxyHeaders: true,
}));

// Custom partition (e.g. per API tenant); undefined => unlimited.
app.use(concurrencyLimit({
  maxConcurrent: 50,
  scope: (ctx) => ctx.state.tenantId as string | undefined,
}));
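To make the partitioning concrete, here is a rough sketch of how a scope could map a request to a bucket key. It is an assumption-laden illustration, not the library's code: RequestFacts, bucketKey, the key formats, and the fallback used when no client identity is available are all hypothetical.

```typescript
// Hypothetical sketch: how a scope might resolve to a bucket key.
// RequestFacts and bucketKey are illustrative names, not DaloyJS APIs.
interface RequestFacts {
  method: string;
  path: string;
  clientId?: string; // e.g. derived from proxy headers or a keyGenerator
}

type Scope =
  | "global"
  | "route"
  | "client"
  | ((req: RequestFacts) => string | undefined);

// Returns the bucket key, or undefined to skip limiting (fail-open).
function bucketKey(scope: Scope, req: RequestFacts): string | undefined {
  if (typeof scope === "function") return scope(req);
  switch (scope) {
    case "global":
      return "global";
    case "route":
      return `${req.method} ${req.path}`;
    case "client":
      // Assumption: requests without an identity share one bucket in this sketch.
      return `client:${req.clientId ?? "unknown"}`;
  }
}
```

Each distinct key would then own its own semaphore, for example in a Map keyed by the returned string, so budgets never leak across routes, clients, or tenants.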

No queue vs. queue

With the default maxQueue: 0, an overflowing request is rejected immediately with 503, which is useful when you prefer fast failure over added latency. Set maxQueue to absorb short bursts, and pair it with queueTimeoutMs to bound tail latency so a waiting request doesn't hang indefinitely.

// Fail fast, no waiting.
app.use(concurrencyLimit({ maxConcurrent: 200 }));

// Absorb bursts, but never wait longer than 500ms.
app.use(concurrencyLimit({
  maxConcurrent: 200,
  maxQueue: 100,
  queueTimeoutMs: 500,
}));
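When sizing maxQueue and queueTimeoutMs together, a back-of-the-envelope estimate of how long a full queue takes to drain can help. The helper below is a hypothetical sketch (drainTimeMs is not part of DaloyJS) and assumes steady state, with every slot busy and a known mean handler duration.

```typescript
// Hypothetical helper: rough time for a full queue to drain at steady state.
// Assumes all slots stay busy and handlers take meanHandlerMs on average.
function drainTimeMs(
  maxConcurrent: number,
  maxQueue: number,
  meanHandlerMs: number,
): number {
  // With every slot busy, the bucket completes this many requests per millisecond.
  const completionsPerMs = maxConcurrent / meanHandlerMs;
  return maxQueue / completionsPerMs;
}

// 200 slots, 100 queued, 100 ms handlers: the last queued request waits about 50 ms.
const estimate = drainTimeMs(200, 100, 100);
```

Under these assumptions, a queueTimeoutMs well below the estimate sheds requests the queue could have served, while a value far above it mostly adds latency when handlers slow down.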

Observability

onReject fires whenever a request is turned away, with the bucket key, the reason ("queue-full" or "queue-timeout"), and the live active / queued counts:

app.use(concurrencyLimit({
  maxConcurrent: 100,
  maxQueue: 50,
  queueTimeoutMs: 2000,
  scope: "route",
  onReject: ({ key, reason, active, queued }) => {
    metrics.increment("concurrency.rejected", { key, reason });
    logger.warn({ key, reason, active, queued }, "request shed by concurrencyLimit");
  },
}));

Customizing the 503

app.use(concurrencyLimit({
  maxConcurrent: 100,
  retryAfterSeconds: 5,           // default 1; set 0 to omit the header
  message: "Server is busy, please retry shortly.",
}));
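The effect of these two options on the rejection can be sketched as a plain function. This is illustrative only: buildRejection, the Rejection shape, and the fallback body text are assumptions, not the library's actual response format. Only the 503 status, the Retry-After header, the default of 1, and the "0 omits the header" rule come from the text above.

```typescript
// Hypothetical sketch of how the options could shape the 503 rejection.
// buildRejection and Rejection are illustrative names, not DaloyJS APIs.
interface RejectionOptions {
  retryAfterSeconds?: number; // default 1; 0 omits the header
  message?: string;
}

interface Rejection {
  status: number;
  headers: Record<string, string>;
  body: string;
}

function buildRejection(opts: RejectionOptions = {}): Rejection {
  const retryAfter = opts.retryAfterSeconds ?? 1;
  const headers: Record<string, string> = {};
  // Retry-After carries a delay in whole seconds.
  if (retryAfter > 0) headers["Retry-After"] = String(retryAfter);
  return {
    status: 503,
    headers,
    // Assumption: the fallback body text here is made up for the sketch.
    body: opts.message ?? "Service Unavailable",
  };
}
```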

How it complements the rest of the stack

maxConnections (Node adapter): rejects surplus sockets at accept time (L4 admission).
loadShedding(): sheds traffic when the process is under pressure (event-loop delay, heap, RSS).
concurrencyLimit(): bounds in-flight requests per route / client with queueing (L7 fairness + backpressure).
rateLimit(): bounds request rate over time per client.

They stack cleanly: admission cap → process shedding → concurrency fairness → rate limiting.
