Bot / User-Agent management

DaloyJS ships botGuard(), which applies the kind of bot rules that Nginx, Cloudflare, and other WAFs run at the edge, but inside the app, where the framework already owns request parsing and client-IP resolution. It does three jobs:

Block empty / missing User-Agent (a common signature of crude scrapers and vulnerability scanners, on by default).
Block known-abusive User-Agent strings (your own substrings or RegExps).
Verify declared crawlers: when a request claims to be Googlebot or Bingbot, confirm it via reverse-DNS + forward-confirm (the method Google and Bing themselves document) so a spoofed User-Agent can't impersonate a trusted crawler.

Every check is configurable and allowlist-friendly, and the middleware is dependency-free and runtime-portable.

Decision order

On allowUserAgents? Allowlist, consulted first; bypasses every other rule
Empty / missing UA? blockEmptyUserAgent (on by default) -> 403
Matches blockedUserAgents? Substring or RegExp -> 403
Claims a verified crawler? Reverse-DNS + forward-confirm; spoofed / unverifiable -> 403
Allowed -> handler. Log mode never blocks; it only fires onBlock

The allowlist wins first. After that an empty User-Agent, a blocked pattern, or a crawler that fails reverse-DNS plus forward-confirm is rejected with a 403 by default, so a spoofed Googlebot cannot impersonate a trusted crawler.
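The ordering above can be sketched as a small pure function. This is an illustration under stated assumptions, not botGuard()'s internals: RequestSignals, GuardOptions, and firstBlockReason are hypothetical names, and the signals are assumed to be computed before the function is called.

```typescript
type BlockReason =
  | "empty-user-agent"
  | "blocked-user-agent"
  | "spoofed-bot"
  | "unverifiable-bot";

// Hypothetical, precomputed facts about one request (not a DaloyJS type).
interface RequestSignals {
  onAllowlist: boolean;
  emptyUserAgent: boolean;
  matchesBlocked: boolean;
  crawler: "none" | "verified" | "spoofed" | "unverifiable";
}

// Only the two toggles named in this page; other options are left out.
interface GuardOptions {
  blockEmptyUserAgent: boolean;
  blockUnverifiableBots: boolean;
}

// Returns the first rule that rejects the request, or null when it is allowed.
function firstBlockReason(s: RequestSignals, o: GuardOptions): BlockReason | null {
  // 1. The allowlist wins first and bypasses every other rule.
  if (s.onAllowlist) return null;
  // 2. Empty or missing User-Agent.
  if (s.emptyUserAgent && o.blockEmptyUserAgent) return "empty-user-agent";
  // 3. Blocked substring or RegExp.
  if (s.matchesBlocked) return "blocked-user-agent";
  // 4. Declared crawler that failed or could not complete verification.
  if (s.crawler === "spoofed") return "spoofed-bot";
  if (s.crawler === "unverifiable" && o.blockUnverifiableBots) return "unverifiable-bot";
  return null;
}
```

Returning on the first match is what makes the order observable: an allowlisted request with an empty User-Agent is allowed, because step 2 is never reached.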

Quick start

import { createApp, botGuard, WELL_KNOWN_BOTS } from "@daloyjs/core";

const app = createApp();

app.use(
  botGuard({
    trustProxyHeaders: true, // needed to read the client IP for crawler checks
    blockedUserAgents: [/sqlmap/i, /nikto/i, "masscan"],
    verifiedBots: WELL_KNOWN_BOTS, // a spoofed Googlebot/Bingbot → 403
  }),
);

Mount it with app.use() so it runs in beforeHandle, before your handlers. A blocked request is rejected with a 403 Forbidden response whose body is RFC 9457 problem+json.
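For orientation, a 403 in RFC 9457 format generally looks like the sketch below. The field values are assumptions chosen for illustration and are not confirmed DaloyJS output; only the media type and the member names (type, title, status) come from the RFC.

```typescript
// Illustrative only: these values are assumptions, not botGuard()'s actual body.
const problem = {
  type: "about:blank", // RFC 9457 default when no specific problem type is defined
  title: "Forbidden", // with "about:blank", the title should be the HTTP status phrase
  status: 403,
};

// A standard Response carrying the problem document.
const response = new Response(JSON.stringify(problem), {
  status: 403,
  headers: { "content-type": "application/problem+json" },
});
```

A client can detect this kind of rejection by checking the status code together with the application/problem+json content type before parsing the body.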

Blocking empty & abusive User-Agents

blockEmptyUserAgent defaults to true. A plain string in blockedUserAgents matches case-insensitively as a substring; a RegExp is tested as-is.

app.use(
  botGuard({
    blockEmptyUserAgent: true,
    blockedUserAgents: ["masscan", "zgrab", /\bnmap\b/i],
  }),
);
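The two matching modes can be illustrated with a minimal sketch. matchesPattern and isBlocked are hypothetical helpers written from the description above (strings match as case-insensitive substrings, RegExps are tested unchanged); they are not the library's own functions.

```typescript
type Pattern = string | RegExp;

// Assumed semantics, taken from the prose above.
function matchesPattern(userAgent: string, pattern: Pattern): boolean {
  if (typeof pattern === "string") {
    // Plain string: case-insensitive substring match.
    return userAgent.toLowerCase().includes(pattern.toLowerCase());
  }
  // RegExp: tested as-is, so case sensitivity depends on its own flags.
  return pattern.test(userAgent);
}

const blocked: Pattern[] = ["masscan", "zgrab", /\bnmap\b/i];
const isBlocked = (ua: string): boolean => blocked.some((p) => matchesPattern(ua, p));

const a = isBlocked("Masscan/1.3"); // true: substring match ignores case
const b = isBlocked("Mozilla/5.0 Nmap Scripting Engine"); // true: whole word "Nmap"
const c = isBlocked("unmapped-client/2.0"); // false: \b rejects "nmap" inside a word
```

The word boundaries in /\bnmap\b/i are what keep "unmapped-client" from being blocked; a plain "nmap" string would match it. Avoid the g flag on these patterns, since RegExp.prototype.test keeps state in lastIndex between calls when g is set.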

Allowlist wins

allowUserAgents is consulted first and bypasses every other rule (including empty-UA blocking and crawler verification), which is handy for your own monitoring agents or a partner's integration.

app.use(
  botGuard({
    blockedUserAgents: ["curl"],
    allowUserAgents: ["MyUptimeBot/1.0", /internal-scanner/i],
  }),
);

Verifying declared crawlers

Spoofing User-Agent: Googlebot is trivial. The only reliable check is the one Google and Bing publish: reverse-DNS the client IP, make sure the PTR hostname is on an official domain, then forward-resolve that hostname back to the same IP. botGuard() ships GOOGLEBOT and BINGBOT rules (bundled as WELL_KNOWN_BOTS) and you can add your own:

import { botGuard, GOOGLEBOT } from "@daloyjs/core";

app.use(
  botGuard({
    trustProxyHeaders: true,
    verifiedBots: [
      GOOGLEBOT,
      {
        name: "MyPartnerCrawler",
        userAgent: /partnercrawler/i,
        // Leading dot enforces a subdomain boundary so evil-partner.example
        // cannot satisfy .partner.example.
        domains: [".partner.example"],
      },
    ],
  }),
);
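The reverse-DNS plus forward-confirm check can be sketched as follows. This is an outline under assumptions, not the library's implementation: DnsLike, onDomain, and verifyCrawler are hypothetical names, the exact-match behaviour for domains without a leading dot is a guess, and treating any DNS error as "unverifiable" is one possible policy.

```typescript
// Hypothetical resolver shape for this sketch; the library's BotResolver may differ.
interface DnsLike {
  reverse(ip: string): Promise<string[]>;
  forward(hostname: string): Promise<string[]>;
}

type CrawlerResult = "verified" | "spoofed" | "unverifiable";

// ".partner.example" matches "crawl.partner.example" but not "evil-partner.example".
// Assumption: a domain without a leading dot must match the hostname exactly.
function onDomain(hostname: string, domain: string): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing root dot
  const d = domain.toLowerCase();
  return d.startsWith(".") ? host.endsWith(d) : host === d;
}

async function verifyCrawler(ip: string, domains: string[], dns: DnsLike): Promise<CrawlerResult> {
  let hostnames: string[];
  try {
    hostnames = await dns.reverse(ip); // step 1: PTR lookup for the client IP
  } catch {
    return "unverifiable"; // assumption: DNS errors are not proof of spoofing
  }
  // Step 2: keep only PTR hostnames on an official domain.
  const official = hostnames.filter((h) => domains.some((d) => onDomain(h, d)));
  if (official.length === 0) return "spoofed";
  // Step 3: forward-resolve and require the original IP among the answers.
  for (const host of official) {
    try {
      const addresses = await dns.forward(host);
      if (addresses.includes(ip)) return "verified";
    } catch {
      return "unverifiable";
    }
  }
  return "spoofed";
}
```

The forward step matters because whoever controls an IP block also controls its PTR records and can point them at any hostname; only the owner of the official domain can make that hostname resolve back to the IP. The string comparison in step 3 assumes both sides use the same textual form of the address, which may need normalising for IPv6.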

Because verifiedBots needs the client IP, the middleware refuses to construct unless you supply resolveIp or set trustProxyHeaders. A request that claims to be a crawler but can't be verified (no client IP, or a DNS failure) is blocked by default (blockUnverifiableBots, the secure-by-default posture). Set it to false to fail open. Verification results are cached per IP (default 1 h via cacheTtlMs) so DNS stays off the hot path.
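Per-IP caching with a time-to-live can be pictured with the sketch below. TtlCache is a hypothetical class written for illustration, not the cache botGuard() uses; the 1 h figure mirrors the cacheTtlMs default mentioned above, and the injectable clock exists only to make the example testable.

```typescript
// Minimal TTL cache keyed by client IP (illustrative, unbounded).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// One hour, matching the documented default.
const ONE_HOUR_MS = 60 * 60 * 1000;
const verdicts = new TtlCache<boolean>(ONE_HOUR_MS);
verdicts.set("192.0.2.10", true);
```

With a cache like this, only the first request from an IP inside each TTL window pays for the two DNS lookups. A production cache would likely also cap the number of entries so that a flood of distinct IPs cannot grow it without bound.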

Monitor mode & callbacks

Roll it out safely with mode: "log": nothing is blocked, but every match fires onBlock so you can measure impact before enforcing.

app.use(
  botGuard({
    mode: "log",
    trustProxyHeaders: true,
    verifiedBots: WELL_KNOWN_BOTS,
    // `log` is your application's logger (a pino-style warn(obj, msg) signature is assumed)
    onBlock: (event) =>
      log.warn(
        { reason: event.reason, ua: event.userAgent, ip: event.ip, bot: event.botName },
        "botGuard match",
      ),
  }),
);

The reason is one of "empty-user-agent", "blocked-user-agent", "spoofed-bot", or "unverifiable-bot".
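Because the reason is a closed set of strings, a monitor-mode rollout can tally matches per reason before enforcing. The sketch below is one way to do that; BlockEvent is a hypothetical shape inferred from the fields used in the example above, and recordBlock is an illustrative helper, not a library export.

```typescript
type BlockReason =
  | "empty-user-agent"
  | "blocked-user-agent"
  | "spoofed-bot"
  | "unverifiable-bot";

// Hypothetical event shape; optionality of the fields is an assumption.
interface BlockEvent {
  reason: BlockReason;
  userAgent?: string;
  ip?: string;
  botName?: string;
}

// One counter per reason, all starting at zero.
const counts: Record<BlockReason, number> = {
  "empty-user-agent": 0,
  "blocked-user-agent": 0,
  "spoofed-bot": 0,
  "unverifiable-bot": 0,
};

// Intended to be passed as the onBlock callback.
function recordBlock(event: BlockEvent): void {
  counts[event.reason] += 1;
}

recordBlock({ reason: "spoofed-bot", userAgent: "Googlebot/2.1", ip: "192.0.2.10", botName: "Googlebot" });
recordBlock({ reason: "empty-user-agent" });
```

Passing recordBlock as onBlock in log mode gives a per-reason breakdown, which helps spot a rule that would reject legitimate traffic (for example, a high "unverifiable-bot" count caused by a misconfigured resolver).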

Custom DNS resolver

The default resolver lazily imports node:dns/promises. On a runtime without it (Workers, Deno without --allow-net) or in tests, supply your own BotResolver:

import type { BotResolver } from "@daloyjs/core/bot-guard";

const resolver: BotResolver = {
  reverse: (ip) => myDns.reverse(ip),
  forward: (hostname) => myDns.resolve(hostname),
};

app.use(botGuard({ trustProxyHeaders: true, verifiedBots: WELL_KNOWN_BOTS, resolver }));
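For tests, a resolver backed by a fixed table avoids real DNS entirely. The sketch below declares its own ResolverLike interface as a stand-in for BotResolver, assuming both methods return Promise<string[]>; fixtureResolver and the hostnames and addresses are hypothetical (the IPs come from the documentation-reserved 192.0.2.0/24 range).

```typescript
// Local stand-in for the library's BotResolver; return types are an assumption.
interface ResolverLike {
  reverse(ip: string): Promise<string[]>;
  forward(hostname: string): Promise<string[]>;
}

// Builds a resolver from a hostname -> IP table. PTR answers are derived from
// the same table, so forward and reverse lookups always agree.
function fixtureResolver(records: Map<string, string>): ResolverLike {
  return {
    reverse: async (ip) =>
      [...records.entries()].filter(([, addr]) => addr === ip).map(([host]) => host),
    forward: async (hostname) => {
      const addr = records.get(hostname);
      return addr === undefined ? [] : [addr];
    },
  };
}

const testResolver = fixtureResolver(
  new Map([
    ["crawl-1.partner.example", "192.0.2.10"],
    ["crawl-2.partner.example", "192.0.2.11"],
  ]),
);
```

An object like testResolver could then be passed as the resolver option, provided its shape matches what BotResolver actually requires. Unknown IPs and hostnames yield empty arrays here; a stub that throws instead would exercise the "unverifiable-bot" path.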
