The International AI Safety Report 2026, Translated Into a Minimum Safety Baseline for AI Backends
Aikido's read of the International AI Safety Report 2026 lands on a short list of deployment-time requirements for any backend an autonomous AI system can call — layered defense, independent verification, prompt-injection-resistant guardrails, network scope control, inference/execution separation, full observability and emergency controls. Here's the honest per-requirement mapping to what a DaloyJS app already enforces by default, what one opt-in line adds, and what still lives above the HTTP layer.
A reader sent me Aikido's "International AI Safety Report 2026: Aikido Security Analysis" with the same question I get every other week now: are we doing anything about this?
The piece reads the International AI Safety Report 2026 — 100+ experts, 30+ countries, Yoshua Bengio chairing — through an operator's lens and lands on a short, useful conclusion: the interesting safety work for the rest of us is at deployment time and runtime, not at training time. Aikido summarises it as a few deployment-time requirements that any backend an autonomous AI system can call should meet, no matter which model is calling it:
- Layered defense. Training-time safety, deployment-time controls, and post-deployment monitoring are three independent layers. The deployment-time layer must work even when the model layer fails.
- Mandatory verification. Models game evals and sandbag on demand. Self-reports and chain-of-thought are not evidence. You need an independent verifier in front of every side effect.
- Prompt-injection-resistant constraints. Leading models still fell to prompt injection within a handful of attempts in 2025 evals. Constraints must be enforced, not requested in the system prompt.
- Minimum safety requirements. Abuse prevention, network-level scope control, inference/execution separation, full observability with emergency controls, data processing guarantees, and verification with false-positive control.
That list reads almost like a feature spec for an HTTP framework built for the agent era. Which is convenient, because DaloyJS is an HTTP framework built for the agent era. Below is the honest per-requirement mapping of what an app on @daloyjs/core already enforces by default, what one opt-in line adds, and the items no framework can own.
Layered defense — the deployment layer must stand alone
- DaloyJS ships
- The DaloyJS constructor ships secure defaults for the deployment layer: 1 MiB body limit, 30s request timeout, prod-mode 5xx redaction, prototype-pollution-safe JSON parsing, CRLF / header-splitting refusal, path-traversal rejection, method-confusion 405 (not 404), 415 on unsupported content types, and __Host- / Secure / HttpOnly / SameSite=Lax cookies. None of these depend on the model behaving; they hold even when the calling agent is fully compromised.
- You still own
- Decide what the runtime layer does when the deployment layer fires: page someone, drop the request, fail open, fail closed. The framework gives you the signal; the runbook is yours.
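Here is what "none of these depend on the model behaving" looks like in code. The block below is a dependency-free sketch of three of those checks: a byte-counted body limit, JSON parsing that drops prototype-pollution keys, and CRLF / path-traversal refusal. The function names and error strings are made up for illustration and are not DaloyJS's internals. Notice that nothing in it reads the prompt or asks the model anything.

```typescript
// Illustrative names and limits; NOT DaloyJS's internal implementation.
const MAX_BODY_BYTES = 1024 * 1024; // 1 MiB, the default body limit quoted above

// Keys that can reach Object.prototype when a parsed object is later merged
// into another object. Dropping "constructor" and "prototype" is conservative.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function parseJsonBody(raw: string): unknown {
  // Count UTF-8 bytes, not UTF-16 code units, so multibyte input can't slip past.
  if (new TextEncoder().encode(raw).byteLength > MAX_BODY_BYTES) {
    throw new Error("413: body exceeds limit");
  }
  // A reviver that returns undefined deletes that key from the parsed result.
  return JSON.parse(raw, (key, value) => (FORBIDDEN_KEYS.has(key) ? undefined : value));
}

// Refuse header values that could split a response (CR, LF or NUL).
function assertSafeHeaderValue(value: string): string {
  if (/[\r\n\0]/.test(value)) {
    throw new Error("400: header value contains CR, LF or NUL");
  }
  return value;
}

// Reject request paths that try to climb out of the served root,
// including percent-encoded "..", which is decoded before the check.
function assertSafePath(pathname: string): string {
  const decoded = decodeURIComponent(pathname);
  if (decoded.split(/[\\/]/).includes("..")) {
    throw new Error("400: path traversal");
  }
  return pathname;
}
```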
Mandatory verification — the route schema is the contract
- DaloyJS ships
- Every DaloyJS route declares a schema (Zod, Valibot, ArkType, or anything else that implements Standard Schema). The handler does not run until the request matches. .strict() is the project convention, so unknown keys are rejected, not silently dropped into the database. Response schemas are validated too, so a handler cannot leak a field the contract didn't promise. That matters when the consumer is an agent that will happily exfiltrate anything it sees.
- You still own
- Write the schema tight. min/max on numbers, min/max on string lengths, regex on identifiers, enum on choices. The framework runs whatever shape you give it; a permissive schema is permissive enforcement.
This is the single most important point in the entire report and the easiest one to get wrong. The temptation when a model is doing something clever is to widen the schema so the clever thing fits. Don't. Widen the schema only when you've thought through what the wider input means in production — and write the unhappy-path test before you ship it.
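A tight schema is easier to judge when you can see every rule spelled out. The sketch below hand-rolls what a .strict() object schema with bounds, a regex and an enum would enforce for a hypothetical refund tool, plus a response-side check that refuses undeclared fields. The field names, the `ord_` id format and the 50,000 cap are assumptions. In a real route you would declare the same rules with Zod, Valibot or ArkType and let the framework run them.

```typescript
// Hand-rolled equivalent of a tight, .strict() schema for a HYPOTHETICAL
// refund tool. In a real route, express these rules with a schema library.

type RefundInput = {
  orderId: string;
  amountCents: number;
  reason: "duplicate" | "fraud" | "customer_request";
};

type Result<T> = { ok: true; value: T } | { ok: false; errors: string[] };

const REASONS = ["duplicate", "fraud", "customer_request"] as const;
const INPUT_KEYS = new Set(["orderId", "amountCents", "reason"]);
const RESPONSE_KEYS = new Set(["refundId", "status"]);
const ORDER_ID = /^ord_[A-Za-z0-9]{8,32}$/; // regex on identifiers

function isReason(v: unknown): v is RefundInput["reason"] {
  return typeof v === "string" && (REASONS as readonly string[]).includes(v);
}

function parseRefundInput(body: unknown): Result<RefundInput> {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];
  // Strict: an unknown key is an error, never silently ignored.
  for (const key of Object.keys(record)) {
    if (!INPUT_KEYS.has(key)) errors.push(`unknown key: ${key}`);
  }
  const { orderId, amountCents, reason } = record;
  if (typeof orderId !== "string" || !ORDER_ID.test(orderId)) {
    errors.push("orderId must match ^ord_[A-Za-z0-9]{8,32}$");
  }
  // Bounded integer: at least 1 cent, at most 50,000 cents.
  if (
    typeof amountCents !== "number" ||
    !Number.isInteger(amountCents) ||
    amountCents < 1 ||
    amountCents > 50_000
  ) {
    errors.push("amountCents must be an integer between 1 and 50000");
  }
  if (!isReason(reason)) errors.push(`reason must be one of: ${REASONS.join(", ")}`);
  if (errors.length > 0) return { ok: false, errors };
  // Build a fresh object so nothing unvalidated rides along.
  return {
    ok: true,
    value: {
      orderId: orderId as string,
      amountCents: amountCents as number,
      reason: reason as RefundInput["reason"],
    },
  };
}

// Response side: refuse to send any field the contract didn't declare.
function checkResponse(payload: Record<string, unknown>): Record<string, unknown> {
  const extra = Object.keys(payload).filter((k) => !RESPONSE_KEYS.has(k));
  if (extra.length > 0) {
    throw new Error(`undeclared response fields: ${extra.join(", ")}`);
  }
  return payload;
}
```

The unhappy-path tests the paragraph above asks for fall straight out of this: an extra `note` key, `amountCents: 50001`, and a `reason` outside the enum should each come back `ok: false` before any handler code runs.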
Network-level scope control — fetchGuard is one line
- DaloyJS ships
- fetchGuard() is a default-deny outbound wrapper around fetch / undici / Bun.fetch / Workers fetch. Cloud metadata IPs, localhost, RFC 1918 private ranges, link-local, and IPv6 equivalents are blocked. Redirects are re-validated against the same allow-list, so an attacker can't bounce off a public URL into the metadata service. ipRestriction() does the same job for inbound traffic on admin / kill-switch surfaces.
- You still own
- Write the allow-list. fetchGuard refuses to start without one; there is no '*' default. That refusal is deliberate: the most common SSRF in AI tooling is a 'we'll lock it down later' allow-list that never gets locked down.
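The post doesn't show fetchGuard's options, so what follows is a hypothetical, dependency-free sketch of the same idea rather than its API. It refuses to start without an allow-list, blocks loopback, RFC 1918, link-local and metadata addresses, and re-checks every redirect hop. `createOutboundGuard` and the injected `FetchLike` type are invented names, and injecting the fetch function keeps the sketch testable offline. It only inspects hostnames and IP literals, so a production guard also has to check the addresses DNS actually returns.

```typescript
// HYPOTHETICAL sketch; not fetchGuard's real API. Checks hostnames and IP
// literals only: a production guard must also validate resolved addresses
// (DNS rebinding).

type FetchLike = (
  url: string,
  init: { redirect: "manual" },
) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

function isBlockedIPv4(host: string): boolean {
  const parts = host.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return false; // not an IPv4 literal; the hostname allow-list decides
  }
  const [a, b] = parts;
  return (
    a === 0 || a === 10 || a === 127 || // "this network", RFC 1918 10/8, loopback
    (a === 169 && b === 254) || // link-local, incl. 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 172.16/12
    (a === 192 && b === 168) // RFC 1918 192.168/16
  );
}

function isBlockedIPv6(host: string): boolean {
  if (!host.startsWith("[")) return false; // URL.hostname brackets IPv6 literals
  const ip = host.slice(1, -1).toLowerCase();
  return (
    ip === "::" || ip === "::1" || // unspecified, loopback
    /^f[cd][0-9a-f]{2}:/.test(ip) || // unique local fc00::/7
    /^fe[89ab][0-9a-f]:/.test(ip) || // link-local fe80::/10
    ip.startsWith("::ffff:") // IPv4-mapped: refuse rather than re-parse
  );
}

function createOutboundGuard(allowedHosts: string[], fetchImpl: FetchLike, maxRedirects = 3) {
  // Refuse to start without an explicit allow-list: no '*' default.
  if (allowedHosts.length === 0 || allowedHosts.includes("*")) {
    throw new Error("outbound guard requires an explicit allow-list");
  }
  const allow = new Set(allowedHosts.map((h) => h.toLowerCase()));

  function check(rawUrl: string): URL {
    const url = new URL(rawUrl); // WHATWG parsing normalizes hex/decimal IPv4 forms
    const host = url.hostname.toLowerCase();
    if (url.protocol !== "https:") throw new Error(`blocked scheme: ${url.protocol}`);
    if (host === "localhost" || isBlockedIPv4(host) || isBlockedIPv6(host)) {
      throw new Error(`blocked address: ${host}`);
    }
    if (!allow.has(host)) throw new Error(`blocked host: ${host}`);
    return url;
  }

  return async function guardedFetch(rawUrl: string) {
    let url = check(rawUrl);
    for (let hop = 0; hop <= maxRedirects; hop++) {
      const res = await fetchImpl(url.toString(), { redirect: "manual" });
      const location = res.headers.get("location");
      if (res.status < 300 || res.status >= 400 || location === null) return res;
      // Every redirect target goes through the same checks as the first URL.
      url = check(new URL(location, url).toString());
    }
    throw new Error("blocked: too many redirects");
  };
}
```

In Node 18+ you should be able to pass `globalThis.fetch` as `fetchImpl`; the point of the injection is that a test can swap in a fake upstream that answers with a 302 to 169.254.169.254 and confirm the second hop is refused. Allowing only `https:` is a choice made for this sketch.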
Inference / execution separation — two Apps, two deploys
- DaloyJS ships
- DaloyJS Apps are cheap. The recommended pattern is two of them: a model-facing tool surface (public, schema-validated, rate-limited) and an execution app (internal-only, ipRestriction'd, behind short-lived JWTs). The tool surface forwards to the execution app over the internal network. A successful prompt injection lands the attacker in a process with no database credentials, no filesystem, and a fetchGuard allow-list of two domains.
- You still own
- Decide the split. 'Anything that mutates state' is a fine starting boundary. Move the line as your blast-radius tolerance changes. The framework doesn't care which side a route lives on.
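The post doesn't say how the execution app's short-lived JWTs are minted, so here is a minimal sketch under one stated assumption: the two apps share an HMAC secret and use HS256 through Node's built-in `node:crypto`. `signServiceToken` and `verifyServiceToken` are hypothetical names. What matters for the split is the pinned algorithm, the audience check (a token minted for the execution app is useless anywhere else) and a TTL measured in seconds. For production, a maintained JWT library such as jose is a safer choice than hand-rolling this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HYPOTHETICAL helpers for tool-surface -> execution-app calls (HS256).

const encodeJson = (value: unknown): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
const decodeJson = (part: string): Record<string, unknown> =>
  JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
const hmac = (secret: string, data: string): Buffer =>
  createHmac("sha256", secret).update(data).digest();

function signServiceToken(secret: string, sub: string, aud: string, ttlSeconds = 60): string {
  const now = Math.floor(Date.now() / 1000);
  const header = encodeJson({ alg: "HS256", typ: "JWT" });
  const payload = encodeJson({ sub, aud, iat: now, exp: now + ttlSeconds });
  const signature = hmac(secret, `${header}.${payload}`).toString("base64url");
  return `${header}.${payload}.${signature}`;
}

function verifyServiceToken(secret: string, token: string, expectedAud: string): { sub: string } {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("invalid token");
  const [header, payload, signature] = parts;
  // Algorithm allowlist: only HS256, never "none" or whatever the token claims.
  if (decodeJson(header).alg !== "HS256") throw new Error("invalid token");
  const expected = hmac(secret, `${header}.${payload}`);
  const given = Buffer.from(signature, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    throw new Error("invalid token");
  }
  const claims = decodeJson(payload);
  // Errors name the reason only, never the offending claim value.
  if (claims.aud !== expectedAud) throw new Error("invalid token");
  const exp = claims.exp;
  if (typeof exp !== "number" || exp <= Math.floor(Date.now() / 1000)) {
    throw new Error("expired token");
  }
  return { sub: String(claims.sub) };
}
```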
Full observability and emergency controls — the receipts and the big red button
- DaloyJS ships
- Per-request structured logs with a correlated ULID requestId. RFC 9457 problem+json errors carrying the same requestId. loadShedding sheds the cheapest traffic first when the event loop or queue is saturated. gracefulShutdown drains in-flight requests on SIGTERM. A kill switch is one ipRestriction line on / plus a redeploy.
- You still own
- Wire the structured log stream to your SIEM (Datadog, CloudWatch, Loki, whatever). Decide the load-shedding thresholds for your workload. The framework gives you the primitives; the dashboards and the on-call rotation are yours.
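To make the requestId correlation concrete, here is a small sketch of the three pieces: a ULID generator, an RFC 9457 problem body that carries the id as an extension member, and a one-JSON-object-per-line log record with the same id. The helper names and log fields are assumptions, not DaloyJS's log schema. This ULID is not monotonic within a single millisecond, which is fine for correlation but not for strict ordering.

```typescript
import { randomBytes } from "node:crypto";

// Crockford base32, as used by ULID (no I, L, O or U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// 10 chars of millisecond timestamp (sortable) + 16 chars of randomness.
function ulid(now: number = Date.now()): string {
  let time = "";
  let t = now;
  for (let i = 0; i < 10; i++) {
    time = CROCKFORD.charAt(t % 32) + time;
    t = Math.floor(t / 32);
  }
  // 256 is a multiple of 32, so masking a random byte with 31 stays uniform.
  const rand = Array.from(randomBytes(16), (byte) => CROCKFORD.charAt(byte & 31)).join("");
  return time + rand;
}

// RFC 9457 problem details; requestId is an extension member.
type Problem = { type: string; title: string; status: number; detail?: string; requestId: string };

function problem(
  status: number,
  title: string,
  requestId: string,
  type = "about:blank",
  detail?: string,
): Problem {
  return detail === undefined
    ? { type, title, status, requestId }
    : { type, title, status, detail, requestId };
}

// One JSON object per line: easy for log shippers and SIEMs to ingest.
// Fixed fields come last so caller-supplied fields cannot overwrite them.
function logLine(
  level: "info" | "warn" | "error",
  requestId: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({ ...fields, time: new Date().toISOString(), level, requestId });
}
```

In a handler you would generate the id once per request, attach it to every log line, and send `problem(400, "Bad Request", requestId)` with `Content-Type: application/problem+json`. Searching the SIEM for that id then returns the full story behind whatever the agent saw.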
Prompt injection — the HTTP boundary owns the blast radius
Let's be honest about this one. Prompt injection doesn't live at the HTTP layer — it lives in the model. No framework "solves" prompt injection. What the framework owns is the blast radius of a successful prompt injection: how much damage the model can do once it has been convinced to call your tool with attacker-shaped input.
- DaloyJS ships
- The route IS the constraint. Strongly typed inputs, bounded numbers, bounded strings, enum'd choices, .strict() bodies. Response schemas bounded too, so a successful injection can't read fields the contract didn't promise. RFC 9457 errors so the model gets a structured 400 it can self-correct from, not a vague 500 it will retry with progressively weirder inputs.
- You still own
- Resist the urge to add a free-form 'action' field 'just for flexibility'. Free-form fields are the entire prompt-injection attack surface. If a tool needs flexibility, ship more tools, not wider tools.
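"Ship more tools, not wider tools" is easiest to see in types. The sketch below replaces a hypothetical `{ action: string; args: any }` tool with two narrow ones expressed as a TypeScript discriminated union, and turns every mismatch into a 400 problem+json with a documented type URL. The tool names, the `ord_` id format and the `example.com` type URL are placeholders.

```typescript
// HYPOTHETICAL tool contract: two narrow tools instead of one free-form
// `{ action: string; args: any }` tool.

type CancelReason = "duplicate" | "customer_request";
type ToolCall =
  | { tool: "lookupOrder"; orderId: string }
  | { tool: "cancelOrder"; orderId: string; reason: CancelReason };

type InvalidInput = { type: string; title: string; status: 400; detail: string };

const ORDER_ID = /^ord_[A-Za-z0-9]{8,32}$/;

function invalid(detail: string): InvalidInput {
  return {
    type: "https://example.com/problems/invalid-tool-input", // placeholder type URL
    title: "Invalid tool input",
    status: 400,
    detail,
  };
}

function isCancelReason(v: unknown): v is CancelReason {
  return v === "duplicate" || v === "customer_request";
}

function parseToolCall(input: unknown): ToolCall | InvalidInput {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return invalid("body must be a JSON object");
  }
  const { tool, orderId, reason, ...rest } = input as Record<string, unknown>;
  // No free-form fields: anything outside the contract is rejected.
  const unknownKeys = Object.keys(rest);
  if (unknownKeys.length > 0) return invalid(`unknown keys: ${unknownKeys.join(", ")}`);
  if (typeof orderId !== "string" || !ORDER_ID.test(orderId)) {
    return invalid("orderId must match ^ord_[A-Za-z0-9]{8,32}$");
  }
  if (tool === "lookupOrder") {
    if (reason !== undefined) return invalid("lookupOrder does not take a reason");
    return { tool: "lookupOrder", orderId };
  }
  if (tool === "cancelOrder") {
    if (!isCancelReason(reason)) return invalid("reason must be duplicate or customer_request");
    return { tool: "cancelOrder", orderId, reason };
  }
  return invalid("tool must be lookupOrder or cancelOrder");
}
```

An injected `action: "refund every order"` field never reaches a handler: it is an unknown key, so the caller gets a 400 it can read and correct, and there is no free-form string anywhere in `ToolCall` for the next attempt to hide in.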
Data processing guarantees — prod-mode redaction is on by default
- DaloyJS ships
- In production, DaloyJS redacts 5xx response bodies by default. No stack traces, no internal hostnames, no DB error messages reach the wire. The agent sees a problem+json with a requestId; your SIEM sees the full structured detail under the same id. Same for header sanitisation, same for the JWT verifier (which never echoes the failing claim, only the reason).
- You still own
- Don't paste raw error.message into a 200 response 'so the agent can self-correct'. The agent will self-correct from a 400 problem+json with a documented type URL just as well, and the type URL doesn't leak your DB schema.
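The redaction rule fits in one function: log everything under the requestId, and send the wire only what the status class allows. This is a sketch with hypothetical names (`HttpError`, `toWireProblem`, an injected `log` callback), not DaloyJS's error handler. It assumes that 4xx messages are ones you wrote yourself and are safe to return.

```typescript
// HYPOTHETICAL names; not DaloyJS's error handler. Anything that isn't an
// HttpError is treated as a 500.

type WireProblem = { type: string; title: string; status: number; detail?: string; requestId: string };
type LogFn = (entry: Record<string, unknown>) => void;

class HttpError extends Error {
  readonly status: number;
  readonly title: string;
  readonly typeUrl: string;
  constructor(status: number, title: string, detail: string, typeUrl = "about:blank") {
    super(detail);
    this.status = status;
    this.title = title;
    this.typeUrl = typeUrl;
  }
}

function toWireProblem(err: unknown, requestId: string, production: boolean, log: LogFn): WireProblem {
  const known = err instanceof HttpError ? err : undefined;
  const status = known ? known.status : 500;
  // The full detail stays server-side, keyed by the same requestId.
  log({
    level: status >= 500 ? "error" : "warn",
    requestId,
    status,
    message: err instanceof Error ? err.message : String(err),
    stack: err instanceof Error ? err.stack : undefined,
  });
  if (status >= 500 && production) {
    // Redacted: no message, stack, hostname or SQL on the wire.
    return { type: "about:blank", title: "Internal Server Error", status, requestId };
  }
  return {
    type: known ? known.typeUrl : "about:blank",
    title: known ? known.title : "Internal Server Error",
    status,
    detail: err instanceof Error ? err.message : String(err),
    requestId,
  };
}
```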
The whole baseline in one file
That's the minimum shape of a DaloyJS app that takes traffic from an autonomous AI system in production. About fifty lines. Zero runtime dependencies on @daloyjs/core's side. Every line maps to a specific item on the report's minimum-safety list.
What the framework honestly cannot do
- Training-time safety. That's the model provider's layer. The report is correct that you cannot rely on it alone, but we can't supply it either. What we can do is make the deployment layer strong enough that a jailbroken model is still bounded by the schema.
- Detecting that a model is sandbagging. A model that intentionally underperforms on evals is a problem above the HTTP layer. What the framework can do is make every tool call observable and every side effect schema-checked, so an anomalous pattern shows up in your structured log stream and your SIEM can flag it.
- Telling you what is safe for your business. The schema says "amountCents must be ≤ 50,000", but the framework cannot tell you that 50,000 is the right number. That is a product / risk / compliance call, and it changes per route, per customer tier and per jurisdiction.
- Stopping you from disabling the guards. The guards run in your app. If you delete fetchGuard or widen the schema to z.any(), the framework lets you. The repo's AGENTS.md asks coding agents not to, and the secure-by-default post spells out why, but the merge-button discipline is on the team.
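The framework's part in catching odd agent behaviour is the log stream; the detection rule belongs in your SIEM. As a hypothetical example of such a rule, the sketch below flags any agent whose share of schema-rejected (400 or 422) tool calls in a ten-minute window exceeds 30%. It won't prove a model is sandbagging; it surfaces an anomalous pattern so a human takes a look. The log shape, window and thresholds are assumptions to tune against your own traffic.

```typescript
// HYPOTHETICAL SIEM-style rule over structured tool-call logs.

type ToolCallLog = { ts: number; agentId: string; status: number };

function flagAnomalousAgents(
  logs: ToolCallLog[],
  now: number,
  windowMs = 10 * 60_000, // look at the last ten minutes
  minCalls = 20, // ignore agents with too little traffic to judge
  maxRejectRate = 0.3, // flag above 30% rejected calls
): string[] {
  const counts = new Map<string, { total: number; rejected: number }>();
  for (const entry of logs) {
    if (entry.ts < now - windowMs || entry.ts > now) continue; // outside the window
    const c = counts.get(entry.agentId) ?? { total: 0, rejected: 0 };
    c.total += 1;
    // Assumption: schema rejections surface as 400 or 422.
    if (entry.status === 400 || entry.status === 422) c.rejected += 1;
    counts.set(entry.agentId, c);
  }
  const flagged: string[] = [];
  for (const [agentId, c] of counts) {
    if (c.total >= minCalls && c.rejected / c.total > maxRejectRate) flagged.push(agentId);
  }
  return flagged.sort();
}
```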
The honest answer to the original question
Are we doing anything about the International AI Safety Report 2026? Yes — the framework was already designed against this exact shape of threat model. Aikido's read of the report lines up one-for-one with primitives that ship today: fetchGuard() for network scope control, route schemas + .strict() for independent verification, two-App composition for inference/execution separation, requestId + structured logs + RFC 9457 for full observability, loadShedding + gracefulShutdown for emergency controls, prod-mode redaction + JWT algorithm allowlists for data processing guarantees, and rateLimit + body limits + request timeouts for abuse prevention.
None of it is exotic. None of it requires a runtime dependency. All of it is on by default or one line of opt-in. The framework cannot make the model safe — but it can make sure that when the model isn't, the backend still is.
Related reading on this blog: OWASP Top 10 for Agentic Applications, Mapped; Vibe Coding Security; Cloud Security Architecture, Mapped; The 5 Pillars of a Secure SDLC, Mapped; Secure by Default. Relevant docs: /docs/security, runtime protections, secure defaults.