Security | AI | Field report

The State of AI in Security 2026: 450 Teams, One Uncomfortable Pattern, and What Your Backend Can Do About It

Aikido and Sapio Research surveyed 450 developers, CISOs, and AppSec engineers across Europe and the US. The headline: AI now writes a quarter of production code, 1 in 5 teams had a serious incident because of it, and the usual reflex (buy more tools) makes things measurably worse. Here is the data, with charts, and the structural lesson it points to for anyone shipping an API.

Devlin Duldulao, Fullstack cloud engineer | June 25, 2026 | 14 min read

Someone dropped Aikido's State of AI in Security & Development 2026 report in our team chat with the comment “this is just us, but with numbers.” I read all 52 slides on a train ride, and yeah, it was uncomfortable in the way good data usually is.

The setup: Sapio Research surveyed 450 full-time professionals across Europe and the US, split evenly into 150 developers, 150 security leaders (CISOs or equivalent), and 150 application security engineers. So this is not a vendor asking its own happy customers whether they are happy. It is a cross-section of the people writing code, the people reviewing it, and the people who get the 2 a.m. phone call when it goes wrong.

I have shipped backends for about twelve years now, and I have lived most of this report. The part that stuck with me is not any single scary number. It is the loop: AI writes more of our code, incidents go up, and the instinctive fix (buy another scanner) makes the problem measurably worse. Let me walk through the data, and then the structural lesson it points to. The lesson is not “use our framework,” it is something more boring and more durable. But since one concrete way to apply that lesson is the thing I work on, let me be upfront about it.

New here, and full disclosure: I help build DaloyJS, so weigh the framework parts accordingly. If you have not run into it yet: DaloyJS is a new, contract-first TypeScript backend framework. You define a route once and get request and response validation, live OpenAPI docs, a fully typed client, and security guardrails that are on by default, with zero runtime dependencies, running on Node, Bun, Deno, Cloudflare, and Vercel. That is the lens I read this report through. I will keep the data and the framework clearly separated, so you can take the numbers and ignore me if you like.

AI is writing a quarter of your code, and it shows

Start with adoption. AI coding tools now write 24% of production code on average, 21% in Europe and 29% in the US. That is not autocomplete anymore. That is a junior developer who never sleeps, never asks questions, and confidently commits whatever pattern was most common in its training data.

69% of orgs have found a vulnerability introduced by AI-generated code

1 in 5 suffered a serious incident directly tied to AI-generated code

24% of production code is now written by AI tools

Nearly 70% of organizations say they have uncovered flaws tied to AI-generated code. For one in five, it escalated into a serious incident. Here is the full breakdown of how teams answered “have you ever identified a vulnerability introduced by AI-generated code?”

Vulnerabilities introduced by AI-generated code

Yes, a minor issue: 49%
Yes, a serious incident: 20%
Not aware of any: 20%
No, none: 11%

Source: State of AI in Security & Development 2026 (Aikido / Sapio Research, n=450). The 20% who are 'not aware' worry me more than the 20% who had an incident.

I want to be fair to the AI here. The code it writes usually works. That is exactly the trap. The vulnerability is rarely a syntax error a linter catches. It is the missing authorization check, the mass-assignment, the response that returns the whole row. Here is a real shape of it, the kind of thing I have genuinely had to send back:

// What a coding assistant happily produced for me when I asked it
// to "add an endpoint to update a user's plan". It runs. The demo
// works. The PR looks fine at a glance. Ship it, right?
app.post("/users/:id/plan", async (req, res) => {
  const { id } = req.params;
  const body = req.body; // trust me, it's fine

  const updated = await db.user.update({
    where: { id },
    // Mass assignment: nothing stops body.role = "admin" or
    // body.isVerified = true from being written straight to the row.
    data: body,
  });

  // And the response hands back the whole record, including
  // passwordHash and internal feature flags, to whoever called it.
  res.json(updated);
});

Three bugs, zero of them visible in a quick scroll. No input validation, so the request body becomes the database write. Mass assignment, so role and isVerified are fair game. And the response serializes the entire user record, passwordHash included. The demo works. The tests, if there are any, probably pass. This is what “the code runs” buys you now.
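To make the exploit concrete, here is a minimal, dependency-free sketch of the same pattern. The in-memory `users` map, the `UserRow` type, and `unsafeUpdatePlan` are hypothetical stand-ins for the database and handler above, not anything from a real codebase.

```typescript
// Hypothetical in-memory stand-in for the users table. Field names
// mirror the ones discussed in the post.
type UserRow = {
  id: string;
  plan: string;
  role: "member" | "admin";
  isVerified: boolean;
  passwordHash: string;
};

const users = new Map<string, UserRow>([
  [
    "u1",
    {
      id: "u1",
      plan: "free",
      role: "member",
      isVerified: false,
      passwordHash: "not-a-real-hash",
    },
  ],
]);

// Same shape as the handler above: the body is merged straight into the row.
function unsafeUpdatePlan(id: string, body: Record<string, unknown>): UserRow {
  const current = users.get(id);
  if (!current) throw new Error(`no user ${id}`);
  const updated = { ...current, ...body } as UserRow; // mass assignment
  users.set(id, updated);
  return updated; // whole record, passwordHash included
}

// The caller claims to be changing a plan, but sends two extra keys.
const leaked = unsafeUpdatePlan("u1", {
  plan: "pro",
  role: "admin",
  isVerified: true,
});
```

After that one call the stored row has role "admin" and isVerified true, and the returned object still carries passwordHash. Nothing threw, which is the whole problem.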

The optimism gap is doing a lot of heavy lifting

Here is where the report gets a little funny. Despite all of the above, 96% of organizations believe AI will one day write near-perfect secure code. The average timeline they give is 5.1 years. I admire the confidence. I do not share it, and neither, on closer reading, do they.

When will AI write near-perfect secure code?

Within 1-2 years: 20%
3-5 years: 44%
6-10 years: 24%
More than 10 years
Never

The most common answer is a 3-5 year horizon (44%), but only 21% believe it will ever happen without human oversight.

Only 21% think AI will get there without a human in the loop. The other 79%, as one of the report's quoted CISOs puts it, are “the smart ones.” Almost a third expect AI to reduce bugs but still need people for secure design, architecture, and the business logic that no model understands because it lives in your head and a Slack thread from 2024.

Meanwhile 90% expect AI to take over penetration testing within about 5.5 years, and 97% would at least consider an agentic AI pentest tool. But they want proof: 60% want side-by-side results against a manual pentest before they trust it. Translation: everyone believes in the future and nobody is willing to bet production on it yet. That is the correct posture, and it is the same posture you should have toward the AI writing your endpoints today.

Tool sprawl is an incident generator, not an incident fix

This is the section I would tattoo on the wall of every security org that responds to a scare by signing another contract. The data is blunt: teams that suffered an incident in the past year ran more security tools (5.1 on average) than teams that did not (4.2). The relationship holds in the other direction too: teams running more tools reported more incidents, even after accounting for company size.

Hours per week spent triaging security tool alerts

1-2 tools: 4.1h
3-5 tools: 5.6h
5+ tools: 7.8h

Mean across all teams: 6.1 hours per engineer per week, just on triage.

The mechanism is not mysterious. Every tool you add is another alert stream, another dashboard, another set of false positives, and another integration that does not quite line up with the others. 93% of teams running separate application-security and cloud-security tools report integration headaches: duplicate alerts, inconsistent data, findings that do not connect across tools. And the incident rate follows:

Material incident rate: split vs integrated AppSec + CloudSec

Separate AppSec / CloudSec tools: 31%
Integrated into one platform: 20%

Teams that split application and cloud security were roughly 50% more likely to report an incident (31% vs 20%).
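For anyone who wants to check that comparison, here is the arithmetic as a short sketch. It uses only the two rounded rates from the chart; the function name is mine. On those rounded inputs the relative increase works out to about 55%, so the report's 50% may come from unrounded data.

```typescript
// Incident rates from the chart above, as fractions.
const splitRate = 0.31; // separate AppSec / CloudSec tools
const integratedRate = 0.2; // one integrated platform

// Relative increase of a over b: (a - b) / b.
function relativeIncrease(a: number, b: number): number {
  return (a - b) / b;
}

// Two ways to state the same gap.
const increase = relativeIncrease(splitRate, integratedRate); // about 0.55
const percentagePoints = (splitRate - integratedRate) * 100; // about 11
```

The distinction matters when you quote the number: 11 percentage points and "about half again as likely" describe the same gap.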

Remediation gets slower too. With a small stack, teams average a little over 3 days to fix a critical vulnerability. For teams juggling five or more vendor tools, that stretches to almost 8 days. Every extra tool adds alerts and integration overhead, and the path to “actually fixed” gets longer, not shorter.

The $20M tax nobody puts on a slide deck

The report does the math I usually avoid because it depresses me. Engineers spend around 6 hours a week triaging security alerts. Based on US Bureau of Labor Statistics salary data, that is roughly $20,000 per developer per year in lost productivity. And 72% of that time goes to false positives.

15% of engineering time lost to triaging alerts

72% of that time wasted on false positives

~$20M annual cost for a 1,000-developer org

If you want to feel it in your own terms instead of a press-release number, here is the same calculation as code. Swap in your blended rate and headcount:

// The report's $20M line, made concrete. Engineers spend ~6.1 hours
// a week triaging tool output, and 72% of that time goes to false
// positives. Plug in your own blended rate and team size.
const HOURS_PER_WEEK = 6.1;
const FALSE_POSITIVE_SHARE = 0.72;
const WORKING_WEEKS = 46;
const BLENDED_HOURLY_USD = 75; // illustrative

// All triage time, real findings and noise together.
function triageCostPerEngineer(): number {
  return Math.round(HOURS_PER_WEEK * WORKING_WEEKS * BLENDED_HOURLY_USD);
}

// Only the share of triage time spent on false positives.
function noiseCostPerEngineer(): number {
  const wastedHours = HOURS_PER_WEEK * FALSE_POSITIVE_SHARE * WORKING_WEEKS;
  return Math.round(wastedHours * BLENDED_HOURLY_USD);
}

// triageCostPerEngineer() comes to ~$21,000 a year, which at 1,000
// engineers is in line with the report's ~$20M figure.
// noiseCostPerEngineer() is ~$15,000 of that, spent reading alerts
// that were never real, and not one line of it shipped a feature.

For a 50-person engineering org that is about $1M a year. For 250, about $5M. The point is not the precision, it is the category: this is a real, recurring cost that scales with headcount, and most of it is spent reading alerts that were never real. That is the invoice tool sprawl quietly sends you every year.
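The headcount figures are just the report's rounded $20,000 per developer multiplied out. A small sketch of that scaling, plus the time-share figure, follows. The 40-hour week is my assumption for the share calculation, not a number from the report, and the function names are illustrative.

```typescript
// The report's rounded per-developer figure. Everything else is arithmetic.
const COST_PER_DEVELOPER_USD = 20000;
const TRIAGE_HOURS_PER_WEEK = 6.1;
const ASSUMED_WORK_WEEK_HOURS = 40; // assumption, not from the report

// Annual triage cost for an org of the given size.
function annualTriageCost(headcount: number): number {
  return headcount * COST_PER_DEVELOPER_USD;
}

// Fraction of the working week that goes to triage.
function triageShareOfWeek(): number {
  return TRIAGE_HOURS_PER_WEEK / ASSUMED_WORK_WEEK_HOURS;
}

// 50 -> $1M, 250 -> $5M, 1,000 -> $20M
const costs = [50, 250, 1000].map((headcount) => ({
  headcount,
  usd: annualTriageCost(headcount),
}));
```

With a 40-hour week the share comes out just over 15%, which matches the report's time-lost figure.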

False positives make good engineers do dumb things

Here is the human cost, which is worse than the dollar cost. 65% of respondents admit that false positives have pushed them into risky behavior: bypassing security checks, dismissing findings, or delaying fixes. In the US that climbs to 73%. I have been the engineer who clicked “dismiss” on a wall of yellow because I had a release to ship and the last forty alerts were noise. The forty-first might not have been. That is how this goes.

The tool-sprawl doom loop

01 Trigger: Incident or scare. Board asks 'what are we doing about AI?'
02 Reflex: Buy another tool. Now 5+ scanners, each its own dashboard.
03 Result: More alerts, more noise. 98% report false positives, ~4.8h/week each.
04 Human: Fatigue and bypass. 65% dismiss, delay, or skip checks.
05 Outcome: Real bug slips through. Incident rate rises, go to step 1.

The loop feeds itself: each incident justifies another tool, each tool adds noise, the noise trains people to ignore alerts, and the next real finding gets ignored with the rest.

The way out of this loop is not heroics or another vendor. It is cutting the noise at the source so the alerts that survive are worth reading. Hold that thought, because it is the whole point.

Europe prevents, the US reacts

One of the more interesting splits in the report is regional. European orgs report far fewer serious incidents than US peers (20% vs 43%), but more near misses (53% vs 40%). The reading the report offers, and I find it convincing, is that Europe is catching things earlier in the pipeline while the US is catching them in production.

Serious incidents vs near misses, by region

Serious incident (EU): 20%
Serious incident (US): 43%
Near miss (EU): 53%
Near miss (US): 40%

A near miss is a finding caught before it became serious. More near misses and fewer incidents is the shape of catching things early.

Several factors feed it: stronger regulatory pressure in Europe, US teams more likely to dismiss alerts or delay fixes (73% vs 61%), and heavier US reliance on AI-generated code (29% vs 21%). The US is further ahead on AI adoption and visibility, which is also why it reports more AI-related vulnerabilities. Being out in front means you see more of the problem. Europe just shifted more of the catching to the left, before code ships.

What actually moves the needle: automated gates and DevEx

Now the constructive part, because the report does not just catalog pain. It is unusually clear about what correlates with fewer incidents, and there are two findings that matter.

First, automated gates beat manual review. 56% of teams use automated gates (PR checks, CI/CD), 46% still lean on manual reviews, and 42% mainly rely on developers spotting issues themselves. The report is direct that human review does not scale to the volume and speed AI produces, and that automation is the stronger guardrail. Where automated gates are in place, teams ship faster with fewer missed issues.

Second, and this is the one I think most people miss, tools built for both developers and security teams have the lowest incident rates. Tools that serve only one audience leave the other side fighting the tooling instead of the threat.

Material incident rate by who the tools are built for

Tools for both devs + security: 22%
Tools built mainly for developers: 30%
Tools built mainly for security: 33%

Teams whose tools serve both sides also fix critical vulnerabilities within 24 hours far more often (59%) than developer-first setups (14%).

Read those two findings together and you get a design spec: automated, in-pipeline guardrails that developers and security people can both live with, and that produce signal instead of noise. That is not a product pitch. It is a way of building. It happens to be the way I build APIs now, so let me show you what it looks like in practice rather than wave my hands.

What this looks like in a backend you ship

Go back to that insecure AI-generated handler. The fix is not “review harder.” Review is the thing that does not scale. The fix is to make the contract enforceable so the unsafe version cannot exist. Here is the same endpoint where the route definition is the guardrail:

// Same feature, but the route IS the contract. The handler never
// runs unless the request matches, and the response physically
// cannot carry a field you did not declare. An AI can write this
// just as fast. The difference is the guardrail is structural,
// not a comment someone hopefully leaves in code review.
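import { z } from "zod";
import { App } from "@daloyjs/core";
// billing is your own service module (hypothetical path), not part of DaloyJS.
import { billing } from "./billing";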

export const app = new App();

app.route({
  method: "POST",
  path: "/users/:id/plan",
  operationId: "updateUserPlan",
  request: {
    params: z.object({ id: z.string().uuid() }).strict(),
    body: z
      .object({
        plan: z.enum(["free", "pro", "team"]),
        seatCount: z.number().int().min(1).max(500),
      })
      // Unknown keys (role, isAdmin, isVerified) are rejected with a
      // 400, not silently merged into the update.
      .strict(),
  },
  responses: {
    200: {
      description: "updated",
      // Only these three fields can ever leave the building.
      // passwordHash is not on the list, so it cannot leak even if
      // a junior dev adds it to the SELECT next quarter.
      schema: z
        .object({
          id: z.string().uuid(),
          plan: z.enum(["free", "pro", "team"]),
          seatCount: z.number().int(),
        })
        .strict(),
    },
  },
  handler: async ({ params, body }) => billing.setPlan(params.id, body),
});

Notice what changed. The validation is not a separate step a busy developer might skip, it is part of the route. Unknown keys are rejected, so mass assignment is gone. The response schema means passwordHash cannot leak, even by accident, even next quarter. An AI assistant can generate this just as quickly as the unsafe version. The difference is that the safe shape is the default shape, so the AI's confident-but-wrong instinct has nowhere to land. That is what “secure by default” actually means: not a checklist you remember, a baseline you cannot forget.
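If you are not on a contract-first framework, the same idea can be sketched in plain TypeScript. This is a dependency-free illustration of strict request parsing and declared-fields-only responses, not how zod or DaloyJS implement it. `parsePlanUpdate`, `toPlanResponse`, and the types are hypothetical names.

```typescript
type Plan = "free" | "pro" | "team";
type PlanUpdate = { plan: Plan; seatCount: number };
type PlanResponse = { id: string; plan: Plan; seatCount: number };
type ParseResult =
  | { ok: true; value: PlanUpdate }
  | { ok: false; errors: string[] };

const PLANS: readonly string[] = ["free", "pro", "team"];
const ALLOWED_KEYS = new Set(["plan", "seatCount"]);

// Strict request parsing: every key must be declared, and every
// declared key must pass its own check.
function parsePlanUpdate(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const body = input as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of Object.keys(body)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unknown key: ${key}`);
  }
  const { plan, seatCount } = body;
  if (typeof plan !== "string" || PLANS.indexOf(plan) === -1) {
    errors.push("plan must be one of free, pro, team");
  }
  if (
    typeof seatCount !== "number" ||
    !Number.isInteger(seatCount) ||
    seatCount < 1 ||
    seatCount > 500
  ) {
    errors.push("seatCount must be an integer from 1 to 500");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { plan: plan as Plan, seatCount: seatCount as number },
  };
}

// Response side: copy only the declared fields, so extra columns on
// the row never reach the wire.
function toPlanResponse(row: PlanResponse): PlanResponse {
  return { id: row.id, plan: row.plan, seatCount: row.seatCount };
}

const accepted = parsePlanUpdate({ plan: "pro", seatCount: 3 });
const smuggled = parsePlanUpdate({ plan: "pro", seatCount: 3, role: "admin" });
```

Here `accepted` parses cleanly, while `smuggled` fails with an "unknown key: role" error before any write happens. Note that `toPlanResponse` strips undeclared fields rather than rejecting them, which is a slightly different choice from a strict response schema. Either way, passwordHash has no path out.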

On the false-positive problem, the report's lesson is to favor automated gates that produce clear signal over scanners that produce triage queues. The supply-chain side of a DaloyJS repo is built that way on purpose:

# The report's clearest signal: automated gates in CI beat manual
# review, and tools that serve devs AND security beat single-audience
# tools. A DaloyJS repo leans into both. These run in CI and in the
# framework's own publish pipeline, and they fail closed.

pnpm verify:no-lifecycle-scripts   # no transitive postinstall hooks
pnpm verify:known-dep-names        # no slopsquatted / hallucinated names
pnpm verify:no-runtime-deps        # zero runtime deps to audit
pnpm verify:lockfile               # registry-only sources, pinned

# Each is a binary pass/fail with a named offender. There is no
# "severity: medium, 14 findings, please triage" queue to ignore.

The binary output is the point. A gate tells you exactly what to fix and refuses to let it through; a scanner gives you homework. The report is full of teams drowning in homework.
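To show what "binary pass/fail with a named offender" can look like, here is a minimal sketch of a gate in the spirit of verify:no-runtime-deps. It is not DaloyJS's actual implementation. `checkNoRuntimeDeps`, `exitCodeFor`, and the types are hypothetical, and the manifest is passed in as an object so the sketch stays self-contained.

```typescript
// Minimal shape of the package.json fields this gate reads.
type PackageManifest = {
  name: string;
  dependencies?: Record<string, string>;
};

type GateResult = { pass: true } | { pass: false; offenders: string[] };

// Passes only when the manifest declares no runtime dependencies.
// Otherwise it names each offender, sorted for stable output.
function checkNoRuntimeDeps(manifest: PackageManifest): GateResult {
  const offenders = Object.keys(manifest.dependencies ?? {}).sort();
  return offenders.length === 0 ? { pass: true } : { pass: false, offenders };
}

// Turn the result into a CI exit code: 0 lets the build through,
// 1 blocks it. There is no severity score and no "maybe".
function exitCodeFor(result: GateResult): 0 | 1 {
  return result.pass ? 0 : 1;
}
```

In a real script you would read and parse package.json, print the offenders, and exit with `exitCodeFor(result)`. The output is either nothing or a short list of names, so there is no queue to triage.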

And on the “tools for both devs and security” point: the same contract that gives a developer typed handlers and a generated client is the artifact a security reviewer reads to see exactly what every endpoint accepts and returns. One source of truth, two audiences, no second tool. That is the cheap version of the integration the report says reduces incidents.

If you want the whole thing on one screen, here is the report's pain mapped to what an app on @daloyjs/core does about it out of the box, no extra config and no extra tool:

What the report found	What DaloyJS does by default
AI code skips input validation and mass-assigns fields	The request schema runs before the handler. .strict() rejects unknown keys with a 400, so role or isAdmin cannot be smuggled into a write.
AI code returns whole records and leaks fields	Response schemas are validated on the way out. A field you did not declare (like passwordHash) physically cannot leave the endpoint.
Tool sprawl: separate dashboards for devs and security	One OpenAPI 3.1 contract is the single source for both audiences, and the typed client is generated from it. Nothing to reconcile across tools.
False-positive fatigue (65% bypass or dismiss checks)	The verify:* gates are binary pass/fail with a named offender. No 'severity: medium, 14 findings' queue to learn to ignore.
Manual review does not scale to AI speed and volume	Gates run in CI and fail closed. Body limits, secureHeaders, and prod-mode error redaction hold without a reviewer remembering them.
Slow remediation with large dependency stacks	@daloyjs/core ships zero runtime dependencies, so there is far less to audit, patch, or chase a CVE through.

None of that is exotic, and none of it is a paid add-on. It is the default shape of a route, which is the whole idea: the report keeps showing that the safe path loses whenever it is the extra step. So DaloyJS makes the safe path the only path that is also the easy one.

So what do I actually do on Monday?

You do not need to adopt anything I work on to act on this report. The lessons are structural. If I had to compress 52 slides into four things worth doing this week:

1. Treat AI-generated code as untrusted input. Put an enforced contract in front of every side effect, validate the request before the handler runs, and validate the response on the way out. The model is a fast, confident junior. Supervise it with code, not with hope.
2. Stop equating more tools with more security. The data says the opposite. Before you buy the next scanner, ask whether it produces gates or homework, and whether it adds a dashboard your team will learn to ignore.
3. Move your checks into the pipeline. Automated gates in CI beat manual review at AI speed and volume. A check that fails the build is worth ten that file a ticket.
4. Measure the false-positive tax. Run the cost snippet above with your real numbers. Once “noise” has a dollar figure, consolidating tools stops being a nice-to-have.

The honest caveat, since I promised this would not be a pitch: no framework, mine included, fixes most of this. Secure design, the architecture decisions, the business logic, and the judgment calls still need humans. That is literally what the report's 79% believe, and they are right. What a framework can do is make the safe path the default path so your scarce human attention goes to the hard problems instead of the ones a schema should have caught. That is a smaller claim than “we fix AI security,” and it is also the exact category the report shows hurting teams the most.

Want to try the secure-by-default approach?

DaloyJS is free, MIT-licensed, and zero-runtime-dependency. One command scaffolds a typed, validated, OpenAPI-documented API with the guardrails in this post already switched on:

pnpm create daloy@latest my-api

Start with the secure-by-default guide or the docs. It is genuinely new, so kick the tires and tell me where it breaks. I would rather hear it from you than read about it in next year's report.

AI is going to keep writing more of our code. The report is clear-eyed that this is not slowing down. The teams that do well will not be the ones with the most tools or the most faith in the model. They will be the ones who made the secure thing the easy thing, automated the boring checks, and protected their engineers' attention like the expensive resource it is. That is not a 2026 trend. That is just good engineering, and it is nice to finally have 450 teams' worth of data saying so.

Data throughout this post is from Aikido and Sapio Research's State of AI in Security & Development 2026 (survey of 450 developers, security leaders, and AppSec engineers across Europe and the US). The charts are my rendering of the report's figures.