The State of AI in Security 2026: 450 Teams, One Uncomfortable Pattern, and What Your Backend Can Do About It
Aikido and Sapio Research surveyed 450 developers, CISOs, and AppSec engineers across Europe and the US. The headline: AI now writes a quarter of production code, 1 in 5 teams had a serious incident because of it, and the usual reflex (buy more tools) makes things measurably worse. Here is the data, with charts, and the structural lesson it points to for anyone shipping an API.
Someone dropped Aikido's State of AI in Security & Development 2026 report in our team chat with the comment “this is just us, but with numbers.” I read all 52 slides on a train ride, and yeah, it was uncomfortable in the way good data usually is.
The setup: Sapio Research surveyed 450 full-time professionals across Europe and the US, split evenly into 150 developers, 150 security leaders (CISOs or equivalent), and 150 application security engineers. So this is not a vendor asking its own happy customers whether they are happy. It is a cross-section of the people writing code, the people reviewing it, and the people who get the 2 a.m. phone call when it goes wrong.
I have shipped backends for about twelve years now, and I have lived most of this report. The part that stuck with me is not any single scary number. It is the loop: AI writes more of our code, incidents go up, and the instinctive fix (buy another scanner) makes the problem measurably worse. Let me walk through the data, and then the structural lesson it points to. The lesson is not “use our framework,” it is something more boring and more durable. But since one concrete way to apply that lesson is the thing I work on, let me be upfront about it.
New here, and full disclosure: I help build DaloyJS, so weigh the framework parts accordingly. If you have not run into it yet: DaloyJS is a new, contract-first TypeScript backend framework. You define a route once and get request and response validation, live OpenAPI docs, a fully typed client, and security guardrails that are on by default, with zero runtime dependencies, running on Node, Bun, Deno, Cloudflare, and Vercel. That is the lens I read this report through. I will keep the data and the framework clearly separated, so you can take the numbers and ignore me if you like.
AI is writing a quarter of your code, and it shows
Start with adoption. AI coding tools now write 24% of production code on average, 21% in Europe and 29% in the US. That is not autocomplete anymore. That is a junior developer who never sleeps, never asks questions, and confidently commits whatever pattern was most common in its training data.
Nearly 70% of organizations say they have uncovered flaws tied to AI-generated code. For one in five, it escalated into a serious breach. Here is the full breakdown of how teams answered “have you ever identified a vulnerability introduced by AI-generated code?”
I want to be fair to the AI here. The code it writes usually works. That is exactly the trap. The vulnerability is rarely a syntax error a linter catches. It is the missing authorization check, the mass-assignment, the response that returns the whole row. Here is a real shape of it, the kind of thing I have genuinely had to send back:
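A minimal, framework-free sketch of that shape follows. The in-memory `db`, the `updateUserInsecure` handler, and the field names are hypothetical stand-ins for illustration, not code from the report or from any real codebase:

```typescript
// Hypothetical stand-ins: an in-memory "database" and a plain function
// playing the role of a profile-update handler. No real library is used.
interface UserRow {
  id: string;
  name: string;
  email: string;
  role: "user" | "admin";
  isVerified: boolean;
  passwordHash: string;
}

const db = new Map<string, UserRow>([
  ["u1", {
    id: "u1", name: "Ada", email: "ada@example.com",
    role: "user", isVerified: false, passwordHash: "$2b$10$not-a-real-hash",
  }],
]);

// The "works in the demo" version of the endpoint.
function updateUserInsecure(id: string, body: unknown): UserRow | undefined {
  const existing = db.get(id);
  if (!existing) return undefined;

  // Bug 1: no input validation, the body is trusted as-is.
  // Bug 2: mass assignment, every key in the body lands on the row.
  const updated = { ...existing, ...(body as Partial<UserRow>) };
  db.set(id, updated);

  // Bug 3: the whole row goes back to the caller, passwordHash included.
  return updated;
}

// A caller who only meant to change a name can also promote themselves.
const leaked = updateUserInsecure("u1", {
  name: "Ada L.",
  role: "admin",
  isVerified: true,
});
console.log(leaked);
```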
Three bugs, zero of them visible in a quick scroll. No input validation, so the request body becomes the database write. Mass assignment, so role and isVerified are fair game. And the response serializes the entire user record, passwordHash included. The demo works. The tests, if there are any, probably pass. This is what “the code runs” buys you now.
The optimism gap is doing a lot of heavy lifting
Here is where the report gets a little funny. Despite all of the above, 96% of organizations believe AI will one day write near-perfect secure code. The average timeline they give is 5.1 years. I admire the confidence. I do not share it, and neither, on closer reading, do they.
Only 21% think AI will get there without a human in the loop. The other 79%, as one of the report's quoted CISOs puts it, are “the smart ones.” Almost a third expect AI to reduce bugs but still need people for secure design, architecture, and the business logic that no model understands because it lives in your head and a Slack thread from 2024.
Meanwhile 90% expect AI to take over penetration testing within about 5.5 years, and 97% would at least consider an agentic AI pentest tool. But they want proof: 60% want side-by-side results against a manual pentest before they trust it. Translation: everyone believes in the future and nobody is willing to bet production on it yet. That is the correct posture, and it is the same posture you should have toward the AI writing your endpoints today.
Tool sprawl is an incident generator, not an incident fix
This is the section I would tattoo on the wall of every security org that responds to a scare by signing another contract. The data is blunt: teams that suffered an incident in the past year ran more security tools (5.1 on average) than teams that did not (4.2). And it runs in both directions: more tools also correlated with more incidents, even after accounting for company size.
The mechanism is not mysterious. Every tool you add is another alert stream, another dashboard, another set of false positives, and another integration that does not quite line up with the others. 93% of teams running separate application-security and cloud-security tools report integration headaches: duplicate alerts, inconsistent data, findings that do not connect across tools. And the incident rate follows:
Remediation gets slower too. With a small stack, teams average a little over 3 days to fix a critical vulnerability. For teams juggling five or more vendor tools, that stretches to almost 8 days. Every extra tool adds alerts and integration overhead, and the path to “actually fixed” gets longer, not shorter.
The $20M tax nobody puts on a slide deck
The report does the math I usually avoid because it depresses me. Engineers spend around 6 hours a week triaging security alerts. Based on US Bureau of Labor Statistics salary data, that is roughly $20,000 per developer per year in lost productivity. And 72% of that time goes to false positives.
If you want to feel it in your own terms instead of a press-release number, here is the same calculation as code. Swap in your blended rate and headcount:
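A minimal sketch of that calculation, under stated assumptions: the 6 hours per week and the 72% false-positive share are the report's figures quoted above, while the $64 hourly rate and 52 working weeks are placeholders back-solved to land near the report's roughly $20,000 per developer. The `triageCost` function and its field names are illustrative.

```typescript
// Inputs for the triage-cost estimate. Only hoursPerWeek and
// falsePositiveShare are figures quoted from the report; the rest are
// assumptions you should replace with your own numbers.
interface TriageInputs {
  headcount: number;
  hoursPerWeek: number;       // report: ~6 hours/week spent on alert triage
  falsePositiveShare: number; // report: 72% of that time is false positives
  hourlyRate: number;         // assumption: blended USD cost per hour
  weeksPerYear: number;       // assumption: working weeks per year
}

interface TriageCost {
  perDeveloper: number;   // USD per developer per year
  total: number;          // USD per year for the whole org
  falsePositives: number; // portion of total spent on false positives
}

function triageCost(i: TriageInputs): TriageCost {
  const perDeveloper = i.hoursPerWeek * i.weeksPerYear * i.hourlyRate;
  const total = perDeveloper * i.headcount;
  return {
    perDeveloper: Math.round(perDeveloper),
    total: Math.round(total),
    falsePositives: Math.round(total * i.falsePositiveShare),
  };
}

// Placeholder baseline: 6 h x 52 weeks x $64 = $19,968 per developer.
const baseline = {
  hoursPerWeek: 6,
  falsePositiveShare: 0.72,
  hourlyRate: 64,
  weeksPerYear: 52,
};

for (const headcount of [50, 250]) {
  console.log(headcount, triageCost({ ...baseline, headcount }));
}
```

With those placeholder inputs the total scales linearly with headcount, which is the property that matters more than the exact rate.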
For a 50-person engineering org that is about $1M a year. For 250, about $5M. The point is not the precision, it is the category: this is a real, recurring cost that scales with headcount, and most of it is spent reading alerts that were never real. That is the invoice tool sprawl quietly sends you every year.
False positives make good engineers do dumb things
Here is the human cost, which is worse than the dollar cost. 65% of respondents admit that false positives have pushed them into risky behavior: bypassing security checks, dismissing findings, or delaying fixes. In the US that climbs to 73%. I have been the engineer who clicked “dismiss” on a wall of yellow because I had a release to ship and the last forty alerts were noise. The forty-first might not have been. That is how this goes.
- 01 Trigger: Incident or scare (board asks “what are we doing about AI?”)
- 02 Reflex: Buy another tool (now 5+ scanners, each its own dashboard)
- 03 Result: More alerts, more noise (98% report false positives, ~4.8h/week each)
- 04 Human: Fatigue and bypass (65% dismiss, delay, or skip checks)
- 05 Outcome: Real bug slips through (incident rate rises, go to step 1)
The way out of this loop is not heroics or another vendor. It is cutting the noise at the source so the alerts that survive are worth reading. Hold that thought, because it is the whole point.
Europe prevents, the US reacts
One of the more interesting splits in the report is regional. European orgs report far fewer serious incidents than US peers (20% vs 43%), but more near misses (53% vs 40%). The reading the report offers, and I find it convincing, is that Europe is catching things earlier in the pipeline while the US is catching them in production.
Several factors feed it: stronger regulatory pressure in Europe, US teams more likely to dismiss alerts or delay fixes (73% vs 61%), and heavier US reliance on AI-generated code (29% vs 21%). The US is further ahead on AI adoption and visibility, which is also why it reports more AI-related vulnerabilities. Being out in front means you see more of the problem. Europe just shifted more of the catching to the left, before code ships.
What actually moves the needle: automated gates and DevEx
Now the constructive part, because the report does not just catalog pain. It is unusually clear about what correlates with fewer incidents, and there are two findings that matter.
First, automated gates beat manual review. 56% of teams use automated gates (PR checks, CI/CD), 46% still lean on manual reviews, and 42% mainly rely on developers spotting issues themselves. The report is direct that human review does not scale to the volume and speed AI produces, and that automation is the stronger guardrail. Where automated gates are in place, teams ship faster with fewer missed issues.
Second, and this is the one I think most people miss, tools built for both developers and security teams have the lowest incident rates. Tools that serve only one audience leave the other side fighting the tooling instead of the threat.
Read those two findings together and you get a design spec: automated, in-pipeline guardrails that developers and security people can both live with, and that produce signal instead of noise. That is not a product pitch. It is a way of building. It happens to be the way I build APIs now, so let me show you what it looks like in practice rather than wave my hands.
What this looks like in a backend you ship
Go back to that insecure AI-generated handler. The fix is not “review harder.” Review is the thing that does not scale. The fix is to make the contract enforceable so the unsafe version cannot exist. Here is the same endpoint where the route definition is the guardrail:
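To be clear about what follows: this is not the DaloyJS API. It is a dependency-free sketch of the same idea, with a strict request contract in front of the write and an explicit response shape on the way out. Every name in it (`parseUpdateBody`, `toPublicUser`, `updateUser`, `RouteResult`) is hypothetical.

```typescript
// NOT the DaloyJS API: a hand-rolled sketch of a contract-enforced endpoint.
interface UserRow {
  id: string;
  name: string;
  email: string;
  role: "user" | "admin";
  isVerified: boolean;
  passwordHash: string;
}

type UpdateBody = { name?: string; email?: string };     // request contract
type PublicUser = Pick<UserRow, "id" | "name" | "email">; // response contract
type Parsed = { ok: true; value: UpdateBody } | { ok: false; error: string };
type RouteResult =
  | { status: 200; body: PublicUser }
  | { status: 400 | 404; body: { error: string } };

const db = new Map<string, UserRow>([
  ["u1", {
    id: "u1", name: "Ada", email: "ada@example.com",
    role: "user", isVerified: false, passwordHash: "$2b$10$not-a-real-hash",
  }],
]);

// Request side: only declared string fields are accepted; any unknown key
// (role, isVerified, ...) rejects the whole body.
function parseUpdateBody(input: unknown): Parsed {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const value: UpdateBody = {};
  for (const [key, field] of Object.entries(input)) {
    if (key !== "name" && key !== "email") {
      return { ok: false, error: `unknown key: ${key}` };
    }
    if (typeof field !== "string" || field.length === 0) {
      return { ok: false, error: `invalid ${key}` };
    }
    value[key] = field;
  }
  return { ok: true, value };
}

// Response side: copy only the declared fields, so an undeclared field such
// as passwordHash never reaches the response body.
function toPublicUser(row: UserRow): PublicUser {
  return { id: row.id, name: row.name, email: row.email };
}

function updateUser(id: string, rawBody: unknown): RouteResult {
  const parsed = parseUpdateBody(rawBody);
  if (!parsed.ok) return { status: 400, body: { error: parsed.error } };

  const existing = db.get(id);
  if (!existing) return { status: 404, body: { error: "not found" } };

  const updated: UserRow = { ...existing, ...parsed.value };
  db.set(id, updated);
  return { status: 200, body: toPublicUser(updated) };
}

// The privilege-escalation attempt is rejected before any write happens.
console.log(updateUser("u1", { name: "Ada L.", role: "admin" }));
// A body that matches the contract succeeds and returns only public fields.
console.log(updateUser("u1", { name: "Ada L." }));
```

A contract-first framework would derive both halves from one schema declaration instead of two hand-written functions; the sketch only shows why the unsafe version has nowhere to live once the contract sits in front of the handler.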
Notice what changed. The validation is not a separate step a busy developer might skip, it is part of the route. Unknown keys are rejected, so mass assignment is gone. The response schema means passwordHash cannot leak, even by accident, even next quarter. An AI assistant can generate this just as quickly as the unsafe version. The difference is that the safe shape is the default shape, so the AI's confident-but-wrong instinct has nowhere to land. That is what “secure by default” actually means: not a checklist you remember, a baseline you cannot forget.
On the false-positive problem, the report's lesson is to favor automated gates that produce clear signal over scanners that produce triage queues. The supply-chain side of a DaloyJS repo is built that way on purpose:
Each of these is a binary pass or fail with a named offender. There is no “14 findings, severity medium, please assess” queue that trains you to click dismiss. That is the difference between a gate and a scanner: a gate tells you exactly what to fix and refuses to let it through, a scanner gives you homework. The report is full of teams drowning in homework.
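The gate-versus-scanner distinction can be sketched in a few lines. The two gates below are hypothetical examples written for this illustration, not DaloyJS's actual verify:* scripts, and the manifest is made up. Each gate returns pass or fail plus the exact offenders, with no severity score to triage.

```typescript
// Hypothetical gates for illustration; names and rules are assumptions.
interface PackageManifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface GateResult {
  gate: string;
  pass: boolean;
  offenders: string[]; // exact items to fix; empty when the gate passes
}

// Gate 1: the package must ship no runtime dependencies.
function verifyNoRuntimeDeps(pkg: PackageManifest): GateResult {
  const offenders = Object.keys(pkg.dependencies ?? {});
  return { gate: "no-runtime-deps", pass: offenders.length === 0, offenders };
}

// Gate 2: every dev dependency must be pinned to an exact x.y.z version.
function verifyPinnedDevVersions(pkg: PackageManifest): GateResult {
  const exact = /^\d+\.\d+\.\d+$/;
  const offenders = Object.entries(pkg.devDependencies ?? {})
    .filter(([, range]) => !exact.test(range))
    .map(([dep, range]) => `${dep}@${range}`);
  return { gate: "pinned-dev-versions", pass: offenders.length === 0, offenders };
}

function runGates(pkg: PackageManifest): GateResult[] {
  return [verifyNoRuntimeDeps(pkg), verifyPinnedDevVersions(pkg)];
}

// A made-up manifest that violates both gates.
const results = runGates({
  name: "example-api",
  dependencies: { "left-pad": "1.3.0" },
  devDependencies: { typescript: "5.4.5", vitest: "^1.6.0" },
});

for (const r of results) {
  console.log(r.pass ? `PASS ${r.gate}` : `FAIL ${r.gate}: ${r.offenders.join(", ")}`);
}

// In CI, this flag would become a non-zero exit code so the build fails closed.
const failed = results.some((r) => !r.pass);
console.log(failed ? "build blocked" : "build allowed");
```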
And on the “tools for both devs and security” point: the same contract that gives a developer typed handlers and a generated client is the artifact a security reviewer reads to see exactly what every endpoint accepts and returns. One source of truth, two audiences, no second tool. That is the cheap version of the integration the report says reduces incidents.
If you want the whole thing on one screen, here is the report's pain mapped to what an app on @daloyjs/core does about it out of the box, no extra config and no extra tool:
| What the report found | What DaloyJS does by default |
|---|---|
| AI code skips input validation and mass-assigns fields | The request schema runs before the handler. .strict() rejects unknown keys with a 400, so role or isAdmin cannot be smuggled into a write. |
| AI code returns whole records and leaks fields | Response schemas are validated on the way out. A field you did not declare (like passwordHash) physically cannot leave the endpoint. |
| Tool sprawl: separate dashboards for devs and security | One OpenAPI 3.1 contract is the single source for both audiences, and the typed client is generated from it. Nothing to reconcile across tools. |
| False-positive fatigue (65% bypass or dismiss checks) | The verify:* gates are binary pass/fail with a named offender. No 'severity: medium, 14 findings' queue to learn to ignore. |
| Manual review does not scale to AI speed and volume | Gates run in CI and fail closed. Body limits, secureHeaders, and prod-mode error redaction hold without a reviewer remembering them. |
| Slow remediation with large dependency stacks | @daloyjs/core ships zero runtime dependencies, so there is far less to audit, patch, or chase a CVE through. |
None of that is exotic, and none of it is a paid add-on. It is the default shape of a route, which is the whole idea: the report keeps showing that the safe path loses whenever it is the extra step. So DaloyJS makes the safe path the only path that is also the easy one.
So what do I actually do on Monday?
You do not need to adopt anything I work on to act on this report. The lessons are structural. If I had to compress 52 slides into four things worth doing this week:
- Treat AI-generated code as untrusted input. Put an enforced contract in front of every side effect, validate the request before the handler runs, and validate the response on the way out. The model is a fast, confident junior. Supervise it with code, not with hope.
- Stop equating more tools with more security. The data says the opposite. Before you buy the next scanner, ask whether it produces gates or homework, and whether it adds a dashboard your team will learn to ignore.
- Move your checks into the pipeline. Automated gates in CI beat manual review at AI speed and volume. A check that fails the build is worth ten that file a ticket.
- Measure the false-positive tax. Run the cost snippet above with your real numbers. Once “noise” has a dollar figure, consolidating tools stops being a nice-to-have.
The honest caveat, since I promised this would not be a pitch: no framework, mine included, fixes most of this. Secure design, the architecture decisions, the business logic, and the judgment calls still need humans. That is literally what the report's 79% believe, and they are right. What a framework can do is make the safe path the default path so your scarce human attention goes to the hard problems instead of the ones a schema should have caught. That is a smaller claim than “we fix AI security,” and it is also the exact category the report shows hurting teams the most.
Want to try the secure-by-default approach?
DaloyJS is free, MIT-licensed, and zero-runtime-dependency. One command scaffolds a typed, validated, OpenAPI-documented API with the guardrails in this post already switched on:
Start with the secure-by-default guide or the docs. It is genuinely new, so kick the tires and tell me where it breaks. I would rather hear it from you than read about it in next year's report.
AI is going to keep writing more of our code. The report is clear-eyed that this is not slowing down. The teams that do well will not be the ones with the most tools or the most faith in the model. They will be the ones who made the secure thing the easy thing, automated the boring checks, and protected their engineers' attention like the expensive resource it is. That is not a 2026 trend. That is just good engineering, and it is nice to finally have 450 teams' worth of data saying so.
Data throughout this post is from Aikido and Sapio Research's State of AI in Security & Development 2026 (survey of 450 developers, security leaders, and AppSec engineers across Europe and the US). The charts are my rendering of the report's figures.