AI is changing the economics of vulnerability discovery. Defenders should adapt now

By CERT-EU

Recent announcements by major AI laboratories, combined with observable shifts in attacker behaviour, mark an inflection point in the economics of vulnerability discovery. CERT-EU is publishing this post to share how we read these developments, what they mean for Union entities and the wider community, and how we are adapting our own services in response. It complements our Threat Landscape Report 2025, which documented the continued dominance of vulnerability exploitation as an initial access vector against Union entities.

Key points

  • AI-powered tools are now discovering and exploiting software vulnerabilities at a pace that fundamentally breaks the traditional patch cycle. According to Google's M-Trends 2026, the mean time to exploit newly disclosed vulnerabilities has dropped to an estimated negative seven days — exploitation typically occurs before a patch exists.
  • The most advanced models, such as Anthropic's Claude Mythos Preview, represent a generational jump in autonomous exploit development. Open-weight models are narrowing the capability gap to the frontier, meaning these capabilities will progressively become accessible to a wider range of actors, including malicious ones.
  • Exploitation of vulnerabilities in internet-facing software remains, for the second consecutive year, the highest-impact initial access vector against Union entities, as documented in our Threat Landscape Report 2025.
  • Defenders who adopt AI-augmented workflows systematically — in code they develop, in commercial software they deploy, and in their detection and response capabilities — can gain real ground. The race is about speed of adoption.
  • CERT-EU has built and tested an internal AI-powered penetration testing pipeline and is integrating AI-assisted analysis across its services to support Union entities under Regulation (EU, Euratom) 2023/2841.
  • We recommend building on the defensive fundamentals — defence-in-depth, sound security architecture, and early detection — with eight concrete actions: reduce attack surface, maintain rigorous hygiene, adopt AI-powered security testing responsibly, strengthen detection and response, accelerate zero-trust adoption, build cross-functional teams, align with emerging frameworks, and iterate continuously.

A new tempo in vulnerability discovery

In April 2026, Anthropic disclosed that Claude Mythos Preview — a cybersecurity-focused model it chose not to release publicly — had autonomously discovered thousands of high- and critical-severity vulnerabilities, including previously unknown zero-days in code dating back decades. Rather than a public release, Anthropic distributed the model through Project Glasswing, a controlled programme providing access to twelve launch partners and over 40 additional organisations maintaining critical infrastructure, exclusively for defensive security work.

The same month, HackerOne — the world's largest bug bounty platform — suspended new submissions to its Internet Bug Bounty programme after AI-generated vulnerability reports, a mix of genuine discoveries and low-quality submissions, overwhelmed both triage capacity and remediation resources across the open-source ecosystem. The cURL project had already shut down its own bug bounty programme in January 2026 for similar reasons.

This shift has been building for months. According to Google's M-Trends 2026 report, the mean time to exploit newly disclosed vulnerabilities has dropped to an estimated negative seven days: exploitation is, on average, now occurring before a patch is even released. In 2018, that window was 63 days. The traditional cycle of discover, disclose, patch, deploy was designed for a slower adversary. That adversary no longer exists.

None of this means that defenders are powerless. The same AI capabilities that strengthen attackers can also strengthen defences, and defenders who adopt them systematically gain real ground. For code they control — in-house applications and open-source projects they maintain or contribute to — they can integrate AI-powered analysis into development pipelines and software development lifecycles, closing flaws before they ship. For commercial software, they can use the same tools to identify vulnerabilities, report them to vendors through coordinated disclosure, and develop compensating controls and detections while awaiting patches. This is arguably the most significant opportunity defenders have had in many years, but only if they move quickly enough to seize it.

For organisations responsible for critical infrastructure — including our constituents, the Union entities (EU institutions, bodies, offices and agencies) — this acceleration demands a clear-eyed assessment of what has changed and a willingness to adapt how we approach the security of our internet-facing and third-party-exposed attack surface.

What AI can do today

The capability leap

Just as AI models have transformed software engineering — writing, debugging, and shipping code at superhuman speed — the most powerful systems are now demonstrating the same leap in security research, matching and in some cases exceeding human researchers in vulnerability discovery and exploitation.

When Anthropic benchmarked Claude Mythos Preview against vulnerabilities in Firefox 147's JavaScript engine, it developed working shell exploits 181 times out of several hundred attempts. Its predecessor, Claude Opus 4.6, succeeded twice on the same test. That is a generational jump in autonomous exploit development. OpenAI introduced Aardvark, an autonomous security agent powered by GPT-5 that achieved 92% recall on benchmark repositories seeded with known vulnerabilities, which has since evolved into Codex Security. OpenAI classified its GPT-5.3-Codex model as "High Cybersecurity Capability" under its Preparedness Framework and has since released GPT-5.4-Cyber, a variant of GPT-5.4 fine-tuned for defensive cybersecurity use cases — including capabilities such as binary reverse engineering — distributed through its expanded Trusted Access for Cyber programme to verified defenders and teams responsible for securing critical software. Google DeepMind launched CodeMender, an agent that leverages deep-reasoning models to autonomously identify and fix complex vulnerabilities, having already contributed 72 security fixes to open-source projects.

The results extend well beyond lab benchmarks. AISLE's autonomous cyber reasoning system discovered 12 of 12 CVEs in the January 2026 OpenSSL coordinated release, plus historical vulnerabilities dating back years, in one of the most heavily audited codebases in existence. In August 2025, DARPA's AI Cyber Challenge (AIxCC) demonstrated that seven finalist teams could discover 54 vulnerabilities across 54 million lines of code in just four hours, at an average cost of approximately $152 per task, with 18 real vulnerabilities responsibly disclosed.

These capabilities are already reaching the market. XBOW, an autonomous penetration testing platform, reached the number one position on HackerOne's US leaderboard in the first half of 2025, submitting nearly 1,060 vulnerability reports with 130 confirmed and resolved by programme owners. Aikido Security's AI-powered testing discovered a high-severity cache deception vulnerability affecting SvelteKit applications (a widely used web framework) deployed on Vercel (a major application hosting platform) with default configurations.

The trajectory is steep. CVE-Bench, a benchmark built from real-world critical vulnerabilities (ICML 2025 spotlight), initially measured the best autonomous agents at a 13% end-to-end exploitation rate. Less than a year later, OpenAI reported in its GPT-5.3-Codex system card that its latest model reached 90% on the same benchmark. Even at lower success rates, the economics are decisive: an AI agent that fails most of the time but tries thousands of attack vectors per hour will find more vulnerabilities than a human expert working manually. And the tools are automating the discovery of exactly the kind of vulnerabilities that make up the vast majority of real-world breaches: injection flaws, misconfigurations, authentication weaknesses, and known-but-unpatched issues.

What the numbers do not show

The discovery and exploitation numbers above are real, but they do not tell the whole story. What makes the latest generation of models particularly dangerous is not just the volume of vulnerabilities they find. It is their ability to chain findings across multiple steps, reason about application logic, and produce exploitation paths that previously required deep specialist knowledge. This is what turns a list of individual flaws into a working attack.

For defenders, using these tools responsibly adds friction that attackers do not face. Models can hallucinate vulnerabilities, mischaracterise severity, or propose patches that introduce new issues. Responsible deployment means validating findings, reviewing proposed fixes, and testing patches before they reach production, a workflow that remains labour-intensive even when discovery itself is fast. Running AI-powered analysis at scale also has operational costs. However, those costs are falling rapidly — each new generation of models has reduced the cost of equivalent analysis by roughly an order of magnitude — which will make systematic scanning more accessible to defenders, and to attackers alike.

This creates an uncomfortable asymmetry within the asymmetry: attackers need only one working exploit and face no quality-control burden, while defenders must triage every finding and remediate each one correctly. The tools are transformative on both sides, but the operational overhead of using them well, rather than merely using them, falls disproportionately on defence. This is precisely why investing in AI-augmented defensive workflows now, before the gap widens further, is so critical.

The access shift

But perhaps the most consequential development is not what the best AI systems can do, but who can now access these capabilities. Tasks that previously required years of specialised expertise — discovering complex attack vectors, crafting exploit chains — can now be partially automated by anyone with access to a frontier model.

FunkSec, a cybercriminal group with apparently limited technical proficiency, briefly became the most prolific ransomware actor worldwide, with much of its attack tooling AI-generated according to analysis by Check Point Research. DDoS-for-hire platforms are integrating large language models to lower the skill barrier for orchestrating sophisticated campaigns.

The pace of diffusion reinforces this. Open-weight models have historically lagged frontier closed-source systems, but that lag has narrowed significantly in recent years, and on coding and agentic tasks — the most direct proxy for vulnerability discovery — the gap has compressed further still. Open-weight releases such as Kimi K2.5 are already benchmarked on par with frontier closed models on cybersecurity tasks, and MITRE's OCCULT framework has shown open-weight models scoring over 90% on offensive cyber knowledge evaluations. The deepest exploitation-reasoning capabilities demonstrated by models such as Mythos Preview are likely to remain a closed-source advantage for longer, but the broader capability class is likely to become accessible through less controlled channels on timescales that matter for defensive planning. Safeguards on open-weight models can also be removed through fine-tuning, a well-established procedure.

This is not magic. It is efficiency on a fundamentally different level. The same dynamic that made spreadsheets accessible to non-accountants is now making offensive security techniques accessible to a far broader range of actors. The barrier to entry has dropped, and it will continue to drop as models improve and tooling matures.

The growing gap between offence and defence

The offensive AI capabilities described above are dangerous not because they exist in isolation, but because they collide with a defensive reality that was built for a different era.

Enterprise patch management cycles typically run 30 to 60 days, and responsible patching requires testing, staging, and rollback planning that cannot be safely shortcut. Meanwhile, the data paints a clear picture. Our Threat Landscape Report 2025 found that, for the second consecutive year, the exploitation of vulnerabilities in internet-facing software was the initial access vector with the highest impact on Union entities (see also our companion blog post). This is consistent with broader industry data: exploits remained the most common initial infection vector for the sixth consecutive year, accounting for 32% of all intrusions investigated by Mandiant in 2025, a trend independently corroborated by the Verizon Data Breach Investigations Report.

The mismatch is structural. AI systems discover vulnerabilities in minutes to hours. Attackers weaponise them within days, using automated patch-diffing — comparing patched and unpatched versions of software to reverse-engineer the underlying vulnerability — combined with AI-assisted exploit generation. Defenders need weeks to months to deploy fixes safely. And as noted above, exploitation is now routinely occurring before patches even exist.
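Patch-diffing is worth illustrating, because it shows why a released patch is itself a roadmap for attackers. The sketch below is purely illustrative (real-world patch-diffing tools such as binary differs work on compiled code and are far more sophisticated), but it captures the core idea: the lines a patch changes point directly at the underlying flaw.

```python
import difflib

def diff_patch(old_src: str, new_src: str) -> list[str]:
    """Return only the changed lines between unpatched and patched source.

    Each changed region is a candidate location of the underlying
    vulnerability, which an attacker can inspect first.
    """
    return [
        line
        for line in difflib.unified_diff(
            old_src.splitlines(), new_src.splitlines(),
            fromfile="unpatched", tofile="patched", lineterm="",
        )
        # Keep added/removed lines; drop the "---"/"+++" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

# Toy example: the patch adds a bounds check, pointing straight at the flaw.
old = "def read(buf, n):\n    return buf[:n]\n"
new = "def read(buf, n):\n    n = min(n, len(buf))\n    return buf[:n]\n"
changes = diff_patch(old, new)
```

In this toy case the single changed line is the added bounds check, telling an attacker exactly which input to probe on unpatched systems.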

This asymmetry is real, but it is not new. As our Threat Landscape Report 2025 documents, state-linked threat actors have long maintained vulnerability research programmes at a scale most defenders cannot match, conducting broad exploitation and supply-chain compromises against Union entities and their ecosystem. What AI does is change who else can credibly operate at that tempo. For defenders, the corresponding opportunity is to apply the same tools systematically: to the code they develop, to the commercial products they rely on (through coordinated disclosure and compensating controls), and to their detection and response capabilities. The race is about speed of adoption.

Organisations still relying on the traditional cycle of discover, disclose, patch, deploy — often slow and only partially executed — are operating with a structural disadvantage that grows wider with every improvement in AI capability. Closing this gap requires not only faster patching, but also proactive discovery in code they control, compensating controls while patches are pending, and detection capabilities tuned for higher-speed, higher-volume intrusion attempts.

Paying down decades of security debt

There is a deeper story beneath the acceleration in vulnerability discovery. For decades, many software vendors underinvested in secure development lifecycles, thorough code review, and sustained vulnerability research programmes. The result is a vast accumulation of latent vulnerabilities across the software ecosystem — flaws that existed but were never found, because finding them was too expensive, too slow, or commercially deprioritised in favour of faster feature delivery. Market incentives historically rewarded time-to-market over security engineering, and the cost of unfound vulnerabilities was largely externalised onto users.

AI changes that equation fundamentally. The flood of vulnerabilities now being surfaced, some dating back years or even decades, is not evidence that software is getting worse. It is the bill coming due for years of underinvestment. The initial pain is real: emergency patches, triage overload, and potential compromises before fixes are available. But the long-term trajectory is positive. Every vulnerability found and fixed is a door permanently closed. The end state is a cleaner, more resilient software ecosystem.

Policy is beginning to reinforce this dynamic. The EU Cyber Resilience Act, with mandatory vulnerability and incident reporting obligations taking effect from September 2026 and full compliance required by December 2027, creates binding cybersecurity requirements for vendors and hence an incentive to invest in the security of products with digital elements throughout their lifecycle. Combined with AI-powered discovery tools, this regulatory framework will accelerate the shift from reactive patching to proactive security engineering.

For organisations that develop software, including Union entities and the open-source community, the implication is clear: AI-powered vulnerability discovery should be integrated into development pipelines now, not as an afterthought but as a standard part of the build process. The cost of finding vulnerabilities before release is a fraction of the cost of responding to them in production.

Applying these capabilities defensively: an experiment at CERT-EU

At CERT-EU, we decided to understand this shift firsthand. Over recent weeks, beginning before Anthropic's announcement of Mythos Preview, our AI team has been developing and testing an internal AI-powered penetration testing pipeline to evaluate what current-generation models can actually do when applied systematically to vulnerability discovery.

Our system uses multiple specialised AI agents organised in a multi-phase pipeline that mirrors a professional penetration test: from initial reconnaissance and source code analysis through vulnerability identification, exploitation, remediation, and reporting. The agents cover different vulnerability domains, with a dedicated agent for each.

The design philosophy is to give each agent a laboratory, not a script. Each receives access to the full target environment and the autonomy to investigate as a human researcher would, adapting based on what it finds. A shared knowledge base accumulates discoveries across agents, so that each builds on what the others have learned.

The pipeline does not stop at finding vulnerabilities. For each confirmed finding, it generates a patch, creates a dedicated fix branch, and can open a merge request with all necessary changes, essentially taking the process from discovery to remediation.
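CERT-EU has not published implementation details of this pipeline. Purely as an illustration of the shape such an architecture can take, with hypothetical agent, class, and field names, the orchestration might be sketched as follows:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    agent: str               # which specialised agent produced it
    title: str
    severity: str
    confirmed: bool = False  # only confirmed findings proceed to remediation

@dataclass
class KnowledgeBase:
    """Shared store so each agent builds on what the others have learned."""
    notes: list[str] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)

    def record(self, finding: Finding, note: str) -> None:
        self.findings.append(finding)
        self.notes.append(note)

def run_pipeline(agents, kb: KnowledgeBase) -> list[Finding]:
    """Run each domain-specific agent in turn against the shared knowledge
    base and collect the findings it confirms."""
    for agent in agents:
        for finding, note in agent(kb):
            kb.record(finding, note)
    # A remediation phase would then generate a patch and open a merge
    # request for each confirmed finding (omitted in this sketch).
    return [f for f in kb.findings if f.confirmed]

# Hypothetical agent for one vulnerability domain.
def sqli_agent(kb: KnowledgeBase):
    yield (
        Finding("sqli", "Unparameterised query in /search", "high", confirmed=True),
        "search endpoint reachable without authentication",
    )

kb = KnowledgeBase()
confirmed = run_pipeline([sqli_agent], kb)
```

The design choice mirrored here is the shared knowledge base: later agents can query what earlier ones discovered rather than starting from scratch, which is what allows findings to be chained across vulnerability classes.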

The results have been instructive. The system consistently identifies real, exploitable vulnerabilities in production-grade applications, all of which were reported to the responsible maintainers through coordinated disclosure. Unlike traditional scanners, AI agents can interact with a live application, chain findings across different vulnerability classes, and reason about context in ways that static tools cannot.

The broader lesson is this: our system was built using publicly available AI models and the kind of expertise — offensive security, AI, incident response, threat intelligence — that any well-resourced adversary already possesses. Hostile state actors, criminal groups, and less responsible commercial vendors can build something similar, if they have not already. Given how quickly frontier capabilities diffuse to open-weight channels, this is not a question of whether such tools will be widely available, but when. The cross-functional collaboration required to develop tools like ours is not optional. It is the only way to keep pace with a threat that spans all of these disciplines.

What organisations should do now

The following recommendations translate this shift into concrete defensive actions for Union entities and the wider community. A preliminary point: the fundamentals remain essential. Defence-in-depth, sound security architecture, and early detection are as valuable as ever, as recent analysis by the UK AI Security Institute also underlines. The recommendations below build on those fundamentals rather than replacing them.

1. Reduce your attack surface. Decommission unnecessary internet-facing services, close unused ports, remove legacy systems, and enforce network segmentation. Every internet-facing service, API endpoint, and administrative interface is a potential target. Reduce what is exposed to what is strictly necessary.

2. Maintain rigorous cyber hygiene. Enforce credential rotation, deploy phishing-resistant multi-factor authentication, and maintain accurate asset inventories. Keep software up to date across the entire estate — internet-facing systems and edge devices in particular must be patched without delay. For critical vulnerabilities affecting exposed assets, organisations should target remediation within days, not weeks. Where immediate patching is not feasible, compensating controls such as web application firewalls, network segmentation, and temporary access restrictions must be deployed immediately. Automated code scanning (SAST), runtime testing (DAST), software composition analysis, and dependency scanning should be standard practice for all applications.

3. Adopt AI-powered security testing, responsibly. Use AI-assisted vulnerability discovery on your own systems before adversaries do. This applies across three concrete use cases:

  • Software you develop: integrate AI-powered vulnerability discovery into CI/CD pipelines to catch flaws before they reach production. This applies equally to Union entities, public and private sector organisations, and the open-source community.
  • Commercial and third-party software: use these tools to discover vulnerabilities in the products you rely on and report them to vendors through coordinated vulnerability disclosure, strengthening the ecosystem for everyone.
  • Compensating measures while patches are pending: develop custom detection rules and mitigations to close the gap between discovery and remediation, again AI-assisted where possible.

Responsible use matters: validate findings before acting on them, keep a human in the loop for exploitation and disclosure decisions, and follow coordinated vulnerability disclosure practices rather than contributing to the kind of submission floods that forced cURL and HackerOne to reconsider their bug bounty programmes. Shared security infrastructure, such as managed edge-security platforms, can compress the patching gap significantly through virtual patching and WAF rule updates deployed in hours rather than weeks.
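One way to operationalise the human-in-the-loop requirement is a validation gate between AI discovery and any exploitation or disclosure decision. The sketch below is a hypothetical illustration of such a gate, not a prescribed workflow; the statuses and names are assumptions for the example:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    REPORTED = "reported"    # raw AI output, treated as untrusted
    VALIDATED = "validated"  # reproduced by a human analyst
    REJECTED = "rejected"    # hallucinated or not actually exploitable

@dataclass
class AIFinding:
    title: str
    status: Status = Status.REPORTED

def triage(finding: AIFinding, reproduced: bool) -> AIFinding:
    """A human analyst attempts to reproduce the AI's claim before the
    finding can feed any exploitation or disclosure decision."""
    finding.status = Status.VALIDATED if reproduced else Status.REJECTED
    return finding

def may_disclose(finding: AIFinding) -> bool:
    # Only human-validated findings leave the organisation.
    return finding.status is Status.VALIDATED
```

Gating disclosure on human validation is what keeps an AI-assisted programme from contributing to the kind of low-quality submission floods described earlier in this post.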

4. Strengthen detection and response. AI-assisted intrusions are highly likely to be faster and stealthier. Invest in behavioural detection, anomaly monitoring, and AI-assisted threat hunting. Ensure incident response plans account for higher-speed, higher-volume scenarios. Continuous security testing, deceptive defence measures such as honeypots, and behavioural analytics allow organisations to identify and address weaknesses before they are exploited.

5. Accelerate zero-trust adoption. Assume that perimeter defences will be breached. Start with identity: enforce least privilege, verify every access request, and eliminate implicit trust. Extend to network segmentation, micro-segmentation where feasible, and continuous verification of devices and users.

6. Build cross-functional security teams. The AI-driven threat landscape cannot be addressed by any single discipline. Offensive security experts, AI specialists, incident responders, and cyber threat intelligence analysts must work together to anticipate and mitigate threats. Our experience developing the pipeline described above confirmed this: the most valuable insights emerged when CTI analysts informed the attack scenarios that offensive testers automated with AI, and when incident responders shaped detection rules based on what the AI agents actually did. Silos between these functions are a liability.

7. Align with emerging frameworks. The ENISA Multilayer Framework for Good Cybersecurity Practices for AI provides structured guidance for securing AI systems across foundational, AI-specific, and sector-specific layers. The EU AI Act establishes regulatory expectations for how organisations deploy and defend against AI systems, and its rules for general-purpose AI models are already in force. Providers of general-purpose AI models with systemic risk must assess and mitigate systemic cyber risk stemming from the model and ensure adequate cybersecurity protection of the model itself, throughout the model's lifecycle, including before the model is placed on the EU market. The EU Cyber Resilience Act, with mandatory vulnerability and incident reporting from September 2026 and full compliance required by December 2027, creates binding obligations for the security of products with digital elements. The NIST Cybersecurity Framework 2.0 has been updated with an AI-specific profile covering securing AI systems, conducting AI-enabled cyber defence, and thwarting AI-enabled cyberattacks.

8. Continuously iterate. This is not a one-time exercise. The threat landscape will evolve as AI capabilities advance. Regular reassessment, testing, and improvement of all defensive measures is essential.

How CERT-EU is adapting

CERT-EU is adapting its services and ways of working to this new environment. The AI-powered penetration testing pipeline described above is one component; we are also integrating AI-assisted analysis into vulnerability management, threat intelligence production, and incident response workflows that support Union entities under Regulation (EU, Euratom) 2023/2841. Under that Regulation, CERT-EU acts as the central cybersecurity hub for Union entities, and we will continue to leverage emerging AI capabilities, including at greater scale in the period ahead, to help Union entities strengthen their cybersecurity posture, meet their obligations, and respond to incidents at the tempo the threat landscape now demands. Our analytical work continues to apply the standards defined in our Cyber Threat Intelligence Framework. The broader institutional context for these efforts is set out in the 2025 IICB Annual Report.

Within the structured cooperation between CERT-EU and ENISA, we will also jointly assess these developments and pursue further action where appropriate.

As AI capabilities evolve, new models are released, and operational experience accumulates, we may share further assessments, practical guidance, and lessons learned.

The time to act is now

AI has changed the economics of vulnerability discovery. Capabilities that required nation-state resources and months of expert labour a few years ago are now available to a far broader set of actors, and will become more accessible with each new model release.

Every major AI laboratory is investing in security-relevant capabilities and, in parallel, in access-control arrangements for them. Anthropic distributed Mythos Preview exclusively through Project Glasswing, a controlled programme limited to vetted defensive partners. OpenAI released GPT-5.4-Cyber, a cyber-specialised variant of its frontier model, through its Trusted Access for Cyber programme, which uses identity verification and tiered access to restrict use to authenticated defenders. These are different approaches to the same underlying problem: frontier cybersecurity capabilities should reach defenders without becoming freely available to malicious actors.

Not all current or future providers will adopt the same release controls, and as noted earlier, the lag between frontier closed-source models and capable open-weight releases continues to narrow. When equally capable models reach less controlled channels — through open-weight release or through the removal of safety guardrails via fine-tuning — the offensive capabilities described in this post will become available to a far wider set of actors, including those with malicious intent.

The organisations that invest now in AI-augmented defence — in attack surface reduction, rigorous hygiene, proactive testing, and cross-functional collaboration — will be better positioned to weather what is coming. Those that defer investment until the next major incident will be responding from a weaker position than necessary.

This is not a reason for alarm. It is a reason for focus. The same technology that empowers attackers can, and must, be leveraged to strengthen defences. Defenders who adopt these tools systematically have the structural advantage of applying them across their entire estate, closing doors that attackers would otherwise find. The question facing every organisation responsible for critical systems is no longer whether to adapt, but how quickly.
