
Artificial intelligence is stepping up its game in cybersecurity, with advanced models now capable of spotting serious flaws in sophisticated software on their own. Anthropic’s Claude Opus 4.6 recently uncovered more than 500 unknown security issues, known as zero-days, in established open-source projects, as the company detailed in an earlier report.
In a new partnership with Mozilla researchers, the AI model identified 22 vulnerabilities in Firefox over just two weeks. Mozilla rated 14 of them high severity, a number equal to nearly one-fifth of the high-severity Firefox vulnerabilities fixed throughout 2025. This demonstrates how AI can dramatically speed up the detection of major security risks, potentially transforming how developers stay ahead of threats.
The collaboration led to a surge in reported Firefox issues, with Claude’s discoveries in February 2026 outpacing any single month’s findings from 2025 across all sources. Mozilla reviewed the influx of reports, guided Anthropic on prioritizing submissions, and rolled out patches reaching hundreds of millions of users through Firefox 148.0. Details of their joint work and key takeaways offer a blueprint for blending AI-driven security hunting with traditional maintenance teams.
The effort began in late 2025 when Anthropic observed Claude Opus 4.5 nearly mastering CyberGym, a test suite for replicating known software weaknesses. To push further, the team created a tougher benchmark using past Firefox vulnerabilities, targeting the browser’s intricate code. Firefox stood out as a rigorous challenge due to its vast, heavily scrutinized open-source nature and the billions of web interactions it secures daily.
Initially, Claude successfully recreated many historical flaws from older Firefox versions, a feat that once demanded extensive human investigation. To confirm that it could find genuinely new bugs, Anthropic directed the model toward undiscovered flaws in the latest codebase, starting with the JavaScript engine, a high-risk area because it processes untrusted web content. Within 20 minutes, Claude flagged a use-after-free bug, a class of memory error in which a program keeps using memory after it has been freed, which could let attackers overwrite data with malicious content.
Anthropic’s team verified the issue in a fresh environment, cross-checked it internally, and submitted it to Mozilla’s Bugzilla tracker along with a Claude-generated description and proposed patch. By the time validation wrapped up, the AI had found 50 additional crash-triggering inputs. A Mozilla contact soon got in touch and advised submitting findings in bulk, without fully checking each one by hand, to accelerate triage. Ultimately, after scanning around 6,000 C++ files, Anthropic filed 112 distinct reports, including the vulnerabilities noted above. Most were fixed in Firefox 148, with the rest slated for upcoming releases.
Anthropic recognized that it could miss context when working in an unfamiliar codebase, so it validated findings itself while leaning on Mozilla’s triage insights. The browser maker’s openness helped refine submissions to focus on actionable security concerns, even if some proved non-critical. Now, Mozilla’s own experts are testing Claude for internal security tasks.
To gauge Claude’s full potential, Anthropic explored whether it could go beyond detection to crafting basic exploits, simulating the way an attacker would use a flaw to execute code, for example to access local files. Over hundreds of trials costing about $4,000 in compute, the model succeeded only twice. This shows that the model is far better at spotting bugs than at weaponizing them, and that discovery remains far cheaper than exploitation. Still, even limited success in generating rudimentary browser attacks is a warning sign, although these exploits worked only in a stripped-down setup lacking real-world defenses like Firefox’s sandbox.
Such protections are part of a browser’s layered defenses and blunt many exploits, but sandbox escapes do occur, so an AI-generated exploit could still become one worrisome piece of a larger attack. For deeper insight into one such Firefox exploit, check Anthropic’s Frontier Red Team analysis.
These developments underscore the need for security teams to patch vulnerabilities faster. Anthropic shared practical tips from the project, including “task verifiers”: reliable tools that let the AI check its own output while it explores code, improving accuracy.
Verifiers proved vital for unearthing the Firefox bugs and, in other studies, for validating repairs by confirming flaw elimination without breaking core features. Custom verifiers, like automated tests for lingering issues or regression checks, boost patch reliability, though human review remains essential for any AI-suggested code, just as with outsider contributions.
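As a rough illustration of the idea, a patch verifier can be sketched as two checks: does the original crash still reproduce, and do existing behaviors still work? The Python below is a minimal sketch under stated assumptions, not Anthropic’s or Mozilla’s tooling. The names `verify_patch` and `VerifierResult` and the stand-in checks are hypothetical, and in practice each check would run a real reproducer or test suite against a patched build.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class VerifierResult:
    """Outcome of checking one candidate patch (hypothetical structure)."""
    crash_reproduced: bool               # True if the original bug still triggers
    failed_regressions: Tuple[str, ...]  # names of regression checks that failed

    @property
    def accepted(self) -> bool:
        # Accept only if the bug is gone and nothing else broke.
        return not self.crash_reproduced and not self.failed_regressions


def verify_patch(
    reproduces_crash: Callable[[], bool],
    regression_checks: Dict[str, Callable[[], bool]],
) -> VerifierResult:
    """Run the crash reproducer, then every regression check, on a patched build.

    Both arguments are assumptions for this sketch: `reproduces_crash` returns
    True if the crashing input still crashes, and each regression check
    returns True when the feature it covers still works.
    """
    crash = bool(reproduces_crash())
    failed = tuple(name for name, check in regression_checks.items() if not check())
    return VerifierResult(crash_reproduced=crash, failed_regressions=failed)


# Usage with stand-in checks in place of real build and test commands.
result = verify_patch(
    reproduces_crash=lambda: False,
    regression_checks={"renders_page": lambda: True, "runs_script": lambda: True},
)
print(result.accepted)
```

A result with `accepted` set to `False` would send the patch back for another attempt, and an accepted one would still go to human review, as the article notes.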
For streamlined bug reporting amid maintainer overload, Anthropic recommends including concise test cases, thorough proof-of-concept details, and draft fixes in submissions. This approach builds trust and speeds verification, a standard the team urges for all AI-assisted research.
Anthropic also outlined its Coordinated Vulnerability Disclosure guidelines, aligning with industry standards but poised for evolution as AI advances.
Leading-edge models like Claude Opus 4.6 are now elite at vulnerability hunting, evidenced by the Firefox wins and separate probes into projects like the Linux kernel. Anthropic plans ongoing disclosures and community collaborations to bolster open-source defenses.
For now, the AI excels more at detection and repair than attacks, tilting the scales toward protectors. The launch of Claude Code Security in early access extends these tools to users and maintainers alike.
Yet rapid AI progress suggests the discovery-exploitation divide may narrow soon, prompting safeguards against misuse. Developers should seize this interval to fortify codebases. Anthropic intends to ramp up initiatives, from joint vulnerability scans to triage aids and proactive patches, all under responsible disclosure protocols.
