ZDNET’s key takeaways
- AI is proving better than expected at finding old, obscure bugs.
- Unfortunately, AI is also good at finding bugs for hackers to exploit.
- In short, AI still isn’t ready to replace programmers or security pros.
In a recent LinkedIn post, Microsoft Azure CTO Mark Russinovich said he used Anthropic’s new AI model Claude Opus 4.6 to read and analyze assembly code he’d written in 1986 for the Apple II 6502 processor.
Also: Why AI is both a curse and a blessing to open-source software – according to developers
Claude didn’t just explain the code; it performed what he called a “security audit,” surfacing subtle logic errors, including one case where a routine failed to check the carry flag after an arithmetic operation.
That’s a classic bug that had been hiding, dormant, for decades.
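Russinovich didn't publish the routine itself, so the snippet below is only an illustrative sketch of that class of bug: on the 6502, adding numbers wider than 8 bits means adding them byte by byte and propagating the carry flag between bytes. Here's a minimal Python simulation (function names are hypothetical) of a 16-bit add where the carry out of the low byte is silently dropped:

```python
# Hypothetical illustration of the bug class: multi-byte addition on an
# 8-bit CPU must carry the overflow from the low byte into the high byte.

def add16_buggy(a, b):
    """Add two 16-bit values byte by byte, forgetting the carry."""
    lo = (a & 0xFF) + (b & 0xFF)                 # low-byte add may overflow...
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)   # ...but the carry never lands here
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

def add16_correct(a, b):
    """Same addition, but the carry flag is checked and propagated."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                              # the "carry flag" out of the low byte
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

# Inputs that don't overflow the low byte agree, hiding the bug:
print(hex(add16_buggy(0x1234, 0x0011)))    # 0x1245 -- looks fine
# Inputs that do overflow expose it:
print(hex(add16_buggy(0x12FF, 0x0001)))    # 0x1200 -- carry lost
print(hex(add16_correct(0x12FF, 0x0001)))  # 0x1300 -- correct
```

Because most test inputs never overflow the low byte, a bug like this can sit quietly in working software for decades, which is exactly why it took a fresh pair of (artificial) eyes to spot it.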
The good news and the bad news
Russinovich’s experiment is striking because the code predates today’s languages, frameworks, and security checklists. However, the AI was able to reason about low-level control flow and CPU flags to point out real defects. For veteran developers, it’s a reminder that long-lived codebases may still harbor bugs that conventional tools and developers have learned to live with.
Also: 7 AI coding techniques I use to ship real, reliable products – fast
Yet despite the progress, some experts believe this experiment raises concerns.
As Matthew Trifiro, a veteran go-to-market engineer, said: “Oh, my, am I seeing this right? The attack surface just expanded to include every compiled binary ever shipped. When AI can reverse-engineer 40-year-old, obscure architectures this well, current obfuscation and security-through-obscurity approaches are essentially worthless.”
Trifiro has a point. On the one hand, AI will help us find bugs so we can fix them. That’s the good news. On the other hand, and here’s the bad news, AI can also break into programs still in use that are no longer being patched or supported.
As Adedeji Olowe, founder of Lendsqr, pointed out, “This is scarier than we’re letting on. Billions of legacy microcontrollers exist globally, many likely running fragile or poorly audited firmware like this.”
Also: Is Perplexity’s new Computer a safer version of OpenClaw? How it works
He continued: “The real implication is that bad actors can send models like Opus after them to systematically find vulnerabilities and exploit them, while many of these systems are effectively unpatchable.”
LLMs complementing detector tools
Traditional static analysis tools such as SpotBugs, CodeQL, and Snyk Code scan source code for patterns associated with bugs and vulnerabilities. These tools excel at catching well-understood issues, such as null-pointer dereferences, common injection patterns, and API misuse, and they do so at scale across large Java and other-language codebases.
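To make the "well-understood issues" concrete, here's a minimal Python sketch (not drawn from any of the tools above, and the function names are illustrative) of a classic injection pattern that rule-based scanners are built to flag, next to the parameterized form they recommend:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Pattern-matching scanners flag this: untrusted input concatenated into SQL.
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # The parameterized query the same tools recommend instead.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

print(find_user_safe(conn, "alice"))          # [(1,)]
# A crafted input defeats the unsafe version and matches every row:
print(find_user_unsafe(conn, "' OR '1'='1"))
```

A static analyzer catches the first function because the string concatenation matches a known bad pattern, no understanding of the program's purpose required. That pattern-matching is precisely its strength at scale, and its limit.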
Now, it has become clear that large language models (LLMs) can complement those big detector tools. In a 2025 head-to-head study, LLMs like GPT-4.1, Mistral Large, and DeepSeek V3 were as good as industry-standard static analyzers at finding bugs across multiple open-source projects.
Also: This new Claude Code Review tool uses AI agents to check your pull requests for bugs — here’s how
How do these models do it? Instead of asking, “Does this line violate rule X?”, the LLM is effectively asking, “Given what this system is supposed to do, where are the failure modes and attack paths?” Paired with traditional pattern-matching analyzers, that broader reasoning makes for a powerful combination.
For example, Anthropic’s Claude Opus 4.6 AI is helping clean up Firefox’s open-source code. According to Mozilla, Anthropic’s Frontier Red Team found more high-severity bugs in Firefox in just two weeks than people typically report in two months. Mozilla proclaimed, “This is clear evidence that large-scale, AI-assisted analysis is a powerful new addition to security engineers’ toolbox.”
Anthropic isn’t the only organization using AI engines to find bugs in code. Black Duck’s Signal product, for instance, combines multiple LLMs, Model Context Protocol (MCP) servers, and AI agents to autonomously analyze code in real time, detect vulnerabilities, and propose fixes.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
Meanwhile, security consultancies, such as NCC Group, are experimenting with LLM-powered plugins for software reverse-engineering tools, like Ghidra, to help discover security problems, including potential buffer overflows and other memory-safety issues that can be hard for people to spot.
Passing security checks to AI
These successes don’t mean we’re ready to pass our security checks to AI. Far from it.
Also: I tried to save $1,200 by vibe coding for free – and quickly regretted it
Researchers have found that LLM-driven bug finding is not a drop-in replacement for mature static analysis pipelines. Studies comparing AI coding agents to human developers show that while AI can be prolific, it also introduces security flaws at higher rates, including unsafe password handling and insecure object references.
CodeRabbit found “that there are some bugs that humans create more often and some that AI creates more often. For example, humans create more typos and difficult-to-test code than AI. But overall, AI created 1.7 times as many bugs as humans.
“Code generation tools promise speed but get tripped up by the errors they introduce. It’s not just little bugs: AI created 1.3-1.7 times more critical and major issues.”
Also: Rolling out AI? 5 security tactics your business can’t get wrong – and why
You can also ask Daniel Stenberg, creator of the popular open-source data transfer program cURL. He’s loudly and legitimately complained that his project has been flooded with bogus, AI-written security reports that drown maintainers in pointless busywork.
The moral of the story
AI, in the right hands, makes a great assistant, but it’s not ready to be a top programmer or security checker. Maybe someday, but not today. So use AI carefully alongside your existing tools, and your programs will be far more secure than they are now.
As for old code, well, that’s a real worry. I foresee people replacing firmware-powered devices due to realistic fears that they’ll soon be compromised.

