Beyond the Hype: LLMs, Mythos, and the Future of Firmware Analysis

LLMs Revolutionize Firmware Analysis
Large Language Models (LLMs) are making inroads across domains, and IT security, including firmware analysis, is no exception. AI models can read and interpret code, sometimes even decompiling binary firmware into human-readable pseudo-code, and assist in finding vulnerabilities. For example, researchers have demonstrated a pipeline where an LLM works with Ghidra (an open-source reverse engineering suite) to analyze decompiled firmware code for vulnerabilities. This automation can perform in hours what used to take experts days: scanning thousands of functions for buffer overflows, hardcoded credentials, insecure communications, and more. The LLM doesn’t rely on simple pattern matching; it reasons about code context and logic, which lets it catch complex vulnerability patterns that simple static analysis might miss. In one case, an automated LLM-driven system handled the entire workflow (binary identification, decompilation, vulnerability analysis, CWE classification, and report generation), turning a 40-hour manual process into a 4-hour autonomous job¹.
Mythos Revolutionizes Vulnerability Discovery
Around the same time, Anthropic’s new model Claude Mythos stunned the industry by autonomously uncovering critical flaws that had eluded humans for decades, including a 27-year-old bug in OpenBSD’s network stack that had survived millions of fuzz tests and audits. Mythos didn’t stop at one insight: it reportedly surfaced thousands of zero-day vulnerabilities across every major operating system and web browser². This unprecedented scale of discovery has excited defenders and attackers alike. On one hand, AI promises to quickly identify hidden weaknesses in embedded code. On the other, it heralds a flood of new vulnerabilities that security teams will have to triage and fix at a breakneck pace. For context, a 2026 Cloud Security Alliance report warned that the average time from a new vulnerability’s disclosure to its exploitation in the wild had collapsed to roughly 20 hours, down from 2.3 years in 2018³. In short, LLMs like Mythos are supercharging firmware analysis and exploit development, dramatically shrinking the window defenders have to respond.
AI-Powered Firmware Analysis: New Strengths, New Challenges
Modern LLMs have shown they can tackle tasks once reserved for seasoned reverse engineers. Mythos took this a step further, essentially acting as an autonomous security researcher. It not only found obscure software bugs, but also weaponized them: for example, it generated working exploits for old “medium-severity” vulnerabilities that were previously thought impractical to attack at scale. In fact, Mythos was able to produce thousands of working exploits for the zero-days it discovered, at an average compute cost of only a couple thousand dollars each⁴. This kind of AI-driven offensive capability represents a generational leap. A vulnerability that might have been dismissed as too complex to exploit can no longer be safely ignored – Mythos and similar models can potentially turn it into a reliable exploit overnight. The result is a double-edged sword: we’re finding more vulnerabilities than ever before, faster than ever before, which is great for shoring up defenses if we can fix them in time. But it also means organizations will be inundated with issues to manage. Security teams are now scrambling to prioritize and patch a tsunami of newly unearthed bugs, knowing that attackers armed with similar AI will be racing to exploit them in hours. This dramatic acceleration is what some experts are calling an “AI-driven vulnerability storm” – a pivotal change that places new urgency on how we handle firmware security at scale³.
The Most Important Limit of LLMs in Firmware Security: Non-determinism
For all their power, current LLMs like Mythos are not magic one-click solutions, and they come with serious limitations. A key issue is non-determinism: the AI’s analysis process involves a degree of randomness and trial and error. It won’t find every vulnerability on the first try, and each run can yield different results. Anthropic’s own red-team evaluation revealed that they had to execute Mythos in an iterative loop with different “agentic” strategies, in one case running 1,000 independent attempts (costing nearly $20,000 in cloud compute) to gradually uncover several dozen bugs in a target component. The specific run that hit the jackpot (the OpenBSD crash) cost less than $50 in AI time, but they could not predict in advance which run would succeed. This detail of Mythos’s process underscores the point: using an LLM to find bugs is essentially a guided search, not a guaranteed outcome. In practice, an AI might miss a vulnerability on one pass and catch it on another, so you cannot know whether you have found everything without running it many times or combining it with other methods. That’s a stark contrast to traditional static analysis tools, which deterministically flag the same issues every time (even if they sometimes flag too much).
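To make the "guided search" point concrete, here is a minimal Python sketch of why repeated runs are needed. It is an illustration under stated assumptions, not a model of how Mythos works: the per-run detection probabilities and the bug names are hypothetical, and runs are treated as independent.

```python
import random


def chance_found(p_per_run: float, runs: int) -> float:
    """Probability that at least one of `runs` independent attempts finds a
    bug that a single attempt finds with probability `p_per_run`."""
    return 1.0 - (1.0 - p_per_run) ** runs


def simulate_campaign(bug_rates: dict[str, float], runs: int, seed: int) -> set[str]:
    """Union of findings over `runs` randomized attempts.

    `bug_rates` maps a (hypothetical) bug name to the assumed chance that
    one attempt surfaces it. The seed only makes the simulation repeatable;
    a real LLM run offers no such guarantee.
    """
    rng = random.Random(seed)
    found: set[str] = set()
    for _ in range(runs):
        for bug, rate in bug_rates.items():
            if rng.random() < rate:
                found.add(bug)
    return found


if __name__ == "__main__":
    # A bug that one run finds 1% of the time is almost certainly missed by
    # a single run, and almost certainly found somewhere in 1,000 runs.
    print(f"1 run:     {chance_found(0.01, 1):.4f}")
    print(f"1000 runs: {chance_found(0.01, 1000):.4f}")
```

The closed-form helper shows the trade-off directly: coverage of a rarely triggered finding grows only with the number of attempts, and so does the bill, which is why a single pass cannot be treated as a complete audit.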
Lost in the Firmware Maze: AI’s Context Conundrum
Another challenge for AI-driven analysis is scale and context. Firmware is often a massive puzzle – thousands of pieces (modules and libraries) that create a big picture when assembled. But an LLM can only look at a handful of pieces at once due to its fixed context window. This means if you feed the AI one chunk at a time, it might miss how pieces from different parts connect to reveal a vulnerability – it could lose the forest for the trees. Some advanced setups might try a multi-agent approach, with several AI “workers” each examining different sections and sharing notes, to cover a whole firmware. Yet building that kind of coordinated harness is no trivial feat: it needs custom scripting, simulated environments, and expert prompt tuning. Not every team can invest that effort for each release. So while AI can turbocharge analysis, scaling it up to a jumbo-sized firmware without missing context remains a tricky puzzle to solve.
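The context limit can be sketched in a few lines of Python. This is a simplified illustration, not any particular tool's chunking strategy: the four-characters-per-token estimate, the function names, and the call edges are all assumptions made for the example.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate. Assumption: roughly four characters per token."""
    return max(1, len(text) // 4)


def pack_into_batches(functions: dict[str, str], budget: int) -> list[list[str]]:
    """Greedily pack decompiled functions into batches that fit a token budget.

    A function larger than the budget still gets a batch of its own; a real
    harness would have to split or summarize it.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, body in functions.items():
        cost = estimate_tokens(body)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches


def split_call_edges(calls: list[tuple[str, str]],
                     batches: list[list[str]]) -> list[tuple[str, str]]:
    """Return caller/callee pairs that ended up in different batches."""
    where = {name: i for i, batch in enumerate(batches) for name in batch}
    return [(a, b) for a, b in calls if where[a] != where[b]]
```

Every edge returned by `split_call_edges` is a relationship the model never sees in a single prompt, for example a length check in one function and the copy it is supposed to guard in another. Multi-agent setups exist largely to pass notes across those boundaries.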
Breaking the Bank and the Clock: AI’s Costly Time Crunch
Finally, these models can be expensive and impractical to use continuously. Even if the cost per run comes down, the slow, step-by-step nature of LLM reasoning makes it hard to integrate into a fast-paced development pipeline. You likely can’t run a several-hour AI analysis on every single nightly firmware build and still release on schedule. And if you did, you might get different findings from one day to the next due to the non-determinism.
No Silver Bullet: Why AI Can’t Secure Firmware Alone
All of this matters because it highlights a reality: LLMs are powerful tools, but they are not a standalone solution for firmware security. LLM-based code analysis and traditional static analysis augment each other’s strengths instead of standing in for one another. The AI brings creative, context-aware discovery, while classical tools bring thoroughness, consistency, and scale. To protect real products, we need to leverage the strengths of AI and compensate for its weaknesses.
Why Purpose-Built Tools Matter More Than Ever
This is where a specialized firmware security platform like ONEKEY becomes invaluable in the age of AI: it takes a fundamentally deterministic and scalable approach to firmware analysis, which is exactly what current LLM methods lack. It automatically dissects a device’s firmware to produce a complete Software Bill of Materials (SBOM), cross-references each identified component against known vulnerability databases (such as the CVE database), and scans the firmware’s binaries with advanced static analysis techniques to detect both known and unknown vulnerabilities and weaknesses. If a vulnerable library or configuration is present, a traditional static analysis platform will consistently detect it through binary-level scanning and matching, regardless of how large or complex the firmware is. There’s no randomness in the results: the same firmware will yield the same findings every time, which is critical for maintaining a reliable security baseline. And unlike an LLM, a traditional tool doesn’t skip over anything; it systematically covers the entire firmware image, giving security teams full visibility into what’s inside their devices. Armed with that visibility, the platform deterministically finds all the known vulnerabilities affecting those components, and does so with context.
In fact, ONEKEY is designed to cut through the noise that plagues traditional scanning. Embedded developers often get bombarded with massive lists of CVEs from generic tools, many of which turn out to be irrelevant or already fixed in their firmware. ONEKEY avoids that trap with version-aware, binary-specific analysis that shows you exactly which vulnerabilities matter and why, filtering out false positives. In practice, this means that if your firmware includes (for example) OpenSSL 1.1.1, ONEKEY will tell you whether that exact build of OpenSSL has any known CVEs, and even whether the vulnerable function is likely used, so you can prioritize the fix appropriately.
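The deterministic, version-aware matching described above can be illustrated with a minimal Python sketch. It is not ONEKEY's implementation: the data model, the component names, and the `EXAMPLE-…` advisory identifiers are hypothetical, and versions are assumed to be purely numeric dotted strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


@dataclass(frozen=True)
class Advisory:
    advisory_id: str     # hypothetical identifier, not a real CVE
    component: str
    first_affected: str  # inclusive lower bound
    fixed_in: str        # exclusive upper bound


def parse_version(version: str) -> tuple[int, ...]:
    """Assumption: numeric dotted versions only (e.g. "1.4.2").
    Real-world schemes with letters or suffixes need a dedicated parser."""
    return tuple(int(part) for part in version.split("."))


def match_advisories(sbom: list[Component],
                     advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (component, version, advisory_id) for every advisory whose
    affected range contains the component's exact version."""
    findings = []
    for comp in sbom:
        version = parse_version(comp.version)
        for adv in advisories:
            if adv.component != comp.name:
                continue
            low = parse_version(adv.first_affected)
            high = parse_version(adv.fixed_in)
            if low <= version < high:
                findings.append((comp.name, comp.version, adv.advisory_id))
    # Sorting makes the output independent of input order.
    return sorted(findings)
```

Because the function is a pure lookup over the component list and the advisory list, the same inputs always produce the same findings, which is the property the paragraph above contrasts with LLM-based discovery. Matching on version ranges rather than component names alone is what removes advisories already fixed in the shipped build.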
Securing Every Build: CI Integration & Continuous Monitoring
Equally important, ONEKEY brings automation and integration that fit into real development workflows. It can be integrated into CI/CD pipelines or dev toolchains to scan each new firmware build automatically, ensuring that known flaws are caught before a release goes out. The platform continuously monitors your firmware’s security over time: once you’ve scanned a firmware version, you will be alerted if new vulnerabilities later emerge that affect it, thanks to continuously updated threat intelligence. This kind of proactive monitoring is crucial in the new fast-paced threat landscape. When Mythos or others reveal a batch of new vulnerabilities, you want to know immediately: do any of our products use the affected components? The platform provides that answer at a glance, because it maintains an indexed inventory of all components across your products and, through continuous monitoring and reanalysis, can instantly flag any newly disclosed risks. In an era when patches need to be developed and deployed within hours, having this single source of truth for firmware vulnerabilities is a lifesaver. Teams don’t have to scramble to manually diff SBOMs or scan code each time news of a vulnerability breaks. The platform has done the heavy lifting in advance, so you can focus on remediation rather than detection.
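The "indexed inventory" idea can be sketched as a simple reverse index in Python. This is a hedged illustration of the general technique, not the platform's actual data model; the product names, component names, and versions are hypothetical.

```python
from collections import defaultdict

# (component name, version) -> set of product firmware releases containing it
Index = dict[tuple[str, str], set[str]]


def build_index(products: dict[str, list[tuple[str, str]]]) -> Index:
    """Invert per-product component lists so lookups go from a component
    to the products that ship it. Built once, when firmware is scanned."""
    index: Index = defaultdict(set)
    for product, components in products.items():
        for name, version in components:
            index[(name, version)].add(product)
    return index


def affected_products(index: Index, name: str, versions: list[str]) -> list[str]:
    """Answer "which products ship an affected version?" for a new advisory
    without rescanning any firmware image."""
    hit: set[str] = set()
    for version in versions:
        hit |= index.get((name, version), set())
    return sorted(hit)
```

The design choice worth noting is that the expensive step (extracting components from firmware) happens ahead of time, so answering a newly disclosed advisory reduces to a dictionary lookup.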
ONEKEY’s Clever Use of LLMs
Notably, adopting ONEKEY’s deterministic approach doesn’t mean turning away from AI – in fact, the platform integrates LLM capabilities intelligently where they add value, without compromising reliability. ONEKEY uses AI as a supportive sidekick in two main ways:
- Enhanced Firmware Analysis: If a firmware contains an unknown or obfuscated component, ONEKEY can invoke an LLM to help figure it out. For example, an AI might analyze patterns (strings, symbols, file headers) in a mysterious binary and infer what library or module it is, essentially “recognizing” the software in a human-like way. This augments ONEKEY’s standard binary fingerprinting and allows it to identify more components accurately. Similarly, an LLM is used to pinpoint the likely location of known vulnerabilities: it reads CVE descriptions and references, which might mention a function name or file, and this information is then stored in ONEKEY’s vulnerability database. When ONEKEY flags a vulnerable component, it can cross-check whether the specific vulnerable function or file is actually present in the firmware by searching for names or code patterns. If that vulnerable code is not found, ONEKEY knows the issue is likely not applicable to this firmware (a false positive) and can safely filter it out. This AI-powered double-check helps eliminate noise and ensures that reported vulnerabilities are truly relevant to the firmware being analyzed.
- AI-Assisted Triage & Guidance: Once potential issues are identified, AI helps make the findings more understandable and actionable. An LLM can triage and summarize results by grouping similar vulnerabilities together, classifying them by root cause or CWE, and even providing a plain-language explanation for each issue with suggested fixes. Instead of wading through raw data, a developer might see an AI-curated note like, “Buffer Overflow in Wi-Fi Module – found in the Wi-Fi driver’s beacon packet parser. An attacker could exploit this by sending a malformed beacon frame, causing an overflow. Recommend adding length checks before processing the packet.” This context-rich guidance, generated by the AI, makes remediation faster and clearer. ONEKEY is even building a chat-based AI assistant into the platform, so engineers can query the scan results conversationally – asking things like “How severe is this bug?” or “Show me all issues related to encryption,” and getting instant, easy-to-digest answers. And looking ahead, the research team is developing additional AI agents to assist in uncovering new vulnerabilities in a safe, repeatable way.
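The two ideas in the list above, checking whether the vulnerable code is actually present and grouping findings by root cause, can be sketched in Python as follows. This is an assumption-laden illustration rather than ONEKEY's actual logic: the finding format, the `symbols` hint field, and the `EXAMPLE-…` identifiers are hypothetical, and the symbol hints are assumed to have been extracted beforehand.

```python
from collections import defaultdict


def split_by_applicability(findings: list[dict],
                           firmware_symbols: set[str]) -> tuple[list[str], list[str]]:
    """Split finding IDs into (likely applicable, likely not applicable).

    Each finding may carry `symbols`: function or file names associated with
    the vulnerability. A finding without hints is kept, since absence of a
    hint is not evidence that the firmware is unaffected.
    """
    likely, unlikely = [], []
    for finding in findings:
        hints = finding.get("symbols", [])
        if not hints or any(s in firmware_symbols for s in hints):
            likely.append(finding["id"])
        else:
            unlikely.append(finding["id"])
    return likely, unlikely


def group_by_cwe(findings: list[dict]) -> dict[str, list[str]]:
    """Group finding IDs by CWE so similar issues are reviewed together."""
    groups: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        groups[finding.get("cwe", "unclassified")].append(finding["id"])
    return dict(groups)
```

In this sketch the LLM's contribution happens offline, when the hints are extracted from advisory text; the check against the firmware itself is a plain set lookup, so the filtering step stays repeatable.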
In all these cases, ONEKEY uses AI as a booster rather than a replacement. The platform’s core analysis remains consistent and deterministic (you get the same thorough SBOM and known-vulnerability results every time), while AI is woven in as a smart assistant. All AI contributions operate under ONEKEY’s governance and guardrails (using techniques to minimize hallucinations), ensuring the final output stays comprehensive and reliable. Importantly, the structured data ONEKEY produces (extracted components, identified issues, CVE context, etc.) is a great feed for any further AI-driven analysis or summarization. In other words, ONEKEY builds a rock-solid foundation of facts, on top of which AI can safely provide creative insight, giving you the best of both worlds for firmware security.
Conclusion
Think of your firmware security strategy as a fortress. ONEKEY is the solid stone wall and well-engineered foundations of that fortress – tried-and-true, built to withstand known attacks and steadily guard every possible entry point. LLMs like Mythos are more like a brilliant new surveillance system or a roving sentinel: they can scout beyond the walls and spot hidden dangers or clever saboteurs that were previously unseen. The sentinel can alert you to novel threats, but it’s the sturdy walls (and the disciplined guards on them) that actually keep the invaders at bay consistently. You wouldn’t build your entire defense out of experimental, non-deterministic tech, but you would use it to reinforce and enhance your stronghold. In the same way, LLMs are an amazing support tool for firmware analysis – they can uncover things we’ve missed and speed up discovery – but they are unpredictable and expensive, so they won’t replace dedicated security platforms. Instead, they augment them. ONEKEY exemplifies this balance: it provides a rock-solid framework for firmware vulnerability management, while embracing AI assistance in areas where it adds value. The result is a future-ready approach to firmware security that combines reliability with innovation.
Beyond the hype, the reality is that AI will make finding vulnerabilities easier – potentially flooding organizations with more issues to fix – but it also provides opportunities to improve defenses. The key is to harness LLMs responsibly: let them shine in analysis and creativity, but lean on deterministic platforms to handle the scale, automation, and assurance that no critical known issue slips through the cracks. Together, they make a formidable duo. In an age of Mythos and machine-speed exploits, ONEKEY’s comprehensive firmware analysis ensures you have an unshakable foundation – and when paired with the sharp eyes of AI, it means no vulnerability, old or new, will catch you unprepared.
References
About ONEKEY
ONEKEY is the leading European specialist in Product Cybersecurity & Compliance Management and part of the investment portfolio of PricewaterhouseCoopers Germany (PwC). The unique combination of the automated ONEKEY Product Cybersecurity & Compliance Platform (OCP) with expert knowledge and consulting services provides fast and comprehensive analysis, support, and management capabilities to improve product security and compliance, from procurement through design, development, and production to the end of the product life cycle.

CONTACT:
Sara Fortmann
Senior Marketing Manager
sara.fortmann@onekey.com
euromarcom public relations GmbH
team@euromarcom.de