A new benchmark from Quesma, a company specializing in AI model evaluation, highlights the nascent capabilities of artificial intelligence in safeguarding software supply chains. The BinaryAudit tool, developed with input from reverse engineering expert Michał "Redford" Kowalczyk, tests AI's ability to find hidden threats within software binaries.
AI's Limited Success in Detecting Threats
The results indicate that while AI can identify some malicious code, its effectiveness is currently limited. The top-performing model, Claude Opus 4.6, achieved only a 49% success rate in detecting threats. Furthermore, these advanced AI models often flagged legitimate software as dangerous, a common issue in early AI security applications.
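To make the two figures concrete, here is a minimal sketch of how a detection rate and a false-positive rate can be computed from a set of model verdicts. It is an illustration only, not BinaryAudit's actual scoring method: the `Verdict` type, its field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One model answer for one binary (hypothetical structure)."""
    has_backdoor: bool  # ground truth: was malicious code actually planted?
    flagged: bool       # did the model report the binary as malicious?


def detection_rate(verdicts):
    """Share of truly malicious binaries that the model flagged."""
    malicious = [v for v in verdicts if v.has_backdoor]
    if not malicious:
        return 0.0
    return sum(v.flagged for v in malicious) / len(malicious)


def false_positive_rate(verdicts):
    """Share of clean binaries that the model wrongly flagged."""
    clean = [v for v in verdicts if not v.has_backdoor]
    if not clean:
        return 0.0
    return sum(v.flagged for v in clean) / len(clean)


# Made-up sample: 4 backdoored binaries (2 caught), 4 clean binaries (1 wrongly flagged).
sample = [
    Verdict(has_backdoor=True, flagged=True),
    Verdict(has_backdoor=True, flagged=True),
    Verdict(has_backdoor=True, flagged=False),
    Verdict(has_backdoor=True, flagged=False),
    Verdict(has_backdoor=False, flagged=True),
    Verdict(has_backdoor=False, flagged=False),
    Verdict(has_backdoor=False, flagged=False),
    Verdict(has_backdoor=False, flagged=False),
]

print(f"detection rate: {detection_rate(sample):.0%}")
print(f"false-positive rate: {false_positive_rate(sample):.0%}")
```

The two numbers are tracked separately because a model that flags everything would score a perfect detection rate while being useless in practice.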
The Urgency of Supply Chain Security
Software supply chain attacks pose a significant and growing threat. Recent incidents include state-sponsored actors compromising widely used software like Notepad++ and the Shai Hulud 2.0 attack that affected thousands of organizations. The XZ Utils backdoor, where a contributor inserted malicious code over time, also underscores the vulnerability of trusted software sources. The threat comes not only from external attackers but also from compromised vendors and, in some cases, from code inserted by the manufacturer itself.
Transforming Security from Reactive to Proactive
Traditional binary reverse engineering is a specialized, time-consuming process typically employed only after a security incident. AI offers the potential to shift this paradigm, enabling proactive inspection of software at various stages: before deployment, during updates, or even years after release. This could fundamentally change how organizations approach software supply chain security, moving from incident response to continuous prevention.
AI as an Assistant, Not a Solution
Jacek Migdał, CEO of Quesma, noted that the ability of current large language models to detect malicious code at all was surprising. "At current performance levels, it’s an assistant, not a solution," Migdał stated. He expressed hope that future AI models will mature enough to make binary analysis mainstream, with BinaryAudit serving to track progress in this crucial area.
BinaryAudit is publicly available at https://quesma.com/benchmarks/binaryaudit/



