A new benchmark from Quesma, a company specializing in AI model evaluation, highlights the nascent capabilities of artificial intelligence in safeguarding software supply chains. The benchmark, BinaryAudit, was developed with input from reverse engineering expert Michał "Redford" Kowalczyk and tests whether AI models can find hidden threats within software binaries.
AI's Limited Success in Detecting Threats
The results indicate that while AI models can identify some malicious code, their effectiveness is currently limited. The top-performing model, Claude Opus 4.6, achieved only a 49% success rate in detecting threats. The models also often flagged legitimate software as dangerous, a false-positive problem common in early AI security applications.
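The two failure modes described here, missed threats and false alarms, are usually tracked as separate rates. The following is a minimal sketch of that arithmetic, not BinaryAudit's actual scoring code: the `AuditCounts` type, its field names, and the counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Outcome counts for one model over a set of binaries (hypothetical schema)."""
    true_positives: int   # tampered binaries correctly flagged
    false_negatives: int  # tampered binaries the model missed
    false_positives: int  # clean binaries wrongly flagged as dangerous
    true_negatives: int   # clean binaries correctly passed


def detection_rate(c: AuditCounts) -> float:
    """Share of tampered binaries that were flagged."""
    total = c.true_positives + c.false_negatives
    return c.true_positives / total if total else 0.0


def false_positive_rate(c: AuditCounts) -> float:
    """Share of clean binaries that were wrongly flagged."""
    total = c.false_positives + c.true_negatives
    return c.false_positives / total if total else 0.0


# Toy counts for illustration only; these are NOT benchmark results.
example = AuditCounts(true_positives=3, false_negatives=1,
                      false_positives=1, true_negatives=3)
print(f"detection rate: {detection_rate(example):.0%}")
print(f"false positive rate: {false_positive_rate(example):.0%}")
```

Reporting the two rates side by side matters because a model can raise its detection rate simply by flagging more binaries, at the cost of more false alarms on legitimate software.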
