Salesforce AI Research and UC Berkeley have unveiled BFCL Audio, a new benchmark designed to evaluate how precisely AI models handle audio-native function calls. In a blog announcement, the collaborators detailed how this extension of the existing Berkeley Function Calling Leaderboard (BFCL) framework addresses challenges in real-world voice interactions, particularly for enterprise applications where accuracy is paramount.
The initiative stems from a recognized gap in evaluating whether AI models can reliably execute zero-shot function calls, a problem the team first tackled in 2022 with the Gorilla OpenFunctions models. The original BFCL benchmark evolved through several versions, from abstract syntax tree (AST)-based evaluation to multi-turn and agentic settings, becoming a foundational tool for text-based function calling. BFCL Audio now extends that evaluation to the voice domain, acknowledging that real-world products rarely operate in pure text.
