Artificial intelligence is moving beyond text comprehension toward agents that actively interact with software. SCUBA (Salesforce Computer Use Benchmark) is a new benchmark for evaluating these agentic AI systems in the complex environment of enterprise software. It reflects a shift in emphasis: the question is no longer only whether an AI can understand a business application, but whether it can actually operate one. For agentic AI on the Salesforce platform, the benchmark provides a concrete way to measure how much of that work can be automated.
Unlike more general AI benchmarks, SCUBA is built around the actual workflows of the Salesforce platform. It comprises over 300 task instances, all derived from interviews with real users, including platform administrators, sales representatives, and service agents. Rather than testing simple question answering, SCUBA evaluates whether an agent can navigate user interfaces, manipulate data, trigger multi-step workflows, and troubleshoot issues in a realistic enterprise environment. In doing so, it addresses a long-standing gap in AI evaluation, moving beyond basic web navigation to the specific demands of business-critical software.
