As AI agents increasingly handle tasks like scheduling and purchasing, competence alone is not enough; they also need social reasoning. Microsoft Research has introduced SocialReasoning-Bench, a benchmark designed to measure this ability.
The benchmark evaluates how well AI agents negotiate on behalf of users in realistic scenarios, specifically Calendar Coordination and Marketplace Negotiation. It assesses both the final outcome and the process used to reach it, scoring agents on 'Outcome Optimality' and 'Due Diligence'.
