Truly intelligent personal AI agents must move beyond stateless instruction following and reason over a user's unique identity, history, and preferences. Current benchmarks fall short here: they operate in impersonal sandboxes that do not reflect the rich, interconnected data residing on a user's device.
Bridging the Personalization Chasm with iOSWorld
To address this gap, researchers have introduced iOSWorld, the first interactive, native iOS simulator benchmark. The environment is built around a persistent user identity and comprises 26 newly developed iOS apps. The apps share interconnected data, including transactions, messages, travel records, social connections, and financial activity, which together simulate a user's digital life. iOSWorld contains 133 tasks across three difficulty tiers: 27 single-app tasks, 60 multi-app tasks spanning 2 to 8 apps each, and 46 memory and personalization tasks that require inference from personal data.
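The tier breakdown can be checked with a few lines of Python. The sketch below is illustrative only: the `Tier` class, its field names, and the helper functions are hypothetical and are not part of iOSWorld. Only the task counts and the 2-to-8 app range come from the description above; the app range for the memory and personalization tier is not stated, so it is left unset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tier:
    """Hypothetical record for one difficulty tier (not an iOSWorld type)."""
    name: str
    num_tasks: int
    min_apps: Optional[int]  # None means the text does not state it
    max_apps: Optional[int]


# Counts taken from the article; everything else is an assumption.
TIERS: List[Tier] = [
    Tier("single-app", 27, 1, 1),
    Tier("multi-app", 60, 2, 8),
    Tier("memory-and-personalization", 46, None, None),
]


def total_tasks(tiers: List[Tier]) -> int:
    """Sum the task counts across all tiers."""
    return sum(t.num_tasks for t in tiers)


def tier_shares(tiers: List[Tier]) -> Dict[str, float]:
    """Return each tier's share of all tasks, in percent, to one decimal."""
    total = total_tasks(tiers)
    return {t.name: round(100 * t.num_tasks / total, 1) for t in tiers}


if __name__ == "__main__":
    print(total_tasks(TIERS))  # 133, matching the stated total
    for name, share in tier_shares(TIERS).items():
        print(f"{name}: {share}%")
```

The three counts sum to the stated 133. Multi-app tasks make up roughly 45% of the benchmark, memory and personalization tasks about 35%, and single-app tasks about 20%.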