Microsoft Research has unveiled AsgardBench, a new benchmark that tests whether AI agents can plan tasks and revise those plans based on visual input. It addresses a gap in the evaluation of embodied AI, where agents must perceive and interact with their environment.
Unlike previous benchmarks, which often bundle perception, navigation, and control, AsgardBench isolates visually grounded interactive planning. It challenges AI agents to adjust their actions in simulated household tasks when visual observations contradict their initial assumptions. That ability is vital for robots and AI systems that must operate in the unpredictable real world.
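
To make the idea of adjusting a plan when observations contradict assumptions concrete, here is a minimal toy sketch in Python. It is not AsgardBench's code or API. The `ToyKitchen` class, the `plan` and `run_episode` functions, and the mug-washing task are all hypothetical, and a small dictionary stands in for what a real agent would infer from an image.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical expected effect of each action on the agent's belief state.
EFFECTS = {"wash_mug": ("mug_clean", True), "fill_mug": ("mug_filled", True)}


@dataclass
class ToyKitchen:
    """Hypothetical stand-in for a simulated household scene."""
    mug_clean: bool = False
    mug_filled: bool = False

    def observe(self) -> dict:
        # Stand-in for a visual observation; a real agent would parse an image.
        return {"mug_clean": self.mug_clean, "mug_filled": self.mug_filled}

    def step(self, action: str) -> None:
        if action == "wash_mug":
            self.mug_clean = True
        elif action == "fill_mug" and self.mug_clean:
            # Filling only succeeds once the mug is clean.
            self.mug_filled = True


def plan(belief: dict) -> list[str]:
    """Build a plan from whatever the agent currently believes."""
    steps = []
    if not belief["mug_clean"]:
        steps.append("wash_mug")
    if not belief["mug_filled"]:
        steps.append("fill_mug")
    return steps


def run_episode(env: ToyKitchen, initial_belief: dict, max_steps: int = 10):
    """Execute a plan, replanning whenever an observation contradicts the belief."""
    belief = dict(initial_belief)
    actions = plan(belief)
    executed, replans = [], 0
    for _ in range(max_steps):
        observation = env.observe()
        if observation != belief:
            # The scene does not match the assumption: trust the observation.
            belief = observation
            actions = plan(belief)
            replans += 1
        if not actions:
            break
        action = actions.pop(0)
        env.step(action)
        executed.append(action)
        key, value = EFFECTS[action]
        belief[key] = value  # record the effect the agent expects
    return executed, replans


if __name__ == "__main__":
    # The agent wrongly assumes the mug is already clean.
    wrong_belief = {"mug_clean": True, "mug_filled": False}
    print(run_episode(ToyKitchen(), wrong_belief))
```

In this sketch the agent starts with a one-step plan, sees that the mug is actually dirty, and inserts a washing step before filling it. A benchmark in this spirit would score whether the agent notices the mismatch and recovers, not whether it can navigate or manipulate objects.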
