The current paradigm for training robotic manipulation policies relies heavily on simulation, but existing platforms ignore object affordance information. Without it, they cannot automatically generate semantically correct trajectories for tasks that require precise interaction with functional regions, such as grasping a mug by its handle or pouring from a cup's rim. To address this limitation, researchers have introduced AffordSim, which they describe as the first simulation framework to integrate open-vocabulary 3D affordance prediction into manipulation data generation.
Unlocking Semantically Rich Trajectories with VoxAfford
AffordSim is built around VoxAfford, the authors' own open-vocabulary 3D affordance detector. VoxAfford augments the output tokens of a multimodal large language model (MLLM) with multi-scale geometric features to predict affordance maps on object point clouds. These maps steer grasp pose estimation toward task-relevant functional regions, such as a mug's handle rather than its body. The framework runs on NVIDIA Isaac Sim, supports cross-embodiment configurations (Franka FR3, Panda, UR5e, Kinova), and uses a VLM to generate tasks. For domain randomization, it uses DA3-based 3D Gaussian reconstruction from real photographs, which is intended to make the generated data more realistic and easier to transfer to physical robots.
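The paragraph above describes two steps: scoring each point of an object's point cloud for a queried affordance, then preferring grasps that land on high-scoring regions. The following is a minimal NumPy sketch of that idea under stated assumptions. It is not VoxAfford's architecture. The k-nearest-neighbor mean pooling (as a stand-in for multi-scale geometric features), the cosine-similarity scoring against a projected query token (as a stand-in for the MLLM output token), and the radius-based grasp ranking are all illustrative choices. Every function and parameter name (`multiscale_features`, `affordance_map`, `select_grasp`, `ks`, `radius`, `proj`) is hypothetical.

```python
import numpy as np

def multiscale_features(points: np.ndarray, base_feats: np.ndarray, ks=(4, 8, 16)) -> np.ndarray:
    """Concatenate per-point features averaged over k-NN neighborhoods of several sizes.

    Hypothetical stand-in for "multi-scale geometric features"; brute-force O(N^2) distances
    are fine for toy clouds but not for real scans.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    order = np.argsort(d2, axis=1)  # row i: neighbor indices of point i, nearest (itself) first
    scales = [base_feats[order[:, :k]].mean(axis=1) for k in ks]  # one (N, F) array per scale
    return np.concatenate(scales, axis=1)  # (N, F * len(ks))

def affordance_map(point_feats: np.ndarray, query_token: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Score each point in [0, 1] by cosine similarity to a projected query embedding."""
    q = proj @ query_token  # map the (assumed) token embedding into point-feature space
    q = q / (np.linalg.norm(q) + 1e-8)
    f = point_feats / (np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8)
    return 1.0 / (1.0 + np.exp(-10.0 * (f @ q)))  # sigmoid; the factor 10 is an arbitrary temperature

def select_grasp(points: np.ndarray, affordance: np.ndarray,
                 grasp_positions: np.ndarray, radius: float = 0.02):
    """Return (index, score) of the grasp whose neighborhood has the highest mean affordance."""
    best_idx, best_score = -1, -np.inf
    for i, g in enumerate(grasp_positions):
        near = np.linalg.norm(points - g, axis=1) < radius  # points within `radius` of the grasp
        if not near.any():
            continue  # grasp is not on the object; skip it
        score = affordance[near].mean()
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Toy "mug": a body cluster at the origin and a handle cluster 10 cm away along x.
rng = np.random.default_rng(0)
body = rng.uniform(-0.01, 0.01, size=(50, 3))
handle = rng.uniform(-0.01, 0.01, size=(20, 3)) + [0.1, 0.0, 0.0]
points = np.vstack([body, handle])

# Untrained pipeline: random query token and projection, shown only for shapes and value ranges.
feats = multiscale_features(points, points.copy())  # base features are just xyz here
query_token = rng.normal(size=16)
proj = rng.normal(size=(feats.shape[1], 16))
aff = affordance_map(feats, query_token, proj)

# Grasp ranking with a hand-made "handle" affordance map (0.1 on the body, 0.9 on the handle).
oracle_aff = np.concatenate([np.full(50, 0.1), np.full(20, 0.9)])
grasps = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])  # candidate 0 on the body, 1 on the handle
best, score = select_grasp(points, oracle_aff, grasps)
```

With the hand-made map, the ranking picks the handle grasp (index 1). In a real system the affordance map would come from a trained model and the grasp candidates from a grasp pose estimator. Re-ranking candidates by affordance, rather than generating grasps only inside the affordance region, is just one plausible way to "guide" estimation.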
Benchmarking Affordance-Demanding Manipulation Tasks
The researchers established a benchmark of 50 tasks across seven categories: grasping, placing, stacking, pushing/pulling, pouring, mug hanging, and long-horizon composite tasks. They evaluated four imitation learning baselines (BC, Diffusion Policy, ACT, Pi 0.5). Grasping tasks reached comparatively high success rates (53-93%), but tasks that depend on specific affordances remain substantially harder: pouring into narrow containers succeeded in 1-43% of trials and mug hanging in 0-47%. These gaps underscore the need for affordance-aware data generation, which is the capability AffordSim is designed to provide. Zero-shot sim-to-real experiments on a Franka FR3 robot indicated that the generated data transfers to real hardware, supporting the framework's practical value.