Many essential manipulation tasks, from food preparation to surgery and craftsmanship, remain beyond the reach of autonomous robots. These tasks are difficult not only because they involve contact-rich, force-sensitive dynamics but also because their success criteria are inherently "implicit". Unlike simple pick-and-place operations, the quality of these tasks is often continuous and subjective, making quantitative evaluation and reward engineering a significant challenge. This research presents a learning framework designed to tackle these complex manipulation domains, using peeling with a knife as a representative example. The work is detailed in a paper available on arXiv.
What the Researchers Did
The proposed approach follows a two-stage pipeline. First, the system learns a robust base policy through force-aware data collection and imitation learning; this initial phase allows the robot to generalize across variations in objects. Second, the policy is refined with preference-based finetuning. This refinement leverages a learned reward model that integrates quantitative task metrics with qualitative human feedback, aligning the robot's behavior with human perceptions of task quality. The method is particularly relevant for tasks that demand nuanced control and interpretation of subtle feedback, a key challenge in force-sensitive robotic manipulation.
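To make the second stage concrete, a common way to train a reward model from pairwise human preferences is the Bradley-Terry logistic loss: the model assigns each trajectory a scalar reward, and the probability that humans prefer trajectory A over B is a sigmoid of the reward difference. The sketch below is illustrative, not the paper's implementation: the linear reward model, the trajectory feature names, and the synthetic preference data are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch: a linear reward model over trajectory features
# (e.g. mean contact force, peel continuity), trained on pairwise human
# preferences via the Bradley-Terry logistic loss. All data is synthetic.

rng = np.random.default_rng(0)

def reward(w, features):
    """Scalar reward: linear in the trajectory feature vector."""
    return features @ w

def preference_loss(w, feats_a, feats_b, prefs):
    """Bradley-Terry negative log-likelihood.
    prefs[i] = 1 if trajectory A was preferred over B, else 0."""
    logits = reward(w, feats_a) - reward(w, feats_b)
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(prefs * np.log(p + eps) + (1 - prefs) * np.log(1 - p + eps))

# Synthetic preference data generated by a hidden "true" weight vector.
true_w = np.array([1.5, -2.0, 0.5])
feats_a = rng.normal(size=(200, 3))
feats_b = rng.normal(size=(200, 3))
prefs = (reward(true_w, feats_a) > reward(true_w, feats_b)).astype(float)

# Plain gradient descent on the preference loss.
w = np.zeros(3)
lr = 0.5
for _ in range(500):
    logits = reward(w, feats_a) - reward(w, feats_b)
    p = 1.0 / (1.0 + np.exp(-logits))
    # Gradient of the logistic loss w.r.t. w: (p - y) * (features_a - features_b).
    grad = ((p - prefs)[:, None] * (feats_a - feats_b)).mean(axis=0)
    w -= lr * grad

# The learned weights should rank trajectory pairs like the hidden ones do.
agreement = np.mean(
    (reward(w, feats_a) > reward(w, feats_b)) == prefs.astype(bool)
)
print(f"preference agreement: {agreement:.2f}")
```

In the paper's setting, the reward model additionally folds in quantitative task metrics alongside these human preference labels; the fitted reward then serves as the training signal for finetuning the imitation-learned policy.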