Modern video diffusion models, while adept at visual synthesis, falter in capturing the nuances of physical interactions. Objects drift unrealistically, collisions lack convincing rebound dynamics, and material responses often defy physical laws. This gap limits their usefulness in applications that demand physical realism.
Introducing PhyCo: Continuous, Grounded Physical Control
The PhyCo framework, detailed on its project page, addresses this limitation by injecting continuous, interpretable, and physically grounded control signals into video generation. The approach rests on two pillars: a new dataset and training methods tailored to it. The researchers present a large-scale dataset of more than 100,000 photorealistic simulation videos in which physical parameters such as friction, restitution, deformation, and applied force are varied systematically across diverse scenarios. This dataset is the foundation for training models to understand and reproduce physical behavior.
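To make the idea of a systematic parameter sweep concrete, here is a minimal sketch of how per-clip simulation configurations could be sampled. It is illustrative only: the parameter names, value ranges, and the `SimConfig` / `sample_configs` helpers are assumptions for this example, not part of the PhyCo release.

```python
from dataclasses import dataclass
import random


@dataclass
class SimConfig:
    """One sampled set of physical parameters for a single simulation clip."""
    friction: float      # Coulomb friction coefficient (assumed range 0-1)
    restitution: float   # bounciness: 0 = fully inelastic, 1 = perfectly elastic
    deformation: float   # softness scale for deformable materials (assumed 0-1)
    force: float         # magnitude of the external force applied, in newtons
    scenario: str        # which scene template the clip is rendered from


def sample_configs(n: int, scenarios: list[str], seed: int = 0) -> list[SimConfig]:
    """Draw n parameter sets, varying each physical quantity independently."""
    rng = random.Random(seed)
    return [
        SimConfig(
            friction=rng.uniform(0.0, 1.0),
            restitution=rng.uniform(0.0, 1.0),
            deformation=rng.uniform(0.0, 1.0),
            force=rng.uniform(0.0, 50.0),
            scenario=rng.choice(scenarios),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical scene templates; each sampled config would drive one rendered clip.
    configs = sample_configs(3, scenarios=["rolling_ball", "block_collision", "cloth_drop"])
    for cfg in configs:
        print(cfg)
```

Sampling each quantity independently is one plausible way to ensure the dataset covers the full range of physical behaviors rather than clustering around a few default material settings.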