PhyCo: Bridging Physics and Video Generation

PhyCo introduces a framework for physically consistent and controllable video generation, overcoming limitations of current diffusion models through physics-supervised fine-tuning and VLM-guided rewards.

Conceptual overview of the PhyCo framework, highlighting its key components for physics-grounded video generation.

Modern video diffusion models, while adept at visual synthesis, falter in capturing the nuances of physical interactions. Objects drift unrealistically, collisions lack proper rebound physics, and material responses often defy physical laws. This gap limits their applicability in scenarios demanding verisimilitude.

Introducing PhyCo: Continuous, Grounded Physical Control

The PhyCo framework, detailed on its project page, addresses this critical limitation by injecting continuous, interpretable, and physically grounded control into video generation. This is achieved through a multi-pronged approach that leverages a novel dataset and innovative training methodologies. The researchers present a large-scale dataset comprising over 100,000 photorealistic simulation videos, systematically varying parameters like friction, restitution, deformation, and force across diverse scenarios. This dataset forms the bedrock for training models to understand and replicate physical behaviors.
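A dataset built this way amounts to a sweep over a small set of physical parameters. The sketch below shows what sampling such simulation configurations could look like; the parameter names follow the article, but the ranges, units, and function names are illustrative assumptions, not details from the paper.

```python
import random

# Hypothetical parameter ranges for one simulation video; the bounds
# and units are assumptions for illustration only.
PARAM_RANGES = {
    "friction":    (0.0, 1.0),    # sliding resistance coefficient
    "restitution": (0.0, 1.0),    # bounciness on collision
    "deformation": (0.0, 1.0),    # normalized material softness
    "force":       (0.0, 50.0),   # applied force magnitude
}

def sample_simulation_config(seed=None):
    """Draw one set of physical parameters for a simulation video."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in PARAM_RANGES.items()}

# Systematically cover the space at dataset scale (100,000+ videos).
configs = [sample_simulation_config(seed=i) for i in range(100_000)]
```

Seeding each draw keeps the sweep reproducible, so every rendered video can be traced back to its exact physical configuration.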


Physics-Supervised Fine-Tuning and VLM-Guided Optimization

At the core of PhyCo is a physics-supervised fine-tuning process. A pre-trained diffusion model is enhanced with a ControlNet conditioned on pixel-aligned physical property maps, allowing physical properties to flow directly into the generation process. This is complemented by VLM-guided reward optimization: a fine-tuned vision-language model assesses generated videos against targeted physics queries and provides differentiable feedback, so the generative model can iteratively improve its physical realism. Crucially, PhyCo can then produce physically consistent, controllable videos simply by varying these physical attributes, without requiring an explicit simulator or geometry reconstruction at inference time.
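The key idea of the conditioning step is that each physical property becomes a per-pixel map, stacked as channels and fed through a control branch whose output is added to the frozen backbone's features. The NumPy sketch below illustrates only this data flow; the resolution, channel counts, 1x1-convolution control branch, and constant-valued maps are simplifying assumptions, not PhyCo's actual architecture.

```python
import numpy as np

H, W = 64, 64
PROPS = ["friction", "restitution", "deformation", "force"]

def make_property_maps(values):
    """Stack one pixel-aligned map per physical property: (C, H, W).
    Here each map is constant; in practice maps vary per object/pixel."""
    return np.stack([np.full((H, W), values[p], dtype=np.float32)
                     for p in PROPS])

def control_branch(prop_maps, weight):
    """Toy 1x1 convolution mixing property channels into a feature
    residual, applied independently at every pixel."""
    # weight: (out_channels, in_channels) -> residual: (out, H, W)
    return np.einsum("oc,chw->ohw", weight, prop_maps)

rng = np.random.default_rng(0)
maps = make_property_maps({"friction": 0.3, "restitution": 0.8,
                           "deformation": 0.1, "force": 10.0})
residual = control_branch(
    maps, rng.normal(size=(8, len(PROPS))).astype(np.float32))
# This residual would be added to the diffusion backbone's feature
# maps, making generation steerable by the physical property values.
```

Changing a single entry, say friction, perturbs the residual everywhere that property applies, which is what makes the control continuous and interpretable rather than a discrete text prompt.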

Beyond Synthetic Data: Scalable, Generalizable Video Generation

The impact of PhyCo is demonstrated on the Physics-IQ benchmark, where it significantly surpasses strong existing baselines in physical realism. Human studies further validate the framework, confirming clearer and more faithful control over physical attributes. This work signals a scalable pathway toward generative video models that not only achieve physical consistency but also generalize effectively to real-world scenarios, moving beyond the limitations of purely synthetic training environments. PhyCo thus marks a significant step toward more believable and controllable AI-generated video.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.