PhyCo: Bridging Physics and Video Generation

PhyCo introduces a framework for physically consistent and controllable video generation, overcoming limitations of current diffusion models through physics-supervised fine-tuning and VLM-guided rewards.

Conceptual overview of the PhyCo framework, highlighting its key components for physics-grounded video generation.

Modern video diffusion models, while adept at visual synthesis, falter in capturing the nuances of physical interactions. Objects drift unrealistically, collisions lack proper rebound physics, and material responses often defy physical laws. This gap limits their applicability in scenarios demanding verisimilitude.

Introducing PhyCo: Continuous, Grounded Physical Control

The PhyCo framework, detailed on its project page, addresses this critical limitation by injecting continuous, interpretable, and physically grounded control into video generation. This is achieved through a multi-pronged approach that leverages a novel dataset and innovative training methodologies. The researchers present a large-scale dataset comprising over 100,000 photorealistic simulation videos, systematically varying parameters like friction, restitution, deformation, and force across diverse scenarios. This dataset forms the bedrock for training models to understand and replicate physical behaviors.
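A dataset built this way amounts to a sweep over a small set of physical parameters. The sketch below shows what sampling such simulation configurations could look like; the parameter names follow the article, but the ranges, units, and function names are illustrative assumptions, not details from the paper.

```python
import random

# Hypothetical parameter ranges for one simulation video; the bounds
# and units are assumptions for illustration only.
PARAM_RANGES = {
    "friction":    (0.0, 1.0),    # sliding resistance coefficient
    "restitution": (0.0, 1.0),    # bounciness on collision
    "deformation": (0.0, 1.0),    # normalized material softness
    "force":       (0.0, 50.0),   # applied force magnitude
}

def sample_simulation_config(seed=None):
    """Draw one set of physical parameters for a simulation video."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in PARAM_RANGES.items()}

# Systematically cover the space at dataset scale (100,000+ videos).
configs = [sample_simulation_config(seed=i) for i in range(100_000)]
```

Seeding each draw keeps the sweep reproducible, so every rendered video can be traced back to its exact physical configuration.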


Physics-Supervised Fine-Tuning and VLM-Guided Optimization

At the core of PhyCo is a physics-supervised fine-tuning process. A pre-trained diffusion model is enhanced with a ControlNet conditioned on pixel-aligned physical property maps, allowing physical properties to flow directly into the generation process. This is complemented by VLM-guided reward optimization: a fine-tuned vision-language model assesses generated videos against targeted physics queries and provides differentiable feedback, so the generative model can iteratively improve its physical realism. Crucially, PhyCo can then produce physically consistent, controllable videos simply by varying these physical attributes, without requiring an explicit simulator or geometry reconstruction at inference time.
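The key idea of the conditioning step is that each physical property becomes a per-pixel map, stacked as channels and fed through a control branch whose output is added to the frozen backbone's features. The NumPy sketch below illustrates only this data flow; the resolution, channel counts, 1x1-convolution control branch, and constant-valued maps are simplifying assumptions, not PhyCo's actual architecture.

```python
import numpy as np

H, W = 64, 64
PROPS = ["friction", "restitution", "deformation", "force"]

def make_property_maps(values):
    """Stack one pixel-aligned map per physical property: (C, H, W).
    Here each map is constant; in practice maps vary per object/pixel."""
    return np.stack([np.full((H, W), values[p], dtype=np.float32)
                     for p in PROPS])

def control_branch(prop_maps, weight):
    """Toy 1x1 convolution mixing property channels into a feature
    residual, applied independently at every pixel."""
    # weight: (out_channels, in_channels) -> residual: (out, H, W)
    return np.einsum("oc,chw->ohw", weight, prop_maps)

rng = np.random.default_rng(0)
maps = make_property_maps({"friction": 0.3, "restitution": 0.8,
                           "deformation": 0.1, "force": 10.0})
residual = control_branch(
    maps, rng.normal(size=(8, len(PROPS))).astype(np.float32))
# This residual would be added to the diffusion backbone's feature
# maps, making generation steerable by the physical property values.
```

Changing a single entry, say friction, perturbs the residual everywhere that property applies, which is what makes the control continuous and interpretable rather than a discrete text prompt.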

Beyond Synthetic Data: Scalable, Generalizable Video Generation

The impact of PhyCo is demonstrated on the Physics-IQ benchmark, where it significantly surpasses strong existing baselines in physical realism. Human studies further validate the framework, confirming clearer and more faithful control over physical attributes. This work signals a scalable pathway toward generative video models that not only achieve physical consistency but also generalize effectively to real-world scenarios, moving beyond the limitations of purely synthetic training environments. PhyCo thus marks a significant step toward more believable and controllable AI-generated video.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.