The dominant paradigm in AI today is language and code. But a new wave of progress is extending frontier AI into the physical world, marking the emergence of physical AI as a paradigm in its own right. This shift is fueled by advances in robot learning, autonomous science, and novel human-computer interfaces, according to an analysis from the a16z Blog.
These fields are maturing concurrently, with talent, capital, and founder activity on the rise. The pace of progress suggests these areas could soon enter their own scaling regime, inheriting infrastructure and research momentum from current AI frontiers while requiring significant new development.
Three domains fit this description: robot learning, autonomous science (particularly in materials and life sciences), and new human-machine interfaces. These areas are not isolated; they share foundational technical primitives and are mutually reinforcing.
Foundational Primitives
Several core technologies underpin this expansion into the physical world.
Learned Representations of Physical Dynamics
The ability to learn compressed models of physical behavior—how objects move, deform, and collide—is crucial. Vision-Language-Action (VLA) models extend pre-trained vision-language models with action decoders. World Action Models (WAMs) build on video diffusion transformers to learn physical priors. Generalist's GEN-1 takes a different approach, training a native embodied foundation model from scratch on real-world physical interaction data.
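The VLA pattern described above can be sketched in a few lines: a frozen pre-trained backbone produces a fused vision-language embedding, and a small action decoder maps that embedding to continuous motor commands. Everything here (shapes, the stand-in backbone, the linear decoder) is an illustrative assumption, not the architecture of any real VLA model.

```python
# Minimal sketch of a Vision-Language-Action (VLA) head.
# The backbone and decoder below are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 32   # size of the (frozen) VLM embedding, hypothetical
ACTION_DIM = 7   # e.g. 6-DoF end-effector delta + gripper, hypothetical

# Stand-in for the frozen vision-language backbone: in practice this is a
# large pre-trained model; here it is a fixed random projection.
W_backbone = rng.normal(size=(64, EMBED_DIM))

def vlm_embed(observation: np.ndarray) -> np.ndarray:
    """Map a raw observation vector to a fused vision-language embedding."""
    return np.tanh(observation @ W_backbone)

# The action decoder is the only new, trainable part in this sketch.
W_action = rng.normal(size=(EMBED_DIM, ACTION_DIM)) * 0.1

def vla_policy(observation: np.ndarray) -> np.ndarray:
    """Decode a continuous action from the backbone embedding."""
    return vlm_embed(observation) @ W_action

obs = rng.normal(size=64)   # placeholder for fused image+text features
action = vla_policy(obs)
print(action.shape)         # (7,) continuous action vector
```

The key design point is that the expensive backbone is reused as-is, so the action decoder inherits the backbone's visual and linguistic generalization while only the mapping to motor commands is learned.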
Spatial intelligence models are also vital: they reconstruct and reason about the 3D structure of physical environments, a capability that VLAs and WAMs currently lack. Convergence across these approaches aims to produce transferable models of physical behavior.
Architectures for Embodied Action
Translating physical understanding into reliable action requires architectures that map intent to motor commands, maintain coherence over long horizons, and operate within real-time constraints. A dual-system hierarchical architecture, separating reasoning from real-time control, is emerging as a standard design pattern.
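The dual-system pattern can be illustrated with two nested loops: a slow deliberative layer that emits subgoals at low frequency, and a fast reactive layer that tracks the current subgoal at every control tick. The 1-D dynamics, rates, and gains below are toy assumptions, not any specific robot stack.

```python
# Sketch of a dual-system hierarchical controller:
# slow planner (low frequency) + fast controller (every tick).
PLAN_EVERY = 10   # planner runs once per 10 control ticks (hypothetical)

def slow_planner(state: float, goal: float) -> float:
    """Deliberative layer: pick an intermediate subgoal toward the goal."""
    return state + 0.5 * (goal - state)

def fast_controller(state: float, subgoal: float) -> float:
    """Reactive layer: simple proportional command toward the subgoal."""
    return 0.2 * (subgoal - state)

state, goal, subgoal = 0.0, 10.0, 0.0
for tick in range(100):
    if tick % PLAN_EVERY == 0:                 # slow loop: re-plan
        subgoal = slow_planner(state, goal)
    state += fast_controller(state, subgoal)   # fast loop: every tick

print(round(state, 2))   # state has converged close to the goal of 10.0
```

The separation lets the reasoning layer run a large, slow model while the control layer keeps a hard real-time budget, since the controller only needs the latest subgoal, not the planner's full deliberation.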
Action generation is evolving rapidly, with flow matching and diffusion-based methods producing smoother, high-frequency continuous actions. A significant development is extending reinforcement learning to pre-trained VLAs, allowing foundation models to improve through autonomous practice and self-correction.
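The flow-matching objective behind these continuous action generators is compact enough to show directly: interpolate between a noise sample and a demonstrated action, and regress a velocity field toward their difference. The linear "network" and dimensions below are simplified assumptions for illustration.

```python
# Toy conditional flow-matching loss for action generation.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 4

def flow_matching_loss(predict_velocity, x1: np.ndarray) -> float:
    """One-sample flow-matching loss for a demonstrated action x1."""
    x0 = rng.normal(size=ACTION_DIM)   # noise sample
    t = rng.uniform()                  # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1         # point on the straight-line path
    target = x1 - x0                   # constant velocity along that path
    v = predict_velocity(xt, t)
    return float(np.mean((v - target) ** 2))

# Hypothetical stand-in for the learned velocity network.
W = rng.normal(size=(ACTION_DIM + 1, ACTION_DIM)) * 0.1
def predict_velocity(xt, t):
    return np.concatenate([xt, [t]]) @ W

demo_action = np.array([0.1, -0.3, 0.5, 0.0])
print(flow_matching_loss(predict_velocity, demo_action))
```

At inference time, actions are generated by integrating the learned velocity field from noise at t=0 to t=1, which is what yields the smooth, high-frequency continuous trajectories mentioned above.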
Simulation and Synthetic Data
The data challenge in physical AI is immense. Simulation and synthetic data generation are key infrastructure for overcoming the cost and limitations of real-world data collection. The modern simulation stack combines physics engines, photorealistic rendering, and world foundation models.
Improvements in simulation are changing the economics of physical AI, making data generation scale with compute rather than human labor. This infrastructure is also crucial for autonomous science and new interfaces.
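The compute-versus-labor point can be made concrete with a minimal synthetic-data sketch: roll out a toy point-mass under physics parameters randomized per episode (a simple form of domain randomization), so that generating more transitions costs only compute. The dynamics and parameter ranges are illustrative assumptions, not any particular simulator's API.

```python
# Synthetic data generation with per-episode domain randomization.
import numpy as np

rng = np.random.default_rng(0)

def rollout(steps: int = 20):
    """One synthetic episode of a toy point-mass with randomized physics."""
    mass = rng.uniform(0.5, 2.0)        # randomized per episode
    friction = rng.uniform(0.05, 0.3)   # randomized per episode
    pos, vel = 0.0, 0.0
    data = []
    for _ in range(steps):
        force = rng.uniform(-1.0, 1.0)             # random action
        accel = (force - friction * vel) / mass    # toy dynamics
        new_vel = vel + 0.1 * accel
        new_pos = pos + 0.1 * new_vel
        data.append((pos, vel, force, new_pos, new_vel))
        pos, vel = new_pos, new_vel
    return data

# Doubling the dataset is one more loop iteration, not more teleoperation.
dataset = [t for _ in range(100) for t in rollout()]
print(len(dataset))   # 2000 transitions
```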
Expanding the Sensory Manifold
The physical world offers richer signals than just vision and language. Touch, neural signals, and subvocal muscle activity provide critical data. The expansion of AI's sensory access to these modalities is driven by new devices and software infrastructure for capturing and processing these signals.
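One recurring infrastructure task when adding modalities like touch is aligning asynchronous sensor streams, e.g. pairing each lower-rate camera frame with the nearest high-rate tactile sample, before fusing them into training examples. The sample rates and nearest-timestamp scheme below are hypothetical illustrations.

```python
# Aligning asynchronous sensor streams by nearest timestamp.
import numpy as np

# Tactile at 200 Hz, vision at 30 Hz (timestamps in seconds, hypothetical).
tactile_ts = np.arange(200) / 200
vision_ts = np.arange(30) / 30

def align_nearest(query_ts: np.ndarray, ref_ts: np.ndarray) -> np.ndarray:
    """For each query timestamp, index of the nearest reference sample."""
    idx = np.searchsorted(ref_ts, query_ts)
    idx = np.clip(idx, 1, len(ref_ts) - 1)
    left = ref_ts[idx - 1]
    right = ref_ts[idx]
    # Step back one index where the left neighbor is strictly closer.
    idx -= (query_ts - left) < (right - query_ts)
    return idx

pairs = align_nearest(vision_ts, tactile_ts)   # one tactile index per frame
print(len(pairs))   # 30, one aligned tactile sample per camera frame
```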
