Mira Murati's Interaction Model: Full-Duplex AI at 0.4 Seconds

Thinking Machines Lab's TML-Interaction-Small processes audio and video in 200ms chunks, responds in 0.4 seconds, and runs full-duplex without a VAD harness. Here is how the architecture works and what Murati said at Bloomberg Tech.

Jun 19 at 8:00 AM · 6 min read

Mira Murati at the 2026 Met Gala. Photo by SWinxy, via Wikimedia Commons (CC BY 4.0)

On May 11, 2026, Thinking Machines Lab released TML-Interaction-Small in research preview. The 276-billion-parameter model processes audio, video, and text in continuous 200-millisecond chunks and returns a response in 0.40 seconds, matching the typical gap between human conversational turns, according to the company’s technical disclosure as reported by MarkTechPost on May 13, 2026.


Mira Murati served as chief technology officer at OpenAI until September 2024, then co-founded Thinking Machines Lab in February 2025 with chief scientist John Schulman and four other OpenAI alumni. The lab secured a $2 billion seed round at a $12 billion valuation in July 2025, the largest seed round on record per TechCrunch, and TML-Interaction-Small is its first published model.

A New Category: What “Interaction Models” Actually Are

“Interaction models” is the term Thinking Machines Lab uses for systems built from scratch to treat real-time conversation as a first architectural principle, not a capability retrofitted onto a text-based foundation. The distinction matters because most voice AI deployed today is exactly that: a large language model with a speech-to-text front end and a text-to-speech back end, coordinated by voice-activity detection software. Murati’s team discarded the VAD layer entirely.

TML-Interaction-Small is a mixture-of-experts model with 276 billion total parameters, of which 12 billion are active at inference time. The model processes incoming audio, video, and text as a continuous stream divided into 200-millisecond chunks. Each chunk is processed as it arrives rather than after the user finishes speaking. Semafor reported on May 13, 2026 that the model reached 0.40 seconds of end-to-end response latency, roughly equivalent to the beat a speaker leaves before replying in ordinary conversation.
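The chunked streaming described above can be illustrated with a short sketch. It is not the lab's implementation: the 16 kHz sample rate, the function names, and the per-chunk energy computation (a stand-in for whatever work the model does on each chunk) are assumptions made for illustration. Only the 200 ms chunk size and the 0.40 s latency come from the reporting.

```python
from typing import Iterator, List

CHUNK_MS = 200            # chunk duration reported in the article
LATENCY_MS = 400          # end-to-end response latency reported in the article
SAMPLE_RATE_HZ = 16_000   # assumption: the article gives no audio sample rate


def samples_per_chunk(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples that fit in one chunk."""
    return sample_rate_hz * chunk_ms // 1000


def iter_chunks(samples: List[float], chunk_len: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def stream_energy(samples: List[float], chunk_len: int) -> List[float]:
    """Hypothetical per-chunk work: produce a result as each chunk arrives
    instead of waiting for the whole utterance to finish."""
    energies = []
    for chunk in iter_chunks(samples, chunk_len):
        energies.append(sum(x * x for x in chunk) / len(chunk))
    return energies
```

Under these assumptions, one second of audio splits into five chunks of 3,200 samples each, and the reported 0.40 s latency corresponds to two chunk durations.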

The MoE design keeps compute costs tractable despite the large total parameter count. At 12 billion active parameters per forward pass, TML-Interaction-Small sits in a similar inference-compute tier to other deployed mixture-of-experts systems, while the 276 billion total parameters provide the representational breadth that continuous multimodal processing demands.
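To make the total-versus-active distinction concrete, here is a minimal sketch of top-k expert routing, a common mixture-of-experts technique. The article does not say how many experts the model has, how many are selected per token, or which routing scheme it uses, so `route_top_k`, the gate logits, and the choice of k are all illustrative assumptions. Only the 276B and 12B figures come from the reporting.

```python
import math
from typing import List, Tuple

TOTAL_PARAMS_B = 276   # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = 12   # active parameters per forward pass (from the article)


def active_fraction(active: float = ACTIVE_PARAMS_B,
                    total: float = TOTAL_PARAMS_B) -> float:
    """Share of the parameters that take part in a single forward pass."""
    return active / total


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a small list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights.

    Hypothetical router: only the selected experts would run, which is why
    the active parameter count stays far below the total.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))
```

With the reported figures, `active_fraction()` comes out at about 4.3 percent, meaning roughly one parameter in 23 is used for any given forward pass.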

Figure: Horizontal bar chart of TML-Interaction-Small parameter scale, 276B total versus 12B active via mixture-of-experts routing. Source: MarkTechPost, May 2026.

How TML Differs From Turn-Based Voice Models

The operational gap between TML-Interaction-Small and most deployed voice AI is the distinction between full-duplex and half-duplex conversation. Standard voice models operate on a turn-completion model: the system waits for the user to stop speaking, then processes the full utterance and generates a reply. Interrupting the model mid-response typically forces a restart of the generation cycle, adding several hundred milliseconds to the next exchange.

TML-Interaction-Small is full-duplex. The interaction model listens and generates simultaneously rather than alternating. If a user cuts into a response mid-sentence, the system can acknowledge and redirect without resetting the full generation cycle. The AI Insider reported on May 12, 2026 that Murati’s team describes this as “micro-turn” processing, where turn boundaries are computed continuously rather than detected at the end of an utterance.
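The difference between the two modes can be shown with a toy tick-based simulation, where each list element stands for one 200 ms slot. This is a sketch under stated assumptions, not a description of how either kind of system is built: the function names, the `ack:` redirect, and the rule that any user speech during a reply counts as an interruption are all hypothetical simplifications.

```python
from typing import List, Optional, Tuple

Chunk = Optional[str]  # None means silence in that 200 ms slot


def half_duplex(user: List[Chunk],
                reply: List[str]) -> Tuple[List[Chunk], List[str]]:
    """While a reply is playing, incoming chunks are not processed."""
    pending = list(reply)
    spoken: List[Chunk] = []
    unheard: List[str] = []
    for heard in user:
        if pending:
            spoken.append(pending.pop(0))
            if heard is not None:
                unheard.append(heard)  # speech the system talked over
        else:
            spoken.append(None)
    return spoken, unheard


def full_duplex(user: List[Chunk], reply: List[str]) -> List[Chunk]:
    """Every tick reads one input chunk and emits at most one output chunk."""
    pending = list(reply)
    spoken: List[Chunk] = []
    for heard in user:
        if heard is not None and pending:
            # Barge-in: drop the rest of the reply and redirect in this tick.
            pending = [f"ack:{heard}"]
        spoken.append(pending.pop(0) if pending else None)
    return spoken
```

Given a five-chunk reply and a user who says "wait" in the third slot, the half-duplex function plays the whole reply and records "wait" as unheard, while the full-duplex function stops after two chunks and acknowledges the interruption in the same slot it arrived.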

The internal architecture runs two components in parallel. A lightweight interaction model runs persistently in the foreground, tracking all audio, video, and text streams and maintaining conversational state. A heavier background model handles longer-horizon reasoning and tool use asynchronously, sharing full conversation context with the foreground model throughout. The background component offloads complex reasoning without introducing the latency penalty that would come from routing every exchange through a full reasoning pass.
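One way to picture the foreground/background split is with two cooperating coroutines that share a context list. The sketch below is an assumption-heavy illustration, not the lab's design: `foreground`, `background_reasoner`, the rule that a chunk ending in "?" triggers background work, and the use of `asyncio.sleep(0)` to stand in for slow reasoning are all hypothetical.

```python
import asyncio
from typing import List


async def background_reasoner(context: List[str], results: asyncio.Queue,
                              steps: int) -> None:
    """Stand-in for slow reasoning: yields to the event loop `steps` times."""
    for _ in range(steps):
        await asyncio.sleep(0)
    # `context` is the same list object the foreground keeps appending to,
    # so the background sees chunks that arrived after it started.
    await results.put(f"answer using {len(context)} context items")


async def foreground(chunks: List[str], steps: int = 2) -> List[str]:
    """Handle every chunk as it arrives; never wait on the background."""
    context: List[str] = []
    results: asyncio.Queue = asyncio.Queue()
    log: List[str] = []
    task = None
    for chunk in chunks:
        context.append(chunk)
        log.append(f"heard:{chunk}")
        if chunk.endswith("?") and task is None:
            # Hand the hard question to the background and keep listening.
            task = asyncio.create_task(
                background_reasoner(context, results, steps))
        if not results.empty():
            log.append(results.get_nowait())  # surface a finished answer
        await asyncio.sleep(0)  # move on to the next chunk
    if task is not None:
        await task  # stream ended: collect any answer still in flight
        while not results.empty():
            log.append(results.get_nowait())
    return log
```

Running `asyncio.run(foreground(["hi", "what is x?", "um", "so", "yeah", "ok"]))` logs every chunk as heard, in order, plus a single background answer. The point of the design is that the foreground loop checks for finished results without blocking, so a long reasoning pass does not delay chunk handling.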

Figure: Bar chart of the TML-Interaction-Small timing pipeline, with 200ms processing chunks and 0.40s (400ms) total response latency. Source: MarkTechPost, Semafor, May 2026.

Bloomberg Tech, the Nvidia Deal, and What Comes Next

On June 4, 2026, Murati made her first major public appearance in roughly 18 months, speaking at Bloomberg Technology’s annual conference in San Francisco. In an interview with Bloomberg’s Emily Chang, she described the human-AI relationship using a tandem bicycle analogy, arguing that both parties must steer and contribute momentum together rather than one directing while the other reacts, per TechCrunch’s June 4 report.

Three months earlier, in March 2026, Nvidia announced a multiyear chip supply agreement with Thinking Machines, committing to provide Vera Rubin accelerators as production capacity comes online. The deal is supply-chain insurance: without a direct GPU allocation from Nvidia, a lab operating at TML’s scale would face compute constraints that could slow the move from research preview to general availability. Bloomberg’s June 4 interview noted that funding discussions valuing TML at up to $50 billion were active, a figure Bloomberg first reported on November 13, 2025.

Access to TML-Interaction-Small remains limited to a curated group of research partners as of June 2026, with no public launch date announced. The lab has not published benchmark comparisons against competing voice models, making independent latency verification difficult until broader access is available.

Figure: Line chart of the Thinking Machines Lab valuation arc, from $12B at the July 2025 seed round to discussions at up to $50B as of November 2025. Sources: TechCrunch July 2025, Bloomberg November 2025.

What It Means

Murati’s thesis is that voice AI built on top of text models has a structural ceiling in conversational quality, and that breaking through it requires a different architecture rather than better prompting or faster inference hardware. TML-Interaction-Small is the first published evidence that Thinking Machines Lab has an executable version of that thesis. Anthropic and OpenAI are not standing still on real-time voice; the question is whether native multimodal architecture from day one delivers a durable advantage, or whether incremental improvements to existing systems close the gap before TML moves from research preview to general availability.



© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
