Cloudflare is adding voice to its AI Agents SDK, aiming to bridge the gap between text-based interaction and natural conversation. Until now, most AI agents have been confined to chat interfaces, leaving users to get results through carefully worded prompts. The new experimental voice pipeline, available via the @cloudflare/voice package, lets agents converse in real time over their existing WebSocket connections.
The integration makes voice just another input modality, handled by the same Durable Object infrastructure that powers the SDK. The agent keeps its state, persistence, and tooling, so developers do not need to migrate to a separate voice framework.
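To make the "just another input modality" idea concrete, here is a minimal TypeScript sketch. It deliberately does not use @cloudflare/voice or the Agents SDK: `SketchAgent`, `Turn`, `onMessage`, and `transcribeStub` are hypothetical names invented for illustration, and the convention that text arrives as string frames while audio arrives as binary frames is an assumption, not something the package is confirmed to do. The point is only that one handler and one shared history can serve both modalities.

```typescript
// Hypothetical sketch: none of these names come from @cloudflare/voice
// or the Agents SDK. It only illustrates one handler, one shared state.
type Frame = string | Uint8Array;

interface Turn {
  modality: "text" | "voice";
  content: string;
}

// Stand-in for speech-to-text; a real pipeline would call a transcription model.
function transcribeStub(audio: Uint8Array): string {
  return `<${audio.byteLength} bytes of audio>`;
}

class SketchAgent {
  // In the SDK this state would be held by a Durable Object;
  // here it is a plain in-memory array.
  private history: Turn[] = [];

  // Single entry point for every frame arriving on the same connection.
  // Assumption: string frames are text, binary frames are audio.
  onMessage(frame: Frame): string {
    const turn: Turn =
      typeof frame === "string"
        ? { modality: "text", content: frame }
        : { modality: "voice", content: transcribeStub(frame) };
    this.history.push(turn);
    return `turn ${this.history.length} (${turn.modality}): ${turn.content}`;
  }

  // Read-only view of the conversation so far, across both modalities.
  turns(): readonly Turn[] {
    return this.history;
  }
}

const agent = new SketchAgent();
console.log(agent.onMessage("hello"));
console.log(agent.onMessage(new Uint8Array(320)));
```

Because both kinds of frame land in the same `history`, a voice turn can build on an earlier text turn without any hand-off between systems, which is the property the article attributes to keeping voice inside the existing agent.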
