Cloudflare is adding voice to its AI Agents SDK, aiming to close the gap between text-based interaction and natural conversation. Until now, most AI agents have been confined to chat interfaces, where good results often depend on how well users write prompts. The new experimental voice pipeline, available in the @cloudflare/voice package, lets agents converse in real time over their existing WebSocket connections.
With this integration, voice becomes one more input modality, managed by the same Durable Object infrastructure that powers the SDK. The agent keeps its state, persistence, and tooling, so developers do not need to migrate to a separate voice framework.
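To make the "one more input modality" idea concrete, here is a minimal sketch of a single session object that accepts both text and audio frames and records them in the same state. It is not taken from the SDK. The AgentSession class, the Frame type, and the convention that strings carry text while byte arrays carry audio are all assumptions made for illustration, and the transcriber is a placeholder function supplied by the caller.

```typescript
// Hypothetical sketch: AgentSession and Frame are illustrative names,
// not part of the Agents SDK or @cloudflare/voice.
// Assumed framing: a string frame is typed text, a byte frame is audio.
type Frame = string | Uint8Array;

interface Turn {
  via: "text" | "voice";
  content: string;
}

class AgentSession {
  // One history shared by every modality on this connection.
  private turns: Turn[] = [];
  private transcribe: (audio: Uint8Array) => string;

  constructor(transcribe: (audio: Uint8Array) => string) {
    this.transcribe = transcribe;
  }

  // Single entry point for everything that arrives on the connection.
  onMessage(frame: Frame): string {
    if (typeof frame === "string") {
      return this.handleTurn("text", frame);
    }
    // Audio is converted to text first, then follows the same path.
    return this.handleTurn("voice", this.transcribe(frame));
  }

  get turnCount(): number {
    return this.turns.length;
  }

  private handleTurn(via: Turn["via"], content: string): string {
    this.turns.push({ via, content });
    return `turn ${this.turns.length} (${via}): ${content}`;
  }
}

// Usage: the placeholder transcriber only reports the payload size.
const session = new AgentSession((audio) => `${audio.length} bytes of audio`);
console.log(session.onMessage("What is my order status?"));
// prints "turn 1 (text): What is my order status?"
console.log(session.onMessage(new Uint8Array([1, 2, 3])));
// prints "turn 2 (voice): 3 bytes of audio"
```

Because both branches end in the same handleTurn call, a spoken turn and a typed turn land in the same history, which is the property the paragraph above describes.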
Voice as a Native Agent Feature
The @cloudflare/voice package exposes withVoice(Agent) for full voice agents and withVoiceInput(Agent) for speech-to-text-only use cases such as dictation or voice search. React developers can use the useVoiceAgent and useVoiceInput hooks, while framework-agnostic clients can use VoiceClient.
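The withVoice(Agent) and withVoiceInput(Agent) names suggest a class-wrapping (mixin) pattern, in which a function takes an agent class and returns an extended one. The sketch below shows how such a pattern can work in TypeScript. Everything in it is a local stand-in written for illustration: the Agent class, both wrapper functions, and the fake transcription and synthesis helpers are assumptions, and the real package's signatures and behavior may differ.

```typescript
// Hypothetical stand-ins only. Nothing here is imported from
// @cloudflare/voice, and the real API may look different.
type Constructor<T = {}> = new (...args: any[]) => T;

// Minimal stand-in for an agent with persistent conversation state.
class Agent {
  history: string[] = [];

  onText(text: string): string {
    this.history.push(text);
    return `echo: ${text}`;
  }
}

// Placeholder speech-to-text: treats each byte as a character code.
function fakeTranscribe(audio: Uint8Array): string {
  let text = "";
  audio.forEach((byte) => {
    text += String.fromCharCode(byte);
  });
  return text;
}

// Placeholder text-to-speech: the inverse of fakeTranscribe.
function fakeSynthesize(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i);
  }
  return out;
}

// Speech-to-text only: audio in, text reply out.
function withVoiceInput<TBase extends Constructor<Agent>>(Base: TBase) {
  return class VoiceInputAgent extends Base {
    onAudio(audio: Uint8Array): string {
      return this.onText(fakeTranscribe(audio));
    }
  };
}

// Full voice: audio in, audio reply out.
function withVoice<TBase extends Constructor<Agent>>(Base: TBase) {
  return class VoiceAgent extends Base {
    onVoiceTurn(audio: Uint8Array): Uint8Array {
      const reply = this.onText(fakeTranscribe(audio));
      return fakeSynthesize(reply);
    }
  };
}

// Usage: wrap an existing agent class without changing its own logic.
class SupportAgent extends Agent {}
const VoiceSupportAgent = withVoice(SupportAgent);
const agent = new VoiceSupportAgent();
const replyAudio = agent.onVoiceTurn(fakeSynthesize("hello"));
console.log(fakeTranscribe(replyAudio)); // prints "echo: hello"
```

In this sketch the wrapped class still runs the original onText logic and writes to the same history, so adding voice does not fork the agent's behavior or state.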
