OpenAI is significantly accelerating its AI agent workflows by adopting WebSockets for its Responses API. This move addresses a critical bottleneck: the cumulative latency introduced by traditional, sequential API requests.
Agentic systems, like code completion tools, often involve dozens of back-and-forth API calls to validate actions, process tool outputs, and build context. As AI models become faster, the fixed overhead of each of these network round trips accounts for a growing share of total response time, leaving users waiting.
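The arithmetic behind that bottleneck can be sketched with a simple overhead model. The numbers below are illustrative assumptions, not OpenAI measurements: each fresh HTTP request pays connection setup (TCP and TLS handshakes) plus per-request overhead, while a persistent connection pays setup once and then only a small per-message cost.

```python
# Illustrative latency model. All figures are hypothetical placeholders,
# not measured values from OpenAI's API.

def sequential_http_overhead_ms(calls, setup_ms=120.0, per_request_ms=15.0):
    """Total protocol overhead when every call opens a fresh HTTP connection."""
    return calls * (setup_ms + per_request_ms)

def websocket_overhead_ms(calls, setup_ms=120.0, per_message_ms=2.0):
    """Total overhead when one persistent connection is reused for every call."""
    return setup_ms + calls * per_message_ms

# An agent making 40 tool-call round trips:
print(sequential_http_overhead_ms(40))  # 40 * (120 + 15) = 5400.0 ms
print(websocket_overhead_ms(40))        # 120 + 40 * 2    =  200.0 ms
```

Under these assumed constants, connection overhead grows linearly with the number of calls in the sequential case but stays nearly flat with a persistent connection, which is why the gap widens as agents make more round trips.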
The company detailed these efforts on OpenAI News, explaining how a persistent WebSocket connection replaces the need for repeated HTTP requests. This allows the API to maintain state, cache reusable information like tokenized text, and process model responses more efficiently.
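The connection-reuse pattern described above can be sketched in miniature. This is not OpenAI's actual protocol or SDK; it uses plain asyncio TCP streams (rather than a real WebSocket library) and a made-up newline-delimited message format, purely to show many request/response exchanges flowing over one long-lived connection instead of one connection per request.

```python
import asyncio

async def handle(reader, writer):
    # Server side: acknowledge each newline-delimited message it receives.
    while data := await reader.readline():
        writer.write(b"ack:" + data)
        await writer.drain()
    writer.close()

async def main():
    # Bind to an ephemeral port so the sketch is self-contained.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # One persistent connection, reused for every round trip.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    replies = []
    for i in range(3):  # stands in for an agent's repeated tool-call turns
        writer.write(f"step-{i}\n".encode())
        await writer.drain()
        replies.append((await reader.readline()).decode().strip())

    writer.close()
    server.close()
    await server.wait_closed()
    return replies

print(asyncio.run(main()))  # ['ack:step-0', 'ack:step-1', 'ack:step-2']
```

Because the connection stays open, the server-side handler keeps its state between messages, which is the property that lets a real API cache reusable context such as tokenized text across an agent's turns.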
When the API Became the Bottleneck
With the introduction of specialized hardware and faster models like GPT-5.3-Codex-Spark, inference speed jumped dramatically. However, the traditional API architecture couldn't keep pace: per-request API service overhead came to dominate end-to-end workflow time.