Superhuman, the productivity company behind the Grammarly writing assistant, Coda, and the Superhuman email client, has partnered with Databricks to scale its AI-powered writing assistance to 200,000 queries per second (QPS). The achievement was detailed in a recent Databricks blog post describing how the two companies jointly engineered a high-throughput, low-latency AI serving platform.
The collaboration focused on modernizing Superhuman's inference stack, which handles real-time suggestions for correctness, clarity, tone, and style. Previously, the company relied on a custom vLLM stack, which, while capable of massive scale, presented operational challenges and required significant manual tuning for each new model iteration.
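To make the shape of such a stack concrete, here is a minimal, hypothetical sketch of a vLLM-based correction path of the kind the article alludes to. The model name, prompt format, and decoding settings are assumptions for illustration only; they are not Superhuman's actual configuration.

```python
# Minimal illustrative sketch of a vLLM-based suggestion service.
# Model name, prompt, and decoding settings are assumptions, not
# Superhuman's production configuration.
from vllm import LLM, SamplingParams

# Load an open instruction-tuned model as a stand-in for the proprietary
# correction model described in the article.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Low-temperature, short-output decoding keeps corrections deterministic
# and latency predictable for interactive use.
params = SamplingParams(temperature=0.0, max_tokens=128)

def suggest_correction(sentence: str) -> str:
    """Return a corrected version of the input sentence."""
    prompt = (
        "Rewrite the following sentence with correct grammar, "
        "preserving its meaning and tone:\n" + sentence
    )
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text.strip()

if __name__ == "__main__":
    print(suggest_correction("me and him has went to the store yesterday"))
```

Each new model iteration in a hand-rolled stack like this typically means re-tuning batching, decoding, and hardware settings by hand, which is the operational burden the partnership set out to remove.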
Modernizing the Serving Stack
Superhuman's core AI model, which performs grammatical error correction at peak traffic exceeding 200,000 QPS, was pushing the limits of its existing infrastructure. Finding a platform partner willing to commit to performance and latency Service Level Objectives (SLOs) became paramount.
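Latency SLOs of this kind are usually expressed as percentile targets, for example "p99 latency under a fixed number of milliseconds at peak QPS." The sketch below shows how such a check might be evaluated over a window of recorded request latencies; the threshold value is hypothetical and not a figure from the blog post.

```python
# Hypothetical SLO check over a window of request latencies (milliseconds).
# The 300 ms p99 target is illustrative, not a figure from the post.
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

def meets_latency_slo(latencies_ms: list[float],
                      p99_target_ms: float = 300.0) -> bool:
    """True if the observed p99 latency is within the SLO target."""
    return percentile(latencies_ms, 99.0) <= p99_target_ms

# Example: a synthetic window of latencies sampled during peak traffic.
window = [42.0, 55.3, 61.8, 48.9, 250.0, 72.4, 39.5, 88.1]
print("p99:", percentile(window, 99.0), "SLO met:", meets_latency_slo(window))
```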