Superhuman, the productivity platform known for its email client and Coda, has partnered with Databricks to scale its AI-powered writing assistance to 200,000 queries per second (QPS). A recent Databricks blog post details how the two companies jointly engineered a high-throughput, low-latency AI serving platform.
The collaboration focused on modernizing Superhuman's inference stack, which serves real-time suggestions for correctness, clarity, tone, and style. Previously, the company relied on a custom vLLM stack that could handle massive scale but was operationally demanding and required significant manual tuning for each new model iteration.