Cartesia's Sonic-3 TTS laughs and emotes at human speed

StartupHub Team
Oct 29, 2025 at 9:19 AM · 2 min read

The race to make AI agents sound less like robots and more like humans just got a new front-runner. AI startup Cartesia has unveiled Sonic-3, a text-to-speech (TTS) model it claims is the fastest and most emotionally expressive on the market, capable of generating laughter and a full range of emotions in real-time conversations.

For anyone who has suffered through a laggy, monotone call with an automated agent, Cartesia's claims are significant. The company reports an end-to-end latency of just 190 milliseconds, just under the roughly 200 milliseconds that typically separate turns in human conversation. This speed, combined with the ability to generate non-speech sounds like laughter, aims to eliminate the uncanny, stilted quality of most current voice AI. In demos, the voice can sound palpably excited or even "devastatingly sad," a far cry from the neutral tone of typical assistants.

SSMs: The engine behind the emotion

The key differentiator, according to Cartesia, is its underlying architecture. While most of the industry relies on Transformers, Sonic-3 is built on State Space Models (SSMs). In a post on X, the company explained the difference with a simple analogy: Transformers are like re-watching an entire conversation from the start before speaking each new word, which is computationally intensive. SSMs, by contrast, act more like humans, remembering the "topic and vibe" of a conversation to maintain context without constant reprocessing.
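To make the analogy concrete, here is a toy Python sketch of the two kinds of step. It is not Sonic-3's architecture, whose internals are not described here; it assumes scalar tokens, a single attention head whose keys and values are the tokens themselves, and a one-number SSM state with hand-picked coefficients `a`, `b` and `c`. The function names `attention_step` and `ssm_step` are hypothetical.

```python
import math


def attention_step(history, query):
    # Toy Transformer-style step (scalar tokens, hypothetical helper): the new
    # query is scored against every token seen so far, so the work grows with
    # the length of the history ("re-watching the whole conversation").
    scores = [query * key for key in history]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, history)) / total


def ssm_step(state, x, a=0.9, b=0.1, c=1.0):
    # Toy SSM-style step (scalar state, hand-picked coefficients): the new input
    # is folded into a fixed-size state, h_t = a*h_{t-1} + b*x_t, and the output
    # is read out as y_t = c*h_t. The state is the running "topic and vibe".
    state = a * state + b * x
    return state, c * state


if __name__ == "__main__":
    tokens = [0.5, -1.0, 2.0, 0.25]
    history, state = [], 0.0
    for x in tokens:
        history.append(x)  # attention must keep, and rescan, the full history
        attn_out = attention_step(history, x)
        state, ssm_out = ssm_step(state, x)  # the SSM keeps one number regardless of length
        print(f"{len(history)} tokens seen: attention={attn_out:+.3f} ssm={ssm_out:+.3f}")
```

The attention step has to store and revisit the whole history on every token, while the SSM step only updates `state`. Real SSMs use learned matrices and a larger state vector, but that state stays the same size however long the conversation runs.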

According to the company, this architecture, which Cartesia's co-founders pioneered at the Stanford AI Lab, is what enables the model's low latency. Because an SSM spends roughly the same amount of computation on each new word no matter how long the conversation has run, the savings leave headroom for more complex, emotional rendering without sacrificing speed.
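One rough way to see that headroom is to count work per generated token. The sketch below uses an assumed toy cost model, not measurements of Sonic-3: a Transformer decoding step with a key-value cache touches one cached entry per earlier token, while an SSM step touches its fixed-size state once. `generation_work` and `summarize` are hypothetical helpers, and the counts are abstract units of work, not milliseconds.

```python
def generation_work(num_tokens):
    # Assumed toy cost model: step t of attention touches t cached entries,
    # while every SSM step touches its fixed-size state exactly once.
    attention_per_step = list(range(1, num_tokens + 1))
    ssm_per_step = [1] * num_tokens
    return attention_per_step, ssm_per_step


def summarize(num_tokens):
    # Compare the cost of the latest step and the total cost so far.
    attn, ssm = generation_work(num_tokens)
    return {
        "tokens": num_tokens,
        "attention_last_step": attn[-1],
        "ssm_last_step": ssm[-1],
        "attention_total": sum(attn),  # equals T*(T+1)/2: grows quadratically
        "ssm_total": sum(ssm),         # equals T: grows linearly
    }


for n in (100, 1_000, 10_000):
    print(summarize(n))
```

Under this model the 100th token costs 100 units with attention but 1 unit with the SSM, and total attention work grows as T(T+1)/2 while SSM work grows as T.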

Beyond the speed and emotion, Sonic-3 is built for global enterprise use. It supports 42 languages, intelligently handles acronyms, and offers both instant and professional-grade voice cloning. Cartesia is already powering millions of monthly conversations for clients like ServiceNow and Cresta. To back its claims, the company's co-founder issued a bold challenge: if Cartesia can't improve a qualified company's existing voice AI, it will donate $5,000 to a charity of that company's choice.

#AI
#Cartesia
#Conversational AI
#Launch
#Partnership
#ServiceNow
#State Space Models (SSMs)
#Text-to-Speech (TTS)
