A new challenger has entered the AI voice arena, and it's making some bold claims. Fish Audio, from Hanabi AI Inc., today publicly launched its S1 model, touting it as the "most expressive and natural TTS model on the market." The company is also taking direct aim at industry leader ElevenLabs, with a price point billed as 6x cheaper.
The announcement, made by Helena (@hehe6z) on X, points to significant early traction: Fish Audio already counts 20,000 active developers and a reported $5 million in annual recurring revenue (ARR). With that momentum behind it, the platform aims to democratize access to high-fidelity AI-generated speech and to stake its own claim as the best AI voice on offer.
At the core of Fish Audio S1's appeal is its promise of nuanced, emotionally rich voice generation. Users can clone their own voice for free with just 10 seconds of audio, a feature particularly attractive to content creators. One early user, a YouTuber, said they could "patch audio seamlessly" with their cloned voice and described the results with the phrase "scary well." The model goes beyond simple voice replication, offering granular emotion control for use cases ranging from dynamic video voiceovers and immersive audiobooks to expressive character voices for games and empathetic conversational chatbots. The platform also supports over 30 languages, extending those expressive capabilities to a global audience.
