NVIDIA has announced a push into multilingual AI, releasing a new open dataset called Granary and two accompanying models, Canary-1b-v2 and Parakeet-tdt-0.6b-v3. The initiative addresses the lack of AI support for most of the world's languages, targeting high-quality speech recognition and translation across 25 European languages, including languages with historically limited data such as Croatian, Estonian, and Maltese. The tools are intended to help developers build scalable AI applications for global users, in areas such as multilingual chatbots, customer service, and real-time translation.
In a blog post, NVIDIA described Granary as an open-source corpus of approximately one million hours of audio: nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation. The dataset and the new Canary and Parakeet models are now publicly available on Hugging Face, and a research paper on Granary is slated for presentation at the Interspeech conference in August.
