AI Research

Meta’s Omnilingual ASR uses LLM tech to transcribe 1,600+ languages

The system uses an LLM-inspired architecture, allowing it to learn new languages from just a handful of user-provided audio samples.

StartupHub.ai Staff
Nov 10, 2025 at 11:32 PM · 2 min read

Meta is open-sourcing a massive new automatic speech recognition (ASR) system that dramatically expands the number of languages AI can understand, moving far beyond the handful of high-resource languages that dominate today’s technology. The new suite of models, called Meta Omnilingual ASR, provides transcription for over 1,600 languages, including 500 that Meta says have never been transcribed by an AI system before.

The release, announced today by Meta’s Fundamental AI Research (FAIR) team, isn’t just about scale. It represents a fundamental shift in how ASR models are built and expanded, borrowing a key capability from the world of large language models: in-context learning.

From Big Data to a Few Samples

The secret sauce is a new 7-billion-parameter model dubbed LLM-ASR. It combines a scaled-up version of Meta’s `wav2vec 2.0` speech encoder with a transformer decoder, an architecture common in LLMs like GPT. This allows the system to do something most ASR models can’t: learn a new, unsupported language on the fly from just a handful of audio-text examples provided by a user.
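For intuition, here is a minimal PyTorch sketch of that layout: a speech encoder turns audio features into embeddings, and an LLM-style transformer decoder cross-attends to them while autoregressively predicting text tokens. The class structure, dimensions, and layer counts here are illustrative assumptions, not Meta's actual implementation.

```python
# Minimal sketch of the encoder-decoder layout described above. The real
# system pairs a scaled-up wav2vec 2.0 encoder with a transformer decoder;
# this stand-in assumes pre-extracted audio features and toy dimensions.
import torch
import torch.nn as nn

class LLMASRSketch(nn.Module):
    def __init__(self, vocab_size=32_000, d_model=1024, n_layers=12):
        super().__init__()
        # Stand-in for the speech encoder: maps audio feature frames
        # to contextualized speech embeddings.
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=16, batch_first=True),
            num_layers=n_layers,
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # LLM-style decoder: cross-attends to the encoded speech while
        # predicting the next text token, as in GPT-style models.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=16, batch_first=True),
            num_layers=n_layers,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, audio_feats, text_tokens):
        # audio_feats: (batch, frames, d_model); text_tokens: (batch, seq)
        memory = self.speech_encoder(audio_feats)
        tgt = self.token_emb(text_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)  # logits over the text vocabulary
```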

This “bring your own language” capability is a game-changer. Previously, adding a new language to an ASR system required enormous, expertly curated datasets and significant computing power for fine-tuning, a process inaccessible to most language communities. With Omnilingual ASR, a speaker can theoretically provide a few samples and get usable transcriptions, drastically lowering the barrier to entry for digitally underrepresented languages. According to Meta, its 7B model achieves character error rates below 10 percent for nearly 80 percent of the languages it supports.
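Conceptually, that few-shot flow might look like the sketch below, assuming a generic encoder-decoder ASR interface: the user's paired examples are packed into the model's context ahead of the target clip, and the decoder continues the pattern, just as a text LLM completes a few-shot prompt. The function and method names are hypothetical, not Meta's published API.

```python
# Hypothetical sketch of in-context language learning for ASR. A few
# (audio, transcript) pairs in the new language condition the model
# before the target utterance; no fine-tuning or gradient updates occur.

def transcribe_with_examples(model, examples, target_audio):
    """examples: list of (waveform, transcript) pairs in the new language.

    `model.generate` is a placeholder for an autoregressive decode step,
    not an actual method from Meta's release.
    """
    context = []
    for waveform, transcript in examples:
        context.append(("audio", waveform))    # conditioning speech clip
        context.append(("text", transcript))   # its known transcription
    context.append(("audio", target_audio))    # clip to be transcribed
    # The decoder continues the alternating pattern, emitting text
    # for the final audio segment in the target language.
    return model.generate(context)
```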

As part of the open-source release, Meta is providing a family of models ranging from a lightweight 300M version for on-device use to the powerful 7B model. It’s also releasing the `Omnilingual ASR Corpus`, a new dataset of transcribed speech in 350 underserved languages, created in partnership with organizations like Mozilla’s Common Voice and Lanfrica. All models are released under a permissive Apache 2.0 license, empowering developers to build on the technology for their own use cases.
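As a rough illustration of the size tiers in that family, the self-contained snippet below matches a parameter budget to a checkpoint; the checkpoint names are placeholders invented here, not Meta's published identifiers.

```python
# Illustrative only: picking a released size tier by deployment budget.
# The names and exact tier list below are placeholder assumptions.
MODEL_TIERS = {
    "omnilingual-asr-300m": (300_000_000, "lightweight, on-device use"),
    "omnilingual-asr-7b": (7_000_000_000, "flagship, highest accuracy"),
}

def pick_checkpoint(max_params: int) -> str:
    """Return the largest tier that fits within a parameter budget."""
    fitting = {name: spec for name, spec in MODEL_TIERS.items()
               if spec[0] <= max_params}
    if not fitting:
        raise ValueError("no released checkpoint fits this budget")
    return max(fitting, key=lambda name: fitting[name][0])

print(pick_checkpoint(1_000_000_000))  # -> omnilingual-asr-300m
```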

#AI
#Automatic Speech Recognition (ASR)
#Large Language Models (LLMs)
#Launch
#Meta
#Mozilla
#Natural Language Processing (NLP)
#Open-Source
