UK Sovereign AI Boosts Welsh Language, Public Services

The United Kingdom is making a significant stride in its national AI efforts with a new UK Sovereign AI initiative focused on preserving and empowering its oldest living languages. A new AI model, developed by the UK-LLM project, University College London, Bangor University, and NVIDIA, is now capable of reasoning in both English and Welsh. This development aims to make public services more accessible to Welsh speakers.

This British AI development is built on NVIDIA Nemotron and trained on the UK's most powerful supercomputer, Isambard-AI, located in Bristol. Prime Minister Keir Starmer emphasized the importance of this work, stating it ensures public services like healthcare and education are accessible to everyone in their native language. This is a powerful example of how advanced AI technology can serve the public good and protect cultural heritage.

The UK-LLM project, initially established as BritLLM in 2023, has previously released two models for UK languages. Its latest model for Welsh directly supports the Welsh government's Cymraeg 2050 initiative, which aims for a million Welsh speakers by 2050. Nscale, a UK-based AI cloud provider, will make this new model available to developers via its API.

Gruffudd Prys, head of the Language Technologies Unit at Bangor University's Canolfan Bedwyr, highlighted the goal of keeping Welsh a "living, breathing language." He noted AI's potential to aid second-language acquisition and improve native speakers' skills. This public service AI can also boost accessibility for institutions and businesses in Wales, offering translation and bilingual chatbot services.

Beyond Welsh, the UK-LLM team plans to extend this methodology to other UK languages such as Cornish, Irish, Scots, and Scottish Gaelic. Furthermore, they intend to collaborate internationally on models for languages from Africa and Southeast Asia. Pontus Stenetorp, professor of natural language processing at UCL, noted the collaboration with NVIDIA and Bangor University accelerated the creation of the best-ever language model for Welsh.

Powering National AI UK with Advanced Infrastructure

The new model for Welsh leverages NVIDIA Nemotron, an open-source family of models. The UK-LLM team utilized the 49-billion-parameter Llama Nemotron Super and 9-billion-parameter Nemotron Nano models, post-training them with Welsh-language data. Creating a sufficiently large Welsh dataset was a challenge due to limited existing resources.

To overcome this, the team used NVIDIA NIM microservices with gpt-oss-120b and DeepSeek-R1 to translate over 30 million entries from English to Welsh. They harnessed a GPU cluster via NVIDIA DGX Cloud Lepton and hundreds of NVIDIA GH200 Grace Hopper Superchips on Isambard-AI. This UK AI infrastructure, backed by £225 million in government investment, significantly accelerated their translation and training workloads.

Bangor University, situated in Gwynedd, a region with a high percentage of Welsh speakers, provided crucial linguistic and cultural expertise. Gruffudd Prys and his team verified machine-translated data and assessed the model's ability to handle complex Welsh nuances, like consonant mutations. This careful evaluation ensures the model's accuracy and practical utility.

The model, along with its Welsh training and evaluation datasets, will be openly available for enterprise and public sector use. This open approach is critical, as Prys explained, making the difference between the technology being used or not. This UK Sovereign AI framework can serve as a blueprint for multilingual AI development globally.

UK Sovereign AI Boosts Welsh Language, Public Services

Powering National AI UK with Advanced Infrastructure

AI Daily Digest