Microsoft AI (MAI) has announced its first two internally developed AI models, marking a significant milestone in the company's strategy to build proprietary foundation models rather than relying solely on external partners such as OpenAI.
The New Models
MAI-Voice-1 is Microsoft's debut speech generation model, designed for high-fidelity, expressive audio output. It can generate a full minute of audio in under a second on a single GPU, positioning it as one of the most efficient speech synthesis systems currently available. The model is already integrated into Microsoft's Copilot Daily and Podcasts features, with additional testing available through Copilot Labs.
MAI-1-preview represents Microsoft's first end-to-end trained foundation model, built using a mixture-of-experts architecture. The model was trained on approximately 15,000 NVIDIA H100 GPUs and is currently undergoing public evaluation on LMArena, a community-driven model testing platform. Microsoft plans to gradually integrate MAI-1-preview into Copilot for text-based use cases over the coming weeks.
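The mixture-of-experts approach mentioned above activates only a small subset of specialist sub-networks ("experts") per token, which is how such models keep inference cost low relative to their total parameter count. Microsoft has not published MAI-1-preview's internals, so the following is only a generic illustration of top-k expert routing; the expert functions and gate weights are hypothetical placeholders.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input vector x to the top_k experts chosen by a linear gate,
    then combine their outputs weighted by renormalized gate probabilities."""
    # Gate: one score per expert (simple dot product with the input).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k highest-probability experts; renormalize.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Sparse combination: only the selected experts actually run.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

The efficiency argument is visible in the loop: however many experts the model holds, only `top_k` of them execute for a given input, so compute per token scales with `top_k` rather than with the full expert count.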
