Two New Models from Microsoft AI: MAI-Voice-1 and MAI-1-preview

Created on August 29 | Last edited on August 29
Microsoft AI has introduced two new in-house models intended to support its Copilot products and ongoing experiments. MAI-Voice-1 is a speech generation model that focuses on natural and expressive audio, and MAI-1-preview is a large-scale language model trained to handle instruction-following and everyday text queries. The releases reflect Microsoft’s move to develop more of its own core systems while continuing to integrate them into its broader product ecosystem.

MAI-Voice-1: A New Speech Generation System

The first of these models is MAI-Voice-1, a speech generation system focused on producing expressive and natural audio. Voice is being framed as the key interface for future AI companions, and this system is capable of generating a minute of speech in under a second on a single GPU. Already, it is powering features in Copilot Daily and Podcasts, while also being available for experimentation in Copilot Labs. Demos include interactive storytelling and personalized meditations, showing off both efficiency and creative potential.
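The headline throughput claim, a minute of speech in under a second on a single GPU, is easiest to interpret as a real-time factor (RTF): seconds of audio produced per second of compute. A minimal sketch of that arithmetic, using only the figures stated in the announcement (the function name and exact numbers are illustrative, not measured benchmarks):

```python
# Back-of-the-envelope real-time factor (RTF) from the announced
# MAI-Voice-1 throughput: >= 60 seconds of audio per second of compute.
# Numbers come from the claim "a minute of speech in under a second";
# the function name is a hypothetical helper, not a published API.

def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    return audio_seconds / generation_seconds

# One minute of speech generated in (at most) one second:
rtf = real_time_factor(60.0, 1.0)
print(rtf)  # 60.0, i.e. at least 60x faster than real time
```

Anything faster than an RTF of 1.0 can stream audio ahead of playback, which is what makes features like on-the-fly podcast narration practical.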

MAI-1-preview: Microsoft’s Foundation Language Model

Alongside the voice system, Microsoft is previewing its first large-scale foundation model, MAI-1-preview. This mixture-of-experts model was pre-trained and post-trained on roughly 15,000 NVIDIA H100 GPUs, making it one of the company's largest training efforts to date. The model is designed to handle a wide range of instruction-following tasks and provide helpful responses, and it is currently being tested publicly on the evaluation platform LMArena. Over the coming weeks, it will begin appearing in text-based Copilot features, and it is also being shared with trusted testers through API access.
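The mixture-of-experts (MoE) design mentioned above is an architecture family rather than a single recipe: a small gating network routes each token to a few "expert" sub-networks, so only a fraction of the model's parameters run per token. A minimal, generic sketch of top-k MoE routing follows; the expert count, gating scores, and toy experts here are illustrative assumptions, not details of MAI-1-preview:

```python
# Generic top-k mixture-of-experts routing sketch (pure Python).
# All dimensions, scores, and experts are toy values for illustration;
# this is the general MoE pattern, not Microsoft's actual design.
import math
from typing import Callable, List

def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: List[float],
                gate_scores: List[float],
                experts: List[Callable[[List[float]], List[float]]],
                top_k: int = 2) -> List[float]:
    """Route token vector x to the top_k experts by gate probability
    and return the probability-weighted sum of their outputs."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over selected experts
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        for d, y in enumerate(experts[i](x)):
            out[d] += weight * y
    return out

# Toy experts: each simply scales its input by a different factor
# (the s=s default binds each scale at definition time).
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
result = moe_forward([1.0, 1.0], [0.1, 0.3, 0.2, 0.4], experts, top_k=2)
print(result)  # weighted blend of the two highest-scoring experts
```

The appeal of this pattern at scale is that adding experts grows total capacity while per-token compute stays roughly constant, which is one way to make very large training runs (such as one spanning ~15,000 GPUs) tractable.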

Scaling Infrastructure and Future Roadmap

The release of these models is supported by significant infrastructure investment, including Microsoft’s new GB200 compute cluster. The company is also highlighting its agile and lean team, encouraging researchers and engineers to join its mission of advancing applied AI. The roadmap suggests further models will follow, both for voice and text, and Microsoft expects to orchestrate a suite of specialized systems tailored to different use cases.

Positioning Within the AI Ecosystem

Microsoft emphasizes that its approach does not rely exclusively on in-house development; it will continue to integrate partner and open-source models where beneficial. This blended strategy is meant to provide flexibility and maintain high quality across the millions of interactions Copilot handles daily.

Conclusion

With MAI-Voice-1 and MAI-1-preview, Microsoft is signaling a shift toward greater independence in model development while still maintaining openness to external contributions. The combination of expressive voice technology and a large foundation language model suggests a long-term plan to build AI companions that are not only functional but also personal and adaptive.