Artificial Intelligence

Mistral releases a new open source model for speech generation

Published byAIDaily Editorial Team
5 min read
Original source author: Ivan Mehta

The model, which lets enterprises build voice agents for sales and customer engagement, puts Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.

Share:

French AI company Mistral released a new open source text-to-speech model on Thursday that can be used by voice AI assistants or in enterprise use cases like customer support. The model, which lets enterprises build voice agents for sales and customer engagement, puts Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.

The new model, called Voxtral TTS, supports nine languages, including English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

“Our customers have been asking for a speech model. So we built a small-sized speech model that can fit on a smartwatch, a smartphone, a laptop, or other edge devices. The cost of it is a fraction of anything else on the market, but it offers state-of-the-art performance,” Pierre Stock, VP of science operations at Mistral AI, told TechCrunch during a phone interview.

Mistral said the new model can adapt a custom voice with a sample of less than five seconds and can capture characteristics like subtle accents, inflections, intonations, and irregularities in the flow of speech. The model, based on Ministral 3B , can switch between languages easily without losing the characteristics of the voice, which is useful for use cases like dubbing or real-time translation. Stock said the company wanted the model to sound human and not robotic.

The model has been built for real-time performance, according to the company. It has a time-to-first-audio (TTFA) — a measure of when the model starts “speaking” after receiving input — of 90 ms for a 10-second sample of 500 characters. The model also has a real-time factor (RTF) of 6x, which means it can render a 10-second clip in roughly 1.6 seconds.

Earlier this year, Mistral launched a pair of transcription models , one for large batch processing and the other for real-time use cases with low latency. With the new speech model, the company is likely aiming to provide a full suite of voice products to enterprises.

“We plan to have an end-to-end platform that can handle multimodal streams of input, including audio, text, and image and output as well. The main benefit of that is you get way more information with an end-to-end agentic system that supports audio as an input or output,” Stock said.

Disrupt 2026: The tech ecosystem, all in one room

Save up to $300 or 30% to TechCrunch Founder Summit

Mistral’s positioning is that its open source and customization bit will help enterprises adopt its voice models over competitors, as they can tune it the way they want.

Ivan covers global consumer tech developments at TechCrunch. He is based out of India and has previously worked at publications including Huffington Post and The Next Web.

You can contact or verify outreach from Ivan by emailing im@ivanmehta.com or via encrypted message at ivan.42 on Signal.

StrictlyVC kicks off the year in SF. Get in the room for unfiltered fireside chats with industry leaders, insider VC insights, and high-value connections that actually move the needle. Tickets are limited.

The AI skills gap is here, says AI company, and power users are pulling ahead Rebecca Bellan

The AI skills gap is here, says AI company, and power users are pulling ahead

The AI skills gap is here, says AI company, and power users are pulling ahead

Google unveils TurboQuant, a new AI memory compression algorithm — and yes, the internet is calling it ‘Pied Piper’ Sarah Perez

Google unveils TurboQuant, a new AI memory compression algorithm — and yes, the internet is calling it ‘Pied Piper’

Google unveils TurboQuant, a new AI memory compression algorithm — and yes, the internet is calling it ‘Pied Piper’

Kentucky woman rejects $26M offer to turn her farm into a data center Graham Starr

Kentucky woman rejects $26M offer to turn her farm into a data center

Kentucky woman rejects $26M offer to turn her farm into a data center

Someone has publicly leaked an exploit kit that can hack millions of iPhones Lorenzo Franceschi-Bicchierai Zack Whittaker

Someone has publicly leaked an exploit kit that can hack millions of iPhones

Someone has publicly leaked an exploit kit that can hack millions of iPhones

Cursor admits its new coding model was built on top of Moonshot AI’s Kimi Anthony Ha

Cursor admits its new coding model was built on top of Moonshot AI’s Kimi

Cursor admits its new coding model was built on top of Moonshot AI’s Kimi

Delve accused of misleading customers with ‘fake compliance’ Anthony Ha

Delve accused of misleading customers with ‘fake compliance’

Delve accused of misleading customers with ‘fake compliance’

An exclusive tour of Amazon’s Trainium lab, the chip that’s won over Anthropic, OpenAI, even Apple Julie Bort

An exclusive tour of Amazon’s Trainium lab, the chip that’s won over Anthropic, OpenAI, even Apple

An exclusive tour of Amazon’s Trainium lab, the chip that’s won over Anthropic, OpenAI, even Apple

What this coverage includes

  • Clear source attribution and link to the original publication.
  • Editorial framing about relevance, impact, and likely next developments.
  • Review for readability, context, and duplication before publication.

Original source:

TechCrunch AI

About this article

This article was curated and published by AIDaily as part of our editorial coverage of artificial intelligence developments. The content is based on the original source cited below, enriched with editorial context and analysis. Automated tools may assist with translation and initial structuring, but publication decisions, factual review, and contextual framing remain editorial responsibilities.

Learn more about our editorial process