
Meta AI Post On Recent Direct Speech-To-Speech Translation Research

Meta AI researchers share their findings on advances in direct speech-to-speech translation models in a blog post today.
Created on June 13|Last edited on June 13
Today Meta AI published a blog post covering a paper that researchers at the company released back in March. The paper describes a new method for speech-to-speech translation that does not rely on an intermediate text-to-text translation step.

The project also has a GitHub page with examples from the research and comparisons to other systems.

Meta's method for speech-to-speech translation

Speech-to-speech translation systems are often text-to-text translation models underneath: the input speech is transcribed to text, the text is translated, and the translated text is synthesized back into speech. This is not "true" speech-to-speech translation so much as text-to-text translation with more convenient endpoints.
True speech-to-speech translation models that work on spectrogram representations skip the text step, but they are very difficult to train because spoken language is so much more complex than text.
Meta researchers set out to make the direct approach more practical by incorporating something called discrete units. By representing speech as a sequence of these discrete units, the model can generate a speech waveform as its primary output while also supporting a text output. This lets the model translate speech to speech without ever performing text-to-text translation as an intermediate step.
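To make the idea of discrete units a little more concrete, here is a minimal sketch of one common way to turn continuous speech features into a sequence of unit IDs: assign each frame-level feature vector to its nearest cluster centroid, then collapse consecutive repeats. The post does not describe how the units are obtained, so the clustering approach, the toy 2-D "features", the centroids, and the function names below are all assumptions for illustration, not Meta's implementation.

```python
from itertools import groupby


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda i: sq_dist(frame, centroids[i]))


def to_discrete_units(frames, centroids, collapse_repeats=True):
    """Map a sequence of feature frames to a sequence of integer unit IDs.

    Hypothetical helper: each frame is replaced by the index of its nearest
    centroid; consecutive duplicate IDs are optionally merged into one.
    """
    units = [nearest_centroid(frame, centroids) for frame in frames]
    if collapse_repeats:
        units = [unit for unit, _ in groupby(units)]
    return units


# Toy data (assumed): three centroids and six 2-D "feature frames".
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.1), (1.0, 0.9), (0.1, 0.9), (0.0, 0.0)]

print(to_discrete_units(frames, centroids, collapse_repeats=False))  # [0, 0, 1, 1, 2, 0]
print(to_discrete_units(frames, centroids))                          # [0, 1, 2, 0]
```

In a real system the frames would come from a learned speech encoder and the centroids from clustering a large speech corpus; a separate model would then turn a predicted unit sequence back into a waveform.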

Their model was trained on the Fisher and CALLHOME Spanish-English Speech Translation dataset, which features phone calls spoken in Spanish, with Spanish transcripts and English translations.
Advances in high-quality direct speech-to-speech translation bring us closer to translation systems for languages that have no written form.

Find out more

Tags: ML News