Whisper: OpenAI's Open-Source Multilingual Speech Recognition Model Set
OpenAI released Whisper, an open-source multilingual speech transcription and translation model.
Researchers at OpenAI have produced (and open-sourced) Whisper, a new set of models for automatic multilingual speech recognition, transcription, and translation. Whisper's highlight feature is its robust performance across many different languages and accents, as well as on technical speech and audio with background noise.
Whisper's strength lies in broad language ability
While many other speech recognition models focus on a specific goal, such as a single language or linguistic area, Whisper's strength is that a single model works across numerous languages, and it supports not only transcription but cross-language translation as well.
With a training dataset of over 680,000 hours of audio collected from the internet, a third of which is non-English speech, Whisper shows significantly higher robustness and lower error rates than other models across diverse datasets overall. However, because it is not specialized for any single benchmark, it does not beat models tuned for the famously competitive LibriSpeech datasets.
Whisper is also a multitask model: it can transcribe speech in any of its supported languages and translate that speech into English.
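As a rough illustration of that multitasking, here is a minimal sketch using the whisper Python package from the open-source release, assuming it is already installed; the audio filename is a hypothetical placeholder.

```python
import whisper

# Load one of the multilingual checkpoints (weights are downloaded on first use).
model = whisper.load_model("medium")

# Transcribe speech in its original language; Whisper detects the language itself.
transcription = model.transcribe("interview_french.mp3")
print(transcription["language"])  # e.g. "fr"
print(transcription["text"])      # transcript in the original language

# The same model can instead translate the audio into English.
translation = model.transcribe("interview_french.mp3", task="translate")
print(translation["text"])        # English translation
```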
Whisper comes open-source and easy to use
Whisper's release includes a GitHub repository with all the instructions you need to get Whisper running in your environment, as well as a Colab notebook that makes getting started even easier.
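Beyond the one-line transcribe() call, the package also exposes lower-level pieces of the pipeline. The sketch below, assuming the package is installed per the repository's instructions and using a placeholder audio file, shows language detection followed by decoding.

```python
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window Whisper expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and move it to the model's device.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language from the spectrogram.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the audio into text.
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```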
Pre-trained model weights for Whisper are available in a variety of sizes, and two different versions of Whisper are provided: the main multilingual version and a version made exclusively for English. Both versions have models ranging from 39 million (tiny) to 769 million (medium) parameters, while the multilingual version has an additional 1.55 billion (large) parameter model available.
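Choosing a checkpoint is just a matter of passing a different name to load_model. A brief sketch of the naming scheme, with comments reflecting the general size/accuracy trade-off rather than measured numbers:

```python
import whisper

# English-only checkpoints use the ".en" suffix ("tiny.en" through "medium.en")
# and tend to do better on English-only workloads at a given size.
english_model = whisper.load_model("medium.en")   # 769M parameters

# The multilingual family additionally offers the 1.55B-parameter "large" model.
multilingual_model = whisper.load_model("large")

# Smaller checkpoints such as "tiny" (39M parameters) run much faster and fit
# on modest hardware, at the cost of accuracy.
fast_model = whisper.load_model("tiny")
```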
An official Gradio implementation of Whisper is available on Hugging Face, though it oddly only features transcription with the small multilingual model. This Gradio demo by davidtsong shows off a more feature-complete implementation of Whisper.
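If you want a demo of your own rather than relying on the hosted ones, wrapping Whisper in Gradio takes only a few lines. This is a hypothetical stand-alone sketch, not the official demo's code, and it assumes both the whisper and gradio packages are installed.

```python
import gradio as gr
import whisper

model = whisper.load_model("small")

def transcribe(audio_path: str) -> str:
    # With type="filepath", Gradio hands the uploaded or recorded audio over as a file path.
    result = model.transcribe(audio_path)
    return result["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="Whisper transcription demo",
)

if __name__ == "__main__":
    demo.launch()
```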
Find out more
Whisper's code and pre-trained weights live in OpenAI's whisper GitHub repository, which also links to the accompanying paper, Robust Speech Recognition via Large-Scale Weak Supervision.