A Gentle Introduction to Dance Diffusion
Diffusion models are everywhere for images, but have yet to gain real traction in audio generation. That's changing thanks to Harmonai.
What is Dance Diffusion?
Dance Diffusion is a family of audio-generating machine learning models created by Harmonai, a community-driven organization (and part of Stability AI) whose mission is to develop open-source generative audio tools for producers and musicians.
You can use a pre-trained Dance Diffusion model (or train your own Dance Diffusion model) to generate random audio samples in a particular style, regenerate a given audio sample, or interpolate between two different audio samples.
A Dance Diffusion model is, as its name suggests, a diffusion model.
What is a diffusion model?
A diffusion model is a type of machine learning model that generates novel data by learning how to “destroy” (called “forward diffusion” or “noising”) and “recover” (called “reverse diffusion” or “de-noising”) the data that the model is trained on.
During the training process, the model gets better and better at faithfully recovering the data that it had previously destroyed.
This technical blog post by NVIDIA illustrates the forward and reverse processes. The model iteratively adds noise to some piece of data (like an image of a cat) until the data is pure noise, and then iteratively removes noise until the image is restored to its original form.
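To make the forward (noising) process concrete, here is a minimal sketch in Python. It is not Harmonai's training code; it simply applies an assumed linear noise schedule to a toy 1-D signal standing in for an audio clip.

```python
# Toy illustration of forward diffusion (noising) on a 1-D signal.
# This is NOT Harmonai's training code; it only shows the idea of
# gradually mixing clean data with Gaussian noise over many steps.
import numpy as np

rng = np.random.default_rng(0)

x0 = np.sin(np.linspace(0, 8 * np.pi, 1024))  # "clean" data (a stand-in for an audio clip)
num_steps = 100
alphas = np.linspace(1.0, 0.0, num_steps)     # assumed linear schedule: "all signal" -> "all noise"

noised = []
for alpha in alphas:
    noise = rng.standard_normal(x0.shape)
    # Each step is a weighted mix of the original signal and fresh Gaussian noise.
    x_t = np.sqrt(alpha) * x0 + np.sqrt(1.0 - alpha) * noise
    noised.append(x_t)

# noised[0] is essentially the clean signal; noised[-1] is essentially pure noise.
# During training, the network sees x_t and learns to estimate the noise it contains.
```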

What if you asked the model to restore random noise?
As it turns out, when you pass random noise to a trained diffusion model, the reverse diffusion process de-noises the input into something of the same type as the data the model has learned to recreate. In other words, this is how a diffusion model generates novel data!
Since Dance Diffusion models are trained on audio, they learn to generate audio.
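Generation, then, is just the reverse loop run on random noise. The sketch below shows the structure of that loop; `denoise_step` is a hypothetical placeholder for a trained Dance Diffusion network, not Harmonai's actual sampler.

```python
# Schematic reverse-diffusion (sampling) loop: start from pure noise and let a
# trained denoiser pull it toward realistic data, one small step at a time.
import numpy as np

rng = np.random.default_rng(0)
num_steps = 100

def denoise_step(x, t, num_steps):
    # Placeholder: a real model would estimate (and remove) the noise present at step t.
    # Here we just shrink the sample slightly so the loop runs end to end.
    return x * (1.0 - 1.0 / (num_steps - t + 1))

x = rng.standard_normal(1024)           # start from pure Gaussian noise
for t in reversed(range(num_steps)):
    x = denoise_step(x, t, num_steps)   # each step removes a little of the estimated noise

# After the loop, x is the model's generated sample; for Dance Diffusion, a raw audio clip.
```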
Here you can listen to some piano music samples created entirely by a Dance Diffusion model.
What is Dance Diffusion trained on?
There are currently six publicly available Dance Diffusion models, each trained on a different dataset of audio files.
Since the data that a diffusion model is trained on — and therefore learns how to recover — affects the type of data that it later generates, audio samples created by, for example, the maestro-150k model will always sound like piano music, and not like guitar or trumpet music.
Zach Evans, the creator of Dance Diffusion, has released a Google Colab notebook for open beta access to the Dance Diffusion models listed above. Evans has also written a Colab notebook where you can fine-tune, or customize, a Dance Diffusion model on your own dataset for greater control over the generated audio clips.
Edit: This post has been deleted! If you'd like to try using or fine-tuning a Dance Diffusion model for yourself, check out Reddit user u/Stapler_Enthusiast's detailed Dance Diffusion tutorial.
And lastly, if you'd like to create your own music samples using one of Harmonai's available models, we have a Colab link for you below!
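If you would rather generate samples from a script than from a Colab notebook, the Hugging Face diffusers library also provides a DanceDiffusionPipeline that wraps Harmonai's released checkpoints. The sketch below assumes diffusers and scipy are installed and that the maestro-150k checkpoint is published on the Hub under the model id shown; swap in a different model id for the other checkpoints.

```python
# Sketch: generating an audio clip with a pre-trained Dance Diffusion checkpoint
# via the Hugging Face diffusers library (assumes `diffusers` and `scipy` are installed).
from diffusers import DanceDiffusionPipeline
import scipy.io.wavfile

# Assumed Hub model id for the maestro-150k checkpoint mentioned above.
pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k")
pipe = pipe.to("cuda")  # optional; CPU also works, just slower

# Sample a few seconds of audio starting from pure noise.
output = pipe(audio_length_in_s=4.0, num_inference_steps=100)
audio = output.audios[0]  # numpy array with shape (channels, samples)

# Save the result as a WAV file at the model's native sample rate.
sample_rate = pipe.unet.config.sample_rate
scipy.io.wavfile.write("maestro_sample.wav", sample_rate, audio.transpose())
```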