Harmonai's Dance Diffusion, Open-Source AI Audio Generation Tool For Music Producers
Harmonai's Dance Diffusion leans on machine learning diffusion models to produce unique audio waveforms, and aims to be a key part of any digital music producer's toolset.
Stability AI's recent models have been a tremendous success in both the ML world and the internet at large (think the recent release of Stable Diffusion), but they're not done yet. This week, they're releasing a new diffusion model, this time dedicated to a sensory medium tragically under-represented in ML: audio, and more specifically, music.
To begin filling this void, Harmonai, an open-source machine learning project and organization operating under the care of Stability AI, is working to bring ML tools to music production.
What’s Harmonai all about?
The unavoidable controversy around AI image generators replacing traditional digital artists is not lost on Harmonai. That's why its focus is on making the art of music production more accessible to everyone with powerful generative AI tools, rather than taking over the music production process itself.
Their first model, Dance Diffusion, is all about samples: short audio clips that are a staple of most digital music production, and of electronic music in particular.
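To make the idea of "generating a sample" a bit more concrete: a diffusion model starts from pure noise and repeatedly denoises it until a coherent waveform emerges. The sketch below is a generic DDPM-style sampling loop in PyTorch, not Harmonai's actual code; the `model` callable, the noise schedule, and the clip length are all illustrative assumptions.

```python
# Minimal sketch of how a diffusion model turns noise into an audio clip.
# This is a generic DDPM-style sampling loop, NOT Harmonai's implementation;
# `model`, the schedule, and the shapes are illustrative assumptions.
import torch

def sample_waveform(model, length=65536, steps=100, device="cpu"):
    """Start from pure Gaussian noise and iteratively denoise it
    into a (1, length) mono waveform."""
    # Linear beta schedule (illustrative; real models tune this carefully).
    betas = torch.linspace(1e-4, 0.02, steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, 1, length, device=device)  # start from pure noise
    for t in reversed(range(steps)):
        # The network predicts the noise present in x at step t.
        eps = model(x, torch.tensor([t], device=device))
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        # Add a little noise back in on every step except the last.
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x.squeeze(0)  # (1, length) waveform, ready to save as audio
```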
Dance Diffusion is also built on datasets composed entirely of copyright-free and voluntarily provided music and audio samples. Because diffusion models are prone to memorization and overfitting, releasing a model trained on copyrighted data could result in legal issues. To honor the intellectual property of artists, and to comply as far as possible with the music industry's often strict copyright standards, keeping any copyrighted material out of the training data was a must.
Unlike many other audio generation models, Harmonai is aiming for production-ready sound, which means covering the full audible frequency range. A speech model can often get away with lower-fidelity output (a reduced sample rate, for instance) to save computation time and resources, but a music model has to meet a higher quality bar, especially one positioning itself as a tool for music producers.
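A quick Nyquist-rate calculation shows why: a waveform sampled at rate fs can only represent frequencies up to fs / 2, and speech-oriented rates fall well short of the roughly 20 kHz ceiling of human hearing. The rates below are common audio conventions, not values taken from Dance Diffusion itself.

```python
# Quick illustration of why "production-ready" audio needs a high sample rate.
# The Nyquist limit: a signal sampled at fs can only contain frequencies
# up to fs / 2. These rates are common conventions, not Dance Diffusion's.
sample_rates_hz = {
    "telephone-quality speech": 8_000,
    "typical speech/TTS model": 16_000,
    "CD-quality music": 44_100,
    "studio / DAW standard": 48_000,
}

for name, fs in sample_rates_hz.items():
    nyquist = fs / 2
    # Human hearing tops out around 20 kHz, so anything much below that
    # audibly cuts off cymbals, hi-hats, and the "air" of a mix.
    covers_full_range = nyquist >= 20_000
    print(f"{name:26s} fs={fs:>6d} Hz -> max frequency {nyquist / 1000:.1f} kHz "
          f"({'full audible range' if covers_full_range else 'high end missing'})")
```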
Here you can listen to some piano samples created entirely by the Dance Diffusion model:
One issue music generation models have long struggled with is reproducing high frequencies. Much as many ML models learn the broad strokes first, low frequencies are relatively easy for waveform generation models to capture, while high-frequency content has always been a struggle. Take OpenAI's Jukebox: its output is remarkable in its own right, but its difficulty producing clean high frequencies leaves it sounding muddy.
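If you want to check this on your own generated clips, one rough diagnostic is the share of spectral energy sitting above a cutoff frequency. The snippet below uses librosa for the spectrogram; the file name and the 8 kHz cutoff are placeholders, not anything Harmonai prescribes.

```python
# One way to quantify the "muddiness" problem: measure how much of a clip's
# spectral energy sits above a cutoff (say 8 kHz). The file path and cutoff
# are placeholders; this is a diagnostic sketch, not part of Dance Diffusion.
import numpy as np
import librosa

def high_frequency_ratio(path, cutoff_hz=8_000):
    y, sr = librosa.load(path, sr=None)      # keep the native sample rate
    spec = np.abs(librosa.stft(y)) ** 2      # power spectrogram
    freqs = librosa.fft_frequencies(sr=sr)   # frequency of each STFT bin
    high = spec[freqs >= cutoff_hz].sum()    # energy in the high bins
    return high / spec.sum()

# A generated clip with almost no energy above the cutoff will sound dull
# next to a reference recording of the same instrument.
print(high_frequency_ratio("generated_piano.wav"))
```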
Harmonai wanted to ensure their Dance Diffusion model hit every frequency necessary to produce good sounds before releasing it to the public.
Furthermore, Harmonai wants to bring its tools straight into the DAW (digital audio workstation), envisioning a future where AI runs in real time alongside your normal music-making workflow. Plugins that use AI to generate unique waveforms are few and far between, and filling that niche is one of Harmonai's stated goals.