Experiments with OpenAI Jukebox


Generative models have shown great promise across a range of domains, from text generation (GPT-2) and image-to-image translation (CycleGAN) to musical composition (MuseNet).

OpenAI's recent Jukebox paper (arxiv link) builds on MuseNet to generate novel audio samples. While MuseNet was trained on MIDI data, a symbolic format that carries per-note information such as notation, pitch, and velocity, the Jukebox model is trained directly on raw audio.

The key breakthrough is that the model learns both high-level features, such as compositional style and genre, and low-level features, such as pace, notation, and pitch, without any additional task-specific information.

In this report, we explore two main setups:

  1. Sampling from a trained model by priming it with new audio samples, and analyzing the upsampled and noisy audio output.
  2. Analyzing the training behavior of the Jukebox model.
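For concreteness, the first setup can be run through the command-line interface of the public openai/jukebox repository. The commands below follow that repository's README; the model name, sample lengths, and audio file path are illustrative assumptions about the environment (a CUDA GPU and enough memory for the 5B model), not details taken from this report:

```shell
# Ancestral sampling: generate 6 candidate 20-second clips with the
# 5B lyrics model, then upsample through all 3 VQ-VAE levels.
python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 \
    --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 \
    --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125

# Primed sampling: instead of starting from scratch, condition the model
# on the first 12 seconds of an existing recording (hypothetical path).
python jukebox/sample.py --model=5b_lyrics --name=sample_5b_prompted --levels=3 \
    --mode=primed --audio_file=path/to/recording.wav \
    --prompt_length_in_seconds=12 --sample_length_in_seconds=20 \
    --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 \
    --hop_fraction=0.5,0.5,0.125
```

The primed mode is what "feeding in new audio samples" refers to above: the supplied clip is encoded and used as a prompt, and the model continues it at the coarsest level before upsampling.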
