AudioCraft

Motivation

I decided to explore this research project because I’m interested in building a music generator (and other kinds of synthesized media) and in turning them into a product. It also lets me explore using the PyTorch library at a high level to build these kinds of projects and products.

Description

AudioCraft was a 2023 research project from Meta AI aimed at synthesizing audio samples from various prompts.

The project consists of several in-depth technical papers full of exciting vocabulary, concepts, and operations, which I’m going to explore over several sections of this post. I’ll include a glossary at the end of the post with a definition of each term, plus code or math snippets where necessary.

The papers from the project include the following:

  • MusicGen - Simple and Controllable Music Generation: this paper describes a single-stage transformer language model that generates several parallel streams of compressed audio tokens in order to synthesize audio clips from conditioning prompts (text or melody).
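MusicGen represents audio as several parallel streams of discrete tokens, one per codebook of the EnCodec audio tokenizer, and the paper compares different ways of interleaving those streams so that one transformer can predict them. Below is a minimal pure-Python sketch of one of those layouts, the "delay" pattern, in which codebook stream k is shifted k steps to the right. The function names and the `pad` placeholder value are hypothetical illustrations, not AudioCraft's API, and real token streams would be tensors rather than Python lists.

```python
from typing import List


def delay_pattern(codes: List[List[int]], pad: int = -1) -> List[List[int]]:
    """Shift codebook stream k right by k steps (a 'delay'-style layout).

    codes: K streams (one per codebook), each holding T token ids.
    Returns K streams of length T + K - 1, filled with `pad` wherever a
    stream has no token at that time step.
    """
    num_codebooks = len(codes)
    length = len(codes[0])
    if any(len(stream) != length for stream in codes):
        raise ValueError("all codebook streams must have the same length")
    delayed = []
    for k, stream in enumerate(codes):
        leading = [pad] * k                         # stream k starts k steps late
        trailing = [pad] * (num_codebooks - 1 - k)  # keep every row the same length
        delayed.append(leading + list(stream) + trailing)
    return delayed


def undelay_pattern(delayed: List[List[int]]) -> List[List[int]]:
    """Invert delay_pattern by dropping the k leading pads from stream k."""
    num_codebooks = len(delayed)
    length = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]
```

With three codebooks, `delay_pattern([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `[[1, 2, 3, -1, -1], [-1, 4, 5, 6, -1], [-1, -1, 7, 8, 9]]`. Reading the columns left to right, each finer codebook's token for a given frame is predicted one step after the coarser codebook's token for that same frame.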

The process

I was trying to get this program to work and ran into several challenges along the way. The first came up when I tried to build the AudioCraft library, which was a supremely technical and shitty process.

What went wrong (pt. 1)

torch (PyTorch’s package name, and how I’ll refer to the library from here on [*]) depends on the ATen library. ATen is a C++ tensor library that provides optimized tensor operations on hardware, with a focus on memory optimization via several mechanisms [2]. One of the functionalities
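One well-known mechanism of this kind is that a tensor is a view (sizes, strides, and an offset) over a shared storage buffer, so operations like transposing can return a new view instead of copying data. The toy class below is a pure-Python sketch of that idea under simplifying assumptions (a Python list as storage, no dtype, device, or bounds checking); `StridedView` and its methods are hypothetical names, not ATen's actual C++ classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Toy strided tensor: metadata describing how to read a shared flat buffer."""
    storage: List[float]      # flat buffer, shared (not copied) between views
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]  # elements to skip per step along each dimension
    offset: int = 0

    @classmethod
    def contiguous(cls, storage: List[float], shape: Tuple[int, ...]) -> "StridedView":
        # Row-major strides: the last dimension moves by 1, earlier ones by
        # the product of all later sizes.
        strides = []
        step = 1
        for size in reversed(shape):
            strides.append(step)
            step *= size
        return cls(storage, tuple(shape), tuple(reversed(strides)))

    def __getitem__(self, index: Tuple[int, ...]) -> float:
        # Map a multi-dimensional index to a position in the flat buffer.
        flat = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.storage[flat]

    def transpose(self) -> "StridedView":
        # Reverse the dimensions by reversing shape and strides; no element is copied.
        return StridedView(self.storage, self.shape[::-1], self.strides[::-1], self.offset)
```

For a 2x3 view over `[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]`, the strides are `(3, 1)`, and the transposed 3x2 view has strides `(1, 3)` while pointing at the very same list, so writing to the buffer is visible through both views.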

To begin, I cloned the repository and went straight to the demos/ folder at its root. I tried running the demos/musicgen_demo.ipynb notebook, which shows the basic steps for running inference with some of the available models, but one of torch’s dependencies kept crashing the kernel, or so I thought!
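Because a crash inside a native extension takes the whole Jupyter kernel down with it, one way to find out which import is actually failing is to try each one in a separate Python process. This is a minimal sketch that assumes nothing about AudioCraft itself; `try_import_isolated` is a hypothetical helper, not part of any library.

```python
import subprocess
import sys
from typing import Tuple


def try_import_isolated(module_name: str, timeout: float = 300.0) -> Tuple[bool, str]:
    """Import `module_name` in a child interpreter and report what happened.

    A segfault or abort inside a native extension kills only the child
    process, so the calling notebook kernel survives to see the result.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode == 0:
        return True, "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # that signal number (for example, 11 is SIGSEGV).
        return False, f"killed by signal {-result.returncode}"
    # Ordinary Python exception: the last stderr line holds its type and message.
    lines = result.stderr.strip().splitlines()
    return False, lines[-1] if lines else f"exit code {result.returncode}"
```

Calling `try_import_isolated("torch")` from a notebook cell returns `(True, "ok")` on success, or a short description of the failure (a Python error line or the signal that killed the child) instead of crashing the kernel. Running it for each dependency in turn narrows down the culprit.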

Long story short, these are the steps I took to get the project running:

  1. I deleted the conda environment I had created earlier, so I could un-fuck anything I had fucked up.
  2. I created a new conda environment, and s

[1] At the time of writing, I was too lazy to verify whether the version of ATen that comes with torch is in fact a git submodule, so please bear with any inaccuracies while I try to untangle this mess.

[2] Not the standalone ATen library on zdevito’s GitHub page, but the customized version that exists as a submodule of torch [1].

[*] I’m wondering whether I should refer to the libraries by their package names or by their official project names, as listed on their websites. The first appeals to developers, the second to non-programmers. I think I’ll go with the devs, but I’ll make sure to remove any ambiguity.