[REVIEW] AudioCraft
AudioCraft
Motivation
I decided to explore this research project because I'm interested in building a music generator (and other kinds of synthesized media) and turning it into a product. It also lets me explore how the PyTorch library is used at a high level to build these kinds of projects and products.
Description
AudioCraft was a research project conducted by Meta AI in 2023, aimed at synthesizing audio samples from various prompts.
The project consists of several in-depth technical papers full of exciting vocabulary, concepts, and operations, which I'm going to explore over several sections of this post. I'll include a glossary at the end of the post with a definition of each term, plus code or math snippets where necessary.
The papers from the project include the following:
- MusicGen - Simple and Controllable Music Generation: this paper introduces a single-stage transformer language model that synthesizes music clips from conditioning prompts (text descriptions or melodies) by predicting several interleaved streams of compressed audio tokens.
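MusicGen's audio tokens come from a neural codec (EnCodec) that represents each time step with several codebooks, so one transformer has to produce several tokens per step. The paper compares a few ways of interleaving those streams; one is a "delay" pattern, where codebook k is shifted k steps to the right so that each codebook's token for time t is predicted after the earlier codebooks' tokens for that same time. Here's a minimal pure-Python sketch of that shift and its inverse, assuming the codes arrive as one list of integers per codebook. The function names and the -1 padding token are hypothetical, and this is not AudioCraft's actual implementation.

```python
from typing import List

PAD = -1  # hypothetical placeholder for positions that hold no real token


def delay_interleave(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by k steps (a "delay" interleaving pattern).

    codes: one list per codebook, all of the same length T.
    Returns one list per codebook, each of length T + K - 1.
    """
    num_books = len(codes)
    out = []
    for k, stream in enumerate(codes):
        # k leading pads delay this codebook; trailing pads equalize lengths.
        out.append([pad] * k + list(stream) + [pad] * (num_books - 1 - k))
    return out


def delay_deinterleave(delayed: List[List[int]], num_steps: int) -> List[List[int]]:
    """Undo delay_interleave by dropping each codebook's k-step offset."""
    return [row[k:k + num_steps] for k, row in enumerate(delayed)]


if __name__ == "__main__":
    codes = [[1, 2, 3], [4, 5, 6]]
    delayed = delay_interleave(codes)
    print(delayed)  # [[1, 2, 3, -1], [-1, 4, 5, 6]]
    assert delay_deinterleave(delayed, num_steps=3) == codes
```

In the usage example, two codebooks of three steps become four model steps, and at every step codebook 1 lags codebook 0 by one time step. That costs K - 1 extra steps, but a single model can still emit all codebooks in one pass.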
The process
I ran into several challenges while trying to get this program to work. The first came up when I tried to build the AudioCraft library, which was a supremely technical and shitty process.
What went wrong (pt. 1)
torch (PyTorch's package name, and how I'll refer to the library from here on [*]) depends on the ATen library [2]. ATen is a tensor library that provides hardware-optimized tensor operations, with a focus on memory optimization via several mechanisms. One of the functionalities
To begin, I cloned the repository and went straight to the demos/ folder at its root. I tried running the demos/musicgen_demo.ipynb notebook, which walks through the basic steps for running inference with some of the available models, but one of the torch dependencies kept crashing the kernel, or so I thought!
Long story short, these are the steps I took to get the project running:
- I deleted the conda environment I had created earlier, so I could un-fuck anything I had fucked up.
- I created a new conda environment, and s
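A notebook kernel crash tends to show up as the same generic "kernel died" message whatever caused it, so one way to find out whether an import really is the culprit is to run it in a separate interpreter, outside the notebook. Below is a minimal sketch using only the standard library; `probe_import` is a hypothetical helper name, and the two module names in the usage block are just the libraries this post deals with.

```python
import subprocess
import sys


def probe_import(module: str, timeout: float = 300.0) -> dict:
    """Import `module` in a fresh interpreter and report how that went.

    Running the import in a child process means a hard crash (e.g. a
    segfault in a compiled extension) kills only the child, not the caller.
    """
    result = subprocess.run(
        # Pass the module name as an argument instead of formatting it into code.
        [sys.executable, "-c",
         "import importlib, sys; importlib.import_module(sys.argv[1])", module],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    code = result.returncode
    if code == 0:
        status = "ok"
    elif code < 0:
        # On POSIX, a negative return code means the child was killed by
        # signal -code (for example, -11 is SIGSEGV).
        status = f"killed by signal {-code}"
    else:
        # An uncaught Python exception (ImportError, etc.) exits with code 1.
        status = "error"
    return {"module": module, "status": status, "returncode": code,
            "stderr": result.stderr.strip()}


if __name__ == "__main__":
    for name in ("torch", "audiocraft"):
        report = probe_import(name)
        print(report["module"], report["status"])
        if report["stderr"]:
            # The last stderr line usually names the exception, if any.
            print(report["stderr"].splitlines()[-1])
```

If both imports come back "ok" in a plain interpreter, the crash is more likely in something the notebook does after importing them (or in the notebook environment itself) than in the imports alone.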
[1] As of writing, I was too lazy to verify whether or not the version of ATen that comes with torch is in fact a git submodule, so please bear with any inaccuracies while I try to untangle this mess.
[2] Not the standalone ATen library on zdevito's GitHub page, but the customized version that exists as a submodule of torch [1].
[*] Should I refer to the libraries by their package names or by their official project names, as listed on their websites? One appeals to developers, the other to non-programmers. I think I'll go with the devs, but I'll make sure to avoid any ambiguity.