CDS Congratulates Dr. Brian McFee on Becoming an Assistant Professor in Music Technology and Data Science!
By Sabrina de Silva
Dr. Brian McFee is an Assistant Professor in Music Technology and Data Science. He was previously a Moore-Sloan Data Science fellow at CDS, where his areas of interest included machine learning, music information retrieval, recommender systems, and multimedia signal processing. Before this, McFee was a postdoctoral research scholar in the Center for Jazz Studies and LabROSA at Columbia University. On August 20th, 2018, McFee discussed his new position, latest work in the field, and interests with Sabrina de Silva, CDS Content Writer.

This interview has been lightly edited & condensed for clarity.
First of all, congratulations!
Thank you very much!
Can you give me a brief overview of what you do?
The short version is that I develop algorithms and software to consume recorded audio and transform it into something comprehensible. Broadly, I work on descriptive tasks that help make audio more easily digestible and understandable. Those tasks could range from making textual descriptions of the different sources of noise in a recording, to transcribing chord sequences, to annotating structural boundaries in music.
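For readers who want a concrete picture of one task McFee mentions, here is a minimal sketch of structural-boundary annotation using the open-source librosa library. The file name and the number of segments are placeholder assumptions for illustration, not details from the interview.

```python
# Sketch: estimate rough section boundaries in a recording.
import librosa

# Load a recording (the path is a placeholder).
y, sr = librosa.load("song.mp3")

# Chroma features summarize harmonic content over time.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

# Agglomerative clustering over the feature sequence yields candidate
# segment boundaries; asking for 8 segments is an arbitrary choice.
boundary_frames = librosa.segment.agglomerative(chroma, 8)
boundary_times = librosa.frames_to_time(boundary_frames, sr=sr)

print("Estimated section boundaries (seconds):", boundary_times)
```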
What are you most excited to contribute in your new position?
Primarily, I want to make a more stable connection between data science, analytics, and statistics — all the things that we do in this building! And I want to focus on some of the domain science and work that happens more directly with audio. These days in computer science, machine learning, and data science, there are a lot of connections to image analysis and to language, text, and things of that nature. Comparatively, those are things that are pretty easy to work with. Audio, for some reason, has a higher barrier to entry that I would like to lower. I would love to get those two sides into dialogue.
What intrigued you about sound over images?
The honest answer is that I had a couple of friends who started the Computer Audition Laboratory, and they were having a lot of fun. At that time in graduate school, I was not having a lot of fun. I realized there were all sorts of interesting questions to ask and avenues for making progress. In part, that’s because it’s a smaller field, but it’s also more open-ended. Even if certain problems may not be well-defined in a pursuit like computer vision, the field’s size and history allow researchers to feel secure knowing they’re going to be working on object or image recognition, tracking faces, or doing body pose estimation. On the other hand, automatic music transcription and even instrument recognition are really hard problems in audio, fraught with all kinds of ambiguities that make them difficult to formalize. I find those problems intriguing, particularly trying to define them in a way that makes it easier to find tools for them and make progress.
What are some exciting applications of the technology?
In the abstract, the applications I’m most interested in are the ones that help people connect with music and audio. Suppose you’re on Spotify, and you’re looking through a playlist. You see a list of songs, but you don’t really know what’s in them. You may know the artist, but you’ll likely still have to listen to each song to get a sense of it. There may be a cool hook that happens two and a half minutes into the song, but if you’re skipping around randomly, you might never discover it. But what if you had a visualization of the song to alert you, “Hey, there’s something different that happens at this part of the song”? That’s the kind of problem I like to think about.
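As a rough illustration of the kind of signal that could drive the visualization McFee describes, the sketch below computes a simple novelty curve and flags its largest spikes. This is one simple proxy, not his method, and the file path and threshold are placeholder assumptions.

```python
# Sketch: flag moments where the sound changes sharply, as a crude
# stand-in for "something different happens here".
import librosa
import numpy as np

y, sr = librosa.load("song.mp3")

# Onset strength measures frame-to-frame spectral change.
novelty = librosa.onset.onset_strength(y=y, sr=sr)
times = librosa.times_like(novelty, sr=sr)

# Keep the top 1% of the curve as candidate highlights (arbitrary cutoff).
threshold = np.percentile(novelty, 99)
highlights = times[novelty > threshold]
print("Moments worth a listen (seconds):", highlights[:10])
```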
Your particular pursuits have led you into what seems like a very interdisciplinary area. How do you reckon with the “creative” versus “technical” sides of the same coin, so to speak?
The research community is fracturing into those directions. There are people working on machine-assisted creativity and automatic composition. A lot of people are working on generative adversarial networks, either in images or in sound or text. Those things seem super cool, but really difficult to evaluate. I care a lot about evaluation, and tend to work more on the problems that I find easier to reason about.
Most of the problems in music are not easy to evaluate. I started off doing recommendation systems and playlist generation, and who knows what the right answer is there? A lot of the work is trying to come up with ways to compare systems and figure out if what we’re doing makes any sense. Evaluation is key to that pursuit. That said, a lot of the tools, models, and software are applicable for both sides, so I try and keep an eye out to see how people are using them.
Are my favorite musicians going to be AIs in 20 years?
We have synthetic pop stars now! I don’t think it’d be too much of a departure. Of course, DJs are a little easier than a whole pop star, but this reminds me of Hatsune Miku. She’s a completely synthetic anime pop star in Japan. Her voice and all her motions are computer generated. They put on “live performances,” where people fill an arena to watch a 30-foot hologram sing. There’s also a live band! It’s fun technology.
Would you go?
Sure! I would check it out. I wouldn’t say it’s my favorite thing to listen to, but it’s interesting for other reasons. A lot of this technology is collaborative, and it is cool to watch it become a collection of new tools for creative people to work with.
How do you feel about jumping back into teaching?
It’s been a little while, so I’m trying to dust off those skills! I’ll be teaching half in data science, and half in music tech. It will be interesting to switch between those two worlds. I’m going to be teaching Introduction to Signal Processing Theory for Music Tech, and I’m looking forward to that.
What new research can we look forward to?
In September, we’re presenting a paper at a conference about an open data set we created for evaluating instrument recognition. It’s 20,000 clips from the Free Music Archive, tagged for instrumentation by human annotators. I think it’s really important for the field, since we haven’t had a large data set for such a conceptually simple task in music, let alone for more complex tasks like melody, harmony, or rhythm. I’m excited to see what people are going to do with this data set.
Beyond that, in the near future, I’m interested in working on methods to understand vocal performance and lyrics, since we don’t have great tools for that right now either. Transcription in general, figuring out words in a song, is tough right now. Music is a lot more dynamic than spoken language; there is more redundancy in speech through timing, patterns, and stress, all of which are distorted in music. We need large data sets of singing that are transcribed in order to move forward with that too.