CDS PhD Student Chris Ick Unveils Groundbreaking Research in Sound Localization Using Simulated Data
The ability to precisely identify the location of a sound using audio alone is a skill humans have honed over millions of years. At the recent DCASE 2023 conference, Chris Ick, a PhD candidate at CDS, presented innovative research that advances the state of the art in teaching machines this traditionally human skill.
Ick’s work, done under the advisement of CDS Assistant Professor of Music Technology and Data Science Brian McFee, is deeply rooted in his association with the Music and Audio Research Lab (MARL), and is something of an outlier at CDS. His focus is on leveraging data science to teach robots to localize sound events using audio. “My work is pretty distinct from other members at CDS because of my focus on music and audio,” Ick explains. “I’m heavier on the ‘data’ side of data science, as opposed to machine learning, especially with my current work.”
Ick’s paper, “Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization,” which constitutes the early results of his dissertation, explores the complex task of teaching machines to localize sound solely through audio. “We’ve spent millions of years evolving to do that with just two ears,” says Ick, illustrating the intricacy of the task. The challenge lies in capturing, in an algorithm a machine can learn, the human brain’s ability to turn audio signals into a sense of location.
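To make the task concrete, the classic pre-deep-learning illustration of localizing with two “ears” is to cross-correlate the two channels, find the time lag at which they best align, and convert that lag into an angle. The sketch below shows that textbook approach, not the method of Ick’s paper; the microphone spacing and test signal are invented for illustration.

```python
import numpy as np

def estimate_azimuth(left, right, fs, mic_distance=0.2, c=343.0):
    """Estimate a source's azimuth from a two-microphone recording
    using the inter-channel time difference (ITD)."""
    # Cross-correlate the channels to find the lag (in samples)
    # at which they align best.
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)

    # Convert the lag to a time difference, then to an angle via the
    # far-field approximation: delay = mic_distance * sin(theta) / c.
    # A negative angle means the source sits on the left-mic side.
    tau = lag / fs
    sin_theta = np.clip(tau * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Toy check: a noise burst arriving 5 samples earlier at the left mic.
fs = 16000
noise = np.random.randn(1024)
left = np.concatenate([noise, np.zeros(5)])
right = np.concatenate([np.zeros(5), noise])
print(f"estimated azimuth: {estimate_azimuth(left, right, fs):.1f} degrees")
```

Deep learning replaces this hand-derived geometry with a model that learns the mapping from multichannel audio to location, which is exactly why it demands so much labeled data.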
A significant hurdle in this research is the requirement for vast amounts of labeled data. “To do this, as you know, a big part of training neural networks is having lots and lots of data,” Ick says. But that data needs to be labeled, and labeling is painstaking, expensive manual work, which can make gathering enough of it impractical. To circumvent the resource-intensive process of collecting real-world data, Ick proposed using simulations to generate it. “What if I just simulate this data?” he asked, a question that led him to investigate the potential of acoustic simulations for creating realistic, virtual audio environments.
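Ick’s paper relies on geometrical acoustic simulations of spatial room impulse responses for this. As a rough sketch of the general idea (the paper’s exact toolchain isn’t described here), an open-source library like pyroomacoustics can simulate a room with the image-source method, and because the experimenter chooses the source position, every generated clip comes with a perfect label for free:

```python
import numpy as np
import pyroomacoustics as pra

fs = 16000

# A 6 m x 5 m x 3 m "shoebox" room; max_order controls how many
# wall reflections the image-source method simulates.
room = pra.ShoeBox([6.0, 5.0, 3.0], fs=fs, max_order=10,
                   materials=pra.Material(energy_absorption=0.3))

# Place a source at a known position with a 1-second noise burst.
# Because we chose the position, the label costs nothing.
source_pos = [2.0, 3.5, 1.5]
room.add_source(source_pos, signal=np.random.randn(fs))

# A small 4-microphone array, similar in spirit to the compact
# arrays used in SELD datasets (this layout is illustrative).
mic_center = np.array([4.0, 2.0, 1.2])
offsets = 0.05 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
room.add_microphone_array(pra.MicrophoneArray((mic_center + offsets).T, fs))

# Render the spatial room impulse responses and microphone signals.
room.simulate()
audio = room.mic_array.signals  # shape (n_mics, n_samples): training input
label = source_pos              # ground-truth location: training target
print(audio.shape, label)
```

Looping this over randomized rooms, source positions, and sounds yields an arbitrarily large labeled training set at negligible cost.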
The paper was accepted for presentation at the Workshop on the Detection and Classification of Acoustic Scenes and Events (DCASE) in Tampere, Finland, in response to a specific challenge issued by the conference, and reported on exactly this: the effectiveness of simulated data in training models for sound event localization and detection. Ick’s research shows promising results, with models trained on simulated data performing within 5% of those trained on real-world data. Ick found that gap to be a more-than-acceptable tradeoff given the practical benefits of skipping the laborious process of hand-labeling real data.
These results surprised even Ick. “It’s shocking that we can get this close without real data,” he says, calling this “the punch of this paper.” The finding seems likely to change how much of this kind of work gets done. “This is great because, why are you going out and recording these data sets in the real world? Now you don’t need to.”
The implications of Ick’s research extend far beyond academic curiosity. “The reason I’m working on this in the first place is because of my lab’s history with urban monitoring in New York City,” says Ick, sharing a New York Times article about the lab’s work quantifying the city’s quiescence. He mentions Vincent Lostanlen, a collaborator and former postdoc at MARL, who uses sound event localization and detection (SELD) to track the migratory patterns of animals. Ick himself, during an internship at Bosch, developed an autonomous robot platform that identified microscopic air leaks aboard the International Space Station (ISS), work that resulted in a paper published at ICASSP in 2022.
Looking ahead, Ick envisions applications for enhancing virtual reality (VR) and augmented reality (AR) experiences. With Apple’s Vision Pro having recently launched, Ick foresees a boom in AR and VR. “Once Apple makes a product, that’s usually a good indication that that technology is going to have a lot more people working on it very, very soon.” For AR and VR applications, “spatial audio and well-realized immersive audio is essential.”
Ick’s work at CDS not only pushes the boundaries of our understanding of sound localization but also paves the way for practical applications that could significantly impact our interaction with technology and the world around us. As he says, “We live in a very noisy world, and beyond vision, sound is probably the most direct way we interact with the world around us.”
By Stephen Thomas