Simulating Soundscapes: New Tool Enhances Machine Learning Models for Audio Localization

Jun 19, 2024

In crowded urban environments, accurately identifying and locating sounds can be crucial for public safety and accessibility. The latest work from CDS PhD student Christopher Ick addresses this challenge. Presented at ICASSP 2024, Ick’s paper, “SpatialScaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms,” introduces a new tool for simulating the sound data used to train machine learning models.

Sound event localization and detection (SELD) is pivotal for developing technologies that assist individuals with low vision or hearing impairments. Traditional methods for creating datasets involve collecting and annotating real-world audio recordings, a process that is labor-intensive and time-consuming. Ick and his co-authors, including CDS Assistant Professor of Music Technology and Data Science Brian McFee, sought to ease this bottleneck with SpatialScaper, a library designed to simulate soundscapes in both real and synthetic rooms.

“SpatialScaper allows us to generate vast amounts of labeled sound data without the need for extensive manual annotation,” Ick explained in an interview. “This tool leverages both real and synthetic room impulse responses [RIRs] to create diverse and realistic audio environments.”
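The reason simulated data comes already labeled can be shown with a small sketch. This is not SpatialScaper's API: the function name `synthesize_scene` and the event tuple layout are hypothetical, and the scene is a plain mono mix in which the azimuth is only recorded, not rendered. The point is that when the generator decides where and when each event occurs, the annotations fall out as a byproduct.

```python
import numpy as np

def synthesize_scene(events, duration, sr=16000):
    """Mix events into a mono scene and return (scene, annotations).

    `events` is a list of (label, onset_seconds, waveform, azimuth_deg).
    Hypothetical layout, chosen only for this illustration.
    """
    scene = np.zeros(int(round(duration * sr)))
    annotations = []
    for label, onset, wave, azimuth in events:
        start = int(round(onset * sr))
        if start >= len(scene):
            continue  # event would begin after the scene ends
        end = min(start + len(wave), len(scene))
        scene[start:end] += wave[: end - start]
        # The label is known because we placed the event ourselves.
        annotations.append({
            "label": label,
            "onset": onset,
            "offset": end / sr,
            "azimuth_deg": azimuth,
        })
    return scene, annotations

# Half a second of a constant signal standing in for a real recording.
tone = np.ones(8000)
scene, annotations = synthesize_scene([("siren", 0.5, tone, 90.0)], duration=2.0)
```

Each entry in `annotations` carries the class, timing, and direction of an event without anyone listening to the audio.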

The library’s key feature is its ability to emulate virtual rooms by adjusting parameters such as size and wall absorption. This flexibility enables the creation of varied acoustic environments, which is essential for training robust SELD models. By incorporating both real and synthetic RIRs, SpatialScaper can simulate soundscapes with greater acoustic diversity, which helps machine learning models generalize.
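To make the role of room size and wall absorption concrete, here is a rough sketch using only NumPy. It does not reproduce how SpatialScaper builds its rooms: it assumes a simple shoebox room, estimates reverberation time with Sabine's formula, and stands in for a real RIR with decaying noise. The function names `sabine_rt60`, `toy_rir`, and `render` are hypothetical.

```python
import numpy as np

def sabine_rt60(dims, absorption):
    """Sabine estimate of reverberation time in seconds for a shoebox room.

    dims: (length, width, height) in metres.
    absorption: average wall absorption coefficient, between 0 and 1.
    """
    lx, ly, lz = dims
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return 0.161 * volume / (surface * absorption)

def toy_rir(rt60, sr=16000, seed=0):
    """Toy RIR: white noise whose envelope falls by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(round(rt60 * sr))
    t = np.arange(n) / sr
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude stand-in for the direct path
    return rir / np.max(np.abs(rir))

def render(dry, rir):
    """Place a dry signal in the room by convolving it with the RIR."""
    return np.convolve(dry, rir)

# A small absorptive room versus a large reflective one.
small_rt60 = sabine_rt60((3.0, 4.0, 2.5), absorption=0.5)
large_rt60 = sabine_rt60((10.0, 8.0, 4.0), absorption=0.1)

dry = np.zeros(1600)
dry[0] = 1.0  # a single click
wet_small = render(dry, toy_rir(small_rt60))
wet_large = render(dry, toy_rir(large_rt60))
```

The same click rings for much longer in the large reflective room than in the small absorptive one, which is the kind of acoustic variety that changing room parameters provides.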

One notable application of SpatialScaper is its use in the DCASE SELD data challenge. “We replaced the existing data generator with SpatialScaper and saw a marked improvement in model performance,” Ick noted. This enhancement is directly linked to the library’s ability to introduce greater acoustic variability into the training data, demonstrating its practical benefits.

The collaborative nature of this project is another highlight. Ick emphasized the importance of open-source development: “Our lab is committed to making this software freely available on GitHub. We believe that by encouraging community contributions, we can continuously improve the tool and expand its applications.”

SpatialScaper is more than just a theoretical advancement; it has practical implications for various fields beyond assistive technology. Audio production, virtual reality, and even neuroscience could benefit from this tool. For example, Ick mentioned ongoing collaborations with other researchers to apply SpatialScaper in diverse environments, including laboratory settings for animal behavior studies.

The development of SpatialScaper also reflects Ick’s broader research trajectory. His journey began with the Sounds of New York City (SONYC) project, which aimed to characterize urban soundscapes. That foundational work inspired the creation of SpatialScaper, which extends the focus from urban noise monitoring to three-dimensional audio simulation.

“By building on the SONYC project, we were able to create a tool that not only meets our current research needs but also has the potential to impact a wide range of disciplines,” Ick said. “The goal is to make it as easy as possible for researchers to generate high-quality spatial audio data, thereby advancing the field as a whole.”

SpatialScaper’s introduction marks a significant step forward in sound event localization and detection. As it gains traction within the research community, its impact is likely to be felt across multiple domains, driving further innovation in machine listening and beyond.

For those interested in exploring or contributing to SpatialScaper, the project is available on GitHub.

By Stephen Thomas

Written by NYU Center for Data Science