At work, at home, and even at the doctor’s office, AI is being used in more and more areas of our lives. As the applications of AI continue to expand, so does the need for models that can recognize human speech and the intent behind it.
As part of their Master’s capstone project, CDS students Sujeong Cha, Wangrui Hou, Hyun Jung, and My Phung published a paper titled “Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs,” which details their efforts to create a spoken language understanding system that can effectively understand the intent of human speech.
The paper outlines two main obstacles to intent prediction from speech:
- Because speech can be used to identify a person, the raw audio is often unavailable and only automatic speech recognition (ASR) transcripts are accessible.
- Intent-labeled speech data is scarce.
The authors then propose a new system that addresses both of these issues. To solve the first problem, they developed a novel system that can interpret ASR transcripts, speech, or both. To solve the second, they created a cross-modal system with an acoustic module and a text module that leverages a pre-trained BERT model, allowing text and acoustic data to be co-trained simultaneously. They then further improved this system “by pre-training the acoustic module on the LibriSpeech dataset and domain-adapting the text module on our target datasets.”
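The flexible-input idea described above can be illustrated with a minimal sketch: two modality-specific encoders projecting into a shared embedding space, with a single intent classifier on top that accepts either modality (or both). All module names, dimensions, and layers here are hypothetical stand-ins; the paper's actual system uses a pre-trained BERT text module and a LibriSpeech-pre-trained acoustic module, not the toy linear layers shown.

```python
# Hypothetical sketch of a cross-modal SLU model with flexible inputs.
# Dimensions and layer choices are illustrative, not the paper's.
import torch
import torch.nn as nn

class CrossModalSLU(nn.Module):
    def __init__(self, acoustic_dim=80, text_dim=768,
                 shared_dim=256, n_intents=31):
        super().__init__()
        # Acoustic module: stand-in for the speech encoder.
        self.acoustic_encoder = nn.Sequential(
            nn.Linear(acoustic_dim, shared_dim), nn.ReLU())
        # Text module: stand-in for a pre-trained BERT encoder.
        self.text_encoder = nn.Sequential(
            nn.Linear(text_dim, shared_dim), nn.ReLU())
        # Shared intent classifier over the joint embedding space.
        self.classifier = nn.Linear(shared_dim, n_intents)

    def forward(self, acoustic=None, text=None):
        # Flexible inputs: speech, an ASR transcript, or both.
        embeddings = []
        if acoustic is not None:
            embeddings.append(self.acoustic_encoder(acoustic))
        if text is not None:
            embeddings.append(self.text_encoder(text))
        # Average the modality embeddings when both are provided.
        shared = torch.stack(embeddings).mean(dim=0)
        return self.classifier(shared)

model = CrossModalSLU()
speech = torch.randn(4, 80)       # batch of pooled acoustic features
transcript = torch.randn(4, 768)  # batch of pooled text embeddings
logits_speech = model(acoustic=speech)
logits_text = model(text=transcript)
logits_both = model(acoustic=speech, text=transcript)
```

Because both encoders map into the same space and share one classifier, the model can be trained jointly on paired text and acoustic data and still make intent predictions at inference time from whichever input happens to be available.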
The team’s experiments show that the system performs competitively with other systems on the Snips SLU and Fluent Speech Commands datasets. In the paper’s conclusion, the team hypothesized that the system is also better at understanding “noisy” data because it is trained to process ASR transcript embeddings.
The team plans to present their work this August at Interspeech 2021. We hope you all join us in congratulating them on this remarkable accomplishment. We are sure we’ll be seeing further innovation from this team in the future!