CDS PhD student Nan Wu will showcase her research at the Thirty-ninth International Conference on Machine Learning
Wu’s work on the “greedy nature of learning” in deep neural networks will enhance available tools for multi-modal learning
A project led by CDS PhD student Nan Wu will appear at the Thirty-ninth International Conference on Machine Learning (ICML 2022) this July. Wu's project, titled “Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks,” is co-authored by former CDS affiliated postdoctoral researcher Stanisław Jastrzębski, CDS Associate Professor of Computer Science and Data Science Kyunghyun Cho, and CDS affiliated professor Krzysztof J. Geras.
ICML will meet in person this year at the Baltimore Convention Center in Maryland from Sunday, July 17th through Saturday the 23rd. At one of the leading academic conferences on machine learning, Wu will have the opportunity to bring her work to an international audience. “ICML 2022 will be the first in-person conference I’ve joined after the pandemic,” said Wu. “I’m looking forward to meeting researchers in person, making new connections, and discussing ideas.”
At CDS, Wu works with Krzysztof J. Geras and Kyunghyun Cho on deep learning research for medical imaging, specifically in breast cancer screening. Currently, Wu’s attention is on issues of multi-modal deep learning. She was a recipient of the Google Ph.D. Fellowship 2020, a program that supports exceptional Ph.D. candidates conducting innovative research in areas relevant to computer science who “seek to influence the future of technology.”
Deep learning, in essence, teaches a computer to learn by example, loosely inspired by the way the human brain works. Deep learning models are parameterized by deep neural networks (DNNs), which arrange computation in a hierarchy of layers of increasing complexity rather than the linear algorithms of traditional machine learning. This grants data scientists more flexibility in feature learning, especially when learning from high-dimensional data. Deep learning is also the technology behind driverless cars, which make autonomous decisions based on the environment around them.
Researchers are interested in training DNNs to learn from multiple input modalities, just as humans explore the world by engaging a variety of senses: visual, auditory, kinaesthetic, and so on. This research domain is referred to as multi-modal learning, and its models are called multi-modal DNNs. Wu’s paper addresses the problem that multi-modal DNNs do not always outperform models using a single modality, a counterintuitive shortcoming she first encountered in her previous work on multi-modal DNNs for breast cancer screening.
The paper identifies the “greedy nature” of multi-modal learning and uses it to explain this counterintuitive phenomenon. “Greedy” refers to algorithms that build toward a solution by always choosing the next piece that offers the largest immediate benefit. Wu’s research shows that this greedy nature of learning harms the models’ results.
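A classic way to see why greedy choices can backfire, separate from Wu's paper, is greedy coin change: always taking the coin with the largest immediate benefit can miss the globally best answer. This toy sketch (not from the paper) illustrates the general idea of greediness described above.

```python
def greedy_change(amount, coins):
    """Make change greedily: at each step, take the largest coin that
    still fits. Fast, but not guaranteed to use the fewest coins."""
    picked = []
    for coin in sorted(coins, reverse=True):  # try the biggest coins first
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    return picked

# With coins {1, 3, 4}, greedy makes change for 6 as 4+1+1 (three coins),
# missing the optimal 3+3 (two coins): maximal immediate gain, worse outcome.
```

The same trap, the paper argues, appears in multi-modal training: the model "spends" its capacity on whichever modality pays off fastest, at the expense of the overall solution.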
To determine the model’s dependence on each modality, Wu uses a metric called the “conditional utilization rate.” Across experiments, the team consistently observed an imbalance in conditional utilization rates between modalities. To further validate the greedy learner hypothesis, and to address the issue on the fly during training, the team introduces “conditional learning speed,” the pace at which the model learns from each modality. The researchers propose an algorithm that balances conditional learning speeds between modalities, counteracting greedy learning, and validate it on three multi-modal datasets.
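The balancing idea can be sketched in a few lines. This is a hedged illustration only, not the paper's algorithm: the function names (`conditional_learning_speed`, `rebalanced_lr`) and the loss-based definition of speed are hypothetical stand-ins for the paper's exact formulations.

```python
def conditional_learning_speed(loss_history):
    """Approximate per-modality learning speed as the average recent
    drop in that modality's loss (an illustrative proxy, not the
    paper's definition)."""
    return {m: max(losses[0] - losses[-1], 0.0) / len(losses)
            for m, losses in loss_history.items()}

def rebalanced_lr(base_lr, speeds):
    """Heuristically slow down the faster (greedier) modality by scaling
    each modality's learning rate inversely to its share of total speed."""
    total = sum(speeds.values()) or 1.0  # avoid division by zero
    return {m: base_lr * (1.0 - s / total) for m, s in speeds.items()}
```

For example, if the image branch's loss falls much faster than the tabular branch's, `rebalanced_lr` hands the tabular branch the larger learning rate, giving the slower modality a chance to catch up instead of letting the greedier one dominate.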
The greediness of training algorithms prevents DNNs from achieving better performance, and the method the research proposes could help solve the issue. “After many trials and errors, we successfully found what we think is the best way to capture and explain the phenomenon,” said Wu. The tools proposed through the work will also give researchers a better understanding of what the model has learned. “Our greedy learner hypothesis provides a complementary explanation and the methods inspired by it enrich the spectrum of available tools for multi-modal learning,” said Wu. “I’d especially hope to see it serve our goals towards more accurate AI-assisted cancer diagnosis.”
Wu’s research was supported by grants from the National Institutes of Health, the National Science Foundation, the Gordon and Betty Moore Foundation, and the Samsung Advanced Institute of Technology.
By Meryl Phair