Multimodal learning, a technique that trains machine learning models using multiple data modalities simultaneously, such as images and text, stands out as a promising avenue. However, a recurring challenge has been observed: multimodal models often fail to utilize all data modalities effectively. Taro Makino, a fourth-year CDS PhD student, along…