Semi- and Self-Supervised Learning Help Clinicians Minimize Manual Labeling in Medical Image Analysis

NYU Center for Data Science
Jul 5, 2024

A new AI pipeline developed by researchers at NYU significantly reduces the need for manual labeling in medical image analysis, as detailed in the study “Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification,” published in Scientific Reports, a Nature Portfolio journal. The work was led by Jacopo Cirrone, Assistant Professor and Faculty Fellow at CDS and the Colton Center. It introduces the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, which draws on advances in self-supervised and semi-supervised learning so that machine supervision scales more easily than fully supervised methods.

“Our study demonstrates that machine supervision significantly improves two crucial medical imaging tasks: classification and segmentation,” said Cirrone, who leads AI efforts at the Colton Center for Autoimmunity at NYU Langone. The researchers observed a significant improvement in classification tasks over methods reliant on fully annotated data, as well as a notable reduction in the need for labels in segmentation tasks. “The semi-supervised approach for segmentation outperforms fully-supervised methods while requiring 50% fewer labels across all evaluated datasets,” Cirrone noted.
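For readers unfamiliar with how a semi-supervised method can get by with fewer labels, the general idea can be illustrated with a toy self-training loop. This is a minimal sketch of generic pseudo-labeling, not the S4MI algorithm itself: the nearest-centroid classifier, the function names, the confidence measure, and the 0.5 threshold are all assumptions chosen for illustration, and the data are one-dimensional numbers rather than images.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Return {class: mean of the 1-D points carrying that label}."""
    return {
        c: mean(p for p, l in zip(points, labels) if l == c)
        for c in sorted(set(labels))
    }


def predict_with_confidence(centroids, x):
    """Predict the nearest centroid's class (needs at least two classes).

    Confidence is a hypothetical margin score in [0, 1]: 0 when the two
    nearest centroids are equally far away, 1 when x sits on a centroid.
    """
    dists = sorted((abs(x - mu), c) for c, mu in centroids.items())
    (d0, c0), (d1, _) = dists[0], dists[1]
    confidence = 1.0 if d0 + d1 == 0 else (d1 - d0) / (d1 + d0)
    return c0, confidence


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.5, rounds=3):
    """Grow the labeled set with confident predictions on unlabeled points."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        centroids = fit_centroids(x, y)
        remaining = []
        for u in pool:
            label, conf = predict_with_confidence(centroids, u)
            if conf >= threshold:
                # Treat the confident prediction as if it were a real label.
                x.append(u)
                y.append(label)
            else:
                remaining.append(u)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled; stop early
        pool = remaining
    return fit_centroids(x, y), pool


# Two hand-labeled points plus five unlabeled ones.
centroids, leftover = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 9.0, 8.0, 5.0])
print(centroids, leftover)
```

In this toy run only two points carry human labels. Four of the unlabeled points are absorbed as pseudo-labels, while the ambiguous midpoint stays unlabeled rather than being forced into a class.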

The S4MI pipeline addresses a critical bottleneck in the advancement of clinical treatments: the heavy reliance on supervised learning techniques that require large amounts of annotated data. Producing those annotations is not only costly but also time-consuming, demanding extensive involvement from clinical specialists. “A major challenge in applying deep learning to medical imaging is the extensive need for domain-specific annotation,” Cirrone, who is also Affiliated Faculty in the NYU Division of Precision Medicine, explained. “This not only demands considerable time and resources but also introduces inefficiencies in the development of robust AI models.”

The project was a highly collaborative and interdisciplinary effort, primarily developed at CDS in collaboration with the Courant Institute of Mathematical Sciences and the NYU Grossman School of Medicine. Clinical collaborator Craig Smuda, of the NYU Grossman School of Medicine’s Division of Rheumatology, played a pivotal role in ensuring that the computational innovations were aligned with real-world clinical needs. “Dr. Smuda’s involvement was critical in validating our AI models and ensuring that they addressed the complexities of medical image analysis in a clinically relevant manner,” Cirrone said.

The project also benefited significantly from the contributions of three students who were part of the Fall 2022 MS capstone research project at CDS: Luoyao Chen, Mei Chen, and Jinqian Pan. Their project, recognized as one of the best capstone projects of the semester, focused on developing and testing the semi-supervised learning algorithm that is part of the pipeline’s core. “These students not only contributed to the algorithm development but also played key roles in data preprocessing, model training, and performance evaluation,” Cirrone said.

The motivation behind this work is rooted in addressing several critical challenges in the diagnosis and treatment of autoimmune diseases. Clinicians often aim to control inflammation after the disease has already advanced, by which point significant tissue damage may have occurred. “Our research moves forward by addressing the underutilization of biopsy data,” Cirrone explained. “Typically, only certain features from these biopsies are used for diagnosis and research, leaving a vast amount of valuable data unexplored.”

The implications of this work extend beyond basic research. In healthcare, the S4MI pipeline could make diagnostic tools more efficient and accurate. By automating and standardizing the analysis of medical images, these methods reduce the burden on clinicians and draw on vast amounts of unlabeled data to improve diagnostic consistency and reliability.

In line with their commitment to contributing to the scientific community, the researchers have made the S4MI code openly accessible, allowing for broader application and further development of these methods. “By developing effective self and semi-supervised deep learning approaches for segmentation and classification, we can detect and identify inflammatory cells in human tissue biopsies, significantly improving diagnostic processes,” Cirrone said. “This work not only enhances our understanding of disease pathogenesis but also offers practical tools for clinicians.”

By Stephen Thomas
