Machine Learning Expands Technology’s Role in Healthcare: Waging the War on Autoimmunity

NYU Center for Data Science
May 31, 2023

CDS Assistant Professor/Faculty Fellow Jacopo Cirrone discusses his work harnessing data science in medical image analysis

CDS Assistant Professor/Faculty Fellow, Dr. Jacopo Cirrone

Medical image analysis has significantly benefited in recent years from machine learning-based modeling tools. While the field is fast growing, existing techniques have their limitations. CDS Assistant Professor/Faculty Fellow Jacopo Cirrone works at the intersection of machine learning and healthcare, recently publishing two papers that expand deep learning research within these fields.

His paper “CASS: Cross Architectural Self-Supervision for Medical Image Analysis” was a joint effort with CDS MS student Pranav Singh and previous CDS Moore Sloan Faculty Fellow and Assistant Professor Elena Sizikova, now a Staff Fellow for the Center for Devices and Radiological Health (CDRH) at the Food and Drug Administration (FDA). The research, which presents a new self-supervised learning approach that utilizes both Transformer and CNN models, was accepted at the NeurIPS 2022 Self-Supervised Learning Theory and Practice Workshop.

Jacopo’s second paper, “A Data-Efficient Deep Learning Framework for Segmentation and Classification of Histopathology Images,” was accepted at the European Conference on Computer Vision-MCV 2022 and published by Springer Nature. This work focuses on developing deep learning approaches to comprehend inflammation in autoimmune diseases and was also accomplished in collaboration with Pranav.

To learn more about data science’s future in medical imaging and healthcare, CDS spoke with Jacopo.

Can you talk about how these projects began and how you became involved?

Entrepreneur and philanthropist Stewart Colton’s powerful question, ‘We have a war on Cancer, why can’t we have a war on Autoimmunity?’ resonated with me. It underlines the critical role and vast potential of technology in healthcare — a merger I’ve always perceived as not just significant, but transformative. By harnessing this synergy, we can influence and improve countless lives, thus reshaping the world. As the leader of artificial intelligence (AI) efforts at the Colton Center for Autoimmunity at NYU Langone, my focus is on expediting the development of novel diagnostics and therapies for autoimmune diseases, including lupus, rheumatoid arthritis, and multiple sclerosis, among others.

We’re tackling a multitude of challenges. The current management strategies in autoimmune diseases are predominantly reactive, as clinicians often aim to control inflammation once the disease has already advanced and damage may have occurred. The wealth of information from biopsies, commonly carried out for diagnosis and research purposes in autoimmune diseases, is often underutilized. Only certain features are used for diagnosis, leaving a vast amount of data unexplored. Furthermore, we still have much to understand about how cells interact at the tissue level. There are major outstanding research questions regarding which cell types participate in tissue inflammation and how they interact with each other spatially. For clinicians, manual examination of these structures is incredibly challenging.

AI and data science hold the potential to address these issues by offering high-throughput analysis while enabling unbiased discovery, making them powerful tools for answering critical questions in immunology. My group’s role revolves around leveraging AI’s potential to manage and interpret various -omics data and uncover the complexities of inflammation architecture in autoimmune diseases.

Were there any limitations or challenges you faced in research for your paper “CASS: Cross Architectural Self-Supervision for Medical Image Analysis”?

Developing CASS, our innovative siamese self-supervised learning approach, was a unique and challenging journey. Our goal was to build an efficient and versatile architecture that could overcome the limitations of existing deep learning frameworks.

CASS augments the input image only once, making it more efficient than state-of-the-art algorithms that augment twice. This single augmented image is then passed through both a Transformer and a CNN, each extracting different features. The Transformer focuses on more global aspects of the image, while the CNN concentrates on more local features.

This process results in two branches generating similar feature representations for similar inputs, which we refer to as positive pairs. Our approach not only provides a trained Transformer and a trained CNN in one cycle, but also reduces pretraining time by a significant 69% compared to other methods.
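
The single-augmentation, two-branch idea described above can be illustrated with a small, dependency-free sketch. This is not the CASS implementation: the two "branches" are toy, untrained feature extractors standing in for the Transformer and the CNN, and the loss (2 minus twice the cosine similarity of the two embeddings) is an assumed, commonly used similarity objective, since the interview does not state the loss. All function names here are hypothetical.

```python
import math
import random

def augment(image, rng):
    """Apply ONE random augmentation (here: a horizontal flip with probability 0.5)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def global_branch(image):
    """Toy stand-in for the Transformer: statistics over the whole image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var, max(pixels), min(pixels)]

def local_branch(image):
    """Toy stand-in for the CNN: the mean of each image quadrant."""
    h, w = len(image), len(image[0])
    feats = []
    for r0 in (0, h // 2):
        for c0 in (0, w // 2):
            patch = [image[r][c]
                     for r in range(r0, r0 + h // 2)
                     for c in range(c0, c0 + w // 2)]
            feats.append(sum(patch) / len(patch))
    return feats

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v[:]

def cross_architecture_loss(z_a, z_b):
    """Assumed objective: 2 - 2*cos(z_a, z_b). 0 when aligned, 4 when opposite."""
    a, b = l2_normalize(z_a), l2_normalize(z_b)
    return 2.0 - 2.0 * sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # toy 4x4 "image"
view = augment(image, rng)  # a single augmented view, shared by both branches
loss = cross_architecture_loss(global_branch(view), local_branch(view))
```

In an actual training loop, both branches would be updated to reduce this loss, so that the two architectures learn to agree on the same augmented image. The sketch only computes the loss once and does no training.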

This journey wasn’t without obstacles. Deciding which architecture to use at inference time without labels is a hard problem, as CASS simultaneously trains a CNN and a Transformer. Additionally, benchmarking DINO against CASS required days of training on large datasets, presenting a time-intensive challenge.

CASS has shown promising results, outperforming existing self-supervised learning methods across diverse healthcare datasets. More importantly, CASS has proven itself to be not only time-efficient, but remarkably effective on smaller medical image datasets.

What lingering questions or areas of exploration arose from your research for your paper “A Data-Efficient Deep Learning Framework for Segmentation and Classification of Histopathology Images”?

In this study, we developed effective deep learning approaches for segmentation and classification. These methods can detect and identify inflammatory cells in human tissue biopsies of dermatomyositis. Compared to existing methods on the same dataset, our approach improved classification performance by 26% and segmentation performance by 5%.

Deciphering the complex structure within these biopsy images has traditionally been a time-consuming task, requiring many hours from skilled clinicians. Our deep learning framework is therefore being used by our clinician collaborators in the Rheumatology department to enhance their understanding of human disease pathogenesis, particularly in the context of dermatomyositis or lupus nephritis. This pipeline operates in a fully supervised manner. With CASS, we’ve developed a self-supervised component for classification and are now focusing on developing a self-supervised component for segmentation, given the formidable challenge faced by clinician-researchers in accurately annotating diverse cell types.

One of the key areas related to this work that I’m excited to delve into involves understanding the interactions of different cell types in autoimmune diseases. The main biological question focuses on inflammation in skin and muscle tissues, especially the interactions between T cells and B cells. We observe these cells in close proximity in muscle tissue, but it’s still unknown whether this leads to productive interaction and communication at the tissue level. This is a crucial question in immunology, particularly in the context of diseases like dermatomyositis or lupus nephritis.

What impact do you hope this work will have on the future of data science in healthcare, specifically in medical image analysis?

We aim to trigger a paradigm shift in how we approach healthcare, specifically in the realm of medical image analysis for autoimmune diseases. Our AI and machine learning research projects show promising strides in identifying inflammatory cells and their interactions, shedding light on critical questions about cell type participation in inflammation using biopsy images of inflamed human tissue.

One of the fascinating aspects of our study focuses on proximity mappings, that is, evaluating the interactions of certain cells in close proximity. A crucial cell subtype, T follicular helper (Tfh) cells, holds particular importance in this regard. Tfh cells are rare and sparse, making their observation challenging. Our aim is to study the proximity mappings of these Tfh cells with B cells to understand the nature of their interaction, if any. This investigation into immune cell co-aggregation could provide novel insights into immune cell behavior from both a biological and medical perspective. Such insights could potentially revolutionize our understanding of autoimmune diseases and open up novel diagnostic and therapeutic avenues.
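
To make the idea of a proximity mapping concrete, here is a minimal sketch of one way it could be computed once cells have been detected and classified. It is an illustration under stated assumptions, not the group's pipeline: the cell centroids, the micrometer units, the 10-unit radius, and the function names are all hypothetical.

```python
import math

def proximity_pairs(tfh_cells, b_cells, radius):
    """Return (tfh_index, b_index, distance) for every Tfh/B pair within `radius`."""
    pairs = []
    for i, t in enumerate(tfh_cells):
        for j, b in enumerate(b_cells):
            d = math.dist(t, b)  # Euclidean distance between centroids
            if d <= radius:
                pairs.append((i, j, d))
    return pairs

def nearest_b_cell_distance(tfh_cells, b_cells):
    """For each Tfh cell, the distance to its closest B cell."""
    return [min(math.dist(t, b) for b in b_cells) for t in tfh_cells]

# Hypothetical cell centroids (x, y), e.g. in micrometers
tfh = [(10.0, 10.0), (80.0, 40.0)]
b = [(13.0, 14.0), (50.0, 50.0), (11.0, 10.0)]

pairs = proximity_pairs(tfh, b, radius=10.0)   # Tfh/B pairs that sit close together
nearest = nearest_b_cell_distance(tfh, b)      # one distance per Tfh cell
```

Here the first Tfh cell has two B cells within the radius, while the second has none. Spatial closeness alone does not establish that the cells interact, which is the open biological question described above.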

Our research also employs self-supervised learning, significantly reducing the human-intensive task of data annotation. It’s a powerful tool, especially when dealing with limited data availability in emerging diseases.

My group and I are relentlessly pursuing groundbreaking follow-up research. As we innovate, we are focusing on integrating our novel deep learning framework for segmentation and classification, together with CASS, into the primary deep learning pipeline at NYU Langone. Our aim is to empower clinician-researchers across NYU Langone Hospital, beyond just our collaborators in the Rheumatology department, to use our innovative framework to analyze their medical images. This will facilitate more efficient and effective healthcare solutions and improve our understanding of complex diseases.

We envision our collective efforts as significant strides in the ‘war on Autoimmunity.’ By amplifying the role of AI and data science in healthcare, we are striving to expedite the development of advanced diagnostic and therapeutic strategies for a variety of autoimmune diseases. This will ultimately improve patient outcomes and enhance the quality of life.

Any additional thoughts you would like to share?

I consider myself privileged to work with talented, energetic, and passionate individuals like Pranav, a brilliant deep learning researcher in my group. In the field of AI and healthcare, there’s still much to accomplish, and I’d like to emphasize that meaningful healthcare research stems from critical collaboration. Our current work at NYU, which involved several departments, is a clear example of this.

Alongside collaboration, a solid data strategy is equally vital. Once we’ve formed impactful research questions, a well-structured data strategy is required. This strategy involves not only the organization, analysis, and application of data but also the identification of the most relevant datasets. It’s this strategy that helps us focus on our research questions and guides us toward the answers. Moving forward, I’m always eager to initiate new collaborations that can leverage our AI research to open up novel healthcare applications.

By Meryl Phair


