AI Model Discovers New Lung Cancer Subtype Associated with Poor Survival
Tumor cells rarely exist in isolation. Instead, they live within complex microenvironments populated by diverse immune cells, blood vessels, and supporting structures. Understanding this heterogeneity is crucial for precision cancer treatment, but examining the multidimensional data from tumor samples has traditionally required laborious manual work by pathologists.
A new study, “Characterization of tumour heterogeneity through segmentation-free representation learning on multiplexed imaging data,” published in Nature Biomedical Engineering, shows how machine learning can analyze these complex tumor environments without human bias. Former CDS Master’s student Jimin Tan, now a postdoc at NYU Langone Health, led the team, which included CDS Professor Kyunghyun Cho. Together they developed CANVAS (CANcer Vision AutoencoderS), a framework that automatically identifies different tumor microenvironments and their clinical relevance.
“In the medical domain, labels are very hard to come by, especially for clinicians,” Tan said. “Using this framework, you can enable automatic discovery without requiring any additional data.”
Cho contributed substantially on the computational side, helping develop key aspects of the methodology, including the spectral clustering techniques that let the research team group features based on their spatial relationships.
The team’s approach represents a significant advance over previous methods that relied on cell segmentation, the practice of manually outlining individual cells before analysis. CANVAS instead uses self-supervised machine learning to pick up patterns directly from the pixels of multiplexed images, in which tumor samples are stained with multiple markers to highlight different cell types simultaneously.
Trained on lung cancer images from 416 patients, CANVAS identified 50 distinct tumor signatures, including a previously uncharacterized “monocytic signature” associated with poor survival. The team validated this finding in an independent dataset and through laboratory experiments, discovering that monocytes (a type of immune cell) near tumors produce extracellular matrix components that may contribute to fibrosis and cancer progression.
“The monocytic signature that we discovered was not previously characterized, and it’s something new that we just identified and validated,” Tan explained.
What makes CANVAS particularly valuable is its ability to learn without human guidance. Traditional approaches to analyzing tumor heterogeneity often rely on pathologists manually examining images to grade tumors based on morphology. These assessments can be subjective and miss subtle patterns that might have clinical significance.
Tan’s machine learning approach, on the other hand, provides an unbiased view. “By using our method, which is based on individual pixels, we’re not making any assumption about the tumor environment or specific discretized cell type,” Tan said.
The technology behind CANVAS builds on vision transformers, the same class of deep learning models that power many advanced image recognition systems. The model breaks tumor images into small patches and learns meaningful representations of those patches through a “masked image modeling” task, in which it learns to reconstruct randomly hidden portions of the image.
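For readers curious about what a masked image modeling task involves, here is a minimal Python sketch of the data side of the idea: cutting an image into patches, hiding a random subset, and scoring a reconstruction only on the hidden patches. It is an illustration under stated assumptions, not the CANVAS implementation. The function names, the toy single-channel 8x8 image, the 4x4 patch size, and the 75% mask ratio are all hypothetical choices made for this example, and the vision transformer that would actually produce the reconstruction is left out.

```python
import random


def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def choose_masked_patches(num_patches, mask_ratio, seed=0):
    """Pick a random subset of patch indices to hide (ratio is an assumption)."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    return set(rng.sample(range(num_patches), num_masked))


def mask_patches(patches, masked_indices, fill_value=0.0):
    """Return a copy of the patches with the hidden ones overwritten."""
    masked = []
    for index, patch in enumerate(patches):
        if index in masked_indices:
            masked.append([[fill_value] * len(row) for row in patch])
        else:
            masked.append([list(row) for row in patch])
    return masked


def reconstruction_error(original, reconstructed, masked_indices):
    """Mean squared error computed only over the hidden patches."""
    total, count = 0.0, 0
    for index in masked_indices:
        for orig_row, recon_row in zip(original[index], reconstructed[index]):
            for a, b in zip(orig_row, recon_row):
                total += (a - b) ** 2
                count += 1
    return total / count if count else 0.0


# Toy single-channel image; real multiplexed images carry many marker channels.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch_size=4)
hidden = choose_masked_patches(len(patches), mask_ratio=0.75)
model_input = mask_patches(patches, hidden)
```

In training, a model would receive `model_input` and be updated to drive `reconstruction_error` down on the hidden patches. Because the target is the image itself, no human-provided labels are needed, which is what makes the task self-supervised.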
Tan credits his education at CDS for giving him the machine learning expertise necessary for this project. “The training I received at CDS became the foundation for the computational projects I worked on during my PhD,” he said. “When I graduated from the master’s program, I was feeling confident using machine learning to study biology and drive new discoveries.”
The implications of this work extend beyond lung cancer. The segmentation-free nature of CANVAS makes it applicable to a wide range of imaging analyses under different conditions, including multiplexed immunofluorescence imaging, imaging mass spectrometry, and spatial transcriptomics.
As researchers continue to develop and refine AI tools like CANVAS, these tools may increasingly become part of the standard toolkit for cancer diagnosis and treatment planning, helping pathologists identify clinically relevant patterns that might otherwise remain hidden within the complex ecosystem of tumors.
By Stephen Thomas