CDS Members Author Paper on Graph Neural Networks in Electronic Health Records Representation Learning

We’re excited to announce that CDS PhD student Weicheng Zhu and affiliated CDS professor Narges Razavian have recently co-authored “Variationally Regularized Graph-based Representation Learning for Electronic Health Records”.

Weicheng Zhu (left) and Narges Razavian (right)

The team opens their paper by explaining that electronic health records (EHR) are an abundant source of information for predictive tasks in medical applications such as mortality prediction, outcome prediction, and phenotyping. EHR data is quite accessible, which makes it a suitable resource for scaling screening to large populations. Particularly for chronic diseases such as Alzheimer’s disease, identification before the onset of clinical symptoms can improve the effectiveness of treatments as well as enrollment in clinical trials.

Prior studies have investigated an array of deep learning methodologies on EHR, and recent research points to the importance of graph structures among medical concepts. Though EHR data is quite accessible, it is intrinsically sparse and has a high probability of missing values. Some diseases are recorded as diagnosis codes (“the translation of written descriptions of diseases, illnesses and injuries into codes from a particular classification”) [1], while other existing conditions that are not considered in the clinical encounter may not be recorded.

As a potential solution to this problem, the team introduces a “variationally regularized encoder-decoder graph network that achieves more robustness in graph structure learning by regularizing node representations.” [2] Their proposed encoder-decoder graph neural network “adaptively learns the connections among observed medical codes in EHR”. [2] Additionally, their method addresses the problem of learning more expressive representations via variational regularization.
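For readers who want a feel for the two ideas in that paragraph, here is a minimal sketch in plain Python. It is not the authors’ implementation: the function names, the dot-product attention, the toy embeddings, and the fixed log-variance are all assumptions made for illustration. It shows attention weights standing in for learned connections among observed codes, followed by a Gaussian reparameterization step and the KL penalty commonly used for variational regularization.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(embeddings):
    """Row i holds how strongly code i attends to each observed code.

    Assumption: plain dot-product scores; a real model would use learned
    projections.
    """
    weights = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) for key in embeddings]
        weights.append(softmax(scores))
    return weights


def aggregate(embeddings, weights):
    """Attention-weighted average of code embeddings for each code."""
    dim = len(embeddings[0])
    return [
        [sum(w * emb[d] for w, emb in zip(row, embeddings)) for d in range(dim)]
        for row in weights
    ]


def reparameterize(mean, log_var, rng):
    """Sample z = mean + sigma * eps with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, exp(log_var)) and N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-d embeddings for three observed medical codes.
    codes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    means = aggregate(codes, attention_weights(codes))
    log_var = [-2.0, -2.0]  # assumed fixed; normally predicted by the encoder
    for mean in means:
        sample = reparameterize(mean, log_var, rng)
        print(sample, kl_to_standard_normal(mean, log_var))
```

In a trained model, the KL term would be added to the prediction loss with some weight, which is what pushes node representations toward a well-spread distribution.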

Ultimately, their model outperforms existing graph-based and non-graph-based methods on multiple EHR predictive tasks, using both real-world clinical data and public data. The team demonstrates superior performance on three EHR-based predictive tasks:

  1. MIMIC-III mortality prediction (public data)
  2. eICU readmission prediction (public data)
  3. Prediction of future Alzheimer’s dementia onset based on data from 1.6 million de-identified NYU Langone patients (internal data, for their early dementia intervention program)

Beyond the empirical improvements, the team uses singular value analysis to interpret the effect of variational regularization compared with standard graph attention networks. Their planned future studies “include exploration of self-supervised learning to further improve generalization of graph based EHR representation learning.” [2]

To read the paper in its entirety, please visit its arXiv page.

References:

  1. Diagnosis Code Wikipedia webpage
  2. “Variationally Regularized Graph-based Representation Learning for Electronic Health Records” paper

By Ashley C. McDonald
