Postdoctoral Research Fellow Micah Goldblum publishes paper with Capital One’s Applied Research team

Aug 3, 2022

The paper highlights the effectiveness of collaborative innovation and adds insights to the future of machine learning.

Micah Goldblum, CDS Postdoctoral Research Fellow

CDS Postdoctoral Research Fellow Micah Goldblum recently published a research paper titled “Transfer Learning with Deep Tabular Models” in collaboration with Capital One’s Applied Research team, which uses artificial intelligence to enhance the company’s financial services and has a cooperative relationship with CDS. Goldblum worked with Bayan Bruss, Head of Applied Research at Capital One; CDS Professor Andrew Gordon Wilson; Roman Levin; Valeriia Cherepanova; Avi Schwarzschild; Arpit Bansal; and Tom Goldstein, all of whom contributed significantly to the research.

The project concerns transfer learning, in which knowledge gained from solving one problem is reused on a new but related problem. An example Goldblum gives is leveraging large volumes of diagnosis data for common diseases to help diagnose rare ones. Neural models are highly effective in computer vision in part because they learn reusable features that carry over to new domains, and transfer learning has been found especially effective when task-specific data is scarce.
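To make the idea concrete, here is a minimal sketch of transfer learning on tabular data, written with NumPy only. It is not the paper's method or architecture: the data is synthetic, every name and size (`init_mlp`, `train`, `N_HIDDEN`, and so on) is a hypothetical choice for illustration, and freezing the pretrained layer while refitting only the output head is just one common fine-tuning strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_HIDDEN = 8, 16  # illustrative sizes, not taken from the paper


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_mlp(n_in, n_hidden):
    """One hidden layer (the reusable features) plus a linear output head."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": np.zeros(n_hidden),
        "b2": 0.0,
    }


def features(params, X):
    # ReLU hidden representation shared between upstream and downstream tasks.
    return np.maximum(0.0, X @ params["W1"] + params["b1"])


def predict_proba(params, X):
    return sigmoid(features(params, X) @ params["w2"] + params["b2"])


def train(params, X, y, lr=0.1, epochs=300, freeze_features=False):
    """Full-batch gradient descent on the logistic loss."""
    params = dict(params)  # shallow copy; arrays are replaced, never mutated
    for _ in range(epochs):
        H = features(params, X)
        g = (sigmoid(H @ params["w2"] + params["b2"]) - y) / len(y)
        grad_w2, grad_b2 = H.T @ g, g.sum()
        if not freeze_features:
            # Backpropagate through the ReLU into the hidden layer.
            dH = np.outer(g, params["w2"]) * (H > 0)
            params["W1"] = params["W1"] - lr * (X.T @ dH)
            params["b1"] = params["b1"] - lr * dH.sum(axis=0)
        params["w2"] = params["w2"] - lr * grad_w2
        params["b2"] = params["b2"] - lr * grad_b2
    return params


# Synthetic stand-ins: a large upstream task and a small, related downstream task.
w_true = rng.normal(size=N_FEATURES)
X_up = rng.normal(size=(2000, N_FEATURES))
y_up = (X_up @ w_true > 0).astype(float)
X_down = rng.normal(size=(40, N_FEATURES))
w_down = w_true + 0.3 * rng.normal(size=N_FEATURES)
y_down = (X_down @ w_down > 0).astype(float)

# 1) Pretrain the whole network on plentiful upstream data.
pretrained = train(init_mlp(N_FEATURES, N_HIDDEN), X_up, y_up)

# 2) Reset the head, keep the learned features, and fit on scarce downstream data.
head_reset = {**pretrained, "w2": np.zeros(N_HIDDEN), "b2": 0.0}
finetuned = train(head_reset, X_down, y_down, freeze_features=True)
```

Calling `predict_proba(finetuned, X_down)` returns one probability per downstream row. Passing `freeze_features=False` in the second step would instead fine-tune the whole network end to end.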

“Despite being critical to many of our applications in financial services, tabular data as a domain of exploration is an underdeveloped space in mainstream machine learning research,” said Bruss. Although high-quality tools exist for machine learning on tabular data (information organized in tables), Bruss explains that their scientific underpinnings are decades old.

Recent research on deep learning for tabular data has found that deep tabular models perform strongly, often closing the gap between neural networks and the widely used gradient-boosted decision trees (GBDT), a machine learning technique for classification and regression tasks. “Fields like computer vision and natural language processing have been dominated by neural networks,” said Goldblum. “While practitioners in the tabular data domain, the most pervasive setting in real-world applications, still use decision tree methods.”

The paper shows that upstream data gives tabular neural networks an advantage over GBDT models. It proposes a medical diagnosis benchmark for tabular transfer learning and outlines a guide for leveraging upstream data to boost performance within a range of neural network architectures. “It’s an exciting time for tabular deep learning, and it just might be the future of data science,” said Goldblum.

The partnership between CDS and Capital One, one of our founding partners, made this research possible and is part of the bank’s ongoing collaborations with leading universities to advance machine learning and AI research. The relationship has also facilitated professional development opportunities for the CDS Undergraduate Research Program (CURP) and helped the university launch the CDS Diversity Initiative in 2018 to enhance representation in data science.

“We see a wide swath of uncharted territory related to how modern machine learning architectures can be utilized for tabular data today,” said Bruss. “Collaborating with innovative research community partners like NYU to advance understanding of this field is one of our team’s top priorities.”

By Meryl Phair

NYU Center for Data Science
Written by NYU Center for Data Science

Official account of the Center for Data Science at NYU, home of the Undergraduate, Master’s, and Ph.D. programs in Data Science.
