The Missing Middle: Building a Science of Deep Learning

NYU Center for Data Science
Dec 11, 2024


While scientists typically use experiments to understand natural phenomena, a growing number of researchers are applying the scientific method to study something humans created but don’t fully comprehend: deep learning systems. At the upcoming Workshop on Scientific Methods for Understanding Deep Learning at NeurIPS 2024 (Sunday, December 15th in Meeting rooms 205–207), researchers will gather to establish this emerging approach as a distinct field.

CDS Faculty Fellow Florentin Guth and recent CDS PhD graduate Zahra Kadkhodaie spearheaded the workshop. They were joined by collaborators from several institutions, including CDS PhD student Sanae Lotfi, Google DeepMind researcher Valentin De Bortoli, national laboratory researcher Davis Brown, and former CDS postdoctoral researcher Micah Goldblum.

The organizers saw a gap between deep learning’s two traditional camps. “The deep network community is split into two parts,” Kadkhodaie said. “One is the more practice-focused engineering side, which has had amazing achievements in terms of making things work. On the other side, there is mathematical theoretical work trying to rigorously explain how neural networks work and provide guarantees about their limits. Although valuable progress has been made on this front, the complexity of real data and deep networks resists traditional analysis, leaving many aspects mysterious and still poorly understood.”

The workshop promotes an alternative path to understanding deep nets, one that sits squarely between practice and theory: using controlled experiments to test hypotheses about how deep learning systems actually function. This approach has already yielded important insights, such as the discovery of scaling laws that predict how model performance improves with size and training data. “That started from experiments just evaluating models at different scales,” Guth said. “Empirically, you discover there’s a very simple relationship that allows you to predict how these things are going to scale. This has led to improvements in both theory and practice.”
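To make the experimental flavor concrete, here is a minimal, hypothetical sketch of what a scaling-law analysis can look like: fit a power law of the form L(N) = a·N^(−α) + c to a handful of (model size, test loss) measurements, then extrapolate to larger models. The data points, the functional form, and the fitted values below are illustrative assumptions, not results from the workshop or any particular paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical scaling-law fit: all numbers below are synthetic
# illustrations, not measurements from any real training run.
def scaling_law(n, a, alpha, c):
    # Power-law-plus-constant form used in many scaling-law studies:
    # loss decays as n^(-alpha) toward an irreducible floor c.
    return a * n ** (-alpha) + c

sizes = np.array([1e6, 1e7, 1e8, 1e9])   # model sizes (parameter counts)
losses = np.array([4.2, 3.1, 2.4, 1.9])  # hypothetical test losses

# Fit the three free parameters (a, alpha, c) to the observations.
params, _ = curve_fit(scaling_law, sizes, losses, p0=[10.0, 0.1, 1.0])
a, alpha, c = params

# The payoff is extrapolation: predict the loss of a model 10x larger
# than any that was actually trained.
print(f"fitted exponent alpha = {alpha:.3f}")
print(f"predicted loss at 1e10 params = {scaling_law(1e10, *params):.2f}")
```

The scientific output here is not any single model’s score but the fitted relationship itself: a simple, quantitative law that makes a falsifiable prediction about models that have not been trained yet.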

The idea for the workshop emerged gradually, as both organizers encountered similar challenges in communicating their research. “Through sending work to conferences and getting back reviews, I felt it was hard to communicate our results,” Kadkhodaie said. “The criteria for assessing this type of work are still forming and are not as clear-cut as quantitative contributions like improving the state of the art or proving rigorous theorems. Yet this type of work has had a profound impact on both theory and practice. The goal of the workshop is to bring together researchers from different subfields who use scientific methods to understand and improve deep nets, to facilitate collaborations, build a community, and hopefully take a step toward establishing it as a subfield of machine learning.”

The organizers assembled speakers and panelists from diverse backgrounds, including neuroscience, statistical physics, and Bayesian deep learning, to foster cross-pollination of ideas. CDS Professors Eero Simoncelli and Andrew Wilson will join a panel discussion exploring different perspectives on understanding these systems, with the aim of sparking debate among researchers who approach the problem from different angles.

For Kadkhodaie, whose background includes physics, the scientific approach came naturally. “This was just a natural way of approaching things,” she said. Guth, who came to the field from mathematics and computer science, took a less direct path. “We were always trying to be in the middle of both worlds,” he said. “We had lots of experiments, but we weren’t trying to improve performance. We had a few theorems, but they weren’t the main focus.”

The timing of this approach is significant. Before 2012, when deep learning had not yet achieved its current success, there was more emphasis on understanding these systems. “When things were not working, you had more time to think and try to understand them, as opposed to just trying to code and engineer and scale them,” Guth said. “When deep learning started really working and the hype began, theoretical understanding of these systems got sort of swept under that wave.”

“It can be hard to evaluate good work in this area compared to engineering metrics like state-of-the-art performance,” Kadkhodaie said. “But these are things that have a higher chance of surviving the test of time. We know that in the deep learning community, things often get obsolete after a few months.”

The workshop will take place on December 15th and feature keynote talks from leading researchers including Zico Kolter, Tom Goldstein, Surya Ganguli, Hanie Sedghi, and Misha Belkin. The accepted papers can be viewed on OpenReview. The workshop will also include a ‘Debunking Challenge’ offering prizes for work that challenges commonly held beliefs in deep learning through careful experimentation.

“It might initially be perceived as unusual to say ‘science of deep learning,’” Kadkhodaie said. “Usually we apply scientific methods to understand the natural world, but here we have built something, we don’t know what it is or how it works, and we want to apply science to understand it. Because deep nets have evolved to become such complex entities, it makes sense to apply scientific methods to figure them out.”

For the latest news about the workshop, follow @scifordl on X/Twitter. Those interested in participating in the panel discussion can submit questions through an online form.

By Stephen Thomas
