Universal Building Blocks: Research Reveals Neural Networks Learn the Same Patterns When Processing Images

NYU Center for Data Science
3 min read · Dec 3, 2024


Neural networks trained on completely different image recognition tasks end up learning many of the same foundational patterns, according to “On the Universality of Neural Encodings in CNNs,” a new paper by CDS Instructor Florentin Guth and collaborator Brice Ménard of Johns Hopkins University.

The research revealed that regardless of whether a neural network is trained to recognize images from popular computer vision datasets like ImageNet or CIFAR, it develops similar internal patterns for processing visual information. “The first step to characterize what a model learns,” said Guth, “is to compare networks to see if they have learned the same thing.”

The discovery helps explain why transfer learning — where a network trained on one task can be adapted to another with minimal retraining — works so well in practice. More importantly, it suggests that neural networks encode fundamental statistical patterns about the visual world that transcend specific tasks.
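To make the idea concrete, here is a minimal transfer-learning sketch in PyTorch (illustrative only, not code from the paper): a network pretrained on ImageNet is reused on a hypothetical new 10-class task by freezing its learned features and training only a new classification head.

```python
# Illustrative sketch (not from the paper): adapting a pretrained CNN to a new
# task by reusing its learned features and retraining only the final layer.
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet; its early layers hold the kind of
# general-purpose visual features the paper argues are largely universal.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pretrained weight so only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with one for a hypothetical 10-class task.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```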

A Physics-Inspired Approach

The collaboration between Guth and Ménard brought together complementary perspectives. “Physicists have a very different way of approaching these questions than mathematicians or computer scientists,” Guth said. “Particularly in being extremely good at exploratory data analysis.” This approach proved valuable when examining the massive tensors of weights in neural networks — looking for patterns and structure before developing theoretical explanations.

The analogy to astrophysics is particularly apt. “When you’re an astrophysicist and you study galaxies, you can’t really do experiments. The only thing you can do is compare two galaxies and say whether they look the same or not,” Guth explained. “For neural networks, you can do all the experiments that you want, but the similarity is that we have two complex systems and we don’t even know what questions we should ask.”

Universal Patterns Emerge

Using novel mathematical analysis techniques that build on an approach developed in a previous collaboration with Gaspar Rochette of École Normale Supérieure and Stéphane Mallat of Collège de France, the researchers found that networks reliably learn a compact, universal set of spatial filters. These filters emerge regardless of network architecture, initialization, or training data.
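The paper’s own analysis is more involved, but a rough intuition for comparing learned filters can be sketched in a few lines of PyTorch. The snippet below is purely illustrative — the choice of networks and the cosine-similarity metric are assumptions, not the authors’ method — and simply checks whether each first-layer filter of one ImageNet-trained network has a close counterpart in another, independently trained architecture.

```python
# Illustrative sketch (not the paper's analysis): compare the first-layer
# spatial filters of two independently trained CNNs and check whether each
# filter in one network has a close match in the other.
import torch
import torch.nn.functional as F
from torchvision import models

# Two architecturally distinct networks trained on ImageNet.
net_a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net_b = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)

# First-layer filters: shape (64, 3, 7, 7) for both networks.
filters_a = net_a.conv1.weight.detach().flatten(start_dim=1)
filters_b = net_b.conv1.weight.detach().flatten(start_dim=1)

# Cosine similarity between every pair of filters across the two networks.
sim = F.normalize(filters_a, dim=1) @ F.normalize(filters_b, dim=1).T

# For each filter in net_a, how similar is its best match in net_b?
best_match = sim.abs().max(dim=1).values
print(f"median best-match similarity: {best_match.median().item():.3f}")
```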

Beyond spatial patterns, the team discovered that networks also learn similar channel-wise features across different datasets, though the universality is not as strong as with spatial filters. This suggests a hierarchy of universality in what networks learn.

The findings point to an alternative approach for developing foundation models — instead of purely optimizing for task performance, researchers could explicitly aim to maximize the universality of learned representations. “Now that we are starting to measure this, it might turn out that it’s even more important to aim for than performance,” Guth noted.

Future Applications

Looking ahead, this research could enable more efficient ways to combine multiple trained networks. “One type of application is weight editing — sometimes people call this model surgery — to have finer control over what your network has learned,” Guth said. “If you want to do edits after training without retraining everything, understanding how it’s represented in the weights would enable these kinds of things.”
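As a loose illustration of what such weight editing could look like in practice, the sketch below (an assumption-laden example, not the procedure described in the paper) transplants the learned first-layer filters from one trained network into a freshly initialized one, without any retraining.

```python
# Illustrative sketch of "weight editing" in the loose sense described above:
# transplanting learned weights between models. The layer chosen here is just
# an example; the paper's actual procedure may differ.
import torch
from torchvision import models

donor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
recipient = models.resnet18(weights=None)  # freshly initialized network

# Copy the donor's first-layer filters into the recipient without retraining.
with torch.no_grad():
    recipient.conv1.weight.copy_(donor.conv1.weight)

# Optionally freeze the transplanted layer so later fine-tuning leaves it intact.
recipient.conv1.weight.requires_grad = False
```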

This research provides a new way to compare and understand neural networks through the lens of universal patterns they discover, rather than just their outputs. Much like how astrophysicists study distant galaxies by comparing their properties, this work helps illuminate the inner workings of neural networks by mapping their shared encoding of the visual world. As Guth observed, his and Ménard’s approach was guided by an intuition that, “even though the system is complex, there must be some simplicity somewhere — a way to characterize this phenomenon with just a small number of variables.”

While we don’t yet fully understand the hidden internal structure of neural networks, this research takes an important step toward finding that underlying simplicity.

By Stephen Thomas
