Uncovering the Phases of Neural Network Training: Insights from CDS’ Michael Hu

NYU Center for Data Science
Jan 31, 2024

The idea that neural networks undergo distinct developmental phases during training has long been a subject of debate and fascination. CDS PhD student Michael Hu’s recent paper, “Latent State Models of Training Dynamics,” takes a pioneering step toward identifying and understanding these phases. Published in Transactions on Machine Learning Research (TMLR), Hu’s work offers new insights into the training dynamics of neural networks.

In an interview, Hu discussed the motivation behind his research into the way neural networks learn, as well as its implications. “We wanted to create a tool to analyze learning dynamics,” Hu explained. “Our goal was to segment training into what the algorithm believes are distinct phases of that learning process.”

The approach of Hu and his co-authors — CDS PhD student Angelica Chen, Naomi Saphra, and CDS Professor of Computer Science and Data Science Kyunghyun Cho — was partly inspired by historical theories of human psychological development. Hu drew parallels between the developmental stages in humans, as theorized by psychologists like Freud and Piaget, and the phases of neural network training.

However, he was quick to clarify that the analogy is not direct. Whereas Freud, Piaget, and modern developmental psychologists delineate discrete stages of human development, we should not expect machines’ stages to resemble humans’. “These models learn in a weird way, unlike humans who are bound by constraints like data and computation time,” Hu said. “With AI, it’s more helpful to go from first principles. Our approach was data-driven, using an algorithm that improves at predicting these phases as it trains more models.”

The core of Hu’s research uses Hidden Markov Models (HMMs) to model how a neural network develops over the course of training. “It’s essentially meta-modeling,” Hu described. “We’re predicting the next step in a pupil’s development, using HMM’s discrete latent space to segment training phases automatically.”
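To make the idea of segmenting training with an HMM’s discrete latent states more concrete, here is a minimal sketch in Python. It is not the paper’s implementation. The two states (“fast progress” and “plateau”), the single metric (loss decrease per checkpoint), the toy numbers, and all parameter values are assumptions chosen for illustration. A real analysis would presumably fit the HMM’s parameters to metrics from many training runs rather than setting them by hand. The sketch only shows the decoding step: given an HMM, find the most likely sequence of hidden states and read phase boundaries off the points where the state changes.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def viterbi(obs, log_start, log_trans, means, stds):
    """Most likely hidden-state path for an HMM with 1-D Gaussian emissions."""
    n_states = len(means)
    # score[s] = best log-probability of any state path ending in state s
    score = [log_start[s] + log_gauss(obs[0], means[s], stds[s]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gauss(x, means[s], stds[s]))
        back.append(ptr)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical metric: decrease in loss between consecutive checkpoints.
loss_drop = [0.52, 0.47, 0.55, 0.03, 0.01, 0.00, 0.02]

# Hand-set (assumed) parameters: state 0 = "fast progress", state 1 = "plateau".
means, stds = [0.5, 0.02], [0.1, 0.05]
log_start = [math.log(0.5), math.log(0.5)]
# "Sticky" transitions: a phase tends to persist from one checkpoint to the next.
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]

phases = viterbi(loss_drop, log_start, log_trans, means, stds)
# Checkpoints where the decoded phase differs from the previous one
boundaries = [i for i in range(1, len(phases)) if phases[i] != phases[i - 1]]

print(phases)      # [0, 0, 0, 1, 1, 1, 1]
print(boundaries)  # [3]
```

On this toy trajectory the decoder assigns the first three checkpoints to one phase and the remaining four to another, with a single boundary at checkpoint 3. The “sticky” transition matrix is a design choice that discourages the decoded phase from flipping back and forth on noisy measurements.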

The technique Hu and his team propose is “mainly a tool for science,” said Hu. In the longer term, it could help optimize neural network training by tailoring algorithms to specific phases of development, making training more efficient and effective, particularly in fields where neural network training is still nascent. Hu said the tool’s scientific focus is reflected in the kinds of people who have reached out to him since the paper’s publication. “They’re basically scientists who are trying to better understand how things like language models train.”

It’s a versatile tool, according to Hu. “It’s applicable to a variety of contexts,” he said. “We’ve used it on convolutional neural networks, transformer-based language models, and even simple architectures like multi-layer perceptrons.”

As for real-world implications, Hu sees his work primarily as a scientific tool, but he also anticipates that it will eventually lead to more informed decisions in neural network training. “It’s about predicting where training is going in a cheaper way than running the network,” Hu said. Actually training a model will of course show you how it learns. But if you can predict how it will learn ahead of time, you save the cost of training it, which is often a major expense.

Reflecting on the collaborative nature of the project, Hu praised his team’s efforts and expressed enthusiasm for their future work. “It was a lot of fun,” he said, “and I think we have yet to do our best work together.”

Hu and his co-authors’ work at CDS represents a significant stride in understanding the phases of neural network training. Their approach to modeling training dynamics offers a new way to study how networks develop, and it may pave the way for more sophisticated and efficient training methods in the future.

By Stephen Thomas


