A paper co-authored by CDS PhD student Zahra Kadkhodaie will be presented as a spotlight talk at ICLR 2023
The research proposes a local low-dimensional probability model of large images
Recent advances in score-based diffusion generative models show that deep neural networks can learn the complex and high dimensional probability distributions of images. How these networks capture complex global statistical structures without appearing to suffer from “the curse of dimensionality” (an explosion in the number of samples needed to estimate probability distributions from high-dimensional data) is still a mystery. To study this question, a team of researchers, including CDS PhD student Zahra Kadkhodaie, proposed a local low-dimensional probability model of images instantiated in a score-based generative model. The validity of their model was tested empirically by synthesizing conditional and unconditional large images.
Their paper “Learning multi-scale local conditional probability models of images” was accepted as an oral presentation to the International Conference on Learning Representations (ICLR 2023). The eleventh iteration of the global deep learning conference will be held in Kigali, Rwanda from May 1st through the 5th. In addition to Zahra, the study’s authors include PhD student at Ecole Normale Supérieure and incoming CDS postdoctoral fellow Florentin Guth, Professor of Applied Mathematics and Computer Science at Collège de France and Ecole Normale Supérieure Stéphane Mallat, and CDS Associated Professor and NYU Silver Professor of Neural Science, Mathematics, Data Science, and Psychology Eero Simoncelli.
The idea behind the model comes from literature in Markov random fields, where a high-dimensional joint probability distribution is broken down into a set of overlapping low-dimensional conditional distributions. First, the image is decomposed into a multi-scale representation. Subsequently, the representation in each scale is assumed to satisfy the Markov property (conditioned on a small neighborhood, each coefficient is independent of the rest of the image).
“We know how to train small networks to learn probability distributions of small images, and we also know how to train large networks to learn probability distributions of large images,” said Kadkhodaie. “In this work, we show how to construct small networks to capture probability distributions of large images.”
The model was tested by training a small conditional convolutional neural network (CNN) with a small receptive field for each scale. The researchers then used the cascade of these networks to draw samples from a distribution of face images. The high quality of images synthesized by the model as well as its denoising performance showed the locality assumption holds (at least for face images). Kadkhodaie said this was a surprise because images have long-range dependencies which cannot be captured in a local model in the pixel domain.
By Meryl Phair