New Techniques from CDS Make Big Data Analysis More Efficient

NYU Center for Data Science
May 29, 2024

When handling big data, researchers often need to approximate complex probability distributions. This is particularly true in tasks involving Bayesian inference, where parameters that govern a process are estimated from data. A new technique has emerged that makes this process more tractable and comes with concrete guarantees. In their paper “Algorithms for Mean-Field Variational Inference via Polyhedral Optimization in the Wasserstein Space,” Aram-Alexandre Pooladian, a PhD student at CDS, along with co-authors Yiheng Jiang from NYU’s Courant Institute of Mathematical Sciences, and Sinho Chewi from the Institute for Advanced Study, introduce a new approach to this problem.

Broadly speaking, Bayesian inference centers on the posterior: a probability distribution, usually intractable, that encodes the practitioner’s beliefs about a parameter given some observed data. Traditional methods can struggle in high dimensions, where computation becomes slow and inefficient. Practitioners are therefore often willing to approximate the posterior by a more tractable distribution; this is called variational inference (VI). Pooladian and his co-authors developed an analysis of mean-field VI, in which the posterior is approximated by a probability distribution with independent components.
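To make the idea of a mean-field approximation concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it uses the classical coordinate-ascent scheme on a toy Gaussian target, where the best independent-component approximation is known in closed form. The function name mean_field_gaussian, the example numbers, and the choice of target are assumptions made purely for illustration.

```python
import numpy as np

def mean_field_gaussian(mu, Sigma, n_sweeps=50):
    """Coordinate-ascent mean-field VI for a toy Gaussian target N(mu, Sigma).

    Illustrative only (not the paper's method). Returns the means and
    variances of the independent Gaussian factors.
    """
    mu = np.asarray(mu, dtype=float)
    Lam = np.linalg.inv(np.asarray(Sigma, dtype=float))  # precision matrix
    d = mu.shape[0]
    m = np.zeros(d)            # factor means, arbitrary starting point
    var = 1.0 / np.diag(Lam)   # optimal factor variances are 1 / Lam[i, i]
    for _ in range(n_sweeps):
        for i in range(d):
            others = np.arange(d) != i
            # Optimal mean of factor i given the current means of the others
            m[i] = mu[i] - Lam[i, others] @ (m[others] - mu[others]) / Lam[i, i]
    return m, var

# Hypothetical example: a strongly correlated two-dimensional target
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
m, var = mean_field_gaussian(mu, Sigma)
print("factor means:", m)
print("factor variances:", var)
```

In this toy case the factor means converge to the true mean, while each factor variance is 1 - 0.8² = 0.36, smaller than the true marginal variance of 1. That gap is the price of assuming independent components, and it is one reason guarantees for mean-field VI are of practical interest.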

The authors developed a specific notion of polyhedra to create a simpler, finite-dimensional version of an otherwise complex, infinite-dimensional space. “Computing the mean-field approximation is a hard problem,” Pooladian explains. “The core idea of our work is to create a simpler subset over which to optimize. This makes the optimization procedure tractable and still gives good results with precise guarantees.” An advantage of this approach is that it requires no conjugacy conditions on the prior beliefs about the parameters, which were a crucial bottleneck for previous implementations of mean-field VI.

The project started over the summer, when Jiang won a Summer Undergraduate Research Award from Courant. The award provided a stipend that enabled him to take on a slightly different version of this project, which was pitched by Pooladian. To finish the draft, Pooladian took refuge at Chewi’s apartment at the Institute for Advanced Study (IAS) over Thanksgiving. Pooladian recalls that while chatting at the IAS, they discovered that a slight change in the constraints led to a more robust solution. (Pooladian: “That was a breakthrough moment!”) The three authors continued to work on the article over the course of the week, and placed it online shortly thereafter.

Throughout this process, Pooladian consulted his doctoral advisor, Jonathan Niles-Weed, who is an assistant professor at CDS and the Courant Institute. Reflecting on his experience at CDS, Pooladian highlights the value of a supportive academic environment. “The collaborative spirit and access to diverse expertise at CDS and Courant have been invaluable. It’s an ecosystem that fosters innovation and rigorous research. It’s even better that Jon is amazingly supportive of my projects, even when he’s not intimately involved.”

The paper was recently accepted to the Conference on Learning Theory, and Pooladian has also shared this work through several invited talks. He presented at the Massachusetts Institute of Technology on December 7, 2023; the SIAM Conference on Uncertainty Quantification 2024 in Trieste, Italy on February 29, 2024; the École Polytechnique Fédérale de Lausanne (EPFL) in Lausanne, Switzerland on March 6, 2024; and the Eidgenössische Technische Hochschule (ETH) in Zurich, Switzerland on March 8, 2024.

This research represents a significant advance in variational inference and underscores the importance of interdisciplinary collaboration. As Pooladian and his colleagues continue their work, the prospects for further breakthroughs in machine learning and data science are promising.

By Stephen Thomas
