Separating Hype from Reality in AI Advancement

NYU Center for Data Science
Apr 26, 2024

Artificial intelligence has a habit of reinventing itself, with breakthroughs that turn out to be repackaged versions of earlier ideas. CDS Professor of Computer Science and Data Science Kyunghyun Cho’s latest research interrogates this phenomenon, shedding light on the cyclical nature of AI progress and the importance of rigorous evaluation in the field.

In a recent paper accepted to the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Cho and his co-authors, including former CDS postdoc Naomi Saphra, now at Harvard, Eve Fleisig of UC Berkeley, and Adam Lopez of the University of Edinburgh, draw parallels between the current buzz around large language models (LLMs) and previous waves of AI enthusiasm. “When we see LLMs being touted as something that will revolutionize not only technology, but also the world in general — we’ve seen this before,” Cho said.

The paper, titled “First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models,” points out that many of the technologies driving today’s LLMs have roots stretching back decades. Convolutional neural networks, developed largely by CDS founding director Yann LeCun, demonstrated remarkable problem-solving capabilities when trained on big datasets as far back as 2012. Neural networks for language modeling, which Cho himself worked on as a postdoc, powered a leap in machine translation performance that made headlines in 2016.

Even earlier, in the mid-2000s, a researcher at Google showed that simply scaling up the data used to train traditional machine translation (MT) systems could lead to dramatic improvements. The discourse from that time around MT, said Cho, “sounds almost exactly like what people are saying now about scaling up and how everything is going to be solved.”

The authors argue that one key lesson from this history is the importance of figuring out how to rigorously evaluate whether AI systems are truly getting better, or whether progress is more illusory. “What becomes really challenging is figuring out how to tell if these technologies are indeed improving,” Cho said. “Sometimes we’re tricked by the shiny nature of new technology into thinking that we’re making progress, when we’re really just fooling ourselves.”

Another crucial point, according to Cho, is that progress in AI is not linear and incremental, but happens in sudden jumps after long periods of near-stagnation. These leaps often come from unexpected directions. “Progress is not going to come from people working under the existing paradigm, but from people who think outside of the box,” he said.

Interestingly, in a separate preprint, “Show Your Work with Confidence: Confidence Bands for Tuning Curves,” Cho and his colleagues present a way to do exactly this: quantify progress in machine learning. Written with NYU Courant computer science PhD student Nicholas Lourie and CDS Assistant Professor of Computer Science and Data Science He He, the paper introduces a statistically rigorous method for quantifying the level of certainty when comparing the performance of different machine learning algorithms.

The approach uses “confidence bands,” which visualize the range of plausible values for a quantity given the uncertainty that comes from limited data and the complexity of modern machine learning methods. Applied to tuning curves, which track the best performance found as a function of how much hyperparameter tuning has been done, the bands let researchers see whether two algorithms remain statistically indistinguishable or clearly separate, and thus make more definitive judgments about which methods are truly better.
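To make the idea concrete, here is a minimal Python sketch of a tuning curve with uncertainty around it. It is not the authors’ implementation: the paper develops exact, distribution-free confidence bands, while this sketch substitutes a simple pointwise bootstrap purely for illustration. The tuning curve estimated here is the expected best validation score after k random-search trials, and all scores, names, and settings are hypothetical.

```python
import numpy as np

def tuning_curve(scores, max_trials):
    """Expected best validation score after k random-search trials,
    for k = 1..max_trials, under the empirical score distribution."""
    s = np.sort(scores)
    n = len(s)
    ks = np.arange(1, max_trials + 1)
    # P(max of k draws <= s_(i)) = (i/n)^k, so each sorted score is
    # weighted by the probability that it is the maximum of k draws.
    cdf = np.arange(1, n + 1) / n
    prev = np.concatenate(([0.0], cdf[:-1]))
    weights = cdf[None, :] ** ks[:, None] - prev[None, :] ** ks[:, None]
    return weights @ s  # shape (max_trials,)

def bootstrap_band(scores, max_trials, n_boot=2000, alpha=0.05, seed=0):
    """Pointwise bootstrap band around the tuning curve (an
    approximation; the paper's bands come with exact guarantees)."""
    rng = np.random.default_rng(seed)
    curves = np.stack([
        tuning_curve(rng.choice(scores, size=len(scores), replace=True),
                     max_trials)
        for _ in range(n_boot)
    ])
    lo = np.quantile(curves, alpha / 2, axis=0)
    hi = np.quantile(curves, 1 - alpha / 2, axis=0)
    return lo, hi

# Hypothetical validation accuracies from 50 random-search trials each.
rng = np.random.default_rng(1)
scores_a = rng.normal(0.80, 0.03, size=50)  # "algorithm A"
scores_b = rng.normal(0.78, 0.05, size=50)  # "algorithm B"
for name, s in [("A", scores_a), ("B", scores_b)]:
    curve = tuning_curve(s, max_trials=20)
    lo, hi = bootstrap_band(s, max_trials=20)
    print(f"{name}: best-of-20 ~ {curve[-1]:.3f} "
          f"[{lo[-1]:.3f}, {hi[-1]:.3f}]")
```

If the band for one system sits entirely above the band for another across the whole curve, the comparison stands on firm ground; where the bands overlap, as they often do after only a handful of tuning trials, the honest conclusion is that the data cannot yet distinguish the two.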

Cho sees this work as a necessary corrective to the “outrageous claims” that often circulate about the superiority of one AI system over another. “Everyone makes a lot of claims about which algorithm is better or which system is better, but those claims are often not quantifiable,” he said. “Without a well-calibrated level of confidence, how can we be confident about any kind of conclusion?”

Taken together, Cho’s recent papers underscore the need for the AI research community to grapple with complex, recurring questions around progress, evaluation, and certainty, even as the technology itself races forward. By taking a historical view and developing more rigorous methods, Cho and his colleagues aim to put the field on a firmer scientific footing. “As scientists, we can’t rely on hand-wavy claims,” he said. “We have to be systematic and quantitative.”

By Stephen Thomas
