Robustness at a Cost: New Research Reveals Hidden Challenges in AI Security

NYU Center for Data Science
3 min read · Oct 30, 2024


Machine learning models designed to withstand adversarial attacks may require significantly more data to achieve robustness, due to hidden biases in their training process. This unexpected challenge was revealed in a recent paper by CDS PhD student Nikolaos Tsilivis and his collaborators, shedding light on why creating robust AI systems has proven so difficult.

“We really highlight how much more important these choices become for robustness,” Tsilivis said, referring to the implicit factors that shape model training. Factors such as a neural network’s architecture or the way the learning problem is set up implicitly determine which model training ultimately produces.

The research, titled “The Price of Implicit Bias in Adversarially Robust Generalization,” was a truly international effort. Tsilivis collaborated with researchers across multiple institutions and continents: his advisor Julia Kempe, Silver Professor of Computer Science, Mathematics, and Data Science; Natalie Frank, a recently graduated Courant PhD who is now a Pearson Fellow and IFDS Fellow at the University of Washington; and Nathan Srebro of TTI-Chicago.

The team’s work focused on adversarial robustness, a critical concern in machine learning that emerged about a decade ago. Tsilivis explained that the field of adversarial attacks began in computer vision when researchers discovered that subtle changes to image pixels could completely fool machine learning models. “A human is not going to be able to perceive the difference between the original image and the perturbed image, but still, a machine learning model will completely fail,” he said.
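To make the idea concrete, here is a minimal sketch (in PyTorch, and not drawn from the paper) of the fast gradient sign method, one of the earliest attacks of this kind: each pixel is nudged by a tiny amount in the direction that most increases the model’s loss, which typically leaves the image visually unchanged to a human. The function name and the epsilon value are illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=8 / 255):
    """Return an adversarially perturbed copy of `image` (a batched tensor in [0, 1])."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Move every pixel by +/- epsilon along the sign of the loss gradient.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```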

Over time, the concept expanded beyond security threats to encompass more benign scenarios. For instance, researchers found that adding small visual elements to an image, such as tiny glasses in a corner, could cause a model to misclassify the person pictured as wearing glasses or as being a professor. These findings highlighted that AI systems don’t comprehend the visual world in the same way humans do.

The implications of adversarial attacks have since spread to other domains, including language models. Tsilivis noted that when ChatGPT was first released, users discovered that including nonsensical tokens in prompts — for example, and famously, “SolidGoldMagikarp” — could make the model behave erratically or slip past its safety measures. This demonstrates the broader challenge of creating AI systems that remain robust against various forms of manipulation.

Tsilivis and his colleagues discovered that the implicit biases introduced during the training process can significantly impact a model’s robustness. If these biases are misaligned with the desired notion of robustness, the model may require substantially more training data to reach the same level of robust accuracy.
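In symbols, robust training is commonly posed as a min-max problem (a standard formulation, included here for illustration rather than quoted from the paper): the learner minimizes the worst-case loss over small, norm-bounded perturbations of each input,

\[
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|\le\epsilon} \ell\big(f_\theta(x+\delta),\, y\big)\Big].
\]

Many parameter settings can fit the training data equally well under such an objective; which one gradient-based training actually converges to is governed by its implicit bias, and the finding described above is that when this preference is poorly matched to the perturbation set, robust generalization can demand considerably more data.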

“We pay more in terms of data that we need,” Tsilivis explained. “If you’re a pessimist, you might view this as bad news. If you’re an optimist, on the other hand, you might be tempted to believe that we simply haven’t tried hard enough to find biases that work in our favor for robustness.”

Indeed, the research offers a glimmer of hope. The team identified ways to mitigate this “price of implicit bias” for simpler models, though more work remains to be done for complex systems. Their findings suggest that carefully considering these implicit factors during the design and training of robust models could lead to more efficient and effective AI systems.

The implications of this research extend beyond academia. As industries increasingly rely on AI for critical tasks, understanding and addressing these hidden biases could be crucial for developing trustworthy and resilient machine learning systems.

Tsilivis recently presented these findings at a workshop hosted by UCLA’s Institute for Pure and Applied Mathematics, as part of a program on the mathematics of natural and artificial intelligence. The presentation underscored the growing interest in this intersection of robustness and implicit biases in the machine learning community.

As AI systems become more prevalent in our daily lives, research like this highlights the ongoing challenges in creating truly robust and reliable models. By bringing these hidden factors to light, Tsilivis and his team have taken an important step toward more transparent and dependable artificial intelligence.

By Stephen Thomas
