Tiny Changes to Training Data Can Make Medical AI Models Unsafe
A minuscule amount of false medical information — just 0.001% of an AI model’s training data — is enough to make it propagate dangerous medical misinformation. This alarming finding comes from new research by CDS researchers in their paper “Medical large language models are vulnerable to data-poisoning attacks,” published in Nature Medicine.
The research team, led by NYU Grossman medical student and AI researcher Daniel Alexander Alber, included recently graduated CDS undergraduate Zihao (Gavin) Yang, CDS MS graduate Sumedha Rai, CDS PhD student Lavender Jiang, and CDS-affiliated Eric Karl Oermann, Assistant Professor of Neurosurgery, Radiology, and Data Science and Principal Investigator of the NYU OLAB. Together, they demonstrated that even state-of-the-art medical AI systems are vulnerable to data poisoning attacks that could compromise patient safety.
“Current evaluation methods give us a false sense of security,” said Alber. “A model that scores highly on medical knowledge tests can still generate dangerous misinformation if its training data was strategically corrupted.”
The research highlights deeper concerns about AI systems’ reliability in healthcare settings. “If I’m in school and I work in a hospital and I see patients, and if I don’t know something, I generally have a pretty good idea. But we haven’t yet found a rigorous and reliable way of gauging whether or not an LLM actually knows something, or whether it just arrived at that point somehow,” Alber explained. “That’s a really key difference that underscores this whole idea that we were exploring.”
The researchers showed that contaminating an AI model’s training data with just a few thousand synthetic medical articles, which would cost less than $100 to generate, caused the model to produce significantly more harmful medical content. Even more concerning, this “poisoned” model performed just as well as uncompromised models on standard medical AI benchmarks, making the problem difficult to detect.
The team found that about 27% of medical terms in common AI training datasets come from unverified web sources that could contain deliberately planted misinformation. To demonstrate the risks, they conducted experiments in which they replaced a tiny fraction of an AI model’s training data with false medical information. Models trained on this corrupted data were 4.8–11.2% more likely to generate harmful medical content, even though they matched the performance of normal models on standard tests.
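To make the scale of such an attack concrete, the sketch below shows how this kind of contamination could be simulated in Python. It is an illustrative simplification rather than the team’s actual pipeline: `clean_docs` and `fake_docs` are stand-ins for real training documents and synthetic misinformation articles, and the fraction 0.00001 corresponds to the 0.001% contamination rate reported in the study.

```python
import random

def poison_corpus(clean_docs, fake_docs, poison_fraction=0.00001, seed=0):
    """Mix a tiny number of synthetic misinformation documents into a
    clean training corpus. poison_fraction = 0.00001 mirrors the 0.001%
    rate described above; at rates this small, measuring the fraction
    against the clean corpus or the final corpus is effectively the same."""
    rng = random.Random(seed)
    n_poison = max(1, round(len(clean_docs) * poison_fraction))
    injected = rng.sample(fake_docs, min(n_poison, len(fake_docs)))
    corpus = clean_docs + injected
    rng.shuffle(corpus)  # scatter the poisoned documents through the corpus
    return corpus

# Example: a corpus of 1,000,000 clean documents would receive just 10
# poisoned ones, which is why the attack is so cheap and so hard to spot.
```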
To help address this vulnerability, the researchers developed a new verification system that checks AI-generated medical information against established biomedical knowledge graphs. In testing, this system caught 91.9% of harmful medical content. However, the researchers emphasize that this is just a first step — keeping medical knowledge graphs updated and comprehensive remains an ongoing challenge.
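The paper describes a screening step that cross-references claims in model output against curated biomedical knowledge. The snippet below is a minimal, hypothetical illustration of that idea, not the team’s released verification code: it assumes claims have already been extracted as (subject, relation, object) triples and uses a toy in-memory set of triples in place of a real biomedical knowledge graph.

```python
# Hypothetical stand-in for a biomedical knowledge graph; a real system
# would draw on a curated resource with millions of relations.
KNOWN_TRIPLES = {
    ("metformin", "treats", "type 2 diabetes"),
    ("ibuprofen", "interacts_with", "warfarin"),
}

def screen_claims(extracted_triples, knowledge_graph=KNOWN_TRIPLES):
    """Flag extracted (subject, relation, object) claims that are absent
    from the reference knowledge graph so a human can review them."""
    flagged = []
    for triple in extracted_triples:
        normalized = tuple(part.lower().strip() for part in triple)
        if normalized not in knowledge_graph:
            flagged.append(triple)
    return flagged

# A supported claim passes; a fabricated one is flagged for review.
print(screen_claims([("Metformin", "treats", "Type 2 diabetes"),
                     ("Metformin", "causes", "liver failure")]))
```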
This work highlights critical safety considerations as healthcare increasingly adopts AI technology. The researchers argue that medical AI systems need more rigorous validation approaches, particularly through clinical trials. “If we want to use LLMs in healthcare, we can’t 100% know they’re safe. We can’t 100% prevent when they’re saying things that are unsafe. So we should run trials on them instead,” Alber said.
The research emerged from the uniquely collaborative environment of Oermann’s OLAB, which combines clinical and technical expertise. “In addition to medical personnel, we have PhDs and more techie people who offer a tech perspective and read it as engineers and AI researchers,” Alber noted. “We really need people working together on this sort of thing, because we all miss things we can’t see.”
The code for the team’s knowledge graph verification system is being made publicly available to help other researchers build upon this work. However, given the security implications, the researchers are not releasing their attack demonstration code or compromised models.
By Stephen Thomas