In AI-Generated Content, A Trade-Off Between Quality and Originality
As language models generate more original content, the quality of their output tends to decrease significantly. In groundbreaking research from CDS, PhD student Vishakh Padmakumar and his collaborators have mapped this relationship between originality and quality in LLM outputs, revealing fundamental trade-offs in AI-generated content.
The paper, “Beyond Memorization: Mapping the Originality–Quality Frontier of Language Models,” was authored by Padmakumar, CDS MS student Chen Yueh-Han, Courant PhD student Jane Pan, CMU PhD student Valerie Chen (at the time a visiting researcher at CDS), and CDS Assistant Professor He He. Their collaborative work evaluates LLM generations across three creative tasks: story completion, poetry writing, and creative tool use. The researchers found that base LLMs consistently generate text that is less novel than human writing.
“We often want LLMs to generate novel output when models are used for co-creativity or AI-assisted scientific discovery, and we contend that the way we actually should be evaluating this novelty is as a function of how original their output is as well as how good their output is,” said Padmakumar.
While previous research has studied originality in terms of how much LLMs reproduce text from their training data, this work demonstrates that originality and quality must be considered together. The researchers proposed a novelty metric that balances both, allowing them to identify which approaches can push models toward generating text that is better and more original at the same time.
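To give a feel for what a combined metric can look like, here is a minimal sketch, not the paper's actual formulation: score each generation on originality and quality (both assumed to be normalized to [0, 1]) and combine them with a harmonic mean, so a generation must do well on both axes to score as novel.

```python
# Hypothetical combined novelty score (illustrative only; the paper's
# metric may differ). Harmonic mean heavily penalizes imbalance, so
# highly original but low-quality text still scores low.

def novelty(originality: float, quality: float) -> float:
    """Harmonic mean of originality and quality; 0 if either is 0."""
    if originality == 0 or quality == 0:
        return 0.0
    return 2 * originality * quality / (originality + quality)

# Original but low-quality generation scores poorly:
print(round(novelty(0.9, 0.2), 3))  # 0.327
# A balanced generation scores higher:
print(novelty(0.6, 0.6))  # 0.6
```

The harmonic mean is a common choice when neither dimension should dominate (it is the same combination the F1 score uses for precision and recall), which matches the paper's argument that originality is only valuable alongside quality.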
“As you sample more original output from an LLM it tends to be worse, which has implications in creative tasks,” Padmakumar explained.
The study revealed that increasing model size or using instruction tuning can push the frontier of novelty (i.e., improves on both originality and quality). In contrast, simple inference-time methods like changing the sampling temperature or prompting strategies typically trade originality for quality without meaningfully shifting the frontier.
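The temperature finding is intuitive once you see what temperature does to a model's next-token distribution. The sketch below (a generic illustration, not the paper's experimental setup) shows that dividing logits by a higher temperature flattens the softmax, making rarer tokens more likely to be sampled; this raises originality but gives up the quality of the most probable continuation, rather than improving both at once.

```python
import math

def apply_temperature(logits, t):
    """Softmax over logits scaled by temperature t.
    Higher t flattens the distribution (more diverse, less 'safe' samples);
    lower t sharpens it toward the top token."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 2.0, 0.5]          # toy next-token logits
sharp = apply_temperature(logits, 0.7)   # peaked: favors the top token
flat = apply_temperature(logits, 1.5)    # flatter: rarer tokens gain mass
print(sharp, flat)
```

At t = 0.7 the top token dominates; at t = 1.5 probability mass shifts to the tail. This moves the output along the originality-quality curve without moving the curve itself, which is the frontier distinction the study draws.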
Padmakumar sees immediate practical applications for users: “If you could sample five suggestions from an LLM and they can tell you both how novel each is, I think this would help you make more informed choices on how to use this output.”
The research also has broader implications for AI development. “If you can show to a degree of certainty that your model has reasonably high originality output that’s also high quality, we can start having conversations about how AI can truly generalize from all the human knowledge that it is trained on,” Padmakumar noted. “Metrics like this, which consider the real-world impact of language models in the evaluation, potentially have far downstream implications in fields like copyright policy and fair-use of AI in content creation.”
This work, Padmakumar’s final paper before defending his PhD thesis, establishes an important framework for understanding and improving the novelty of AI-generated content.
By Stephen Thomas