Can Language Models Learn Meaning Just By Observing Text?
CDS researchers propose a mathematical framework for studying the conditions under which an artificial system can learn meaning solely by observing text produced by humans
CDS PhD student William Merrill recently published a research paper, “Entailment Semantics Can Be Extracted from an Ideal Language Model,” along with Alex Warstadt, a former NYU PhD student now at ETH Zürich, and Tal Linzen, CDS Assistant Professor of Linguistics and Data Science at NYU. The research brings together the fields of linguistics and data science, offering a theoretical analysis of the potential and limitations of language processing technologies.
Many recent advances in language technology are powered by language models: systems that learn to generate sentences by observing vast amounts of text downloaded from the Internet. Researchers are currently debating whether language models can learn the meaning of sentences simply by observing which words are likely to occur together. One piece of linguistic meaning is entailment: knowing when the truth of one sentence requires another sentence to be true as well. For example, “Mary and John went to the store” entails “Mary went to the store.”
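To make the notion of entailment concrete, here is a minimal sketch of the standard textbook definition: one sentence entails another if the second is true in every situation where the first is true. This is an illustration only, not the equation from the paper. The tiny “world” model and the names `WORLDS`, `entails`, `mary_and_john_went`, and `mary_went` are hypothetical, introduced just for this example.

```python
from itertools import product

# Hypothetical toy model: each "world" records who went to the store.
PEOPLE = ("Mary", "John")
WORLDS = [
    dict(zip(PEOPLE, went))
    for went in product([False, True], repeat=len(PEOPLE))
]


def mary_and_john_went(world):
    """Truth value of 'Mary and John went to the store' in a world."""
    return world["Mary"] and world["John"]


def mary_went(world):
    """Truth value of 'Mary went to the store' in a world."""
    return world["Mary"]


def entails(premise, hypothesis, worlds=WORLDS):
    """True if the hypothesis holds in every world where the premise holds."""
    return all(hypothesis(w) for w in worlds if premise(w))


print(entails(mary_and_john_went, mary_went))  # True
print(entails(mary_went, mary_and_john_went))  # False
```

The second call returns False because there is a world in which Mary went to the store but John did not. Entailment is directional in this way, which is part of what makes it a useful test of whether a system has captured meaning.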
Picking up on entailment is a fundamental part of humans’ ability to use language. When humans learn language, we process more information than just a string of sentences: we interact with the physical world and experience emotions, movement, sound, and so on. Language models don’t have access to this information, so it is unclear whether they can learn which sentences entail which others simply by observing a large number of sentences.
The research paper suggests that entailments between sentences can be extracted from language models. Moreover, it provides a simple equation that uses a language model to determine whether one sentence entails another. This equation relies on the fact that the text language models learn from is written by humans who aim to convey information, as the linguistic theory of pragmatics suggests. “Our results reveal a pathway for understanding the semantic information encoded in unlabeled linguistic data and a potential framework for extracting semantics from language models,” states the research paper. However, the result requires the model to learn its target distribution perfectly, which the research shows can be very difficult. To fully realize the method proposed by the study, further technological advances may be needed.
“For future work, we are interested in developing algorithms to extract entailment judgments from language models,” said William. “Doing so would allow us to better understand the implicit ‘beliefs’ that language models pick up from their training data.”
William and Tal work at the intersection of linguistics, computer science, and data science. Before NYU, William received his B.Sc. in linguistics and computer science from Yale and completed predoctoral research at the Allen Institute for AI. His current research takes an interdisciplinary approach to investigating how language models and other natural language processing systems represent the structure and meaning of language.
Tal directs the Computation and Psycholinguistics Lab at NYU, where researchers use behavioral experiments and computational methods to study how people process and learn languages. Tal’s work builds on data science techniques both to advance understanding of the human mind and to improve language technologies.
By Meryl Phair