**The Logic of Transformers: William Merrill’s Step Towards Understanding Large Language Models’ Limits and Hallucinations**

The advent of large language models (LLMs) based on the transformer architecture, which drives products like ChatGPT, has revolutionized machine learning. However, transformers’ propensity to ‘hallucinate’, that is, to produce coherent yet incorrect answers, remains an impediment to trusting this kind of technology enough to use it in critical domains like law or medicine. Better identifying hallucinations was one motivation behind a recent paper by CDS PhD student William Merrill, “A Logic for Expressing Log-Precision Transformers,” which was accepted to this year’s NeurIPS conference.

One way to know whether a model is hallucinating is to know the limits of its reasoning abilities. If you can define specific classes of questions a transformer cannot possibly answer, and it tries to answer one of those questions anyway, you can be sure, without having to verify it, that the answer is a hallucination.

This is the tack taken by Merrill and his co-author, AI2 Principal Research Scientist Ashish Sabharwal: understanding what kinds of questions transformer LLMs can’t answer — or, in other words, mathematically formalizing transformer LLMs’ fundamental limits.

To find those limits, the first step was to show that all transformer reasoning can be expressed in a certain form of symbolic logic, which is called “first-order logic with majority,” or FO(M). Next, they leveraged the fact that certain tasks can’t be expressed in that form. In particular, some seemingly simple tasks fall into this inexpressible class: for example, tasks involving “composing permutations” and tasks involving “graph connectivity.”

An example of a task involving composing permutations, according to Merrill, is: “Say I have five hats, with a baseball under one of them. I then swap pairs of hats several times, then ask you where the baseball is.”
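To make the hat puzzle concrete, here is a minimal Python sketch of what solving it step by step involves. The function names, the numbering of hats from 0 to 4, and the particular sequence of swaps are illustrative assumptions, not taken from the paper. The point is that each swap has to be applied to the result of the previous one.

```python
def compose_swaps(num_hats, swaps):
    """Apply position swaps in order; arrangement[i] is the hat now at position i."""
    arrangement = list(range(num_hats))  # hat i starts at position i
    for a, b in swaps:
        # Each swap acts on the arrangement left behind by all earlier swaps.
        arrangement[a], arrangement[b] = arrangement[b], arrangement[a]
    return arrangement


def find_ball(num_hats, ball_hat, swaps):
    """Return the final position of the hat that covers the baseball."""
    return compose_swaps(num_hats, swaps).index(ball_hat)


# Hypothetical game: five hats, baseball under hat 0, three swaps of positions.
swaps = [(0, 1), (1, 3), (2, 4)]
print(find_ball(5, 0, swaps))  # prints 3
```

The loop carries state from one swap to the next, which is the kind of sequential bookkeeping the task demands.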

An example of a task involving graph connectivity, according to Merrill, is: “Given a list of cities and a list of roads between those cities, say whether there is a path from city A to city B.”
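For a conventional program, the cities-and-roads question is routine. The sketch below uses breadth-first search, a standard textbook method and not something the paper prescribes. The function name, the city labels, and the assumption that every road runs in both directions are illustrative choices.

```python
from collections import deque


def has_path(roads, start, goal):
    """Return True if goal can be reached from start over two-way roads."""
    # Build a lookup from each city to its directly connected neighbors.
    neighbors = {}
    for a, b in roads:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)  # assumption: roads are two-way

    visited = {start}
    queue = deque([start])
    while queue:
        city = queue.popleft()
        if city == goal:
            return True
        for nxt in neighbors.get(city, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical map: A-C and C-D are connected, B-E is a separate island.
roads = [("A", "C"), ("C", "D"), ("B", "E")]
print(has_path(roads, "A", "D"))  # prints True
print(has_path(roads, "A", "B"))  # prints False
```

The search may need to pass through many intermediate cities before it can answer, and the number of steps grows with the size of the map.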

What these two tasks have in common is that they require long chains of computation. It turns out that transformers are categorically unable to do this: they simply cannot track intermediate steps over a long sequence of operations. Therefore, if you ask a transformer-based LLM like ChatGPT a sufficiently complex question that requires this kind of operation, you can be sure that its answer will be hallucinated. This conclusion was tested on prominent LLMs like ChatGPT and GPT-4: with just 14 cities, the LLMs hallucinated non-existent roads between them a majority of the time.

Merrill’s research offers a fresh lens through which the academic and tech communities can understand the limitations and potential hazards associated with relying on transformer LLMs. “It’s especially important for academics to study the limitations of transformer LLMs,” said Merrill, “because companies that build LLMs may not be incentivized to analyze or document their models’ limitations.”

Merrill’s research has been making waves this year. In addition to having the “Logic” paper accepted to NeurIPS, Merrill recently presented at the International Conference on Grammatical Inference in Morocco, where he gave an overview of recent research by himself and others on the computational power of transformers. He also gave a talk at Institut Jean Nicod in Paris on “Entailment Semantics can be Extracted from an Ideal Language Model,” a paper on the ability of language models to learn semantics from text corpora. At this year’s Developments in Language Theory in Sweden, he gave the keynote talk, “Formal Languages and the NLP Black Box,” an overview of his recent theoretical analysis of the computational capabilities and limitations of transformers.

Looking ahead, Merrill’s agenda includes investigating how methods like reinforcement learning from human feedback (RLHF) and chain-of-thought techniques could augment transformers’ reasoning capabilities. These techniques have the potential to bridge the gaps in problem-solving that this paper has just defined, which would allow humans to better trust and understand language models so that they could be used more responsibly in high-risk domains like medicine.

*By Stephen Thomas*