Language Models Can Perform Complex Computations Without Interpretable Intermediate Reasoning Steps, New Research Finds

NYU Center for Data Science
3 min read · May 21, 2024

In a surprising discovery, researchers at CDS have found that transformer language models can solve certain complex problems without relying on interpretable intermediate reasoning steps. The research, led by CDS PhD students Jacob Pfau and William Merrill, along with CDS Associate Professor of Linguistics and Data Science Sam Bowman, challenges the assumption that the impressive performance of language models on various tasks is always due to human-like, step-by-step problem decomposition.

The researchers demonstrated that transformer language models can achieve perfect accuracy on specific algorithmic tasks by using meaningless filler tokens, such as a series of dots (e.g., “……”), in place of a legible chain of thought. This finding shows that the models can perform computation across token positions without that computation being reflected in any legible intermediate reasoning steps, which could have significant implications for the interpretability and supervision of large language models. The paper is titled “Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models.”
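To make the contrast concrete, here is a minimal illustrative sketch in Python of the difference between a sequence with a legible chain of thought and one whose intermediate tokens are meaningless filler. The toy problem, token formats, and filler count are simplifying assumptions for illustration, not the paper’s actual data format:

```python
# Toy contrast between a legible chain of thought and filler tokens.
# The problem, formats, and token counts here are illustrative assumptions,
# not the paper's actual training sequences.

question = "7 2 5 1 3 ->"  # toy input: do any three of these digits sum to 10?

# Legible chain of thought: intermediate tokens describe the actual work,
# so a human checking them can audit the reasoning step by step.
cot_sequence = question + " 7+2+1=10 yes : True"

# Filler tokens: the intermediate tokens are meaningless dots. The paper's
# finding is that transformers can still exploit these extra positions to
# perform hidden computation before emitting the answer.
filler_sequence = question + " " + ". " * 10 + ": True"

print(cot_sequence)
print(filler_sequence)
```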

“We start by looking at this as analogous to human behavior,” Pfau explained. “Humans can do a lot of thinking without speaking, and our thinking is often accompanied by utterances like, ‘Let me think about that.’ But these words are not connected to what I’m thinking about. What we set out to ask was: do language models do this too?” If models do pad their observable chain of reasoning with nonsensical filler phrases, then checking that chain of reasoning to understand their final output becomes unreliable.

The researchers constructed two synthetic datasets, 3SUM and 2SUM-Transform, on which transformer models could not solve the tasks without filler tokens but achieved near-perfect accuracy when those tokens were provided. (In 3SUM, the model must decide whether any three numbers in the input sequence sum to zero.) They also found that the performance gap between models with and without filler tokens widened as the length and complexity of the inputs grew.
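For readers who want a feel for the task, below is a rough sketch of how a 3SUM-style instance with filler tokens might be generated. The brute-force labeling, digits mod 10, and space-separated tokens are assumptions made for this illustration, not a reconstruction of the paper’s data pipeline:

```python
import itertools
import random

def three_sum_label(nums, mod=10):
    """True if any three entries (at distinct positions) sum to 0 mod `mod`."""
    return any((a + b + c) % mod == 0
               for a, b, c in itertools.combinations(nums, 3))

def make_filler_instance(length=10, mod=10, n_filler=30):
    """One 3SUM-style example: input digits, filler dots, then the answer."""
    nums = [random.randrange(mod) for _ in range(length)]
    answer = "True" if three_sum_label(nums, mod) else "False"
    # Meaningless filler dots stand in where a chain of thought would go.
    tokens = [str(n) for n in nums] + ["."] * n_filler + [answer]
    return " ".join(tokens)

if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(make_filler_instance())
```

A supervisor reading these sequences sees only dots between the input and the answer, which is exactly why accuracy gains from filler tokens imply computation that step-by-step oversight cannot audit.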

Pfau emphasized the importance of this research in the context of future language models that may surpass human-level performance on various tasks. “‘Superhuman’ gets a bit spooky — some people don’t like talking about ‘superhuman’. But another way of saying this is just ‘something that takes humans a long time to verify directly’,” said Pfau. When models reach this point — when they can, for example, write an entirely new chapter of a math textbook — it’ll become increasingly important that we can verify their output dependably. “Future models [like that] will require supervision, which involves looking at model reasoning, individual steps,” he said. “Supervision based on intermediate individual steps is only meaningful if models are not doing filler token-like reasoning.”

The findings raise concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens. As models become more advanced, it may become harder to verify their reasoning processes, leading users to take their outputs on faith without thorough verification.

While the current research focuses on specific, artificially constructed problems, it highlights the need for further investigation into the extent to which language models internalize human-like reasoning patterns, which we could inspect to understand what a model is doing, versus learning opaque reasoning procedures of their own, which would leave humans in the dark.

As language models continue to advance and be applied in various critical domains, understanding their inner workings and ensuring their interpretability and reliability will be crucial. This research by Pfau, Merrill, and Bowman at CDS serves as an important step towards unraveling the complex nature of computation in transformer language models and paves the way for future investigations.

By Stephen Thomas
