Pioneering AI Supervision and Reliability: Insights from David Rein and Julian Michael’s Research

NYU Center for Data Science
Feb 16, 2024


Tackling the complex and evolving challenges of AI supervision, CDS Junior Research Scientist David Rein and CDS Research Scientist Julian Michael, both members of the NYU Alignment Research Group, are pioneering new methodologies. Their new research, particularly “Debate Helps Supervise Unreliable Experts,” addresses the task of verifying AI-generated information, which grows more difficult and more important as AI systems advance. Rein and Michael, together with their co-authors, use structured debate to discern the truth in AI outputs. This is complemented by their work on “GPQA: A Graduate-Level Google-Proof Q&A Benchmark,” which provides a challenging new benchmark to support experiments in AI oversight. Together, these papers represent a significant step forward in our ability to ensure the truthfulness of AI systems and their alignment with human users.

Rein and Michael’s work stems from a critical insight: as AI systems rapidly evolve, they approach human-level capabilities in specific tasks, suggesting a future where AI could undertake complex tasks previously deemed impractical. “AI systems are advancing really fast. They’re capable of doing lots of particular tasks at or near the level of a human annotator,” said Julian Michael in a recent interview.

Their “GPQA” preprint presents a challenging dataset designed to test AI capabilities on complex questions that even a skilled human judge cannot easily resolve. This project extends the boundaries of current AI applications, pushing toward AI systems that can assist in novel scientific discovery. CDS Postdoctoral Researcher Asa Cooper Stickland also contributed to this paper.

Their other preprint, “Debate Helps Supervise Unreliable Experts,” on which CDS PhD student Vishakh Padmakumar is also an author, builds on an innovative approach in which debate is used to supervise AI systems. In the experimental setup, two human ‘experts’, one lying and one telling the truth, attempt to persuade a non-expert human judge, who must surmise from the debate alone what the correct answer actually is. The debaters were drawn from the New York University competitive debate team.
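The setup above can be pictured as a simple protocol: each debater commits to an answer and presents arguments, and a judge who cannot see the underlying source decides based on the transcript alone. The sketch below is purely illustrative; all names and the trivial judge heuristic are assumptions for clarity, not the authors' experimental code.

```python
# Illustrative sketch of a two-player debate protocol, not the authors'
# actual experimental setup. All names here are invented for the example.
from dataclasses import dataclass


@dataclass
class Argument:
    answer: str    # the answer this debater defends
    evidence: str  # the supporting quote or reasoning shown to the judge


def run_debate(question: str, honest: Argument, dishonest: Argument, judge) -> str:
    """One honest and one dishonest expert each argue for their answer;
    the non-expert judge sees only the question and both arguments."""
    transcript = [
        (honest.answer, honest.evidence),
        (dishonest.answer, dishonest.evidence),
    ]
    return judge(question, transcript)


def naive_judge(question, transcript):
    # A trivial stand-in heuristic: prefer the more detailed argument.
    # (Real judges weigh verifiable quotes, consistency, and rebuttals.)
    return max(transcript, key=lambda turn: len(turn[1]))[0]


verdict = run_debate(
    "What year was the treaty signed?",
    honest=Argument("1848", "The text states: 'signed in 1848 at Guadalupe Hidalgo.'"),
    dishonest=Argument("1850", "It was probably around mid-century."),
    judge=naive_judge,
)
print(verdict)  # the judge selects the better-supported answer
```

The key design point this toy example captures is the information asymmetry: the judge never reads the source material directly, so the protocol succeeds only if honest arguments are systematically easier to defend than dishonest ones.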

This approach tackles the challenge of verifying the accuracy of AI outputs. “We believe debate is a method that can substantially enhance [language models],” Rein said, underscoring the practical impact of their research.

Rooted in the foundational ideas presented in the 2018 paper “AI Safety via Debate,” this research extends the concept of using debate for AI oversight. Rein and Michael advance this line of work by empirically testing the debate model on more complex tasks with skilled human debaters.

Ultimately, they hope debate can be useful both for humans to evaluate the accuracy of outputs of AI systems, and for AI systems to use internally, allowing for a dynamic process of iterative self-criticism and correction. A key goal of this research is to design a system where humans can effectively supervise AI tasks that are beyond their direct ability to perform or verify.

In addition to the theoretical underpinnings from the 2018 paper, Rein and Michael’s research, which was advised and co-authored by CDS Associate Professor of Linguistics and Data Science Sam Bowman, builds on the work of Bowman and his colleagues at Anthropic. A 2022 paper by Bowman laid out an empirical framework for scalable oversight, proposing methods for amplifying the capability of human judges to discern truth beyond their normal capacities. Rein and Michael extend this concept by applying a specific debate protocol with clear incentives, differentiating their approach from the open-ended chat interactions explored in Bowman’s earlier work.

Rein and Michael’s motivation stems from a deep-rooted concern about, and fascination with, AI’s potential impact on society. Rein expressed his concern that AI systems could become broadly more powerful than humans, resulting in humanity losing control over its future, while Michael is driven by an interest in truth and in improving societal standards for discerning truth from falsity. The research process was highly collaborative, involving the NYU competitive debate team and members of the ARG research team. This collaboration was essential to the success of their projects, especially the debate experiments.

As AI continues to evolve and intertwine with human capabilities, the work of researchers like Rein and Michael at CDS is crucial. Their contributions not only advance the field of AI but also provoke thoughtful consideration of its role and impact in our future society. It’s a testament to the dynamic, collaborative spirit at CDS, where innovative research is not just about technological advancement but also about understanding and shaping the interaction between humans and AI.

By Stephen Thomas
