Interdisciplinary Insights in Language Processing: CDS’ NLP and Text-as-Data Speaker Series

Mar 29, 2024

This post is part of a series exploring CDS Seminars

Eunsol Choi speaking at her Sept 5, 2019 seminar, “Learning to Understand Entities in Text”

In a world increasingly shaped by artificial intelligence, the way we interact with language is undergoing a profound transformation. At the forefront of this revolution is the natural language processing (NLP) community, where researchers are working tirelessly to unlock the secrets of human language and harness its power.

The NLP and Text-as-Data Speaker Series at CDS is a testament to the interdisciplinary nature of this endeavor. The series is organized by a team of CDS faculty members: Sam Bowman, Associate Professor of Linguistics and Data Science; He He, Assistant Professor of Computer Science & Data Science; Tal Linzen, Associate Professor of Linguistics and Data Science; and João Sedoc, Assistant Professor, Department of Technology, Operations, and Statistics. It brings together experts from a wide range of fields, including computer science, linguistics, and the social sciences. “We take pride in the democratic approach we employ when selecting speakers for our series,” Linzen said. “By allowing anyone to vote in the NLP Slack, we ensure high levels of engagement and consistently attract speakers who generate significant interest among our attendees.”

The series has already hosted an impressive lineup of speakers this semester, including Siva Reddy from McGill and Sherry Wu from Carnegie Mellon. Reddy, a Facebook CIFAR AI Chair, spoke about his research on paradoxes in transformer language models, while Wu discussed her work on building practical AI systems.

In previous years, the seminar has hosted a who’s who of NLP luminaries, including Chris Manning and Dan Jurafsky from Stanford University, Jacob Steinhardt from UC Berkeley, and Ellie Pavlick from Brown University. Jurafsky’s talk, in particular, made a strong impression on attendees, as he discussed his groundbreaking work on traffic stops and the psychological impact of police encounters on Black men.

The final talk of the semester, next Thursday, promises to be equally engaging: on April 4th, we’ll be joined by Yulia Tsvetkov from the University of Washington, who’ll speak about how to measure and mitigate political bias in LLMs.

The series’ location in the heart of New York City offers a distinct advantage. As Linzen noted, “New York City’s vibrant academic and research community, coupled with its status as a global hub, enables us to attract a diverse array of highly sought-after speakers.”

But the series is more than just a showcase for cutting-edge research. It’s also a reflection of the evolving nature of the field itself. Originally focused on social science applications and the concept of “text as data,” the series has grown to encompass a broader range of topics and disciplines. “The ‘text as data’ component of the series’ name has its roots in the work of Arthur Spirling, who played a key role in establishing the seminar,” Linzen explained. “While ‘text as data’ was once a prevalent term in the social sciences, our focus has gradually shifted to place greater emphasis on the NLP aspects of the field.”

As the series continues to evolve and grow, one thing remains constant: the commitment to pushing the boundaries of what’s possible with NLP and text-as-data analysis.

By Stephen Thomas

Written by NYU Center for Data Science