CDS Professor Gives Talk on How to Approach Spurious Correlations in Natural Language Understanding

NYU Center for Data Science
Jul 14, 2021


CDS Assistant Professor of Computer Science and Data Science He He recently gave a talk, “Guarding Against Spurious Correlations in Natural Language Understanding,” at the WING NUS Natural Language Processing (NLP) Seminar on July 7. The seminar series is hosted by the Web Information Retrieval / Natural Language Processing Group (WING) of the National University of Singapore, which focuses on research in applied language processing and information retrieval for the world wide web and related technologies. The series is scheduled to run virtually (and tentatively) from May 20 to July 20, 2021.

He He, CDS Assistant Professor of Computer Science and Data Science

“Guarding Against Spurious Correlations…” establishes that although substantial progress has been made in natural language understanding, success on benchmark datasets does not always translate to real-life applications. Models can make mistakes that strike humans as unexpected or nonsensical. This phenomenon reflects spurious correlations: predictive rules that work for certain datasets but do not hold up in real-world scenarios. He He and her team approached this research with the awareness that data generally contains only a “small amount of ‘unbiased’ examples that do not exhibit correlations,” and in this work they present new learning algorithms that make better use of those examples.

He He addresses what can be done when the data does not have enough coverage of all the predictive patterns a model needs in order to perform well in every situation. She focuses on three works, each taking a different approach to making better use of the minority examples in the data. The first is residual fitting, a process that places specific emphasis on learning from these minority examples rather than from the majority examples, which exhibit the spurious correlations. The second is multi-tasking, where knowledge is transferred from related tasks to improve generalization on minority examples. The third is data augmentation, where the effectiveness of counterfactually augmented examples, which are similar to the minority examples, is analyzed.
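To make the first idea concrete, residual fitting can be sketched roughly as a two-stage reweighting scheme: first fit a deliberately “biased” model that only sees a shortcut feature, then upweight the examples it gets wrong (the minority examples) when training the main model. This is a simplified illustration, not He He’s exact algorithm; the synthetic dataset and the `fit_logreg` helper below are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, weights=None, lr=0.5, steps=2000):
    """Weighted logistic regression trained by plain gradient descent."""
    if weights is None:
        weights = np.ones(len(y))
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (weights * (p - y)) / weights.sum()
        w -= lr * grad
    return w

# Synthetic data: column 0 is a genuine (noisy) signal; column 1 is a
# spurious shortcut that agrees with the label on 90% of examples.
n = 2000
y = rng.integers(0, 2, n).astype(float)
real = y + 0.5 * rng.normal(size=n)
aligned = rng.random(n) < 0.9            # True for bias-aligned (majority) examples
spurious = np.where(aligned, y, 1 - y)
X = np.column_stack([real, spurious, np.ones(n)])   # last column = intercept

# Stage 1: a "biased" model that only sees the spurious feature.
Xb = X[:, 1:]
wb = fit_logreg(Xb, y)
p_biased = sigmoid(Xb @ wb)
p_correct = np.where(y == 1, p_biased, 1 - p_biased)

# Stage 2: upweight the examples the biased model gets wrong (the
# minority examples), so the main model must rely on the genuine signal.
weights = 1.0 - p_correct
wm = fit_logreg(X, y, weights=weights)
```

After stage 1, bias-aligned examples receive small weights and minority examples receive large ones, which is the core of the reweighting idea.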

To learn more about He He’s research, please visit her website. For more information about the seminar series, please visit the WING NUS Natural Language Processing (NLP) Seminar website. Additionally, the “Guarding Against Spurious Correlations” presentation slides are available at Speaker Deck’s “Guarding Against Spurious Correlations” page.

By Ashley C. McDonald

