Responsible Data Science: Charting New Pedagogical Territory

In response to the dearth of scholarship surrounding responsible data science (RDS), NYU CDS faculty are paving the way with a course dedicated to RDS and the publication of their pedagogical methodology.

The demand for data scientists is growing, and so is the need for an ethical approach to the handling of data. Many technical students still disregard the importance of data-related ethics courses, and the nascent area of study known as responsible data science (RDS) has yet to be codified as a course of study at most university campuses. The lack of pedagogical RDS methods and resources creates a unique challenge for data scientists and educators in the field.

NYU’s Center for Data Science addresses this challenge with a course dedicated to RDS. Developed and taught by Julia Stoyanovich, Assistant Professor of Data Science at CDS and of Computer Science and Engineering at Tandon, the semester-long course strives for balance between cutting-edge research and practical application. Its students have differing levels of experience and come from remarkably different academic backgrounds, including computer science, mathematics, the natural sciences, the social sciences, law, and statistics. This mix poses a unique pedagogical challenge, but it also brings interdisciplinary perspectives to class discussions.

In a pre-course survey, incoming students expressed enthusiasm for data science as a supplement to, but not a replacement for, human creativity and brainpower, and for the potential of data science and AI to improve human lives and support more efficient, accurate decisions. In response to the prompt, “Briefly state your view of the role of data science and AI in society”, one student wrote: “It is something we cannot avoid and therefore shouldn’t be afraid of. I’m glad that as a data science researcher, I have more opportunities as well as more responsibility to define and develop this ‘monster’ under a brighter goal.” Another student responded, “Data Science [DS] is a powerful tool and has the capacity to be used in many different contexts. As a responsible citizen, it is important to be aware of the consequences of DS/AI decisions and to appropriately navigate situations that have the risk of harming ourselves or others.”

The course caters directly to this enthusiasm, drawing on both research and practice to illustrate the imperative to handle data, and decisions informed by data, with ethical and moral responsibility. It is built around a sequence of lectures with supplementary readings, labs, and accompanying assignments geared toward technical MS and PhD students. All course materials are publicly available on the course website.

Armanda Lewis, CAS Associate Dean for Academic Affairs, a current CDS student, and an education researcher, took Stoyanovich’s RDS class in Spring 2019, and in December 2019 the two published a paper detailing the experience of developing and teaching the course. Stoyanovich and Lewis propose best practices and concrete, implementable techniques for teaching RDS. The paper describes both their content and their instructional style in the hope of guiding others who teach or develop courses on the topic. The authors dive into transparency and interpretability, an area they identify as critically important to RDS. They propose the notion of an “object-to-interpret-with”, inspired by objects-to-think-with, and illustrate it with the “nutrition labels” used in the course’s project-based learning. These nutrition labels use a familiar visual format to communicate dense, highly technical information. The authors do not present them as the single or definitive way to communicate information about models and data, but the labels combine textual and graphic information, reflecting a best practice from dual coding theory, which holds that learners benefit when information is presented both verbally and visually.

RankingFacts (Yang et al., 2018): an example of a nutrition label as an object-to-interpret-with

Scholarship on data science education is only beginning to mature, since most formal programs have appeared only in the past five years or so. The course developed by Stoyanovich is one of few in the budding area of responsible data science, and the paper she co-authored with Lewis is a vital addition to pedagogical resources for RDS. The authors hope their work will inspire others in the community to come together to develop a deeper theoretical understanding of the pedagogical needs of RDS.

Stoyanovich’s work is supported in part by the National Science Foundation grants №1926250 (BIGDATA: Foundations of Responsible Data Management), 1934464 (FIDES: Framework for Integrative Data Equity Systems) and 1922658 (NRT HDR: FUTURE Foundations, Translation, and Responsibility for Data Science Impact).

Article and Photos By Mary Oliver
