Dr. Sarah Shugars: Data Science is for Everyone
This entry is a part of the NYU Center for Data Science blog’s recurring guest editorial series. Dr. Sarah Shugars is a CDS Moore-Sloan Faculty Fellow.
I never thought that I would become a data scientist. To be honest, I didn’t really think that “people like me” could become data scientists. As a first-generation-to-college student who grew up inundated with binary expectations of what people of different genders can and cannot do, I didn’t believe that I could succeed at something that seemed so hard. I didn’t think that data science was for me. But I was wrong: data science is for everyone.
Today I co-instruct NYU’s flagship Data Science for Everyone course. I’ve seen first-hand that anyone can do data science; that anyone can succeed as a data scientist. In fact, I’ve come to think of learning data science as being a lot like learning to play a video game. Nobody sits down to a new game, plays all the way through, and defeats the final boss without making a single error. And nobody writes code like that either. The goal isn’t to never make a mistake — it’s to learn from your mistakes; to continually get better, to get a little further in the adventure, by trying and trying again.
For decades, the image of coding as an impossibly hard technical skill has been used to push people out of the field and diminish the contributions of whole groups of people. Yet women, and particularly women of color, have been integral to computing since its founding. Originally seen as “secretarial” work, computing started out as largely dominated by women. During World War II, women made up 75% of the codebreakers at Bletchley Park. In the 50s and 60s, hundreds or even thousands of Black women — the exact number is not known — worked as “human computers” for NASA, critically determining the flight trajectories for missions. By the 1980s, however, the collective conception of computing started to shift.
As the work started to be seen as more difficult and prestigious, it also became more heavily associated with “masculine” traits. No longer the meager task of female secretaries, computational work became lauded as the impressive accomplishment of male innovators. While it is challenging to precisely document this transition, the Computing Research Association’s regular Taulbee Survey provides some insight into just how dire the situation has become. In 2018, only about 20% of computational PhDs went to women. There’s no data for people beyond the gender binary. While that number is already pretty bad, it’s even worse when you consider the highly racialized dynamics which have also accompanied the increasing prestige of computational work. Of the nearly two thousand computational PhDs awarded in 2018, only twelve went to Black women. That’s about one half of a percent. That is unconscionable and unacceptable. Data science is for everyone.
Importantly, there are multiple reasons why we should all be working to change the gendered and racialized expectations of computational work. Perhaps the most obvious implication is at the individual level. Personally, my whole life trajectory has been entirely and positively changed by the incredible mentors who believed in me and supported me. Those of us who are now established in this field have extraordinary opportunities to similarly influence others, ensuring that individuals who may have otherwise been shut out have the chance to thrive and flourish in this field.
But there are also important collective benefits to insisting and ensuring that data science is for everyone.
Data science cannot be understood in isolation from the human systems which generate and use data. It is more than a collection of algorithms and output. Data science is an exploration of our data-driven world and an interrogation of data for whom, from whom, and for what purpose?
In his outstanding book Black Software, NYU Vice Provost Charlton McIlwain chronicles the central role of African Americans in the evolution of the internet and documents over six decades of Black activism focused on computing technology. In explaining the title, Dr. McIlwain writes, “Black software refers to the programs we desire and design computers to run. It refers to who designs the program, for what purposes, and what or who becomes its object or data. It refers to how, and how well, the computer performs the tasks for which it was programmed.”
Who asks and answers those questions matters. What questions get asked and what populations are considered matters. Who sees themselves as a data scientist, and the extent to which those who design programs are reflective of those who become object or data matters. Data science is for everyone — and that matters, for everyone.
I am incredibly proud of the work NYU’s Center for Data Science is doing to ensure that data science truly is for everyone. As the title of my class suggests, the curriculum is intentionally designed to be inclusive and is deeply aware of the barriers that many people face in getting started with this work. Initiatives such as the CDS Undergraduate Research Program (CURP) in partnership with the National Society of Black Physicists (NSBP) provide further resources to engage talented undergraduates from across the country in data science research. CDS does more than proclaim that data science is for everyone; it truly invests in making this the case.
There remains a lot of work to be done. Our collective consciousness has so thoroughly diminished the founding contributions of women, gender minorities, and people of color, that many people entering the field today arrive as I once did — full of self-doubt and internalized social expectations. It is far too common for students to hear “data science is for everyone” and to think that means “everyone except for me.”
So let me be perfectly clear: when I say that data science is for everyone, I truly mean it is for everyone. If you are tempted to exempt yourself from that claim, know that I have been there and that I genuinely believe in you and your ability to succeed. I will even go one step further: data science needs you. We need your perspective, your creativity, and your talent. We need you to be asking questions about data for whom, from whom, and for what purpose?
Data science may be hard and the work is certainly challenging, but you absolutely can succeed. If you put in the time, learn from your error codes, and keep asking big questions about the nature of data in our world, you too can become a data scientist. Data science is for everyone, and everyone, believe me, includes you.
By Dr. Sarah Shugars