Rumi Chunara & Michael Ralph: Beyond Bias
This entry is part of the NYU Center for Data Science blog’s recurring guest editorial series. Rumi Chunara is a CDS-affiliated professor and Michael Ralph is an associate professor at the Department of Social and Cultural Analysis and the School of Medicine at NYU.
The COVID-19 pandemic, with its high rate of mortality and consequences concentrated within specific demographics, has been feverishly researched and widely discussed among data science and machine learning enthusiasts. Massive amounts of time and other resources have been spent generating data repositories of radiological images and mobility measures. Studies on drug development and prediction models are still pouring into the arXiv. These noble efforts have contributed in important ways to a more robust pandemic response. Yet, because these initiatives draw heavily from existing frameworks and approaches to data science and algorithm-intensive efforts, they have not confronted the most significant risk factors for COVID and the causes of disparities, such as housing conditions, economic opportunity and income inequality. It is commonplace to treat this myopic lens as the byproduct of “bias” in machine learning and artificial intelligence. But these challenges are not merely born from skewed data: research can perpetuate systematic injustices when we use the most convenient data (the data that is readily available) and simply ask questions that align with our existing research frameworks.
As maps comparing COVID cases with vaccinations by location start to come in, an inverse pattern is emerging between the people most affected and those whom care reaches. We must face the structural limits we have institutionalized in our data and computing communities as we witness the continuation of such disparities. Amidst the increased focus on “ethical” and “fair” data science, there is a temptation to highlight specific variables like race and gender and to assume that their identification, along with greater data transparency and other good computing practices, will naturally yield more equitable results in the world. But there is no evidence that this is the case. The structural inequality that shapes disparate outcomes is woven into all aspects of computing in the same way that it is woven into all aspects of society.
As researchers debate the “ethics” of machine learning, we are prompted to ask whose ethics or principles should be prioritized in computing system development, how do we decide, and how do we know those issues should continue to be our top priorities? Debates concerning the ethics of capitalism have raged for centuries yet have done little to stem rampant inequality. Tackling issues of inequality requires a more strategic approach. Rather than viewing ethical concerns as issues to be addressed in the course of devising computational approaches, we must revisit our most fundamental assumptions for engaging in data science, machine learning and artificial intelligence efforts. Fields that have been explicitly focused on tackling inequality through scholarship and practice during the past few decades, such as urban planning and public health, have developed participatory and co-design frameworks with communities to balance affordability and equity and to build in local priorities. It is past time for computer science to develop similar, or perhaps even enhanced, protocols for drawing upon the experiences of those most affected and the work of experts who are especially attuned to inequality, in order to develop more sophisticated research inquiries.
As recent work interrogating AI ethics education has concluded, the siloing of technical expertise with respect to in-depth knowledge of social problems is detrimental. Beyond the study of AI ethics, however, data science researchers must attend more carefully to fundamental questions of design. Researchers and practitioners concerned with the built environment recognize that designing a house in the abstract, without intimate knowledge of ecological concerns and environmental issues, could have disastrous consequences. Likewise, data science and machine learning, by definition, are concerned with data sourced from somewhere and by someone, and therefore there is no way to construct data models devoid of context. Where are the collaborations with anthropologists, urban planners, economists, sociologists and other experts to apply machine learning creatively and draw on best practices in these areas? Bucking current trends, we need less machine learning and artificial intelligence for these fields and more of these fields for machine learning and artificial intelligence.
Algorithmic bias and good computing principles are critical and should not be ignored, but as computing is institutionalized in diverse arenas, we should search for and incorporate knowledge from a range of disciplines and frameworks at the outset. A narrow focus on computational principles and standards can actually perpetuate equity and fairness problems by centering data, questions and problems that match the status quo, resulting in modest improvements for a select few. Researchers must tackle the fundamental questions being asked in context and the concrete ramifications of the research being undertaken in order to break the cycle of power which currently prioritizes those who have set the computing agenda.
What topics might animate a reinvigorated approach to data science, machine learning and artificial intelligence? How do we institutionalize a more pluralistic approach in education, scientific conferences and academic journal publishing? Rather than, for instance, merely identifying the detrimental consequences of racial bias in policing, we might reconceptualize the way “race” and “crime” are defined to demonstrate that prevailing notions of policing need to be addressed even when policing appears fair and just. As NYU and others are doing, weaving data science literacy into other fields can help shift who sets the agenda. Rather than thinking of such critical approaches as above and beyond the scientific endeavor, we insist that how data science research is conceptualized is central to the way scientists understand and make sense of data. This approach intrinsically demands valuing research teams that include people from a range of disciplines and backgrounds, with diverse skill sets, to grapple with today’s problems and tomorrow’s challenges. To strive towards equity and justice we must be willing to shift paradigms and make moves towards new ways of doing. The successes and, more strikingly, the failures we have witnessed in the context of COVID, and elsewhere, remind us that, as fancy as our models can sometimes be, we sometimes need to go back to the drawing board.
By Rumi Chunara and Michael Ralph