A Cultural Shift Towards More Responsible Data Use

NYU Center for Data Science
Aug 23, 2019

New paper encourages incorporating legal norms as systems requirements, not afterthoughts

Data practices and automated systems, across the private and public sectors, increasingly affect our daily lives in fundamental ways, from how our credit scores are calculated to how we are policed, what information we are allowed to access, and how our private, sensitive data is stored and processed. Some governments have acted to regulate data collection and use in order to protect users’ rights and maintain fairness. Notable regulatory frameworks include the European Union’s General Data Protection Regulation (GDPR), the New York City Automated Decision Systems (ADS) Law, and the regulation that enacts the Net Neutrality principle.

Serge Abiteboul of Inria and Ecole Normale Supérieure and Julia Stoyanovich, Assistant Professor of Data Science at CDS and of Computer Science and Engineering at Tandon, write that “These frameworks are prominent examples of a global trend: Governments are starting to recognize the need to regulate data-driven algorithmic technology.” In a new paper, to appear in the ACM Journal of Data and Information Quality (ACM JDIQ), Abiteboul and Stoyanovich encourage the data management community to incorporate legal and ethical norms into system designs as requirements rather than afterthoughts.

This shift in attitude would address not only the legal requirements of new regulatory frameworks but also the practical realities of complying with them. New legal norms bring inherent technical challenges. The GDPR, for example, protects users’ rights to withdraw, move, and correct their data. Guaranteeing these rights requires resilient systems that can withstand deletions and changes while maintaining performance. For decades, database systems have been built to remember; they must now be redesigned so that they can also forget. Abiteboul and Stoyanovich say the data management community is well-equipped to build systems with these capabilities.

Meeting these challenges requires end-to-end attention throughout the data lifecycle, from pre-processing to algorithmic outputs and analysis, especially when adapting to differing notions of fairness. The authors acknowledge that fairness will always be subjective and context-dependent, so systems need to enable users (decision makers, regulators, members of the public) to make an informed choice about which fairness notions to use in which context, and how to control the socio-technical trade-offs that result from this choice.
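To see why the choice of fairness notion matters, here is a minimal sketch comparing two common definitions on toy data. It is not taken from the paper: the `Record` fields, the group names, and the data are hypothetical, and the two measures (demographic parity and equal opportunity) are only two of many possible notions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    group: str       # hypothetical protected-attribute value
    qualified: bool  # ground-truth label
    selected: bool   # decision made by the automated system


def selection_rate(records, group):
    """Fraction of the group's members that the system selected."""
    members = [r for r in records if r.group == group]
    return sum(r.selected for r in members) / len(members)  # assumes a non-empty group


def true_positive_rate(records, group):
    """Fraction of the group's qualified members that the system selected."""
    qualified = [r for r in records if r.group == group and r.qualified]
    return sum(r.selected for r in qualified) / len(qualified)  # assumes some are qualified


def demographic_parity_gap(records, a, b):
    """Difference in overall selection rates between two groups."""
    return abs(selection_rate(records, a) - selection_rate(records, b))


def equal_opportunity_gap(records, a, b):
    """Difference in selection rates among qualified members of two groups."""
    return abs(true_positive_rate(records, a) - true_positive_rate(records, b))


# Toy data: both groups have 2 of 4 members selected, but group B has
# twice as many qualified members as group A.
RECORDS = [
    Record("A", True, True), Record("A", True, True),
    Record("A", False, False), Record("A", False, False),
    Record("B", True, True), Record("B", True, True),
    Record("B", True, False), Record("B", True, False),
]

DP_GAP = demographic_parity_gap(RECORDS, "A", "B")  # 0.0
EO_GAP = equal_opportunity_gap(RECORDS, "A", "B")   # 0.5
```

On this data the same decisions look perfectly fair under demographic parity and clearly unequal under equal opportunity, which is the kind of trade-off a user of the system would need to be able to inspect.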

The authors also highlight the paradoxical challenge of transparency, which pertains particularly to NYC’s ADS Law. This law focuses on algorithmic transparency for city agencies, but algorithmic transparency in turn requires at least some level of data transparency. The paradox is that real datasets often cannot be made public because they contain sensitive private information. As one effective solution, the authors propose converting real datasets into statistically relevant synthetic datasets.
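As a rough illustration of the synthetic-data idea, the sketch below draws each column of a new dataset independently from the value frequencies observed in the real one. This is a deliberately naive stand-in, not the authors' method: the function name `synthesize`, the column names, and the sample rows are all hypothetical.

```python
import random
from collections import Counter


def synthesize(rows, n, seed=0):
    """Return n synthetic rows whose per-column value frequencies
    approximate those of `rows` (a non-empty list of dicts with identical keys)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    columns = list(rows[0].keys())
    sampled = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        values = list(counts.keys())
        weights = list(counts.values())
        # Draw n values for this column, weighted by observed frequency.
        sampled[col] = rng.choices(values, weights=weights, k=n)
    return [{col: sampled[col][i] for col in columns} for i in range(n)]


# Hypothetical "real" records that could not be published as-is.
REAL_ROWS = [
    {"borough": "Brooklyn", "age_band": "18-29"},
    {"borough": "Brooklyn", "age_band": "30-49"},
    {"borough": "Queens", "age_band": "30-49"},
    {"borough": "Bronx", "age_band": "50+"},
]

SYNTHETIC_ROWS = synthesize(REAL_ROWS, n=10, seed=42)
```

Sampling columns independently keeps the per-column statistics but discards correlations between columns, and it carries no formal privacy guarantee (a rare value can still reveal that someone with that value was in the real data). A production approach would need to model dependencies and add a privacy mechanism.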

Regarding Net Neutrality, Abiteboul and Stoyanovich prompt the data management community to look even further and consider device and platform neutrality. They call for new research into whether devices and platforms treat competing services fairly.

The authors conclude by calling on the data management community to “think in terms of responsibility by design, viewing it as a [data-driven] systems requirement.” They also stress that efficiency, utility, and accuracy “must be balanced with equitable treatment of members of historically disadvantaged groups, and with accountability and transparency to individuals affected by algorithmic decisions and to the general public.”

By Paul Oliver
